From davem@redhat.com Sun Jun 1 01:33:01 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 01:33:16 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h518Wt2x021058 for ; Sun, 1 Jun 2003 01:33:01 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA15160; Sun, 1 Jun 2003 01:30:40 -0700
Date: Sun, 01 Jun 2003 01:30:40 -0700 (PDT)
Message-Id: <20030601.013040.116362760.davem@redhat.com>
To: mk@linux-ipv6.org
Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] xfrm ip6ip6
From: "David S. Miller"
In-Reply-To: <87fzmv5ejc.wl@karaba.org>
References: <87fzmv5ejc.wl@karaba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 2798
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Mitsuru KANDA / 神田 充
   Date: Sun, 01 Jun 2003 00:20:07 +0900

Hello Mitsuru-san!

   + t->id.spi = xfrm6_tunnel_addr_hash((xfrm_address_t *)&x->props.saddr);

You misunderstood what I tried to explain to you.

Consider, how do you guarantee that this t->id.spi value is unique across all xfrm6_tunnel tunnels using the same t->id.daddr and t->id.prot?

The answer is that you cannot.

You must generate fake "spi" values; they have no meaning outside of xfrm6_tunnel.c. They serve only to map a 128-bit IPv6 address to a 32-bit "xfrm6_tunnel" SPI value.
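[Editorial sketch, not part of the original mail.] To make the 128-bit-address to 32-bit-SPI mapping concrete, here is a minimal userspace illustration in C. Every name in it (spi_map, spi_map_alloc, spi_map_lookup, the bucket count) is hypothetical and does not come from xfrm6_tunnel.c; it also assumes that SPI value 0 is reserved to mean "no mapping", that entries are never deleted, and that no locking is needed. Kernel code would differ on all three points.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace sketch; none of these names exist in the kernel. */
#define SPI_MAP_BUCKETS 64u

struct spi_map_entry {
    uint8_t  addr[16];   /* 128-bit IPv6 address, network byte order */
    uint32_t spi;        /* locally allocated SPI, never 0 */
    int      used;
};

struct spi_map {
    struct spi_map_entry slot[SPI_MAP_BUCKETS];
    uint32_t next_spi;   /* next SPI to hand out, starts at 1 */
};

static void spi_map_init(struct spi_map *m)
{
    memset(m, 0, sizeof(*m));
    m->next_spi = 1;     /* 0 is reserved for "lookup failed" */
}

static unsigned int spi_map_hash(const uint8_t addr[16])
{
    unsigned int h = 0;

    for (int i = 0; i < 16; i++)
        h = h * 31u + addr[i];
    return h % SPI_MAP_BUCKETS;
}

/* Return the SPI mapped to addr, or 0 if there is no mapping. */
static uint32_t spi_map_lookup(const struct spi_map *m, const uint8_t addr[16])
{
    unsigned int start = spi_map_hash(addr);

    for (unsigned int n = 0; n < SPI_MAP_BUCKETS; n++) {
        const struct spi_map_entry *e = &m->slot[(start + n) % SPI_MAP_BUCKETS];

        if (!e->used)
            return 0;    /* open addressing, no deletions: an empty slot ends the probe */
        if (memcmp(e->addr, addr, 16) == 0)
            return e->spi;
    }
    return 0;
}

/* Return the existing SPI for addr, or allocate a fresh unique one.
 * Returns 0 only when the table is full. */
static uint32_t spi_map_alloc(struct spi_map *m, const uint8_t addr[16])
{
    unsigned int start = spi_map_hash(addr);

    assert(m->next_spi != 0);
    for (unsigned int n = 0; n < SPI_MAP_BUCKETS; n++) {
        struct spi_map_entry *e = &m->slot[(start + n) % SPI_MAP_BUCKETS];

        if (e->used && memcmp(e->addr, addr, 16) == 0)
            return e->spi;
        if (!e->used) {
            memcpy(e->addr, addr, 16);
            e->spi = m->next_spi++;
            e->used = 1;
            return e->spi;
        }
    }
    return 0;
}
```

Because each SPI comes from a counter rather than from a hash of the address, two different addresses can never collide on the same SPI, which is the uniqueness property the hashed value cannot provide.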
I would suggest the following implementation:

1) Implement something similar to xfrm_alloc_spi(t, 1, ~(u32)0). It just needs to allocate unique SPI numbers local to xfrm6_tunnel.c. We mark "SPI" value zero as reserved, to indicate a failed lookup.

2) Create a hash table keyed by IPv6 address, whose entries give SPI values.

So on input you would say something like:

	u32 spi;

	spi = spihash_lookup(&iph->saddr);
	if (!spi)
		goto drop;
	x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, spi, IPPROTO_IPV6, AF_INET6);

Is the idea more clear now? Once you fix this up I'll apply your xfrm6_tunnel.c work.

Thank you.

From davem@redhat.com Sun Jun 1 01:37:04 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 01:37:08 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h518b32x021371 for ; Sun, 1 Jun 2003 01:37:04 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA15174; Sun, 1 Jun 2003 01:34:52 -0700
Date: Sun, 01 Jun 2003 01:34:52 -0700 (PDT)
Message-Id: <20030601.013452.68050592.davem@redhat.com>
To: jmorris@intercode.com.au
Cc: mk@linux-ipv6.org, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] xfrm ip6ip6
From: "David S. Miller"
In-Reply-To:
References: <87fzmv5ejc.wl@karaba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2799
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: James Morris
   Date: Sun, 1 Jun 2003 02:01:42 +1000 (EST)

   We need to either filter them out or make sure they are displayed as ipip.
Part of the answer will depend on whether we want to expose xfrm-based ipip tunnels for general use, or only use them internally for ipcomp. I think it is an error to extend PF_KEY for our Linux purposes. Our API here is basically defined to be whatever is in KAME :-) However, setkey should filter entries it does not understand. Currently I see no use for exposing these tunnel transforms outside of the kernel. Mobile IPV6, if it decides to use xfrm6_tunnel, can configure them itself in the kernel side support. Or, if user side is more appropriate for MIPV6 access, we may allow it to use xfrm netlink interface somehow. From aj@dungeon.inka.de Sun Jun 1 04:48:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 04:48:22 -0700 (PDT) Received: from mail.inka.de (mail@quechua.inka.de [193.197.184.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h51BmG2x032765 for ; Sun, 1 Jun 2003 04:48:17 -0700 Received: from dungeon.inka.de (uucp@[127.0.0.1]) by mail.inka.de with uucp (rmailwrap 0.5) id 19MQ81-0008Ud-00; Sun, 01 Jun 2003 12:31:53 +0200 Received: from 192.168.1.12 (unknown [192.168.1.12]) by dungeon.inka.de (Postfix) with ESMTP id 2FBA420FAA; Sun, 1 Jun 2003 12:31:50 +0200 (CEST) From: Andreas Jellinghaus To: netdev@oss.sgi.com Subject: ipsec / pppoe Date: Sun, 1 Jun 2003 12:33:22 +0200 User-Agent: KMail/1.5.2 Cc: howto@lartc.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306011233.22544.aj@dungeon.inka.de> X-archive-position: 2800 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aj@dungeon.inka.de Precedence: bulk X-list: netdev with pppoe it is usualy necessary to clamp the maximum segment size down to 1452 bytes. This can be done with a netfilter module or with "-m 1452" option to pppoe. 
with ipsec (esp, tunnel mode), even on a wlan interface before the ppp connection, I needed to clamp the mss down further, to 1384 bytes. now all connections are working fine.

my calculation gave me

	1500 mtu (wlan0) - 20 (ip) - 48 (esp) - 20 (ip) - 20 (tcp) = 1392

or

	1492 (ppp(oe)) - 20 (ip) - 20 (tcp) = 1452,

so the min of 1392 should have been the right value. Don't know why I need to clamp the mss down to 1384, but e.g. http connections to www.microsoft.com work fine with 1384 and do not work at all with 1392.

still I don't know why some machines don't respond to icmp "packet too big" errors with a smaller packet, and do not act on them at all. maybe some broken firewall thinks it is some kind of attack? I don't know what exactly is between me and websites such as www.google.com or www.microsoft.com, so I can't figure it out.

sorry to have bothered everyone and many thanks to james for his help.

cc: to howto@lartc.org, I think this would make a nice howto entry.

Regards, Andreas

From jmorris@intercode.com.au Sun Jun 1 05:19:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 05:19:35 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:ypEXYr5lpcO3McCfmTLAPa7YYYHO+9Ax@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h51CJQ2x004280 for ; Sun, 1 Jun 2003 05:19:28 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h51CIwr12637; Sun, 1 Jun 2003 22:18:58 +1000
Date: Sun, 1 Jun 2003 22:18:56 +1000 (EST)
From: James Morris
To: Andreas Jellinghaus
cc: netdev@oss.sgi.com,
Subject: Re: ipsec / pppoe
In-Reply-To: <200306011233.22544.aj@dungeon.inka.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2801
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list:
netdev On Sun, 1 Jun 2003, Andreas Jellinghaus wrote: > cc: to howto@lartc.org, it think this would make a nice > howto entry. Actually, there is a bug in the way icmp pmtu messages are being generated here, which should be fixed soon. - James -- James Morris From gandalf@wlug.westbo.se Sun Jun 1 16:05:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 16:05:12 -0700 (PDT) Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h51N512x016698 for ; Sun, 1 Jun 2003 16:05:02 -0700 Received: by tux.rsn.bth.se (Postfix, from userid 501) id 6AE3836FE0; Mon, 2 Jun 2003 01:04:58 +0200 (CEST) Subject: [PATCH] fix use after free in e100 From: Martin Josefsson To: scott.feldman@intel.com Cc: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1054508698.24777.17.camel@tux.rsn.bth.se> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4 Date: 02 Jun 2003 01:04:58 +0200 X-archive-position: 2802 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev Hi Scott. Here's a fix for a use-after-free in the e100 driver. You can't touch the skb after a call to netif_rx(), it might have been free'd. Caught with Manfred's unmap-page-debugging patch in -mm. 
Applies to both 2.4 and 2.5

--- linux-2.5.69-mm9/drivers/net/e100/e100_main.c.orig	2003-06-02 00:48:13.000000000 +0200
+++ linux-2.5.69-mm9/drivers/net/e100/e100_main.c	2003-06-02 00:50:09.000000000 +0200
@@ -2052,13 +2052,14 @@
 			skb->ip_summed = CHECKSUM_NONE;
 		}
 
+		bdp->drv_stats.net_stats.rx_bytes += skb->len;
+
 		if(bdp->vlgrp && (rfd_status & CB_STATUS_VLAN)) {
 			vlan_hwaccel_rx(skb, bdp->vlgrp, be16_to_cpu(rfd->vlanid));
 		} else {
 			netif_rx(skb);
 		}
 		dev->last_rx = jiffies;
-		bdp->drv_stats.net_stats.rx_bytes += skb->len;
 
 		rfd_cnt++;
 	}			/* end of rfd loop */

-- 
/Martin

From Robert.Olsson@data.slu.se Mon Jun 2 03:59:13 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 03:59:19 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52AxB2x012408 for ; Mon, 2 Jun 2003 03:59:12 -0700
Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id MAA08655; Mon, 2 Jun 2003 12:58:32 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16091.11735.721251.925522@robur.slu.se>
Date: Mon, 2 Jun 2003 12:58:31 +0200
To: Simon Kirby
Cc: "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
In-Reply-To: <20030529205125.GA30058@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 2803
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

Simon Kirby writes:

 > Full profile output available here:
 >
 > http://blue.netnation.com/sim/ref/
 > readprofile.full_route_table_hash_fixed_napi.*
 >
 > Note that if I increase the packet rate and NAPI kicks in, all of the
 > handle_IRQ and similar overhead basically disappears because it no longer
 > uses IRQs. Pretty spiffy. Here is a profile of that:

 > Full profile output available as:

  8896 rt_garbage_collect         9.4237
  8959 ip_route_input_slow        3.8885
 10516 dst_alloc                 73.0278
 10666 kmem_cache_free           66.6625
 15339 tg3_rx                    16.2489
 16553 ipt_do_table              14.9937
 20193 fn_hash_lookup            70.1146
 26833 rt_intern_hash            34.9388
 64803 ip_route_input           150.0069

From a DoS perspective this is a more interesting experiment than the one where you limited the input rate to leave 30% idle CPU.

A new dst is coming all the time: it is first searched for in the hash (ip_route_input) and not found, so the ip_route_input_slow/fn_hash_lookup/dst_alloc/rt_intern_hash path is taken to add a new dst entry...

And later GC has to remove all entries with spin_lock_bh held (no packet processing runs). I see packet drops exactly when GC runs. Tuning GC might help, but it's something to observe.

I had some idea to rate-limit new flows and try to isolate the device causing the DoS. Something like (ip_route_input):

[We don't have a hash entry]

	/*
	 * DoS check... Rate down but do not stop GC and creation of new
	 * hash entries until GC frees resources. We limit per interface
	 * so hogger dev(s) will be hit hardest. As a side effect we get
	 * dst_overrun per device.
	 */
	entries = atomic_read(&ipv4_dst_ops.entries);

	if (entries > ip_rt_max_size) {
		int drp = 4;

		if (dev->dst_hash_overrun++ % drp) {
			if (net_ratelimit())
				printk(KERN_WARNING "dst creation throttled\n");

			return -ECONNREFUSED;
		}

		/* Also make sure the slow path gets a chance to create the dst entry */
		if (ipv4_dst_ops.gc && ipv4_dst_ops.gc()) {
			RT_CACHE_STAT_INC(gc_dst_overflow);
			return -ENOBUFS;
		}
	}

[ip_route_input_slow comes here]

But more thinking is needed...

Cheers.

--ro

From sim@netnation.com Mon Jun 2 08:18:54 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 08:19:05 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52FIr2x022153 for ; Mon, 2 Jun 2003 08:18:53 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20) id 19Mr5I-0002Cv-6H; Mon, 02 Jun 2003 08:18:52 -0700
Date: Mon, 2 Jun 2003 08:18:52 -0700
From: Simon Kirby
To: Robert Olsson
Cc: "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress Message-ID: <20030602151852.GA6070@netnation.com> References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16091.11735.721251.925522@robur.slu.se> User-Agent: Mutt/1.5.4i X-archive-position: 2804 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 02, 2003 at 12:58:31PM +0200, Robert Olsson wrote: > New dst is coming all the time first seached in hash (ip_route_input) and not found > so ip_route_input_slow/fn_hash_lookup/dst_alloc/rt_intern_hash path is taken to add > a new dst entry... > > And later GC have to remove all enties with spin_lock_bh hold (no packet processing > runs). I see packet drops exactly when GC runs. Tuning GC might help but it's something > to observe. > > I had some idea to rate-limit new flows and try to isolate the device causing the DoS > Something like (ip_route_input): ... > if (net_ratelimit()) > printk(KERN_WARNING "dst creation throttled\n"); > > return -ECONNREFUSED; This reminds me of the situation we experienced with the dst cache overflowing in early 2.2 kernels. This was a long time ago, when our traffic was only about 10 Mbits/second. We had recently upgraded from a 2.0 kernel. The dst cache was overflowing due to a bug in the garbage collector, and at the time, no messages were printed. It took me a _long_ time to figure out why connections to a server I hadn't previously connected to in a while would only work every so often, and not immediately like they should. 
I'm afraid this approach will have a similar effect, albeit (hopefully) only under an attack.

Is it possible to have a dst LRU, or a simpler approximation of one, and recycle dst entries rather than deallocating/reallocating them? This would relieve the garbage collector of a lot of work and avoid the periodic large garbage collection latency. It could be tuned to only occur in an attack (I remember Alexey saying that the deferred garbage collection was implemented to reduce latency in normal operation).

Would this work? Cross-CPU thrashing issues?

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]

From Robert.Olsson@data.slu.se Mon Jun 2 09:37:19 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 09:37:29 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52GbH2x023431 for ; Mon, 2 Jun 2003 09:37:19 -0700
Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id SAA14178; Mon, 2 Jun 2003 18:36:37 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16091.32021.75335.227150@robur.slu.se>
Date: Mon, 2 Jun 2003 18:36:37 +0200
To: Simon Kirby
Cc: Robert Olsson , "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress In-Reply-To: <20030602151852.GA6070@netnation.com> References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 2805 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Simon Kirby writes: > This reminds me of the situation we experienced with the dst cache > overflowing in early 2.2 kernels. This was a long time ago, when our > traffic was only about 10 Mbits/second. We had recently upgraded from a > 2.0 kernel. The dst cache was overflowing due to a bug in the garbage > collector, and at the time, no messages were printed. It took me a > _long_ time to figure out why connections to a server I hadn't previously > connected to in a while would only work every so often, and not > immediately like they should. I'm affraid this approach will have a > similar effect, albeit (hopefully) only under an attack. We are given more work than we have resources for (max_size) what else than refuse can we do? But yes we have invested pretty much work already. Also remember we are looking into runs were 100% of incoming traffic has one new dst for every packet. So how is the situation in "real life"? In case of multiple devices at least NAPI gives all devs it's share. > Is it possible to have a dst LRU or a simpler approximation of such and > recycle dst entries rather than deallocating/reallocating them? This > would relieve a lot of work from the garbage collector and avoid the > periodic large garbage collection latency. 
It could be tuned to only > occur in an attack (I remember Alexey saying that the deferred garbage > collection was implemented to reduce latency in normal opreation). I don't see how this can be done. Others may? Cheers. --ro From rddunlap@osdl.org Mon Jun 2 10:08:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 10:08:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52H8T2x024261 for ; Mon, 2 Jun 2003 10:08:30 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h52H8GX20706; Mon, 2 Jun 2003 10:08:16 -0700 Date: Mon, 2 Jun 2003 10:07:54 -0700 From: "Randy.Dunlap" To: Andi Kleen Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program Message-Id: <20030602100754.1e3e1ca8.rddunlap@osdl.org> In-Reply-To: <20030531120940.GB11898@wotan.suse.de> References: <20030530090015.7c435c9a.rddunlap@osdl.org> <20030530.171111.71099698.davem@redhat.com> <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030531120940.GB11898@wotan.suse.de> Organization: OSDL X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2806 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev On Sat, 31 May 2003 14:09:40 +0200 Andi Kleen wrote: | > You really need something like rtnl_talk() or rtnl_dump_filter() | > from libnetlink to do this properly. | | In case it's helpful I wrote manpages for libnetlink some time ago. | | I also have some simple example programs using it. Other examples | can be found in the zebra or bird source code. Does this man page live somewhere? Do some distros ship it? 
Where can I find your example programs that use it? Couple of corrections below.... | rtnl_listen | Receive netlink data after a request and pass it to | handler. handler is a callback that gets the mes- | sage source address, the message itself, and the | jarg cookie as arguments. It will get called for + It will be called for {'get' should usually be avoided when easily done} | all received messages. Only one message bundle is | received. Unless there is no message pending this | function does not block. | | | rta_addattr32 | Initialize the rtnetlink attribute rta with a __u32 | data value. | | | rta_addattr32 + rta_addattr_l | Initialize the rtnetlink attribute rta with a vari- | able length data value. -- ~Randy From krkumar@us.ibm.com Mon Jun 2 10:33:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 10:34:04 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52HXp2x024952 for ; Mon, 2 Jun 2003 10:33:58 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e3.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h52HWdE2115688; Mon, 2 Jun 2003 13:32:39 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h52HWbTC245356; Mon, 2 Jun 2003 13:32:37 -0400 Message-ID: <3EDB8A41.2080305@us.ibm.com> Date: Mon, 02 Jun 2003 10:32:49 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 CC: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 References: <3ED80230.2030508@us.ibm.com> <20030531.110249.12960077.yoshfuji@linux-ipv6.org> In-Reply-To: <20030531.110249.12960077.yoshfuji@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; 
format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2807 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi Yoshifuji, Thanks for your comments. >>+/* prefix list returned to user space in this structure */ >>+struct plist_user_info { > > ^ip6 or ipv6 or so. > >>+ char name[IFNAMSIZ]; /* interface name */ > > ~~~~~~~~~~~~~~~~~~~duplicate information. > Point noted. That can be removed (prefer to have name instead of ifindex). >>+ int ifindex; /* interface index */ >>+ int nprefixes; /* number of elements in 'prefix' */ >>+ struct var_plist_user_info { /* multiple elements */ >>+ char flags[3]; /* router advertised flags */ > > ~~~~~~~~this is not good interface. This is my mistake. When I added the original interface, it was using the proc filesystem and it made sense at that time for a user to cat /proc/net/ and actually see the flags. While converting to use netlink, I forgot to change this to real flags. This was not intended interface :-) >>+ int plen; /* prefix length */ >>+ __u32 valid; /* valid lifetime */ >>+ struct in6_addr ra_addr;/* advertising router */ >>+ struct in6_addr prefix; /* prefix */ >>+ } plist_vars[0]; >>+}; >>+ >> extern void addrconf_init(void); >> extern void addrconf_cleanup(void); >> > > > : > > I think we should use 1 fixed-length message per prefix, > instead of variable length message. I had got this idea from "struct fib_info" which also has variable size structure, but probably it is not worth the extra effort to save a few bytes. 
>>+ ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr, >>+ &p_el->ra_addr); >>+ for (i = 0; i < 8; i++) >>+ pinfo->plist_vars[count].ra_addr.s6_addr16[i] = >>+ __constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]); >>+ ipv6_addr_copy(&pinfo->plist_vars[count].prefix, >>+ &p_el->pinfo.prefix); >>+ for (i = 0; i < p_el->pinfo.prefix_len/16; i++) >>+ pinfo->plist_vars[count].prefix.s6_addr16[i] = >>+ __constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]); > > > Absoletely nasty. > - don't use charaters to represent flags; use real flags. > - use network-byte order. network-byte order ? User will get prefix in network byte order, is that correct ? >>+static int prefix_list_proc_dump(char *buffer, char **start, off_t offset, >>+ int length) >>+{ > > : > > Please use seq_file. OK. > Again, what I proposed was to store prefix information on fib with > some flags to represent advertised by routers and give user-space > the RA information using new rtattr (RTA_RA6INFO or something like that). > > struct rta_ra6info { > u32 rta_ra6flags; > }; > In my mail, I had given problems with doing that in the fib. I can look to convert to fib, but please let me know which kernel routines I should look at. Thanks, - KK From sim@netnation.com Mon Jun 2 11:05:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 11:05:50 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52I5b2x025880 for ; Mon, 2 Jun 2003 11:05:38 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19Mtgf-0000n4-A7; Mon, 02 Jun 2003 11:05:37 -0700 Date: Mon, 2 Jun 2003 11:05:37 -0700 From: Simon Kirby To: Robert Olsson Cc: "David S. 
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
Message-ID: <20030602180537.GB30957@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com> <16091.32021.75335.227150@robur.slu.se>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <16091.32021.75335.227150@robur.slu.se>
User-Agent: Mutt/1.5.4i
X-archive-position: 2808
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sim@netnation.com
Precedence: bulk
X-list: netdev

On Mon, Jun 02, 2003 at 06:36:37PM +0200, Robert Olsson wrote:

> We are given more work than we have resources for (max_size) what else than
> refuse can we do? But yes we have invested pretty much work already.

Well, this is the problem. We do not and cannot know which entries we really want to remember (legitimate traffic). Adding code to actually refuse new dst entries is just going to make the DoS effective, which is NOT what we want.

> Also remember we are looking into runs were 100% of incoming traffic has one
> new dst for every packet. So how is the situation in "real life"?
> In case of multiple devices at least NAPI gives all devs it's share.

Right, so, when we are traffic saturated, we want to make sure the whole route cache and route path is as fast as possible. Recycling dst entries by simply rewriting and rehashing them, rather than allocating new ones and eventually freeing them all in the garbage collection cycle, should reduce allocator overhead. If this is only done when the table is full, I don't see any downside...if this is in fact doable, that is.
:) Simon- From kumarkr@us.ibm.com Mon Jun 2 12:48:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 12:48:44 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52JmR2x029134 for ; Mon, 2 Jun 2003 12:48:34 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e33.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h52JlbXD283444; Mon, 2 Jun 2003 15:47:38 -0400 Received: from d03nm801.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h52JlZhO019342; Mon, 2 Jun 2003 13:47:36 -0600 Subject: Re: [PATCH] Prefix List patch against 2.5.70 To: davem@redhat.com Cc: kuznet@ms2.inr.ac.ru, linux-net@vger.kernel.org, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Krishna Kumar Date: Mon, 2 Jun 2003 12:46:55 -0700 X-MIMETrack: Serialize by Router on D03NM801/03/M/IBM(Release 6.0.1 [IBM]|May 27, 2003) at 06/02/2003 13:45:07 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 2809 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kumarkr@us.ibm.com Precedence: bulk X-list: netdev Regarding my previous mail : > > Again, what I proposed was to store prefix information on fib with > > some flags to represent advertised by routers and give user-space > > the RA information using new rtattr (RTA_RA6INFO or something like that). > > > > struct rta_ra6info { > > u32 rta_ra6flags; > > }; > In my mail, I had given problems with doing that in the fib. I can look to > convert to fib, but please let me know which kernel routines I should look at What I meant is whether you are referring to addrconf_prefix_route() when you mention storing prefix on fib ? 
> > Again, what I proposed was to store prefix information on fib with > > some flags to represent advertised by routers and give user-space > > the RA information using new rtattr (RTA_RA6INFO or something like that). > > This sounds very reasonable. Also since you prefer it to be implemented as part of routing table, would it be OK to return the prefix list via netstat or route command (this uses rt6_info_route() to print the information). The current user for prefix list has no problem using a command instead of writing netlink user code to get the list. I am not sure why rtnetlink is needed in this case. thanks, - KK From dlstevens@us.ibm.com Mon Jun 2 14:03:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:03:21 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52L322x032636 for ; Mon, 2 Jun 2003 14:03:09 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h52L2EQs218628; Mon, 2 Jun 2003 17:02:14 -0400 Received: from d03nm121.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h52L2Dba153836; Mon, 2 Jun 2003 15:02:13 -0600 Importance: Normal Sensitivity: Subject: Re: [PATCH] Prefix List patch against 2.5.70 To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: krkumar@us.ibm.com, davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: David Stevens Date: Mon, 2 Jun 2003 15:02:03 -0600 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.0.1 [IBM]|April 28, 2003) at 06/02/2003 15:02:13 MIME-Version: 1.0 Content-type: text/plain; charset=ISO-2022-JP X-archive-position: 2810 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com
X-original-sender: dlstevens@us.ibm.com
Precedence: bulk
X-list: netdev

On the "location of the data" issue, I have some comments.

The users of prefix list data don't know the prefix or the prefix length; they know the interface index, and need to get the prefixes. The data is fundamentally per-interface, and the routing table is per-destination. So, adding the prefixes to the routing table doesn't seem like the best choice because everything that currently uses the routing table will have to skip over these extra entries (which they'll never be interested in) and the users of the prefix data will have to skip over all existing routing table entries (which they're never interested in).

Routes and prefixes are independent of each other, so throwing them in the same table seems to me like it only creates work to skip entries that aren't related. Because the users of the prefix data don't have the key needed for a fast look-up in the routing table, prefix users in particular have to skip through everything currently in the routing table, linearly, with no benefit at all for being there.

I also see no relation between prefix list data and the FIB; current users are completely independent from prefix list users, and it appears to only slow both of them down. The prefix data is always looked up by interface index, so I think it really belongs in the inet6 per-interface structure, unless I'm missing something.

What benefits are there for lumping this with existing data structures that aren't per-interface, or keyed per-interface?
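[Editorial sketch, not part of the original mail.] The access-pattern argument above can be illustrated with a deliberately simplified userspace model in C. All names here (struct iface, struct table_entry, iface_find, table_count_prefixes) are hypothetical, and the "mixed table" is a flat array, which is an assumption made only for illustration; the real IPv6 FIB is not organized this way. The only point shown is that a caller holding nothing but an interface index must visit every entry of a table that is not keyed by interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not kernel code. */
struct prefix_entry {
    uint8_t prefix[16];
    uint8_t plen;
};

#define MAX_PREFIXES 4

/* Layout A: prefixes hang off the interface they were learned on. */
struct iface {
    int    ifindex;
    size_t nprefixes;
    struct prefix_entry prefixes[MAX_PREFIXES];
};

static const struct iface *iface_find(const struct iface *ifs, size_t n,
                                      int ifindex)
{
    for (size_t i = 0; i < n; i++)
        if (ifs[i].ifindex == ifindex)
            return &ifs[i];   /* the prefix list comes with the interface */
    return NULL;
}

/* Layout B: routes and prefixes share one table that is not keyed by
 * interface, so a per-interface query cannot skip anything. */
struct table_entry {
    int is_prefix;            /* 0 = ordinary route, 1 = advertised prefix */
    int ifindex;
    struct prefix_entry p;
};

static size_t table_count_prefixes(const struct table_entry *t, size_t n,
                                   int ifindex, size_t *visited)
{
    size_t found = 0;

    assert(visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;         /* every entry is inspected, related or not */
        if (t[i].is_prefix && t[i].ifindex == ifindex)
            found++;
    }
    return found;
}
```

In layout A the cost depends on the number of interfaces; in layout B it grows with the total number of routes plus prefixes, which is the overhead the mail objects to.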
+-DLS

YOSHIFUJI Hideaki / 吉藤英明 @vger.kernel.org on 05/30/2003 07:02:49 PM
Sent by: linux-net-owner@vger.kernel.org
To: krkumar@us.ltcfwd.linux.ibm.com
cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: [PATCH] Prefix List patch against 2.5.70

In article <3ED80230.2030508@us.ibm.com> (at Fri, 30 May 2003 18:15:28 -0700), Krishna Kumar says:

> +/* prefix list returned to user space in this structure */
> +struct plist_user_info {
           ^ip6 or ipv6 or so.
> +	char name[IFNAMSIZ];	/* interface name */
    ~~~~~~~~~~~~~~~~~~~duplicate information.
> +	int ifindex;		/* interface index */
> +	int nprefixes;		/* number of elements in 'prefix' */
> +	struct var_plist_user_info {	/* multiple elements */
> +		char flags[3];	/* router advertised flags */
        ~~~~~~~~this is not good interface.
> +		int plen;	/* prefix length */
> +		__u32 valid;	/* valid lifetime */
> +		struct in6_addr ra_addr;/* advertising router */
> +		struct in6_addr prefix;	/* prefix */
> +	} plist_vars[0];
> +};
> +
> extern void addrconf_init(void);
> extern void addrconf_cleanup(void);
> :

I think we should use 1 fixed-length message per prefix, instead of variable length message.
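A fixed-length, one-record-per-prefix layout along those lines might look like the following sketch. The names (`prefix_msg`, `prefix_msg_fill`, `PFX_F_MANAGED`, `PFX_F_OTHER`) and the field layout are assumptions made for illustration; this is not the patch's structure and not an existing kernel ABI. Flags are a bit mask rather than a string, and the addresses are copied as-is so they stay in network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for this sketch; not the ND_RA_FLAG_* values. */
#define PFX_F_MANAGED 0x01u
#define PFX_F_OTHER   0x02u

/* One fixed-length record per prefix (44 bytes, no variable-length tail). */
struct prefix_msg {
	uint32_t ifindex;     /* interface index */
	uint32_t valid;       /* remaining valid lifetime, in seconds */
	uint8_t  plen;        /* prefix length in bits */
	uint8_t  flags;       /* PFX_F_* bit mask */
	uint8_t  pad[2];      /* explicit padding, always zero */
	uint8_t  prefix[16];  /* prefix, network byte order */
	uint8_t  ra_addr[16]; /* advertising router, network byte order */
};

/* Fill one record. The address bytes are copied without any swapping. */
static void prefix_msg_fill(struct prefix_msg *m, uint32_t ifindex,
			    uint32_t valid, uint8_t plen, uint8_t flags,
			    const uint8_t prefix[16],
			    const uint8_t ra_addr[16])
{
	assert(plen <= 128);
	memset(m, 0, sizeof(*m));
	m->ifindex = ifindex;
	m->valid = valid;
	m->plen = plen;
	m->flags = flags;
	memcpy(m->prefix, prefix, 16);
	memcpy(m->ra_addr, ra_addr, 16);
}
```

With a fixed size, a reader can consume a dump one record at a time without first parsing a count field, and testing a flag is a mask operation such as `m.flags & PFX_F_MANAGED` instead of a string comparison.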
> +		pinfo->plist_vars[count].plen = p_el->pinfo.prefix_len;
> +		pinfo->plist_vars[count].valid = p_el->pinfo.valid -
> +			(jiffies - p_el->timestamp)/HZ;
> +		if ((p_el->ra_flags & (ND_RA_FLAG_MANAGED |
> +			ND_RA_FLAG_OTHER))
> +			== (ND_RA_FLAG_MANAGED|ND_RA_FLAG_OTHER))
> +			strcpy(pinfo->plist_vars[count].flags, "MO");
> +		else if (p_el->ra_flags & ND_RA_FLAG_MANAGED)
> +			strcpy(pinfo->plist_vars[count].flags, "M");
> +		else if (p_el->ra_flags & ND_RA_FLAG_OTHER)
> +			strcpy(pinfo->plist_vars[count].flags, "O");
> +		else
> +			strcpy(pinfo->plist_vars[count].flags, "-");
> +		ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr,
> +			&p_el->ra_addr);
> +		for (i = 0; i < 8; i++)
> +			pinfo->plist_vars[count].ra_addr.s6_addr16[i] =
> +			__constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]);
> +		ipv6_addr_copy(&pinfo->plist_vars[count].prefix,
> +			&p_el->pinfo.prefix);
> +		for (i = 0; i < p_el->pinfo.prefix_len/16; i++)
> +			pinfo->plist_vars[count].prefix.s6_addr16[i] =
> +			__constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]);

Absolutely nasty.
 - don't use characters to represent flags; use real flags.
 - use network-byte order.

> +static int prefix_list_proc_dump(char *buffer, char **start, off_t offset,
> +				int length)
> +{
:

Please use seq_file.

Again, what I proposed was to store prefix information on fib with some flags to represent advertised by routers and give user-space the RA information using new rtattr (RTA_RA6INFO or something like that).
struct rta_ra6info { u32 rta_ra6flags; }; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA - To unsubscribe from this list: send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From rddunlap@osdl.org Mon Jun 2 14:05:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:05:52 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52L5d2x000551 for ; Mon, 2 Jun 2003 14:05:40 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h52L5FX04868; Mon, 2 Jun 2003 14:05:15 -0700 Date: Mon, 2 Jun 2003 14:04:52 -0700 From: "Randy.Dunlap" To: "David S. Miller" Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program Message-Id: <20030602140452.039248de.rddunlap@osdl.org> In-Reply-To: <20030530.234211.102567405.davem@redhat.com> References: <20030530090015.7c435c9a.rddunlap@osdl.org> <20030530.171111.71099698.davem@redhat.com> <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> Organization: OSDL X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2811 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev On Fri, 30 May 2003 23:42:11 -0700 (PDT) "David S. Miller" wrote: | From: "Randy.Dunlap" | Date: Fri, 30 May 2003 20:22:12 -0700 (PDT) | | Oh well, it's at this URL, bugs and all. 
| | http://www.xenotime.net/linux/ipv6/rtnl_test.c | | I know you don't want to use libnetlink from iproute2, but I want to | stress that it takes care of all of the minutiae of netlink socket | usage that you have to duplicate in your little test program and this | duplication leads to bugs. | | Firstly, your program needs to be fixed to call recvmsg() multiple times, | you'll get one entry for each recvmsg call in the table you are | querying. Yes, I noticed that I was getting only 1 msg there. | You really need something like rtnl_talk() or rtnl_dump_filter() | from libnetlink to do this properly. Does anyone have documentation (or semantics) for rtnl_talk()? or just some blurb about it? Andi's libnetlink man page missed it somehow. Thanks, -- ~Randy From pb@bieringer.de Mon Jun 2 14:52:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:52:39 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52Lq02x002042 for ; Mon, 2 Jun 2003 14:52:21 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 098561387A; Mon, 2 Jun 2003 23:12:13 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 302D21387D; Mon, 2 Jun 2003 23:12:12 +0200 (CEST) X-AV-Checked: Mon Jun 2 23:12:12 2003 smtp2.aerasec.de Received: from [192.168.1.2] (p50805317.dip.t-dialin.net [80.128.83.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id D80391387A; Mon, 2 Jun 2003 23:12:10 +0200 (CEST) Date: Mon, 02 Jun 2003 23:12:08 +0200 From: Peter Bieringer To: Maillist netdev Cc: Maillist USAGI-users Subject: Is there already a doc available for the new IPsec code?
Message-ID: <36990000.1054588328@worker.muc.bieringer.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2812 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi, i want to play a little bit with the new IPsec code. Is there already a doc available how to use it (config file of IKE daemon, etc., e.g. compared against the FreeS/WAN code). Thank you very much for input, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From acme@conectiva.com.br Mon Jun 2 14:57:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:57:57 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52LvV2x002424 for ; Mon, 2 Jun 2003 14:57:52 -0700 Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YC5G-0004y9-00; Thu, 03 Jul 2003 18:57:43 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 8850C1966C; Mon, 2 Jun 2003 21:58:16 +0000 (UTC) Date: Mon, 2 Jun 2003 18:58:15 -0300 From: Arnaldo Carvalho de Melo To: Peter Bieringer Cc: Maillist netdev , Maillist USAGI-users Subject: Re: Is there already a doc available for the new IPsec code? Message-ID: <20030602215815.GL9312@conectiva.com.br> References: <36990000.1054588328@worker.muc.bieringer.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <36990000.1054588328@worker.muc.bieringer.de> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. 
User-Agent: Mutt/1.5.4i X-archive-position: 2813 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev On Mon, Jun 02, 2003 at 11:12:08PM +0200, Peter Bieringer wrote: > Hi, > > i want to play a little bit with the new IPsec code. > > Is there already a doc available how to use it (config file of IKE daemon, > etc., e.g. compared against the FreeS/WAN code). > > Thank you very much for input, Look at Bert Hubert's LART - Arnaldo From davem@redhat.com Mon Jun 2 14:58:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:58:37 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52LwC2x002530 for ; Mon, 2 Jun 2003 14:58:33 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA24872; Mon, 2 Jun 2003 14:56:20 -0700 Date: Mon, 02 Jun 2003 14:56:19 -0700 (PDT) Message-Id: <20030602.145619.71112623.davem@redhat.com> To: rddunlap@osdl.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <20030602140452.039248de.rddunlap@osdl.org> References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2814 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 14:04:52 -0700 Does anyone have documentation (or semantics) for rtnl_talk()?
or just some blurb about it? I always have to wonder about someone who can't live with just working code to study, and absolutely requires some document describing it. What is better or more accurate description than code itself!?!?! From rddunlap@osdl.org Mon Jun 2 15:00:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 15:00:25 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52M012x003050 for ; Mon, 2 Jun 2003 15:00:21 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h52LxfX22069; Mon, 2 Jun 2003 14:59:41 -0700 Date: Mon, 2 Jun 2003 14:59:17 -0700 From: "Randy.Dunlap" To: Arnaldo Carvalho de Melo Cc: pb@bieringer.de, netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: Is there already a doc available for the new IPsec code? Message-Id: <20030602145917.33fbd05d.rddunlap@osdl.org> In-Reply-To: <20030602215815.GL9312@conectiva.com.br> References: <36990000.1054588328@worker.muc.bieringer.de> <20030602215815.GL9312@conectiva.com.br> Organization: OSDL X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2815 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev On Mon, 2 Jun 2003 18:58:15 -0300 Arnaldo Carvalho de Melo wrote: | On Mon, Jun 02, 2003 at 11:12:08PM +0200, Peter Bieringer wrote: | > Hi, | > | > i want to play a little bit with the new IPsec code. | > | > Is there already a doc available how to use it (config file of IKE daemon, | > etc., e.g. compared against the FreeS/WAN code). | > | > Thank you very much for input, | | Look at Bert Hubert's LART | | - Arnaldo that's www.lartc.org ...
-- ~Randy From david-b@pacbell.net Mon Jun 2 18:56:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 18:56:11 -0700 (PDT) Received: from mta7.pltn13.pbi.net (mta7.pltn13.pbi.net [64.164.98.8]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h531u62x010771 for ; Mon, 2 Jun 2003 18:56:06 -0700 Received: from pacbell.net (ppp-67-118-246-97.dialup.pltn13.pacbell.net [67.118.246.97]) by mta7.pltn13.pbi.net (8.12.9/8.12.3) with ESMTP id h531trEQ002137; Mon, 2 Jun 2003 18:55:54 -0700 (PDT) Message-ID: <3EDC0047.7030007@pacbell.net> Date: Mon, 02 Jun 2003 18:56:23 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "David S. Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2816 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev David S. Miller wrote: > From: "Randy.Dunlap" > Date: Mon, 2 Jun 2003 14:04:52 -0700 > > Does anyone have documentation (or semantics) for rtnl_talk()? > or just some blurb about it? > > I always have to wonder about someone who can't live with just > working code to study, and absolutely requires some document > describing it. > > What is better or more accurate description than code itself!?!?! Well, the difference between code and its spec is generally a bug that needs to be fixed ... which can be in the code as well as in the spec. And for reasonable design specs, it's more likely in the code. 
But if there's only the code, it gets a lot more troublesome when things don't behave "as expected". People who are in a position to change the code to meet their expectations may not care, but that's rarely a significant chunk of the user community. And in particular, writing tests against the code is generally the wrong way to go. They need to be written against some kind of spec. - Dave From davem@redhat.com Mon Jun 2 19:04:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:04:42 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5324a2x011315 for ; Mon, 2 Jun 2003 19:04:36 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA25475; Mon, 2 Jun 2003 19:02:41 -0700 Date: Mon, 02 Jun 2003 19:02:40 -0700 (PDT) Message-Id: <20030602.190240.74724523.davem@redhat.com> To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC0047.7030007@pacbell.net> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0047.7030007@pacbell.net> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2817 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: David Brownell Date: Mon, 02 Jun 2003 18:56:23 -0700 Well, the difference between code and its spec is generally a bug that needs to be fixed ... See, a document is NOT the spec, the code is the spec. Because where the document is wrong, the code determines the final answer. This is true in all cases. 
I cannot tell you how much time I've seen people waste because they went for documents first, only to find them to be inaccurate for some corner case whilst the code has all of the accurate answers. When I see someone want docs, I interpret this as "I don't want to have to think or have to comprehend something, I'm too lazy to read the code." Well, such laziness leads the person in question only to be susceptible to all of the inaccuracies and disconnect that always will exist between said docs (if they even exist) and the code. It is also the mechanism that leads people to send patches that add arbitrary crap all over the ipv4/ipv6 code, totally missing the point that the routing and/or netlink layer did 99% of what they wanted already. For example, I added a hoplimit route attribute to RTNETLINK. Who documented this? What document can you read that would teach you about this feature? None. And don't tell me this is a doc bug, every time I make a change the documentation will be instantly buggy and I'm not going to be required to document every diff I make to the tree. From jsd@monmouth.com Mon Jun 2 19:34:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:34:31 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h532Y42x012360 for ; Mon, 2 Jun 2003 19:34:25 -0700 Received: (qmail 4037 invoked from network); 3 Jun 2003 02:33:59 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 02:33:59 -0000 Message-ID: <3EDC0915.1080109@monmouth.com> Date: Mon, 02 Jun 2003 22:33:57 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S.
Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, David Brownell Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> In-Reply-To: <20030602.145619.71112623.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2818 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev On 06/02/2003 05:56 PM, David S. Miller wrote: >> >> I always have to wonder about someone who can't live with just >> working code to study, and absolutely requires some document >> describing it. >> >> What is better or more accurate description than code itself!?!?! I am very grateful for the kindness of those who have written the code in question and made it available to help others. I wish to repay that with help and kindness, not the opposite. Today I can help by speaking the truth, and the truth is that documentation is sorely needed. There's no point in writing code if few people use it. Linux is hanging in there at a few percent market share. That's not going to grow unless there is better documentation. On 06/02/2003 09:56 PM, David Brownell replied: > > Well, the difference between code and its spec is > generally a bug that needs to be fixed ... which > can be in the code as well as in the spec. And for > reasonable design specs, it's more likely in the code. > > But if there's only the code, it gets a lot more > troublesome when things don't behave "as expected". > > People who are in a position to change the code > to meet their expectations may not care, but that's > rarely a significant chunk of the user community. 
>
> And in particular, writing tests against the code
> is generally the wrong way to go. They need to be
> written against some kind of spec.

I have to agree with Mr. Brownell on this one.

>> What is better or more accurate description than code itself!?!?!

There are two ideas mixed up there.
 -- Better documentation.
 -- Accurate description.

1) Yes, code is the most accurate description of the code. But it is not to be confused with good documentation.

2) Documentation should be clear and concise. Code must attend to all the details.

3) Sometimes efficiency requires that the code be tricky. Documentation must not be tricky.

4) In theory, very well-commented code might approximate its own documentation. But I haven't seen any such code lately. Here are _all_ the comments from xfrm_input.c, a 454-line file:

	/* Fetch spi and seq frpm ipsec header */
	/* Allocate new secpath or COW existing one. */
	/* Fetch spi and seq frpm ipsec header */
	iph = skb->nh.ipv6h; /* ??? */
	if (x->props.mode) { /* XXX */
	/* Allocate new secpath or COW existing one. */
	#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */

If you were teaching a programming course, how would you grade an assignment turned in with that level of commenting?

5) In the engine of a piston-driven airplane, there are two spark plugs in every cylinder. Obviously that costs twice as much and weighs twice as much as having only one. But the redundancy makes the system thousands of times more reliable. Similarly, writing code _and_ documentation is about twice as expensive as writing the code alone. But the redundancy makes it possible to achieve much greater reliability. Also maintainability and extensibility.

6) Adding extra vehemence '!?!?!' does not add clarity to the discussion.

*) et cetera.
From davem@redhat.com Mon Jun 2 19:40:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:40:56 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h532en2x012713 for ; Mon, 2 Jun 2003 19:40:50 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA25623; Mon, 2 Jun 2003 19:38:53 -0700 Date: Mon, 02 Jun 2003 19:38:53 -0700 (PDT) Message-Id: <20030602.193853.112598236.davem@redhat.com> To: jsd@monmouth.com Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, david-b@pacbell.net Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC0915.1080109@monmouth.com> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2819 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 22:33:57 -0400 There's no point in writing code if few people use it. People use this "undocumented" area of the kernel every time their machine boots up. 
From jsd@monmouth.com Mon Jun 2 20:21:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:21:10 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533L12x013419 for ; Mon, 2 Jun 2003 20:21:02 -0700 Received: (qmail 4663 invoked from network); 3 Jun 2003 03:20:56 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 03:20:56 -0000 Message-ID: <3EDC1418.6080808@monmouth.com> Date: Mon, 02 Jun 2003 23:20:56 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: netlink tester program References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> In-Reply-To: <20030602.193853.112598236.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2820 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev I wrote in part: > > There's no point in writing code if few people > use it. On 06/02/2003 10:38 PM, David S. Miller wrote: > > People use this "undocumented" area of the kernel every > time their machine boots up. That's not inconsistent with what I was saying. Mr. Miller said people use it. That's true. Some people use it. I said few people use it. That's true. The context of my original statement was: > Linux is hanging in there at a few > percent market share. That's not going to grow > unless there is better documentation. This is supposed to be open-source software, n'est-ce pas? 
Software that is copylefted but not documented is open according to the letter of the law, but lacks the spirit of openness. Mr. Miller is very smart and has spent years getting up to speed in this area. Is the code to be open only to those who are equally smart and willing to invest equally huge amounts of time? When people ask for help in understanding the code, it might mean they need help in understanding the code. From davem@redhat.com Mon Jun 2 20:24:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:24:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533OR2x013749 for ; Mon, 2 Jun 2003 20:24:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25722; Mon, 2 Jun 2003 20:22:33 -0700 Date: Mon, 02 Jun 2003 20:22:33 -0700 (PDT) Message-Id: <20030602.202233.39180859.davem@redhat.com> To: jsd@monmouth.com Cc: netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC1418.6080808@monmouth.com> References: <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> <3EDC1418.6080808@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2821 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 23:20:56 -0400 Mr. Miller is very smart and has spent years getting up to speed in this area. Is the code to be open only to those who are equally smart and willing to invest equally huge amounts of time? Are legal rights only available to people who understand the law and have a legal degree? 
No, this is why we hire lawyers if we choose not to study law ourselves. Your logic is heavily flawed. From david-b@pacbell.net Mon Jun 2 20:32:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:32:48 -0700 (PDT) Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533WO2x014086 for ; Mon, 2 Jun 2003 20:32:44 -0700 Received: from pacbell.net (ppp-67-118-247-59.dialup.pltn13.pacbell.net [67.118.247.59]) by mta4.rcsntx.swbell.net (8.12.9/8.12.3) with ESMTP id h533WDhi012216; Mon, 2 Jun 2003 22:32:18 -0500 (CDT) Message-ID: <3EDC173B.80909@pacbell.net> Date: Mon, 02 Jun 2003 20:34:19 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "David S. Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2823 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev > Well, the difference between code and its spec is > generally a bug that needs to be fixed ... > > See, a document is NOT the spec, the code is the spec. That's hardly the only development model. > Because where the document is wrong, the code determines > the final answer. This is true in all cases. Not "all". "Code-as-spec" works well when there's only one code base, but otherwise it's flawed. Even the model of a "reference implementation" is trouble ... since it invariably evolves into "everyone should use this code". 
Of course, bugs that stay unfixed for a long time can force the "spec" to change. It's a great vendor lock-in tool, and it can happen accidentally too. But most folk view such interop problems as bugs, not features.

> I cannot tell you how much time I've seen people waste because they
> went for documents first, only to find them to be inaccurate for some
> corner case whilst the code has all of the accurate answers.

Or where they notice the code is wrong in that corner case, and they can prove that easily since the spec (implemented correctly in several other places) and the code disagree.

Or where this implementation uses this answer, and that one uses that answer ... and the poor user gets caught in the middle of a finger pointing war, which can't be resolved since each implementation's developers claim to be "the spec", and the user eventually gives up saying "a pox on you all!"

You clipped out the text where I pointed out that bugs can be in specs as well as code. They can be fixed there, too.

> When I see someone want docs, I interpret this as "I don't want to
> have to think or have to comprehend something, I'm too lazy to read
> the code." Well, such laziness leads the person in question only
> to be susceptible to all of the inaccuracies and disconnect that
> always will exist between said docs (if they even exist) and the
> code.

That's an *extremely negative interpretation*, and while I've seen people that are that lazy, they happen to be in the minority of people I've known to ask for docs/specs. (Thank the Gods!)

Consider one thing that docs/specs do that code can't: give the "30,000 foot view" rather than the "tree level view". It's not "lazy" to avoid using the tree-level view; sometimes such low-level perspectives can be counterproductive.

People ask for docs for lots of reasons, and most of them have nothing at all to do with laziness.
- Dave From rddunlap@osdl.org Mon Jun 2 20:32:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:32:34 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533WS2x014085 for ; Mon, 2 Jun 2003 20:32:29 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533WMX27576; Mon, 2 Jun 2003 20:32:22 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533WM5C029705; Mon, 2 Jun 2003 20:32:22 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:32:22 -0700 (PDT) Message-ID: <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:32:22 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.145619.71112623.davem@redhat.com> References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , , X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2822 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > From: "Randy.Dunlap" > Date: Mon, 2 Jun 2003 14:04:52 -0700 > > Does anyone have documentation (or semantics) for rtnl_talk()? > or just some blurb about it? > > I always have to wonder about someone who can't live with just > working code to study, and absolutely requires some document > describing it. > > What is better or more accurate description than code itself!?!?! The code is absolute, no doubt about it. It is the authority. 
That doesn't make it right in all cases AFAIK. And it lacks documentation, even in the source files. There are no semantics or meaning associated with that code except by the people who developed it. I'm not one of them, so I'm trying to ask them or others who know. And yes, I'm looking for the way that it should be done (IMHO) instead of the way it is done. Now, given that I think that the netlink interface is poorly documented, and that I'm trying to add some kernel code that uses it, and that I'm trying to test said kernel code with a userspace test program, I also plan to add such documentation that I think is warranted to make it easy to use, even by non-kernel developers. This documentation might end up living outside of the kernel tree -- that's OK. But in any case, from both private and mailing list emails, I'm not alone in thinking that it's needed. ~Randy From david-b@pacbell.net Mon Jun 2 20:35:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:35:10 -0700 (PDT) Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533Z62x014700 for ; Mon, 2 Jun 2003 20:35:06 -0700 Received: from pacbell.net (ppp-67-118-247-59.dialup.pltn13.pacbell.net [67.118.247.59]) by mta4.rcsntx.swbell.net (8.12.9/8.12.3) with ESMTP id h533Z1hi017198; Mon, 2 Jun 2003 22:35:01 -0500 (CDT) Message-ID: <3EDC17E4.6070506@pacbell.net> Date: Mon, 02 Jun 2003 20:37:08 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "John S. Denker" CC: "David S.
Miller" , rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2824 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev John S. Denker wrote: > Similarly, writing code _and_ documentation is about > twice as expensive as writing the code alone. But > the redundancy makes it possible to achieve much > greater reliability. Also maintainability and > extensibility. Excellent points. The development process needs to address communities other than the folk writing the code ... like the people who inherit that code, and the ones trying to use it. (Testing being a rather specialized type of "use".) - Dave From davem@redhat.com Mon Jun 2 20:37:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:37:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533b12x015003 for ; Mon, 2 Jun 2003 20:37:01 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25846; Mon, 2 Jun 2003 20:35:05 -0700 Date: Mon, 02 Jun 2003 20:35:05 -0700 (PDT) Message-Id: <20030602.203505.59678701.davem@redhat.com> To: rddunlap@osdl.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S.
Miller" In-Reply-To: <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2825 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 20:32:22 -0700 (PDT) The code is absolute, no doubt about it. It is the authority. That doesn't make it right in all cases AFAIK. I totally agree. Now, given that I think that the netlink interface is poorly documented, and that I'm trying to add some kernel code that uses it, and that I'm trying to test said kernel code with a userspace test program, I also plan to add such documentation that I think is warranted to make it easy to use, even by non-kernel developers. This is exactly how things should work. Where there is a need for X _AND_ someone willing to create X, it will be created. No arguments from me on this :-) From jsd@monmouth.com Mon Jun 2 20:41:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:41:48 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533fi2x015402 for ; Mon, 2 Jun 2003 20:41:44 -0700 Received: (qmail 4890 invoked from network); 3 Jun 2003 03:41:39 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 03:41:39 -0000 Message-ID: <3EDC18F2.6090505@monmouth.com> Date: Mon, 02 Jun 2003 23:41:38 -0400 From: "John S.
Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: netlink tester program References: <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> In-Reply-To: <20030602.202233.39180859.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2826 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev On 06/02/2003 11:22 PM, David S. Miller wrote: > > Are legal rights only available to people who understand > the law and have a legal degree? > > No, this is why we hire lawyers if we choose not to study > law ourselves. If we are taking the legal system as our model of openness, then open-source software has come to a sorry pass indeed. ========= It is also important to distinguish what's best for *you* and what's best for the project. Maybe *you* don't want to be responsible for doing all the documentation. I can understand that. But the project as a whole would be better off it it had better documentation. Perhaps you could recruit other folks to help with this. But disdaining the whole concept isn't a good way to start the recruiting. 
From davem@redhat.com Mon Jun 2 20:44:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:44:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533iH2x015769 for ; Mon, 2 Jun 2003 20:44:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25861; Mon, 2 Jun 2003 20:38:35 -0700 Date: Mon, 02 Jun 2003 20:38:34 -0700 (PDT) Message-Id: <20030602.203834.115933659.davem@redhat.com> To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC173B.80909@pacbell.net> References: <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> <3EDC173B.80909@pacbell.net> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2827 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: David Brownell Date: Mon, 02 Jun 2003 20:34:19 -0700 > See, a document is NOT the spec, the code is the spec. That's hardly the only development model. It's the one that works for _me_ and Alexey and myself, and we're the ones doing all the work. When someone doing the work desires the docs and desires to WRITE it, it will appear. You can expect exactly nothing more in our development model. If you require me to write the docs, you misunderstand how the system works :) You clipped out the text where I pointed out that bugs can be in specs as well as code. They can be fixed there, too. Very true. 
So when Randy writes the more detailed netlink/rtnetlink docs, we'll be happy :-) There is even an official IETF RFC written by Jamal, Alexey, and others documenting netlink btw :-)))))))))))) Did anybody notice this? From davem@redhat.com Mon Jun 2 20:48:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:48:44 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533me2x016214 for ; Mon, 2 Jun 2003 20:48:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25892; Mon, 2 Jun 2003 20:46:45 -0700 Date: Mon, 02 Jun 2003 20:46:45 -0700 (PDT) Message-Id: <20030602.204645.48505284.davem@redhat.com> To: jsd@monmouth.com Cc: netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC18F2.6090505@monmouth.com> References: <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> <3EDC18F2.6090505@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2828 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 23:41:38 -0400 If we are taking the legal system as our model of openness, then open-source software has come to a sorry pass indeed. It does have connections where a "user" wants to do something with FOO but does not wish to do the legwork necessary to be an expert in FOO. They hire an expert. Or, in our case, they make an expert interested in the thing they want to do :-))) It is also important to distinguish what's best for *you* and what's best for the project. 
Maybe *you* don't want to be responsible for doing all the documentation. I'm not even going to attempt to document something that moves as fast as the kernel. I go to bookstores and I see many excellent attempts to document kernel internals, but these books are frozen in time. Specifically they are frozen in the time of the moment the kernel they write for is published. As a consequence they are all obsolete the moment they are published. Some poor student reads these books, written against 2.4.8 or whatever, then they go and try to contribute to 2.5.x and it doesn't work except for certain kinds of drivers where we've kept the APIs more or less the same. But I don't care that people do this, just don't require that I do it. I think this extra fluidity we get from being able to change so fast is a strength not a weakness. From rddunlap@osdl.org Mon Jun 2 20:49:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:49:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533nd2x016498 for ; Mon, 2 Jun 2003 20:49:39 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533nWX32403; Mon, 2 Jun 2003 20:49:32 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533nW5C030705; Mon, 2 Jun 2003 20:49:32 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:49:32 -0700 (PDT) Message-ID: <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:49:32 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.203834.115933659.davem@redhat.com> References: <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> <3EDC173B.80909@pacbell.net> <20030602.203834.115933659.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , , , 
X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2829 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > From: David Brownell > Date: Mon, 02 Jun 2003 20:34:19 -0700 > > > See, a document is NOT the spec, the code is the spec. > > That's hardly the only development model. > > It's the one that works for _me_ and Alexey and myself, and we're the ones > doing all the work. Do you want it to remain that way? > When someone doing the work desires the docs and desires to > WRITE it, it will appear. > > You can expect exactly nothing more in our development model. > If you require me to write the docs, you misunderstand how the > system works :) > > You clipped out the text where I pointed out that bugs can > be in specs as well as code. They can be fixed there, too. > > Very true. So when Randy writes the more detailed netlink/rtnetlink docs, > we'll be happy :-) > > There is even an official IETF RFC written by Jamal, Alexey, and > others documenting netlink btw :-)))))))))))) > > Did anybody notice this? Yes. 
~Randy From rddunlap@osdl.org Mon Jun 2 20:54:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:54:11 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533s62x016967 for ; Mon, 2 Jun 2003 20:54:07 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533s1X00402; Mon, 2 Jun 2003 20:54:01 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533s15C030774; Mon, 2 Jun 2003 20:54:01 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:54:01 -0700 (PDT) Message-ID: <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:54:01 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.204645.48505284.davem@redhat.com> References: <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> <3EDC18F2.6090505@monmouth.com> <20030602.204645.48505284.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2831 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > It is also important to distinguish what's best > for *you* and what's best for the project. > Maybe *you* don't want to be responsible for > doing all the documentation. > > I'm not even going to attempt to document something that > moves as fast as the kernel. That point is a real problem... > I go to bookstores and I see many excellent attempts to document > kernel internals, but these books are frozen in time. 
Specifically they are > frozen in the time of the moment the kernel they write for is published. As > a consequence they are all obsolete the moment they are published. No doubt. > Some poor student reads these books, written against 2.4.8 or > whatever, then they go and try to contribute to 2.5.x and it > doesn't work except for certain kinds of drivers where we've > kept the APIs more or less the same. > > But I don't care that people do this, just don't require that I do it. Sure. Are you willing to answer questions about it at least? > I think this extra fluidity we get from being able to change so fast is a > strength not a weakness. If it were only a strength, that would be great. I believe that it's both, however. ~Randy From davem@redhat.com Mon Jun 2 20:53:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:53:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533rt2x016943 for ; Mon, 2 Jun 2003 20:53:56 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25928; Mon, 2 Jun 2003 20:51:56 -0700 Date: Mon, 02 Jun 2003 20:51:56 -0700 (PDT) Message-Id: <20030602.205156.08346169.davem@redhat.com> To: rddunlap@osdl.org Cc: david-b@pacbell.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> References: <3EDC173B.80909@pacbell.net> <20030602.203834.115933659.davem@redhat.com> <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2830 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 20:49:32 -0700 (PDT) > It's the one that works for _me_ and Alexey and myself, and we're > the ones doing all the work. Do you want it to remain that way? Doesn't matter to _us_, _we_ know how these things work and how to use them. If we don't, we'll read the code to learn this. Others care, and if someone writes the docs for _them_, that is _fine_. What I object to is "hey we have to have docs, why didn't dave and alexey write them". :) Franks a lot, David S. Miller davem@redhat.com From davem@redhat.com Mon Jun 2 20:56:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:56:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533uL2x017606 for ; Mon, 2 Jun 2003 20:56:21 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25955; Mon, 2 Jun 2003 20:54:26 -0700 Date: Mon, 02 Jun 2003 20:54:25 -0700 (PDT) Message-Id: <20030602.205425.21904841.davem@redhat.com> To: rddunlap@osdl.org Cc: jsd@monmouth.com, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> References: <3EDC18F2.6090505@monmouth.com> <20030602.204645.48505284.davem@redhat.com> <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2832 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 20:54:01 -0700 (PDT) > But I don't care that people do this, just don't require that I do it. Sure. Are you willing to answer questions about it at least? As long as others like Alexey and Jamal help field these questions and it's not just me sitting here becoming a Linux development support service :-) From pekkas@netcore.fi Mon Jun 2 21:48:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 21:48:56 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h534mf2x019134 for ; Mon, 2 Jun 2003 21:48:42 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h534mDd09020; Tue, 3 Jun 2003 07:48:13 +0300 Date: Tue, 3 Jun 2003 07:48:13 +0300 (EEST) From: Pekka Savola To: David Stevens cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , Subject: Re: [PATCH] Prefix List patch against 2.5.70 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 2833 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Mon, 2 Jun 2003, David Stevens wrote: > The users of prefix list data don't know the prefix or the prefix > length; they know the interface index, and need to get the prefixes. > > The data is fundamentally per-interface, and the routing table is > per-destination. 
So, adding the prefixes to the routing table doesn't seem > like the best choice because everything that currently uses the routing > table will have to skip over these extra entries (which they'll never be > interested in) Umm.. every prefix should have an interface route, so they're a required subset of the routing table, correct? > and the users of the prefix data will have to skip over all > existing routing table entries (which they're never interested in). ... > Routes and prefixes are independent of each other, so throwing them in the > same table to me seems like it only creates work to skip entries that > aren't related, and because the users of the prefix data don't have the key > needed for a fast look-up in the routing table, prefix users in particular > have to skip through everything currently in the routing table, linearly, > with no benefit at all for being there. > > I also see no relation between prefix list data and the FIB; current users > are completely independent from prefix list users, and it appears to only > slow both of them down. The prefix data is always looked-up by interface > index, so I think it really belongs in the inet6 per-interface structure, > unless I'm missing something. What benefits are there for lumping this with > existing data structures that aren't per-interface, or keyed per-interface? > > +-DLS > > > YOSHIFUJI Hideaki / $B5HF#1QL@(B @vger.kernel.org on > 05/30/2003 07:02:49 PM > > Sent by: linux-net-owner@vger.kernel.org > > > To: krkumar@us.ltcfwd.linux.ibm.com > cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, > netdev@oss.sgi.com, linux-net@vger.kernel.org > Subject: Re: [PATCH] Prefix List patch against 2.5.70 > > > > In article <3ED80230.2030508@us.ibm.com> (at Fri, 30 May 2003 18:15:28 > -0700), Krishna Kumar says: > > > +/* prefix list returned to user space in this structure */ > > +struct plist_user_info { > ^ip6 or ipv6 or so. 
> > + char name[IFNAMSIZ]; /* interface name */ > ~~~~~~~~~~~~~~~~~~~duplicate information. > > + int ifindex; /* interface index */ > > + int nprefixes; /* number of elements in 'prefix' */ > > + struct var_plist_user_info { /* multiple elements */ > > + char flags[3]; /* router advertised flags */ > ~~~~~~~~this is not good interface. > > + int plen; /* prefix length */ > > + __u32 valid; /* valid lifetime */ > > + struct in6_addr ra_addr;/* advertising router */ > > + struct in6_addr prefix; /* prefix */ > > + } plist_vars[0]; > > +}; > > + > > extern void addrconf_init(void); > > extern void addrconf_cleanup(void); > > > > : > > I think we should use 1 fixed-length message per prefix, > instead of variable length message. > > > > + pinfo->plist_vars[count].plen = p_el->pinfo.prefix_len; > > + pinfo->plist_vars[count].valid = p_el->pinfo.valid - > > + (jiffies - p_el->timestamp)/HZ; > > + if ((p_el->ra_flags & (ND_RA_FLAG_MANAGED | > > + ND_RA_FLAG_OTHER)) > > + == (ND_RA_FLAG_MANAGED|ND_RA_FLAG_OTHER)) > > + strcpy(pinfo->plist_vars[count].flags, "MO"); > > + else if (p_el->ra_flags & ND_RA_FLAG_MANAGED) > > + strcpy(pinfo->plist_vars[count].flags, "M"); > > + else if (p_el->ra_flags & ND_RA_FLAG_OTHER) > > + strcpy(pinfo->plist_vars[count].flags, "O"); > > + else > > + strcpy(pinfo->plist_vars[count].flags, "-"); > > + ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr, > > + &p_el->ra_addr); > > + for (i = 0; i < 8; i++) > > + pinfo->plist_vars[count].ra_addr.s6_addr16[i] = > > + > __constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]); > > + ipv6_addr_copy(&pinfo->plist_vars[count].prefix, > > + &p_el->pinfo.prefix); > > + for (i = 0; i < p_el->pinfo.prefix_len/16; i++) > > + pinfo->plist_vars[count].prefix.s6_addr16[i] = > > + > __constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]); > > Absoletely nasty. > - don't use charaters to represent flags; use real flags. > - use network-byte order. 
> > > > +static int prefix_list_proc_dump(char *buffer, char **start, off_t > offset, > > + int length) > > +{ > : > > Please use seq_file. > > > Again, what I proposed was to store prefix information on fib with > some flags to represent advertised by routers and give user-space > the RA information using new rtattr (RTA_RA6INFO or something like that). > > struct rta_ra6info { > u32 rta_ra6flags; > }; > > -- > Hideaki YOSHIFUJI @ USAGI Project > GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA > > > - > To unsubscribe from this list: send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > - > To unsubscribe from this list: send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings From davem@redhat.com Mon Jun 2 21:52:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 21:52:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h534qE2x019507 for ; Mon, 2 Jun 2003 21:52:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA26095; Mon, 2 Jun 2003 21:49:41 -0700 Date: Mon, 02 Jun 2003 21:49:41 -0700 (PDT) Message-Id: <20030602.214941.102551312.davem@redhat.com> To: pekkas@netcore.fi Cc: dlstevens@us.ibm.com, yoshfuji@linux-ipv6.org, krkumar@us.ibm.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2834 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Pekka Savola Date: Tue, 3 Jun 2003 07:48:13 +0300 (EEST) Umm.. every prefix should have an interface route, so they're a required subset of the routing table, correct? That's entirely correct, thanks for noticing this :-) This is why I said that they could add to a global list all routes that meet this criteria. Thus making any querying mechanism simple to implement. From vnuorval@tcs.hut.fi Mon Jun 2 23:41:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 23:42:09 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h536fv2x023287 for ; Mon, 2 Jun 2003 23:41:58 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Tue, 3 Jun 2003 09:35:54 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h536ZqjH019079; Tue, 3 Jun 2003 09:35:52 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h536ZjF1019075; Tue, 3 Jun 2003 09:35:46 +0300 Date: Tue, 3 Jun 2003 09:35:45 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030531.000319.114704530.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2836 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Sat, 31 May 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Let us test the patch. It seemed buggy when USAGI tested before. Any feedback would have been (and still is) of course welcome. The bugs are much easier to locate and fix if people report about them :-) -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From dlstevens@us.ibm.com Mon Jun 2 23:43:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 23:43:29 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h536hP2x023608 for ; Mon, 2 Jun 2003 23:43:25 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h536gXuT188876; Tue, 3 Jun 2003 02:42:33 -0400 Received: from d03nm121.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h536gWba166304; Tue, 3 Jun 2003 00:42:33 -0600 Importance: Normal Sensitivity: Subject: Re: [PATCH] Prefix List patch against 2.5.70 To: Pekka Savola Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , krkumar@us.ibm.com, , , , X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: David Stevens Date: Tue, 3 Jun 2003 00:42:25 -0600 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.0.1 [IBM]|April 28, 2003) at 06/03/2003 00:42:32 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 2837 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev >Umm.. every prefix should have an interface route, so they're a required >subset of the routing table, correct? 
I'm not sure that it has to be, except if you make that the only way to access prefix list information. :-) Administrators and routing daemons are free to mess with the routing table in creative ways (aggregating, creating static routes to enforce some policy, whatever), but when the routing table holds more than just routing information, either those routes can't be messed with, or the prefix information is lost. And that's more relevant when not using address autoconfiguration. The prefix list information that's relevant is the prefix, the prefix length and the M and O bits, as they were in the router advertisement. For routing purposes, it wouldn't be a problem to aggregate interface routes that cover a contiguous portion of the address space, but doing that would lose the prefix information if the routing table is your only source. So, routing daemons would have to check a funky flag and leave prefix-list-relevant routes alone. M and O bits are per-interface; they have no relevance at all in the routing table, but they'd all have to be updated if they changed. There is an example already where routes are installed for per-interface information: local addresses. There are host routes corresponding to local addresses in the routing table now, but there is also a list of local addresses associated with the interface. Is that a bad idea? Certainly, it's possible to flag all of the host routes that are for local addresses (really, just check for interface loopback) and search the entire routing table when trying to answer the question "what addresses are on this interface," but it's much better to have that address list associated directly with the interface (especially for source selection). The consumers of prefix list (DHCPv6 and mobile IPv6) need the entire prefix list, length and M&O bits for a given interface. The prefixes (the key) aren't known for the search, and no other interfaces or destination routes are ever interesting for those consumers. 
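As a rough illustration of the comparison above, here is a small userspace sketch. It is not code from the prefix list patch or from the kernel: every struct, macro, and function name in it is made up for the example, and the fixed-size array stands in for whatever list the real per-interface structure would use. It stores the M and O bits once per interface as real flag bits, keeps the prefixes on the interface, and contrasts that with collecting interface routes by scanning a flat routing table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of these are kernel symbols. */
#define PFX_FLAG_MANAGED 0x1u /* "M" bit as seen in the router advertisement */
#define PFX_FLAG_OTHER   0x2u /* "O" bit as seen in the router advertisement */
#define MAX_PREFIXES     8

struct prefix_entry {
	uint8_t  prefix[16];     /* stored as received, in network byte order */
	uint8_t  plen;           /* prefix length in bits */
	uint32_t valid_lifetime; /* seconds */
};

struct iface {
	int      ifindex;
	uint32_t ra_flags;       /* M and O bits, kept once per interface */
	int      nprefixes;
	struct prefix_entry prefixes[MAX_PREFIXES];
};

struct route {
	int     ifindex;
	uint8_t dst[16];
	uint8_t plen;
};

/* Per-interface model: find the interface, and its whole prefix list is at hand. */
const struct iface *iface_find(const struct iface *tbl, int n, int ifindex)
{
	assert(tbl != NULL);
	for (int i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return &tbl[i];
	return NULL;
}

/* Append one prefix; returns 0 on success, -1 if the list is full or plen > 128. */
int iface_add_prefix(struct iface *ifp, const uint8_t prefix[16],
		     uint8_t plen, uint32_t valid_lifetime)
{
	if (ifp->nprefixes >= MAX_PREFIXES || plen > 128)
		return -1;
	struct prefix_entry *e = &ifp->prefixes[ifp->nprefixes++];
	memcpy(e->prefix, prefix, sizeof e->prefix);
	e->plen = plen;
	e->valid_lifetime = valid_lifetime;
	return 0;
}

/*
 * Routing-table model: the caller knows only the ifindex, not the prefix,
 * so every route has to be visited. Returns the number of matches and
 * stores at most 'max' of them in 'out'.
 */
int routes_collect(const struct route *rt, int n, int ifindex,
		   const struct route **out, int max)
{
	int found = 0;
	for (int i = 0; i < n; i++) {
		if (rt[i].ifindex != ifindex)
			continue;
		if (found < max)
			out[found] = &rt[i];
		found++;
	}
	return found;
}
```

In this sketch a DHCPv6-style consumer would call iface_find() with its interface index and read prefixes[] and ra_flags directly, and nothing done to the route array can lose that information. The interface lookup here is also a linear scan, but only over interfaces, which are normally far fewer than routes.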
The interface routes can be deleted, forced to something else, or modified now without losing any information, because they are only relevant for packet routing. If the prefix information is divined from the routing table, the interface routes suddenly contain more than routing information, and should then have special restrictions on them that other routes don't have (they should be immutable). I don't think that's a good idea, when you can hang the prefix list right off the interface and return the full list whenever you need it. The interface routes can be overridden or aggregated without messing at all with the prefix list information. That seems pretty simple to me. +-DLS From etsh_cucu@yahoo.com Tue Jun 3 00:57:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 00:57:46 -0700 (PDT) Received: from web14305.mail.yahoo.com (web14305.mail.yahoo.com [216.136.173.81]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h537vg2x025791 for ; Tue, 3 Jun 2003 00:57:42 -0700 Message-ID: <20030603075742.34434.qmail@web14305.mail.yahoo.com> Received: from [213.158.161.140] by web14305.mail.yahoo.com via HTTP; Tue, 03 Jun 2003 00:57:42 PDT Date: Tue, 3 Jun 2003 00:57:42 -0700 (PDT) From: Hisham Kotry Subject: Re: netlink tester program To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030602.203834.115933659.davem@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2838 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: etsh_cucu@yahoo.com Precedence: bulk X-list: netdev --- "David S. Miller" wrote: > There is even an official IETF RFC written by Jamal, > Alexey, and > others documenting netlink btw :-)))))))))))) > > Did anybody notice this? 
It was definitely a nice read, but the netlink2 draft is somewhat inconsistent: it mentions reducing the 32-bit length field to 16 bits and distributing the remaining 16 bits equally between the new version and extended flags fields, but the draft makes no further reference to the version field. In fact, the netlink2 message header diagram on page 16, as well as the pseudo message on page 28, shows a 16-bit extended flags field with no version field in the header. So this is probably one of those cases in which the specs aren't clear enough and the code usually has the final word. I mailed Jamal about this a while ago but never got a reply. BTW, is netlink2 support planned for Linux in the near future? David, sorry for the private mail; it was unintentional, as I pressed reply instead of reply-all by mistake. Chaow, kotry From hch@lst.de Tue Jun 3 01:23:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 01:23:43 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h538NT2x028700 for ; Tue, 3 Jun 2003 01:23:31 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h538NRJT022960 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Tue, 3 Jun 2003 10:23:27 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h538NQ0w022958 for netdev@oss.sgi.com; Tue, 3 Jun 2003 10:23:26 +0200 Date: Tue, 3 Jun 2003 10:23:26 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] move dmascc away from setup.c Message-ID: <20030603082326.GA22946@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT 
X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 2839 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Yeah, it's a isa driver but it already was in the setup.c pci probes list. Also use SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT. --- 1.15/drivers/net/setup.c Thu May 29 11:57:13 2003 +++ edited/drivers/net/setup.c Mon Jun 2 12:16:07 2003 @@ -9,8 +9,6 @@ #include #include -extern int dmascc_init(void); - extern int scc_enet_init(void); extern int fec_enet_init(void); @@ -29,10 +27,6 @@ /* * Early setup devices */ - -#if defined(CONFIG_DMASCC) - {dmascc_init, 0}, -#endif #if defined(CONFIG_SCC_ENET) {scc_enet_init, 0}, #endif --- 1.15/drivers/net/hamradio/dmascc.c Fri May 30 08:50:31 2003 +++ edited/drivers/net/hamradio/dmascc.c Mon Jun 2 12:31:48 2003 @@ -250,8 +250,6 @@ /* Function declarations */ - -int dmascc_init(void) __init; static int setup_adapter(int card_base, int type, int n) __init; static void write_scc(struct scc_priv *priv, int reg, int val); @@ -299,23 +297,12 @@ static unsigned long rand; -/* Module functions */ - -#ifdef MODULE - - MODULE_AUTHOR("Klaus Kudielka"); MODULE_DESCRIPTION("Driver for high-speed SCC boards"); MODULE_PARM(io, "1-" __MODULE_STRING(MAX_NUM_DEVS) "i"); MODULE_LICENSE("GPL"); - -int init_module(void) { - return dmascc_init(); -} - - -void cleanup_module(void) { +static void __exit dmascc_exit(void) { int i; struct scc_info *info; @@ -341,24 +328,16 @@ } } - -#else - - +#ifndef MODULE void __init dmascc_setup(char *str, int *ints) { int i; for (i = 0; i < MAX_NUM_DEVS && i < ints[0]; i++) io[i] = ints[i+1]; } - - #endif - -/* Initialization functions */ - -int __init dmascc_init(void) { +static int __init dmascc_init(void) { int h, i, j, n; int base[MAX_NUM_DEVS], tcmd[MAX_NUM_DEVS], t0[MAX_NUM_DEVS], t1[MAX_NUM_DEVS]; @@ -461,6 +440,9 @@ return -EIO; } 
+module_init(dmascc_init); +module_exit(dmascc_exit); + int __init setup_adapter(int card_base, int type, int n) { int i, irq, chip; @@ -580,6 +562,7 @@ if (sizeof(dev->name) == sizeof(char *)) dev->name = priv->name; #endif sprintf(dev->name, "dmascc%i", 2*n+i); + SET_MODULE_OWNER(dev); dev->base_addr = card_base; dev->irq = irq; dev->open = scc_open; @@ -707,12 +690,9 @@ struct scc_info *info = priv->info; int card_base = priv->card_base; - MOD_INC_USE_COUNT; - /* Request IRQ if not already used by other channel */ if (!info->irq_used) { if (request_irq(dev->irq, scc_isr, 0, "dmascc", info)) { - MOD_DEC_USE_COUNT; return -EAGAIN; } } @@ -722,7 +702,6 @@ if (priv->param.dma >= 0) { if (request_dma(priv->param.dma, "dmascc")) { if (--info->irq_used == 0) free_irq(dev->irq, info); - MOD_DEC_USE_COUNT; return -EAGAIN; } else { unsigned long flags = claim_dma_lock(); @@ -866,7 +845,6 @@ } if (--info->irq_used == 0) free_irq(dev->irq, info); - MOD_DEC_USE_COUNT; return 0; } From aj@dungeon.inka.de Tue Jun 3 02:10:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 02:10:14 -0700 (PDT) Received: from mail.inka.de (mail@quechua.inka.de [193.197.184.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h539A72x032064 for ; Tue, 3 Jun 2003 02:10:08 -0700 Received: from dungeon.inka.de (uucp@[127.0.0.1]) by mail.inka.de with uucp (rmailwrap 0.5) id 19N7nx-0002K9-00; Tue, 03 Jun 2003 11:10:05 +0200 Received: from 192.168.1.12 (unknown [192.168.1.12]) by dungeon.inka.de (Postfix) with ESMTP id 9D90820FAD; Tue, 3 Jun 2003 10:11:10 +0200 (CEST) From: Andreas Jellinghaus To: Peter Bieringer , Maillist netdev Subject: Re: Is there already a doc available for the new IPsec code? 
Date: Tue, 3 Jun 2003 10:13:02 +0200 User-Agent: KMail/1.5.2 Cc: Maillist USAGI-users References: <36990000.1054588328@worker.muc.bieringer.de> In-Reply-To: <36990000.1054588328@worker.muc.bieringer.de> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306031013.02982.aj@dungeon.inka.de> X-archive-position: 2840 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aj@dungeon.inka.de Precedence: bulk X-list: netdev I will send him my notes (too large for this list). except for the kernel config, the netbsd ipsec howto is a very good source. Andreas From bunk@fs.tum.de Tue Jun 3 06:03:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 06:03:44 -0700 (PDT) Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53D3H2x009171 for ; Tue, 3 Jun 2003 06:03:39 -0700 Received: (qmail 3223 invoked from network); 3 Jun 2003 13:03:10 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 3 Jun 2003 13:03:10 -0000 Date: Tue, 3 Jun 2003 15:03:08 +0200 From: Adrian Bunk To: Margit Schubert-While , lksctp-developers@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: SCTP config 2.5.70(-bk) Message-ID: <20030603130308.GC27168@fs.tum.de> References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> User-Agent: Mutt/1.4.1i X-archive-position: 2841 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@fs.tum.de Precedence: bulk X-list: netdev On Mon, Jun 02, 2003 at 09:53:04AM +0200, 
Margit Schubert-While wrote: > CONFIG_IPV6_SCTP__ is always being set to "y" even though > not selected (CONFIG_IPV6 not set) First, this doesn't do any harm since CONFIG_IPV6_SCTP__ alone doesn't result in anything getting compiled. But besides, it seems a bit broken. From net/sctp/Kconfig: <-- snip --> ... config IPV6_SCTP__ tristate default y if IPV6=n default IPV6 if IPV6 config IP_SCTP tristate "The SCTP Protocol (EXPERIMENTAL)" depends on IPV6_SCTP__ ... <-- snip --> Semantically equivalent is the following for IPV6_SCTP__: config IPV6_SCTP__ tristate default y if IPV6=n || IPV6=y default m if IPV6=m If it was intended to disallow a static IP_SCTP with a modular IPV6 it doesn't work: It's perfectly allowed to set IPV6=n and IP_SCTP=y and later compile and install a modular IPV6 for the same kernel. Could someone from the SCTP developers comment on the intentions behind IPV6_SCTP__ ? > Margit cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. 
Buck - Dragon Seed From gandalf@wlug.westbo.se Tue Jun 3 10:41:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 10:41:29 -0700 (PDT) Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53HfE2x016462 for ; Tue, 3 Jun 2003 10:41:17 -0700 Received: by tux.rsn.bth.se (Postfix, from userid 501) id A13E036FED; Tue, 3 Jun 2003 19:41:11 +0200 (CEST) Subject: Re: fix TCP roundtrip time update code From: Martin Josefsson To: davidm@hpl.hp.com Cc: kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com In-Reply-To: <200306031552.h53FqknC023999@napali.hpl.hp.com> References: <200306031552.h53FqknC023999@napali.hpl.hp.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1054662070.701.6.camel@tux.rsn.bth.se> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4 Date: 03 Jun 2003 19:41:11 +0200 X-archive-position: 2842 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev (trimmed CC line and added netdev) On Tue, 2003-06-03 at 17:52, David Mosberger wrote: > One of those very-hard-to-track-down, trivial-to-fix kind of problems: > without this patch, TCP roundtrip time measurements will corrupt the > routing cache's RTT estimates under heavy network load (the bug causes > RTAX_RTT to go negative, but since its type is u32, you end up with a > huge positive value...). From there on, later TCP connections quickly > will go south. > > The typo was introduced 8 months ago in v1.29 of the file by the patch > entitled "Cleanup DST metrics and abstrct MSS/PMTU further". I tested this patch and it looks like it has cured my mysterious TCP stalls. without patch: cache mtu 1500 rtt 479411ms rttvar 953813ms cwnd 46 advmss 1460 I see that before and during the stall if not using this patch. 
(rtt is never above 20ms according to ping) With the patch I see normal rtt and rttvar times. Haven't seen a stall yet (~30 kernel compiles with distcc over a sometimes congested link), will continue testing. > ===== net/ipv4/tcp_input.c 1.36 vs edited ===== > --- 1.36/net/ipv4/tcp_input.c Mon Apr 28 09:27:57 2003 > +++ edited/net/ipv4/tcp_input.c Tue Jun 3 08:19:36 2003 > @@ -556,8 +556,8 @@ > if (m >= dst_metric(dst, RTAX_RTTVAR)) > dst->metrics[RTAX_RTTVAR-1] = m; > else > - dst->metrics[RTAX_RTT-1] -= > - (dst->metrics[RTAX_RTT-1] - m)>>2; > + dst->metrics[RTAX_RTTVAR-1] -= > + (dst->metrics[RTAX_RTTVAR-1] - m)>>2; > } > > if (tp->snd_ssthresh >= 0xFFFF) { -- /Martin From garzik@gtf.org Tue Jun 3 10:59:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 10:59:48 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53HxM2x017048 for ; Tue, 3 Jun 2003 10:59:43 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 983EF6641; Tue, 3 Jun 2003 13:59:21 -0400 (EDT) Date: Tue, 3 Jun 2003 13:59:21 -0400 From: Jeff Garzik To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Regarding SET_NETDEV_DEV Message-ID: <20030603175921.GE2079@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 2843 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev For janitors and other developers placing this in net drivers... please don't :) This can be done in upper layers, accomplishing the same goal without changing the low-level net driver code at all. 
Jeff From davidm@napali.hpl.hp.com Tue Jun 3 11:45:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 11:45:51 -0700 (PDT) Received: from palrel12.hp.com (palrel12.hp.com [156.153.255.237]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53IjQ2x018031 for ; Tue, 3 Jun 2003 11:45:47 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel12.hp.com (Postfix) with ESMTP id 434C41C011B1; Tue, 3 Jun 2003 11:45:26 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h53IjOxV004188; Tue, 3 Jun 2003 11:45:25 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h53IjOrK025527; Tue, 3 Jun 2003 11:45:24 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h53IjOlS025523; Tue, 3 Jun 2003 11:45:24 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16092.60612.352739.581639@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 11:45:24 -0700 To: Martin Josefsson Cc: davidm@hpl.hp.com, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <1054662070.701.6.camel@tux.rsn.bth.se> References: <200306031552.h53FqknC023999@napali.hpl.hp.com> <1054662070.701.6.camel@tux.rsn.bth.se> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2844 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev >>>>> On 03 Jun 2003 19:41:11 +0200, Martin Josefsson said: Martin> (trimmed CC line and added netdev) On Tue, 2003-06-03 at Martin> 17:52, David Mosberger wrote: >> One of those 
very-hard-to-track-down, trivial-to-fix kind of >> problems: without this patch, TCP roundtrip time measurements >> will corrupt the routing cache's RTT estimates under heavy >> network load (the bug causes RTAX_RTT to go negative, but since >> its type is u32, you end up with a huge positive value...). From >> there on, later TCP connections quickly will go south. >> The typo was introduced 8 months ago in v1.29 of the file by the >> patch entitled "Cleanup DST metrics and abstrct MSS/PMTU >> further". Martin> I tested this patch and it looks like it has cured my Martin> mysterious TCP stalls. Yes, this sounds reasonable. I wasn't very clear on this point, but "by going south" I meant that TCP is starting to misbehave. In particular, you'll likely end up with the kernel aborting ESTABLISHED TCP connections with extreme prejudice (and in violation of the TCP protocol), because it thought that it had been unable to communicate with the remote end for a _very_ long time. The net effect typically is that you end up with one end having a connection that's in the ESTABLISHED state and the other end having no trace of that connection. 
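The wraparound behind this can be reproduced in user space. The following is only a simplified model of the branch touched by the patch, not the kernel code: `struct metrics`, `update_buggy` and `update_fixed` are hypothetical names, and the two `uint32_t` fields stand in for the `RTAX_RTT` and `RTAX_RTTVAR` slots of `dst->metrics[]`.

```c
#include <stdint.h>

/* Stand-in for the two cached route metrics (both unsigned 32-bit). */
struct metrics {
    uint32_t rtt;
    uint32_t rttvar;
};

/*
 * Model of the pre-patch code: the branch is guarded by m < rttvar,
 * but the update is applied to rtt. Nothing guarantees rtt >= m, so
 * (rtt - m) can wrap around to a huge unsigned value.
 */
static void update_buggy(struct metrics *mt, uint32_t m)
{
    if (m >= mt->rttvar)
        mt->rttvar = m;
    else
        mt->rtt -= (mt->rtt - m) >> 2;
}

/*
 * Model of the patched code: the guard m < rttvar now protects the
 * subtraction it belongs to, so (rttvar - m) cannot wrap.
 */
static void update_fixed(struct metrics *mt, uint32_t m)
{
    if (m >= mt->rttvar)
        mt->rttvar = m;
    else
        mt->rttvar -= (mt->rttvar - m) >> 2;
}
```

Starting from a cached rtt of 10 and rttvar of 100 with m = 50, the buggy variant leaves rtt at a value above three billion, while the fixed variant leaves rtt untouched and eases rttvar down toward m.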
--david From scott.feldman@intel.com Tue Jun 3 13:01:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 13:01:57 -0700 (PDT) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53K1X2x023590 for ; Tue, 3 Jun 2003 13:01:54 -0700 Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by caduceus.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h53JrfA01302 for ; Tue, 3 Jun 2003 19:53:41 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by petasus.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h53JssK05729 for ; Tue, 3 Jun 2003 19:54:54 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003060313013015442 ; Tue, 03 Jun 2003 13:01:30 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 3 Jun 2003 13:01:30 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: [PATCH] fix use after free in e100 Date: Tue, 3 Jun 2003 13:01:29 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [PATCH] fix use after free in e100 Thread-Index: AcMokjx3yixGSrs6TK+86kV3QlMsXwBeFFag From: "Feldman, Scott" To: "Martin Josefsson" Cc: X-OriginalArrivalTime: 03 Jun 2003 20:01:30.0686 (UTC) FILETIME=[ECC4B5E0:01C32A0A] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h53K1X2x023590 X-archive-position: 2845 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > Here's a fix for a 
use-after-free in the e100 driver. > You can't touch the skb after a call to netif_rx(), it might > have been free'd. Caught with Manfred's unmap-page-debugging > patch in -mm. Thanks Martin. We'll pick this patch up in our dev driver and propagate the change from there. -scott From jgrimm2@us.ibm.com Tue Jun 3 14:09:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 14:09:39 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53L8v2x025658 for ; Tue, 3 Jun 2003 14:09:30 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h53L6AuT258362; Tue, 3 Jun 2003 17:06:10 -0400 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h53L66h7148720; Tue, 3 Jun 2003 15:06:07 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h53L65sQ056046; Tue, 3 Jun 2003 16:06:05 -0500 Received: from us.ibm.com (sig-9-65-53-56.mts.ibm.com [9.65.53.56]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id QAA20796; Tue, 3 Jun 2003 16:06:04 -0500 Message-ID: <3EDD0DFC.4080806@us.ibm.com> Date: Tue, 03 Jun 2003 16:07:08 -0500 From: Jon Grimm Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Adrian Bunk CC: Margit Schubert-While , lksctp-developers@lists.sourceforge.net, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: SCTP config 2.5.70(-bk) References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> <20030603130308.GC27168@fs.tum.de> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2846 X-ecartis-version: Ecartis 
v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgrimm2@us.ibm.com Precedence: bulk X-list: netdev Hi Adrian, Sorry for a bit of delay... We are away at an SCTP Interoperability event. Adrian Bunk wrote: > On Mon, Jun 02, 2003 at 09:53:04AM +0200, Margit Schubert-While wrote: > > >>CONFIG_IPV6_SCTP__ is always being set to "y" even though >>not selected (CONFIG_IPV6 not set) > > > First, this doesn't do any harm since CONFIG_IPV6_SCTP__ alone doesn't > result in anything getting compiled. > > But besides, it seems a bit broken. > > From net/sctp/Kconfig: > > <-- snip --> > > ... > > config IPV6_SCTP__ > tristate > default y if IPV6=n > default IPV6 if IPV6 > > config IP_SCTP > tristate "The SCTP Protocol (EXPERIMENTAL)" > depends on IPV6_SCTP__ > ... > > <-- snip --> > > > Semantically equivalent is the following for IPV6_SCTP__: > > config IPV6_SCTP__ > tristate > default y if IPV6=n || IPV6=y > default m if IPV6=m > > > If it was intended to disallow a static IP_SCTP with a modular IPV6 it > doesn't work: It's perfectly allowed to set IPV6=n and IP_SCTP=y and > later compile and install a modular IPV6 for the same kernel. > Are you sure? I vaguely remember one of the network structs having #ifdef'd fields for v6. Consequently, if one first compiles without IPv6 but then later compiles and loads ipv6... bad things happen, as the kernel and the module have different concepts of what the sock is. > > Could someone from the SCTP developers comment on the intentions behind > IPV6_SCTP__ ? > Yes. The intent was to at least discourage a configuration that will segfault. 
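The layout mismatch described above can be shown with a small, purely illustrative C sketch. The two structs below are made up for the example and do not reflect the real kernel socket structures; they simply model the same struct as seen by code built with and without a conditionally compiled IPv6 field.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout seen by code compiled without IPv6 support (hypothetical). */
struct sock_no_v6 {
    uint32_t daddr;
    uint16_t dport;
    int      state;
};

/* Layout seen by code compiled with IPv6: the extra field shifts
 * every member that follows it (hypothetical). */
struct sock_with_v6 {
    uint32_t daddr;
    uint8_t  v6_daddr[16];
    uint16_t dport;
    int      state;
};

/* Returns 1 only if both builds agree on size and on where state lives. */
static int layouts_agree(void)
{
    return sizeof(struct sock_no_v6) == sizeof(struct sock_with_v6) &&
           offsetof(struct sock_no_v6, state) ==
           offsetof(struct sock_with_v6, state);
}
```

A module built against the second layout would read and write `state` at a different offset than a kernel built against the first, which is the kind of corruption the Kconfig dependency tries to discourage.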
Thanks, jon > > >>Margit > > > cu > Adrian > From jmorris@intercode.com.au Tue Jun 3 17:26:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:26:46 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:m/3FwucN+ZbRkqvuBNbRehdxIuxqTgbZ@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540QJ2x031482 for ; Tue, 3 Jun 2003 17:26:41 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h540OEr26556; Wed, 4 Jun 2003 10:24:15 +1000 Date: Wed, 4 Jun 2003 10:24:14 +1000 (EST) From: James Morris To: davidm@hpl.hp.com cc: Martin Josefsson , , , , , "David S. Miller" Subject: Re: fix TCP roundtrip time update code In-Reply-To: <16092.60612.352739.581639@napali.hpl.hp.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2847 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 3 Jun 2003, David Mosberger wrote: > Martin> I tested this patch and it looks like it has cured my > Martin> mysterious TCP stalls. > > Yes, this sounds reasonable. I wasn't very clear on this point, but > "by going south" I meant that TCP is starting to misbehave. In > particular, you'll likely end up with the kernel aborting ESTABLISHED > TCP connections with extreme prejudice (and in violation of the TCP > protocol), because it thought that it had been unable to communicate > with the remote end for a _very_ long time. The net effect typically > is that you end up with one end having a connection that's in the > ESTABLISHED state and the other end having no trace of that > connection. David, This might be the solution to one of the 'must-fix' bugs for the networking, which nobody so far was quite able to track down. 
- James -- James Morris From yoshfuji@linux-ipv6.org Tue Jun 3 17:39:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:39:23 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540cr2x031928 for ; Tue, 3 Jun 2003 17:39:14 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h540diBo003194; Wed, 4 Jun 2003 09:39:44 +0900 Date: Wed, 04 Jun 2003 09:39:44 +0900 (JST) Message-Id: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: Ville Nuorvala , netdev@oss.sgi.com Subject: [PATCH] IPV6: Sereral errors on udpv6_connect() From: YOSHIFUJI Hideaki / 吉藤英明 Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2848 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. The CONFIG_IPV6_SUBTREE patch contains multiple fixes and changes; I'm trying to split them up. This patch fixes multiple errors in udpv6_connect(). - a pointer into fl, a variable of automatic storage class, was illegally cached using ip6_dst_store(). - uninitialized saddr was copied to fl.fl6_src. - don't cache if ipv6_get_saddr() failed. Patch is based on the CONFIG_IPV6_SUBTREE patch from Ville Nuorvala. 
Index: linux25-LINUS/net/ipv6/udp.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/udp.c,v retrieving revision 1.1.1.18 diff -u -r1.1.1.18 udp.c --- linux25-LINUS/net/ipv6/udp.c 26 May 2003 08:04:11 -0000 1.1.1.18 +++ linux25-LINUS/net/ipv6/udp.c 4 Jun 2003 00:29:32 -0000 @@ -254,7 +254,6 @@ struct inet_opt *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *daddr; - struct in6_addr saddr; struct dst_entry *dst; struct flowi fl; struct ip6_flowlabel *flowlabel = NULL; @@ -355,7 +354,7 @@ fl.proto = IPPROTO_UDP; ipv6_addr_copy(&fl.fl6_dst, &np->daddr); - ipv6_addr_copy(&fl.fl6_src, &saddr); + ipv6_addr_copy(&fl.fl6_src, &np->saddr); fl.oif = sk->bound_dev_if; fl.fl_ip_dport = inet->dport; fl.fl_ip_sport = inet->sport; @@ -381,20 +380,23 @@ return err; } - ip6_dst_store(sk, dst, &fl.fl6_dst); - /* get the source address used in the appropriate device */ - err = ipv6_get_saddr(dst, daddr, &saddr); + err = ipv6_get_saddr(dst, daddr, &fl.fl6_src); if (err == 0) { if (ipv6_addr_any(&np->saddr)) - ipv6_addr_copy(&np->saddr, &saddr); + ipv6_addr_copy(&np->saddr, &fl.fl6_src); if (ipv6_addr_any(&np->rcv_saddr)) { - ipv6_addr_copy(&np->rcv_saddr, &saddr); + ipv6_addr_copy(&np->rcv_saddr, &fl.fl6_src); inet->rcv_saddr = LOOPBACK4_IPV6; } + + ip6_dst_store(sk, dst, + !ipv6_addr_cmp(&fl.fl6_dst, &np->daddr) ? 
+ &np->daddr : NULL); + sk->state = TCP_ESTABLISHED; } fl6_sock_release(flowlabel); -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From kuznet@ms2.inr.ac.ru Tue Jun 3 17:46:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:46:20 -0700 (PDT) Received: from dub.inr.ac.ru (dub.inr.ac.ru [193.233.7.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540jp2x032444 for ; Tue, 3 Jun 2003 17:46:13 -0700 Received: (from kuznet@localhost) by dub.inr.ac.ru (8.6.13/ANK) id EAA24505; Wed, 4 Jun 2003 04:43:22 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200306040043.EAA24505@dub.inr.ac.ru> Subject: Re: fix TCP roundtrip time update code To: jmorris@intercode.com.au (James Morris) Date: Wed, 4 Jun 2003 04:43:22 +0400 (MSD) Cc: davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, davem@redhat.com, akpm@digeo.com In-Reply-To: from "James Morris" at Jun 04, 2003 10:24:14 AM X-Mailer: ELM [version 2.5 PL6] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2849 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kuznet@ms2.inr.ac.ru Precedence: bulk X-list: netdev Hello! > This might be the solution to one of the 'must-fix' bugs for the > networking, which nobody so far was quite able to track down. No doubts. All the symptoms are explained by this. I hope Andrew will confirm that the problem has gone. 
Alexey From niv@us.ibm.com Tue Jun 3 19:08:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 19:09:02 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5428Q2x001531 for ; Tue, 3 Jun 2003 19:08:54 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5427HuT170244; Tue, 3 Jun 2003 22:07:17 -0400 Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5427Eh7083078; Tue, 3 Jun 2003 20:07:15 -0600 Message-ID: <3EDD52F5.8090706@us.ibm.com> Date: Tue, 03 Jun 2003 19:01:25 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: James Morris , davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, davem@redhat.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code References: <200306040043.EAA24505@dub.inr.ac.ru> In-Reply-To: <200306040043.EAA24505@dub.inr.ac.ru> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2850 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev kuznet@ms2.inr.ac.ru wrote: > No doubts. All the symptoms are explained by this. I hope Andrew > will confirm that the problem has gone. Yep, great catch! But, FYI, DaveM and Alexey, we tried reproducing the stalls we (Dave Hansen, Troy Wilson) had seen during SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same config, etc). 
So it's possible our hangs/stalls were some other issue that got silently fixed (or, more likely, possibly the same thing, with other changes reducing how often we ran into it). thanks, Nivedita From jmorris@intercode.com.au Tue Jun 3 19:29:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 19:29:37 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:5XN7XGMIBVp5EAxqx0JpHmruTuzfpsp9@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h542T52x002010 for ; Tue, 3 Jun 2003 19:29:27 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h542Sor27032; Wed, 4 Jun 2003 12:28:51 +1000 Date: Wed, 4 Jun 2003 12:28:50 +1000 (EST) From: James Morris To: "David S. Miller" cc: Andrew Morton , Subject: [PATCH] Use new kconfig 'select' for networking crypto Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2851 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev The patch below against recent bk uses the new 'select' feature of kconfig to configure crypto features for ipsec and ipv6 privacy extensions. This should solve a lot of the build problems people have been having, and it also enables the crypto submenu (which previously did not work). The sctp folk may also want to look at this scheme for their stuff. 
- James -- James Morris diff -urN -X dontdiff bk.pending/crypto/Kconfig bk.w1/crypto/Kconfig --- bk.pending/crypto/Kconfig 2003-06-04 11:41:26.000000000 +1000 +++ bk.w1/crypto/Kconfig 2003-06-04 12:28:36.234711904 +1000 @@ -6,16 +6,12 @@ config CRYPTO bool "Cryptographic API" - default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \ - INET6_ESP=y || INET6_ESP=m || INET6_IPCOMP=y || INET6_IPCOMP=m || IPV6_PRIVACY=y help This option provides the core Cryptographic API. config CRYPTO_HMAC bool "HMAC support" depends on CRYPTO - default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \ - INET6_ESP=y || INET6_ESP=m help HMAC: Keyed-Hashing for Message Authentication (RFC2104). This is required for IPSec. @@ -35,16 +31,12 @@ config CRYPTO_MD5 tristate "MD5 digest algorithm" depends on CRYPTO - default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \ - INET6_ESP=y || INET6_ESP=m || IPV6_PRIVACY=y help MD5 message digest algorithm (RFC1321). config CRYPTO_SHA1 tristate "SHA1 digest algorithm" depends on CRYPTO - default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \ - INET6_ESP=y || INET6_ESP=m help SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2). @@ -72,7 +64,6 @@ config CRYPTO_DES tristate "DES and Triple DES EDE cipher algorithms" depends on CRYPTO - default y if INET_ESP=y || INET_ESP=m || INET6_ESP=y || INET6_ESP=m help DES cipher algorithm (FIPS 46-2), and Triple DES EDE (FIPS 46-3). @@ -138,7 +129,6 @@ config CRYPTO_DEFLATE tristate "Deflate compression algorithm" depends on CRYPTO - default y if INET_IPCOMP=y || INET_IPCOMP=m || INET6_IPCOMP=y || INET6_IPCOMP=m help This is the Deflate algorithm (RFC1951), specified for use in IPSec with the IPCOMP protocol (RFC3173, RFC2394). 
diff -urN -X dontdiff bk.pending/net/ipv4/Kconfig bk.w1/net/ipv4/Kconfig --- bk.pending/net/ipv4/Kconfig 2003-06-04 11:42:08.000000000 +1000 +++ bk.w1/net/ipv4/Kconfig 2003-06-04 12:24:06.752679400 +1000 @@ -343,6 +343,10 @@ config INET_AH tristate "IP: AH transformation" + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_MD5 + select CRYPTO_SHA1 ---help--- Support for IPsec AH. @@ -350,6 +354,11 @@ config INET_ESP tristate "IP: ESP transformation" + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_MD5 + select CRYPTO_SHA1 + select CRYPTO_DES ---help--- Support for IPsec ESP. @@ -357,6 +366,8 @@ config INET_IPCOMP tristate "IP: IPComp transformation" + select CRYPTO + select CRYPTO_DEFLATE ---help--- Support for IP Paylod Compression (RFC3173), typically needed for IPsec. diff -urN -X dontdiff bk.pending/net/ipv6/Kconfig bk.w1/net/ipv6/Kconfig --- bk.pending/net/ipv6/Kconfig 2003-06-04 11:42:09.000000000 +1000 +++ bk.w1/net/ipv6/Kconfig 2003-06-04 12:24:05.242908920 +1000 @@ -4,6 +4,8 @@ config IPV6_PRIVACY bool "IPv6: Privacy Extensions (RFC 3041) support" depends on IPV6 + select CRYPTO + select CRYPTO_MD5 ---help--- Privacy Extensions for Stateless Address Autoconfiguration in IPv6 support. With this option, additional periodically-alter @@ -20,6 +22,10 @@ config INET6_AH tristate "IPv6: AH transformation" depends on IPV6 + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_MD5 + select CRYPTO_SHA1 ---help--- Support for IPsec AH. @@ -28,6 +34,11 @@ config INET6_ESP tristate "IPv6: ESP transformation" depends on IPV6 + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_MD5 + select CRYPTO_SHA1 + select CRYPTO_DES ---help--- Support for IPsec ESP. @@ -36,6 +47,8 @@ config INET6_IPCOMP tristate "IPv6: IPComp transformation" depends on IPV6 + select CRYPTO + select CRYPTO_DEFLATE ---help--- Support for IP Paylod Compression (RFC3173), typically needed for IPsec. 
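For readers unfamiliar with 'select': as far as I understand kconfig, the value of the selecting option acts as a lower bound on the selected option, which is why the long 'default y if ...' chains can go away. A minimal C model of that rule follows. It is a sketch only: the enum and the function name selected_value() are made up for illustration and are not kconfig code.

```c
/* Tristate values in the order kconfig compares them: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Hypothetical model of 'select': the selected symbol ends up with at
 * least the value of the symbol that selects it. A bool symbol has no
 * 'm' state, so a modular selector promotes it to 'y'.
 */
static enum tristate selected_value(enum tristate user_choice,
                                    enum tristate selector,
                                    int target_is_bool)
{
    enum tristate v = user_choice > selector ? user_choice : selector;

    if (target_is_bool && v == TRI_M)
        v = TRI_Y;
    return v;
}
```

Under this model, INET_ESP=m with CRYPTO (a bool) left unset gives selected_value(TRI_N, TRI_M, 1) == TRI_Y, while CRYPTO_DES (a tristate) becomes at least TRI_M.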
From davem@redhat.com Tue Jun 3 20:11:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 20:11:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h543Bo2x002750 for ; Tue, 3 Jun 2003 20:11:51 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA29047; Tue, 3 Jun 2003 20:09:45 -0700 Date: Tue, 03 Jun 2003 20:09:44 -0700 (PDT) Message-Id: <20030603.200944.78736971.davem@redhat.com> To: jgarzik@pobox.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Regarding SET_NETDEV_DEV From: "David S. Miller" In-Reply-To: <20030603175921.GE2079@gtf.org> References: <20030603175921.GE2079@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2852 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 3 Jun 2003 13:59:21 -0400 For janitors and other developers placing this in net drivers... please don't :) This can be done in upper layers, accomplishing the same goal without changing the low-level net driver code at all. Don't say something can be done without showing exactly how :-) How does register_netdevice() know that the device is "whatever" and where to get the generic device struct from? 
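To make the question concrete: in a driver's probe routine both the bus device and the net device are in hand, whereas the registration call is passed only the net device. A userspace sketch with mock types follows. All of mock_device, mock_pci_dev, mock_net_device, MOCK_SET_NETDEV_DEV and the two functions are hypothetical stand-ins for illustration, not kernel definitions.

```c
#include <stddef.h>

/* Mock stand-ins for the generic device, a bus device and a net device. */
struct mock_device { const char *bus_id; };
struct mock_pci_dev { struct mock_device dev; };
struct mock_net_device { const char *name; struct mock_device *parent; };

/* Mock of the driver-side association step. */
#define MOCK_SET_NETDEV_DEV(net, pdev) ((net)->parent = (pdev))

/*
 * Upper layer: it receives only the net device, so it can use a parent
 * pointer that was set earlier but has no way to discover one itself.
 * Returns 1 if the parent is known, 0 otherwise.
 */
static int mock_register_netdevice(struct mock_net_device *net)
{
    return net->parent != NULL;
}

/* Driver probe: the one place where both objects are visible. */
static int mock_probe(struct mock_pci_dev *pdev, struct mock_net_device *net)
{
    MOCK_SET_NETDEV_DEV(net, &pdev->dev);
    return mock_register_netdevice(net);
}
```

In this mock, removing the macro call from mock_probe() leaves the upper layer with nothing to go on, which is the gap the question points at.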
From davem@redhat.com Tue Jun 3 20:26:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 20:26:27 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h543QH2x006401 for ; Tue, 3 Jun 2003 20:26:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA29104; Tue, 3 Jun 2003 20:23:21 -0700 Date: Tue, 03 Jun 2003 20:23:20 -0700 (PDT) Message-Id: <20030603.202320.59680883.davem@redhat.com> To: niv@us.ibm.com Cc: kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code From: "David S. Miller" In-Reply-To: <3EDD52F5.8090706@us.ibm.com> References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2853 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Nivedita Singhvi Date: Tue, 03 Jun 2003 19:01:25 -0700 But, FYI, DaveM and Alexey, we tried reproducing the stalls we (Dave Hansen, Troy Wilson) had seen during SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same config, etc). So its possible our hang/stalls were some other issue that got silently fixed (or more likely, possibly the same thing but other changes minimized us running into the problem). I think this means nothing, and that you can infer nothing from such results. My understanding is that the problem case triggers only when a timeout based retransmit occurs. On LAN this tends to be extremely rare. 
Although under enough traffic load it can occur. So if your old SpecWEB99 lab tended more to trigger timeout based retransmits on LAN, and your new test network does not, then your new test network will tend to not reproduce the bug regardless of whether the bug is present in the kernel or not :-) From davem@redhat.com Tue Jun 3 20:48:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 20:48:26 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h543mI2x007129 for ; Tue, 3 Jun 2003 20:48:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA29169; Tue, 3 Jun 2003 20:46:12 -0700 Date: Tue, 03 Jun 2003 20:46:12 -0700 (PDT) Message-Id: <20030603.204612.48501825.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Regarding SET_NETDEV_DEV From: "David S. Miller" In-Reply-To: <3EDD6B51.9070909@osdl.org> References: <20030603175921.GE2079@gtf.org> <20030603.200944.78736971.davem@redhat.com> <3EDD6B51.9070909@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2854 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Tue, 03 Jun 2003 20:45:21 -0700 There are enough PCI network devices, that something like alloc_pci_etherdev might be a good future idea. What is so special about PCI? :-) In this light, alloc_device_etherdev() seems much more appropriate. But we can play this game ad infinitum, for each and every parameter that is common across a class of ethernet devices.
At what point do you stop? :-) From davidm@napali.hpl.hp.com Tue Jun 3 21:36:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 21:36:41 -0700 (PDT) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h544aA2x012022 for ; Tue, 3 Jun 2003 21:36:31 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel11.hp.com (Postfix) with ESMTP id EE4C71C01831; Tue, 3 Jun 2003 21:36:09 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h544a4xV008846; Tue, 3 Jun 2003 21:36:04 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h544a3rK029849; Tue, 3 Jun 2003 21:36:03 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h544ZtI4029842; Tue, 3 Jun 2003 21:35:55 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16093.30507.661714.676184@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 21:35:55 -0700 To: "David S. 
Miller" Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <20030603.202320.59680883.davem@redhat.com> References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2855 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev >>>>> On Tue, 03 Jun 2003 20:23:20 -0700 (PDT), "David S. Miller" said: DaveM> From: Nivedita Singhvi Date: Tue, 03 Jun DaveM> 2003 19:01:25 -0700 DaveM> But, FYI, DaveM and Alexey, we tried reproducing the DaveM> stalls we (Dave Hansen, Troy Wilson) had seen during DaveM> SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same DaveM> config, etc). So its possible our hang/stalls were some other DaveM> issue that got silently fixed (or more likely, possibly the DaveM> same thing but other changes minimized us running into the DaveM> problem). DaveM> I think this means nothing, and that you can infer nothing DaveM> from such results. DaveM> My understanding is that the problem case triggers only when DaveM> a timeout based retransmit occurs. On LAN this tends to be DaveM> extremely rare. Although under enough traffic load it can DaveM> occur. DaveM> So if your old SpecWEB99 lab tended more to trigger timeout DaveM> based retransmits on LAN, and your new test network does not, DaveM> then your new test network will tend to not reproduce the bug DaveM> regardless of whether the bug is present in the kernel or not DaveM> :-) Is this where I get to plug httperf? It triggered the bug reliably in less than 10 secs. 
;-) --david From davem@redhat.com Tue Jun 3 21:39:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 21:39:05 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h544cf2x012387 for ; Tue, 3 Jun 2003 21:39:02 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA29347; Tue, 3 Jun 2003 21:34:59 -0700 Date: Tue, 03 Jun 2003 21:34:58 -0700 (PDT) Message-Id: <20030603.213458.112594590.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2856 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ville Nuorvala Date: Fri, 30 May 2003 18:00:55 +0300 (EEST) The patch is sent as an attachment to this mail, but is also available at: You need to fix some things before I will apply this: 1) Bogus #ifdef CONFIG_IPV6_TUNNEL_MODULE. You need not this test around things like MODULE_AUTHOR() and stuff like that, linux/module.h does that for you. 2) Dependency upon subtrees patch, please remove it. There is no agreement on that semantic change to how subtrees work. Thanks. 
From davem@redhat.com Tue Jun 3 21:42:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 21:42:36 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h544gB2x012749 for ; Tue, 3 Jun 2003 21:42:32 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA29371; Tue, 3 Jun 2003 21:38:30 -0700 Date: Tue, 03 Jun 2003 21:38:30 -0700 (PDT) Message-Id: <20030603.213830.85382657.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 From: "David S. Miller" In-Reply-To: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2857 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Sat, 31 May 2003 00:38:58 +0900 (JST) In article (at Fri, 30 May 2003 18:00:55 +0300 (EEST)), Ville Nuorvala says: > The tunnels are needed by MIPv6 for encapsulation and decapsulation of > tunneled packets between the home agent and mobile node. Some protocols > like DHCP are also run over the virtual link between the MN and the home > network according to the MIPv6 specification. I'm not sure if MIP6 will use this tunnel driver. Yes, it is an important issue. I am VERY UPSET that there appears to be NO dialogue between USAGI and MIPV6 folks to discuss design of MIPV6.
If you do not talk together, how can you guys possibly coordinate efforts and avoid duplicated work? And, it is very clear from my perspective that it is the MIPV6 developers who are not communicating. USAGI are making an effort to discuss the issues, but MIPV6 coders disappear for weeks at a time, not answering queries made to them or comments made about their patch submissions. That is unacceptable. And this makes me less likely to apply any patches from the MIPV6 project. Here is why. If some bug shows up in some patch I apply from the MIPV6 project, can I expect them to act similarly and not respond for weeks at a time? That's intolerable. If you add some bug to the tree, you are responsible for being responsive and fixing the problem in a reasonable amount of time. From niv@us.ibm.com Tue Jun 3 21:47:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 21:47:39 -0700 (PDT) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h544lE2x013135 for ; Tue, 3 Jun 2003 21:47:35 -0700 Received: from westrelay05.boulder.ibm.com (westrelay05.boulder.ibm.com [9.17.193.33]) by e32.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h544kCkc135630; Wed, 4 Jun 2003 00:46:12 -0400 Received: from us.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay05.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h544k7qD047262; Tue, 3 Jun 2003 22:46:08 -0600 Message-ID: <3EDD7832.7010804@us.ibm.com> Date: Tue, 03 Jun 2003 21:40:18 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: davidm@hpl.hp.com CC: "David S.
Miller" , kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> In-Reply-To: <16093.30507.661714.676184@napali.hpl.hp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2858 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev David Mosberger wrote: > DaveM> So if your old SpecWEB99 lab tended more to trigger timeout > DaveM> based retransmits on LAN, and your new test network does not, > DaveM> then your new test network will tend to not reproduce the bug > DaveM> regardless of whether the bug is present in the kernel or not > DaveM> :-) > > Is this where I get to plug httperf? It triggered the bug reliably in > less than 10 secs. ;-) Tarnation!! Ran httperf! Didnt hit it! :(. What were your settings? I extracted an old debug patch to implement dropping of packets - have a sysctl that controls the rate at which I can drop IP packets, so can also generate any kind of packet loss..So thought I would bang away with netperf using sendfile()/TCP_CORK. Thought it was in that code path. Will be running tests tmrw and the rest of this week on 2.5.70 +- patch. Will see if I can provoke any further hangs, stalls, wackiness of any flavor... 
thanks, Nivedita From davem@redhat.com Tue Jun 3 21:51:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 21:51:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h544pT2x013483 for ; Tue, 3 Jun 2003 21:51:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA29427; Tue, 3 Jun 2003 21:47:30 -0700 Date: Tue, 03 Jun 2003 21:47:30 -0700 (PDT) Message-Id: <20030603.214730.08347437.davem@redhat.com> To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code From: "David S. Miller" In-Reply-To: <16093.30507.661714.676184@napali.hpl.hp.com> References: <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2859 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: David Mosberger Date: Tue, 3 Jun 2003 21:35:55 -0700 Is this where I get to plug httperf? It triggered the bug reliably in less than 10 secs. ;-) distcc was a reliable test case too... 
From davem@redhat.com Tue Jun 3 22:15:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 22:15:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h545FJ2x014294 for ; Tue, 3 Jun 2003 22:15:20 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29530; Tue, 3 Jun 2003 22:13:13 -0700 Date: Tue, 03 Jun 2003 22:13:13 -0700 (PDT) Message-Id: <20030603.221313.70195889.davem@redhat.com> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] move dmascc away from setup.c From: "David S. Miller" In-Reply-To: <20030603082326.GA22946@lst.de> References: <20030603082326.GA22946@lst.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2860 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Christoph Hellwig Date: Tue, 3 Jun 2003 10:23:26 +0200 Yeah, it's a isa driver but it already was in the setup.c pci probes list. Also use SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT. Applied, thanks. 
From davidm@napali.hpl.hp.com Tue Jun 3 22:34:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 22:34:41 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h545Ya2x015236 for ; Tue, 3 Jun 2003 22:34:36 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel13.hp.com (Postfix) with ESMTP id D6C341C00F6F; Tue, 3 Jun 2003 22:34:35 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h545YYxV012939; Tue, 3 Jun 2003 22:34:35 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h545YYrK030279; Tue, 3 Jun 2003 22:34:34 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h545YUxT030275; Tue, 3 Jun 2003 22:34:30 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16093.34022.445246.52398@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 22:34:30 -0700 To: Nivedita Singhvi Cc: davidm@hpl.hp.com, "David S. 
Miller" , kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <3EDD7832.7010804@us.ibm.com> References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2862 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev Content-Length: 1337 Lines: 34 >>>>> On Tue, 03 Jun 2003 21:40:18 -0700, Nivedita Singhvi said: Nivedita> David Mosberger wrote: DaveM> So if your old SpecWEB99 lab tended more to trigger timeout DaveM> based retransmits on LAN, and your new test network does not, DaveM> then your new test network will tend to not reproduce the bug DaveM> regardless of whether the bug is present in the kernel or not DaveM> :-) >> Is this where I get to plug httperf? It triggered the bug >> reliably in less than 10 secs. ;-) Nivedita> Tarnation!! Ran httperf! Didnt hit it! :(. What were your Nivedita> settings? I used: $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \ --uri pathto30KBfile on 3 clients (for a total of 3000 conns/sec). You can't go higher than 1000 conn/sec per client (IP address) because otherwise you run out of port space (due to TIME_WAIT). This load worked well for a machine with a single GigE card. All network tunables were on the default setting (in particular, the tx queue len was 300, which is where the losses came from). With this load, I saw bad RTT values in the route cache within a couple of seconds after starting the third httperf generator.
It then took a bit longer (on the order of 1-2 minutes) until the first TCPAbortFailed errors started to pop up. --david From davem@redhat.com Tue Jun 3 22:50:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 22:50:43 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h545oE2x015874 for ; Tue, 3 Jun 2003 22:50:35 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29711; Tue, 3 Jun 2003 22:46:58 -0700 Date: Tue, 03 Jun 2003 22:46:57 -0700 (PDT) Message-Id: <20030603.224657.116381839.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: vnuorval@tcs.hut.fi, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: Sereral errors on udpv6_connect() From: "David S. Miller" In-Reply-To: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> References: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2863 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 965 Lines: 23 From: YOSHIFUJI Hideaki / 吉藤英明 Date: Wed, 04 Jun 2003 09:39:44 +0900 (JST) This patch fixes multiple errors in udpv6_connect(). - pointer within an automatic storage class variable fl was illegally cached using ip6_dst_store(). - uninitialized saddr was copied to fl.fl6_src. - don't cache if ipv6_saddr_get() failed. Applied. All these kinds of things need to be done differently once routing by saddr is supported, more specifically when route6 lookups make source address selection. Look at ipv4 side to see the kind of thing I'm talking about.
Yoshfuji-san, remember when Alexey wanted you to change your source address selection so that it occurred at routing layer? This is exactly what I'm talking about. In my view, ipv6 routing is merely a SEVERELY crippled version of ipv4 routing. Most of ipv6 routing changes needed amount to merely "porting over" existing ipv4 routing features. From davem@redhat.com Tue Jun 3 22:52:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 22:52:03 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h545pf2x016007 for ; Tue, 3 Jun 2003 22:52:00 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29724; Tue, 3 Jun 2003 22:49:25 -0700 Date: Tue, 03 Jun 2003 22:49:25 -0700 (PDT) Message-Id: <20030603.224925.68063710.davem@redhat.com> To: jmorris@intercode.com.au Cc: akpm@digeo.com, netdev@oss.sgi.com Subject: Re: [PATCH] Use new kconfig 'select' for networking crypto From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2864 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 435 Lines: 13 From: James Morris Date: Wed, 4 Jun 2003 12:28:50 +1000 (EST) The patch below against recent bk uses the new 'select' feature of kconfig to configure crypto features for ipsec and ipv6 privacy extensions. This should solve a lot of the build problems people have been having, and it also enables the crypto submenu (which previously did not work). Applied, thanks a lot James. 
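The ceiling of roughly 1000 connections per second per client IP quoted with the httperf settings earlier in this thread can be checked with back-of-the-envelope arithmetic: each closed connection pins its local port in TIME_WAIT, so one client address can open at most (usable ports) connections per TIME_WAIT interval towards a given server address and port. A small sketch, assuming a 60 second TIME_WAIT and assuming that --hog lets httperf use ports 1024 through 65535. Both figures are my assumptions rather than anything stated in the thread, and max_conn_rate() is a made-up helper.

```c
/*
 * Upper bound on new connections per second from one client IP to one
 * server address and port, when every closed connection keeps its local
 * port busy for time_wait_secs. Integer division rounds down.
 */
static unsigned int max_conn_rate(unsigned int first_port,
                                  unsigned int last_port,
                                  unsigned int time_wait_secs)
{
    unsigned int ports = last_port - first_port + 1;

    return ports / time_wait_secs;
}
```

With the assumed values, max_conn_rate(1024, 65535, 60) is 64512 / 60 = 1075, which is in line with the "1000 conn/sec per client" figure.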
From davem@redhat.com Tue Jun 3 22:56:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 22:56:51 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h545ul2x016588 for ; Tue, 3 Jun 2003 22:56:48 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29745; Tue, 3 Jun 2003 22:52:45 -0700 Date: Tue, 03 Jun 2003 22:52:45 -0700 (PDT) Message-Id: <20030603.225245.55753285.davem@redhat.com> To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code From: "David S. Miller" In-Reply-To: <16093.34022.445246.52398@napali.hpl.hp.com> References: <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2865 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 372 Lines: 10 From: David Mosberger Date: Tue, 3 Jun 2003 22:34:30 -0700 You can't go higher than 1000 conn/sec per client (IP address) because otherwise you run out of port space (due to TIME_WAIT). echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle It should eliminate this limit. 
Unfortunately we can't enable this by default because of NAT :( From yoshfuji@linux-ipv6.org Tue Jun 3 23:01:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:01:32 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5461Q2x016967 for ; Tue, 3 Jun 2003 23:01:28 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5462JBo004641; Wed, 4 Jun 2003 15:02:19 +0900 Date: Wed, 04 Jun 2003 15:02:18 +0900 (JST) Message-Id: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, Ville Nuorvala Subject: [PATCH] IPV6: typo, unrequired #undef and killing warning From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2866 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 1051 Lines: 38 Hello. - no need to #undef CONFIG_IPV6_SUBTREE - use braces around "&" and "|". - fib_repair_tree() is typo. Thanks. 
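The second item is about C operator precedence: "&" binds tighter than "|", so the old test is grouped as (fn_flags & RTN_RTINFO) | RTN_ROOT, which is nonzero whenever RTN_ROOT is a nonzero constant, and the negated condition can never be true. A userspace sketch of the difference follows. The flag values here are chosen for illustration and should not be read as the definitions from the kernel headers, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag values only; see the kernel headers for the real ones. */
#define RTN_ROOT   0x0002u
#define RTN_RTINFO 0x0004u

/*
 * Old test, written with explicit parentheses to show how the compiler
 * groups "flags&RTN_RTINFO|RTN_ROOT". The OR with a nonzero constant
 * makes the inner value always nonzero, so this always returns false.
 */
static bool is_orphan_old(unsigned int flags)
{
    return !((flags & RTN_RTINFO) | RTN_ROOT);
}

/* Fixed test: true only when neither RTN_RTINFO nor RTN_ROOT is set. */
static bool is_orphan_fixed(unsigned int flags)
{
    return !(flags & (RTN_RTINFO | RTN_ROOT));
}
```

For a node with no flags set, is_orphan_old(0) is false while is_orphan_fixed(0) is true, so with the old grouping the repair call guarded by this test would never run.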
Index: linux25-LINUS/net/ipv6/ip6_fib.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_fib.c,v retrieving revision 1.1.1.12 diff -u -r1.1.1.12 ip6_fib.c --- linux25-LINUS/net/ipv6/ip6_fib.c 26 May 2003 08:04:11 -0000 1.1.1.12 +++ linux25-LINUS/net/ipv6/ip6_fib.c 4 Jun 2003 05:39:49 -0000 @@ -40,7 +40,6 @@ #include #define RT6_DEBUG 2 -#undef CONFIG_IPV6_SUBTREES #if RT6_DEBUG >= 3 #define RT6_TRACE(x...) printk(KERN_DEBUG x) @@ -594,8 +593,8 @@ is orphan. If it is, shoot it. */ st_failure: - if (fn && !(fn->fn_flags&RTN_RTINFO|RTN_ROOT)) - fib_repair_tree(fn); + if (fn && !(fn->fn_flags&(RTN_RTINFO|RTN_ROOT))) + fib6_repair_tree(fn); dst_free(&rt->u.dst); return err; #endif -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From hch@infradead.org Tue Jun 3 23:08:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:08:12 -0700 (PDT) Received: from phoenix.infradead.org (phoenix.mvhi.com [195.224.96.167]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546872x017348 for ; Tue, 3 Jun 2003 23:08:08 -0700 Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 19NRRK-000248-00; Wed, 04 Jun 2003 07:08:02 +0100 Date: Wed, 4 Jun 2003 07:08:01 +0100 From: Christoph Hellwig To: "YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B" Cc: davem@redhat.com, netdev@oss.sgi.com, Ville Nuorvala Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning Message-ID: <20030604070801.A7938@infradead.org> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030604.150218.69810413.yoshfuji@linux-ipv6.org>; from yoshfuji@linux-ipv6.org on Wed, Jun 04, 2003 at 03:02:18PM +0900 X-archive-position: 2867 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev Content-Length: 340 Lines: 10 On Wed, Jun 04, 2003 at 03:02:18PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote: > st_failure: > - if (fn && !(fn->fn_flags&RTN_RTINFO|RTN_ROOT)) > - fib_repair_tree(fn); > + if (fn && !(fn->fn_flags&(RTN_RTINFO|RTN_ROOT))) This still is not the right codingstyle :) it should be if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) From davem@redhat.com Tue Jun 3 23:10:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:10:16 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546AD2x017675 for ; Tue, 3 Jun 2003 23:10:14 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29792; Tue, 3 Jun 2003 23:08:04 -0700 Date: Tue, 03 Jun 2003 23:08:03 -0700 (PDT) Message-Id: <20030603.230803.10324588.davem@redhat.com> To: hch@infradead.org Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, vnuorval@tcs.hut.fi Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning From: "David S. Miller" In-Reply-To: <20030604070801.A7938@infradead.org> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> <20030604070801.A7938@infradead.org> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2868 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 282 Lines: 9 From: Christoph Hellwig Date: Wed, 4 Jun 2003 07:08:01 +0100 This still is not the right codingstyle :) it should be if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) I'll take care of this, Yoshfuji you do not need to make a new patch :) From davidm@napali.hpl.hp.com Tue Jun 3 23:12:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:12:53 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546Cm2x017993 for ; Tue, 3 Jun 2003 23:12:49 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel10.hp.com (Postfix) with ESMTP id 7429C1C01411; Tue, 3 Jun 2003 23:12:48 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h546ClsV017659; Tue, 3 Jun 2003 23:12:47 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h546ClrK030657; Tue, 3 Jun 2003 23:12:47 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h546Cljk030653; Tue, 3 Jun 2003 23:12:47 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16093.36319.412668.87363@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 23:12:47 -0700 To: "David S. 
Miller" Cc: davidm@hpl.hp.com, davidm@napali.hpl.hp.com, niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <20030603.225245.55753285.davem@redhat.com> References: <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> <20030603.225245.55753285.davem@redhat.com> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2869 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev Content-Length: 632 Lines: 18 >>>>> On Tue, 03 Jun 2003 22:52:45 -0700 (PDT), "David S. Miller" said: David> From: David Mosberger Date: David> Tue, 3 Jun 2003 22:34:30 -0700 David> You can't go higher than 1000 conn/sec per client (IP David> address) because otherwise you run out of port space (due to David> TIME_WAIT). DaveM> echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle DaveM> It should eliminate this limit. Unfortunately we can't DaveM> enable this by default because of NAT :( Ah, yes, provided PAWS is enabled, this would give you a time_wait timeout of 3.5*RTO. Nice. 
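The 1000 conn/sec figure quoted above follows from simple arithmetic: every closed connection keeps its client port in TIME_WAIT for a fixed period, so the sustainable rate from one client IP to one server address and port is roughly (usable ports) / (TIME_WAIT seconds). The sketch below only illustrates that division. The 60-second TIME_WAIT and the port counts used in it are assumptions for illustration (the usable range depends on the local port range setting), and the function names are made up.

```c
#include <assert.h>
#include <stdio.h>

/* Rough upper bound on new connections per second from one client IP
 * to a single server address/port, when each closed connection holds
 * its local port in TIME_WAIT for tw_seconds.
 * Returns 0 if tw_seconds is 0 (no bound is computed in that case). */
unsigned long max_conn_rate(unsigned long usable_ports, unsigned long tw_seconds)
{
	if (tw_seconds == 0)
		return 0;
	return usable_ports / tw_seconds;
}

/* Evaluates the bound for two assumed port counts. */
void rate_demo(void)
{
	/* Assumption: about 60000 usable ports, 60 s in TIME_WAIT. */
	assert(max_conn_rate(60000, 60) == 1000);
	/* Assumption: a narrower range of 28232 ports (e.g. 32768-61000). */
	assert(max_conn_rate(28232, 60) == 470);
	printf("%lu %lu\n", max_conn_rate(60000, 60), max_conn_rate(28232, 60));
}
```

Shortening the time a port stays in TIME_WAIT, which is what tcp_tw_recycle does when it applies, shrinks the divisor and raises the bound accordingly.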
--david From niv@us.ibm.com Tue Jun 3 23:14:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:14:57 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546Eq2x018362 for ; Tue, 3 Jun 2003 23:14:53 -0700 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.195.10]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5469ruT256040; Wed, 4 Jun 2003 02:09:53 -0400 Received: from us.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay01.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5469pNQ224046; Wed, 4 Jun 2003 00:09:51 -0600 Message-ID: <3EDD8BD2.9040008@us.ibm.com> Date: Tue, 03 Jun 2003 23:04:02 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: davidm@hpl.hp.com CC: "David S. Miller" , kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> In-Reply-To: <16093.34022.445246.52398@napali.hpl.hp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2870 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1180 Lines: 34 David Mosberger wrote: > $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \ > --uri pathto30KBfile Hmm, ditto, except I was way down at --rate 300 (was seeing client errors of fd-unavail). 
Have ulimited upwards but am still seeing them.. > on 3 clients (for a total of 3000 conns/sec). You can't go higher > than 1000 conn/sec per client (IP address) because otherwise you run > out of port space (due to TIME_WAIT). You can enable /proc/sys/net/ipv4/tcp_tw_recycle for that. > This load worked well for a machine with a single GigE card. All > network tunables were on the default setting (in particular, the tx > queue len was 300, which is where the losses came from). > > With this load, I saw bad RTT values in the route cache within a > couple of seconds after starting the third httperf generator. It then > took a bit longer (on the order of 1-2 minutes) until the first > TCPAbortFailed errors started to pop up I saw a few AbortOnTimeouts, but no AbortFailed counts. Those should be TCPAbortOnTimeout counts, rather than TCPAbortFailed errors, I would expect? Why AbortFailed? Coming from IP via tcp_transmit_skb()? thanks, Nivedita From davidm@napali.hpl.hp.com Tue Jun 3 23:19:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:19:56 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546JW2x018731 for ; Tue, 3 Jun 2003 23:19:52 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel10.hp.com (Postfix) with ESMTP id 4011F1C01522; Tue, 3 Jun 2003 23:19:32 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h546JVsV018240; Tue, 3 Jun 2003 23:19:31 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h546JVrK030722; Tue, 3 Jun 2003 23:19:31 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h546JVpQ030718; Tue, 3 Jun 2003 23:19:31 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 
7bit Message-ID: <16093.36723.418623.698303@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 23:19:31 -0700 To: Nivedita Singhvi Cc: davidm@hpl.hp.com, "David S. Miller" , kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <3EDD8BD2.9040008@us.ibm.com> References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> <3EDD8BD2.9040008@us.ibm.com> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2871 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev Content-Length: 505 Lines: 12 >>>>> On Tue, 03 Jun 2003 23:04:02 -0700, Nivedita Singhvi said: Nivedita> Those should be TCPAbortOnTimeout counts, rather than Nivedita> TCPAbortFailed errors, I would expect? Why AbortFailed? Nivedita> Coming from IP via tcp_transmit_skb()? Yes, the "connection hangs/disappearances" were triggered by TCPAbortOnTimeout; the TCPAbortFailed errors were indicating that tcp_transmit_skb() had failed, i.e., the tx queue was overrun (that's where the losses came from). 
--david From davem@redhat.com Wed Jun 4 00:50:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:50:16 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547nk2x021923 for ; Wed, 4 Jun 2003 00:50:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA30035; Wed, 4 Jun 2003 00:47:39 -0700 Date: Wed, 04 Jun 2003 00:47:38 -0700 (PDT) Message-Id: <20030604.004738.26506541.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, vnuorval@tcs.hut.fi Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning From: "David S. Miller" In-Reply-To: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2872 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 299 Lines: 11 From: YOSHIFUJI Hideaki / 吉藤英明 Date: Wed, 04 Jun 2003 15:02:18 +0900 (JST) - use braces around "&" and "|". 
You mean "parentheses", braces define basic block scope in the C language, parentheses group expressions :-) Patch applied, thank you :-) From davem@redhat.com Wed Jun 4 00:56:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:56:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547tt2x022378 for ; Wed, 4 Jun 2003 00:56:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA30052; Wed, 4 Jun 2003 00:51:46 -0700 Date: Wed, 04 Jun 2003 00:51:45 -0700 (PDT) Message-Id: <20030604.005145.98890243.davem@redhat.com> To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code From: "David S. Miller" In-Reply-To: <16093.36723.418623.698303@napali.hpl.hp.com> References: <16093.34022.445246.52398@napali.hpl.hp.com> <3EDD8BD2.9040008@us.ibm.com> <16093.36723.418623.698303@napali.hpl.hp.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2873 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 551 Lines: 14 From: David Mosberger Date: Tue, 3 Jun 2003 23:19:31 -0700 Yes, the "connection hangs/disappearances" were triggered by TCPAbortOnTimeout; This is correct. And it is the reason the connection dies silently: such write timeouts invoke tcp_done(), which closes the connection off silently. 
This is correct behavior (sans the RTT bug David fixed of course :)) because a host which hasn't responded at all from so many repeated retransmission attempts isn't likely to get any reset we send either :) From jgarzik@pobox.com Wed Jun 4 00:57:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:57:29 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547v32x022552 for ; Wed, 4 Jun 2003 00:57:24 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19NOwK-0007q1-1G; Wed, 04 Jun 2003 04:27:52 +0100 Message-ID: <3EDD672C.2000701@pobox.com> Date: Tue, 03 Jun 2003 23:27:40 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Regarding SET_NETDEV_DEV References: <20030603175921.GE2079@gtf.org> <20030603.200944.78736971.davem@redhat.com> In-Reply-To: <20030603.200944.78736971.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2874 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 629 Lines: 22 David S. Miller wrote: > From: Jeff Garzik > Date: Tue, 3 Jun 2003 13:59:21 -0400 > > For janitors and other developers placing this in net drivers... > please don't :) This can be done in upper layers, accomplishing the > same goal without changing the low-level net driver code at all. 
> > Don't say something can be done without showing exactly > how :-) > > How does register_netdevice() know that the device is "whatever" and > where to get the generic device struct from? Doh! You are totally right -- it can't get the association any other way. Folks, ignore me :) Jeff From yoshfuji@linux-ipv6.org Wed Jun 4 02:19:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 02:19:16 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h549J72x025255 for ; Wed, 4 Jun 2003 02:19:08 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h549JxBo005477; Wed, 4 Jun 2003 18:19:59 +0900 Date: Wed, 04 Jun 2003 18:19:59 +0900 (JST) Message-Id: <20030604.181959.41095926.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030604.004738.26506541.davem@redhat.com> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> <20030604.004738.26506541.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2875 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 403 Lines: 10 In article 
<20030604.004738.26506541.davem@redhat.com> (at Wed, 04 Jun 2003 00:47:38 -0700 (PDT)), "David S. Miller" says: > You mean "parentheses", braces define basic block scope in the > C language, parentheses group expressions :-) I'm deeply ashamed... -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From vnuorval@tcs.hut.fi Wed Jun 4 05:44:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 05:45:00 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54CiO2x001635 for ; Wed, 4 Jun 2003 05:44:45 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Wed, 4 Jun 2003 15:40:07 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h54Ce6jH026238; Wed, 4 Jun 2003 15:40:06 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h54Ce2i4026232; Wed, 4 Jun 2003 15:40:02 +0300 Date: Wed, 4 Jun 2003 15:40:02 +0300 (EEST) From: Ville Nuorvala To: "David S. Miller" cc: kuznet@ms2.inr.ac.ru, , , , , , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: <20030603.213458.112594590.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-377318441-1269789112-1054730402=:26066" X-archive-position: 2876 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev Content-Length: 55162 Lines: 922 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. 
---377318441-1269789112-1054730402=:26066 Content-Type: TEXT/PLAIN; charset=US-ASCII On Tue, 3 Jun 2003, David S. Miller wrote: > You need to fix some things before I will apply this: > > 1) Bogus #ifdef CONFIG_IPV6_TUNNEL_MODULE. You need not this test > around things like MODULE_AUTHOR() and stuff like that, > linux/module.h does that for you. > Fixed. > 2) Dependency upon subtrees patch, please remove it. There is no > agreement on that semantic change to how subtrees work. > Done. I'll send a separate patch for the subtrees stuff if needed. The revised version is attached to this mail, but also available at: http://www.mipl.mediapoli.com/patches/ip6-tunnel-r2.patch -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 ---377318441-1269789112-1054730402=:26066 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="ip6-tunnel-r2.patch" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="ip6-tunnel-r2.patch" ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvaW5jbHVkZS9saW51eC9p Zl9hcnAuaCBtZXJnZS0yLjUvaW5jbHVkZS9saW51eC9pZl9hcnAuaA0KLS0t IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lmX2FycC5oCVdlZCBKdW4gIDQg MTM6NDM6MDMgMjAwMw0KKysrIG1lcmdlLTIuNS9pbmNsdWRlL2xpbnV4L2lm X2FycC5oCVdlZCBNYXkgMjggMjE6MTE6NDMgMjAwMw0KQEAgLTYwLDcgKzYw LDcgQEANCiAjZGVmaW5lIEFSUEhSRF9SQVdIRExDCTUxOAkJLyogUmF3IEhE TEMJCQkqLw0KIA0KICNkZWZpbmUgQVJQSFJEX1RVTk5FTAk3NjgJCS8qIElQ SVAgdHVubmVsCQkJKi8NCi0jZGVmaW5lIEFSUEhSRF9UVU5ORUw2CTc2OQkJ LyogSVBJUDYgdHVubmVsCQkJKi8NCisjZGVmaW5lIEFSUEhSRF9UVU5ORUw2 CTc2OQkJLyogSVA2SVA2IHR1bm5lbCAgICAgICAJCSovDQogI2RlZmluZSBB UlBIUkRfRlJBRAk3NzAgICAgICAgICAgICAgLyogRnJhbWUgUmVsYXkgQWNj ZXNzIERldmljZSAgICAqLw0KICNkZWZpbmUgQVJQSFJEX1NLSVAJNzcxCQkv KiBTS0lQIHZpZgkJCSovDQogI2RlZmluZSBBUlBIUkRfTE9PUEJBQ0sJNzcy 
CQkvKiBMb29wYmFjayBkZXZpY2UJCSovDQpkaWZmIC1OdXIgLS1leGNsdWRl PVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0 IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lwNl90dW5uZWwuaCBtZXJnZS0y LjUvaW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgNCi0tLSBsaW51eC0yLjUv aW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDow MCAxOTcwDQorKysgbWVyZ2UtMi41L2luY2x1ZGUvbGludXgvaXA2X3R1bm5l bC5oCVdlZCBNYXkgMjggMjE6MTE6NDMgMjAwMw0KQEAgLTAsMCArMSwzMiBA QA0KKy8qDQorICogJElkJA0KKyAqLw0KKw0KKyNpZm5kZWYgX0lQNl9UVU5O RUxfSA0KKyNkZWZpbmUgX0lQNl9UVU5ORUxfSA0KKw0KKyNkZWZpbmUgSVBW Nl9UTFZfVE5MX0VOQ0FQX0xJTUlUIDQNCisjZGVmaW5lIElQVjZfREVGQVVM VF9UTkxfRU5DQVBfTElNSVQgNA0KKw0KKy8qIGRvbid0IGFkZCBlbmNhcHN1 bGF0aW9uIGxpbWl0IGlmIG9uZSBpc24ndCBwcmVzZW50IGluIGlubmVyIHBh Y2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX0lHTl9FTkNBUF9MSU1JVCAw eDENCisvKiBjb3B5IHRoZSB0cmFmZmljIGNsYXNzIGZpZWxkIGZyb20gdGhl IGlubmVyIHBhY2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX1VTRV9PUklH X1RDTEFTUyAweDINCisvKiBjb3B5IHRoZSBmbG93bGFiZWwgZnJvbSB0aGUg aW5uZXIgcGFja2V0ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfVVNFX09SSUdf RkxPV0xBQkVMIDB4NA0KKy8qIGJlaW5nIHVzZWQgZm9yIE1vYmlsZSBJUHY2 ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfTUlQNl9ERVYgMHg4DQorDQorc3Ry dWN0IGlwNl90bmxfcGFybSB7DQorCWNoYXIgbmFtZVtJRk5BTVNJWl07CS8q IG5hbWUgb2YgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgbGluazsJCS8qIGlm aW5kZXggb2YgdW5kZXJseWluZyBMMiBpbnRlcmZhY2UgKi8NCisJX191OCBw cm90bzsJCS8qIHR1bm5lbCBwcm90b2NvbCAqLw0KKwlfX3U4IGVuY2FwX2xp bWl0OwkvKiBlbmNhcHN1bGF0aW9uIGxpbWl0IGZvciB0dW5uZWwgKi8NCisJ X191OCBob3BfbGltaXQ7CQkvKiBob3AgbGltaXQgZm9yIHR1bm5lbCAqLw0K KwlfX3UzMiBmbG93aW5mbzsJCS8qIHRyYWZmaWMgY2xhc3MgYW5kIGZsb3ds YWJlbCBmb3IgdHVubmVsICovDQorCV9fdTMyIGZsYWdzOwkJLyogdHVubmVs IGZsYWdzICovDQorCXN0cnVjdCBpbjZfYWRkciBsYWRkcjsJLyogbG9jYWwg dHVubmVsIGVuZC1wb2ludCBhZGRyZXNzICovDQorCXN0cnVjdCBpbjZfYWRk ciByYWRkcjsJLyogcmVtb3RlIHR1bm5lbCBlbmQtcG9pbnQgYWRkcmVzcyAq Lw0KK307DQorDQorI2VuZGlmDQpkaWZmIC1OdXIgLS1leGNsdWRlPVNDQ1Mg LS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4 
LTIuNS9pbmNsdWRlL25ldC9pcDZfdHVubmVsLmggbWVyZ2UtMi41L2luY2x1 ZGUvbmV0L2lwNl90dW5uZWwuaA0KLS0tIGxpbnV4LTIuNS9pbmNsdWRlL25l dC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDowMCAxOTcwDQorKysg bWVyZ2UtMi41L2luY2x1ZGUvbmV0L2lwNl90dW5uZWwuaAlXZWQgTWF5IDI4 IDIxOjExOjQzIDIwMDMNCkBAIC0wLDAgKzEsNDQgQEANCisvKg0KKyAqICRJ ZCQNCisgKi8NCisNCisjaWZuZGVmIF9ORVRfSVA2X1RVTk5FTF9IDQorI2Rl ZmluZSBfTkVUX0lQNl9UVU5ORUxfSA0KKw0KKyNpbmNsdWRlIDxsaW51eC9p cHY2Lmg+DQorI2luY2x1ZGUgPGxpbnV4L25ldGRldmljZS5oPg0KKyNpbmNs dWRlIDxsaW51eC9pcDZfdHVubmVsLmg+DQorDQorLyogY2FwYWJsZSBvZiBz ZW5kaW5nIHBhY2tldHMgKi8NCisjZGVmaW5lIElQNl9UTkxfRl9DQVBfWE1J VCAweDEwMDAwDQorLyogY2FwYWJsZSBvZiByZWNlaXZpbmcgcGFja2V0cyAq Lw0KKyNkZWZpbmUgSVA2X1ROTF9GX0NBUF9SQ1YgMHgyMDAwMA0KKw0KKyNk ZWZpbmUgSVA2X1ROTF9NQVggMTI4DQorDQorLyogSVB2NiB0dW5uZWwgKi8N CisNCitzdHJ1Y3QgaXA2X3RubCB7DQorCXN0cnVjdCBpcDZfdG5sICpuZXh0 OwkvKiBuZXh0IHR1bm5lbCBpbiBsaXN0ICovDQorCXN0cnVjdCBuZXRfZGV2 aWNlICpkZXY7CS8qIHZpcnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0 dW5uZWwgKi8NCisJc3RydWN0IG5ldF9kZXZpY2Vfc3RhdHMgc3RhdDsJLyog c3RhdGlzdGljcyBmb3IgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgcmVjdXJz aW9uOwkJLyogZGVwdGggb2YgaGFyZF9zdGFydF94bWl0IHJlY3Vyc2lvbiAq Lw0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJtIHBhcm1zOwkvKiB0dW5uZWwgY29u ZmlndXJhdGlvbiBwYXJhbXRlcnMgKi8NCisJc3RydWN0IGZsb3dpIGZsOwkv KiBmbG93aSB0ZW1wbGF0ZSBmb3IgeG1pdCAqLw0KK307DQorDQorLyogVHVu bmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgZGVzdGluYXRpb24gc3ViLW9wdGlv biAqLw0KKw0KK3N0cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSB7DQorCV9f dTggdHlwZTsJCS8qIHR5cGUtY29kZSBmb3Igb3B0aW9uICAgICAgICAgKi8N CisJX191OCBsZW5ndGg7CQkvKiBvcHRpb24gbGVuZ3RoICAgICAgICAgICAg ICAgICovDQorCV9fdTggZW5jYXBfbGltaXQ7CS8qIHR1bm5lbCBlbmNhcHN1 bGF0aW9uIGxpbWl0ICAgKi8NCit9IF9fYXR0cmlidXRlX18gKChwYWNrZWQp KTsNCisNCisjaWZkZWYgX19LRVJORUxfXw0KKyNpZmRlZiBDT05GSUdfSVBW Nl9UVU5ORUwNCitleHRlcm4gaW50IF9faW5pdCBpcDZfdHVubmVsX2luaXQo dm9pZCk7DQorZXh0ZXJuIHZvaWQgaXA2X3R1bm5lbF9jbGVhbnVwKHZvaWQp Ow0KKyNlbmRpZg0KKyNlbmRpZg0KKyNlbmRpZg0KZGlmZiAtTnVyIC0tZXhj 
bHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5n ZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvS2NvbmZpZyBtZXJnZS0yLjUvbmV0 L2lwdjYvS2NvbmZpZw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9LY29uZmln CVdlZCBKdW4gIDQgMTM6NDM6NDQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQv aXB2Ni9LY29uZmlnCVdlZCBKdW4gIDQgMTI6MjA6MzMgMjAwMw0KQEAgLTQy LDQgKzQyLDEyIEBADQogDQogCSAgSWYgdW5zdXJlLCBzYXkgWS4NCiANCitj b25maWcgSVBWNl9UVU5ORUwNCisJdHJpc3RhdGUgIklQdjY6IElQdjYtaW4t SVB2NiB0dW5uZWwiDQorCWRlcGVuZHMgb24gSVBWNg0KKwktLS1oZWxwLS0t DQorCSAgU3VwcG9ydCBmb3IgSVB2Ni1pbi1JUHY2IHR1bm5lbHMgZGVzY3Jp YmVkIGluIFJGQyAyNDczLg0KKw0KKwkgIElmIHVuc3VyZSwgc2F5IE4uDQor DQogc291cmNlICJuZXQvaXB2Ni9uZXRmaWx0ZXIvS2NvbmZpZyINCmRpZmYg LU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhj bHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L01ha2VmaWxlIG1l cmdlLTIuNS9uZXQvaXB2Ni9NYWtlZmlsZQ0KLS0tIGxpbnV4LTIuNS9uZXQv aXB2Ni9NYWtlZmlsZQlXZWQgSnVuICA0IDEzOjQzOjA2IDIwMDMNCisrKyBt ZXJnZS0yLjUvbmV0L2lwdjYvTWFrZWZpbGUJV2VkIE1heSAyOCAyMToxMTo1 OSAyMDAzDQpAQCAtMTUsMyArMTUsNSBAQA0KIG9iai0kKENPTkZJR19JTkVU Nl9FU1ApICs9IGVzcDYubw0KIG9iai0kKENPTkZJR19JTkVUNl9JUENPTVAp ICs9IGlwY29tcDYubw0KIG9iai0kKENPTkZJR19ORVRGSUxURVIpCSs9IG5l dGZpbHRlci8NCisNCitvYmotJChDT05GSUdfSVBWNl9UVU5ORUwpICs9IGlw Nl90dW5uZWwubw0KZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVk ZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0 L2lwdjYvYWZfaW5ldDYuYyBtZXJnZS0yLjUvbmV0L2lwdjYvYWZfaW5ldDYu Yw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9hZl9pbmV0Ni5jCVdlZCBKdW4g IDQgMTM6NDM6MDcgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9hZl9p bmV0Ni5jCVdlZCBNYXkgMjggMjE6MTM6MDggMjAwMw0KQEAgLTU3LDYgKzU3 LDkgQEANCiAjaW5jbHVkZSA8bmV0L3RyYW5zcF92Ni5oPg0KICNpbmNsdWRl IDxuZXQvaXA2X3JvdXRlLmg+DQogI2luY2x1ZGUgPG5ldC9hZGRyY29uZi5o Pg0KKyNpZiBDT05GSUdfSVBWNl9UVU5ORUwNCisjaW5jbHVkZSA8bmV0L2lw Nl90dW5uZWwuaD4NCisjZW5kaWYNCiANCiAjaW5jbHVkZSA8YXNtL3VhY2Nl c3MuaD4NCiAjaW5jbHVkZSA8YXNtL3N5c3RlbS5oPg0KQEAgLTc4MCw2ICs3 ODMsMTEgQEANCiAJZXJyID0gbmRpc2NfaW5pdCgmaW5ldDZfZmFtaWx5X29w 
cyk7DQogCWlmIChlcnIpDQogCQlnb3RvIG5kaXNjX2ZhaWw7DQorI2lmZGVm IENPTkZJR19JUFY2X1RVTk5FTA0KKwllcnIgPSBpcDZfdHVubmVsX2luaXQo KTsNCisJaWYgKGVycikNCisJCWdvdG8gaXA2X3R1bm5lbF9mYWlsOw0KKyNl bmRpZg0KIAllcnIgPSBpZ21wNl9pbml0KCZpbmV0Nl9mYW1pbHlfb3BzKTsN CiAJaWYgKGVycikNCiAJCWdvdG8gaWdtcF9mYWlsOw0KQEAgLTgzNCw2ICs4 NDIsMTAgQEANCiAJaWdtcDZfY2xlYW51cCgpOw0KICNlbmRpZg0KIGlnbXBf ZmFpbDoNCisjaWZkZWYgQ09ORklHX0lQVjZfVFVOTkVMDQorCWlwNl90dW5u ZWxfY2xlYW51cCgpOw0KK2lwNl90dW5uZWxfZmFpbDoNCisjZW5kaWYNCiAJ bmRpc2NfY2xlYW51cCgpOw0KIG5kaXNjX2ZhaWw6DQogCWljbXB2Nl9jbGVh bnVwKCk7DQpAQCAtODY5LDYgKzg4MSw5IEBADQogCWlwNl9yb3V0ZV9jbGVh bnVwKCk7DQogCWlwdjZfcGFja2V0X2NsZWFudXAoKTsNCiAJaWdtcDZfY2xl YW51cCgpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUwNCisJaXA2X3R1 bm5lbF9jbGVhbnVwKCk7DQorI2VuZGlmDQogCW5kaXNjX2NsZWFudXAoKTsN CiAJaWNtcHY2X2NsZWFudXAoKTsNCiAjaWZkZWYgQ09ORklHX1NZU0NUTA0K ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1 bm5lbC5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVsLmMNCi0tLSBs aW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1bm5lbC5jCVRodSBKYW4gIDEgMDI6 MDA6MDAgMTk3MA0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVs LmMJV2VkIEp1biAgNCAxMzo0NDo1MCAyMDAzDQpAQCAtMCwwICsxLDEyNjEg QEANCisvKg0KKyAqCUlQdjYgb3ZlciBJUHY2IHR1bm5lbCBkZXZpY2UNCisg KglMaW51eCBJTkVUNiBpbXBsZW1lbnRhdGlvbg0KKyAqDQorICoJQXV0aG9y czoNCisgKglWaWxsZSBOdW9ydmFsYQkJPHZudW9ydmFsQHRjcy5odXQuZmk+ CQ0KKyAqDQorICoJJElkJA0KKyAqDQorICogICAgICBCYXNlZCBvbjoNCisg KiAgICAgIGxpbnV4L25ldC9pcHY2L3NpdC5jDQorICoNCisgKiAgICAgIFJG QyAyNDczDQorICoNCisgKglUaGlzIHByb2dyYW0gaXMgZnJlZSBzb2Z0d2Fy ZTsgeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yDQorICogICAgICBt b2RpZnkgaXQgdW5kZXIgdGhlIHRlcm1zIG9mIHRoZSBHTlUgR2VuZXJhbCBQ dWJsaWMgTGljZW5zZQ0KKyAqICAgICAgYXMgcHVibGlzaGVkIGJ5IHRoZSBG cmVlIFNvZnR3YXJlIEZvdW5kYXRpb247IGVpdGhlciB2ZXJzaW9uDQorICog ICAgICAyIG9mIHRoZSBMaWNlbnNlLCBvciAoYXQgeW91ciBvcHRpb24pIGFu eSBsYXRlciB2ZXJzaW9uLg0KKyAqDQorICovDQorDQorI2luY2x1ZGUgPGxp 
bnV4L2NvbmZpZy5oPg0KKyNpbmNsdWRlIDxsaW51eC9tb2R1bGUuaD4NCisj aW5jbHVkZSA8bGludXgvZXJybm8uaD4NCisjaW5jbHVkZSA8bGludXgvdHlw ZXMuaD4NCisjaW5jbHVkZSA8bGludXgvc29ja2V0Lmg+DQorI2luY2x1ZGUg PGxpbnV4L3NvY2tpb3MuaD4NCisjaW5jbHVkZSA8bGludXgvaWYuaD4NCisj aW5jbHVkZSA8bGludXgvaW4uaD4NCisjaW5jbHVkZSA8bGludXgvaXAuaD4N CisjaW5jbHVkZSA8bGludXgvaWZfdHVubmVsLmg+DQorI2luY2x1ZGUgPGxp bnV4L25ldC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pbjYuaD4NCisjaW5jbHVk ZSA8bGludXgvbmV0ZGV2aWNlLmg+DQorI2luY2x1ZGUgPGxpbnV4L2lmX2Fy cC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pY21wdjYuaD4NCisjaW5jbHVkZSA8 bGludXgvaW5pdC5oPg0KKyNpbmNsdWRlIDxsaW51eC9yb3V0ZS5oPg0KKyNp bmNsdWRlIDxsaW51eC9ydG5ldGxpbmsuaD4NCisNCisjaW5jbHVkZSA8YXNt L3VhY2Nlc3MuaD4NCisjaW5jbHVkZSA8YXNtL2F0b21pYy5oPg0KKw0KKyNp bmNsdWRlIDxuZXQvaXAuaD4NCisjaW5jbHVkZSA8bmV0L3NvY2suaD4NCisj aW5jbHVkZSA8bmV0L2lwdjYuaD4NCisjaW5jbHVkZSA8bmV0L3Byb3RvY29s Lmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfcm91dGUuaD4NCisjaW5jbHVkZSA8 bmV0L2FkZHJjb25mLmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfdHVubmVsLmg+ DQorDQorTU9EVUxFX0FVVEhPUigiVmlsbGUgTnVvcnZhbGEiKTsNCitNT0RV TEVfREVTQ1JJUFRJT04oIklQdjYtaW4tSVB2NiB0dW5uZWwiKTsNCitNT0RV TEVfTElDRU5TRSgiR1BMIik7DQorDQorI2RlZmluZSBJUFY2X1RMVl9URUxf RFNUX1NJWkUgOA0KKw0KKyNpZmRlZiBJUDZfVE5MX0RFQlVHDQorI2RlZmlu ZSBJUDZfVE5MX1RSQUNFKHguLi4pIHByaW50ayhLRVJOX0RFQlVHICIlczoi IHggIlxuIiwgX19GVU5DVElPTl9fKQ0KKyNlbHNlDQorI2RlZmluZSBJUDZf VE5MX1RSQUNFKHguLi4pIGRvIHs7fSB3aGlsZSgwKQ0KKyNlbmRpZg0KKw0K KyNkZWZpbmUgSVBWNl9UQ0xBU1NfTUFTSyAoSVBWNl9GTE9XSU5GT19NQVNL ICYgfklQVjZfRkxPV0xBQkVMX01BU0spDQorDQorLyogc29ja2V0KHMpIHVz ZWQgYnkgaXA2aXA2X3RubF94bWl0KCkgZm9yIHJlc2VuZGluZyBwYWNrZXRz ICovDQorc3RhdGljIHN0cnVjdCBzb2NrZXQgKl9faXA2X3NvY2tldFtOUl9D UFVTXTsNCisjZGVmaW5lIGlwNl9zb2NrZXQgX19pcDZfc29ja2V0W3NtcF9w cm9jZXNzb3JfaWQoKV0NCisNCitzdGF0aWMgdm9pZCBpcDZfeG1pdF9sb2Nr KHZvaWQpDQorew0KKwlsb2NhbF9iaF9kaXNhYmxlKCk7DQorCWlmICh1bmxp a2VseSghc3Bpbl90cnlsb2NrKCZpcDZfc29ja2V0LT5zay0+bG9jay5zbG9j aykpKQ0KKwkJQlVHKCk7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlwNl94bWl0 
X3VubG9jayh2b2lkKQ0KK3sNCisJc3Bpbl91bmxvY2tfYmgoJmlwNl9zb2Nr ZXQtPnNrLT5sb2NrLnNsb2NrKTsNCit9DQorDQorI2RlZmluZSBIQVNIX1NJ WkUgIDMyDQorDQorI2RlZmluZSBIQVNIKGFkZHIpICgoKGFkZHIpLT5zNl9h ZGRyMzJbMF0gXiAoYWRkciktPnM2X2FkZHIzMlsxXSBeIFwNCisJICAgICAg ICAgICAgIChhZGRyKS0+czZfYWRkcjMyWzJdIF4gKGFkZHIpLT5zNl9hZGRy MzJbM10pICYgXA0KKyAgICAgICAgICAgICAgICAgICAgKEhBU0hfU0laRSAt IDEpKQ0KKw0KK3N0YXRpYyBpbnQgaXA2aXA2X2ZiX3RubF9kZXZfaW5pdChz dHJ1Y3QgbmV0X2RldmljZSAqZGV2KTsNCitzdGF0aWMgaW50IGlwNmlwNl90 bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldik7DQorDQorLyog dGhlIElQdjYgdHVubmVsIGZhbGxiYWNrIGRldmljZSAqLw0KK3N0YXRpYyBz dHJ1Y3QgbmV0X2RldmljZSBpcDZpcDZfZmJfdG5sX2RldiA9IHsNCisJLm5h bWUgPSAiaXA2dG5sMCIsDQorCS5pbml0ID0gaXA2aXA2X2ZiX3RubF9kZXZf aW5pdA0KK307DQorDQorLyogdGhlIElQdjYgZmFsbGJhY2sgdHVubmVsICov DQorc3RhdGljIHN0cnVjdCBpcDZfdG5sIGlwNmlwNl9mYl90bmwgPSB7DQor CS5kZXYgPSAmaXA2aXA2X2ZiX3RubF9kZXYsDQorCS5wYXJtcyA9ey5uYW1l ID0gImlwNnRubDAiLCAucHJvdG8gPSBJUFBST1RPX0lQVjZ9DQorfTsNCisN CisvKiBsaXN0cyBmb3Igc3RvcmluZyB0dW5uZWxzIGluIHVzZSAqLw0KK3N0 YXRpYyBzdHJ1Y3QgaXA2X3RubCAqdG5sc19yX2xbSEFTSF9TSVpFXTsNCitz dGF0aWMgc3RydWN0IGlwNl90bmwgKnRubHNfd2NbMV07DQorc3RhdGljIHN0 cnVjdCBpcDZfdG5sICoqdG5sc1syXSA9IHsgdG5sc193YywgdG5sc19yX2wg fTsNCisNCisvKiBsb2NrIGZvciB0aGUgdHVubmVsIGxpc3RzICovDQorc3Rh dGljIHJ3bG9ja190IGlwNmlwNl9sb2NrID0gUldfTE9DS19VTkxPQ0tFRDsN CisNCisvKioNCisgKiBpcDZpcDZfdG5sX2xvb2t1cCAtIGZldGNoIHR1bm5l bCBtYXRjaGluZyB0aGUgZW5kLXBvaW50IGFkZHJlc3Nlcw0KKyAqICAgQHJl bW90ZTogdGhlIGFkZHJlc3Mgb2YgdGhlIHR1bm5lbCBleGl0LXBvaW50IA0K KyAqICAgQGxvY2FsOiB0aGUgYWRkcmVzcyBvZiB0aGUgdHVubmVsIGVudHJ5 LXBvaW50IA0KKyAqDQorICogUmV0dXJuOiAgDQorICogICB0dW5uZWwgbWF0 Y2hpbmcgZ2l2ZW4gZW5kLXBvaW50cyBpZiBmb3VuZCwNCisgKiAgIGVsc2Ug ZmFsbGJhY2sgdHVubmVsIGlmIGl0cyBkZXZpY2UgaXMgdXAsIA0KKyAqICAg ZWxzZSAlTlVMTA0KKyAqKi8NCisNCitzdHJ1Y3QgaXA2X3RubCAqDQoraXA2 aXA2X3RubF9sb29rdXAoc3RydWN0IGluNl9hZGRyICpyZW1vdGUsIHN0cnVj dCBpbjZfYWRkciAqbG9jYWwpDQorew0KKwl1bnNpZ25lZCBoMCA9IEhBU0go 
cmVtb3RlKTsNCisJdW5zaWduZWQgaDEgPSBIQVNIKGxvY2FsKTsNCisJc3Ry dWN0IGlwNl90bmwgKnQ7DQorDQorCWZvciAodCA9IHRubHNfcl9sW2gwIF4g aDFdOyB0OyB0ID0gdC0+bmV4dCkgew0KKwkJaWYgKCFpcHY2X2FkZHJfY21w KGxvY2FsLCAmdC0+cGFybXMubGFkZHIpICYmDQorCQkgICAgIWlwdjZfYWRk cl9jbXAocmVtb3RlLCAmdC0+cGFybXMucmFkZHIpICYmDQorCQkgICAgKHQt PmRldi0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJCXJldHVybiB0Ow0KKwl9DQor CWlmICgodCA9IHRubHNfd2NbMF0pICE9IE5VTEwgJiYgKHQtPmRldi0+Zmxh Z3MgJiBJRkZfVVApKQ0KKwkJcmV0dXJuIHQ7DQorDQorCXJldHVybiBOVUxM Ow0KK30NCisNCisvKioNCisgKiBpcDZpcDZfYnVja2V0IC0gZ2V0IGhlYWQg b2YgbGlzdCBtYXRjaGluZyBnaXZlbiB0dW5uZWwgcGFyYW1ldGVycw0KKyAq ICAgQHA6IHBhcmFtZXRlcnMgY29udGFpbmluZyB0dW5uZWwgZW5kLXBvaW50 cyANCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2J1Y2tl dCgpIHJldHVybnMgdGhlIGhlYWQgb2YgdGhlIGxpc3QgbWF0Y2hpbmcgdGhl IA0KKyAqICAgJnN0cnVjdCBpbjZfYWRkciBlbnRyaWVzIGxhZGRyIGFuZCBy YWRkciBpbiBAcC4NCisgKg0KKyAqIFJldHVybjogaGVhZCBvZiBJUHY2IHR1 bm5lbCBsaXN0IA0KKyAqKi8NCisNCitzdGF0aWMgc3RydWN0IGlwNl90bmwg KioNCitpcDZpcDZfYnVja2V0KHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnApDQor ew0KKwlzdHJ1Y3QgaW42X2FkZHIgKnJlbW90ZSA9ICZwLT5yYWRkcjsNCisJ c3RydWN0IGluNl9hZGRyICpsb2NhbCA9ICZwLT5sYWRkcjsNCisJdW5zaWdu ZWQgaCA9IDA7DQorCWludCBwcmlvID0gMDsNCisNCisJaWYgKCFpcHY2X2Fk ZHJfYW55KHJlbW90ZSkgfHwgIWlwdjZfYWRkcl9hbnkobG9jYWwpKSB7DQor CQlwcmlvID0gMTsNCisJCWggPSBIQVNIKHJlbW90ZSkgXiBIQVNIKGxvY2Fs KTsNCisJfQ0KKwlyZXR1cm4gJnRubHNbcHJpb11baF07DQorfQ0KKw0KKy8q Kg0KKyAqIGlwNmlwNl90bmxfbGluayAtIGFkZCB0dW5uZWwgdG8gaGFzaCB0 YWJsZQ0KKyAqICAgQHQ6IHR1bm5lbCB0byBiZSBhZGRlZA0KKyAqKi8NCisN CitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfbGluayhzdHJ1Y3QgaXA2X3Ru bCAqdCkNCit7DQorCXN0cnVjdCBpcDZfdG5sICoqdHAgPSBpcDZpcDZfYnVj a2V0KCZ0LT5wYXJtcyk7DQorDQorCXdyaXRlX2xvY2tfYmgoJmlwNmlwNl9s b2NrKTsNCisJdC0+bmV4dCA9ICp0cDsNCisJd3JpdGVfdW5sb2NrX2JoKCZp cDZpcDZfbG9jayk7DQorCSp0cCA9IHQ7DQorfQ0KKw0KKy8qKg0KKyAqIGlw NmlwNl90bmxfdW5saW5rIC0gcmVtb3ZlIHR1bm5lbCBmcm9tIGhhc2ggdGFi bGUNCisgKiAgIEB0OiB0dW5uZWwgdG8gYmUgcmVtb3ZlZA0KKyAqKi8NCisN 
CitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfdW5saW5rKHN0cnVjdCBpcDZf dG5sICp0KQ0KK3sNCisJc3RydWN0IGlwNl90bmwgKip0cDsNCisNCisJZm9y ICh0cCA9IGlwNmlwNl9idWNrZXQoJnQtPnBhcm1zKTsgKnRwOyB0cCA9ICYo KnRwKS0+bmV4dCkgew0KKwkJaWYgKHQgPT0gKnRwKSB7DQorCQkJd3JpdGVf bG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwkJCSp0cCA9IHQtPm5leHQ7DQor CQkJd3JpdGVfdW5sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQkJYnJlYWs7 DQorCQl9DQorCX0NCit9DQorDQorLyoqDQorICogaXA2X3RubF9jcmVhdGUo KSAtIGNyZWF0ZSBhIG5ldyB0dW5uZWwNCisgKiAgIEBwOiB0dW5uZWwgcGFy YW1ldGVycw0KKyAqICAgQHB0OiBwb2ludGVyIHRvIG5ldyB0dW5uZWwNCisg Kg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgQ3JlYXRlIHR1bm5lbCBtYXRj aGluZyBnaXZlbiBwYXJhbWV0ZXJzLg0KKyAqIA0KKyAqIFJldHVybjogDQor ICogICAwIG9uIHN1Y2Nlc3MNCisgKiovDQorDQorc3RhdGljIGludA0KK2lw Nl90bmxfY3JlYXRlKHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAsIHN0cnVjdCBp cDZfdG5sICoqcHQpDQorew0KKwlzdHJ1Y3QgbmV0X2RldmljZSAqZGV2Ow0K KwlpbnQgZXJyID0gLUVOT0JVRlM7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0K Kw0KKwlkZXYgPSBrbWFsbG9jKHNpemVvZiAoKmRldikgKyBzaXplb2YgKCp0 KSwgR0ZQX0tFUk5FTCk7DQorCWlmICghZGV2KQ0KKwkJcmV0dXJuIGVycjsN CisNCisJbWVtc2V0KGRldiwgMCwgc2l6ZW9mICgqZGV2KSArIHNpemVvZiAo KnQpKTsNCisJZGV2LT5wcml2ID0gKHZvaWQgKikgKGRldiArIDEpOw0KKwl0 ID0gKHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJdC0+ZGV2ID0g ZGV2Ow0KKwlkZXYtPmluaXQgPSBpcDZpcDZfdG5sX2Rldl9pbml0Ow0KKwlt ZW1jcHkoJnQtPnBhcm1zLCBwLCBzaXplb2YgKCpwKSk7DQorCXQtPnBhcm1z Lm5hbWVbSUZOQU1TSVogLSAxXSA9ICdcMCc7DQorCWlmICh0LT5wYXJtcy5o b3BfbGltaXQgPiAyNTUpDQorCQl0LT5wYXJtcy5ob3BfbGltaXQgPSAtMTsN CisJc3RyY3B5KGRldi0+bmFtZSwgdC0+cGFybXMubmFtZSk7DQorCWlmICgh ZGV2LT5uYW1lWzBdKSB7DQorCQlpbnQgaSA9IDA7DQorCQlpbnQgZXhpc3Rz ID0gMDsNCisNCisJCWRvIHsNCisJCQlzcHJpbnRmKGRldi0+bmFtZSwgImlw NnRubCVkIiwgKytpKTsNCisJCQlleGlzdHMgPSAoX19kZXZfZ2V0X2J5X25h bWUoZGV2LT5uYW1lKSAhPSBOVUxMKTsNCisJCX0gd2hpbGUgKGkgPCBJUDZf VE5MX01BWCAmJiBleGlzdHMpOw0KKw0KKwkJaWYgKGkgPT0gSVA2X1ROTF9N QVgpIHsNCisJCQlnb3RvIGZhaWxlZDsNCisJCX0NCisJCW1lbWNweSh0LT5w YXJtcy5uYW1lLCBkZXYtPm5hbWUsIElGTkFNU0laKTsNCisJfQ0KKwlTRVRf 
TU9EVUxFX09XTkVSKGRldik7DQorCWlmICgoZXJyID0gcmVnaXN0ZXJfbmV0 ZGV2aWNlKGRldikpIDwgMCkgew0KKwkJZ290byBmYWlsZWQ7DQorCX0NCisJ aXA2aXA2X3RubF9saW5rKHQpOw0KKwkqcHQgPSB0Ow0KKwlyZXR1cm4gMDsN CitmYWlsZWQ6DQorCWtmcmVlKGRldik7DQorCXJldHVybiBlcnI7DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNl90bmxfZGVzdHJveSgpIC0gZGVzdHJveSBvbGQg dHVubmVsDQorICogICBAdDogdHVubmVsIHRvIGJlIGRlc3Ryb3llZA0KKyAq DQorICogUmV0dXJuOg0KKyAqICAgd2hhdGV2ZXIgdW5yZWdpc3Rlcl9uZXRk ZXZpY2UoKSByZXR1cm5zDQorICoqLw0KKw0KK3N0YXRpYyBpbmxpbmUgaW50 DQoraXA2X3RubF9kZXN0cm95KHN0cnVjdCBpcDZfdG5sICp0KQ0KK3sNCisJ cmV0dXJuIHVucmVnaXN0ZXJfbmV0ZGV2aWNlKHQtPmRldik7DQorfQ0KKw0K Ky8qKg0KKyAqIGlwNmlwNl90bmxfbG9jYXRlIC0gZmluZCBvciBjcmVhdGUg dHVubmVsIG1hdGNoaW5nIGdpdmVuIHBhcmFtZXRlcnMNCisgKiAgIEBwOiB0 dW5uZWwgcGFyYW1ldGVycyANCisgKiAgIEBjcmVhdGU6ICE9IDAgaWYgYWxs b3dlZCB0byBjcmVhdGUgbmV3IHR1bm5lbCBpZiBubyBtYXRjaCBmb3VuZA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBpcDZpcDZfdG5sX2xvY2F0 ZSgpIGZpcnN0IHRyaWVzIHRvIGxvY2F0ZSBhbiBleGlzdGluZyB0dW5uZWwN CisgKiAgIGJhc2VkIG9uIEBwYXJtcy4gSWYgdGhpcyBpcyB1bnN1Y2Nlc3Nm dWwsIGJ1dCBAY3JlYXRlIGlzIHNldCBhIG5ldw0KKyAqICAgdHVubmVsIGRl dmljZSBpcyBjcmVhdGVkIGFuZCByZWdpc3RlcmVkIGZvciB1c2UuDQorICoN CisgKiBSZXR1cm46DQorICogICAwIGlmIHR1bm5lbCBsb2NhdGVkIG9yIGNy ZWF0ZWQsDQorICogICAtRUlOVkFMIGlmIHBhcmFtZXRlcnMgaW5jb3JyZWN0 LA0KKyAqICAgLUVOT0RFViBpZiBubyBtYXRjaGluZyB0dW5uZWwgYXZhaWxh YmxlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZfdG5sX2xvY2F0 ZShzdHJ1Y3QgaXA2X3RubF9wYXJtICpwLCBzdHJ1Y3QgaXA2X3RubCAqKnB0 LCBpbnQgY3JlYXRlKQ0KK3sNCisJc3RydWN0IGluNl9hZGRyICpyZW1vdGUg PSAmcC0+cmFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAqbG9jYWwgPSAmcC0+ bGFkZHI7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKw0KKwlpZiAocC0+cHJv dG8gIT0gSVBQUk9UT19JUFY2KQ0KKwkJcmV0dXJuIC1FSU5WQUw7DQorDQor CWZvciAodCA9ICppcDZpcDZfYnVja2V0KHApOyB0OyB0ID0gdC0+bmV4dCkg ew0KKwkJaWYgKCFpcHY2X2FkZHJfY21wKGxvY2FsLCAmdC0+cGFybXMubGFk ZHIpICYmDQorCQkgICAgIWlwdjZfYWRkcl9jbXAocmVtb3RlLCAmdC0+cGFy bXMucmFkZHIpKSB7DQorCQkJKnB0ID0gdDsNCisJCQlyZXR1cm4gKGNyZWF0 
ZSA/IC1FRVhJU1QgOiAwKTsNCisJCX0NCisJfQ0KKwlpZiAoIWNyZWF0ZSkg ew0KKwkJcmV0dXJuIC1FTk9ERVY7DQorCX0NCisJcmV0dXJuIGlwNl90bmxf Y3JlYXRlKHAsIHB0KTsNCit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9k ZXZfZGVzdHJ1Y3RvciAtIHR1bm5lbCBkZXZpY2UgZGVzdHJ1Y3Rvcg0KKyAq ICAgQGRldjogdGhlIGRldmljZSB0byBiZSBkZXN0cm95ZWQNCisgKiovDQor DQorc3RhdGljIHZvaWQNCitpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yKHN0 cnVjdCBuZXRfZGV2aWNlICpkZXYpDQorew0KKwlrZnJlZShkZXYpOw0KK30N CisNCisvKioNCisgKiBpcDZpcDZfdG5sX2Rldl91bmluaXQgLSB0dW5uZWwg ZGV2aWNlIHVuaW5pdGlhbGl6ZXINCisgKiAgIEBkZXY6IHRoZSBkZXZpY2Ug dG8gYmUgZGVzdHJveWVkDQorICogICANCisgKiBEZXNjcmlwdGlvbjoNCisg KiAgIGlwNmlwNl90bmxfZGV2X3VuaW5pdCgpIHJlbW92ZXMgdHVubmVsIGZy b20gaXRzIGxpc3QNCisgKiovDQorDQorc3RhdGljIHZvaWQNCitpcDZpcDZf dG5sX2Rldl91bmluaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQor CWlmIChkZXYgPT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQl3cml0ZV9s b2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQl0bmxzX3djWzBdID0gTlVMTDsN CisJCXdyaXRlX3VubG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwl9IGVsc2Ug ew0KKwkJc3RydWN0IGlwNl90bmwgKnQgPSAoc3RydWN0IGlwNl90bmwgKikg ZGV2LT5wcml2Ow0KKwkJaXA2aXA2X3RubF91bmxpbmsodCk7DQorCX0NCit9 DQorDQorLyoqDQorICogcGFyc2VfdHZsX3RubF9lbmNfbGltIC0gaGFuZGxl IGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICogICBAc2tiOiByZWNl aXZlZCBzb2NrZXQgYnVmZmVyDQorICoNCisgKiBSZXR1cm46IA0KKyAqICAg MCBpZiBub25lIHdhcyBmb3VuZCwgDQorICogICBlbHNlIGluZGV4IHRvIGVu Y2Fwc3VsYXRpb24gbGltaXQNCisgKiovDQorDQorc3RhdGljIF9fdTE2DQor cGFyc2VfdGx2X3RubF9lbmNfbGltKHN0cnVjdCBza19idWZmICpza2IsIF9f dTggKiByYXcpDQorew0KKwlzdHJ1Y3QgaXB2NmhkciAqaXB2NmggPSAoc3Ry dWN0IGlwdjZoZHIgKikgcmF3Ow0KKwlfX3U4IG5leHRoZHIgPSBpcHY2aC0+ bmV4dGhkcjsNCisJX191MTYgb2ZmID0gc2l6ZW9mICgqaXB2NmgpOw0KKw0K Kwl3aGlsZSAoaXB2Nl9leHRfaGRyKG5leHRoZHIpICYmIG5leHRoZHIgIT0g TkVYVEhEUl9OT05FKSB7DQorCQlfX3UxNiBvcHRsZW4gPSAwOw0KKwkJc3Ry dWN0IGlwdjZfb3B0X2hkciAqaGRyOw0KKwkJaWYgKHJhdyArIG9mZiArIHNp emVvZiAoKmhkcikgPiBza2ItPmRhdGEgJiYNCisJCSAgICAhcHNrYl9tYXlf cHVsbChza2IsIHJhdyAtIHNrYi0+ZGF0YSArIG9mZiArIHNpemVvZiAoKmhk 
cikpKQ0KKwkJCWJyZWFrOw0KKw0KKwkJaGRyID0gKHN0cnVjdCBpcHY2X29w dF9oZHIgKikgKHJhdyArIG9mZik7DQorCQlpZiAobmV4dGhkciA9PSBORVhU SERSX0ZSQUdNRU5UKSB7DQorCQkJc3RydWN0IGZyYWdfaGRyICpmcmFnX2hk ciA9IChzdHJ1Y3QgZnJhZ19oZHIgKikgaGRyOw0KKwkJCWlmIChmcmFnX2hk ci0+ZnJhZ19vZmYpDQorCQkJCWJyZWFrOw0KKwkJCW9wdGxlbiA9IDg7DQor CQl9IGVsc2UgaWYgKG5leHRoZHIgPT0gTkVYVEhEUl9BVVRIKSB7DQorCQkJ b3B0bGVuID0gKGhkci0+aGRybGVuICsgMikgPDwgMjsNCisJCX0gZWxzZSB7 DQorCQkJb3B0bGVuID0gaXB2Nl9vcHRsZW4oaGRyKTsNCisJCX0NCisJCWlm IChuZXh0aGRyID09IE5FWFRIRFJfREVTVCkgew0KKwkJCV9fdTE2IGkgPSBv ZmYgKyAyOw0KKwkJCXdoaWxlICgxKSB7DQorCQkJCXN0cnVjdCBpcHY2X3Rs dl90bmxfZW5jX2xpbSAqdGVsOw0KKw0KKwkJCQkvKiBObyBtb3JlIHJvb20g Zm9yIGVuY2Fwc3VsYXRpb24gbGltaXQgKi8NCisJCQkJaWYgKGkgKyBzaXpl b2YgKCp0ZWwpID4gb2ZmICsgb3B0bGVuKQ0KKwkJCQkJYnJlYWs7DQorDQor CQkJCXRlbCA9IChzdHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0gKikgJnJh d1tpXTsNCisJCQkJLyogcmV0dXJuIGluZGV4IG9mIG9wdGlvbiBpZiBmb3Vu ZCBhbmQgdmFsaWQgKi8NCisJCQkJaWYgKHRlbC0+dHlwZSA9PSBJUFY2X1RM Vl9UTkxfRU5DQVBfTElNSVQgJiYNCisJCQkJICAgIHRlbC0+bGVuZ3RoID09 IDEpDQorCQkJCQlyZXR1cm4gaTsNCisJCQkJLyogZWxzZSBqdW1wIHRvIG5l eHQgb3B0aW9uICovDQorCQkJCWlmICh0ZWwtPnR5cGUpDQorCQkJCQlpICs9 IHRlbC0+bGVuZ3RoICsgMjsNCisJCQkJZWxzZQ0KKwkJCQkJaSsrOw0KKwkJ CX0NCisJCX0NCisJCW5leHRoZHIgPSBoZHItPm5leHRoZHI7DQorCQlvZmYg Kz0gb3B0bGVuOw0KKwl9DQorCXJldHVybiAwOw0KK30NCisNCisvKioNCisg KiBpcDZpcDZfZXJyIC0gdHVubmVsIGVycm9yIGhhbmRsZXINCisgKg0KKyAq IERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2VycigpIHNob3VsZCBoYW5k bGUgZXJyb3JzIGluIHRoZSB0dW5uZWwgYWNjb3JkaW5nDQorICogICB0byB0 aGUgc3BlY2lmaWNhdGlvbnMgaW4gUkZDIDI0NzMuDQorICoqLw0KKw0KK3Zv aWQgaXA2aXA2X2VycihzdHJ1Y3Qgc2tfYnVmZiAqc2tiLCBzdHJ1Y3QgaW5l dDZfc2tiX3Bhcm0gKm9wdCwNCisJCSAgIGludCB0eXBlLCBpbnQgY29kZSwg aW50IG9mZnNldCwgX191MzIgaW5mbykNCit7DQorCXN0cnVjdCBpcHY2aGRy ICppcHY2aCA9IChzdHJ1Y3QgaXB2NmhkciAqKSBza2ItPmRhdGE7DQorCXN0 cnVjdCBpcDZfdG5sICp0Ow0KKwlpbnQgcmVsX21zZyA9IDA7DQorCWludCBy ZWxfdHlwZSA9IElDTVBWNl9ERVNUX1VOUkVBQ0g7DQorCWludCByZWxfY29k 
ZSA9IElDTVBWNl9BRERSX1VOUkVBQ0g7DQorCV9fdTMyIHJlbF9pbmZvID0g MDsNCisJX191MTYgbGVuOw0KKw0KKwkvKiBJZiB0aGUgcGFja2V0IGRvZXNu J3QgY29udGFpbiB0aGUgb3JpZ2luYWwgSVB2NiBoZWFkZXIgd2UgYXJlIA0K KwkgICBpbiB0cm91YmxlIHNpbmNlIHdlIG1pZ2h0IG5lZWQgdGhlIHNvdXJj ZSBhZGRyZXNzIGZvciBmdXJ0ZXIgDQorCSAgIHByb2Nlc3Npbmcgb2YgdGhl IGVycm9yLiAqLw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisJ aWYgKCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5kYWRkciwgJmlw djZoLT5zYWRkcikpID09IE5VTEwpDQorCQlnb3RvIG91dDsNCisNCisJc3dp dGNoICh0eXBlKSB7DQorCQlfX3UzMiB0ZWxpOw0KKwkJc3RydWN0IGlwdjZf dGx2X3RubF9lbmNfbGltICp0ZWw7DQorCQlfX3UzMiBtdHU7DQorCWNhc2Ug SUNNUFY2X0RFU1RfVU5SRUFDSDoNCisJCWlmIChuZXRfcmF0ZWxpbWl0KCkp DQorCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCSAgICAgICAiJXM6IFBh dGggdG8gZGVzdGluYXRpb24gaW52YWxpZCAiDQorCQkJICAgICAgICJvciBp bmFjdGl2ZSFcbiIsIHQtPnBhcm1zLm5hbWUpOw0KKwkJcmVsX21zZyA9IDE7 DQorCQlicmVhazsNCisJY2FzZSBJQ01QVjZfVElNRV9FWENFRUQ6DQorCQlp ZiAoY29kZSA9PSBJQ01QVjZfRVhDX0hPUExJTUlUKSB7DQorCQkJaWYgKG5l dF9yYXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJ CQkgICAgICAgIiVzOiBUb28gc21hbGwgaG9wIGxpbWl0IG9yICINCisJCQkJ ICAgICAgICJyb3V0aW5nIGxvb3AgaW4gdHVubmVsIVxuIiwgDQorCQkJCSAg ICAgICB0LT5wYXJtcy5uYW1lKTsNCisJCQlyZWxfbXNnID0gMTsNCisJCX0N CisJCWJyZWFrOw0KKwljYXNlIElDTVBWNl9QQVJBTVBST0I6DQorCQkvKiBp Z25vcmUgaWYgcGFyYW1ldGVyIHByb2JsZW0gbm90IGNhdXNlZCBieSBhIHR1 bm5lbA0KKwkJICAgZW5jYXBzdWxhdGlvbiBsaW1pdCBzdWItb3B0aW9uICov DQorCQlpZiAoY29kZSAhPSBJQ01QVjZfSERSX0ZJRUxEKSB7DQorCQkJYnJl YWs7DQorCQl9DQorCQl0ZWxpID0gcGFyc2VfdGx2X3RubF9lbmNfbGltKHNr Yiwgc2tiLT5kYXRhKTsNCisNCisJCWlmICh0ZWxpICYmIHRlbGkgPT0gaW5m byAtIDIpIHsNCisJCQl0ZWwgPSAoc3RydWN0IGlwdjZfdGx2X3RubF9lbmNf bGltICopICZza2ItPmRhdGFbdGVsaV07DQorCQkJaWYgKHRlbC0+ZW5jYXBf bGltaXQgPD0gMSkgew0KKwkJCQlpZiAobmV0X3JhdGVsaW1pdCgpKQ0KKwkJ CQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCQkJICAgICAgICIlczogVG9v IHNtYWxsIGVuY2Fwc3VsYXRpb24gIg0KKwkJCQkJICAgICAgICJsaW1pdCBv ciByb3V0aW5nIGxvb3AgaW4gIg0KKwkJCQkJICAgICAgICJ0dW5uZWwhXG4i 
LCB0LT5wYXJtcy5uYW1lKTsNCisJCQkJcmVsX21zZyA9IDE7DQorCQkJfQ0K KwkJfQ0KKwkJYnJlYWs7DQorCWNhc2UgSUNNUFY2X1BLVF9UT09CSUc6DQor CQltdHUgPSBpbmZvIC0gb2Zmc2V0Ow0KKwkJaWYgKG10dSA8PSBJUFY2X01J Tl9NVFUpIHsNCisJCQltdHUgPSBJUFY2X01JTl9NVFU7DQorCQl9DQorCQl0 LT5kZXYtPm10dSA9IG10dTsNCisNCisJCWlmICgobGVuID0gc2l6ZW9mICgq aXB2NmgpICsgaXB2NmgtPnBheWxvYWRfbGVuKSA+IG10dSkgew0KKwkJCXJl bF90eXBlID0gSUNNUFY2X1BLVF9UT09CSUc7DQorCQkJcmVsX2NvZGUgPSAw Ow0KKwkJCXJlbF9pbmZvID0gbXR1Ow0KKwkJCXJlbF9tc2cgPSAxOw0KKwkJ fQ0KKwkJYnJlYWs7DQorCX0NCisJaWYgKHJlbF9tc2cgJiYgIHBza2JfbWF5 X3B1bGwoc2tiLCBvZmZzZXQgKyBzaXplb2YgKCppcHY2aCkpKSB7DQorCQlz dHJ1Y3QgcnQ2X2luZm8gKnJ0Ow0KKwkJc3RydWN0IHNrX2J1ZmYgKnNrYjIg PSBza2JfY2xvbmUoc2tiLCBHRlBfQVRPTUlDKTsNCisJCWlmICghc2tiMikN CisJCQlnb3RvIG91dDsNCisNCisJCWRzdF9yZWxlYXNlKHNrYjItPmRzdCk7 DQorCQlza2IyLT5kc3QgPSBOVUxMOw0KKwkJc2tiX3B1bGwoc2tiMiwgb2Zm c2V0KTsNCisJCXNrYjItPm5oLnJhdyA9IHNrYjItPmRhdGE7DQorDQorCQkv KiBUcnkgdG8gZ3Vlc3MgaW5jb21pbmcgaW50ZXJmYWNlICovDQorCQlydCA9 IHJ0Nl9sb29rdXAoJnNrYjItPm5oLmlwdjZoLT5zYWRkciwgTlVMTCwgMCwg MCk7DQorDQorCQlpZiAocnQgJiYgcnQtPnJ0NmlfZGV2KQ0KKwkJCXNrYjIt PmRldiA9IHJ0LT5ydDZpX2RldjsNCisNCisJCWljbXB2Nl9zZW5kKHNrYjIs IHJlbF90eXBlLCByZWxfY29kZSwgcmVsX2luZm8sIHNrYjItPmRldik7DQor DQorCQlpZiAocnQpDQorCQkJZHN0X2ZyZWUoJnJ0LT51LmRzdCk7DQorDQor CQlrZnJlZV9za2Ioc2tiMik7DQorCX0NCitvdXQ6DQorCXJlYWRfdW5sb2Nr KCZpcDZpcDZfbG9jayk7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9yY3Yg LSBkZWNhcHN1bGF0ZSBJUHY2IHBhY2tldCBhbmQgcmV0cmFuc21pdCBpdCBs b2NhbGx5DQorICogICBAc2tiOiByZWNlaXZlZCBzb2NrZXQgYnVmZmVyDQor ICoNCisgKiBSZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlwNl9yY3Yo c3RydWN0IHNrX2J1ZmYgKipwc2tiLCB1bnNpZ25lZCBpbnQgKm5ob2ZmcCkN Cit7DQorCXN0cnVjdCBza19idWZmICpza2IgPSAqcHNrYjsNCisJc3RydWN0 IGlwdjZoZHIgKmlwdjZoOw0KKwlzdHJ1Y3QgaXA2X3RubCAqdDsNCisNCisJ aWYgKCFwc2tiX21heV9wdWxsKHNrYiwgc2l6ZW9mICgqaXB2NmgpKSkNCisJ CWdvdG8gZGlzY2FyZDsNCisNCisJaXB2NmggPSBza2ItPm5oLmlwdjZoOw0K Kw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisNCisJaWYgKCh0ID0g 
aXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5zYWRkciwgJmlwdjZoLT5kYWRk cikpICE9IE5VTEwpIHsNCisJCWlmICghKHQtPnBhcm1zLmZsYWdzICYgSVA2 X1ROTF9GX0NBUF9SQ1YpKSB7DQorCQkJdC0+c3RhdC5yeF9kcm9wcGVkKys7 DQorCQkJcmVhZF91bmxvY2soJmlwNmlwNl9sb2NrKTsNCisJCQlnb3RvIGRp c2NhcmQ7DQorCQl9DQorCQlza2ItPm1hYy5yYXcgPSBza2ItPm5oLnJhdzsN CisJCXNrYi0+bmgucmF3ID0gc2tiLT5kYXRhOw0KKwkJc2tiLT5wcm90b2Nv bCA9IGh0b25zKEVUSF9QX0lQVjYpOw0KKwkJc2tiLT5wa3RfdHlwZSA9IFBB Q0tFVF9IT1NUOw0KKwkJbWVtc2V0KHNrYi0+Y2IsIDAsIHNpemVvZihzdHJ1 Y3QgaW5ldDZfc2tiX3Bhcm0pKTsNCisJCXNrYi0+ZGV2ID0gdC0+ZGV2Ow0K KwkJZHN0X3JlbGVhc2Uoc2tiLT5kc3QpOw0KKwkJc2tiLT5kc3QgPSBOVUxM Ow0KKwkJdC0+c3RhdC5yeF9wYWNrZXRzKys7DQorCQl0LT5zdGF0LnJ4X2J5 dGVzICs9IHNrYi0+bGVuOw0KKwkJbmV0aWZfcngoc2tiKTsNCisJCXJlYWRf dW5sb2NrKCZpcDZpcDZfbG9jayk7DQorCQlyZXR1cm4gMDsNCisJfQ0KKwly ZWFkX3VubG9jaygmaXA2aXA2X2xvY2spOw0KKwlpY21wdjZfc2VuZChza2Is IElDTVBWNl9ERVNUX1VOUkVBQ0gsIElDTVBWNl9BRERSX1VOUkVBQ0gsIDAs IHNrYi0+ZGV2KTsNCitkaXNjYXJkOg0KKwlrZnJlZV9za2Ioc2tiKTsNCisJ cmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIHR4b3B0X2xlbiAtIGdldCBu ZWNlc3Nhcnkgc2l6ZSBmb3IgbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMN CisgKiAgIEBvcmlnX29wdDogb2xkIG9wdGlvbnMNCisgKg0KKyAqIFJldHVy bjoNCisgKiAgIFNpemUgb2Ygb2xkIG9uZSBwbHVzIHNpemUgb2YgdHVubmVs IGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICoqLw0KKw0KK3N0YXRp YyBpbmxpbmUgaW50DQordHhvcHRfbGVuKHN0cnVjdCBpcHY2X3R4b3B0aW9u cyAqb3JpZ19vcHQpDQorew0KKwlpbnQgbGVuID0gc2l6ZW9mICgqb3JpZ19v cHQpICsgODsNCisNCisJaWYgKG9yaWdfb3B0ICYmIG9yaWdfb3B0LT5kc3Qw b3B0KQ0KKwkJbGVuICs9IGlwdjZfb3B0bGVuKG9yaWdfb3B0LT5kc3Qwb3B0 KTsNCisJcmV0dXJuIGxlbjsNCit9DQorDQorLyoqDQorICogbWVyZ2Vfb3B0 aW9ucyAtIGFkZCBlbmNhcHN1bGF0aW9uIGxpbWl0IHRvIG9yaWdpbmFsIG9w dGlvbnMNCisgKiAgIEBlbmNhcF9saW1pdDogbnVtYmVyIG9mIGFsbG93ZWQg ZW5jYXBzdWxhdGlvbiBsaW1pdHMNCisgKiAgIEBvcmlnX29wdDogb3JpZ2lu YWwgb3B0aW9ucw0KKyAqIA0KKyAqIFJldHVybjoNCisgKiAgIFBvaW50ZXIg dG8gbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgY29udGFpbmluZyB0aGUg dHVubmVsDQorICogICBlbmNhcHN1bGF0aW9uIGxpbWl0DQorICoqLw0KKw0K 
K3N0YXRpYyBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKg0KK21lcmdlX29wdGlv bnMoc3RydWN0IHNvY2sgKnNrLCBfX3U4IGVuY2FwX2xpbWl0LA0KKwkgICAg ICBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKm9yaWdfb3B0KQ0KK3sNCisJc3Ry dWN0IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCXN0cnVjdCBpcHY2 X3R4b3B0aW9ucyAqb3B0Ow0KKwlfX3U4ICpyYXc7DQorCV9fdTggcGFkX3Rv ID0gODsNCisJaW50IG9wdF9sZW4gPSB0eG9wdF9sZW4ob3JpZ19vcHQpOw0K Kw0KKwlpZiAoIShvcHQgPSBzb2NrX2ttYWxsb2Moc2ssIG9wdF9sZW4sIEdG UF9BVE9NSUMpKSkgew0KKwkJcmV0dXJuIE5VTEw7DQorCX0NCisNCisJbWVt c2V0KG9wdCwgMCwgb3B0X2xlbik7DQorCW9wdC0+dG90X2xlbiA9IG9wdF9s ZW47DQorCW9wdC0+ZHN0MG9wdCA9IChzdHJ1Y3QgaXB2Nl9vcHRfaGRyICop IChvcHQgKyAxKTsNCisJb3B0LT5vcHRfbmZsZW4gPSA4Ow0KKw0KKwlyYXcg PSAoX191OCAqKSBvcHQtPmRzdDBvcHQ7DQorDQorCXRlbCA9IChzdHJ1Y3Qg aXB2Nl90bHZfdG5sX2VuY19saW0gKikgKG9wdC0+ZHN0MG9wdCArIDEpOw0K Kwl0ZWwtPnR5cGUgPSBJUFY2X1RMVl9UTkxfRU5DQVBfTElNSVQ7DQorCXRl bC0+bGVuZ3RoID0gMTsNCisJdGVsLT5lbmNhcF9saW1pdCA9IGVuY2FwX2xp bWl0Ow0KKw0KKwlpZiAob3JpZ19vcHQpIHsNCisJCV9fdTggKm9yaWdfcmF3 Ow0KKw0KKwkJb3B0LT5ob3BvcHQgPSBvcmlnX29wdC0+aG9wb3B0Ow0KKw0K KwkJLyogS2VlcCB0aGUgb3JpZ2luYWwgZGVzdGluYXRpb24gb3B0aW9ucyBw cm9wZXJseQ0KKwkJICAgYWxpZ25lZCBhbmQgbWVyZ2UgcG9zc2libGUgb2xk IHBhZGRpbmdzIHRvIHRoZQ0KKwkJICAgbmV3IHBhZGRpbmcgb3B0aW9uICov DQorCQlpZiAoKG9yaWdfcmF3ID0gKF9fdTggKikgb3JpZ19vcHQtPmRzdDBv cHQpICE9IE5VTEwpIHsNCisJCQlfX3U4IHR5cGU7DQorCQkJaW50IGkgPSBz aXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXBhZF90byArPSBz aXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXdoaWxlIChpIDwg aXB2Nl9vcHRsZW4ob3JpZ19vcHQtPmRzdDBvcHQpKSB7DQorCQkJCXR5cGUg PSBvcmlnX3Jhd1tpKytdOw0KKwkJCQlpZiAodHlwZSA9PSBJUFY2X1RMVl9Q QUQwKQ0KKwkJCQkJcGFkX3RvKys7DQorCQkJCWVsc2UgaWYgKHR5cGUgPT0g SVBWNl9UTFZfUEFETikgew0KKwkJCQkJaW50IGxlbiA9IG9yaWdfcmF3W2kr K107DQorCQkJCQlpICs9IGxlbjsNCisJCQkJCXBhZF90byArPSBsZW4gKyAy Ow0KKwkJCQl9IGVsc2Ugew0KKwkJCQkJYnJlYWs7DQorCQkJCX0NCisJCQl9 DQorCQkJb3B0LT5kc3Qwb3B0LT5oZHJsZW4gPSBvcmlnX29wdC0+ZHN0MG9w dC0+aGRybGVuICsgMTsNCisJCQltZW1jcHkocmF3ICsgcGFkX3RvLCBvcmln 
X3JhdyArIHBhZF90byAtIDgsDQorCQkJICAgICAgIG9wdF9sZW4gLSBzaXpl b2YgKCpvcHQpIC0gcGFkX3RvKTsNCisJCX0NCisJCW9wdC0+c3JjcnQgPSBv cmlnX29wdC0+c3JjcnQ7DQorCQlvcHQtPm9wdF9uZmxlbiArPSBvcmlnX29w dC0+b3B0X25mbGVuOw0KKw0KKwkJb3B0LT5kc3Qxb3B0ID0gb3JpZ19vcHQt PmRzdDFvcHQ7DQorCQlvcHQtPmF1dGggPSBvcmlnX29wdC0+YXV0aDsNCisJ CW9wdC0+b3B0X2ZsZW4gPSBvcmlnX29wdC0+b3B0X2ZsZW47DQorCX0NCisJ cmF3WzVdID0gSVBWNl9UTFZfUEFETjsNCisNCisJLyogc3VidHJhY3QgbGVu Z3RocyBvZiBkZXN0aW5hdGlvbiBzdWJvcHRpb24gaGVhZGVyLA0KKwkgICB0 dW5uZWwgZW5jYXBzdWxhdGlvbiBsaW1pdCBhbmQgcGFkIE4gaGVhZGVyICov DQorCXJhd1s2XSA9IHBhZF90byAtIDc7DQorDQorCXJldHVybiBvcHQ7DQor fQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfYWRkcl9jb25mbGljdCAtIGNv bXBhcmUgcGFja2V0IGFkZHJlc3NlcyB0byB0dW5uZWwncyBvd24NCisgKiAg IEB0OiB0aGUgb3V0Z29pbmcgdHVubmVsIGRldmljZQ0KKyAqICAgQGhkcjog SVB2NiBoZWFkZXIgZnJvbSB0aGUgaW5jb21pbmcgcGFja2V0IA0KKyAqDQor ICogRGVzY3JpcHRpb246DQorICogICBBdm9pZCB0cml2aWFsIHR1bm5lbGlu ZyBsb29wIGJ5IGNoZWNraW5nIHRoYXQgdHVubmVsIGV4aXQtcG9pbnQgDQor ICogICBkb2Vzbid0IG1hdGNoIHNvdXJjZSBvZiBpbmNvbWluZyBwYWNrZXQu DQorICoNCisgKiBSZXR1cm46IA0KKyAqICAgMSBpZiBjb25mbGljdCwNCisg KiAgIDAgZWxzZQ0KKyAqKi8NCisNCitzdGF0aWMgaW5saW5lIGludA0KK2lw NmlwNl90bmxfYWRkcl9jb25mbGljdChzdHJ1Y3QgaXA2X3RubCAqdCwgc3Ry dWN0IGlwdjZoZHIgKmhkcikNCit7DQorCXJldHVybiAhaXB2Nl9hZGRyX2Nt cCgmdC0+cGFybXMucmFkZHIsICZoZHItPnNhZGRyKTsNCit9DQorDQorLyoq DQorICogaXA2aXA2X3RubF94bWl0IC0gZW5jYXBzdWxhdGUgcGFja2V0IGFu ZCBzZW5kIA0KKyAqICAgQHNrYjogdGhlIG91dGdvaW5nIHNvY2tldCBidWZm ZXINCisgKiAgIEBkZXY6IHRoZSBvdXRnb2luZyB0dW5uZWwgZGV2aWNlIA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBCdWlsZCBuZXcgaGVhZGVy IGFuZCBkbyBzb21lIHNhbml0eSBjaGVja3Mgb24gdGhlIHBhY2tldCBiZWZv cmUgc2VuZGluZw0KKyAqICAgaXQgdG8gaXA2X2J1aWxkX3htaXQoKS4NCisg Kg0KKyAqIFJldHVybjogDQorICogICAwDQorICoqLw0KKw0KK2ludCBpcDZp cDZfdG5sX3htaXQoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3RydWN0IG5ldF9k ZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0cnVj dCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJc3RydWN0IG5ldF9kZXZpY2Vf 
c3RhdHMgKnN0YXRzID0gJnQtPnN0YXQ7DQorCXN0cnVjdCBpcHY2aGRyICpp cHY2aCA9IHNrYi0+bmguaXB2Nmg7DQorCXN0cnVjdCBpcHY2X3R4b3B0aW9u cyAqb3JpZ19vcHQgPSBOVUxMOw0KKwlzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMg Km9wdCA9IE5VTEw7DQorCV9fdTggZW5jYXBfbGltaXQgPSAwOw0KKwlfX3Ux NiBvZmZzZXQ7DQorCXN0cnVjdCBmbG93aSBmbDsNCisJc3RydWN0IGlwNl9m bG93bGFiZWwgKmZsX2xibCA9IE5VTEw7DQorCWludCBlcnIgPSAwOw0KKwlz dHJ1Y3QgZHN0X2VudHJ5ICpkc3Q7DQorCWludCBsaW5rX2ZhaWx1cmUgPSAw Ow0KKwlzdHJ1Y3Qgc29jayAqc2sgPSBpcDZfc29ja2V0LT5zazsNCisJc3Ry dWN0IGlwdjZfcGluZm8gKm5wID0gaW5ldDZfc2soc2spOw0KKwlpbnQgbXR1 Ow0KKw0KKwlpZiAodC0+cmVjdXJzaW9uKyspIHsNCisJCXN0YXRzLT5jb2xs aXNpb25zKys7DQorCQlnb3RvIHR4X2VycjsNCisJfQ0KKwlpZiAoc2tiLT5w cm90b2NvbCAhPSBodG9ucyhFVEhfUF9JUFY2KSB8fA0KKwkgICAgISh0LT5w YXJtcy5mbGFncyAmIElQNl9UTkxfRl9DQVBfWE1JVCkgfHwNCisJICAgIGlw NmlwNl90bmxfYWRkcl9jb25mbGljdCh0LCBpcHY2aCkpIHsNCisJCWdvdG8g dHhfZXJyOw0KKwl9DQorCWlmICgob2Zmc2V0ID0gcGFyc2VfdGx2X3RubF9l bmNfbGltKHNrYiwgc2tiLT5uaC5yYXcpKSA+IDApIHsNCisJCXN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKwkJdGVsID0gKHN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqKSAmc2tiLT5uaC5yYXdbb2Zmc2V0XTsN CisJCWlmICh0ZWwtPmVuY2FwX2xpbWl0IDw9IDEpIHsNCisJCQlpY21wdjZf c2VuZChza2IsIElDTVBWNl9QQVJBTVBST0IsDQorCQkJCSAgICBJQ01QVjZf SERSX0ZJRUxELCBvZmZzZXQgKyAyLCBza2ItPmRldik7DQorCQkJZ290byB0 eF9lcnI7DQorCQl9DQorCQllbmNhcF9saW1pdCA9IHRlbC0+ZW5jYXBfbGlt aXQgLSAxOw0KKwl9IGVsc2UgaWYgKCEodC0+cGFybXMuZmxhZ3MgJiBJUDZf VE5MX0ZfSUdOX0VOQ0FQX0xJTUlUKSkgew0KKwkJZW5jYXBfbGltaXQgPSB0 LT5wYXJtcy5lbmNhcF9saW1pdDsNCisJfQ0KKwlpcDZfeG1pdF9sb2NrKCk7 DQorDQorCW1lbWNweSgmZmwsICZ0LT5mbCwgc2l6ZW9mIChmbCkpOw0KKw0K KwlpZiAoKHQtPnBhcm1zLmZsYWdzICYgSVA2X1ROTF9GX1VTRV9PUklHX1RD TEFTUykpDQorCQlmbC5mbDZfZmxvd2xhYmVsIHw9ICgqKF9fdTMyICopIGlw djZoICYgSVBWNl9UQ0xBU1NfTUFTSyk7DQorCWlmICgodC0+cGFybXMuZmxh Z3MgJiBJUDZfVE5MX0ZfVVNFX09SSUdfRkxPV0xBQkVMKSkNCisJCWZsLmZs Nl9mbG93bGFiZWwgfD0gKCooX191MzIgKikgaXB2NmggJiBJUFY2X0ZMT1dM QUJFTF9NQVNLKTsNCisNCisJaWYgKGZsLmZsNl9mbG93bGFiZWwpIHsNCisJ 
CWZsX2xibCA9IGZsNl9zb2NrX2xvb2t1cChzaywgZmwuZmw2X2Zsb3dsYWJl bCk7DQorCQlpZiAoZmxfbGJsKQ0KKwkJCW9yaWdfb3B0ID0gZmxfbGJsLT5v cHQ7DQorCX0NCisJaWYgKGVuY2FwX2xpbWl0ID4gMCkgew0KKwkJaWYgKCEo b3B0ID0gbWVyZ2Vfb3B0aW9ucyhzaywgZW5jYXBfbGltaXQsIG9yaWdfb3B0 KSkpIHsNCisJCQlnb3RvIHR4X2Vycl9mcmVlX2ZsX2xibDsNCisJCX0NCisJ fSBlbHNlIHsNCisJCW9wdCA9IG9yaWdfb3B0Ow0KKwl9DQorCWRzdCA9IF9f c2tfZHN0X2NoZWNrKHNrLCBucC0+ZHN0X2Nvb2tpZSk7DQorDQorCWlmIChk c3QpIHsNCisJCWlmIChucC0+ZGFkZHJfY2FjaGUgPT0gTlVMTCB8fA0KKwkJ ICAgIGlwdjZfYWRkcl9jbXAoJmZsLmZsNl9kc3QsIG5wLT5kYWRkcl9jYWNo ZSkgfHwNCisJCSAgICAoZmwub2lmICYmIGZsLm9pZiAhPSBkc3QtPmRldi0+ aWZpbmRleCkpIHsNCisJCQlkc3QgPSBOVUxMOw0KKwkJfQ0KKwl9DQorCWlm IChkc3QgPT0gTlVMTCkgew0KKwkJZHN0ID0gaXA2X3JvdXRlX291dHB1dChz aywgJmZsKTsNCisJCWlmIChkc3QtPmVycm9yKSB7DQorCQkJc3RhdHMtPnR4 X2NhcnJpZXJfZXJyb3JzKys7DQorCQkJbGlua19mYWlsdXJlID0gMTsNCisJ CQlnb3RvIHR4X2Vycl9kc3RfcmVsZWFzZTsNCisJCX0NCisJCS8qIGxvY2Fs IHJvdXRpbmcgbG9vcCAqLw0KKwkJaWYgKGRzdC0+ZGV2ID09IGRldikgew0K KwkJCXN0YXRzLT5jb2xsaXNpb25zKys7DQorCQkJaWYgKG5ldF9yYXRlbGlt aXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORyANCisJCQkJICAgICAg ICIlczogTG9jYWwgcm91dGluZyBsb29wIGRldGVjdGVkIVxuIiwNCisJCQkJ ICAgICAgIHQtPnBhcm1zLm5hbWUpOw0KKwkJCWdvdG8gdHhfZXJyX2RzdF9y ZWxlYXNlOw0KKwkJfQ0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5kYWRkciwg JmZsLmZsNl9kc3QpOw0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5zYWRkciwg JmZsLmZsNl9zcmMpOw0KKwl9DQorCW10dSA9IGRzdF9wbXR1KGRzdCkgLSBz aXplb2YgKCppcHY2aCk7DQorCWlmIChvcHQpIHsNCisJCW10dSAtPSAob3B0 LT5vcHRfbmZsZW4gKyBvcHQtPm9wdF9mbGVuKTsNCisJfQ0KKwlpZiAobXR1 IDwgSVBWNl9NSU5fTVRVKQ0KKwkJbXR1ID0gSVBWNl9NSU5fTVRVOw0KKwlp ZiAoc2tiLT5kc3QgJiYgbXR1IDwgZHN0X3BtdHUoc2tiLT5kc3QpKSB7DQor CQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gKHN0cnVjdCBydDZfaW5mbyAqKSBz a2ItPmRzdDsNCisJCXJ0LT5ydDZpX2ZsYWdzIHw9IFJURl9NT0RJRklFRDsN CisJCXJ0LT51LmRzdC5tZXRyaWNzW1JUQVhfTVRVLTFdID0gbXR1Ow0KKwl9 DQorCWlmIChza2ItPmxlbiA+IG10dSkgew0KKwkJaWNtcHY2X3NlbmQoc2ti LCBJQ01QVjZfUEtUX1RPT0JJRywgMCwgbXR1LCBkZXYpOw0KKwkJZ290byB0 
eF9lcnJfb3B0X3JlbGVhc2U7DQorCX0NCisJZXJyID0gaXA2X2FwcGVuZF9k YXRhKHNrLCBpcF9nZW5lcmljX2dldGZyYWcsIHNrYi0+bmgucmF3LCBza2It PmxlbiwgMCwNCisJCQkgICAgICB0LT5wYXJtcy5ob3BfbGltaXQsIG9wdCwg JmZsLCANCisJCQkgICAgICAoc3RydWN0IHJ0Nl9pbmZvICopZHN0LCBNU0df RE9OVFdBSVQpOw0KKw0KKwlpZiAoZXJyKSB7DQorCQlpcDZfZmx1c2hfcGVu ZGluZ19mcmFtZXMoc2spOw0KKwl9IGVsc2Ugew0KKwkJZXJyID0gaXA2X3B1 c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwkJZXJyID0gKGVyciA8IDAgPyBl cnIgOiAwKTsNCisJfQ0KKwlpZiAoIWVycikgew0KKwkJc3RhdHMtPnR4X2J5 dGVzICs9IHNrYi0+bGVuOw0KKwkJc3RhdHMtPnR4X3BhY2tldHMrKzsNCisJ fSBlbHNlIHsNCisJCXN0YXRzLT50eF9lcnJvcnMrKzsNCisJCXN0YXRzLT50 eF9hYm9ydGVkX2Vycm9ycysrOw0KKwl9DQorCWlmIChvcHQgJiYgb3B0ICE9 IG9yaWdfb3B0KQ0KKwkJc29ja19rZnJlZV9zKHNrLCBvcHQsIG9wdC0+dG90 X2xlbik7DQorDQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2 X2RzdF9zdG9yZShzaywgZHN0LCAmbnAtPmRhZGRyKTsNCisJaXA2X3htaXRf dW5sb2NrKCk7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1cnNpb24t LTsNCisJcmV0dXJuIDA7DQordHhfZXJyX2RzdF9yZWxlYXNlOg0KKwlkc3Rf cmVsZWFzZShkc3QpOw0KK3R4X2Vycl9vcHRfcmVsZWFzZToNCisJaWYgKG9w dCAmJiBvcHQgIT0gb3JpZ19vcHQpDQorCQlzb2NrX2tmcmVlX3Moc2ssIG9w dCwgb3B0LT50b3RfbGVuKTsNCit0eF9lcnJfZnJlZV9mbF9sYmw6DQorCWZs Nl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2X3htaXRfdW5sb2NrKCk7 DQorCWlmIChsaW5rX2ZhaWx1cmUpDQorCQlkc3RfbGlua19mYWlsdXJlKHNr Yik7DQordHhfZXJyOg0KKwlzdGF0cy0+dHhfZXJyb3JzKys7DQorCXN0YXRz LT50eF9kcm9wcGVkKys7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1 cnNpb24tLTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlw Nl90bmxfc2V0X2NhcChzdHJ1Y3QgaXA2X3RubCAqdCkNCit7DQorCXN0cnVj dCBpcDZfdG5sX3Bhcm0gKnAgPSAmdC0+cGFybXM7DQorCXN0cnVjdCBpbjZf YWRkciAqbGFkZHIgPSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAq cmFkZHIgPSAmcC0+cmFkZHI7DQorCWludCBsdHlwZSA9IGlwdjZfYWRkcl90 eXBlKGxhZGRyKTsNCisJaW50IHJ0eXBlID0gaXB2Nl9hZGRyX3R5cGUocmFk ZHIpOw0KKw0KKwlwLT5mbGFncyAmPSB+KElQNl9UTkxfRl9DQVBfWE1JVHxJ UDZfVE5MX0ZfQ0FQX1JDVik7DQorDQorCWlmIChsdHlwZSAhPSBJUFY2X0FE RFJfQU5ZICYmIHJ0eXBlICE9IElQVjZfQUREUl9BTlkgJiYNCisJICAgICgo 
bHR5cGV8cnR5cGUpICYNCisJICAgICAoSVBWNl9BRERSX1VOSUNBU1R8DQor CSAgICAgIElQVjZfQUREUl9MT09QQkFDS3xJUFY2X0FERFJfTElOS0xPQ0FM fA0KKwkgICAgICBJUFY2X0FERFJfTUFQUEVEfElQVjZfQUREUl9SRVNFUlZF RCkpID09IElQVjZfQUREUl9VTklDQVNUKSB7DQorCQlzdHJ1Y3QgbmV0X2Rl dmljZSAqbGRldiA9IE5VTEw7DQorCQlpbnQgbF9vayA9IDE7DQorCQlpbnQg cl9vayA9IDE7DQorDQorCQlpZiAocC0+bGluaykNCisJCQlsZGV2ID0gZGV2 X2dldF9ieV9pbmRleChwLT5saW5rKTsNCisJCQ0KKwkJaWYgKChsdHlwZSZJ UFY2X0FERFJfVU5JQ0FTVCkgJiYgIWlwdjZfY2hrX2FkZHIobGFkZHIsIGxk ZXYpKQ0KKwkJCWxfb2sgPSAwOw0KKwkJDQorCQlpZiAoKHJ0eXBlJklQVjZf QUREUl9VTklDQVNUKSAmJiBpcHY2X2Noa19hZGRyKHJhZGRyLCBOVUxMKSkN CisJCQlyX29rID0gMDsNCisJCQ0KKwkJaWYgKGxfb2sgJiYgcl9vaykgew0K KwkJCWlmIChsdHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+Zmxh Z3MgfD0gSVA2X1ROTF9GX0NBUF9YTUlUOw0KKwkJCWlmIChydHlwZSZJUFY2 X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+ZmxhZ3MgfD0gSVA2X1ROTF9GX0NB UF9SQ1Y7DQorCQl9DQorCQlpZiAobGRldikNCisJCQlkZXZfcHV0KGxkZXYp Ow0KKwl9DQorfQ0KKw0KKw0KK3N0YXRpYyB2b2lkIGlwNmlwNl90bmxfbGlu a19jb25maWcoc3RydWN0IGlwNl90bmwgKnQpDQorew0KKwlzdHJ1Y3QgbmV0 X2RldmljZSAqZGV2ID0gdC0+ZGV2Ow0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJt ICpwID0gJnQtPnBhcm1zOw0KKwlzdHJ1Y3QgZmxvd2kgKmZsOw0KKwkvKiBT ZXQgdXAgZmxvd2kgdGVtcGxhdGUgKi8NCisJZmwgPSAmdC0+Zmw7DQorCWlw djZfYWRkcl9jb3B5KCZmbC0+Zmw2X3NyYywgJnAtPmxhZGRyKTsNCisJaXB2 Nl9hZGRyX2NvcHkoJmZsLT5mbDZfZHN0LCAmcC0+cmFkZHIpOw0KKwlmbC0+ b2lmID0gcC0+bGluazsNCisJZmwtPmZsNl9mbG93bGFiZWwgPSAwOw0KKw0K KwlpZiAoIShwLT5mbGFncyZJUDZfVE5MX0ZfVVNFX09SSUdfVENMQVNTKSkN CisJCWZsLT5mbDZfZmxvd2xhYmVsIHw9IElQVjZfVENMQVNTX01BU0sgJiBo dG9ubChwLT5mbG93aW5mbyk7DQorCWlmICghKHAtPmZsYWdzJklQNl9UTkxf Rl9VU0VfT1JJR19GTE9XTEFCRUwpKQ0KKwkJZmwtPmZsNl9mbG93bGFiZWwg fD0gSVBWNl9GTE9XTEFCRUxfTUFTSyAmIGh0b25sKHAtPmZsb3dpbmZvKTsN CisNCisJaXA2X3RubF9zZXRfY2FwKHQpOw0KKw0KKwlpZiAocC0+ZmxhZ3Mm SVA2X1ROTF9GX0NBUF9YTUlUICYmIHAtPmZsYWdzJklQNl9UTkxfRl9DQVBf UkNWKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7DQorCWVs c2UNCisJCWRldi0+ZmxhZ3MgJj0gfklGRl9QT0lOVE9QT0lOVDsNCisNCisJ 
aWYgKHAtPmZsYWdzICYgSVA2X1ROTF9GX0NBUF9YTUlUKSB7DQorCQlzdHJ1 Y3QgcnQ2X2luZm8gKnJ0ID0gcnQ2X2xvb2t1cCgmcC0+cmFkZHIsICZwLT5s YWRkciwNCisJCQkJCQkgcC0+bGluaywgMCk7DQorCQlpZiAocnQpIHsNCisJ CQlzdHJ1Y3QgbmV0X2RldmljZSAqcnRkZXY7DQorCQkJaWYgKCEocnRkZXYg PSBydC0+cnQ2aV9kZXYpIHx8DQorCQkJICAgIHJ0ZGV2LT50eXBlID09IEFS UEhSRF9UVU5ORUw2KSB7DQorCQkJCS8qIGFzIGxvbmcgYXMgdHVubmVscyB1 c2UgdGhlIHNhbWUgc29ja2V0IA0KKwkJCQkgICBmb3IgdHJhbnNtaXNzaW9u LCBsb2NhbGx5IG5lc3RlZCB0dW5uZWxzIA0KKwkJCQkgICB3b24ndCB3b3Jr ICovDQorCQkJCWRzdF9yZWxlYXNlKCZydC0+dS5kc3QpOw0KKwkJCQlnb3Rv IG5vX2xpbms7DQorCQkJfSBlbHNlIHsNCisJCQkJZGV2LT5pZmxpbmsgPSBy dGRldi0+aWZpbmRleDsNCisJCQkJZGV2LT5oYXJkX2hlYWRlcl9sZW4gPSBy dGRldi0+aGFyZF9oZWFkZXJfbGVuICsNCisJCQkJCXNpemVvZiAoc3RydWN0 IGlwdjZoZHIpOw0KKwkJCQlkZXYtPm10dSA9IHJ0ZGV2LT5tdHUgLSBzaXpl b2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCQkJaWYgKGRldi0+bXR1IDwgSVBW Nl9NSU5fTVRVKQ0KKwkJCQkJZGV2LT5tdHUgPSBJUFY2X01JTl9NVFU7DQor CQkJCQ0KKwkJCQlkc3RfcmVsZWFzZSgmcnQtPnUuZHN0KTsNCisJCQl9DQor CQl9DQorCX0gZWxzZSB7DQorCW5vX2xpbms6DQorCQlkZXYtPmlmbGluayA9 IDA7DQorCQlkZXYtPmhhcmRfaGVhZGVyX2xlbiA9IExMX01BWF9IRUFERVIg KyBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCWRldi0+bXR1ID0gRVRI X0RBVEFfTEVOIC0gc2l6ZW9mIChzdHJ1Y3QgaXB2Nmhkcik7DQorCX0NCit9 DQorDQorLyoqDQorICogaXA2aXA2X3RubF9jaGFuZ2UgLSB1cGRhdGUgdGhl IHR1bm5lbCBwYXJhbWV0ZXJzDQorICogICBAdDogdHVubmVsIHRvIGJlIGNo YW5nZWQNCisgKiAgIEBwOiB0dW5uZWwgY29uZmlndXJhdGlvbiBwYXJhbWV0 ZXJzDQorICogICBAYWN0aXZlOiAhPSAwIGlmIHR1bm5lbCBpcyByZWFkeSBm b3IgdXNlDQorICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIGlwNmlwNl90 bmxfY2hhbmdlKCkgdXBkYXRlcyB0aGUgdHVubmVsIHBhcmFtZXRlcnMNCisg KiovDQorDQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfY2hhbmdlKHN0cnVj dCBpcDZfdG5sICp0LCBzdHJ1Y3QgaXA2X3RubF9wYXJtICpwKQ0KK3sNCisJ aXB2Nl9hZGRyX2NvcHkoJnQtPnBhcm1zLmxhZGRyLCAmcC0+bGFkZHIpOw0K KwlpcHY2X2FkZHJfY29weSgmdC0+cGFybXMucmFkZHIsICZwLT5yYWRkcik7 DQorCXQtPnBhcm1zLmZsYWdzID0gcC0+ZmxhZ3M7DQorCXQtPnBhcm1zLmhv cF9saW1pdCA9IChwLT5ob3BfbGltaXQgPD0gMjU1ID8gcC0+aG9wX2xpbWl0 
IDogLTEpOw0KKwl0LT5wYXJtcy5lbmNhcF9saW1pdCA9IHAtPmVuY2FwX2xp bWl0Ow0KKwl0LT5wYXJtcy5mbG93aW5mbyA9IHAtPmZsb3dpbmZvOw0KKwlp cDZpcDZfdG5sX2xpbmtfY29uZmlnKHQpOw0KKwlyZXR1cm4gMDsNCit9DQor DQorLyoqDQorICogaXA2aXA2X3RubF9pb2N0bCAtIGNvbmZpZ3VyZSBpcHY2 IHR1bm5lbHMgZnJvbSB1c2Vyc3BhY2UgDQorICogICBAZGV2OiB2aXJ0dWFs IGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICogICBAaWZyOiBw YXJhbWV0ZXJzIHBhc3NlZCBmcm9tIHVzZXJzcGFjZQ0KKyAqICAgQGNtZDog Y29tbWFuZCB0byBiZSBwZXJmb3JtZWQNCisgKg0KKyAqIERlc2NyaXB0aW9u Og0KKyAqICAgaXA2aXA2X3RubF9pb2N0bCgpIGlzIHVzZWQgZm9yIG1hbmFn aW5nIElQdjYgdHVubmVscyANCisgKiAgIGZyb20gdXNlcnNwYWNlLiANCisg Kg0KKyAqICAgVGhlIHBvc3NpYmxlIGNvbW1hbmRzIGFyZSB0aGUgZm9sbG93 aW5nOg0KKyAqICAgICAlU0lPQ0dFVFRVTk5FTDogZ2V0IHR1bm5lbCBwYXJh bWV0ZXJzIGZvciBkZXZpY2UNCisgKiAgICAgJVNJT0NBRERUVU5ORUw6IGFk ZCB0dW5uZWwgbWF0Y2hpbmcgZ2l2ZW4gdHVubmVsIHBhcmFtZXRlcnMNCisg KiAgICAgJVNJT0NDSEdUVU5ORUw6IGNoYW5nZSB0dW5uZWwgcGFyYW1ldGVy cyB0byB0aG9zZSBnaXZlbg0KKyAqICAgICAlU0lPQ0RFTFRVTk5FTDogZGVs ZXRlIHR1bm5lbA0KKyAqDQorICogICBUaGUgZmFsbGJhY2sgZGV2aWNlICJp cDZ0bmwwIiwgY3JlYXRlZCBkdXJpbmcgbW9kdWxlIA0KKyAqICAgaW5pdGlh bGl6YXRpb24sIGNhbiBiZSB1c2VkIGZvciBjcmVhdGluZyBvdGhlciB0dW5u ZWwgZGV2aWNlcy4NCisgKg0KKyAqIFJldHVybjoNCisgKiAgIDAgb24gc3Vj Y2VzcywNCisgKiAgICUtRUZBVUxUIGlmIHVuYWJsZSB0byBjb3B5IGRhdGEg dG8gb3IgZnJvbSB1c2Vyc3BhY2UsDQorICogICAlLUVQRVJNIGlmIGN1cnJl bnQgcHJvY2VzcyBoYXNuJ3QgJUNBUF9ORVRfQURNSU4gc2V0DQorICogICAl LUVJTlZBTCBpZiBwYXNzZWQgdHVubmVsIHBhcmFtZXRlcnMgYXJlIGludmFs aWQsDQorICogICAlLUVFWElTVCBpZiBjaGFuZ2luZyBhIHR1bm5lbCdzIHBh cmFtZXRlcnMgd291bGQgY2F1c2UgYSBjb25mbGljdA0KKyAqICAgJS1FTk9E RVYgaWYgYXR0ZW1wdGluZyB0byBjaGFuZ2Ugb3IgZGVsZXRlIGEgbm9uZXhp c3RpbmcgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZf dG5sX2lvY3RsKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYsIHN0cnVjdCBpZnJl cSAqaWZyLCBpbnQgY21kKQ0KK3sNCisJaW50IGVyciA9IDA7DQorCWludCBj cmVhdGU7DQorCXN0cnVjdCBpcDZfdG5sX3Bhcm0gcDsNCisJc3RydWN0IGlw Nl90bmwgKnQgPSBOVUxMOw0KKw0KKwlzd2l0Y2ggKGNtZCkgew0KKwljYXNl 
IFNJT0NHRVRUVU5ORUw6DQorCQlpZiAoZGV2ID09ICZpcDZpcDZfZmJfdG5s X2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21fdXNlcigmcCwNCisJCQkJCSAg IGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLA0KKwkJCQkJICAgc2l6ZW9mIChw KSkpIHsNCisJCQkJZXJyID0gLUVGQVVMVDsNCisJCQkJYnJlYWs7DQorCQkJ fQ0KKwkJCWlmICgoZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAsICZ0LCAw KSkgPT0gLUVOT0RFVikNCisJCQkJdCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBk ZXYtPnByaXY7DQorCQkJZWxzZSBpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJ CX0gZWxzZQ0KKwkJCXQgPSAoc3RydWN0IGlwNl90bmwgKikgZGV2LT5wcml2 Ow0KKw0KKwkJbWVtY3B5KCZwLCAmdC0+cGFybXMsIHNpemVvZiAocCkpOw0K KwkJaWYgKGNvcHlfdG9fdXNlcihpZnItPmlmcl9pZnJ1LmlmcnVfZGF0YSwg JnAsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJCX0N CisJCWJyZWFrOw0KKwljYXNlIFNJT0NBRERUVU5ORUw6DQorCWNhc2UgU0lP Q0NIR1RVTk5FTDoNCisJCWVyciA9IC1FUEVSTTsNCisJCWNyZWF0ZSA9IChj bWQgPT0gU0lPQ0FERFRVTk5FTCk7DQorCQlpZiAoIWNhcGFibGUoQ0FQX05F VF9BRE1JTikpDQorCQkJYnJlYWs7DQorCQlpZiAoY29weV9mcm9tX3VzZXIo JnAsIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLCBzaXplb2YgKHApKSkgew0K KwkJCWVyciA9IC1FRkFVTFQ7DQorCQkJYnJlYWs7DQorCQl9DQorCQlpZiAo IWNyZWF0ZSAmJiBkZXYgIT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQkJ dCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQlp ZiAoIXQgJiYgKGVyciA9IGlwNmlwNl90bmxfbG9jYXRlKCZwLCAmdCwgY3Jl YXRlKSkpIHsNCisJCQlicmVhazsNCisJCX0NCisJCWlmIChjbWQgPT0gU0lP Q0NIR1RVTk5FTCkgew0KKwkJCWlmICh0LT5kZXYgIT0gZGV2KSB7DQorCQkJ CWVyciA9IC1FRVhJU1Q7DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJCQlpcDZp cDZfdG5sX3VubGluayh0KTsNCisJCQllcnIgPSBpcDZpcDZfdG5sX2NoYW5n ZSh0LCAmcCk7DQorCQkJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkJCW5ldGRl dl9zdGF0ZV9jaGFuZ2UoZGV2KTsNCisJCX0NCisJCWlmIChjb3B5X3RvX3Vz ZXIoaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCSAmdC0+cGFybXMs IHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJCX0gZWxz ZSB7DQorCQkJZXJyID0gMDsNCisJCX0NCisJCWJyZWFrOw0KKwljYXNlIFNJ T0NERUxUVU5ORUw6DQorCQllcnIgPSAtRVBFUk07DQorCQlpZiAoIWNhcGFi bGUoQ0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorDQorCQlpZiAoZGV2 ID09ICZpcDZpcDZfZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21f 
dXNlcigmcCwgaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCQkgICBz aXplb2YgKHApKSkgew0KKwkJCQllcnIgPSAtRUZBVUxUOw0KKwkJCQlicmVh azsNCisJCQl9DQorCQkJZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAsICZ0 LCAwKTsNCisJCQlpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJCQlpZiAodCA9 PSAmaXA2aXA2X2ZiX3RubCkgew0KKwkJCQllcnIgPSAtRVBFUk07DQorCQkJ CWJyZWFrOw0KKwkJCX0NCisJCX0gZWxzZSB7DQorCQkJdCA9IChzdHJ1Y3Qg aXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQllcnIgPSBpcDZfdG5s X2Rlc3Ryb3kodCk7DQorCQlicmVhazsNCisJZGVmYXVsdDoNCisJCWVyciA9 IC1FSU5WQUw7DQorCX0NCisJcmV0dXJuIGVycjsNCit9DQorDQorLyoqDQor ICogaXA2aXA2X3RubF9nZXRfc3RhdHMgLSByZXR1cm4gdGhlIHN0YXRzIGZv ciB0dW5uZWwgZGV2aWNlIA0KKyAqICAgQGRldjogdmlydHVhbCBkZXZpY2Ug YXNzb2NpYXRlZCB3aXRoIHR1bm5lbA0KKyAqDQorICogUmV0dXJuOiBzdGF0 cyBmb3IgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBzdHJ1Y3QgbmV0X2Rl dmljZV9zdGF0cyAqDQoraXA2aXA2X3RubF9nZXRfc3RhdHMoc3RydWN0IG5l dF9kZXZpY2UgKmRldikNCit7DQorCXJldHVybiAmKCgoc3RydWN0IGlwNl90 bmwgKikgZGV2LT5wcml2KS0+c3RhdCk7DQorfQ0KKw0KKy8qKg0KKyAqIGlw NmlwNl90bmxfY2hhbmdlX210dSAtIGNoYW5nZSBtdHUgbWFudWFsbHkgZm9y IHR1bm5lbCBkZXZpY2UNCisgKiAgIEBkZXY6IHZpcnR1YWwgZGV2aWNlIGFz c29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiAgIEBuZXdfbXR1OiB0aGUgbmV3 IG10dQ0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgMCBvbiBzdWNjZXNzLA0K KyAqICAgJS1FSU5WQUwgaWYgbXR1IHRvbyBzbWFsbA0KKyAqKi8NCisNCitz dGF0aWMgaW50DQoraXA2aXA2X3RubF9jaGFuZ2VfbXR1KHN0cnVjdCBuZXRf ZGV2aWNlICpkZXYsIGludCBuZXdfbXR1KQ0KK3sNCisJaWYgKG5ld19tdHUg PCBJUFY2X01JTl9NVFUpIHsNCisJCXJldHVybiAtRUlOVkFMOw0KKwl9DQor CWRldi0+bXR1ID0gbmV3X210dTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KKy8q Kg0KKyAqIGlwNmlwNl90bmxfZGV2X2luaXRfZ2VuIC0gZ2VuZXJhbCBpbml0 aWFsaXplciBmb3IgYWxsIHR1bm5lbCBkZXZpY2VzDQorICogICBAZGV2OiB2 aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICoNCisg KiBEZXNjcmlwdGlvbjoNCisgKiAgIFNldCBmdW5jdGlvbiBwb2ludGVycyBh bmQgaW5pdGlhbGl6ZSB0aGUgJnN0cnVjdCBmbG93aSB0ZW1wbGF0ZSB1c2Vk DQorICogICBieSB0aGUgdHVubmVsLg0KKyAqKi8NCisNCitzdGF0aWMgdm9p ZA0KK2lwNmlwNl90bmxfZGV2X2luaXRfZ2VuKHN0cnVjdCBuZXRfZGV2aWNl 
ICpkZXYpDQorew0KKwlzdHJ1Y3QgaXA2X3RubCAqdCA9IChzdHJ1Y3QgaXA2 X3RubCAqKSBkZXYtPnByaXY7DQorCXN0cnVjdCBmbG93aSAqZmwgPSAmdC0+ Zmw7DQorDQorCW1lbXNldChmbCwgMCwgc2l6ZW9mICgqZmwpKTsNCisJZmwt PnByb3RvID0gSVBQUk9UT19JUFY2Ow0KKw0KKwlkZXYtPmRlc3RydWN0b3Ig PSBpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yOw0KKwlkZXYtPnVuaW5pdCA9 IGlwNmlwNl90bmxfZGV2X3VuaW5pdDsNCisJZGV2LT5oYXJkX3N0YXJ0X3ht aXQgPSBpcDZpcDZfdG5sX3htaXQ7DQorCWRldi0+Z2V0X3N0YXRzID0gaXA2 aXA2X3RubF9nZXRfc3RhdHM7DQorCWRldi0+ZG9faW9jdGwgPSBpcDZpcDZf dG5sX2lvY3RsOw0KKwlkZXYtPmNoYW5nZV9tdHUgPSBpcDZpcDZfdG5sX2No YW5nZV9tdHU7DQorCWRldi0+dHlwZSA9IEFSUEhSRF9UVU5ORUw2Ow0KKwlk ZXYtPmZsYWdzIHw9IElGRl9OT0FSUDsNCisJaWYgKGlwdjZfYWRkcl90eXBl KCZ0LT5wYXJtcy5yYWRkcikgJiBJUFY2X0FERFJfVU5JQ0FTVCAmJg0KKwkg ICAgaXB2Nl9hZGRyX3R5cGUoJnQtPnBhcm1zLmxhZGRyKSAmIElQVjZfQURE Ul9VTklDQVNUKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7 DQorCS8qIEhtbS4uLiBNQVhfQUREUl9MRU4gaXMgOCwgc28gdGhlIGlwdjYg YWRkcmVzc2VzIGNhbid0IGJlIA0KKwkgICBjb3BpZWQgdG8gZGV2LT5kZXZf YWRkciBhbmQgZGV2LT5icm9hZGNhc3QsIGxpa2UgdGhlIGlwdjQNCisJICAg YWRkcmVzc2VzIHdlcmUgaW4gaXBpcC5jLCBpcF9ncmUuYyBhbmQgc2l0LmMu ICovDQorCWRldi0+YWRkcl9sZW4gPSAwOw0KK30NCisNCisvKioNCisgKiBp cDZpcDZfdG5sX2Rldl9pbml0IC0gaW5pdGlhbGl6ZXIgZm9yIGFsbCBub24g ZmFsbGJhY2sgdHVubmVsIGRldmljZXMNCisgKiAgIEBkZXY6IHZpcnR1YWwg ZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiovDQorDQorc3Rh dGljIGludA0KK2lwNmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZp Y2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0cnVjdCBp cDZfdG5sICopIGRldi0+cHJpdjsNCisJaXA2aXA2X3RubF9kZXZfaW5pdF9n ZW4oZGV2KTsNCisJaXA2aXA2X3RubF9saW5rX2NvbmZpZyh0KTsNCisJcmV0 dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9mYl90bmxfZGV2X2lu aXQgLSBpbml0aWFsaXplciBmb3IgZmFsbGJhY2sgdHVubmVsIGRldmljZQ0K KyAqICAgQGRldjogZmFsbGJhY2sgZGV2aWNlDQorICoNCisgKiBSZXR1cm46 IDANCisgKiovDQorDQoraW50IGlwNmlwNl9mYl90bmxfZGV2X2luaXQoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCWlwNmlwNl90bmxfZGV2X2lu aXRfZ2VuKGRldik7DQorCXRubHNfd2NbMF0gPSAmaXA2aXA2X2ZiX3RubDsN 
CisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyBzdHJ1Y3QgaW5ldDZfcHJv dG9jb2wgaXA2aXA2X3Byb3RvY29sID0gew0KKwkuaGFuZGxlciA9IGlwNmlw Nl9yY3YsDQorCS5lcnJfaGFuZGxlciA9IGlwNmlwNl9lcnIsDQorCS5mbGFn cyA9IElORVQ2X1BST1RPX0ZJTkFMDQorfTsNCisNCisvKioNCisgKiBpcDZf dHVubmVsX2luaXQgLSByZWdpc3RlciBwcm90b2NvbCBhbmQgcmVzZXJ2ZSBu ZWVkZWQgcmVzb3VyY2VzDQorICoNCisgKiBSZXR1cm46IDAgb24gc3VjY2Vz cw0KKyAqKi8NCisNCitpbnQgX19pbml0IGlwNl90dW5uZWxfaW5pdCh2b2lk KQ0KK3sNCisJaW50IGksIGosIGVycjsNCisJc3RydWN0IHNvY2sgKnNrOw0K KwlzdHJ1Y3QgaXB2Nl9waW5mbyAqbnA7DQorDQorCWlwNmlwNl9mYl90bmxf ZGV2LnByaXYgPSAodm9pZCAqKSAmaXA2aXA2X2ZiX3RubDsNCisNCisJZm9y IChpID0gMDsgaSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFjcHVfcG9z c2libGUoaSkpDQorCQkJY29udGludWU7DQorDQorCQllcnIgPSBzb2NrX2Ny ZWF0ZShQRl9JTkVUNiwgU09DS19SQVcsIElQUFJPVE9fSVBWNiwgDQorCQkJ CSAgJl9faXA2X3NvY2tldFtpXSk7DQorCQlpZiAoZXJyIDwgMCkgew0KKwkJ CXByaW50ayhLRVJOX0VSUiANCisJCQkgICAgICAgIkZhaWxlZCB0byBjcmVh dGUgdGhlIElQdjYgdHVubmVsIHNvY2tldCAiDQorCQkJICAgICAgICIoZXJy ICVkKS5cbiIsIA0KKwkJCSAgICAgICBlcnIpOw0KKwkJCWdvdG8gZmFpbDsN CisJCX0NCisJCXNrID0gX19pcDZfc29ja2V0W2ldLT5zazsNCisJCXNrLT5h bGxvY2F0aW9uID0gR0ZQX0FUT01JQzsNCisNCisJCW5wID0gaW5ldDZfc2so c2spOw0KKwkJbnAtPmhvcF9saW1pdCA9IDI1NTsNCisJCW5wLT5tY19sb29w ID0gMDsNCisNCisJCXNrLT5wcm90LT51bmhhc2goc2spOw0KKwl9DQorCWlm ICgoZXJyID0gaW5ldDZfYWRkX3Byb3RvY29sKCZpcDZpcDZfcHJvdG9jb2ws IElQUFJPVE9fSVBWNikpIDwgMCkgew0KKwkJcHJpbnRrKEtFUk5fRVJSICJG YWlsZWQgdG8gcmVnaXN0ZXIgSVB2NiBwcm90b2NvbFxuIik7DQorCQlnb3Rv IGZhaWw7DQorCX0NCisNCisJU0VUX01PRFVMRV9PV05FUigmaXA2aXA2X2Zi X3RubF9kZXYpOw0KKwlyZWdpc3Rlcl9uZXRkZXYoJmlwNmlwNl9mYl90bmxf ZGV2KTsNCisNCisJcmV0dXJuIDA7DQorZmFpbDoNCisJZm9yIChqID0gMDsg aiA8IGk7IGorKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaikpDQorCQkJ Y29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0W2pdKTsN CisJCV9faXA2X3NvY2tldFtqXSA9IE5VTEw7DQorCX0NCisJcmV0dXJuIGVy cjsNCit9DQorDQorLyoqDQorICogaXA2X3R1bm5lbF9jbGVhbnVwIC0gZnJl ZSByZXNvdXJjZXMgYW5kIHVucmVnaXN0ZXIgcHJvdG9jb2wNCisgKiovDQor 
DQordm9pZCBpcDZfdHVubmVsX2NsZWFudXAodm9pZCkNCit7DQorCWludCBp Ow0KKw0KKwl1bnJlZ2lzdGVyX25ldGRldigmaXA2aXA2X2ZiX3RubF9kZXYp Ow0KKw0KKwlpbmV0Nl9kZWxfcHJvdG9jb2woJmlwNmlwNl9wcm90b2NvbCwg SVBQUk9UT19JUFY2KTsNCisNCisJZm9yIChpID0gMDsgaSA8IE5SX0NQVVM7 IGkrKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaSkpDQorCQkJY29udGlu dWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0W2ldKTsNCisJCV9f aXA2X3NvY2tldFtpXSA9IE5VTEw7DQorCX0NCit9DQorDQorI2lmZGVmIE1P RFVMRQ0KK21vZHVsZV9pbml0KGlwNl90dW5uZWxfaW5pdCk7DQorbW9kdWxl X2V4aXQoaXA2X3R1bm5lbF9jbGVhbnVwKTsNCisjZW5kaWYNCmRpZmYgLU51 ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhjbHVk ZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L2lwdjZfc3ltcy5jIG1l cmdlLTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYw0KLS0tIGxpbnV4LTIuNS9u ZXQvaXB2Ni9pcHY2X3N5bXMuYwlXZWQgSnVuICA0IDEzOjQzOjA5IDIwMDMN CisrKyBtZXJnZS0yLjUvbmV0L2lwdjYvaXB2Nl9zeW1zLmMJV2VkIE1heSAy OCAyMToxMjowMiAyMDAzDQpAQCAtMzgsMyArMzgsMTEgQEANCiBFWFBPUlRf U1lNQk9MKGlwNl9mb3VuZF9uZXh0aGRyKTsNCiBFWFBPUlRfU1lNQk9MKHhm cm02X3Jjdik7DQogRVhQT1JUX1NZTUJPTCh4ZnJtNl9jbGVhcl9tdXRhYmxl X29wdGlvbnMpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUxfTU9EVUxF DQorRVhQT1JUX1NZTUJPTChydDZfbG9va3VwKTsNCitFWFBPUlRfU1lNQk9M KGZsNl9zb2NrX2xvb2t1cCk7DQorRVhQT1JUX1NZTUJPTChpcHY2X2V4dF9o ZHIpOw0KK0VYUE9SVF9TWU1CT0woaXA2X2FwcGVuZF9kYXRhKTsNCitFWFBP UlRfU1lNQk9MKGlwNl9mbHVzaF9wZW5kaW5nX2ZyYW1lcyk7DQorRVhQT1JU X1NZTUJPTChpcDZfcHVzaF9wZW5kaW5nX2ZyYW1lcyk7DQorI2VuZGlmDQpk aWZmIC1OdXIgLS1leGNsdWRlPVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAt LWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4LTIuNS9uZXQvbmV0c3ltcy5jIG1l cmdlLTIuNS9uZXQvbmV0c3ltcy5jDQotLS0gbGludXgtMi41L25ldC9uZXRz eW1zLmMJV2VkIEp1biAgNCAxMzo0MzoxMCAyMDAzDQorKysgbWVyZ2UtMi41 L25ldC9uZXRzeW1zLmMJV2VkIE1heSAyOCAyMToxMjowMiAyMDAzDQpAQCAt NDc3LDggKzQ3NywxMCBAQA0KIEVYUE9SVF9TWU1CT0woc3lzY3RsX21heF9z eW5fYmFja2xvZyk7DQogI2VuZGlmDQogDQotRVhQT1JUX1NZTUJPTChpcF9n ZW5lcmljX2dldGZyYWcpOw0KKyNlbmRpZg0KIA0KKyNpZiBkZWZpbmVkIChD T05GSUdfSVBWNl9NT0RVTEUpIHx8IGRlZmluZWQgKENPTkZJR19JUF9TQ1RQ 
X01PRFVMRSkgfHwgZGVmaW5lZCAoQ09ORklHX0lQVjZfVFVOTkVMX01PRFVM RSkNCitFWFBPUlRfU1lNQk9MKGlwX2dlbmVyaWNfZ2V0ZnJhZyk7DQogI2Vu ZGlmDQogDQogRVhQT1JUX1NZTUJPTCh0Y3BfcmVhZF9zb2NrKTsNCg==
---377318441-1269789112-1054730402=:26066--

From lpetande@tml.hut.fi Wed Jun 4 07:23:44 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 07:23:58 -0700 (PDT)
Received: from smtp-4.hut.fi (root@smtp-4.hut.fi [130.233.228.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54ENM2x003731 for ; Wed, 4 Jun 2003 07:23:43 -0700
Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-4.hut.fi (8.12.9/8.12.9) with ESMTP id h54EMgDD011293; Wed, 4 Jun 2003 17:22:42 +0300
Message-ID: <3EDE0286.4000304@tml.hut.fi>
Date: Wed, 04 Jun 2003 17:30:30 +0300
From: Henrik Petander
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S. Miller"
CC: yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> <20030603.213830.85382657.davem@redhat.com>
In-Reply-To: <20030603.213830.85382657.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-4.hut.fi)
X-DCC-HUTCC-Metrics: smtp-4.hut.fi 1165; Body=9 Fuz1=9 Fuz2=9
X-archive-position: 2877
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: lpetande@tml.hut.fi
Precedence: bulk
X-list: netdev
Content-Length: 3566
Lines: 81

David S. Miller wrote:
> I am VERY UPSET that there appears to be NO dialogue between USAGI and
> MIPV6 folks to discuss design of MIPV6.
If you do not talk together,
> how can you guys possibly coordinate efforts and not avoid duplicated
> work?

I am sorry about the long delay in the extension header addition mechanism discussion. The issues involved with interactions between xfrm stacked destinations and mipv6 were at least to me somewhat fuzzy. To understand them better we developed a prototype of the mipv6 extension header addition and processing. This unfortunately took too long, due to other work. In any case, I hope we can all work better together from now on.

In hopes of starting a working dialogue, I'll try to summarize the current situation on our side.

1. Tunnel

The tunneling code should be ready, Ville just sent a patch without dependencies on source address based routing. All received and future comments about that are highly appreciated.

2. Source address based routing

Ville sent the patch. The semantical changes to the original code were in our opinion necessary to get source address based routing working more as IPv4 policy routing. Let's discuss this more.

3. API

We've added kernel support for the API to accept routing header type 2 and to do the additional checks necessary. Also home address option API support has been written. In fact, you can add any destination option to the new DO position with this.

There is a (sub)group working on MIPv6 extensions to the Advanced Socket API for IPv6. To me it seems pointless to add anything other to the spec than a way to insert a destination option header to the third possible DO position (i.e. between routing header and fragmentation header). This could be done just by adding new type, let's call it IPV6_NOFRAGDSTOPTS. Everything else should be doable with the existing ASA. We would like to hear comments on this.

Is a rtnetlink extension enough for adding mobility routes or do we need to support ioctl too?

4.
Source address selection

We think adding new home address flag to addresses is the best and easiest way making the source address selection to work with MIPv6. I'm sure USAGI will add the relevant checks to their source address selection code for that. Dave, Antti already brought this up some weeks ago, but got no answer. Is the home address bit OK with you?

5. MIPv6 extension header adding

We have also been testing how the mipv6 extension header adding would work in practice through the development of a prototype for the purpose. Based on the work it seems (to me) that the use of xfrm for storing the mipv6 stuff conflicts with its primary use, especially if there are overlapping entries for IPSec and MIPv6.

Storing of the mipv6 information would in our opinion be achieved more cleanly by using cached routes which included the mipv6 information (two extra addresses and flags). The routes would contain modified nexthop information and mip6_output as the rt->u.dst.output function. Mip6_output would add the extension headers based on the information stored in the route. The routes would have a stacked dst entry, which would be used for actual output.

Our prototype currently works with tcp, tcp + ipsec and raw sockets, but has only a hackish interface through route ioctl for testing. I can send a preview patch of the code for discussion, if the general approach makes sense to you. I would like to hear your opinions on this and also if you (USAGI) have planned something else for storing the mipv6 state.

Henrik

From peter@bieringer.de Wed Jun 4 08:24:28 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:24:37 -0700 (PDT)
Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54FOH2x008182 for ; Wed, 4 Jun 2003 08:24:18 -0700
Received: by smtp2.aerasec.de (Postfix, from userid 30110) id 88F0D1387A; Wed, 4 Jun 2003 16:53:50 +0200 (CEST)
From: "Dr.
Peter Bieringer "
To: "Maillist netdev"
Cc: "Maillist USAGI-users"
Subject: Compatibility problems IPsec 2.5.70 against FreeS/WAN 1.99
Date: Wed, 04 Jun 2003 16:53:50 +0200
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <20030604145350.88F0D1387A@smtp2.aerasec.de>
X-archive-position: 2878
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: pb@bieringer.de
Precedence: bulk
X-list: netdev
Content-Length: 1275
Lines: 38

Hi,

does anyone have successful examples of configuration settings for 2.5.70 IPsec (racoon/SAD/SPD) and FreeS/WAN?

I had no success between 2 hosts, neither in tunnel nor in transport mode. (The racoon and pluto configs look OK, the IPsec-SA was properly established, and both hosts send packets with the related spi.)

In transport mode, the comment of Andreas came true that in the ESP packet an IP-in-IP tunnel packet is transported (sent from the 2.5.70-ipsec host):

16:42:06.215546 [|ip]
0x0000 45 E
16:42:08.215348 [|ip]
0x0000 4500 0007 0004 40 E.....@

Looks like FreeS/WAN doesn't like this.

In tunnel mode, the ipsec0 interface of FreeS/WAN drops all packets received from the 2.5.70-ipsec host (seen in ifconfig stat). On the 2.5.70-ipsec side I currently don't know how to debug, but I only see the ESP packet on the interface, nothing decrypted.

Very strange overall...

Any hints available on how to let FreeS/WAN communicate with 2.5.70-ipsec?

Thank you very much,
Peter
--
Dr.
Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From peter@bieringer.de Wed Jun 4 08:40:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:40:15 -0700 (PDT)
Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Fe92x008892 for ; Wed, 4 Jun 2003 08:40:10 -0700
Received: by smtp2.aerasec.de (Postfix, from userid 30110) id 5AA111387A; Wed, 4 Jun 2003 17:40:03 +0200 (CEST)
From: "Dr. Peter Bieringer "
To: "Maillist netdev"
Cc: "Maillist USAGI-users"
Subject: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes
Date: Wed, 04 Jun 2003 17:40:03 +0200
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <20030604154003.5AA111387A@smtp2.aerasec.de>
X-archive-position: 2879
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: pb@bieringer.de
Precedence: bulk
X-list: netdev
Content-Length: 6999
Lines: 152

Hi,

is this helpful? It happened while playing around with IPsec on 2.5.70, caused by a ping to a destination (1.2.3.4) in the IPsec topology.

Jun 4 17:41:31 racoonhost kernel: ------------[ cut here ]------------
Jun 4 17:41:31 racoonhost kernel: kernel BUG at net/xfrm/xfrm_policy.c:185!
Jun 4 17:41:31 racoonhost kernel: invalid operand: 0000 [#1]
Jun 4 17:41:31 racoonhost kernel: CPU: 0
Jun 4 17:41:31 racoonhost kernel: EIP: 0060:[] Tainted: P
Jun 4 17:41:31 racoonhost kernel: EFLAGS: 00010246
Jun 4 17:41:31 racoonhost kernel: eax: c6f80a01 ebx: c1b45000 ecx: c6f80a80 edx: c1b45000
Jun 4 17:41:31 racoonhost kernel: esi: c1b45000 edi: 00000000 ebp: c6f80a80 esp: c0985d04
Jun 4 17:41:31 racoonhost kernel: ds: 007b es: 007b ss: 0068
Jun 4 17:41:31 racoonhost kernel: Process ping (pid: 23407, threadinfo=c0984000 task=c4e6c6a0)
Jun 4 17:41:31 racoonhost kernel: Stack: c0985ddc c022d09d c1b45000 c0985ddc 00000002 0000002e 00000001 c6f80a80
Jun 4 17:41:31 racoonhost kernel: c1b45000 c0a79d80 c016eff7 fd010018 c6d9b900 c027c7e0 1f3e030a 00000000
Jun 4 17:41:31 racoonhost kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 4 17:41:31 racoonhost kernel: Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
Jun 4 17:41:31 racoonhost kernel: Code: 0f 0b b9 00 89 49 25 c0 8b 8b c8 00 00 00 85 c9 74 08 0f 0b
Jun 4 17:41:33 racoonhost kernel: ------------[ cut here ]------------
Jun 4 17:41:33 racoonhost kernel: kernel BUG at net/xfrm/xfrm_policy.c:185!
Jun 4 17:41:33 racoonhost kernel: invalid operand: 0000 [#2]
Jun 4 17:41:33 racoonhost kernel: CPU: 0
Jun 4 17:41:33 racoonhost kernel: EIP: 0060:[] Tainted: P
Jun 4 17:41:33 racoonhost kernel: EFLAGS: 00010246
Jun 4 17:41:33 racoonhost kernel: eax: c6f80a01 ebx: c1b45000 ecx: c6f80a80 edx: c1b45000
Jun 4 17:41:33 racoonhost kernel: esi: 00000002 edi: c1b45000 ebp: c6f80a80 esp: c094bd04
Jun 4 17:41:33 racoonhost kernel: ds: 007b es: 007b ss: 0068
Jun 4 17:41:33 racoonhost kernel: Process ping (pid: 23408, threadinfo=c094a000 task=c4e6c6a0)
Jun 4 17:41:33 racoonhost kernel: Stack: c1b45000 c022d09d c1b45000 c2d38ab0 00000002 0000002e c7ee1f00 c6f80a80
Jun 4 17:41:33 racoonhost kernel: c1b45000 c0a79d80 c016eff7 c016f045 c7ee1f00 c7ee3800 00000000 c7ee1f00
Jun 4 17:41:33 racoonhost kernel: c7eb2100 c7ece494 c01767ee c2d38ab0 c0d11424 c2d38ab0 00000000 00000000
Jun 4 17:41:33 racoonhost kernel: Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
Jun 4 17:41:33 racoonhost kernel: Code: 0f 0b b9 00 89 49 25 c0 8b 8b c8 00 00 00 85 c9 74 08 0f 0b

Btw: ping segfaults... that is not good, because ping is usually installed with the suid bit set:

# stat `which ping`
File: "/bin/ping"
Size: 35192 Blocks: 72 IO Block: -4611693715008778240 Regular File
Device: 303h/771d Inode: 128458 Links: 1
Access: (4755/-rwsr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: Wed Jun 4 17:43:44 2003
Modify: Thu Apr 18 23:40:02 2002
Change: Tue Nov 5 18:25:31 2002

# strace ping 1.2.3.4
execve("/bin/ping", ["ping", "1.2.3.4"], [/* 29 vars */]) = 0
uname({sys="Linux", node="racoonhost.lab.aerasec.de", ...}) = 0
brk(0) = 0x8063000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=14186, ...}) = 0
old_mmap(NULL, 14186, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40014000
close(3) = 0
open("/lib/libresolv.so.2", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\'\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=68925, ...}) = 0
old_mmap(NULL, 69408, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40018000
mprotect(0x40026000, 12064, PROT_NONE) = 0
old_mmap(0x40026000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe000) = 0x40026000
old_mmap(0x40027000, 7968, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40027000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0Pv\1B4\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1402035, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40029000
old_mmap(0x42000000, 1264960, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x42000000
mprotect(0x4212c000, 36160, PROT_NONE) = 0
old_mmap(0x4212c000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0x4212c000
old_mmap(0x42131000, 15680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x42131000
close(3) = 0
munmap(0x40014000, 14186) = 0
brk(0) = 0x8063000
brk(0x8063030) = 0x8063030
brk(0x8064000) = 0x8064000
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3
getuid32() = 0
setuid32(0) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sin_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("1.2.3.4")}}, 16
+++ killed by SIGSEGV +++

# rpm -qf `which ping`
iputils-20020124-3

# rpm -qi iputils-20020124-3
Name : iputils Relocations: /usr
Version : 20020124 Vendor: Red Hat, Inc.
Release : 3 Build Date: Thu 18 Apr 2002 11:40:05 PM CEST
Install date: Tue 05 Nov 2002 06:25:31 PM CET Build Host: stripples.devel.redhat.com
Group : System Environment/Daemons Source RPM: iputils-20020124-3.src.rpm
Size : 188776 License: BSD
Packager : Red Hat, Inc.
Summary : Network monitoring tools including ping.
Description :
The iputils package contains basic utilities for monitoring a network, including ping.
The ping command sends a series of ICMP protocol ECHO_REQUEST packets to a specified network host to discover whether the target machine is alive and receiving network traffic.

Hope this helps,
Peter
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From yoshfuji@linux-ipv6.org Wed Jun 4 08:49:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:49:24 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Fmv2x009504 for ; Wed, 4 Jun 2003 08:49:18 -0700
Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h54FnXBo007612; Thu, 5 Jun 2003 00:49:33 +0900
Date: Thu, 05 Jun 2003 00:49:32 +0900 (JST)
Message-Id: <20030605.004932.00042147.yoshfuji@linux-ipv6.org>
To: lpetande@tml.hut.fi
Cc: davem@redhat.com, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To: <3EDE0286.4000304@tml.hut.fi>
References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> <20030603.213830.85382657.davem@redhat.com> <3EDE0286.4000304@tml.hut.fi>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2880
X-ecartis-version:
Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev
Content-Length: 3032
Lines: 74

Hello.

In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says:

> 2. Source address based routing
>
> Ville sent the patch. The semantical changes to the original code were
> in our opinion necessary to get source address based routing working
> more as IPv4 policy routing. Let's discuss this more.

I'm not sure why you need this (and tunnel) for MIP...
Would you clarify for me?
(IMHO, I believe we don't need this change if we use XFRM engine.)

BTW, source based routing (source address is major, destination is minor)
is done by policy routing; it is NOT the task of CONFIG_IPV6_SUBTREE, IMHO.
Yes, people want to have policy routing, but it is NOT (only) for MIP6.

> 3. API
:
> There is a (sub)group working on MIPv6 extensions to the Advanced Socket
> API for IPv6. To me it seems pointless to add anything other to the
> spec than a way to insert a destination option header to the third
> possible DO position (i.e. between routing header and fragmentation
> header). This could be done just by adding new type, let's call it
> IPV6_NOFRAGDSTOPTS. Everything else should be doable with the
> existing ASA. We would like to hear comments on this.

No, user daemon adds a XFRM policy for adding destination option
(and/or routing header). Stackable destination will do the real work.
(So, we don't need socket options.)

> 4. Source address selection
>
> We think adding new home address flag to addresses is the best and
> easiest way making the source address selection to work with MIPv6.
> I'm sure USAGI will add the relevant checks to their source address
> selection code for that. Dave, Antti already brought this up some weeks
> ago, but got no answer. Is the home address bit OK with you?

"Yes," is my answer for now.

> 5.
MIPv6 extension header adding
:
> Storing of the mipv6 information would in our opinion be achieved more
> cleanly by using cached routes which included the mipv6 information (two
> extra addresses and flags). The routes would contain modified nexthop
> information and mip6_output as the rt->u.dst.output function.
> Mip6_output would add the extension headers based on the information
> stored in the route. The routes would have a stacked dst entry, which
> would be used for actual output.

I still believe it is very natural to use XFRM to manage stackable destination.

> Our prototype currently works with tcp, tcp + ipsec and raw sockets,
> but has only a hackish interface through route ioctl for testing. I can
> send a preview patch of the code for discussion, if the general approach
> makes sense to you. I would like to hear your opinions on this and also
> if you (USAGI) have planned something else for storing the mipv6 state.

Okay, anyway, please send it to David, Alexey and me (at least).
We can learn more from the code than documentation. ;-)

Thank you.

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From aj@dungeon.inka.de Wed Jun 4 09:16:40 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 09:16:46 -0700 (PDT)
Received: from mail.inka.de (mail@quechua.inka.de [193.197.184.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54GGK2x011546 for ; Wed, 4 Jun 2003 09:16:39 -0700
Received: from dungeon.inka.de (uucp@[127.0.0.1]) by mail.inka.de with uucp (rmailwrap 0.5) id 19Navy-0003AL-00; Wed, 04 Jun 2003 18:16:18 +0200
Received: from 192.168.1.12 (unknown [192.168.1.12]) by dungeon.inka.de (Postfix) with ESMTP id 5742120FC1; Wed, 4 Jun 2003 18:16:15 +0200 (CEST)
From: Andreas Jellinghaus
To: "Dr.
Peter Bieringer " , "Maillist netdev"
Subject: Re: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes
Date: Wed, 4 Jun 2003 18:18:22 +0200
User-Agent: KMail/1.5.2
Cc: "Maillist USAGI-users"
References: <20030604154003.5AA111387A@smtp2.aerasec.de>
In-Reply-To: <20030604154003.5AA111387A@smtp2.aerasec.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200306041818.22607.aj@dungeon.inka.de>
X-archive-position: 2881
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: aj@dungeon.inka.de
Precedence: bulk
X-list: netdev
Content-Length: 174
Lines: 9

On Wednesday, 4 June 2003 17:40, Dr. Peter Bieringer wrote:
> Hi,
>
> is this helpful? It happened while playing around with IPsec on 2.5.70,

At least -bk5 has the fix.

Andreas

From lpetande@morphine.tml.hut.fi Wed Jun 4 10:32:50 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 10:33:03 -0700 (PDT)
Received: from tml-gw.tml.hut.fi (tml.hut.fi [130.233.44.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54HWS2x029158 for ; Wed, 4 Jun 2003 10:32:49 -0700
Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id UAA27345 for ; Wed, 4 Jun 2003 20:32:27 +0300
X-Authentication-Warning: tml-gw.tml.hut.fi: smap set sender to using -f
Received: from mail.tml.hut.fi(130.233.45.70) by tml-gw.tml.hut.fi via smap (V2.0) id xma027329; Wed, 4 Jun 03 20:32:19 +0300
Received: from localhost (localhost [127.0.0.1]) by mail.tml.hut.fi (Postfix) with ESMTP id B37D018C1AA; Wed, 4 Jun 2003 20:32:18 +0300 (EEST)
Received: from mail.tml.hut.fi ([127.0.0.1]) by localhost (mail.tml.hut.fi [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 15696-01; Wed, 4 Jun 2003 20:32:18 +0300 (EEST)
Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by mail.tml.hut.fi (Postfix) with ESMTP id 90EC718C1A8; Wed, 4 Jun 2003 20:32:17 +0300 (EEST)
Received: from tml.hut.fi (localhost [127.0.0.1]) by morphine.tml.hut.fi (8.12.2+Sun/8.12.2) with ESMTP id h54HWHF5008647; Wed, 4 Jun 2003 20:32:17 +0300 (EEST)
Received: from localhost (lpetande@localhost) by tml.hut.fi (8.12.2+Sun/8.12.2/Submit) with ESMTP id h54HVrB8008642; Wed, 4 Jun 2003 20:32:09 +0300 (EEST)
Date: Wed, 4 Jun 2003 20:31:53 +0300 (EEST)
From: Henrik Petander
To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
Cc: davem@redhat.com, , , "netdev@oss.sgi.com" , , Venkata Jagana ,
Subject: Re: [patch]: ipv6 tunnel for MIPv6
In-Reply-To: <20030605.004932.00042147.yoshfuji@linux-ipv6.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
X-archive-position: 2882
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: lpetande@morphine.tml.hut.fi
Precedence: bulk
X-list: netdev
Content-Length: 3920
Lines: 95

Hello Yoshifuji,

On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says:
>
> > 2. Source address based routing
>
> I'm not sure why you need this (and tunnel) for MIP...
> Would you clarify for me?
> (IMHO, I believe we don't need this change if we use XFRM engine.)

As far as I remember there are three main reasons for this: (Ville, correct me if I forgot something)

1. Mobile node sends packets to correspondent node through the tunnel, if they have home address as the IPv6 source address (not in home address option). MIPv6 signalling packets are sent at the same time with the home address option (i.e. care-of address as the source address in IPv6 header) to the same destination, but they must not be sent through the tunnel. If the xfrm engine works correctly (for IPSec) and does the lookup using the home address as the source address, the flows appear the same based on the addresses.

2.
Home Agent delivers packets with its own address as source address to mobile node using route optimization, i.e. routing header type 2, but tunnels other traffic to MN. Again behaviour depends on the source address. 3. Multihomed mobile hosts: Mobile nodes can have a cellular and WLAN interface. Packets with address from one interface can be sent only through that interface to avoid RPF dropping the traffic. With routing based primarily on source addresses this is easy to achieve. Actually this is more general than MIPv6, but multihoming is essential for real-life mobility. > > BTW, source based routing (source address is major, destination is minor) > is done by policy routing; It is NOT the task of CONFIG_IPV6_SUBTREE, IMHO. > Yes, poeple want to have policy routing, but it is NOT (only) for MIP6. IMO source address subtrees were useless as they were at least for doing mobility and multihoming, whereas with source address as primary selector routing for mobile and multihomed hosts is straightforward. Of course I may miss something of the larger picture ;-) But so far I have heard of no one using them for anything else. As long as Linux lacks "real" IPv6 policy routing, source based routing is the best we have got and works both for mobility and multihoming. The main problem with it is IMO the source address selection using the route as a selection basis. Just my thoughts, though. > > > 3. API > > No, user daemon adds a XFRM policy for adding destination option > (and/or routing header). Stackable destination will do the real work. > (So, we don't need socket options.) Actually we do... Due to some interesting requirements in the MIPv6 spec. the signalling packets are treated differently from data packets: home address option is always present in binding update messages, but it can be used with data packets only after sending a binding update. 
Routing header type 2 is used in negative binding acks sometimes with addresses which differ from the ones used with data packets. The signalling packets need IPSec protection with final addresses as selectors, so the MIPv6 extension headers can't be added to packets created by a raw socket as the final addresses would be hidden in the extension headers. > > > 5. MIPv6 extension header adding > I still belive it is very natural to use XFRM to manage stackable > destination. If you have a concrete proposal how to do it, I would be eager to hear it ;-) > Okay, anyway, please sent it to David, Alexey and me (at least). > We can learn more from the code than documentation. ;-) Sure, I'll send it tomorrow when i get back to work. Regards, Henrik ---------------------------------- Henrik Petander Helsinki University of Technology, GO/Core Project Henrik.Petander@hut.fi Office: +358 (0)9 451 5846 GSM: +358 (0)40 741 5248 ---------------------------------- From hch@lst.de Wed Jun 4 11:16:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:16:41 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IGU2x030179 for ; Wed, 4 Jun 2003 11:16:32 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h54IGRJT024755 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Wed, 4 Jun 2003 20:16:27 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h54IGReI024753 for netdev@oss.sgi.com; Wed, 4 Jun 2003 20:16:27 +0200 Date: Wed, 4 Jun 2003 20:16:27 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] kill drivers/net/setup.c Message-ID: <20030604181627.GA24733@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . 
 roaringpenguin . com / mimedefang)
X-archive-position: 2883
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hch@lst.de
Precedence: bulk
X-list: netdev
Content-Length: 3932
Lines: 163

The last two drivers are ppc 8xx system devices and Paul (ppc maintainer)
said I should just send this patch along, the 8xx guys will have to deal
with a possible breakage when bringing up their port for 2.5/2.6 again.
(Not that I expect anything bad to happen..)

--- 1.8/arch/ppc/8260_io/enet.c	Sun Apr 27 13:56:50 2003
+++ edited/arch/ppc/8260_io/enet.c	Tue Jun 3 22:08:59 2003
@@ -608,7 +608,7 @@
 
 /* Initialize the CPM Ethernet on SCC.
  */
-int __init scc_enet_init(void)
+static int __init scc_enet_init(void)
 {
 	struct net_device *dev;
 	struct scc_enet_private *cep;
@@ -860,3 +860,4 @@
 
 	return 0;
 }
+module_init(scc_enet_init);
--- 1.9/arch/ppc/8260_io/fcc_enet.c	Sun Apr 27 13:56:50 2003
+++ edited/arch/ppc/8260_io/fcc_enet.c	Tue Jun 3 22:08:59 2003
@@ -1323,7 +1323,7 @@
 
 /* Initialize the CPM Ethernet on FCC.
  */
-int __init fec_enet_init(void)
+static int __init fec_enet_init(void)
 {
 	struct net_device *dev;
 	struct fcc_enet_private *cep;
@@ -1394,6 +1394,7 @@
 
 	return 0;
 }
+module_init(fec_enet_init);
 
 /* Make sure the device is shut down during initialization.
  */
--- 1.10/arch/ppc/8xx_io/enet.c	Mon Sep 16 06:51:56 2002
+++ edited/arch/ppc/8xx_io/enet.c	Tue Jun 3 22:08:59 2003
@@ -639,7 +639,7 @@
  * transmit and receive to make sure we don't catch the CPM with some
  * inconsistent control information.
  */
-int __init scc_enet_init(void)
+static int __init scc_enet_init(void)
 {
 	struct net_device *dev;
 	struct scc_enet_private *cep;
@@ -964,3 +964,5 @@
 
 	return 0;
 }
+
+module_init(scc_enet_init);
--- 1.13/arch/ppc/8xx_io/fec.c	Tue Dec 31 22:10:48 2002
+++ edited/arch/ppc/8xx_io/fec.c	Tue Jun 3 22:09:00 2003
@@ -1566,7 +1566,7 @@
 
 /* Initialize the FEC Ethernet on 860T.
  */
-int __init fec_enet_init(void)
+static int __init fec_enet_init(void)
 {
 	struct net_device *dev;
 	struct fec_enet_private *fep;
@@ -1782,6 +1782,7 @@
 
 	return 0;
 }
+module_init(fec_enet_init);
 
 /* This function is called to start or restart the FEC during a link
  * change. This only happens when switching between half and full
--- 1.16/drivers/net/setup.c	Wed Jun 4 07:13:57 2003
+++ edited/drivers/net/setup.c	Tue Jun 3 22:12:10 2003
@@ -1,54 +0,0 @@
-
-/*
- * New style setup code for the network devices
- */
-
-#include
-#include
-#include
-#include
-#include
-
-extern int scc_enet_init(void);
-extern int fec_enet_init(void);
-
-/*
- * Devices in this list must do new style probing. That is they must
- * allocate their own device objects and do their own bus scans.
- */
-
-struct net_probe
-{
-	int (*probe)(void);
-	int status;	/* non-zero if autoprobe has failed */
-};
-
-static struct net_probe pci_probes[] __initdata = {
-	/*
-	 * Early setup devices
-	 */
-#if defined(CONFIG_SCC_ENET)
-	{scc_enet_init, 0},
-#endif
-#if defined(CONFIG_FEC_ENET)
-	{fec_enet_init, 0},
-#endif
-	{NULL, 0},
-};
-
-
-/*
- * Run the updated device probes. These do not need a device passed
- * into them.
- */
-
-void __init net_device_init(void)
-{
-	struct net_probe *p = pci_probes;
-
-	while (p->probe != NULL)
-	{
-		p->status = p->probe();
-		p++;
-	}
-}
--- 1.83/net/core/dev.c	Mon May 26 07:16:23 2003
+++ edited/net/core/dev.c	Tue Jun 3 22:12:01 2003
@@ -2861,15 +2861,8 @@
  * unhooks any devices that fail to initialise (normally hardware not
  * present) and leaves us with a valid list of present and active devices.
  *
- */
-
-extern void net_device_init(void);
-extern void ip_auto_config(void);
-
-
-/*
- * This is called single threaded during boot, so no need
- * to take the rtnl semaphore.
+ * This is called single threaded during boot, so no need
+ * to take the rtnl semaphore.
  */
 static int __init net_dev_init(void)
 {
@@ -3003,7 +2996,6 @@
 	 * Initialise network devices
 	 */
 
-	net_device_init();
 	rc = 0;
 out:
 	return rc;

From hch@lst.de Wed Jun 4 11:18:12 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:18:17 -0700 (PDT)
Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IIB2x030483 for ; Wed, 4 Jun 2003 11:18:12 -0700
Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h54II9JT024813 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Wed, 4 Jun 2003 20:18:09 +0200
Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h54II9ko024811 for netdev@oss.sgi.com; Wed, 4 Jun 2003 20:18:09 +0200
Date: Wed, 4 Jun 2003 20:18:09 +0200
From: Christoph Hellwig
To: netdev@oss.sgi.com
Subject: [PATCH] switch skfp over to initcalls
Message-ID: <20030604181809.GA24779@lst.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.28i
X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT
X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang)
X-archive-position: 2884
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hch@lst.de
Precedence: bulk
X-list: netdev
Content-Length: 4257
Lines: 191

This is a PCI driver and has no business in Space.c.
Also allows us to kill all the fddi code in there (and the stale
reference to the long gone apfddi driver).

--- 1.20/drivers/net/Space.c	Wed May 21 03:56:26 2003
+++ edited/drivers/net/Space.c	Tue Jun 3 22:17:09 2003
@@ -105,9 +105,6 @@
 /* Detachable devices ("pocket adaptors") */
 extern int de620_probe(struct net_device *);
 
-/* FDDI adapters */
-extern int skfp_probe(struct net_device *dev);
-
 /* Fibre Channel adapters */
 extern int iph5526_probe(struct net_device *dev);
 
@@ -401,29 +398,6 @@
 	return -ENODEV;
 }
 
-#ifdef CONFIG_FDDI
-static int __init fddiif_probe(struct net_device *dev)
-{
-	unsigned long base_addr = dev->base_addr;
-
-	if (base_addr == 1)
-		return 1;	/* ENXIO */
-
-	if (1
-#ifdef CONFIG_APFDDI
-	    && apfddi_init(dev)
-#endif
-#ifdef CONFIG_SKFP
-	    && skfp_probe(dev)
-#endif
-	    && 1 ) {
-		return 1;	/* -ENODEV or -EAGAIN would be more accurate. */
-	}
-	return 0;
-}
-#endif
-
-
 #ifdef CONFIG_NET_FC
 static int fcif_probe(struct net_device *dev)
 {
@@ -614,52 +588,6 @@
 #define NEXT_DEV (&tr0_dev)
 #endif
 
-
-#ifdef CONFIG_FDDI
-static struct net_device fddi7_dev = {
-	.name = "fddi7",
-	.next = NEXT_DEV,
-	.init = fddiif_probe
-};
-static struct net_device fddi6_dev = {
-	.name = "fddi6",
-	.next = &fddi7_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi5_dev = {
-	.name = "fddi5",
-	.next = &fddi6_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi4_dev = {
-	.name = "fddi4",
-	.next = &fddi5_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi3_dev = {
-	.name = "fddi3",
-	.next = &fddi4_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi2_dev = {
-	.name = "fddi2",
-	.next = &fddi3_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi1_dev = {
-	.name = "fddi1",
-	.next = &fddi2_dev,
-	.init = fddiif_probe
-};
-static struct net_device fddi0_dev = {
-	.name = "fddi0",
-	.next = &fddi1_dev,
-	.init = fddiif_probe
-};
-#undef NEXT_DEV
-#define NEXT_DEV (&fddi0_dev)
-#endif
-
 #ifdef CONFIG_NET_FC
 static struct net_device fc1_dev = {
--- 1.12/drivers/net/skfp/skfddi.c	Fri May 9 02:40:17 2003
+++ edited/drivers/net/skfp/skfddi.c	Tue Jun 3 22:19:04 2003
@@ -2539,72 +2539,25 @@
 
 }				// drv_reset_indication
 
-
-//--------------- functions for use as a module ----------------
-
-#ifdef MODULE
-/************************
- *
- * Note now that module autoprobing is allowed under PCI. The
- * IRQ lines will not be auto-detected; instead I'll rely on the BIOSes
- * to "do the right thing".
- *
- ************************/
-#define LP(a) ((struct s_smc*)(a))
 static struct net_device *mdev;
 
-/************************
- *
- * init_module
- *
- * If compiled as a module, find
- * adapters and initialize them.
- *
- ************************/
-int init_module(void)
+static int __init skfd_init(void)
 {
 	struct net_device *p;
 
-	PRINTK(KERN_INFO "FDDI init module\n");
 	if ((mdev = insert_device(NULL, skfp_probe)) == NULL)
 		return -ENOMEM;
 
-	for (p = mdev; p != NULL; p = LP(p->priv)->os.next_module) {
-		PRINTK(KERN_INFO "device to register: %s\n", p->name);
+	for (p = mdev; p != NULL; p = ((struct s_smc *)p->priv)->os.next_module) {
 		if (register_netdev(p) != 0) {
 			printk("skfddi init_module failed\n");
 			return -EIO;
 		}
 	}
 
-	PRINTK(KERN_INFO "+++++ exit with success +++++\n");
 	return 0;
-}				// init_module
+}
 
-/************************
- *
- * cleanup_module
- *
- * Release all resources claimed by this module.
- *
- ************************/
-void cleanup_module(void)
-{
-	PRINTK(KERN_INFO "cleanup_module\n");
-	while (mdev != NULL) {
-		mdev = unlink_modules(mdev);
-	}
-	return;
-}				// cleanup_module
-
-
-/************************
- *
- * unlink_modules
- *
- * Unregister devices and release their memory.
- *
- ************************/
 static struct net_device *unlink_modules(struct net_device *p)
 {
 	struct net_device *next = NULL;
@@ -2638,5 +2591,11 @@
 	return next;
 }				// unlink_modules
 
+static void __exit skfd_exit(void)
+{
+	while (mdev)
+		mdev = unlink_modules(mdev);
+}
 
-#endif				/* MODULE */
+module_init(skfd_init);
+module_exit(skfd_exit);

From shemminger@osdl.org Wed Jun 4 11:21:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:22:00 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54ILq2x030905 for ; Wed, 4 Jun 2003 11:21:52 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54ILaX24456; Wed, 4 Jun 2003 11:21:36 -0700
Date: Wed, 4 Jun 2003 11:21:36 -0700
From: Stephen Hemminger
To: "David S. Miller" , Jeff Garzik
Cc: netdev@oss.sgi.com
Subject: [PATCH 2.5.70] tulip/xircom initialization bug
Message-Id: <20030604112136.7b8e2cf4.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 2885
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev
Content-Length: 579
Lines: 16

By inspection of device initialization code, this driver unregisters
the net device in the error path even though the register_netdevice
never succeeded.

Compiles, but don't have the hardware.

diff -Nru a/drivers/net/tulip/xircom_tulip_cb.c b/drivers/net/tulip/xircom_tulip_cb.c
--- a/drivers/net/tulip/xircom_tulip_cb.c	Wed Jun 4 11:18:44 2003
+++ b/drivers/net/tulip/xircom_tulip_cb.c	Wed Jun 4 11:18:44 2003
@@ -648,7 +648,6 @@
 	pci_set_drvdata(pdev, NULL);
 	pci_release_regions(pdev);
 err_out_free_netdev:
-	unregister_netdev(dev);
 	kfree(dev);
 	return -ENODEV;
 }

From shemminger@osdl.org Wed Jun 4 11:25:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:25:46 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IPa2x031542 for ; Wed, 4 Jun 2003 11:25:37 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54IP9X25211; Wed, 4 Jun 2003 11:25:09 -0700
Date: Wed, 4 Jun 2003 11:25:09 -0700
From: Stephen Hemminger
To: "David S. Miller" , Jeff Garzik
Cc: netdev@oss.sgi.com
Subject: [PATCH 2.5.70] sb1000 driver bugs
Message-Id: <20030604112509.1e0cc260.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 2886
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev
Content-Length: 1811
Lines: 65

Inspecting the sb1000 driver showed some interesting bugs:
  - net device pointer is used before the device is allocated;
    gcc does catch this.
  - unregister is called even though the device was not registered
    successfully
  - net device is not freed on remove.

Compiles but don't have hardware to test. Don't know how it ever worked though.
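
The ordering these three fixes aim for can be sketched in userspace C: claim
the resources, allocate the device before anything dereferences it, and on
failure release only what was actually acquired, in reverse order. This is
only an illustration under stated assumptions. Every name in it (fake_probe,
fake_alloc_dev, fake_request_region and so on) is a hypothetical stand-in,
not the sb1000 or kernel API, and the out_free_dev step on a failed
registration is an addition of the sketch that the patch itself does not
make.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins; every name here is hypothetical, not the driver's API. */
static int regions_held;	/* I/O regions currently claimed */
static int devs_allocated;	/* device objects currently allocated */

struct fake_dev {
	int registered;		/* set only by a successful fake_register() */
};

static int fake_request_region(int ok)
{
	if (!ok)
		return 0;
	regions_held++;
	return 1;
}

static void fake_release_region(void)
{
	assert(regions_held > 0);	/* releasing an unclaimed region is a bug */
	regions_held--;
}

static struct fake_dev *fake_alloc_dev(void)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		devs_allocated++;
	return dev;
}

static void fake_free_dev(struct fake_dev *dev)
{
	free(dev);
	devs_allocated--;
}

static int fake_register(struct fake_dev *dev, int ok)
{
	if (!ok)
		return -1;
	dev->registered = 1;
	return 0;
}

/*
 * fail_at selects the step that fails: 0 = none, 1 = first region,
 * 2 = second region, 3 = registration.
 */
static struct fake_dev *fake_probe(int fail_at)
{
	struct fake_dev *dev;

	if (!fake_request_region(fail_at != 1))
		goto out;
	if (!fake_request_region(fail_at != 2))
		goto out_release_region0;

	/* Allocate before the first dereference of dev. */
	dev = fake_alloc_dev();
	if (!dev)
		goto out_release_regions;

	if (fake_register(dev, fail_at != 3))
		goto out_free_dev;	/* never registered, so nothing to unregister */
	return dev;

out_free_dev:
	fake_free_dev(dev);
out_release_regions:
	fake_release_region();
out_release_region0:
	fake_release_region();
out:
	return NULL;
}

static void fake_remove(struct fake_dev *dev)
{
	dev->registered = 0;	/* stands in for unregistering the device */
	fake_release_region();
	fake_release_region();
	fake_free_dev(dev);	/* the free that the remove path was missing */
}
```

Each label undoes exactly one earlier step, so a failure at any point falls
through only the cleanups that are owed.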

diff -Nru a/drivers/net/sb1000.c b/drivers/net/sb1000.c
--- a/drivers/net/sb1000.c	Wed Jun 4 11:17:47 2003
+++ b/drivers/net/sb1000.c	Wed Jun 4 11:17:47 2003
@@ -162,10 +162,17 @@
 
 	irq = pnp_irq(pdev, 0);
 
-	if (!request_region(ioaddr[0], 16, dev->name))
+	if (!request_region(ioaddr[0], 16, "sb1000"))
 		goto out_disable;
-	if (!request_region(ioaddr[1], 16, dev->name))
+	if (!request_region(ioaddr[1], 16, "sb1000"))
 		goto out_release_region0;
+
+	dev = alloc_etherdev(sizeof(struct sb1000_private));
+	if (!dev) {
+		error = -ENOMEM;
+		goto out_release_regions;
+	}
+
 	dev->base_addr = ioaddr[0];
 	/* mem_start holds the second I/O address */
@@ -177,12 +184,6 @@
 		"S/N %#8.8x, IRQ %d.\n", dev->name, dev->base_addr,
 		dev->mem_start, serial_number, dev->irq);
 
-	dev = alloc_etherdev(sizeof(struct sb1000_private));
-	if (!dev) {
-		error = -ENOMEM;
-		goto out_release_regions;
-	}
-
 	/*
 	 * The SB1000 is an rx-only cable modem device. The uplink is a modem
 	 * and we do not want to arp on it.
@@ -212,11 +213,9 @@
 
 	error = register_netdev(dev);
 	if (error)
-		goto out_unregister;
+		goto out_release_regions;
 	return 0;
 
- out_unregister:
-	unregister_netdev(dev);
  out_release_regions:
 	release_region(ioaddr[1], 16);
  out_release_region0:
@@ -236,6 +235,7 @@
 	unregister_netdev(dev);
 	release_region(dev->base_addr, 16);
 	release_region(dev->mem_start, 16);
+	kfree(dev);
 }
 
 static struct pnp_driver sb1000_driver = {

From shemminger@osdl.org Wed Jun 4 15:41:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 15:41:35 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Mf82x005229 for ; Wed, 4 Jun 2003 15:41:29 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54MeMX12290; Wed, 4 Jun 2003 15:40:22 -0700
Date: Wed, 4 Jun 2003 15:40:22 -0700
From: Stephen Hemminger
To: Arnaldo Carvalho de Melo , "David S.
 Miller" , Jeff Garzik
Cc: akpm@digeo.com, davem@redhat.com, jjs@tmsusa.com, netdev@oss.sgi.com
Subject: [PATCH 2.5.70] Tun device encapsulation
Message-Id: <20030604154022.0ef344ff.shemminger@osdl.org>
In-Reply-To: <20030604212528.GA24515@conectiva.com.br>
References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 2887
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev
Content-Length: 845
Lines: 31

Tun device was encapsulating the net_device in a private structure then doing:
	unregister_netdev(&tun->dev);
	kfree(tun);
	rtnl_unlock();

This breaks with the delayed cleanup now in the network core.
Moving the kfree outside of the rtnl_unlock will fix it.

Builds, but not sure how to use TUN to test it.

As part of later refcounting changes, I do have a more complex change
that uses the same encapsulation as ethernet and other
devices. Will save it for later.
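
Why the position of the free matters can be modelled in a few lines of
userspace C. The sketch assumes, as the description above suggests, that
cleanup of an unregistered device is deferred and runs no later than the
unlock, so the memory embedding the device must outlive that call. All names
(fake_tun, fake_unregister, fake_unlock, fake_close) are hypothetical and the
lock itself is not modelled; this is not the tun driver or the network core.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a device embedded in a private structure. */
struct fake_netdev {
	int unregistered;
	int cleaned_up;		/* written by the deferred cleanup */
};

struct fake_tun {
	int persist;
	struct fake_netdev dev;	/* embedded, so freeing tun frees dev too */
};

static struct fake_netdev *pending;	/* device awaiting deferred cleanup */

static void fake_unregister(struct fake_netdev *dev)
{
	dev->unregistered = 1;
	pending = dev;		/* the real work is postponed until unlock */
}

static void fake_unlock(void)
{
	if (pending) {
		pending->cleaned_up = 1;	/* touches the device memory */
		pending = NULL;
	}
}

/* Mirrors the fixed ordering: unregister, unlock, and only then free. */
static int fake_close(struct fake_tun *tun, int *cleaned_up)
{
	int persist;

	assert(tun != NULL);
	persist = tun->persist;

	if (!persist)
		fake_unregister(&tun->dev);

	fake_unlock();		/* deferred cleanup runs while tun is still valid */

	*cleaned_up = tun->dev.cleaned_up;
	if (!persist)
		free(tun);	/* freeing before fake_unlock() would be a use after free */
	return 0;
}
```

Freeing inside the locked section would leave the pending pointer aimed at
released memory by the time the deferred step writes to it.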

diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c
--- a/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
+++ b/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
@@ -551,10 +551,12 @@
 	if (!(tun->flags & TUN_PERSIST)) {
 		dev_close(&tun->dev);
 		unregister_netdevice(&tun->dev);
-		kfree(tun);
 	}
 
 	rtnl_unlock();
+
+	if (!(tun->flags & TUN_PERSIST))
+		kfree(tun);
 	return 0;
 }
 

From jjs@tmsusa.com Wed Jun 4 15:47:32 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 15:47:38 -0700 (PDT)
Received: from freeside.toyota.com (freeside.toyota.com [63.87.74.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54MlB2x005580 for ; Wed, 4 Jun 2003 15:47:32 -0700
Received: from einsten.tms.toyota.com (einstein.tms.toyota.com [10.49.36.228]) by freeside.toyota.com (8.12.8/8.12.5) with ESMTP id h54MkjdH031490; Wed, 4 Jun 2003 15:46:45 -0700
Received: from tmsusa.com (localhost.localdomain [127.0.0.1]) by einsten.tms.toyota.com (Postfix) with ESMTP id 83BA23DE7; Wed, 4 Jun 2003 15:46:44 -0700 (PDT)
Message-ID: <3EDE76D4.8070001@tmsusa.com>
Date: Wed, 04 Jun 2003 15:46:44 -0700
From: jjs
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Stephen Hemminger
Cc: Arnaldo Carvalho de Melo , "David S.
 Miller" , Jeff Garzik , akpm@digeo.com, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Tun device encapsulation
References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2888
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jjs@tmsusa.com
Precedence: bulk
X-list: netdev
Content-Length: 964
Lines: 44

I'll be happy to test it out later -

Thanks,

Joe

Stephen Hemminger wrote:

>Tun device was encapsulating the net_device in a private structure then doing:
>	unregister_netdev(&tun->dev);
>	kfree(tun);
>	rtnl_unlock();
>
>This breaks with the delayed cleanup now in the network core.
>Moving the kfree outside of the rtnl_unlock will fix it.
>
>Builds, but not sure how to use TUN to test it.
>
>As part of later refcounting changes, I do have a more complex change
>that uses the same encapsulation as ethernet and other
>devices. Will save it for later.
>
>diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c
>--- a/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
>+++ b/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
>@@ -551,10 +551,12 @@
> 	if (!(tun->flags & TUN_PERSIST)) {
> 		dev_close(&tun->dev);
> 		unregister_netdevice(&tun->dev);
>-		kfree(tun);
> 	}
>
> 	rtnl_unlock();
>+
>+	if (!(tun->flags & TUN_PERSIST))
>+		kfree(tun);
> 	return 0;
> }
>
>
>

From shemminger@osdl.org Wed Jun 4 16:14:53 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:15:16 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NEq2x006337 for ; Wed, 4 Jun 2003 16:14:52 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54NEcX22925; Wed, 4 Jun 2003 16:14:38 -0700
Date: Wed, 4 Jun 2003 16:14:37 -0700
From: Stephen Hemminger
To: Jeff Garzik , "David S. Miller"
Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: 2.5.70-bk+ broken networking
Message-Id: <20030604161437.2b4d3a79.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 2889
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev
Content-Length: 21729
Lines: 583

Test machine running 2.5.70-bk latest can't boot because eth2 won't come up.
The same machine and configuration successfully brings up all the devices
and runs on 2.5.70.

Starting ip6tables: [ OK ] Starting iptables: [ OK ] Setting network parameters: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth0: [ OK ] Bringing up interface eth1: [ OK ] Bringing up interface eth2: sender address length == 0 e1000 device does not seem to be present, delaying eth2 initialization. [FAILED] Starting system logger: [ OK ] Starting kernel logger: [ OK ] Starting portmapper: [ OK ] Starting NFS statd: [ OK ] Starting keytable: [ OK ] Initializing random number generator: [ OK ] Starting pcmcia: [ OK ] Mounting other filesystems: [ OK ] Setting NIS domain name osdl: [ OK ] Binding to the NIS domain: [ OK ] Listening for an NIS domain server. (hung) SysRq : Show State free sibling task PC stack pid father child younger older init S 00000001 3414430476 1 0 2 (NOTLB) Call Trace: [] schedule_timeout+0x6a/0xbc [] process_timeout+0x0/0xc [] do_select+0x193/0x2ee [] __pollwait+0x0/0xaa [] sys_select+0x2a6/0x4a8 [] sys_stat64+0x35/0x38 [] syscall_call+0x7/0xb migration/0 S 00000001 4294947312 2 1 3 (L-TLB) Call Trace: [] migration_thread+0x4f3/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/0 S 00000000 4294940388 3 1 4 2 (L-TLB) Call Trace: [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/1 S 00000001 7996 4 1 5 3 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/1 S 00000000 4294960540 5 1 6 4 (L-TLB) Call Trace: [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/2 S 00000001 4294953884 6 1 7 5 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/2 S 00000000 4294947324 7 1 8 6 (L-TLB) Call Trace: [] 
ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/3 S 00000001 4294940668 8 1 9 7 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/3 S 00000001 8044 9 1 10 8 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/4 S 00000001 4294960492 10 1 11 9 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/4 S 00000001 4294953932 11 1 12 10 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/5 S 00000001 4294947276 12 1 13 11 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/5 S 00000001 4294940716 13 1 14 12 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/6 S 00000001 7996 14 1 15 13 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/6 S 00000001 4294960540 15 1 16 14 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/7 S 00000001 4294953884 16 1 17 15 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/7 
S 00000001 4294947324 17 1 18 16 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc events/0 S 00000001 3415176940 18 1 19 17 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] flush_to_ldisc+0x0/0x176 [] preempt_schedule+0x36/0x50 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/1 S 00000001 7620 19 1 20 18 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] blk_unplug_work+0x0/0x16 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/2 S 00000001 4294960436 20 1 21 19 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/3 S 00000001 4294953652 21 1 22 20 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] blk_unplug_work+0x0/0x16 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/4 S 00000001 4294947220 22 1 23 21 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/5 S 00000001 4294940612 23 1 24 22 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/6 S 00000001 7940 24 1 25 23 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] 
default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/7 S 00000001 4294960436 25 1 26 24 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc kirqd S 00000001 874843020 26 1 27 25 (L-TLB) Call Trace: [] schedule_timeout+0x6a/0xbc [] process_timeout+0x0/0xc [] balanced_irq+0x4f/0x76 [] balanced_irq+0x0/0x76 [] kernel_thread_helper+0x5/0xc pdflush S 00000001 874836508 27 1 29 26 (L-TLB) Call Trace: [] daemonize+0xd1/0xd8 [] __pdflush+0xdc/0x378 [] preempt_schedule+0x36/0x50 [] schedule_tail+0xc0/0xdc [] pdflush+0x0/0x16 [] pdflush+0x11/0x16 [] kernel_thread_helper+0x5/0xc kswapd0 S F7A37EE4 7884 29 1 28 27 (L-TLB) Call Trace: [] reparent_to_init+0x10a/0x1b0 [] daemonize+0xd1/0xd8 [] kswapd+0xe0/0x10c [] preempt_schedule+0x36/0x50 [] autoremove_wake_function+0x0/0x4c [] ret_from_fork+0x6/0x14 [] autoremove_wake_function+0x0/0x4c [] kswapd+0x0/0x10c [] kernel_thread_helper+0x5/0xc pdflush S 00000001 874828660 28 1 30 29 (L-TLB) Call Trace: [] __pdflush+0xdc/0x378 [] preempt_schedule+0x36/0x50 [] schedule_tail+0xc0/0xdc [] pdflush+0x0/0x16 [] pdflush+0x11/0x16 [] kernel_thread_helper+0x5/0xc aio/0 S F7A0DBF8 4294624600 30 1 31 28 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] preempt_schedule+0x36/0x50 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/1 S 00000001 4294617956 31 1 32 30 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/2 S 00000001 4294611348 32 1 33 31 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] 
worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/3 S 00000001 4294604740 33 1 34 32 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/4 S 00000001 7940 34 1 35 33 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/5 S 00000001 4294960436 35 1 36 34 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/6 S 00000001 4294953828 36 1 37 35 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/7 S 00000001 4294947220 37 1 38 36 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc kseriod S 00000001 4294383536 38 1 43 37 (L-TLB) Call Trace: [] allow_signal+0x5a/0xd8 [] serio_thread+0xbe/0x190 [] default_wake_function+0x0/0x2e [] serio_thread+0x0/0x190 [] kernel_thread_helper+0x5/0xc scsi_eh_0 S 00000000 8052 43 1 44 38 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] __down_failed_interruptible+0xa/0x10 [] .text.lock.scsi_error+0xad/0xb5 [scsi_mod] [] +0x20fb/0x2d40 [scsi_mod] [] scsi_error_handler+0x0/0x23a 
[scsi_mod] [] kernel_thread_helper+0x5/0xc ahc_dv_0 S F70F6000 4294960340 44 1 45 43 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] ahc_linux_release_simq+0xdb/0x15a [aic7xxx] [] __down_failed_interruptible+0xa/0x10 [] .text.lock.aic7xxx_osm+0x8e/0x1fb [aic7xxx] [] +0x215d/0x2600 [aic7xxx] [] ahc_linux_dv_thread+0x0/0x632 [aic7xxx] [] kernel_thread_helper+0x5/0xc scsi_eh_1 S 00000000 4294359084 45 1 46 44 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] __down_failed_interruptible+0xa/0x10 [] .text.lock.scsi_error+0xad/0xb5 [scsi_mod] [] +0x20fb/0x2d40 [scsi_mod] [] scsi_error_handler+0x0/0x23a [scsi_mod] [] kernel_thread_helper+0x5/0xc ahc_dv_1 S F70D4000 4294349080 46 1 47 45 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] ahc_linux_release_simq+0xdb/0x15a [aic7xxx] [] __down_failed_interruptible+0xa/0x10 [] .text.lock.aic7xxx_osm+0x8e/0x1fb [aic7xxx] [] +0x215d/0x2600 [aic7xxx] [] ahc_linux_dv_thread+0x0/0x632 [aic7xxx] [] kernel_thread_helper+0x5/0xc kjournald S 00000001 4294213892 47 1 148 46 (L-TLB) Call Trace: [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 16131824 148 1 149 47 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 192 149 1 150 148 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000000 4287980920 150 1 151 149 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] 
interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald D 00000001 4287973144 151 1 152 150 (L-TLB) Call Trace: [] blk_run_queues+0xcd/0x1ae [] io_schedule+0x26/0x30 [] __wait_on_buffer+0xcf/0xd2 [] autoremove_wake_function+0x0/0x4c [] autoremove_wake_function+0x0/0x4c [] journal_commit_transaction+0x49b/0x1632 [] smp_apic_timer_interrupt+0xd8/0x140 [] schedule+0x218/0x608 [] default_wake_function+0x2a/0x2e [] default_wake_function+0x0/0x2e [] kjournald+0x163/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 4287966564 152 1 242 151 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc rc S 00000001 4294206928 242 1 670 434 152 (NOTLB) Call Trace: [] sys_wait4+0x1e6/0x29a [] sys_rt_sigaction+0xd1/0xf4 [] default_wake_function+0x0/0x2e [] sys_rt_sigprocmask+0xce/0x1b0 [] default_wake_function+0x0/0x2e [] syscall_call+0x7/0xb dhclient S 00000001 4291868976 434 1 557 242 (NOTLB) Call Trace: [] common_interrupt+0x18/0x20 [] schedule_timeout+0x6a/0xbc [] __pollwait+0x38/0xaa [] process_timeout+0x0/0xc [] sock_poll+0x26/0x2a [] do_select+0x193/0x2ee [] __pollwait+0x0/0xaa [] sys_select+0x2a6/0x4a8 [] syscall_call+0x7/0xb syslogd D 00000001 4290889300 557 1 561 434 (NOTLB) Call Trace: [] default_wake_function+0x2a/0x2e [] sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] log_wait_commit+0x70/0x120 [] log_start_commit+0xea/0x114 [] journal_stop+0x193/0x20e [] journal_force_commit+0xd1/0xea [] ext3_force_commit+0x69/0xe6 [] sys_fsync+0xa3/0xce [] syscall_call+0x7/0xb klogd S 00000001 4289963772 561 1 572 557 (NOTLB) Call Trace: [] fprob+0x2b/0x34 [] schedule_timeout+0xb9/0xbc [] kmalloc+0x188/0x1d6 [] 
unix_wait_for_peer+0xde/0xea [] autoremove_wake_function+0x0/0x4c [] memcpy_fromiovec+0x88/0x8e [] autoremove_wake_function+0x0/0x4c [] sock_alloc_send_skb+0x2e/0x32 [] unix_dgram_sendmsg+0x2be/0x68c [] filemap_nopage+0x1e5/0x2ce [] pte_chain_alloc+0x94/0x9c [] sock_aio_write+0xbc/0xd8 [] do_sync_write+0x8a/0xb6 [] handle_mm_fault+0x103/0x1fc [] default_wake_function+0x0/0x2e [] run_timer_softirq+0x196/0x25c [] vfs_write+0xe9/0x11a [] sys_write+0x3f/0x5e [] syscall_call+0x7/0xb portmap S 00000001 4292398496 572 1 561 (NOTLB) Call Trace: [] schedule_timeout+0xb9/0xbc [] sock_poll+0x26/0x2a [] do_pollfd+0x57/0x98 [] do_poll+0xa5/0xc4 [] sys_poll+0x160/0x23a [] __pollwait+0x0/0xaa [] syscall_call+0x7/0xb S27ypbind S 00000001 276376 670 242 687 (NOTLB) Call Trace: [] sys_wait4+0x1e6/0x29a [] sys_rt_sigaction+0xd1/0xf4 [] default_wake_function+0x0/0x2e [] sys_rt_sigprocmask+0xce/0x1b0 [] default_wake_function+0x0/0x2e [] syscall_call+0x7/0xb rpcinfo S 00000001 4287705648 687 670 688 (NOTLB) Call Trace: [] tcp_v4_connect+0x42f/0x68c [] schedule_timeout+0xb9/0xbc [] inet_wait_for_connect+0x119/0x298 [] autoremove_wake_function+0x0/0x4c [] autoremove_wake_function+0x0/0x4c [] inet_stream_connect+0x218/0x340 [] move_addr_to_kernel+0x6b/0x70 [] sys_connect+0x78/0x9a [] do_page_fault+0x27f/0x4bd [] sock_create+0x100/0x2b0 [] sys_socket+0x3a/0x56 [] sys_socketcall+0xb2/0x262 [] sys_munmap+0x58/0x78 [] syscall_call+0x7/0xb grep S 00000001 4293967752 688 670 687 (NOTLB) Call Trace: [] pipe_wait+0x8b/0xc0 [] autoremove_wake_function+0x0/0x4c [] cp_new_stat64+0xe6/0xea [] autoremove_wake_function+0x0/0x4c [] pipe_read+0x158/0x246 [] vfs_read+0xaf/0x11a [] sys_read+0x3f/0x5e [] syscall_call+0x7/0xb From shemminger@osdl.org Wed Jun 4 16:25:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:25:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NPg2x006821 for ; Wed, 4 Jun 2003 16:25:42 
-0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54NPSX26806; Wed, 4 Jun 2003 16:25:28 -0700 Date: Wed, 4 Jun 2003 16:25:28 -0700 From: Stephen Hemminger To: Stephen Hemminger Cc: jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604162528.637ae1ff.shemminger@osdl.org> In-Reply-To: <20030604161437.2b4d3a79.shemminger@osdl.org> References: <20030604161437.2b4d3a79.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2891 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 696 Lines: 21 On Wed, 4 Jun 2003 16:14:37 -0700 Stephen Hemminger wrote: > Test machine running 2.5.70-bk latest can't boot because eth2 won't > come up. The same machine and configuration successfully brings up > all the devices and runs on 2.5.70. > > Starting ip6tables: [ OK ] > Starting iptables: [ OK ] > Setting network parameters: [ OK ] > Bringing up loopback interface: [ OK ] > Bringing up interface eth0: [ OK ] > Bringing up interface eth1: [ OK ] > Bringing up interface eth2: sender address length == 0 > e1000 device does not seem to be present, delaying eth2 initialization. 
> [FAILED] One more piece of info: eth0 and eth1 are e100 eth2 is e1000 From Andrew.Morton@digeo.com Wed Jun 4 16:25:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:25:42 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NPc2x006820 for ; Wed, 4 Jun 2003 16:25:38 -0700 Received: from digeo.com ([172.17.140.150]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 16:25:32 -0700 Message-ID: <3EDE7FEB.2C7FAEC7@digeo.com> Date: Wed, 04 Jun 2003 16:25:31 -0700 From: Andrew Morton X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.70-mm3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Stephen Hemminger CC: Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking References: <20030604161437.2b4d3a79.shemminger@osdl.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 04 Jun 2003 23:25:32.0670 (UTC) FILETIME=[97F8F9E0:01C32AF0] X-archive-position: 2890 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev Content-Length: 397 Lines: 11 Stephen Hemminger wrote: > > Test machine running 2.5.70-bk latest can't boot because eth2 won't > come up. The same machine and configuration successfully brings up > all the devices and runs on 2.5.70. kjournald is stuck waiting for IO to complete against some buffer during transaction commit. I'd be suspecting block layer or device drivers. What device driver is handling your /var/log? 
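The diagnosis above is read off the state column of the sysrq-T dump, where kjournald and syslogd show "D" (uninterruptible sleep) rather than "S". A minimal sketch in C of that reading step follows. The helper name task_is_blocked() is hypothetical, and it assumes the layout "<comm> <state> ..." with no spaces inside the command name. As the follow-up later in this thread notes, a task caught in D state is not by itself proof of a hang.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if a sysrq-T task header line describes
 * a task in uninterruptible sleep ("D"), 0 otherwise.
 * Assumes the layout "<comm> <state> ..." with no spaces inside <comm>.
 */
static int task_is_blocked(const char *line)
{
	char comm[64];
	char state[8];

	/* Read the command name and the state token that follows it. */
	if (sscanf(line, "%63s %7s", comm, state) != 2)
		return 0;

	/* Exactly "D" marks uninterruptible sleep, typically waiting on I/O. */
	return state[0] == 'D' && state[1] == '\0';
}
```

Fed the header lines from the dump one at a time, this flags the kjournald (pid 151) and syslogd (pid 557) entries and skips the tasks in state S.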
From patmans@us.ibm.com Wed Jun 4 18:48:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 18:48:13 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h551lt2x010986 for ; Wed, 4 Jun 2003 18:48:05 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h551lftd176574; Wed, 4 Jun 2003 21:47:41 -0400 Received: from DYN318139.beaverton.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h551lbcO178264; Wed, 4 Jun 2003 21:47:38 -0400 Received: (from patman@localhost) by DYN318139.beaverton.ibm.com (8.11.6/8.11.6) id h551hfw10353; Wed, 4 Jun 2003 18:43:41 -0700 X-Authentication-Warning: DYN318139.beaverton.ibm.com: patman set sender to patmans@us.ibm.com using -f Date: Wed, 4 Jun 2003 18:43:41 -0700 From: Patrick Mansfield To: Andrew Morton Cc: Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-ID: <20030604184341.A10256@beaverton.ibm.com> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3EDE7FEB.2C7FAEC7@digeo.com>; from akpm@digeo.com on Wed, Jun 04, 2003 at 04:25:31PM -0700 X-archive-position: 2892 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: patmans@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1498 Lines: 47 On Wed, Jun 04, 2003 at 04:25:31PM -0700, Andrew Morton wrote: > Stephen Hemminger wrote: > > > > Test machine running 2.5.70-bk latest can't boot because eth2 won't > > come up. The same machine and configuration successfully brings up > > all the devices and runs on 2.5.70. 
>
> kjournald is stuck waiting for IO to complete against some buffer
> during transaction commit.
>
> I'd be suspecting block layer or device drivers.  What device driver
> is handling your /var/log?

I also can't get networking up on current bk, I don't know if this is the
same problem, the system did not hang (I'm not running NIS?).

I also got that "sender address length == 0" message, I have not seen it
before, it seems to be output by the "ip -o link".

During boot:

[ ... ]
Enabling local filesystem quotas: [ OK ]
Enabling swap space: [ OK ]
/bin/cat: /proc/ksyms: No such file or directory
INIT: Entering runlevel: 3
Entering non-interactive startup
Setting network parameters: [ OK ]
Bringing up interface lo: [ OK ]
sender address length == 0
sender address length == 0
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Starting portmapper: [ OK ]
Starting NFS file locking services:
[ ... ]

After logging in:

[root@elm3b79 root]# ifup eth0
sender address length == 0
[root@elm3b79 root]# ip -o link
sender address length == 0
[root@elm3b79 root]# dmesg | grep eth0
eth0: Digital DS21140 Tulip rev 33 at 0xf8800000, 00:00:BC:0F:03:EB, IRQ 36.
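The source of that message is not quoted in this thread. A message of this form would be consistent with a caller that passes an address buffer to recvmsg() and then checks the msg_namelen the kernel hands back. The sketch below assumes exactly that and nothing more: check_sender() and struct expected_addr are hypothetical names, not taken from the ip tool.

```c
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for the address type the caller expects back
 * (for a netlink socket this would be struct sockaddr_nl).
 */
struct expected_addr {
	unsigned short family;
	unsigned int pid;
	unsigned int groups;
};

/*
 * Returns 0 if the kernel reported a sender address of the expected size,
 * -1 otherwise, printing a diagnostic of the same form as the one above.
 */
static int check_sender(const struct msghdr *msg)
{
	if (msg->msg_namelen != sizeof(struct expected_addr)) {
		fprintf(stderr, "sender address length == %d\n",
			(int)msg->msg_namelen);
		return -1;
	}
	return 0;
}
```

Under that assumption, a kernel that never fills in the sender address leaves msg_namelen at 0 and every such call fails the same way, which matches ifup and "ip -o link" both printing the message.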
-- Patrick Mansfield

From acme@conectiva.com.br Wed Jun 4 19:32:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 19:33:06 -0700 (PDT)
Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h552Wr2x021427 for ; Wed, 4 Jun 2003 19:32:54 -0700
Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YzL2-00009b-00; Sat, 05 Jul 2003 23:33:16 -0300
Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 568021966C; Thu, 5 Jun 2003 02:33:49 +0000 (UTC)
Date: Wed, 4 Jun 2003 23:33:49 -0300
From: Arnaldo Carvalho de Melo
To: Andrew Morton
Cc: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: 2.5.70-bk+ broken networking
Message-ID: <20030605023349.GH24515@conectiva.com.br>
Mail-Followup-To: Arnaldo Carvalho de Melo , Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org
References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030604185652.31958d1f.akpm@digeo.com>
X-Url: http://advogato.org/person/acme
Organization: Conectiva S.A.
User-Agent: Mutt/1.5.4i
X-archive-position: 2893
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: acme@conectiva.com.br
Precedence: bulk
X-list: netdev
Content-Length: 1297
Lines: 33

On Wed, Jun 04, 2003 at 06:56:52PM -0700, Andrew Morton wrote:
> Andrew Morton wrote:
> >
> > Stephen Hemminger wrote:
> > >
> > > Test machine running 2.5.70-bk latest can't boot because eth2 won't
> > > come up. The same machine and configuration successfully brings up
> > > all the devices and runs on 2.5.70.
> > > > kjournald is stuck waiting for IO to complete against some buffer > > during transaction commit. > > > > I'd be suspecting block layer or device drivers. What device driver > > is handling your /var/log? > > I take that back. > > Your sysrq-T woke up syslogd which did a synchronous write which poked > kjournald. You happened to catch it in mid-commit. So that's all normal > and sane. > > Something is up with netdevice initialisation. My eth0 (e100) is in a > strange half-there state and won't come up. Reverting the post-2.5.70 e100 > changes does not help. It's something which went into the tree today I > think. Strange as I'm using 2.5.70-latest-bk as of 30 minutes ago, i.e. uptodate with Linus + my network patches. Thing is related to nfs, please nfs loading at boot time and try again, worked for me, don't know what is wrong with nfs loading tho (haven't checked at all, just disabled loading of the nfs server) :-( - Arnaldo From acme@conectiva.com.br Wed Jun 4 19:41:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 19:41:20 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h552fE2x021867 for ; Wed, 4 Jun 2003 19:41:15 -0700 Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YzTB-0000BI-00; Sat, 05 Jul 2003 23:41:41 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id B7A951966C; Thu, 5 Jun 2003 02:42:13 +0000 (UTC) Date: Wed, 4 Jun 2003 23:42:13 -0300 From: Arnaldo Carvalho de Melo To: Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-ID: <20030605024212.GI24515@conectiva.com.br> Mail-Followup-To: Arnaldo Carvalho de Melo , Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, 
linux-kernel@vger.kernel.org
References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com> <20030605023349.GH24515@conectiva.com.br>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030605023349.GH24515@conectiva.com.br>
X-Url: http://advogato.org/person/acme
Organization: Conectiva S.A.
User-Agent: Mutt/1.5.4i
X-archive-position: 2894
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: acme@conectiva.com.br
Precedence: bulk
X-list: netdev
Content-Length: 1476
Lines: 35

On Wed, Jun 04, 2003 at 11:33:49PM -0300, Arnaldo C. Melo wrote:
> On Wed, Jun 04, 2003 at 06:56:52PM -0700, Andrew Morton wrote:
> > Andrew Morton wrote:
> > >
> > > Stephen Hemminger wrote:
> > > >
> > > > Test machine running 2.5.70-bk latest can't boot because eth2 won't
> > > > come up. The same machine and configuration successfully brings up
> > > > all the devices and runs on 2.5.70.
> > >
> > > kjournald is stuck waiting for IO to complete against some buffer
> > > during transaction commit.
> > >
> > > I'd be suspecting block layer or device drivers. What device driver
> > > is handling your /var/log?
> >
> > I take that back.
> >
> > Your sysrq-T woke up syslogd which did a synchronous write which poked
> > kjournald. You happened to catch it in mid-commit. So that's all normal
> > and sane.
> >
> > Something is up with netdevice initialisation. My eth0 (e100) is in a
> > strange half-there state and won't come up. Reverting the post-2.5.70 e100
> > changes does not help. It's something which went into the tree today I
> > think.
>
> Strange as I'm using 2.5.70-latest-bk as of 30 minutes ago, i.e. uptodate with
> Linus + my network patches. Thing is related to nfs, please nfs loading at

Ouch, it should have been "please disable nfs loading..."
> boot time and try again, worked for me, don't know what is wrong with nfs > loading tho (haven't checked at all, just disabled loading of the nfs > server) :-( From jmorris@intercode.com.au Wed Jun 4 20:26:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 20:26:59 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:556VwnA7GH4nCqJ3WvCtm1xx63Zyi+XP@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h553Qj2x024044 for ; Wed, 4 Jun 2003 20:26:47 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h553Pxr00868; Thu, 5 Jun 2003 13:26:05 +1000 Date: Thu, 5 Jun 2003 13:25:58 +1000 (EST) From: James Morris To: Patrick Mansfield cc: Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , , , Christoph Hellwig Subject: Re: 2.5.70-bk+ broken networking In-Reply-To: <20030604184341.A10256@beaverton.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2895 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev Content-Length: 695 Lines: 31 On Wed, 4 Jun 2003, Patrick Mansfield wrote: > [root@elm3b79 root]# ifup eth0 > sender address length == 0 This is a bug introduced by a coding style cleanup, fix below. 
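The control-flow change behind this can be modelled in user space. The sketch below is inferred from the patch that follows and is not the kernel source: struct model_msghdr, fixup_name_broken() and fixup_name_fixed() are hypothetical names, and the copy of the user address into the kernel is left out. It shows that on the receive side (VERIFY_WRITE) the misplaced brace clears msg_name, so presumably nothing is left to store the sender address in, which would be consistent with the zero length reported above.

```c
#include <stddef.h>

enum { VERIFY_READ = 0, VERIFY_WRITE = 1 };

/* Simplified stand-in for the two struct msghdr fields involved. */
struct model_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* After the style cleanup: the else pairs with the mode test. */
static void fixup_name_broken(struct model_msghdr *m, char *address, int mode)
{
	if (m->msg_namelen) {
		if (mode == VERIFY_READ) {
			/* (copy of the user address into the kernel omitted) */
			m->msg_name = address;
		} else
			m->msg_name = NULL;	/* receive path loses its buffer */
	}
}

/* As restored by the patch: the else pairs with the length test. */
static void fixup_name_fixed(struct model_msghdr *m, char *address, int mode)
{
	if (m->msg_namelen) {
		if (mode == VERIFY_READ) {
			/* (copy of the user address into the kernel omitted) */
		}
		m->msg_name = address;
	} else
		m->msg_name = NULL;
}
```

With a non-zero msg_namelen and mode VERIFY_WRITE, the broken variant leaves msg_name NULL while the fixed variant points it at the kernel-side buffer.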
- James
--
James Morris

--- bk.pending/net/core/iovec.c 2003-06-05 11:12:59.000000000 +1000
+++ bk.w1/net/core/iovec.c 2003-06-05 13:30:06.000000000 +1000
@@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc
 						  address);
 			if (err < 0)
 				return err;
-			m->msg_name = address;
-		} else
-			m->msg_name = NULL;
-	}
+		}
+		m->msg_name = address;
+	} else
+		m->msg_name = NULL;
 
 	size = m->msg_iovlen * sizeof(struct iovec);
 	if (copy_from_user(iov, m->msg_iov, size))

From acme@conectiva.com.br Wed Jun 4 20:31:19 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 20:31:23 -0700 (PDT)
Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h553VI2x024366 for ; Wed, 4 Jun 2003 20:31:19 -0700
Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19Z0FW-0000LY-00; Sun, 06 Jul 2003 00:31:38 -0300
Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 06F5C1966C; Thu, 5 Jun 2003 03:32:09 +0000 (UTC)
Date: Thu, 5 Jun 2003 00:32:08 -0300
From: Arnaldo Carvalho de Melo
To: James Morris
Cc: Patrick Mansfield , Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org, Christoph Hellwig
Subject: Re: 2.5.70-bk+ broken networking
Message-ID: <20030605033208.GK24515@conectiva.com.br>
Mail-Followup-To: Arnaldo Carvalho de Melo , James Morris , Patrick Mansfield , Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org, Christoph Hellwig
References: <20030604184341.A10256@beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-Url: http://advogato.org/person/acme
Organization: Conectiva S.A.
User-Agent: Mutt/1.5.4i
X-archive-position: 2896
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: acme@conectiva.com.br
Precedence: bulk
X-list: netdev
Content-Length: 894
Lines: 36

For the curious, it was introduced in changeset 1.1259.9.18

- Arnaldo

On Thu, Jun 05, 2003 at 01:25:58PM +1000, James Morris wrote:
> On Wed, 4 Jun 2003, Patrick Mansfield wrote:
>
> > [root@elm3b79 root]# ifup eth0
> > sender address length == 0
>
> This is a bug introduced by a coding style cleanup, fix below.
>
>
> - James
> --
> James Morris
>
>
> --- bk.pending/net/core/iovec.c 2003-06-05 11:12:59.000000000 +1000
> +++ bk.w1/net/core/iovec.c 2003-06-05 13:30:06.000000000 +1000
> @@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc
>  						  address);
>  			if (err < 0)
>  				return err;
> -			m->msg_name = address;
> -		} else
> -			m->msg_name = NULL;
> -	}
> +		}
> +		m->msg_name = address;
> +	} else
> +		m->msg_name = NULL;
>
>  	size = m->msg_iovlen * sizeof(struct iovec);
>  	if (copy_from_user(iov, m->msg_iov, size))
>

From davem@redhat.com Wed Jun 4 21:10:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 21:10:28 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h554AM2x025408 for ; Wed, 4 Jun 2003 21:10:22 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA00325; Wed, 4 Jun 2003 21:08:03 -0700
Date: Wed, 04 Jun 2003 21:08:02 -0700 (PDT)
Message-Id: <20030604.210802.115939500.davem@redhat.com>
To: pb@bieringer.de
Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org
Subject: Re: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes
From: "David S. Miller"
In-Reply-To: <20030604154003.5AA111387A@smtp2.aerasec.de>
References: <20030604154003.5AA111387A@smtp2.aerasec.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2897 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 217 Lines: 7 From: "Dr. Peter Bieringer " Date: Wed, 04 Jun 2003 17:40:03 +0200 Happen on playing around with IPsec on 2.5.70, 2.5.70 is ancient, many bugs fixed, please sync up to the current tree From davem@redhat.com Wed Jun 4 21:20:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 21:20:14 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h554KA2x025878 for ; Wed, 4 Jun 2003 21:20:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA00375; Wed, 4 Jun 2003 21:17:51 -0700 Date: Wed, 04 Jun 2003 21:17:50 -0700 (PDT) Message-Id: <20030604.211750.28820261.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] tulip/xircom initialization bug From: "David S. Miller" In-Reply-To: <20030604112136.7b8e2cf4.shemminger@osdl.org> References: <20030604112136.7b8e2cf4.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2898
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 502
Lines: 12

   From: Stephen Hemminger
   Date: Wed, 4 Jun 2003 11:21:36 -0700

   By inspection of device initialization code, this driver unregisters
   the net device in the error path even though the register_netdevice
   never succeeded.

This is fully legal: unregister_netdevice() checks for existence of
the netdev in the device list and, if it is not found, returns an error.

This greatly simplifies error path handling while we convert all
these drivers away from init_etherdev().

From davem@redhat.com Wed Jun 4 22:06:30 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 22:06:41 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5556T2x028773 for ; Wed, 4 Jun 2003 22:06:30 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA00526; Wed, 4 Jun 2003 22:03:24 -0700
Date: Wed, 04 Jun 2003 22:03:24 -0700 (PDT)
Message-Id: <20030604.220324.116384963.davem@redhat.com>
To: acme@conectiva.com.br
Cc: jmorris@intercode.com.au, patmans@us.ibm.com, akpm@digeo.com, shemminger@osdl.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, hch@infradead.org
Subject: Re: 2.5.70-bk+ broken networking
From: "David S. Miller"
In-Reply-To: <20030605033208.GK24515@conectiva.com.br>
References: <20030604184341.A10256@beaverton.ibm.com> <20030605033208.GK24515@conectiva.com.br>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2899
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 343
Lines: 11

   From: Arnaldo Carvalho de Melo
   Date: Thu, 5 Jun 2003 00:32:08 -0300

   For the curious, it was introduced in changeset 1.1259.9.18

Christoph, PLEASE be more careful in the future.

I value your changes very much. However, you really need to get
a little more meticulous when you submit changes.

Thanks.

From joe@tmsusa.com Wed Jun 4 22:33:34 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 22:33:40 -0700 (PDT)
Received: from jyro.mirai.cx (dsl081-085-006.lax1.dsl.speakeasy.net [64.81.85.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h555XY2x029332 for ; Wed, 4 Jun 2003 22:33:34 -0700
Received: from tmsusa.com (neo [192.168.111.123]) by jyro.mirai.cx (Postfix) with ESMTP id 8EC9E17823; Wed, 4 Jun 2003 22:33:33 -0700 (PDT)
Message-ID: <3EDED62D.3020808@tmsusa.com>
Date: Wed, 04 Jun 2003 22:33:33 -0700
From: Joe
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Stephen Hemminger
Cc: Arnaldo Carvalho de Melo , "David S.
Miller" , Jeff Garzik , akpm@digeo.com, jjs@tmsusa.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Tun device encapsulation References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2900 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: joe@tmsusa.com Precedence: bulk X-list: netdev Content-Length: 961 Lines: 43 This fixes the tun problem nicely here - Joe Stephen Hemminger wrote: >Tun device was encapsulating the net_device in a private structure then doing: > unregister_netdev(&tun->dev); > kfree(tun); > rtnl_unlock(); > >This breaks with the delayed cleanup now in the network core. >Moving the kfree outside of the rtnl_unlock will fix it. > >Builds, but not sure how to use TUN to test it. > >As part of later refcounting changes, I do have a more complex change >that uses the same encapsulation as ethernet and other >devices. Will save it for later. 
>
>diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c
>--- a/drivers/net/tun.c Wed Jun 4 15:38:44 2003
>+++ b/drivers/net/tun.c Wed Jun 4 15:38:44 2003
>@@ -551,10 +551,12 @@
> 	if (!(tun->flags & TUN_PERSIST)) {
> 		dev_close(&tun->dev);
> 		unregister_netdevice(&tun->dev);
>-		kfree(tun);
> 	}
>
> 	rtnl_unlock();
>+
>+	if (!(tun->flags & TUN_PERSIST))
>+		kfree(tun);
> 	return 0;
> }
>
>
>
>
>

From davem@redhat.com Wed Jun 4 23:53:03 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 23:53:08 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h556r22x031893 for ; Wed, 4 Jun 2003 23:53:02 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA00812; Wed, 4 Jun 2003 23:50:05 -0700
Date: Wed, 04 Jun 2003 23:50:05 -0700 (PDT)
Message-Id: <20030604.235005.26995218.davem@redhat.com>
To: shemminger@osdl.org
Cc: acme@conectiva.com.br, jgarzik@pobox.com, akpm@digeo.com, jjs@tmsusa.com, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Tun device encapsulation
From: "David S. Miller"
In-Reply-To: <20030604154022.0ef344ff.shemminger@osdl.org>
References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2901 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 498 Lines: 16 From: Stephen Hemminger Date: Wed, 4 Jun 2003 15:40:22 -0700 Tun device was encapsulating the net_device in a private structure then doing: unregister_netdev(&tun->dev); kfree(tun); rtnl_unlock(); This breaks with the delayed cleanup now in the network core. Moving the kfree outside of the rtnl_unlock will fix it. Builds, but not sure how to use TUN to test it. This seems to indeed fix the problem for people, applied thanks. From vnuorval@tcs.hut.fi Thu Jun 5 01:44:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 01:44:17 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h558i52x003783 for ; Thu, 5 Jun 2003 01:44:06 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Thu, 5 Jun 2003 11:36:51 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h558apjH030219; Thu, 5 Jun 2003 11:36:51 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h558ala6030215; Thu, 5 Jun 2003 11:36:47 +0300 Date: Thu, 5 Jun 2003 11:36:47 +0300 (EEST) From: Ville Nuorvala To: Henrik Petander cc: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= , , , "netdev@oss.sgi.com" , , Venkata Jagana , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2903 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Wed, 4 Jun 2003, Henrik Petander wrote: > Hello Yoshifuji, > > On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > > In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says: > > > > > 2. Source address based routing > > > > I'm not sure why you need this (and tunnel) for MIP... > > Would you clearify for me? > > (IMHO, I believe we don't need this change if we use XFRM engine.) > > As far as I remember there are three main reasons for this: (Ville > correct me if forgot something) I think you've got all the main reasons. I'll get back to this issue if I suddenly remember something you forgot. :) -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From yoshfuji@linux-ipv6.org Thu Jun 5 03:11:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 03:11:55 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55ABf2x010081 for ; Thu, 5 Jun 2003 03:11:43 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h55ACPBo012800; Thu, 5 Jun 2003 19:12:25 +0900 Date: Thu, 05 Jun 2003 19:12:24 +0900 (JST) Message-Id: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> To: vnuorval@tcs.hut.fi Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, yoshfuji@linux-ipv6.org, nakam@linux-ipv6.org, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: 
<20030531.000319.114704530.yoshfuji@linux-ipv6.org> References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2904 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030531.000319.114704530.yoshfuji@linux-ipv6.org> (at Sat, 31 May 2003 00:03:19 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > In article (at Fri, 30 May 2003 17:34:40 +0300 (EEST)), Ville Nuorvala says: > > > here is a patch that fixes CONFIG_IPV6_SUBTREES and allows overriding > > normal routes with source address specific ones. This is for example > > needed in MIPv6 for handling the traffic to and from a mobile node's home > > address correctly. > > Let us test the patch. It seemed buggy when USAGI tested before. I've re-tested your latest CONFIG_IPV6_SUBTREE patch. The results of the restesting seems fine. However, I won't accept your patch as-is for now. The patch consists of several parts: 1. fixing bugs in IPv6 code 2. fixing bugs in CONFIG_IPV6_SUBTREE code 3. changing majority of keys of routing table. There's no problems with 1 and 2. However, We need to discuss on 3. As I said in other thread, the policy routing should be done in the other way. And, it is not good to change the semantics of CONFIG_IPV6_SUBTREE. 
In the original, routing is looked up by destination address and then
by source address; destination takes precedence over source.
Your patch changes this. Source address takes precedence over
destination address.
From the point of view of policy routing, both (and other attributes)
should be considered equally, and this is what the IPv4 routing table does.

Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
The reason why I won't hurry is that I still believe it is not required
for MIP6. Nakamura, one of our members, will describe the details.
It takes precedence over "limited" policy(?) routing to introduce
generic policy routing.

Anyway, will you split up your patch (into 1-3 above) first, please?

Thanks.

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From pb@bieringer.de Thu Jun 5 04:54:12 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 04:54:18 -0700 (PDT)
Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Bs12x014073
	for ; Thu, 5 Jun 2003 04:54:02 -0700
Received: by smtp2.aerasec.de (Postfix, from userid 995)
	id B90311387A; Thu, 5 Jun 2003 13:21:44 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1])
	by smtp2.aerasec.de (Postfix) with SMTP
	id A12011387E; Thu, 5 Jun 2003 13:21:43 +0200 (CEST)
X-AV-Checked: Thu Jun 5 13:21:43 2003 smtp2.aerasec.de
Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10])
	(using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
	(Client did not present a certificate)
	by smtp2.aerasec.de (Postfix) with ESMTP
	id B0BDE1387A; Thu, 5 Jun 2003 13:21:42 +0200 (CEST)
Date: Thu, 05 Jun 2003 13:21:40 +0200
From: "Dr.
Peter Bieringer" To: Maillist netdev Cc: usagi-users@linux-ipv6.org Subject: 2.5.70-bk9: no IPsec modules are autoloaded Message-ID: <29980000.1054812100@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2905 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi again, now playing around with 2.5.70-bk9...which still not solves the interoperability problem with FreeS/WAN. Are they talking different ESP? Sure known that autoloading of IPsec modules is broken...is this a bug or by design? The error messages of racoon are not very useful: 2003-06-05 11:34:34: INFO: main.c:174:main(): @(#)racoon 20001216 20001216 sakane@kame.net 2003-06-05 11:34:34: INFO: main.c:175:main(): @(#)This product linked OpenSSL 0.9.6b [engine] 9 Jul 2001 (http://www.openssl.org/) racoon: something error happened while pfkey initializing. 2003-06-05 11:34:34: ERROR: pfkey.c:364:pfkey_init(): libipsec failed pfkey open (Address family not supported by protocol) -> missing module "af_key" 2003-06-05 11:42:07: INFO: isakmp.c:1048:isakmp_ph2begin_r(): respond new phase 2 negotiation: 10.3.62.31[0]<=>10.3.62.35[0] 2003-06-05 11:42:08: ERROR: pfkey.c:209:pfkey_handler(): pfkey UPDATE failed: No buffer space available 2003-06-05 11:42:08: ERROR: pfkey.c:209:pfkey_handler(): pfkey ADD failed: No buffer space available 2003-06-05 11:42:22: ERROR: pfkey.c:740:pfkey_timeover(): *remote* give up to get IPsec-SA due to time up to wait. 
2003-06-05 11:42:37: INFO: pfkey.c:1367:pk_recvexpire(): IPsec-SA expired: ESP/Transport *remote*->*local* spi=256398122(0xf48532a) 2003-06-05 11:43:07: INFO: isakmp.c:1520:isakmp_ph1expire(): ISAKMP-SA expired *local*[500]-*remote*[500] spi:3087159632fe32b6:88a45a3eabd327fd 2003-06-05 11:43:08: INFO: isakmp.c:1568:isakmp_ph1delete(): ISAKMP-SA deleted *remote*[500]-*local*[500] spi:3087159632fe32b6:88a45a3eabd327fd -> missing module "ah" and "esp" (not so funny, cost me about 15 min to find the solution for "No buffer space available" - "why it worked yesterday and not today") None of the above ones are automagically loaded, while others (e.g. the encrytion ones) are. BTW: is this normal? (host is IPv4 only at the moment): 2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): 127.0.0.1[500] used as isakmp port (fd=7) 2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip1*[500] used as isakmp port (fd=8) 2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip2*[500] used as isakmp port (fd=9) 2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip3*[500] used as isakmp port (fd=10) 2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use). 2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use). 2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use). 2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use). Peter -- Dr. 
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From davem@redhat.com Thu Jun 5 05:14:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:15:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CEw2x014638 for ; Thu, 5 Jun 2003 05:14:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA01807; Thu, 5 Jun 2003 05:12:38 -0700 Date: Thu, 05 Jun 2003 05:12:38 -0700 (PDT) Message-Id: <20030605.051238.74748591.davem@redhat.com> To: pb@bieringer.de Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: 2.5.70-bk9: no IPsec modules are autoloaded From: "David S. Miller" In-Reply-To: <29980000.1054812100@klopffest.muc.aerasec.de> References: <29980000.1054812100@klopffest.muc.aerasec.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2906 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Dr. Peter Bieringer" Date: Thu, 05 Jun 2003 13:21:40 +0200 Sure known that autoloading of IPsec modules is broken...is this a bug or by design? 
You (or someone) has to add the appropriate entries to
/etc/modules.conf

From lpetande@tml.hut.fi Thu Jun 5 05:17:27 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:17:32 -0700 (PDT)
Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CHQ2x014955
	for ; Thu, 5 Jun 2003 05:17:27 -0700
Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132])
	by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55CHO5b017649
	for ; Thu, 5 Jun 2003 15:17:24 +0300
Message-ID: <3EDF36AA.9020403@tml.hut.fi>
Date: Thu, 05 Jun 2003 15:25:14 +0300
From: Henrik Petander
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: Bug in ipv6 ipsec in handling of packets with extension headers
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi)
X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=1 Fuz1=1 Fuz2=1
X-archive-position: 2907
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: lpetande@tml.hut.fi
Precedence: bulk
X-list: netdev

Hi,

There's a bug in the get_offset function of ah6 and esp6. The function
also returns a pointer, prev_hdr, to the last extension header before
the IPsec headers. This pointer points into the skb. The IPsec headers
are inserted between the extension header and the payload, which makes
the pointer invalid. However, the pointer is then used to set the next
header field of the extension header to IPPROTO_ESP or IPPROTO_AH. This
corrupts the packet if any extension headers are present.

An easy way to test this is to send a data packet with a routing header
protected by IPsec.
A possible fix is to change the pointer into an offset from the start of
the packet and use the offset later to set the nexthdr value in the
extension header.

Thanks,
Henrik

From davem@redhat.com Thu Jun 5 05:19:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:19:33 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CJT2x015274
	for ; Thu, 5 Jun 2003 05:19:29 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA01843;
	Thu, 5 Jun 2003 05:17:10 -0700
Date: Thu, 05 Jun 2003 05:17:09 -0700 (PDT)
Message-Id: <20030605.051709.104035049.davem@redhat.com>
To: lpetande@tml.hut.fi
Cc: netdev@oss.sgi.com
Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers
From: "David S. Miller"
In-Reply-To: <3EDF36AA.9020403@tml.hut.fi>
References: <3EDF36AA.9020403@tml.hut.fi>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2908
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Henrik Petander
   Date: Thu, 05 Jun 2003 15:25:14 +0300

   A possible fix is to change the pointer into an offset from the start of
   the packet and use the offset later to set the nexthdr value in the
   extension header.

Please indicate the version of the sources you are looking
at when making reports.

Thank you.
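
The fix proposed in this thread (remember an offset from the start of
the packet rather than a pointer into it) can be sketched in plain
userspace C. This is only an illustration of the idea, not the esp6.c
code: struct pkt, pkt_init(), find_nexthdr_off() and insert_esp() are
hypothetical names, the buffer stands in for an skb with headroom, and
the ESP header is a zeroed 8-byte placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADROOM    16  /* spare bytes in front of the packet (assumed) */
#define IP6_HLEN    40  /* fixed IPv6 header length */
#define RT_HLEN      8  /* one minimal routing header */
#define PAYLOAD_LEN 32
#define ESP_HLEN     8  /* placeholder ESP header: SPI + sequence number */

#define PROTO_HOPOPTS  0
#define PROTO_TCP      6
#define PROTO_ROUTING 43
#define PROTO_ESP     50
#define PROTO_DSTOPTS 60

struct pkt {
	uint8_t buf[HEADROOM + IP6_HLEN + RT_HLEN + PAYLOAD_LEN];
	uint8_t *data;  /* start of the IPv6 header */
	size_t len;     /* bytes from data to end of packet */
};

/* Build: IPv6 header -> routing header -> (pretend) TCP payload. */
static void pkt_init(struct pkt *p)
{
	memset(p->buf, 0, sizeof(p->buf));
	p->data = p->buf + HEADROOM;
	p->len = IP6_HLEN + RT_HLEN + PAYLOAD_LEN;
	p->data[6] = PROTO_ROUTING;       /* nexthdr of IPv6 header */
	p->data[IP6_HLEN] = PROTO_TCP;    /* nexthdr of routing header */
	p->data[IP6_HLEN + 1] = 0;        /* hdr ext len 0 means 8 bytes */
}

static int is_exthdr(uint8_t proto)
{
	return proto == PROTO_HOPOPTS || proto == PROTO_ROUTING ||
	       proto == PROTO_DSTOPTS;
}

/*
 * Return the OFFSET (from p->data) of the nexthdr byte of the last
 * header before the payload, and the total header length in *hdr_len.
 */
static size_t find_nexthdr_off(const struct pkt *p, size_t *hdr_len)
{
	size_t off = 6;           /* nexthdr field of the IPv6 header */
	size_t pos = IP6_HLEN;    /* start of the first extension header */
	uint8_t nexthdr = p->data[off];

	while (is_exthdr(nexthdr) && pos + 2 <= p->len) {
		off = pos;
		nexthdr = p->data[pos];
		pos += ((size_t)p->data[pos + 1] + 1) * 8;
	}
	*hdr_len = pos;
	return off;
}

/* Insert the placeholder ESP header; returns the inner protocol. */
static uint8_t insert_esp(struct pkt *p)
{
	size_t hdr_len;
	size_t off = find_nexthdr_off(p, &hdr_len);
	uint8_t inner = p->data[off];
	uint8_t *new_data = p->data - ESP_HLEN;

	assert(new_data >= p->buf);             /* enough headroom */
	memmove(new_data, p->data, hdr_len);    /* headers move forward */
	p->data = new_data;
	p->len += ESP_HLEN;
	memset(p->data + hdr_len, 0, ESP_HLEN); /* ESP goes before payload */

	/*
	 * A pointer saved before the memmove would now point into the
	 * ESP header area.  The offset is relative to p->data, so it
	 * still names the nexthdr byte of the moved extension header.
	 */
	p->data[off] = PROTO_ESP;
	return inner;
}
```

After pkt_init() and insert_esp(), the routing header's nexthdr byte at
offset 40 from the new packet start holds the ESP protocol number, while
the IPv6 header's own nexthdr byte is unchanged.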
From pb@bieringer.de Thu Jun 5 05:26:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:27:03 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CQw2x015604 for ; Thu, 5 Jun 2003 05:26:59 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 486931387E; Thu, 5 Jun 2003 14:26:52 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 88DE71387A; Thu, 5 Jun 2003 14:26:50 +0200 (CEST) X-AV-Checked: Thu Jun 5 14:26:50 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id C15191387E; Thu, 5 Jun 2003 14:26:49 +0200 (CEST) Date: Thu, 05 Jun 2003 14:26:47 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: 2.5.70-bk9: no IPsec modules are autoloaded Message-ID: <34470000.1054816007@klopffest.muc.aerasec.de> In-Reply-To: <20030605.051238.74748591.davem@redhat.com> References: <29980000.1054812100@klopffest.muc.aerasec.de> <20030605.051238.74748591.davem@redhat.com> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2909 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, June 05, 2003 05:12:38 AM -0700 "David S. Miller" wrote: > From: "Dr. Peter Bieringer" > Date: Thu, 05 Jun 2003 13:21:40 +0200 > > Sure known that autoloading of IPsec modules is broken...is this a bug > or by design? > > You (or someone) has to add the appropriate entries to > /etc/modules.conf Ok, good to know. 
Are there any aliases possible like alias-something-ike af_key alias-something-cryptobasic-49 ah alias-something-cryptobasic-50 esp alias-something-crypto-modules-0 crypt_null pre-install esp modprobe ah BTW: isn't this file called now "modprobe. conf"? Thanks, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From vnuorval@tcs.hut.fi Thu Jun 5 05:52:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:52:22 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CqG2x016165 for ; Thu, 5 Jun 2003 05:52:17 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Thu, 5 Jun 2003 15:40:59 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h55CewjH031145; Thu, 5 Jun 2003 15:40:58 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h55Cer6J031141; Thu, 5 Jun 2003 15:40:54 +0300 Date: Thu, 5 Jun 2003 15:40:53 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: lpetande@tml.hut.fi, , , , , , , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: <20030605.004932.00042147.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2911 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says: > > > 2. 
Source address based routing
>
> I'm not sure why you need this (and tunnel) for MIP...
> Would you clarify for me?
> (IMHO, I believe we don't need this change if we use XFRM engine.)
>

A few comments about the tunnels:

Can you run link-scope protocols over an XFRM tunnel? The MIPv6 spec more
or less requires this feature if you are to support protocols like DHCPv6
and MLD. (See e.g. sections 8.5, 10.4.3 and 10.4.4 in the MIPv6 draft 22.)
AFAICS the only way to get them to work is to have a virtual net_device
associated with every tunnel. Are XFRM tunnels like this? At least they
didn't seem to be, based on the xfrm6_tunnel patch sent to netdev last
week...

If the tunnels aren't separate devices I can straight away think of one
scenario where we run into trouble.

1. MN receives RA with M or O flags set from a router on the foreign link.
2. MN receives a MPA with M or O flags set from HA.

In the first case the DHCP queries should be sent to the current link the
MN is attached to, in the latter to the HA. I don't see any way for the MN
to separate these two cases while sending the DHCP queries, _unless_ they
are sent through different interfaces (i.e. the physical vs the virtual
tunnel interface).

On a more general note, the driver I sent aims to provide a completely
RFC 2473 compliant tunnel interface. :)

Things (at the moment) missing from the xfrm6_tunnel are at least:

- tunnel encapsulation limit destination sub-option support
- forwarding of ICMP errors to the original source of the packet
- transparent fragmentation of packets if MTU minus size of tunnel
  headers is less than IPV6_MIN_MTU
- ability to configure things like traffic class and flowlabel of the
  encapsulating ipv6 header

Perhaps we could make feature complete ip6ip6 tunnels if we combined
xfrm6_tunnel and ip6_tunnel?
:) Regards, Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From lpetande@tml.hut.fi Thu Jun 5 05:51:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:51:48 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Cph2x016053 for ; Thu, 5 Jun 2003 05:51:44 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55Cpf5b024962; Thu, 5 Jun 2003 15:51:42 +0300 Message-ID: <3EDF3EB4.8010105@tml.hut.fi> Date: Thu, 05 Jun 2003 15:59:32 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> In-Reply-To: <20030605.051709.104035049.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=2 Fuz1=2 Fuz2=2 X-archive-position: 2910 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev David S. Miller wrote: > From: Henrik Petander > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > A possible fix is to change the pointer into an offset from the start of > the packet and use the offset later to set the nexthdr value in the > extension header. > > Please indicate the version of the sources you are looking > at when making reports. Sure, esp6.c bitkeeper version was 1.16. 
Also a fix to the bug report: the problem is with esp6 and not with ah6, which does not use the get_offset function. Henrik From pb@bieringer.de Thu Jun 5 06:50:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 06:51:00 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Dog2x025545 for ; Thu, 5 Jun 2003 06:50:43 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 0CDEE1387A; Thu, 5 Jun 2003 15:07:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 228101387E; Thu, 5 Jun 2003 15:07:40 +0200 (CEST) X-AV-Checked: Thu Jun 5 15:07:40 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 5D90D1387A; Thu, 5 Jun 2003 15:07:39 +0200 (CEST) Date: Thu, 05 Jun 2003 15:07:36 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods Message-ID: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2912 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi again, because I got no success, I've tried different encryption methods than 3DES. And *suddenly* it began to work. 

One side : 2.5.70-bk9
Other side: FreeS/WAN 1.99 with algopatches 0.8.1rc2

Result:

AES
---
AES-128: working
AES-192: not working
AES-256: not working

FreeS/WAN:
112 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: initiate
003 "freeswan-racoon-tunnel" #14: ESP transform ESP_AES passed key_len=32 > 16
032 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: internal error

3DES
----
Not working, no message

Blowfish
--------
blowfish-128: working
Other key lengths: not working
NO_PROPOSAL_CHOSEN

Other algorithms: not tested at the moment

I'm very wondering why 3DES is incompatible in IPsec-SA modus, while
working in IKE.

Can someone confirm and/or extend this compatibility test?

TIA,
Peter
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From jmorris@intercode.com.au Thu Jun 5 07:16:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:16:21 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:DCeOq5adnwF804cXIr9y5TBswOrhw6Mf@blackbird.intercode.com.au [203.32.101.10])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EGD2x026587
	for ; Thu, 5 Jun 2003 07:16:15 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12])
	by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h55EG3r03820;
	Fri, 6 Jun 2003 00:16:03 +1000
Date: Fri, 6 Jun 2003 00:16:02 +1000 (EST)
From: James Morris
To: "Dr. Peter Bieringer"
cc: netdev@oss.sgi.com,
Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods
In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2913
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Thu, 5 Jun 2003, Dr.
Peter Bieringer wrote: > I'm very wondering why 3DES is incompatible in IPsec-SA modus, while > working in IKE. What happens if you use manual configurations (e.g. setkey with the native ipsec) ? With this, we can first establish whether on the wire stuff is fundamentally working, before looking at negotiated configurations. - James -- James Morris From pb@bieringer.de Thu Jun 5 07:20:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:20:23 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EKH2x026907 for ; Thu, 5 Jun 2003 07:20:18 -0700 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 9F0881387E; Thu, 5 Jun 2003 16:20:11 +0200 (CEST) X-AV-Checked: Thu Jun 5 16:20:11 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id D05C51387A; Thu, 5 Jun 2003 16:20:10 +0200 (CEST) Date: Thu, 05 Jun 2003 16:20:09 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: (usagi-users 02412) IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 Message-ID: <3525719.1054830009@[10.3.62.6]> In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2914 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Ohoh, sorry for confusions, my racoon here was a little bit buggy... 

...be warned, not using RHL's ipsec-tools from rawhide...looks like the
racoon isn't compiled in a proper environment :-( it doesn't support DES
and causes trouble on 3DES *grmml*).

The reported 3DES problem is solved now by using a freshly compiled one.

But the AES one still occurs.

> FreeS/WAN:
> 112 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: initiate
> 003 "freeswan-racoon-tunnel" #14: ESP transform ESP_AES passed key_len=32 > 16
> 032 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: internal error

Or on 192 bits:

112 "freeswan-racoon-tunnel" #15: STATE_QUICK_I1: initiate
003 "freeswan-racoon-tunnel" #15: ESP transform ESP_AES passed key_len=24 > 16
032 "freeswan-racoon-tunnel" #15: STATE_QUICK_I1: internal error

Strange, it looks like racoon always reports an AES key length of 16*8,
although "aes 192" or "aes 256" was specified in racoon.conf.

Peter, partially happy now
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From jmorris@intercode.com.au Thu Jun 5 07:26:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:26:16 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:HuI/ir4pemYSIE2wCKzZONjohwNV9+Mv@blackbird.intercode.com.au [203.32.101.10])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EQ62x027256
	for ; Thu, 5 Jun 2003 07:26:07 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12])
	by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h55EPxr03865;
	Fri, 6 Jun 2003 00:25:59 +1000
Date: Fri, 6 Jun 2003 00:25:58 +1000 (EST)
From: James Morris
To: "Dr.
Peter Bieringer" cc: netdev@oss.sgi.com, Subject: Re: (usagi-users 02412) IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 In-Reply-To: <3525719.1054830009@[10.3.62.6]> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2915 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Thu, 5 Jun 2003, Dr. Peter Bieringer wrote: > Ohoh, sorry for confusions, my racoon here was a little bit buggy... > > ...be warned, not using RHL's ipsec-tools from rawhide...looks like the > racoon isn't compiled in a proper environment :-( it doesn't support DES > and causes trouble on 3DES *grmml*). Actually, the ABI changed recently, due to renumbering the algorithim ids in pfkeyv2.h. (This will affect setkey as well). - James -- James Morris From lpetande@tml.hut.fi Thu Jun 5 07:28:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:28:29 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55ESG2x027577 for ; Thu, 5 Jun 2003 07:28:17 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55ERY5b012776; Thu, 5 Jun 2003 17:27:34 +0300 Message-ID: <3EDF552D.3060003@tml.hut.fi> Date: Thu, 05 Jun 2003 17:35:25 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Alexey , "David S. 
Miller" , yoshfuji@linux-ipv6.org, Venkata Jagana , Krishna Kumar , Antti Tuominen , Ville Nuorvala , netdev@oss.sgi.com Subject: RFC: Mechanism for adding MIPv6 extension headers Content-Type: multipart/mixed; boundary="------------080703090805070306090804" X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=8 Fuz1=8 Fuz2=8 X-archive-position: 2916 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------080703090805070306090804 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Hello all, Attached is a patch which includes the relevant functionality for adding mipv6 extension headers to data packets. It probably does not compile, as I included only the files which are directly involved in the adding mechanism to minimize the size of the patch. If you want to try out the mechanism I can prepare a full patch. The patch is against bk changeset 1.1325 and is meant as a basis for discussion about the extension header addition mechanism. The code is still work in progress and lacks a proper interface. The mechanism has been tested with tcp and raw sockets and with tcp+ipsec. An overview of the system: User inserts the mipv6 information into the kernel. Based on this information ip6_add_miproute adds a new cached route. This cached route contains mip6_output as output function for adding the extension headers, a decreased pmtu and mipv6 binding information. The route also contains a pointer (u.dst->child) to a new route which contains correct forwarding information for mipv6 intermediate hops and the raw pmtu. Adding of extension headers in mip6_output is done as in esp6_output. The mechanism is fairly close to xfrm, except for storing the mipv6 information only in a cached route. 
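To make the "decreased pmtu" in the overview concrete, here is a minimal user-space sketch (not part of the attached patch) of how the per-packet overhead and the reduced MTU could be derived from the binding flags. The flag values mirror MIPV6_F_BULE and MIPV6_F_BCE from the attached mipv6.h, and the byte counts follow from the struct layouts and comments in the patch (18-byte home address option, 6 bytes of destination options header plus padding, 24-byte type 2 routing header). The names mip6_overhead and mip6_effective_mtu are illustrative assumptions only.

```c
#include <stdio.h>

/* Flag values as defined in the attached mipv6.h. */
#define MIPV6_F_BULE 0x1 /* binding update list entry: home address option is added */
#define MIPV6_F_BCE  0x2 /* binding cache entry: type 2 routing header is added */

/* Sizes taken from the patch; the constant names are assumptions. */
#define HAO_OPT_LEN   18 /* type (1) + length (1) + home address (16) */
#define DSTOPT_EXTRA   6 /* 2-byte dest. options header + 4 bytes of padding */
#define RT2_HDR_LEN   24 /* type 2 routing header carrying one address */

/* Bytes of extension headers added for a given set of binding flags. */
static unsigned int mip6_overhead(unsigned int flags)
{
	unsigned int len = 0;

	if (flags & MIPV6_F_BULE)
		len += HAO_OPT_LEN + DSTOPT_EXTRA;
	if (flags & MIPV6_F_BCE)
		len += RT2_HDR_LEN;
	return len;
}

/* MTU left for the payload once the extension headers are accounted for.
 * Returns 0 when the raw MTU cannot even hold the added headers. */
static unsigned int mip6_effective_mtu(unsigned int raw_mtu, unsigned int flags)
{
	unsigned int overhead = mip6_overhead(flags);

	return raw_mtu > overhead ? raw_mtu - overhead : 0;
}
```

With both flags set on a 1500-byte path, mip6_effective_mtu(1500, MIPV6_F_BULE | MIPV6_F_BCE) would leave 1452 bytes, which is the value the cached route would advertise while the stacked child route keeps the raw 1500.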
Thus the state for a mipv6 binding is soft. This is a tradeoff between keeping the overhead of mipv6 small and having persistent state. If routes change, the mipv6 state can be easily reinserted into the kernel, since the userspace daemon needs to keep track of it for signaling purposes anyhow. I will not go more into details here, but I am happy to answer any questions about the design. Your comments are much appreciated. Regards, Henrik --------------080703090805070306090804 Content-Type: text/plain; name="mip6-exthdr.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="mip6-exthdr.patch" --- net/ipv6/mip6.c 1969-12-31 22:00:00.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/mip6.c 2003-06-05 04:57:00.000000000 -0200 @@ -0,0 +1,341 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define IPPROTO_MOBILITY 62 + + +struct mipv6_be +{ + u8 payload; /* Payload Protocol */ + u8 length; /* MH Length */ + u8 type; /* MH Type */ + u8 reserved; /* Reserved */ + u16 checksum; /* Checksum */ + u8 status; /* Error code */ + u8 reserved_2; + struct in6_addr home_addr; +} __attribute__ ((packed)); + +struct socket *mipv6_mh_socket = NULL; + +static int dstopts_getfrag(const void *data, struct in6_addr *addr, + char *buff, unsigned int offset, unsigned int len) +{ + memcpy(buff, data + offset, len); + return 0; +} +static __inline__ void mip6_xmit_lock(void) +{ + + + local_bh_disable(); + if (unlikely(!spin_trylock(&mipv6_mh_socket->sk->lock.slock))) + BUG(); +} + +static __inline__ void mip6_xmit_unlock(void) +{ + spin_unlock_bh(&mipv6_mh_socket->sk->lock.slock); +} + + +void mip6_send_be(struct in6_addr *daddr, + struct in6_addr *saddr, + struct in6_addr *hao_addr) +{ + struct flowi fl; + struct mipv6_be be; + struct sock *sk = mipv6_mh_socket->sk; + + memset(&fl, 0, sizeof(fl)); + fl.proto = IPPROTO_MOBILITY; + ipv6_addr_copy(&fl.fl6_dst, daddr); + ipv6_addr_copy(&fl.fl6_src, saddr); + 
fl.fl6_flowlabel = 0; + fl.oif = sk->bound_dev_if; + + memset(&be, 0, sizeof(be)); + be.payload = NEXTHDR_NONE; + be.length = 2; + be.type = 7; + ipv6_addr_copy(&be.home_addr, hao_addr); + be.status = 1; /* Home address option without binding */ + + mip6_xmit_lock(); + ip6_build_xmit(sk, dstopts_getfrag, &be, &fl, sizeof(be), NULL, 255, + MSG_DONTWAIT); + mip6_xmit_unlock(); +} +/* TODO: Move the home address option / BCE check to tcp/udp/raw + * processing so cached route in socket can be used + * to avoid route lookup + */ +int mip6_hao_check(struct sk_buff *skb, u8 nexthdr) +{ + struct inet6_skb_parm *opt = (struct inet6_skb_parm *) skb->cb; + struct in6_addr *coaddr; + struct rt6_info *rt; + /* Home address option in mobility header messages is checked + by userspace mipv6 daemon */ + + if (!opt || !opt->dst_nofrag || nexthdr == IPPROTO_MOBILITY) + return 0; + if (opt && opt->dst_nofrag) { + rt = rt6_lookup(&skb->nh.ipv6h->saddr, &skb->nh.ipv6h->daddr, 0, 0); + if (rt) { + if (rt->binding.flags & MIPV6_F_BCE) { + dst_release(&rt->u.dst); + return 0; + } + else + dst_release(&rt->u.dst); + } + coaddr = (struct in6_addr *)((u8 *)skb->nh.raw + opt->dst_nofrag); + mip6_send_be(coaddr, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); + return -1; + } +} + +/** + * mipv6_append_rt2hdr - Add Type 2 Routing Header + * @rt: buffer for new routing header + * @addr: intermediate hop address + * + * Adds a Routing Header Type 2 in a packet. Stores newly created + * routing header in buffer @rt. Type 2 RT only carries one address, + * so there is no need to process old routing header. @rt must have + * allocated space for 24 bytes. 
+ **/ +void mipv6_append_rt2hdr(struct rt2_hdr *rt, struct in6_addr *addr) +{ + struct rt2_hdr *rt2 = (struct rt2_hdr *)rt; + + memset(rt2, 0, sizeof(*rt2)); + rt2->rt_hdr.type = 2; + rt2->rt_hdr.hdrlen = 2; + rt2->rt_hdr.segments_left = 1; + ipv6_addr_copy(&rt2->addr, addr); +} + +struct mipv6_padn +{ + __u8 type; + __u8 length; + __u8 data[0]; +} __attribute__ ((packed)); + +/* + * Add Pad1 or PadN option to data + */ +int mipv6_add_pad(u8 *data, int n) +{ + struct mipv6_padn *padn; + + if (n <= 0) return 0; + if (n == 1) { + *data = MIPV6_OPT_PAD1; + return 1; + } + padn = (struct mipv6_padn *)data; + padn->type = MIPV6_OPT_PADN; + padn->length = n - 2; + memset(padn->data, 0, n - 2); + return n; +} + +/** + * mipv6_append_home_addr - Add Home Address Option + * @opt: buffer for Home Address Option + * @offset: offset from beginning of @opt + * @addr: address for HAO + * + * Adds a Home Address Option to a packet. Option is stored in + * @offset from beginning of @opt. The option is created but the + * original source address in IPv6 header is left intact. The source + * address will be changed from home address to CoA after the checksum + * has been calculated in getfrag. Padding is done automatically, and + * @opt must have allocated space for both actual option and pad. + * Returns offset from @opt to end of options. 
+ **/ +int mipv6_append_home_addr(u8 *opt, struct in6_addr *addr) +{ + int pad; + struct ipv6_dstopt_homeaddr *ho; + int offset = sizeof(struct ipv6_opt_hdr); + + pad = (6 - offset) & 7; + mipv6_add_pad(opt + offset, pad); + + ho = (struct ipv6_dstopt_homeaddr *)(opt + offset + pad); + ho->type = IPV6_TLV_HOMEADDR; + ho->length = sizeof(*ho) - 2; + ipv6_addr_copy(&ho->addr, addr); + + return offset + pad + sizeof(*ho); +} + + +static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, int *offset_prevhdr) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); + u8 nextnexthdr; + + *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; + + while (offset + 1 < packet_len) { + + switch (*nexthdr) { + + case NEXTHDR_HOP: + case NEXTHDR_ROUTING: + *offset_prevhdr = offset; + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + case NEXTHDR_DEST: + nextnexthdr = + ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; + /* XXX We know the option is inner dest opt + with next next header check. */ + if (nextnexthdr != NEXTHDR_HOP && + nextnexthdr != NEXTHDR_ROUTING && + nextnexthdr != NEXTHDR_DEST) { + return offset; + } + *offset_prevhdr = offset; + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + default : + return offset; + } + } + + return offset; +} + + +int mip6_output(struct sk_buff *skb) + +{ + struct ipv6hdr *iph = NULL, *top_iph; + struct dst_entry *dst = skb->dst; + struct ipv6_opt_hdr *prevhdr = NULL; + struct rt6_info *rt = (struct rt6_info *)skb->dst; + u8 nexthdr; + int offset_prevhdr = 0; + int hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &offset_prevhdr); + int len, err = 0; + + if (nexthdr == IPPROTO_MOBILITY) /* No exthdrs for MH */ + goto out; + + /* First, if the skb is not checksummed, complete checksum. 
*/ + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) { + err = -EINVAL; + goto error; + } + iph = kmalloc(hdr_len, GFP_ATOMIC); + + if (!iph) { + err = -ENOMEM; + goto error; + } + + memcpy(iph, skb->nh.raw, hdr_len); + __skb_pull(skb, hdr_len); + + /* TODO: Is this correct ? */ + if ((err = skb_cow(skb, mip6_hdrlen(rt->binding.flags))) != 0) + goto error; + if (rt->binding.flags & MIPV6_F_BULE) { + struct ipv6_opt_hdr *dstopt; + dstopt = (struct ipv6_opt_hdr *)skb_push(skb, sizeof(struct ipv6_dstopt_homeaddr) + 6); + dstopt->nexthdr = nexthdr; + len = mipv6_append_home_addr((u8 *)dstopt, &iph->saddr); + dstopt->hdrlen = (len >> 3) - 1; + ipv6_addr_copy(&iph->saddr, &rt->binding.lcoa); + skb->h.raw = (unsigned char *)dstopt; + nexthdr = IPPROTO_DSTOPTS; + + } + if (rt->binding.flags & MIPV6_F_BCE) { + struct rt2_hdr *rt2; + rt2 = (struct rt2_hdr *)skb_push(skb, sizeof(struct rt2_hdr)); + skb->h.raw = (unsigned char *)rt2; + mipv6_append_rt2hdr(rt2, &iph->daddr); + ipv6_addr_copy(&iph->daddr, &rt->binding.rcoa); + rt2->rt_hdr.nexthdr = nexthdr; + nexthdr = IPPROTO_ROUTING; + } + + top_iph = (struct ipv6hdr *)skb_push(skb, hdr_len); + memcpy(top_iph, iph, hdr_len); + skb->nh.raw = skb->data; + kfree(iph); + top_iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + if (offset_prevhdr) { + prevhdr = (struct ipv6_opt_hdr *)((u8 *)top_iph + offset_prevhdr); + prevhdr->nexthdr = nexthdr; + + } else { + top_iph->nexthdr = nexthdr; + } + + out: + if ((skb->dst = dst_pop(dst)) == NULL) { + err = -EHOSTUNREACH; + goto error; + } + + return NET_XMIT_BYPASS; + error: + kfree_skb(skb); + return err; +} + +int mip6_init(void) +{ + mipv6_mh_socket = sock_alloc(); + mipv6_mh_socket->type = SOCK_RAW; + struct sock *sk; + int err; + + if ((err = sock_create(PF_INET6, SOCK_RAW, IPPROTO_MOBILITY, + &mipv6_mh_socket)) < 0) { + printk(KERN_ERR + "Failed to initialize the MIP6 MH control socket (err %d).\n", + err); + sock_release(mipv6_mh_socket); +
mipv6_mh_socket = NULL; /* for safety */ + return err; + } + + sk = mipv6_mh_socket->sk; + sk->allocation = GFP_ATOMIC; + sk->sndbuf = SK_WMEM_MAX; + sk->prot->unhash(sk); + return 0; +} + +void mip6_cleanup(void) +{ + if (mipv6_mh_socket) sock_release(mipv6_mh_socket); + mipv6_mh_socket = NULL; /* For safety. */ +} + + +MODULE_LICENSE("GPL"); --- net/ipv6/route.c 2003-06-05 08:48:57.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/route.c 2003-06-05 06:37:59.000000000 -0200 @@ -52,7 +52,7 @@ #include #include #include - +#include #include #ifdef CONFIG_SYSCTL @@ -336,7 +336,7 @@ return err; } -/* No rt6_lock! If COW failed, the function returns dead route entry +/* No rt6_lock! If COW faild, the function returns dead route entry with dst->error set to errno value. */ @@ -363,12 +363,8 @@ rt->u.dst.flags |= DST_HOST; #ifdef CONFIG_IPV6_SUBTREES - if (rt->rt6i_src.plen && saddr) { - ipv6_addr_copy(&rt->rt6i_src.addr, saddr); - rt->rt6i_src.plen = 128; - } + rt->rt6i_src.plen = ort->rt6i_src.plen; #endif - rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway); dst_hold(&rt->u.dst); @@ -885,7 +881,7 @@ struct rt6_info *rt, *nrt; /* Locate old route to this destination. */ - rt = rt6_lookup(dest, NULL, neigh->dev->ifindex, 1); + rt = rt6_lookup(dest, saddr, neigh->dev->ifindex, 1); if (rt == NULL) return; @@ -1052,6 +1048,9 @@ nrt = ip6_rt_copy(rt); if (nrt == NULL) goto out; +#ifdef CONFIG_IPV6_SUBTREES + nrt->rt6i_src.plen = rt->rt6i_src.plen; +#endif ipv6_addr_copy(&nrt->rt6i_dst.addr, daddr); nrt->rt6i_dst.plen = 128; nrt->u.dst.flags |= DST_HOST; @@ -1162,7 +1161,107 @@ } read_unlock_bh(&rt6_lock); } +/* TODO: Move struct definition + * to a header file under include/linux +*/ +struct mipv6_info_user +{ + struct mip6_info bind; + unsigned long expires; + struct in6_addr src; + struct in6_addr dst; +}; + +/* Adds mip6 related info and a stacked dst entry to the new cached route. 
+ */ +static void fill_mip6_rt(struct rt6_info *mip6rt, struct rt6_info *coart, struct mip6_info *bind) +{ + mip6rt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; + mip6rt->u.dst.flags = DST_HOST; + mip6rt->u.dst.header_len = mip6_hdrlen(bind->flags); + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1] - + mip6rt->u.dst.header_len; + mip6rt->u.dst.metrics[RTAX_ADVMSS-1] = max_t(unsigned int, dst_pmtu(&mip6rt->u.dst) - 60, ip6_rt_min_advmss); + if (mip6rt->u.dst.metrics[RTAX_ADVMSS-1] > 65535-20) + mip6rt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + mip6rt->u.dst.child = dst_clone(&coart->u.dst); /* Is this correct ? */ + memcpy(&mip6rt->binding, bind, sizeof(*bind)); + mip6rt->u.dst.output = mip6_output; +} +/* Add mipv6 information to a new cache route entry. + * Mostly copied code from rt6_pmtu_discovery + */ +int ip6_add_miproute(struct mipv6_info_user *mipinfo) +{ + /* First look up the coa route */ + struct rt6_info *rt, *mip6rt, *coart = NULL; + int err = 0; + + if ((rt = (struct rt6_info *)rt6_lookup(&mipinfo->dst, &mipinfo->src, 0, 0)) == NULL) { + return -ENOENT; + } + + /* + * Delete old host route before adding new one. TODO: Could we just modify the existing cache + * route after locking the routing table ? + */ + if (rt->rt6i_flags & RTF_CACHE) { + ip6_del_rt(rt, NULL, NULL); + rt = NULL; + } + + if ((coart = rt6_lookup(&mipinfo->bind.rcoa, &mipinfo->bind.lcoa, 0, 0)) == NULL) { + err = -ENOENT; + goto out; + } + /* Network route. + Two cases are possible: + 1. It is connected route. Action: COW + 2. It is gatewayed route or NONEXTHOP route. Action: clone it.
+ */ + if (!coart->rt6i_nexthop && !(coart->rt6i_flags & RTF_NONEXTHOP)) { + mip6rt = rt6_cow(coart, &mipinfo->dst, &mipinfo->src); + if (!mip6rt->u.dst.error) { + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1]; + dst_set_expires(&mip6rt->u.dst, HZ*mipinfo->expires); + fill_mip6_rt(mip6rt, coart, &mipinfo->bind); + dst_release(&mip6rt->u.dst); + } + } else { + + mip6rt = ip6_rt_copy(coart); + ipv6_addr_copy(&mip6rt->rt6i_dst.addr, &mipinfo->dst); + +#ifdef CONFIG_IPV6_SUBTREES + ipv6_addr_copy(&mip6rt->rt6i_src.addr, &mipinfo->src); + mip6rt->rt6i_src.plen = 128; +#endif + mip6rt->rt6i_dst.plen = 128; + mip6rt->u.dst.flags |= DST_HOST; + mip6rt->rt6i_nexthop = neigh_clone(coart->rt6i_nexthop); + dst_set_expires(&mip6rt->u.dst, HZ*mipinfo->expires); + mip6rt->rt6i_flags |= RTF_DYNAMIC|RTF_CACHE|RTF_EXPIRES; + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1]; + fill_mip6_rt(mip6rt, coart, &mipinfo->bind); + rt6_ins(mip6rt, NULL, NULL); + } + out: + if (coart) dst_release(&coart->u.dst); + if (rt) dst_release(&rt->u.dst); + return err; +} + +static int add_mip6_binding(void *arg) +{ + + struct mipv6_info_user mip; + if (copy_from_user(&mip, arg, sizeof(mip))) { + return -EINVAL; + } + + return ip6_add_miproute(&mip); +} int ipv6_route_ioctl(unsigned int cmd, void *arg) { struct in6_rtmsg rtmsg; @@ -1192,9 +1291,18 @@ rtnl_unlock(); return err; + case SIOCADDMIPINFO: + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + rtnl_lock(); + err = add_mip6_binding(arg); + rtnl_unlock(); + return err; }; - return -EINVAL; + + + return -EINVAL; } /* @@ -1786,12 +1894,11 @@ static int rt6_stats_seq_show(struct seq_file *seq, void *v) { - seq_printf(seq, "%04x %04x %04x %04x %04x %04x %04x\n", + seq_printf(seq, "%04x %04x %04x %04x %04x %04x\n", rt6_stats.fib_nodes, rt6_stats.fib_route_nodes, rt6_stats.fib_rt_alloc, rt6_stats.fib_rt_entries, rt6_stats.fib_rt_cache, - atomic_read(&ip6_dst_ops.entries), - rt6_stats.fib_discarded_routes); + 
atomic_read(&ip6_dst_ops.entries)); return 0; } --- net/ipv6/af_inet6.c 2003-06-05 08:48:57.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/af_inet6.c 2003-06-03 10:11:38.000000000 -0200 @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -310,7 +311,7 @@ } else { if (addr_type != IPV6_ADDR_ANY) { /* ipv4 addr of the socket is invalid. Only the - * unspecified and mapped address have a v4 equivalent. + * unpecified and mapped address have a v4 equivalent. */ v4addr = LOOPBACK4_IPV6; if (!(addr_type & IPV6_ADDR_MULTICAST)) { @@ -475,7 +476,7 @@ case SIOCADDRT: case SIOCDELRT: - + case SIOCADDMIPINFO: return(ipv6_route_ioctl(cmd,(void *)arg)); case SIOCSIFADDR: @@ -780,6 +781,14 @@ err = ndisc_init(&inet6_family_ops); if (err) goto ndisc_fail; +#ifdef CONFIG_IPV6_TUNNEL + err = ip6_tunnel_init(); + if (err) + goto ip6_tunnel_fail; +#endif + err = mip6_init(); + if (err) + goto mip6_fail; err = igmp6_init(&inet6_family_ops); if (err) goto igmp_fail; @@ -816,7 +825,6 @@ /* Init v6 transport protocols. 
*/ udpv6_init(); tcpv6_init(); - return 0; #ifdef CONFIG_PROC_FS @@ -834,6 +842,12 @@ igmp6_cleanup(); #endif igmp_fail: + mip6_cleanup(); +mip6_fail: +#ifdef CONFIG_IPV6_TUNNEL + ip6_tunnel_cleanup(); +ip6_tunnel_fail: +#endif ndisc_cleanup(); ndisc_fail: icmpv6_cleanup(); @@ -869,6 +883,10 @@ ip6_route_cleanup(); ipv6_packet_cleanup(); igmp6_cleanup(); + mip6_cleanup(); +#ifdef CONFIG_IPV6_TUNNEL + ip6_tunnel_cleanup(); +#endif ndisc_cleanup(); icmpv6_cleanup(); #ifdef CONFIG_SYSCTL --- include/net/mipv6.h 1969-12-31 22:00:00.000000000 -0200 +++ ../mipv6-kernel/include/net/mipv6.h 2003-06-05 06:21:37.000000000 -0200 @@ -0,0 +1,53 @@ +/* mipv6.h - Mobile IPv6 kernel support */ + +#ifndef _NET_MIPV6_H +#define _NET_MIPV6_H + +#define MIPV6_F_BULE 0x1 +#define MIPV6_F_BCE 0x2 +#define MIPV6_OPT_PAD1 0x00 +#define MIPV6_OPT_PADN 0x01 +/** + * NIPV6ADDR - macro for IPv6 addresses + * @addr: Network byte order IPv6 address + * + * Macro for printing IPv6 addresses. Used in conjunction with + * printk() or derivatives (such as DEBUG macro). 
+ **/ +#define NIPV6ADDR(addr) \ + ntohs(((u16 *)addr)[0]), \ + ntohs(((u16 *)addr)[1]), \ + ntohs(((u16 *)addr)[2]), \ + ntohs(((u16 *)addr)[3]), \ + ntohs(((u16 *)addr)[4]), \ + ntohs(((u16 *)addr)[5]), \ + ntohs(((u16 *)addr)[6]), \ + ntohs(((u16 *)addr)[7]) + +struct ipv6_dstopt_homeaddr +{ + __u8 type; /* type-code for option */ + __u8 length; /* option length */ + struct in6_addr addr; /* home address */ +} __attribute__ ((packed)); +static inline int mip6_hdrlen(int flags) +{ + int miphdrlen = 0; + + if (flags & MIPV6_F_BULE) + miphdrlen = sizeof(struct ipv6_dstopt_homeaddr) + 6; + if (flags & MIPV6_F_BCE) + miphdrlen += sizeof(struct rt2_hdr); + return miphdrlen; +} +int mip6_output(struct sk_buff *skb); +struct ipv6_txoptions * +mipv6_modify_txoptions(struct sock *sk, + struct ipv6_txoptions *old_opt, struct flowi *fl, + struct dst_entry **dst); + +int mip6_hao_check(struct sk_buff *skb, u8 nexthdr); +int mip6_init(void); +void mip6_cleanup(void); + +#endif /* _NET_MIPV6_H */ --- include/net/ip6_fib.h 2003-06-05 08:48:46.000000000 -0200 +++ ../mipv6-kernel/include/net/ip6_fib.h 2003-06-03 10:11:19.000000000 -0200 @@ -50,6 +50,13 @@ int plen; }; +struct mip6_info +{ + struct in6_addr lcoa; + struct in6_addr rcoa; + u32 flags; +}; + struct rt6_info { union { @@ -71,8 +78,9 @@ struct rt6key rt6i_dst; struct rt6key rt6i_src; - + u8 rt6i_protocol; + struct mip6_info binding; }; struct fib6_walker_t @@ -111,10 +119,9 @@ struct rt6_statistics { __u32 fib_nodes; __u32 fib_route_nodes; - __u32 fib_rt_alloc; /* permanent routes */ + __u32 fib_rt_alloc; /* permanet routes */ __u32 fib_rt_entries; /* rt entries in table */ __u32 fib_rt_cache; /* cache routes */ - __u32 fib_discarded_routes; }; #define RTN_TL_ROOT 0x0001 --------------080703090805070306090804-- From pb@bieringer.de Thu Jun 5 07:57:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:57:43 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by 
oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EvM2x029532 for ; Thu, 5 Jun 2003 07:57:23 -0700 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id BE48C1387E; Thu, 5 Jun 2003 16:20:40 +0200 (CEST) X-AV-Checked: Thu Jun 5 16:20:40 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 2E69F1387A; Thu, 5 Jun 2003 16:20:40 +0200 (CEST) Date: Thu, 05 Jun 2003 16:20:38 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: IPsec 2.5.70-bk9 and Check Point VPN-1 NG FP4 RC Message-ID: <3555172.1054830038@[10.3.62.6]> In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2917 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi, Here are some results (tunnel mode only tested, auth=SHA1): DES : ok 3DES : ok AES-128: ok AES-192: not supported by CP VPN-1 AES-256: ok CAST* : not supported by used Linux kernel BTW: be warned, not using RHL's ipsec-tools from rawhide...looks like the racoon isn't compiled in a proper environment :-( it doesn't support DES and causes trouble on 3DES *grmml*). Peter -- Dr. 
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From shemminger@osdl.org Thu Jun 5 09:55:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 09:55:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Gtc2x031390 for ; Thu, 5 Jun 2003 09:55:38 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h55GtDX21057; Thu, 5 Jun 2003 09:55:13 -0700 Date: Thu, 5 Jun 2003 09:55:12 -0700 From: Stephen Hemminger To: James Morris Cc: patmans@us.ibm.com, akpm@digeo.com, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, hch@infradead.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030605095512.022ea3be.shemminger@osdl.org> In-Reply-To: References: <20030604184341.A10256@beaverton.ibm.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2919 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Thu, 5 Jun 2003 13:25:58 +1000 (EST) James Morris wrote: > On Wed, 4 Jun 2003, Patrick Mansfield wrote: > > > [root@elm3b79 root]# ifup eth0 > > sender address length == 0 > > This is a bug introduced by a coding style cleanup, fix below. 
> > > - James > -- > James Morris > > > --- bk.pending/net/core/iovec.c 2003-06-05 11:12:59.000000000 +1000 > +++ bk.w1/net/core/iovec.c 2003-06-05 13:30:06.000000000 +1000 > @@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc > address); > if (err < 0) > return err; > - m->msg_name = address; > - } else > - m->msg_name = NULL; > - } > + } > + m->msg_name = address; > + } else > + m->msg_name = NULL; > > size = m->msg_iovlen * sizeof(struct iovec); > if (copy_from_user(iov, m->msg_iov, size)) Thanks, this works for me. I will see if it fixes the other gnome mystery as well. From mk@karaba.org Thu Jun 5 09:54:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 09:54:50 -0700 (PDT) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Gsd2x031239 for ; Thu, 5 Jun 2003 09:54:40 -0700 Received: from [3ffe:501:1057:710:202:b3ff:feb4:25aa] (helo=mokuba.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 19Ny04-0002qd-00; Fri, 06 Jun 2003 01:54:04 +0900 Date: Fri, 06 Jun 2003 01:54:06 +0900 Message-ID: <873cioqxch.wl@karaba.org> From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: Henrik Petander Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers In-Reply-To: <3EDF3EB4.8010105@tml.hut.fi> References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> <3EDF3EB4.8010105@tml.hut.fi> MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 2918 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev Hello, At Thu, 05 Jun 2003 15:59:32 +0300, Henrik Petander wrote: > > David S. 
Miller wrote: > > From: Henrik Petander > > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > > > A possible fix is to change the pointer into an offset from the start of > > the packet and use the offset later to set the nexthdr value in the > > extension header. > > > > Please indicate the version of the sources you are looking > > at when making reports. > > Sure, esp6.c bitkeeper version was 1.16. Also a fix to the bug report: > the problem is with esp6 and not with ah6, which does not use the > get_offset function. I have fixed this in our tree (replaced by ip6_found_nexthdr()). I will send a patch related to these ipsec6 fix collection by this weekend ASAP. Regards, -mk From Andrew.Morton@digeo.com Thu Jun 5 13:38:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 13:38:28 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55KcK2x012769 for ; Thu, 5 Jun 2003 13:38:21 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 20:26:22 -0700 Date: Wed, 4 Jun 2003 20:26:22 -0700 From: Andrew Morton To: Arnaldo Carvalho de Melo Cc: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604202622.1be40092.akpm@digeo.com> In-Reply-To: <20030605023349.GH24515@conectiva.com.br> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com> <20030605023349.GH24515@conectiva.com.br> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 Jun 2003 03:26:22.0152 (UTC) FILETIME=[3C88F480:01C32B12] X-archive-position: 2920 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev broken "cleanup" net/core/iovec.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff -puN net/core/iovec.c~iovec-fix net/core/iovec.c --- 25/net/core/iovec.c~iovec-fix 2003-06-04 20:23:03.000000000 -0700 +++ 25-akpm/net/core/iovec.c 2003-06-04 20:24:05.000000000 -0700 @@ -47,9 +47,10 @@ int verify_iovec(struct msghdr *m, struc address); if (err < 0) return err; - m->msg_name = address; - } else - m->msg_name = NULL; + } + m->msg_name = address; + } else { + m->msg_name = NULL; } size = m->msg_iovlen * sizeof(struct iovec); _ From Andrew.Morton@digeo.com Thu Jun 5 14:00:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 14:00:30 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55L0M2x013214 for ; Thu, 5 Jun 2003 14:00:22 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 18:56:52 -0700 Date: Wed, 4 Jun 2003 18:56:52 -0700 From: Andrew Morton To: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604185652.31958d1f.akpm@digeo.com> In-Reply-To: <3EDE7FEB.2C7FAEC7@digeo.com> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 Jun 2003 01:56:52.0736 (UTC) FILETIME=[BC1D1800:01C32B05] X-archive-position: 2921 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev Andrew Morton wrote: > > Stephen Hemminger wrote: > > > > Test machine running 
2.5.70-bk latest can't boot because eth2 won't > > come up. The same machine and configuration successfully brings up > > all the devices and runs on 2.5.70. > > kjournald is stuck waiting for IO to complete against some buffer > during transaction commit. > > I'd be suspecting block layer or device drivers. What device driver > is handling your /var/log? I take that back. Your sysrq-T woke up syslogd which did a synchronous write which poked kjournald. You happened to catch it in mid-commit. So that's all normal and sane. Something is up with netdevice initialisation. My eth0 (e100) is in a strange half-there state and won't come up. Reverting the post-2.5.70 e100 changes does not help. It's something which went into the tree today I think. From davem@redhat.com Thu Jun 5 22:01:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:01:43 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5651a2x026593 for ; Thu, 5 Jun 2003 22:01:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA03527; Thu, 5 Jun 2003 21:59:08 -0700 Date: Thu, 05 Jun 2003 21:59:07 -0700 (PDT) Message-Id: <20030605.215907.71090944.davem@redhat.com> To: pb@bieringer.de Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods From: "David S. Miller" In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2923 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Dr. Peter Bieringer" Date: Thu, 05 Jun 2003 15:07:36 +0200 because I got no success, I've tried different encryption methods than 3DES. And *suddenly* it began to work. Sounds like an out-of-date include/linux/pfkeyv2.h file used during tool building. From garzik@gtf.org Thu Jun 5 22:40:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:40:49 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565ed2x027263 for ; Thu, 5 Jun 2003 22:40:40 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 5F3396657; Fri, 6 Jun 2003 01:40:38 -0400 (EDT) Date: Fri, 6 Jun 2003 01:40:38 -0400 From: Jeff Garzik To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [PATCHES] 2.4.x net driver updates Message-ID: <20030606054038.GA3479@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 2924 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev BK users may do bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4 Others may obtain the patch from ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-rc8-netdrvr1.patch.bz2 This will update the following files: drivers/net/bonding.c | 3434 ------------------------- Documentation/Configure.help | 9 Documentation/networking/bonding.txt | 537 ++- Documentation/networking/ifenslave.c | 496 
++- drivers/net/8139cp.c | 9 drivers/net/8139too.c | 6 drivers/net/Config.in | 3 drivers/net/Makefile | 8 drivers/net/amd8111e.c | 1063 ++++--- drivers/net/amd8111e.h | 968 +++---- drivers/net/arcnet/arcnet.c | 2 drivers/net/arcnet/rfc1201.c | 6 drivers/net/bonding.c | 266 + drivers/net/bonding/Makefile | 18 drivers/net/bonding/bond_3ad.c | 2667 ++++++++++++++++++- drivers/net/bonding/bond_3ad.h | 342 ++ drivers/net/bonding/bond_alb.c | 1585 +++++++++++ drivers/net/bonding/bond_alb.h | 129 drivers/net/bonding/bond_main.c | 4795 ++++++++++++++++++++++++++++++++--- drivers/net/bonding/bonding.h | 209 + drivers/net/dl2k.h | 1 drivers/net/e1000/e1000.h | 3 drivers/net/e1000/e1000_main.c | 167 + drivers/net/eepro.c | 2 drivers/net/ns83820.c | 2 drivers/net/pci-skeleton.c | 4 drivers/net/pcnet32.c | 7 drivers/net/r8169.c | 52 drivers/net/sk98lin/skge.c | 2 drivers/net/sundance.c | 144 - drivers/net/tg3.c | 2 drivers/net/tlan.c | 258 + drivers/net/tlan.h | 7 drivers/net/tokenring/olympic.c | 3 drivers/net/tulip/tulip_core.c | 7 drivers/net/typhoon.c | 4 drivers/net/via-rhine.c | 2 drivers/net/wireless/airo.c | 2 include/linux/ethtool.h | 27 include/linux/if_arcnet.h | 4 include/linux/if_bonding.h | 101 include/linux/if_vlan.h | 1 include/linux/skbuff.h | 4 include/net/if_inet6.h | 5 include/net/irda/irlan_common.h | 2 net/core/dev.c | 4 net/core/skbuff.c | 3 net/ipv6/addrconf.c | 13 net/ipv6/ndisc.c | 3 net/irda/irlan/irlan_eth.c | 6 50 files changed, 11856 insertions(+), 5538 deletions(-) through these ChangeSets: (03/06/06 1.1205) [PATCH] Bonding 2.4 update patch 6 Fix to the ifenslave -c fix, fix to version control (plus change log update). I've got an additional fix for version control that I'll send you on Monday. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1204) [PATCH] Bonding 2.4 update patch 5 Fix to prevent routes on the bonding device from being lost during enslavement processing. 
Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1203) [PATCH] Bonding 2.4 update patch 4 A fix for ifenslave -c. Later patches have fixes for this fix. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1202) [PATCH] Bonding 2.4 update patch 3 A patch with some miscellaneous little stuff (comments, mode names, fix a printk). Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1201) [PATCH] Bonding 2.4 update patch 2 Small patch to fix endless failover problem in the ARP monitor. Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1200) [PATCH] Bonding 2.4 update patch 1 Documentation. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/bonding.txt (03/06/06 1.1199) [PATCH] remove ethtool privileged references dev_ioctl already checks capable(CAP_NET_ADMIN) for SOICETHTOOL, so privileged reference are not necessary. (03/06/06 1.1198) [PATCH] 10GbE ethtool support Add 10GbE support for ethtool. (03/06/05 1.1197) [netdrvr amd8111e] link against mii lib (03/06/04 1.1196) [netdrvr] gcc 3.3 cleanups Mostly marking 64-bit constants as ULL. (03/05/29 1.1185.1.52) [netdrvr amd8111e] remove out-of-tree feature that snuck in (03/05/29 1.1185.1.51) [netdrvr amd8111e] interrupt coalescing, libmii, bug fixes * Dynamic interrupt coalescing * mii lib support * dynamic IPG support (disabled by default) * jumbo frame fix * vlan fix * rx irq coalescing fix (03/05/29 1.1185.1.50) [netdrvr tlan] fix 64-bit issues (03/05/29 1.1185.1.49) [netdrvr r8169] sync with 2.5 (backport whitespace cleanups) (03/05/29 1.1185.1.48) [netdrvr r8169] use alloc_etherdev (fix race), pci_disable_device (03/05/29 1.1185.1.47) [netdrvr olympic] fix build with gcc 3.3 (03/05/29 1.1185.6.3) [netdrvr 8139too] add comment, whitespace cleanup (03/05/28 1.1185.6.2) [netdrvr] s/init_etherdev/alloc_etherdev/ in code comments, in 8139too and pci-skeleton drivers. 
(03/05/28 1.1185.6.1) [netdrvr tlan] backport fixes and cleanups from 2.5 * alloc_etherdev (fixes race) * PCI DMA API * C99 initializers * speling fixes * use pci_{request,release}_regions for PCI devices * propagate error returns back from pci_xxx functions * call pci_set_dma_mask * use keventd for adapter error reset (2.5 uses workqueue) (03/05/27 1.1185.1.45) [netdrvr pcnet32] bug fixes I would like to see a couple of the pcnet32 changes that I think we can agree on be put into the trees so a couple of the potential defects can be avoided. The following patch contains just these pieces. The only controversial one is an arbitrary change in the number of iterations in a while loop spinning on hardware state. No matter how this is done, I am not especially fond of this bit of code as it has no reasonable error recovery path -- however, as a half-way, incremental solution, increasing the polling time should help as the 100 value was certainly found to be insufficient. 1000 may not be sufficient either, but it is certainly no worse. Both of the other changes were hit in testing (and I belive the wmb() at a customer even), so it would help reduce some debug if these go in. Any feedback is appreciated - thanks. (03/05/27 1.1185.1.44) [netdrvr eepro] update MODULE_AUTHOR per old-author request (03/05/27 1.1185.1.43) [netdrvr sundance] fix another flow control bug (03/05/27 1.1185.1.42) [netdrvr sundance] fix flow control bug (03/05/27 1.1185.1.41) [netdrvr bonding] fix ABI version control problem This fix makes bonding not commit to a specific ABI version if the ioctl command is not supported by bonding. (It also removes the '\n' in the continuous printk reporting the link down event in bond_mii_monitor - it got in there by mistake in our previous patch set and caused log messages to appear funny in some situations). 
(03/05/27 1.1185.1.40) [netdrvr bonding] fix long failover in 802.3ad mode This patch fixes the bug reported by Jay on April 3rd regarding long failover time when releasing the last slave in the active aggregator. The fix, as suggested by Jay, is to follow the spec recommendation and send a LACPDU to the partner saying this port is no longer aggregatable and therefore trigger an immediate re-selection of a new aggregator instead of waiting the entire expiration timeout. (03/05/25 1.1185.1.39) IPv6 over ARCnet (RFC2497) support, IPv6 part. (03/05/25 1.1185.1.38) IPv6 over ARCnet (RFC2497) support, driver part (03/05/25 1.1185.1.37) [irda] module refcounts for irlan (03/05/23 1.1185.3.7) [bonding] small cleanups (03/05/23 1.1185.3.6) [bonding] add rcv load balancing mode This patch adds a new mode that enables receive load balancing for IPv4 traffic on top of the transmit load balancing mode. This capability is achieved by intercepting and manipulating the ARP negotiation to teach clients several MAC addresses for the bond and thus distribute incoming traffic among all slaves with the highest link speed. In order to function properly, slaves are required to be able to have their MAC address set even while the interface is up since once the primary slave looses its link, the new primary slave (and only it) must be able to take over and receive the incoming traffic instead. If a non-primary slave looses its link, ARP packets will be sent to all clients communicating through it in order to teach them a replacement MAC address, and the primary slave will be put in promiscuous mode for 10 seconds for fault tolerance reasons. This patch is against bonding-20030415, but must come only after the locking scheme changing patch since it uses dev_set_promiscuity() that would otherwise cause a system hang. 
(03/05/23 1.1185.3.5) [bonding] support xmit load balancing mode (03/05/23 1.1185.3.4) [bonding] much improved locking This patch replaces the use of lock_irqsave/unlock_irqrestore in bonding with lock/unlock or lock_bh/unlock_bh as appropriate according to context. This change is based on a previous discussion regarding the fact that holding a lock_irqsave doesn't prevent softirqs from running which can cause deadlocks in certain situations. This new locking scheme has already undergone massive testing cycle by our QA group and we feel it is ready for release (some new modes and enhancements will not work properly without it). (03/05/23 1.1185.3.3) [bonding] better 802.3ad mode control, some cleanup This patch adds the lacp_rate module param to enable better control over the IEEE 802.3ad mode. This param controls the rate at which the partner system is asked to send LACPDUs to bonding. Two options exist: - slow (or 0) - LACPDUs are 30 seconds apart - fast (or 1) - LACPDUs are 1 second apart The default is slow (like most switches around). There are also some code beautifications (mainly converting comments to C style in code segments we added in the past). (03/05/23 1.1185.3.2) [bonding] ABI versioning This patch adds user-land to kernel ABI version control in bonding to restore backward compatibility between different versions of ifenslave and the bonding module. It uses ethtool's GDRVINFO ioctl to pass the ABI version number between ifenslave and the bonding module in both directions so both the driver and the application can tell which partner they're working against and take the appropriate measures when enslaving/releasing an interface. 
The bonding module remembers the ABI version received from the application, and from that moment on will deny enslave and release commands from an application using a different ABI version, which means that if you want to switch to an ifenslave with a different ABI version (or with non at all), you'll first have to re-load the bonding module. This patch also changes the driver/application versioning scheme to contain 3 fields X.Y.Z with the follows meaning: X - Major version - big behavior changes Y - Minor version - addition of features Z - Extra version - minor changes and bug fixes There are also three minor bug fixes: 1. Prevent enslaving an interface that is already a slave. 2. Prevent enslaving if the bond is down. 3. In bond_release_all, save old value of current_slave before assigning NULL to it to enable using it's original value later on. This patch is against bonding-20030415. (03/04/27 1.1137.1.6) [netdrvr e1000] add TSO support -- disabled * Copy TSO support for 2.5 e1000. Wrapped with NETIF_F_TSO, so not currently enabled in 2.4. Done to keep 2.4 and 2.5 drivers in-sync as much as possible. (03/04/27 1.1137.1.5) [netdrvr e1000] add support for NAPI * Copy NAPI support from 2.5 e1000 driver * Add CONFIG_E1000_NAPI option (03/04/27 1.1137.1.4) [netdrvr tulip] support DM910x chip from ALi (03/04/27 1.1137.1.3) Remove duplicate CONFIG_TULIP_MWI entry in Configure.help Noticed by Geert Uytterhoeven (03/04/27 1.1137.1.2) [netdrvr 8139cp] enable MWI via pci_set_mwi, rather than manually (03/04/26 1.1131.2.6) [netdrvr typhoon] s/#if/#ifdef/ for a CONFIG_ var (03/04/25 1.1131.2.5) [netdrvr sundance] small cleanups from 2.5 - s/long flag/unsigned long flag/ - C99 initializers (03/04/25 1.1131.2.4) [netdrvr sundance] bug fixes, VLAN support - Fix tx bugs in big-endian machines - Remove unused max_interrupt_work module parameter, the new NAPI-like rx scheme doesn't need it. 
- Remove redundancy get_stats() in intr_handler(), those I/O access could affect performance in ARM-based system - Add Linux software VLAN support - Fix bug of custom mac address (StationAddr register only accept word write) (03/04/25 1.1131.2.3) [netdrvr via-rhine] fix promisc mode I found a via-rhine bug, it can't receive BPDU (mac: 0180c2000000) in promiscuous mode. Fill all "1" in hash table to fix this problem in promiscuous mode. (RCR remain 0x1c, write it as 0x1f don't work) (03/04/25 1.1131.2.2) [wireless airo] fix end-of-array test FYI statsLabels[] is an array of char*, so the fix below is pretty obvious. (03/04/25 1.1131.2.1) [PATCH] fix .text.exit error in drivers/net/r8169.c In drivers/net/r8169.c the function rtl8169_remove_one is __devexit but the pointer to it didn't use __devexit_p resulting in a.text.exit compile error when !CONFIG_HOTPLUG. The fix is simple: (03/04/17 1.1101.8.7) [bonding] add support for IEEE 802.3ad Dynamic link aggregation Contributed by Shmulik Hen @ Intel, merge by Jay Vosburgh @ IBM (03/04/17 1.1101.8.6) [bonding] move private decls into new drv/net/bonding/bonding.h file (03/04/17 1.1101.8.5) [bonding] move driver into new drivers/net/bonding directory (03/04/17 1.1101.8.4) [bonding] Moved setting slave mac addr, and open, from app to the driver This patch enables support of modes that need to use the unique mac address of each slave. It moves setting the slave's mac address and opening it from the application to the driver. This breaks backward compatibility between the new driver and older applications ! It also blocks possibility of enslaving before the master is up (to prevent putting the system in an unstable state), and removes the code that unconditionally restores all base driver's flags (flags are automatically restored once all undo stages are done in proper order). 
Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.3) [bonding] add support for getting slave's speed and duplex via ethtool Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.2) [bonding] fix comment to prevent future merge difficulties Contributed by Jay Vosburgh @ IBM (03/04/17 1.1101.8.1) [net] store physical device a packet arrives in on (Needed for bonding) Contributed by Jay Vosburgh @ IBM, Shmulik Hen @ Intel, and others. From nakam@linux-ipv6.org Thu Jun 5 22:42:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:42:28 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565gI2x027582 for ; Thu, 5 Jun 2003 22:42:19 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19O9w9-0000SR-00; Fri, 06 Jun 2003 14:38:49 +0900 From: Masahide NAKAMURA To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Cc: usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030606143844.0604c306.nakam@linux-ipv6.org> In-Reply-To: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Date: Fri, 06 Jun 2003 14:38:49 +0900 X-archive-position: 2925 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev

Hello, I'm Nakamura, a member of USAGI.

On Thu, 05 Jun 2003 19:12:24 +0900 (JST) YOSHIFUJI Hideaki / 吉藤英明 wrote:

> Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
> The reason why I won't hurry is because I still believe it is not
> required for MIP6. Nakamura, one of our members, will describe the details.
> It takes precedence over "limited" policy(?) routing to introduce generic
> policy routing.

As you know, we've been planning the MIPv6 design to use XFRM. Using MIPv6 requires some fixes and extensions to XFRM, and as a result XFRM becomes more generic.

On output processing, our design is as follows:

From userland, through netlink/xfrm, we have to set an xfrm_policy and xfrm_state with something like the ip command (or an extended ip command). The xfrm_policy has two templates now:

 - template handling Routing Header type 2 (RT2) ...(a)
 - template handling Destination Options Header (DST) ...(b)

And we have to add one address field (c) to xfrm_state for MIPv6. (Currently it is named mip6_state.addr.)

Template (a) finds an xfrm_state that points to a function like mip6_rthdr_output(), which inserts RT2 and replaces the dst address of the IP header with the specified address (c). Likewise, template (b) finds an xfrm_state that points to a function like mip6_destopt_output(), which inserts DST and replaces the src address of the IP header with the specified address (c).

Of course, both mip6_rthdr_output() and mip6_destopt_output() are called as dst_output internally in the XFRM world. For example, if two states are found, both RT2 and DST will be appended to the packet. We have tested that on our tree.

In case of tunneling, we think we can also add a template and prepare a function for dst_output in the XFRM world as above. (Maybe xfrm6_tunnel needs some fix to be used with MIPv6, as Henrik said.)

Could you give us comments?
BTW, I have read Henrik's patch(mip6-exthdr.patch) sent to netdev in other thread and I feel that is simple code to implement MIPv6 and is clean one. Thanks, Henrik. As he said, it is similar one to use XFRM like ours. We know that the big difference between yours and ours is to modify either routing table or XFRM. Anyway, we'll show you our patch later. Regards, -- Masahide NAKAMURA From garzik@gtf.org Thu Jun 5 22:42:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:42:37 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565gV2x027616 for ; Thu, 5 Jun 2003 22:42:32 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 7E1376611; Fri, 6 Jun 2003 01:42:31 -0400 (EDT) Date: Fri, 6 Jun 2003 01:42:31 -0400 From: Jeff Garzik To: torvalds@transmeta.com Cc: davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [BK PATCHES] net driver merges Message-ID: <20030606054231.GA3545@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 2926 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Linus, please do a bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.5 Others may obtain the patch from ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.70-bk10-netdrvr1.patch.bz2 This will update the following files: MAINTAINERS | 7 drivers/net/8139cp.c | 2 drivers/net/Makefile | 2 drivers/net/arcnet/arc-rawmode.c | 10 drivers/net/arcnet/arcnet.c | 10 drivers/net/arcnet/rfc1051.c | 10 drivers/net/arcnet/rfc1201.c | 12 drivers/net/dl2k.h | 1 drivers/net/ns83820.c | 2 drivers/net/pcmcia/fmvj18x_cs.c | 12 drivers/net/sb1000.c | 22 drivers/net/sk98lin/skge.c | 2 
drivers/net/tg3.c | 2 drivers/net/wireless/Kconfig | 15 drivers/net/wireless/Makefile | 1 drivers/net/wireless/atmel.c | 3943 +++++++++++++++++++++++++++++++++++++++ drivers/net/wireless/atmel_cs.c | 768 +++++++ include/linux/ethtool.h | 27 18 files changed, 4791 insertions(+), 57 deletions(-) through these ChangeSets: (03/06/06 1.1313) [netdrvr] C99 initializers for arcnet (03/06/06 1.1312) [PATCH] remove ethtool privileged references dev_ioctl already checks capable(CAP_NET_ADMIN) for SOICETHTOOL, so privileged reference are not necessary. (03/06/06 1.1311) [PATCH] 10GbE ethtool support Add 10GbE support for ethtool. (03/06/06 1.1310) [PATCH] cli/sti cleanup for fmvj18x This one should be safe as we're protected by the xmit_lock in all instances (03/06/06 1.1309) [netdrvr] add MAINTAINERS entry for atmel wireless driver (03/06/06 1.1308) [netdrvr] add atmel[_cs], new wireless driver Attached is a driver for Atmel at76c50x WiFi cards. This code started out as a GPL release from Atmel of pretty horrible quality and I've extensively re-worked it with the aim of making it acceptable in the kernel. Please could you take a look and either pass it into the patch stream or let me know what's wrong with it? The code has been tested on at least three different brand cards by different people. Jean Tourrilhes took a look at an earlier version an was positive. He's put incorporating this into 2.6 as a priority 1. The patch works fine on 2.5.70. The firmware issue has been addressed now. The only firmware in the driver is a small stub which reads the MAC address from NVRAM on the card. The source for that is included so there are no GPL issues. The main firmware is loaded from userspace using Manuel Estrada Sainz's sysfs firmware class. I know that the patch for that has been accepted but it hasn't turned up anywhere I can see yet. The driver compiles fine even without the firmware class. I've made a package of the firmware images which is available from my website. 
The remaining issues with the driver are migrating PCMCIA to the new driver model and PCI support. I'm happy to produce followup patches as the PCMCIA system gets evolved to the new driver model: the timing on that is controlled by others. This set of chips includes a PCI version and the driver should support that, but AFAIK there is no PCI hardware available anywhere. If Atmel can provide me with some it will be simple to add PCI support. The driver uses the CRC32 library module and the firmware loader. I've not put in dependencies on those, but when the lastest set of patches go into Kconfig I'll set it up so that selecting the Atmel driver selects CRC32 and FW_LOADER too. (03/06/06 1.1307) [PATCH] sb1000 driver bugs Inspecting the sb1000 driver showed some interesting bugs: - net device pointer is used before the device is allocated; gcc does catch this. - unregister is called even though device not registered successfully - net device is not freed on remove. Compiles but don't have hardware to test. Don't know how it ever worked though. (03/06/05 1.1306) [netdrvr amd8111e] link against mii lib (03/06/05 1.1305) [netdrvr skge] add ULL modifier to 64-bit constant (03/06/05 1.1304) [netdrvr] gcc 3.3 cleanups Mostly adding 'ULL' modifier to 64-bit constants. 
From kazunori@miyazawa.org Thu Jun 5 22:48:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:48:38 -0700 (PDT) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565mS2x028208 for ; Thu, 5 Jun 2003 22:48:29 -0700 Received: from monza.miyazawa.org (softdnserr [3ffe:501:41c:3:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Fri, 06 Jun 2003 14:46:20 +0900 Date: Fri, 6 Jun 2003 14:49:25 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Message-Id: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2927 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev

Hello,

I observed invalid refcnt incrementation when using IPsec in IPv6. I configured IPsec and ran ping6, and then the refcnt of the dst was incremented in steps of two. I observed it using "route -A inet6". I also checked it using printk.

This patch fixes dst reference count management.

In dst_pop the reference count of dsts except for the last is incremented in dst_clone and decremented in the next call to dst_pop, but the last dst reference count will never be decremented. All dsts are held by xfrm_policy and there is no need to touch the reference count here.

In the output functions, dst is changed by xfrm_lookup if there is any matching policy. Therefore the original dst, which is held before calling xfrm_lookup, will never be released. When xfrm_lookup succeeds and dst is changed, the original dst should be released.
Patch-Name: fix dst refcnt with xfrm Patch-Id: FIX_2_5_70+CS1_1259_DST_REFCNT_WITH_XFRM Patch-Author: Kazunori Miyazawa / USAGI Project Credit: Kazunori Miyazawa / USAGI Project Index: linux25/include/net/dst.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/dst.h,v retrieving revision 1.1.1.9 retrieving revision 1.1.1.9.22.1 diff -u -r1.1.1.9 -r1.1.1.9.22.1 --- linux25/include/net/dst.h 17 Apr 2003 18:15:56 -0000 1.1.1.9 +++ linux25/include/net/dst.h 6 Jun 2003 05:02:36 -0000 1.1.1.9.22.1 @@ -160,10 +160,7 @@ static inline struct dst_entry *dst_pop(struct dst_entry *dst) { - struct dst_entry *child = dst_clone(dst->child); - - dst_release(dst); - return child; + return dst->child; } extern void * dst_alloc(struct dst_ops * ops); Index: linux25/net/ipv6/ip6_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v retrieving revision 1.1.1.16 retrieving revision 1.1.1.16.14.1 diff -u -r1.1.1.16 -r1.1.1.16.14.1 --- linux25/net/ipv6/ip6_output.c 26 May 2003 08:04:10 -0000 1.1.1.16 +++ linux25/net/ipv6/ip6_output.c 6 Jun 2003 05:00:58 -0000 1.1.1.16.14.1 @@ -211,6 +211,8 @@ if ((err = xfrm_lookup(&skb->dst, fl, sk, 0)) < 0) { return err; } + if (dst != skb->dst) + dst_release(dst); if (opt) { int head_room; @@ -595,10 +597,13 @@ pktlength = length; if (dst) { + struct dst_entry *dst0 = dst; if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0) { dst_release(dst); return -ENETUNREACH; } + if (dst0 != dst) + dst_release(dst0); } if (hlimit < 0) { @@ -1194,10 +1199,13 @@ } if (*dst) { + struct dst_entry *dst0 = *dst; if ((err = xfrm_lookup(dst, fl, sk, 0)) < 0) { dst_release(*dst); return -ENETUNREACH; } + if (*dst != dst0) + dst_release(dst0); } return 0; From davem@redhat.com Thu Jun 5 22:58:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:58:59 -0700 (PDT) Received: from 
pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565wt2x028553 for ; Thu, 5 Jun 2003 22:58:55 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA03766; Thu, 5 Jun 2003 22:55:47 -0700 Date: Thu, 05 Jun 2003 22:55:47 -0700 (PDT) Message-Id: <20030605.225547.28789693.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm From: "David S. Miller" In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2928 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Kazunori Miyazawa
   Date: Fri, 6 Jun 2003 14:49:25 +0900

   In dst_pop the reference count of dsts except for the last is
   incremented in dst_clone and decremented in the next call to
   dst_pop, but the last dst reference count will never be decremented.
   All dsts are held by xfrm_policy and there is no need to touch the
   reference count here.

Ok, so the idea is to hold onto the top-level parent DST entry the entire time, and this prevents the DST and all its children from being destroyed. Is this correct?

Let me study this a little bit, I want to make sure it is correct.
From pb@bieringer.de Thu Jun 5 23:25:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 23:26:03 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h566Pv2x029425 for ; Thu, 5 Jun 2003 23:25:58 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id BA7111387E; Fri, 6 Jun 2003 08:25:51 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id D88461387F; Fri, 6 Jun 2003 08:25:50 +0200 (CEST) X-AV-Checked: Fri Jun 6 08:25:50 2003 smtp2.aerasec.de Received: from p50805418.dip.t-dialin.net (p50805418.dip.t-dialin.net [80.128.84.24]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 919BB1387E; Fri, 6 Jun 2003 08:25:49 +0200 (CEST) Date: Fri, 06 Jun 2003 08:25:47 +0200 From: Peter Bieringer To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods Message-ID: <122560000.1054880747@gate.muc.bieringer.de> In-Reply-To: <20030605.215907.71090944.davem@redhat.com> References: <35410000.1054818456@klopffest.muc.aerasec.de> <20030605.215907.71090944.davem@redhat.com> X-Mailer: Mulberry/3.0.3 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2929 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, June 05, 2003 09:59:07 PM -0700 "David S. Miller" wrote: > From: "Dr. Peter Bieringer" > Date: Thu, 05 Jun 2003 15:07:36 +0200 > > because I got no success, I've tried different encryption methods than > 3DES. And *suddenly* it began to work. 
> > Sounds like an out-of-date include/linux/pfkeyv2.h file > used during tool building. Yes, it looks like. BTW: is there something like a "version information" which is used in that way that user space tools can detect and report such changes at runtime? Would be perhaps helpful if racoon reports something like "incompatible" in this case. Very much better than such strange problems... Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From kazunori@miyazawa.org Thu Jun 5 23:36:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 23:36:32 -0700 (PDT) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h566aP2x029878 for ; Thu, 5 Jun 2003 23:36:25 -0700 Received: from monza.miyazawa.org (softdnserr [3ffe:501:41c:3:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Fri, 06 Jun 2003 15:34:18 +0900 Date: Fri, 6 Jun 2003 15:37:18 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Message-Id: <20030606153718.4923bbf9.kazunori@miyazawa.org> In-Reply-To: <20030605.225547.28789693.davem@redhat.com> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> <20030605.225547.28789693.davem@redhat.com> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2930 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev On Thu, 05 Jun 2003 22:55:47 -0700 (PDT) "David S. 
Miller" wrote:

> From: Kazunori Miyazawa
> Date: Fri, 6 Jun 2003 14:49:25 +0900
>
> In dst_pop the reference count of dsts except for the last is
> incremented in dst_clone and decremented in the next call to
> dst_pop, but the last dst reference count will never be decremented.
> All dsts are held by xfrm_policy and there is no need to touch the
> reference count here.
>
> Ok, so the idea is to hold onto top-level parent DST entry the entire
> time, and this prevents the DST and all it's children from being
> destroyed. Is this correct?
>

Yes. Additionally, DST is incremented in the process but never decremented correctly. Let me explain it. It must be a case of "Don't try to teach your grandmother to suck eggs" :-)

"O" is the original dst structure, and its refcnt is 1 in the routing table.
"C" is the child.
"DEST" is some parameter in the stack.
(X) after "O" or "C" represents its reference count.

At first, as the result of the routing lookup, DEST holds "O" by calling dst_hold/dst_clone.

DEST=>O(2)

In xfrm_lookup and related functions the child is created and connected to "O". Their reference counts are incremented because xfrm_policy holds them. Then the stack builds up a stackable destination like this:

DEST=>C(1)
      |=>O(3)

After this the stack regards "C" as the original destination. I assume the process is datagram. The stack calls dst_clone before passing DST to skb->dst.

skb->dst=DEST=>C(2)
               |=>O(3)

In dst_pop it increments O with dst_clone and releases C with dst_release.

skb->dst => C(2)
            |=>O(3)

call dst_pop....

skb->dst =>O(4)
DST=>C(1)
     |=>O(4)

The stack finishes processing and releases DST with dst_release. The reference count of "O" is decremented in kfree_skb.

skb->dst=DEST=>C(0)
               |=>O(3)

I hope this helps you. I don't think I understand the whole dst life cycle. Please teach me if I misunderstand.

BTW, why does the stack set the dst reference count to "0" at initialization? IMHO it should be "1".
Thank you, --Kazunori Miyazawa (Yokogawa Electric Corporation) From davem@redhat.com Fri Jun 6 00:30:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 00:31:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h567UP2x030603 for ; Fri, 6 Jun 2003 00:30:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA04066; Fri, 6 Jun 2003 00:27:20 -0700 Date: Fri, 06 Jun 2003 00:27:19 -0700 (PDT) Message-Id: <20030606.002719.35016156.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm From: "David S. Miller" In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2931 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kazunori Miyazawa Date: Fri, 6 Jun 2003 14:49:25 +0900 In dst_pop refernce cound of dsts except for last are incremented in dst_clone and decremented in next call dst_pop but last dst refernce count will be never decremented. All dst are held by xfrm_policy and there is no need to touch the refernce count here. Ok, there is problem with this logic. Final dst is set to skb->dst, and when SKB is freed then we do dst_release(skb->dst). Therefore it _IS_ decremented. (see net/core/skbuff.c:__kfree_skb(), it is where this final DST reference is decremented). Something is going wrong in ipv6 code if this is not happening. 
If you modify skb->dst, it is your job to maintain reference properly. Look at how ipv4 works, we do all the work in the route lookup and furthermore we never pass &skb->dst into these lookups. What ipv6 output does looks really really strange. It is silly to do flow lookups in places like ip6_xmit(). And this is where all the refcount bugs are really coming from. Like ipv4, flow lookups should be occuring at end of ip6_route_output() processing. As far as I can tell, ip6_xmit() makes calculations based upon "dst" and this is wrong. It updates only skb->dst, but this is not what that function uses to make decisions. 'dst' is old copy :( From vnuorval@tcs.hut.fi Fri Jun 6 01:57:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 01:57:24 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h568vD2x000352 for ; Fri, 6 Jun 2003 01:57:14 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Fri, 6 Jun 2003 11:48:38 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h568mcjH002713; Fri, 6 Jun 2003 11:48:38 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h568maNb002709; Fri, 6 Jun 2003 11:48:36 +0300 Date: Fri, 6 Jun 2003 11:48:36 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2932 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Thu, 5 Jun 
2003, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> In article <20030531.000319.114704530.yoshfuji@linux-ipv6.org> (at Sat, 31 May 2003 00:03:19 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says:
>
> > In article (at Fri, 30 May 2003 17:34:40 +0300 (EEST)), Ville Nuorvala says:
> >
> > > here is a patch that fixes CONFIG_IPV6_SUBTREES and allows overriding
> > > normal routes with source address specific ones. This is for example
> > > needed in MIPv6 for handling the traffic to and from a mobile node's home
> > > address correctly.
> >
> > Let us test the patch. It seemed buggy when USAGI tested before.
>
> I've re-tested your latest CONFIG_IPV6_SUBTREE patch.
> The results of the retesting seem fine.

Great! :)

> However, I won't accept your patch as-is for now.
>
> The patch consists of several parts:
>
> 1. fixing bugs in IPv6 code
> 2. fixing bugs in CONFIG_IPV6_SUBTREE code
> 3. changing majority of keys of routing table.
>
> There are no problems with 1 and 2.
> However, we need to discuss 3.

I have of course no objections against 1. :)
However, 2 and 3 are in my view quite interrelated.

> As I said in other thread, the policy routing should be done in the
> other way. And, it is not good to change the semantics of
> CONFIG_IPV6_SUBTREE.

Even if the semantics are flawed? I'll try to explain my reasoning below.

> In original, routing is looked up by destination address, and then,
> looked up by the source address; destination takes precedence over source.
> Your patch changes this. Source address takes precedence over destination
> address.

The main problem with the original destination,source lookup is the
cached host routes created by ip6_route_{input,output} (or actually
rt6_cow). Since these routes have destination prefix length 128, they will
override all source routes, unless they also are host routes.
This happens because the (non-host) source route ends up in the subtree
of a node higher up in the destination tree, which will never be reached
because the cached host route already matches the destination address.

Since the initial mode of communication between a mobile node (using its
home address) and any correspondent node is reverse tunneling, we at least
need something like a default (i.e. a non-host) route through the tunnel
for the MN's home address. Not until route optimization is set up between
the MN and the CN do we actually get host routes for the traffic between
the two.

If we switch the order of keys to source,destination we don't get this
problem, since the cached host routes end up at the bottom of the subtrees
and won't interfere with the normal routing.

Prefix routes also cause problems with the destination,source key order,
since we must create a duplicate route for each prefix and home address.

Hope I explained it clearly enough :)

> From the point of the policy routing, both (and other attributes) should be
> considered equally, and this is what IPv4 routing table does.

This of course seems like the optimal solution.

> Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
> The reason why I won't hurry is because I still believe it is not
> required for MIP6. Nakamura, one of our members, will describe the details.
> It takes precedence over "limited" policy(?) routing to introduce generic
> policy routing.

I still think _some_ routing changes are necessary, but I guess we need to
discuss what the changes are. I'm btw willing to help with the IPv6 policy
routing if that helps getting it into the kernel sooner.

> Anyway, will you split up your patch (into 1-3 above) first, please?

I'll check if there still is anything to do in 1 after the patch you
already submitted, but let's please discuss 3 before I split it into 2
and 3.
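
The key-ordering argument above can be illustrated with a deliberately simplified model. This is a sketch only, not the fib6 code: addresses are 8 bits wide, the table is a flat array, and every name (toy_prefix, toy_route, toy_match, toy_lookup, toy_example, the RT_* and *_ADDR constants) is hypothetical. The one assumption taken from the discussion is that the lookup picks the longest match on the first key and then searches only inside that node, without backtracking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 8-bit "addresses"; len is the prefix length in bits (0..8). */
struct toy_prefix { uint8_t addr; int len; };

/* A route keyed by both a destination and a source prefix. */
struct toy_route { struct toy_prefix dst, src; };

enum { RT_CACHED_HOST = 0, RT_TUNNEL = 1, RT_DEFAULT = 2 };
enum { HOME_ADDR = 0x10, CORR_ADDR = 0x42 };   /* made-up MN and CN */

/* Non-zero if addr falls inside prefix p. */
static int toy_match(struct toy_prefix p, uint8_t addr)
{
    uint8_t mask;

    assert(p.len >= 0 && p.len <= 8);
    mask = (uint8_t)(0xFFu << (8 - p.len));
    return (addr & mask) == (p.addr & mask);
}

/* Two-level lookup without backtracking: longest match on the primary
 * key first, then longest match on the secondary key among the routes
 * that share that primary prefix. Returns a table index, or -1. */
static int toy_lookup(const struct toy_route *rt, size_t n,
                      uint8_t dst, uint8_t src, int src_first)
{
    uint8_t k1 = src_first ? src : dst;   /* primary key */
    uint8_t k2 = src_first ? dst : src;   /* secondary key */
    int best1 = -1, best2 = -1, found = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        struct toy_prefix p1 = src_first ? rt[i].src : rt[i].dst;
        if (toy_match(p1, k1) && p1.len > best1)
            best1 = p1.len;
    }
    for (i = 0; i < n; i++) {
        struct toy_prefix p1 = src_first ? rt[i].src : rt[i].dst;
        struct toy_prefix p2 = src_first ? rt[i].dst : rt[i].src;
        if (p1.len != best1 || !toy_match(p1, k1))
            continue;
        if (toy_match(p2, k2) && p2.len > best2) {
            best2 = p2.len;
            found = (int)i;
        }
    }
    return found;
}

/* MN (HOME_ADDR) sends to CN (CORR_ADDR); which route wins? */
static int toy_example(int src_first)
{
    const struct toy_route table[] = {
        /* cached host route: dst = CN/8, any source */
        [RT_CACHED_HOST] = { { CORR_ADDR, 8 }, { 0, 0 } },
        /* tunnel route: default dst, src = home address */
        [RT_TUNNEL]      = { { 0, 0 }, { HOME_ADDR, 8 } },
        /* plain default route */
        [RT_DEFAULT]     = { { 0, 0 }, { 0, 0 } },
    };

    return toy_lookup(table, sizeof(table) / sizeof(table[0]),
                      CORR_ADDR, HOME_ADDR, src_first);
}
```

In this model toy_example(0) (destination first) returns RT_CACHED_HOST, so the tunnel route is never reached, while toy_example(1) (source first) returns RT_TUNNEL. The model says nothing about where cached routes would be stored under the reversed ordering.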
Thanks, Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From yoshfuji@linux-ipv6.org Fri Jun 6 03:31:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 03:31:50 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56AVc2x007066 for ; Fri, 6 Jun 2003 03:31:39 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h56AWJBo020634; Fri, 6 Jun 2003 19:32:19 +0900 Date: Fri, 06 Jun 2003 19:32:18 +0900 (JST) Message-Id: <20030606.193218.117654914.yoshfuji@linux-ipv6.org> To: vnuorval@tcs.hut.fi Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, nakam@linux-ipv6.org, usagi-core@linux-ipv6.org Subject: Re: CONFIG_IPV6_SUBTREES (was [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2933 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Fri, 6 Jun 2003 11:48:36 +0300 (EEST)), Ville Nuorvala 
says: > Since the initial mode of communication between a mobile node (using its > home address) and any correspondent node is reverse tunneling we at least > need something like a default (i.e. a non-host) route through the tunnel > for the MN's home address. Excuse me, please forget anything related to "Mobile IP" during this discussion; do not assume that Mobile IP is the only user of CONFIG_IPV6_SUBTREES. Thank you. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From lpetande@tml.hut.fi Fri Jun 6 04:07:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 04:07:40 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56B7M2x008423 for ; Fri, 6 Jun 2003 04:07:28 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h56B6hdv012100; Fri, 6 Jun 2003 14:06:43 +0300 Message-ID: <3EE0779B.9080300@tml.hut.fi> Date: Fri, 06 Jun 2003 14:14:35 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Masahide NAKAMURA CC: =?ISO-2022-JP?B?WU9TSElGVUpJIEhpZGVha2kgLyAbJEI1SEYjMVFMQBsoQg==?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> <20030606143844.0604c306.nakam@linux-ipv6.org> In-Reply-To: <20030606143844.0604c306.nakam@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 
1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 2934 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Hello Nakamura, Masahide NAKAMURA wrote: > Hello, > I'm Nakamura, a member of USAGI. > > > As you know, we've been planning the MIPv6 design to use XFRM. > If we use MIPv6, we need some fix and extension to XFRM and it results to make XFRM > more generic. I like your idea, as it allows a high level of flexibility in use of mipv6 with flows through the definition of policies. This is the way I would also do mipv6 extension header addition with xfrm. However, if you insert mipv6 policies into xfrm, you need to take care of the interactions between ipsec and mipv6 policies. The system needs to cope with data flows to which both ipsec and mipv6 should be applied. As a result of this the logic of the xfrm lookups probably needs some changes to return both the matching ipsec and mipv6 policies. How have you planned to solve this problem? BTW, feel free to use relevant parts of my code for the output functionality to speed up the work. After your code is ready we can look at which approach is better suited for implementing the kernel support and get a working kernel infrastructure for mipv6 into 2.6 kernels. 
Henrik From vnuorval@tcs.hut.fi Fri Jun 6 04:21:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 04:22:06 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56BLs2x008832 for ; Fri, 6 Jun 2003 04:21:55 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Fri, 6 Jun 2003 14:17:00 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h56BGxjH003777; Fri, 6 Jun 2003 14:16:59 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h56BGwIQ003772; Fri, 6 Jun 2003 14:16:58 +0300 Date: Fri, 6 Jun 2003 14:16:57 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , , , Subject: Re: CONFIG_IPV6_SUBTREES (was [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6) In-Reply-To: <20030606.193218.117654914.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2935 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Fri, 6 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Excuse me, please forget anything related to "Mobile IP" during this > discussion; do not assume that Mobile IP is the only user of > CONFIG_IPV6_SUBTREES. At the moment it is :) I was just making a point about the IMHO flawed semantics of CONFIG_IPV6_SUBTREES. If you keep the original (first dest, then src) key ordering you basically can't use the subtrees for anything else but storing source address specific host routes. With the reversed order you can do a lot more... 
-Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From nakam@linux-ipv6.org Fri Jun 6 06:34:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 06:34:45 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56DYb2x011827 for ; Fri, 6 Jun 2003 06:34:38 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19OHJ7-0001Hh-00; Fri, 06 Jun 2003 22:31:01 +0900 From: Masahide NAKAMURA To: Henrik Petander Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> In-Reply-To: <3EE0779B.9080300@tml.hut.fi> References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> <20030606143844.0604c306.nakam@linux-ipv6.org> <3EE0779B.9080300@tml.hut.fi> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Date: Fri, 06 Jun 2003 22:31:01 +0900 X-archive-position: 2936 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev Hello Henrik, On Fri, 06 Jun 2003 14:14:35 +0300 Henrik Petander wrote: > I like your idea, as it allows a high level of flexibility in use of > 
mipv6 with flows through the definition of policies. This is the way I
> would also do mipv6 extension header addition with xfrm.

Thanks.

> However, if you insert mipv6 policies into xfrm, you need to take care
> of the interactions between ipsec and mipv6 policies. The system needs
> to cope with data flows to which both ipsec and mipv6 should be applied.
> As a result of this the logic of the xfrm lookups probably needs some
> changes to return both the matching ipsec and mipv6 policies. How have
> you planned to solve this problem?

We don't think we have to change the policy-handling logic, because we can
treat a MIPv6 policy just like an IPsec one. When we want to apply both
MIPv6 and IPsec to the same target, we need one policy that has two or
more templates (e.g. one MIPv6 template and one IPsec template).

In that case, however, we have the following problem. The draft
(section 9.3.1 of draft-ietf-mobileip-ipv6-22) says:

  When attempting to verify AH authentication data in a packet that
  contains a Home Address option, the receiving node MUST calculate the
  AH authentication data as if the following were true: The Home Address
  option contains the care-of address, and the source IPv6 address field
  of the IPv6 header contains the home address.

Because xfrm calls dst_output in template order, at first we had no idea
which template should come first, MIPv6 or IPsec (Home Address option or
AH). We then discussed it with our IPsec guys, and our current idea is to
use xfrm6_clear_mutable_options() to swap the addresses back for the AH
calculation when ah6_output() is called.

Anyway, I think this is not a matter specific to xfrm.
(Is this what you were pointing at, Henrik?)
Or do you have any ideas?

> BTW, feel free to use relevant parts of my code for the output
> functionality to speed up the work.
After your code is ready we can look > at which approach is better suited for implementing the kernel support > and get a working kernel infrastructure for mipv6 into 2.6 kernels. Thank you for your kindness. Of course I agree with you. Regards, -- Masahide NAKAMURA From hadmut@danisch.de Fri Jun 6 09:53:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 09:53:47 -0700 (PDT) Received: from sklave3.rackland.de (sklave3.rackland.de [213.133.101.23]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56GrZ2x019947 for ; Fri, 6 Jun 2003 09:53:36 -0700 Received: from sodom (uucp@localhost) by sklave3.rackland.de (8.12.9/8.12.9/Debian-1) with BSMTP id h56GrYLe024482 for netdev@oss.sgi.com; Fri, 6 Jun 2003 18:53:34 +0200 Received: (from hadmut@localhost) by sodom.home.danisch.de (8.12.9/8.12.9/Debian-1) id h56GrEqq012738 for netdev@oss.sgi.com; Fri, 6 Jun 2003 18:53:14 +0200 From: Hadmut Danisch Date: Fri, 6 Jun 2003 18:53:14 +0200 To: netdev@oss.sgi.com Subject: Cisco Aironet Problem Message-ID: <20030606165314.GA12669@danisch.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-archive-position: 2937 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadmut@danisch.de Precedence: bulk X-list: netdev Hi, I have a problem with 2.4.20 and my Cisco Aironet 340 PCMCIA card. When I use the keyboard while traffic is high, something gets broken in the kernel: Both the keyboard and the network device freeze permanently, a reboot is required. I've reported this bug to the aironet driver team (sourceforge), but did not receive any feedback. Meanwhile I found a bug report in the dmesg after a crash. Unfortunately I'm not familiar with that part of the kernel. Maybe you can give me a hint what could be the reason for such a bug message in dmesg? airo: BAP error 4000 2 Warning: kfree_skb passed an skb still on a list (from c01206ea). 
kernel BUG at skbuff.c:315! invalid operand: 0000 CPU: 0 EIP: 0010:[] Not tainted EFLAGS: 00013286 eax: 00000045 ebx: cfde31c0 ecx: cc6f2000 edx: 00000001 esi: c12f5f84 edi: 00000000 ebp: c12f4000 esp: c12f5f6c ds: 0018 es: 0018 ss: 0018 Process keventd (pid: 2, stackpage=c12f5000) Stack: c0238740 c01206ea 00000000 c12f5f84 c01206ea cfde31c0 cc4002e4 cc4002e4 00000000 00000000 c0128c83 c02489d0 c12f5fb0 00000000 c12f4560 c12f4570 c12f4000 00000001 00000000 c12c9f80 00010000 00000000 00000700 c0128b50 Call Trace: [] [] [] [] [] [] [] Code: 0f 0b 3b 01 5f 79 23 c0 8b 5c 24 14 e9 ce fe ff ff 8d 74 26 regards Hadmut From mk@karaba.org Fri Jun 6 11:17:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 11:17:26 -0700 (PDT) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56IHE2x022281 for ; Fri, 6 Jun 2003 11:17:15 -0700 Received: from [3ffe:501:1057:710::1] (helo=hyakusiki.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 19OLlz-0002Fw-00; Sat, 07 Jun 2003 03:17:08 +0900 Date: Sat, 07 Jun 2003 03:17:10 +0900 Message-ID: <87wufzxe8p.wl@karaba.org> From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: "David S. Miller" Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: [PATCH] fix esp6 extension headers handling In-Reply-To: <873cioqxch.wl@karaba.org> <3EDF36AA.9020403@tml.hut.fi> <3EDF3EB4.8010105@tml.hut.fi> References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> <3EDF3EB4.8010105@tml.hut.fi> <873cioqxch.wl@karaba.org> MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 2938 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev Hello, At Thu, 05 Jun 2003 15:59:32 +0300, Henrik Petander wrote: > > David S. 
Miller wrote: > > From: Henrik Petander > > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > > > A possible fix is to change the pointer into an offset from the start of > > the packet and use the offset later to set the nexthdr value in the > > extension header. > > > > Please indicate the version of the sources you are looking > > at when making reports. > > Sure, esp6.c bitkeeper version was 1.16. Also a fix to the bug report: > the problem is with esp6 and not with ah6, which does not use the > get_offset function. > > Henrik > The attached diff fixes esp6 extension headers handling bug which reported by Henrik. I introduced ip6_find_1stfragopt() instead of get_offset(). # ip6_found_nexthdr() is just renamed ip6_find_1stfragopt() # in order to represent collect functionality. Regards, -mk Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/ipv6.h,v retrieving revision 1.1.1.12 retrieving revision 1.1.1.12.8.1 diff -u -r1.1.1.12 -r1.1.1.12.8.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h 31 May 2003 07:30:34 -0000 1.1.1.12 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h 6 Jun 2003 15:43:46 -0000 1.1.1.12.8.1 @@ -315,7 +315,7 @@ unsigned length, struct ipv6_txoptions *opt, int hlimit, int flags); -extern int ip6_found_nexthdr(struct sk_buff *skb, u8 **nexthdr); +extern int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr); extern int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/esp6.c,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.12.1 diff -u -r1.1.1.13 -r1.1.1.13.12.1 --- 
linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c 26 May 2003 08:04:11 -0000 1.1.1.13 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c 6 Jun 2003 16:23:01 -0000 1.1.1.13.12.1 @@ -39,57 +39,6 @@ #define MAX_SG_ONSTACK 4 -/* BUGS: - * - we assume replay seqno is always present. - */ - -/* Move to common area: it is shared with AH. */ -/* Common with AH after some work on arguments. */ - -static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, struct ipv6_opt_hdr **prevhdr) -{ - u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); - u8 nextnexthdr; - - *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; - - while (offset + 1 < packet_len) { - - switch (*nexthdr) { - - case NEXTHDR_HOP: - case NEXTHDR_ROUTING: - offset += ipv6_optlen(exthdr); - *nexthdr = exthdr->nexthdr; - *prevhdr = exthdr; - exthdr = (struct ipv6_opt_hdr*)(packet + offset); - break; - - case NEXTHDR_DEST: - nextnexthdr = - ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; - /* XXX We know the option is inner dest opt - with next next header check. */ - if (nextnexthdr != NEXTHDR_HOP && - nextnexthdr != NEXTHDR_ROUTING && - nextnexthdr != NEXTHDR_DEST) { - return offset; - } - offset += ipv6_optlen(exthdr); - *nexthdr = exthdr->nexthdr; - *prevhdr = exthdr; - exthdr = (struct ipv6_opt_hdr*)(packet + offset); - break; - - default : - return offset; - } - } - - return offset; -} - int esp6_output(struct sk_buff *skb) { int err; @@ -101,12 +50,12 @@ struct crypto_tfm *tfm; struct esp_data *esp; struct sk_buff *trailer; - struct ipv6_opt_hdr *prevhdr = NULL; int blksize; int clen; int alen; int nfrags; - u8 nexthdr; + u8 *prevhdr; + u8 nexthdr = 0; /* First, if the skb is not checksummed, complete checksum. */ if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) { @@ -123,7 +72,9 @@ /* Strip IP header in transport mode. Save it. 
*/ if (!x->props.mode) { - hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &prevhdr); + hdr_len = ip6_find_1stfragopt(skb, &prevhdr); + nexthdr = *prevhdr; + *prevhdr = IPPROTO_ESP; iph = kmalloc(hdr_len, GFP_ATOMIC); if (!iph) { err = -ENOMEM; @@ -178,18 +129,12 @@ ipv6_addr_copy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr); } else { - /* XXX exthdr */ esph = (struct ipv6_esp_hdr*)skb_push(skb, x->props.header_len); skb->h.raw = (unsigned char*)esph; top_iph = (struct ipv6hdr*)skb_push(skb, hdr_len); memcpy(top_iph, iph, hdr_len); kfree(iph); top_iph->payload_len = htons(skb->len + alen - sizeof(struct ipv6hdr)); - if (prevhdr) { - prevhdr->nexthdr = IPPROTO_ESP; - } else { - top_iph->nexthdr = IPPROTO_ESP; - } *(u8*)(trailer->tail - 1) = nexthdr; } @@ -302,6 +247,7 @@ struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; struct scatterlist *sg = sgbuf; u8 padlen; + u8 *prevhdr; if (unlikely(nfrags > MAX_SG_ONSTACK)) { sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); @@ -325,11 +271,13 @@ } /* ... check padding bits here. Silly. 
:-) */ - ret_nexthdr = ((struct ipv6hdr*)tmp_hdr)->nexthdr = nexthdr[1]; pskb_trim(skb, skb->len - alen - padlen - 2); skb->h.raw = skb_pull(skb, sizeof(struct ipv6_esp_hdr) + esp->conf.ivlen); skb->nh.raw += sizeof(struct ipv6_esp_hdr) + esp->conf.ivlen; memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + ip6_find_1stfragopt(skb, &prevhdr); + ret_nexthdr = *prevhdr = nexthdr[1]; } kfree(tmp_hdr); return ret_nexthdr; Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v retrieving revision 1.1.1.16 retrieving revision 1.1.1.16.16.1 diff -u -r1.1.1.16 -r1.1.1.16.16.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c 26 May 2003 08:04:10 -0000 1.1.1.16 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c 6 Jun 2003 15:43:34 -0000 1.1.1.16.16.1 @@ -887,7 +887,7 @@ #endif } -int ip6_found_nexthdr(struct sk_buff *skb, u8 **nexthdr) +int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr) { u16 offset = sizeof(struct ipv6hdr); struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.ipv6h + 1); @@ -929,7 +929,7 @@ u8 *prevhdr, nexthdr = 0; dev = rt->u.dst.dev; - hlen = ip6_found_nexthdr(skb, &prevhdr); + hlen = ip6_find_1stfragopt(skb, &prevhdr); nexthdr = *prevhdr; mtu = dst_pmtu(&rt->u.dst) - hlen - sizeof(struct frag_hdr); Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipcomp6.c,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.14.1 diff -u -r1.1.1.2 -r1.1.1.2.14.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c 21 May 2003 13:15:20 -0000 1.1.1.2 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c 6 Jun 2003 15:43:34 -0000 1.1.1.2.14.1 @@ -105,7 
+105,7 @@ iph = skb->nh.ipv6h; iph->payload_len = htons(skb->len); - ip6_found_nexthdr(skb, &prevhdr); + ip6_find_1stfragopt(skb, &prevhdr); *prevhdr = nexthdr; out: if (tmp_hdr) @@ -160,7 +160,7 @@ skb->nh.raw = skb->data; /* == top_iph */ skb->h.raw = skb->nh.raw + hdr_len; } else { - hdr_len = ip6_found_nexthdr(skb, &prevhdr); + hdr_len = ip6_find_1stfragopt(skb, &prevhdr); nexthdr = *prevhdr; } @@ -203,7 +203,7 @@ top_iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); skb->nh.raw = skb->data; /* top_iph */ - ip6_found_nexthdr(skb, &prevhdr); + ip6_find_1stfragopt(skb, &prevhdr); *prevhdr = IPPROTO_COMP; ipch = (struct ipv6_comp_hdr *)((unsigned char *)top_iph + hdr_len); Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipv6_syms.c,v retrieving revision 1.1.1.12 retrieving revision 1.1.1.12.16.4 diff -u -r1.1.1.12 -r1.1.1.12.16.4 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c 26 May 2003 08:04:11 -0000 1.1.1.12 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c 6 Jun 2003 17:38:20 -0000 1.1.1.12.16.4 @@ -35,6 +35,6 @@ EXPORT_SYMBOL(in6addr_any); EXPORT_SYMBOL(in6addr_loopback); EXPORT_SYMBOL(in6_dev_finish_destroy); -EXPORT_SYMBOL(ip6_found_nexthdr); +EXPORT_SYMBOL(ip6_find_1stfragopt); EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_clear_mutable_options); From shemminger@osdl.org Fri Jun 6 14:58:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 14:59:04 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56Lwr2x025321 for ; Fri, 6 Jun 2003 14:58:53 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h56LwZX11697; Fri, 6 Jun 2003 14:58:35 -0700 Date: Fri, 6 Jun 2003 14:58:35 -0700 From: Stephen Hemminger To: "David S. 
Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-Id: <20030606145835.3a263df8.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2939 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This is the first phase of a sequence of patches to resolve network device reference count issues exposed by the new sysfs interface. Phase I: introduces release_netdev which is the hook to allow later changes to hold onto the net device after the device has potentially unloaded. Includes patch for the easy to fix devices. Phase II: fixes devices that encapsulate network device structure inside their own structure, or allocate private data in a way that will break later. Phase III: changes release_netdev to handle the case of delayed freeing of the network device, and appropriate state checking. 
diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h Thu Jun 5 14:44:28 2003
+++ b/include/linux/netdevice.h Thu Jun 5 14:44:28 2003
@@ -491,6 +491,7 @@
 extern int dev_queue_xmit(struct sk_buff *skb);
 extern int register_netdevice(struct net_device *dev);
 extern int unregister_netdevice(struct net_device *dev);
+extern void release_netdev(struct net_device *dev);
 extern void synchronize_net(void);
 extern int register_netdevice_notifier(struct notifier_block *nb);
 extern int unregister_netdevice_notifier(struct notifier_block *nb);
diff -Nru a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c Thu Jun 5 14:44:28 2003
+++ b/net/core/dev.c Thu Jun 5 14:44:28 2003
@@ -2768,6 +2768,21 @@
 	}
 }

+
+/**
+ *	release_netdev - free network device
+ *	@dev: device
+ *
+ *	This function does the last stage of destroying an allocated device
+ *	interface. Currently, it just frees the device.
+ *
+ */
+
+void release_netdev(struct net_device *dev)
+{
+	kfree(dev);
+}
+
 /* Synchronize with packet receive processing. */
 void synchronize_net(void)
 {
diff -Nru a/net/netsyms.c b/net/netsyms.c
--- a/net/netsyms.c Thu Jun 5 14:44:28 2003
+++ b/net/netsyms.c Thu Jun 5 14:44:28 2003
@@ -558,6 +558,7 @@
 EXPORT_SYMBOL(loopback_dev);
 EXPORT_SYMBOL(register_netdevice);
 EXPORT_SYMBOL(unregister_netdevice);
+EXPORT_SYMBOL(release_netdev);
 EXPORT_SYMBOL(synchronize_net);
 EXPORT_SYMBOL(netdev_state_change);
 EXPORT_SYMBOL(dev_new_index);

From shemminger@osdl.org Fri Jun 6 16:07:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 16:08:08 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56N7s2x007585 for ; Fri, 6 Jun 2003 16:07:55 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h56N7dX32762; Fri, 6 Jun 2003 16:07:39 -0700
Date: Fri, 6 Jun 2003 16:07:39 -0700
From: Stephen Hemminger
To: "David S. Miller"
Cc: jgarzik@pobox.com, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup
Message-Id: <20030606160739.0581bf39.shemminger@osdl.org>
In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org>
References: <20030606145835.3a263df8.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h56N7s2x007585
X-archive-position: 2940
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

Here is a patch to convert the "easy" drivers to use release_netdev, instead of directly freeing the
net_device. They all compile but only e100 and e1000 have been tested with real hardware. diff -Nru a/drivers/net/3c59x.c b/drivers/net/3c59x.c --- a/drivers/net/3c59x.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/3c59x.c Thu Jun 5 15:51:50 2003 @@ -1021,7 +1021,7 @@ outw (TotalReset|0x14, ioaddr + EL3_CMD); release_region (ioaddr, VORTEX_TOTAL_SIZE); - kfree (dev); + release_netdev (dev); return 0; } #endif @@ -3072,7 +3072,7 @@ vp->rx_ring_dma); if (vp->must_free_region) release_region(dev->base_addr, vp->io_size); - kfree(dev); + release_netdev(dev); } diff -Nru a/drivers/net/8139cp.c b/drivers/net/8139cp.c --- a/drivers/net/8139cp.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/8139cp.c Thu Jun 5 15:51:50 2003 @@ -1969,7 +1969,7 @@ pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } #ifdef CONFIG_PM diff -Nru a/drivers/net/8139too.c b/drivers/net/8139too.c --- a/drivers/net/8139too.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/8139too.c Thu Jun 5 15:51:50 2003 @@ -721,7 +721,7 @@ sizeof (struct rtl8139_private)); #endif /* RTL8139_NDEBUG */ - kfree (dev); + release_netdev (dev); pci_set_drvdata (pdev, NULL); } diff -Nru a/drivers/net/a2065.c b/drivers/net/a2065.c --- a/drivers/net/a2065.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/a2065.c Thu Jun 5 15:51:50 2003 @@ -820,7 +820,7 @@ release_mem_region(ZTWO_PADDR(dev->base_addr), sizeof(struct lance_regs)); release_mem_region(ZTWO_PADDR(dev->mem_start), A2065_RAM_SIZE); - kfree(dev); + release_netdev(dev); root_a2065_dev = next; } #endif diff -Nru a/drivers/net/amd8111e.c b/drivers/net/amd8111e.c --- a/drivers/net/amd8111e.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/amd8111e.c Thu Jun 5 15:51:50 2003 @@ -1709,7 +1709,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct amd8111e_priv *)(dev->priv))->mmio); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru 
a/drivers/net/ariadne.c b/drivers/net/ariadne.c --- a/drivers/net/ariadne.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ariadne.c Thu Jun 5 15:51:50 2003 @@ -852,7 +852,7 @@ unregister_netdev(dev); release_mem_region(ZTWO_PADDR(dev->base_addr), sizeof(struct Am79C960)); release_mem_region(ZTWO_PADDR(dev->mem_start), ARIADNE_RAM_SIZE); - kfree(dev); + release_netdev(dev); root_ariadne_dev = next; } #endif diff -Nru a/drivers/net/ariadne2.c b/drivers/net/ariadne2.c --- a/drivers/net/ariadne2.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ariadne2.c Thu Jun 5 15:51:50 2003 @@ -413,7 +413,7 @@ unregister_netdev(dev); free_irq(IRQ_AMIGA_PORTS, dev); release_mem_region(ZTWO_PADDR(dev->base_addr), NE_IO_EXTENT*2); - kfree(dev); + release_netdev(dev); root_ariadne2_dev = next; } #endif diff -Nru a/drivers/net/arm/am79c961a.c b/drivers/net/arm/am79c961a.c --- a/drivers/net/arm/am79c961a.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/am79c961a.c Thu Jun 5 15:51:50 2003 @@ -677,7 +677,7 @@ release_region(dev->base_addr, 0x18); nodev: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } diff -Nru a/drivers/net/arm/ether1.c b/drivers/net/arm/ether1.c --- a/drivers/net/arm/ether1.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/ether1.c Thu Jun 5 15:51:50 2003 @@ -1058,7 +1058,7 @@ release_region(dev->base_addr, 16); release_region(dev->base_addr + 0x800, 4096); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } @@ -1073,7 +1073,7 @@ release_region(dev->base_addr, 16); release_region(dev->base_addr + 0x800, 4096); - kfree(dev); + release_netdev(dev); } static const struct ecard_id ether1_ids[] = { diff -Nru a/drivers/net/arm/ether3.c b/drivers/net/arm/ether3.c --- a/drivers/net/arm/ether3.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/ether3.c Thu Jun 5 15:51:50 2003 @@ -899,7 +899,7 @@ release_region(dev->base_addr, 128); free: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } @@ -912,7 +912,7 @@ 
unregister_netdev(dev); release_region(dev->base_addr, 128); - kfree(dev); + release_netdev(dev); } static const struct ecard_id ether3_ids[] = { diff -Nru a/drivers/net/au1000_eth.c b/drivers/net/au1000_eth.c --- a/drivers/net/au1000_eth.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/au1000_eth.c Thu Jun 5 15:51:50 2003 @@ -823,7 +823,7 @@ MAX_BUF_SIZE * (NUM_TX_BUFFS+NUM_RX_BUFFS)); printk(KERN_ERR "%s: au1000_probe1 failed. Returns %d\n", dev->name, retval); - kfree(dev); + release_netdev(dev); return retval; } diff -Nru a/drivers/net/b44.c b/drivers/net/b44.c --- a/drivers/net/b44.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/b44.c Thu Jun 5 15:51:50 2003 @@ -1830,7 +1830,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct b44 *)(dev->priv))->regs); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/bmac.c b/drivers/net/bmac.c --- a/drivers/net/bmac.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/bmac.c Thu Jun 5 15:51:50 2003 @@ -1452,7 +1452,7 @@ pmac_call_feature(PMAC_FTR_BMAC_ENABLE, bp->node, 0, 0); } unregister_netdev(dev); - kfree(dev); + release_netdev(dev); } static int bmac_open(struct net_device *dev) @@ -1710,7 +1710,7 @@ free_irq(bp->tx_dma_intr, dev); free_irq(bp->rx_dma_intr, dev); - kfree(dev); + release_netdev(dev); } while (bmac_devs != NULL); } diff -Nru a/drivers/net/declance.c b/drivers/net/declance.c --- a/drivers/net/declance.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/declance.c Thu Jun 5 15:51:50 2003 @@ -1203,7 +1203,7 @@ err_out: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); return ret; } diff -Nru a/drivers/net/dl2k.c b/drivers/net/dl2k.c --- a/drivers/net/dl2k.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/dl2k.c Thu Jun 5 15:51:50 2003 @@ -1844,7 +1844,7 @@ #ifdef MEM_MAPPING iounmap ((char *) (dev->base_addr)); #endif - kfree (dev); + release_netdev (dev); pci_release_regions (pdev); pci_disable_device (pdev); } diff -Nru 
a/drivers/net/e100/e100_main.c b/drivers/net/e100/e100_main.c --- a/drivers/net/e100/e100_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/e100/e100_main.c Thu Jun 5 15:51:50 2003 @@ -716,7 +716,7 @@ e100_dealloc_space(bdp); err_dev: pci_set_drvdata(pcid, NULL); - kfree(dev); + release_netdev(dev); out: return rc; } @@ -738,7 +738,7 @@ e100_dealloc_space(bdp); pci_set_drvdata(bdp->pdev, NULL); - kfree(dev); + release_netdev(dev); } static void __devexit diff -Nru a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c --- a/drivers/net/e1000/e1000_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/e1000/e1000_main.c Thu Jun 5 15:51:50 2003 @@ -542,7 +542,7 @@ iounmap(adapter->hw.hw_addr); pci_release_regions(pdev); - kfree(netdev); + release_netdev(netdev); } /** diff -Nru a/drivers/net/eepro100.c b/drivers/net/eepro100.c --- a/drivers/net/eepro100.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/eepro100.c Thu Jun 5 15:51:50 2003 @@ -2364,7 +2364,7 @@ + sizeof(struct speedo_stats), sp->tx_ring, sp->tx_ring_dma); pci_disable_device(pdev); - kfree(dev); + release_netdev(dev); } static struct pci_device_id eepro100_pci_tbl[] __devinitdata = { diff -Nru a/drivers/net/epic100.c b/drivers/net/epic100.c --- a/drivers/net/epic100.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/epic100.c Thu Jun 5 15:51:50 2003 @@ -1482,7 +1482,7 @@ iounmap((void*) dev->base_addr); #endif pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); /* pci_power_off(pdev, -1); */ } diff -Nru a/drivers/net/fealnx.c b/drivers/net/fealnx.c --- a/drivers/net/fealnx.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/fealnx.c Thu Jun 5 15:51:50 2003 @@ -712,7 +712,7 @@ #ifndef USE_IO_OPS iounmap((void *)dev->base_addr); #endif - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); } else diff -Nru a/drivers/net/hamachi.c b/drivers/net/hamachi.c --- a/drivers/net/hamachi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/hamachi.c Thu Jun 5 
15:51:50 2003 @@ -1976,7 +1976,7 @@ hmp->tx_ring_dma); unregister_netdev(dev); iounmap((char *)dev->base_addr); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/hydra.c b/drivers/net/hydra.c --- a/drivers/net/hydra.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/hydra.c Thu Jun 5 15:51:50 2003 @@ -243,7 +243,7 @@ unregister_netdev(dev); free_irq(IRQ_AMIGA_PORTS, dev); release_mem_region(ZTWO_PADDR(dev->base_addr)-HYDRA_NIC_BASE, 0x10000); - kfree(dev); + release_netdev(dev); root_hydra_dev = next; } #endif diff -Nru a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c --- a/drivers/net/ixgb/ixgb_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ixgb/ixgb_main.c Thu Jun 5 15:51:50 2003 @@ -478,7 +478,7 @@ iounmap((void *) adapter->hw.hw_addr); pci_release_regions(pdev); - kfree(netdev); + release_netdev(netdev); } /** diff -Nru a/drivers/net/mace.c b/drivers/net/mace.c --- a/drivers/net/mace.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/mace.c Thu Jun 5 15:51:50 2003 @@ -254,7 +254,7 @@ release_OF_resource(mp->of_node, 1); release_OF_resource(mp->of_node, 2); } - kfree(dev); + release_netdev(dev); } static void dbdma_reset(volatile struct dbdma_regs *dma) @@ -976,7 +976,7 @@ release_OF_resource(mp->of_node, 1); release_OF_resource(mp->of_node, 2); - kfree(dev); + release_netdev(dev); } if (dummy_buf != NULL) { kfree(dummy_buf); diff -Nru a/drivers/net/myri_sbus.c b/drivers/net/myri_sbus.c --- a/drivers/net/myri_sbus.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/myri_sbus.c Thu Jun 5 15:51:50 2003 @@ -1090,7 +1090,7 @@ return 0; err: unregister_netdev(dev); /* This will also free the co-allocated 'dev->priv' */ - kfree(dev); + release_netdev(dev); return -ENODEV; } diff -Nru a/drivers/net/natsemi.c b/drivers/net/natsemi.c --- a/drivers/net/natsemi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/natsemi.c Thu Jun 5 15:51:50 2003 @@ -838,7 +838,7 @@ if (i) { pci_release_regions(pdev); 
unregister_netdev(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); return i; } diff -Nru a/drivers/net/ne2k-pci.c b/drivers/net/ne2k-pci.c --- a/drivers/net/ne2k-pci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ne2k-pci.c Thu Jun 5 15:51:50 2003 @@ -635,7 +635,7 @@ unregister_netdev(dev); release_region(dev->base_addr, NE_IO_EXTENT); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/pci-skeleton.c b/drivers/net/pci-skeleton.c --- a/drivers/net/pci-skeleton.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pci-skeleton.c Thu Jun 5 15:51:50 2003 @@ -871,7 +871,7 @@ sizeof (struct netdrv_private)); #endif /* NETDRV_NDEBUG */ - kfree (dev); + release_netdev (dev); pci_set_drvdata (pdev, NULL); diff -Nru a/drivers/net/pcmcia/ibmtr_cs.c b/drivers/net/pcmcia/ibmtr_cs.c --- a/drivers/net/pcmcia/ibmtr_cs.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pcmcia/ibmtr_cs.c Thu Jun 5 15:51:50 2003 @@ -310,7 +310,7 @@ /* Unlink device structure, free bits */ *linkp = link->next; unregister_netdev(dev); - kfree(dev); + release_netdev(dev); } /* ibmtr_detach */ /*====================================================================== diff -Nru a/drivers/net/pcnet32.c b/drivers/net/pcnet32.c --- a/drivers/net/pcnet32.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pcnet32.c Thu Jun 5 15:51:50 2003 @@ -1762,7 +1762,7 @@ if (lp->pci_dev) pci_unregister_driver(&pcnet32_driver); pci_free_consistent(lp->pci_dev, sizeof(*lp), lp, lp->dma_addr); - kfree(pcnet32_dev); + release_netdev(pcnet32_dev); pcnet32_dev = next_dev; } } diff -Nru a/drivers/net/r8169.c b/drivers/net/r8169.c --- a/drivers/net/r8169.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/r8169.c Thu Jun 5 15:51:50 2003 @@ -646,7 +646,7 @@ sizeof (struct net_device) + sizeof (struct rtl8169_private)); pci_disable_device(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/rrunner.c b/drivers/net/rrunner.c --- a/drivers/net/rrunner.c Thu Jun 
5 15:51:50 2003 +++ b/drivers/net/rrunner.c Thu Jun 5 15:51:50 2003 @@ -253,7 +253,7 @@ rr->tx_ring_dma); unregister_netdev(dev); iounmap(rr->regs); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/sis900.c b/drivers/net/sis900.c --- a/drivers/net/sis900.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sis900.c Thu Jun 5 15:51:50 2003 @@ -493,7 +493,7 @@ pci_set_drvdata(pci_dev, NULL); pci_release_regions(pci_dev); err_out: - kfree(net_dev); + release_netdev(net_dev); return ret; } @@ -2189,7 +2189,7 @@ pci_free_consistent(pci_dev, TX_TOTAL_SIZE, sis_priv->tx_ring, sis_priv->tx_ring_dma); unregister_netdev(net_dev); - kfree(net_dev); + release_netdev(net_dev); pci_release_regions(pci_dev); pci_set_drvdata(pci_dev, NULL); } diff -Nru a/drivers/net/skfp/skfddi.c b/drivers/net/skfp/skfddi.c --- a/drivers/net/skfp/skfddi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/skfp/skfddi.c Thu Jun 5 15:51:50 2003 @@ -2633,7 +2633,7 @@ } unregister_netdev(p); printk("%s: unloaded\n", p->name); - kfree(p); /* Free the device structure */ + release_netdev(p); /* Free the device structure */ return next; } // unlink_modules diff -Nru a/drivers/net/starfire.c b/drivers/net/starfire.c --- a/drivers/net/starfire.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/starfire.c Thu Jun 5 15:51:50 2003 @@ -2196,7 +2196,7 @@ pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); /* Will also free np!! */ + release_netdev(dev); /* Will also free np!! 
*/ } diff -Nru a/drivers/net/sunbmac.c b/drivers/net/sunbmac.c --- a/drivers/net/sunbmac.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sunbmac.c Thu Jun 5 15:51:50 2003 @@ -1209,7 +1209,7 @@ unregister_netdev(dev); /* This also frees the co-located 'dev->priv' */ - kfree(dev); + release_netdev(dev); return -ENODEV; } diff -Nru a/drivers/net/sundance.c b/drivers/net/sundance.c --- a/drivers/net/sundance.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sundance.c Thu Jun 5 15:51:50 2003 @@ -730,7 +730,7 @@ #endif pci_release_regions(pdev); err_out_netdev: - kfree (dev); + release_netdev(dev); return -ENODEV; } @@ -1784,7 +1784,7 @@ #ifndef USE_IO_OPS iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } } diff -Nru a/drivers/net/sungem.c b/drivers/net/sungem.c --- a/drivers/net/sungem.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sungem.c Thu Jun 5 15:51:50 2003 @@ -2885,7 +2885,7 @@ gp->gblock_dvma); iounmap((void *) gp->regs); pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/sunhme.c b/drivers/net/sunhme.c --- a/drivers/net/sunhme.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sunhme.c Thu Jun 5 15:51:50 2003 @@ -3351,7 +3351,7 @@ pci_release_regions(hp->happy_dev); } #endif - kfree(dev); + release_netdev(dev); root_happy_dev = next; } diff -Nru a/drivers/net/tc35815.c b/drivers/net/tc35815.c --- a/drivers/net/tc35815.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tc35815.c Thu Jun 5 15:51:50 2003 @@ -1762,7 +1762,7 @@ next_dev = ((struct tc35815_local *)dev->priv)->next_module; iounmap((void *)(dev->base_addr)); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); root_tc35815_dev = next_dev; } } diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c --- a/drivers/net/tg3.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tg3.c Thu Jun 5 15:51:50 2003 @@ -6942,7 +6942,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct tg3 *)(dev->priv))->regs); 
- kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/tlan.c b/drivers/net/tlan.c --- a/drivers/net/tlan.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tlan.c Thu Jun 5 15:51:50 2003 @@ -447,7 +447,7 @@ pci_release_regions(pdev); - kfree( dev ); + release_netdev( dev ); pci_set_drvdata( pdev, NULL ); } @@ -695,7 +695,7 @@ release_region( dev->base_addr, 0x10); unregister_netdev( dev ); TLan_Eisa_Devices = priv->nextDevice; - kfree( dev ); + release_netdev( dev ); tlan_have_eisa--; } } diff -Nru a/drivers/net/tokenring/abyss.c b/drivers/net/tokenring/abyss.c --- a/drivers/net/tokenring/abyss.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/abyss.c Thu Jun 5 15:51:50 2003 @@ -443,7 +443,7 @@ release_region(dev->base_addr-0x10, ABYSS_IO_EXTENT); free_irq(dev->irq, dev); tmsdev_term(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tokenring/lanstreamer.c b/drivers/net/tokenring/lanstreamer.c --- a/drivers/net/tokenring/lanstreamer.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/lanstreamer.c Thu Jun 5 15:51:50 2003 @@ -433,7 +433,7 @@ /* shouldn't we do iounmap here? 
*/ release_region(pci_resource_start(pdev, 0), pci_resource_len(pdev,0)); release_mem_region(pci_resource_start(pdev, 1), pci_resource_len(pdev,1)); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tokenring/olympic.c b/drivers/net/tokenring/olympic.c --- a/drivers/net/tokenring/olympic.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/olympic.c Thu Jun 5 15:51:50 2003 @@ -1778,7 +1778,7 @@ iounmap(olympic_priv->olympic_lap) ; pci_release_regions(pdev) ; pci_set_drvdata(pdev,NULL) ; - kfree(dev) ; + release_netdev(dev) ; } static struct pci_driver olympic_driver = { diff -Nru a/drivers/net/tokenring/smctr.c b/drivers/net/tokenring/smctr.c --- a/drivers/net/tokenring/smctr.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/smctr.c Thu Jun 5 15:51:50 2003 @@ -5730,7 +5730,7 @@ if (dev) { unregister_netdev(dev); cleanup_card(dev); - kfree(dev); + release_netdev(dev); } } } diff -Nru a/drivers/net/tokenring/tmspci.c b/drivers/net/tokenring/tmspci.c --- a/drivers/net/tokenring/tmspci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/tmspci.c Thu Jun 5 15:51:50 2003 @@ -229,7 +229,7 @@ release_region(dev->base_addr, TMS_PCI_IO_EXTENT); free_irq(dev->irq, dev); tmsdev_term(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c --- a/drivers/net/tulip/de2104x.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/de2104x.c Thu Jun 5 15:51:50 2003 @@ -2153,7 +2153,7 @@ pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } #ifdef CONFIG_PM diff -Nru a/drivers/net/tulip/dmfe.c b/drivers/net/tulip/dmfe.c --- a/drivers/net/tulip/dmfe.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/dmfe.c Thu Jun 5 15:51:50 2003 @@ -478,7 +478,7 @@ db->buf_pool_ptr, db->buf_pool_dma_ptr); unregister_netdev(dev); pci_release_regions(pdev); - kfree(dev); /* free board information */ + 
release_netdev(dev); /* free board information */ pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c --- a/drivers/net/tulip/tulip_core.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/tulip_core.c Thu Jun 5 15:51:50 2003 @@ -1767,7 +1767,7 @@ #ifndef USE_IO_OPS iounmap((void *)dev->base_addr); #endif - kfree (dev); + release_netdev (dev); pci_release_regions (pdev); pci_set_drvdata (pdev, NULL); diff -Nru a/drivers/net/tulip/winbond-840.c b/drivers/net/tulip/winbond-840.c --- a/drivers/net/tulip/winbond-840.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/winbond-840.c Thu Jun 5 15:51:50 2003 @@ -1623,7 +1623,7 @@ #ifndef USE_IO_OPS iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); } pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/tulip/xircom_cb.c b/drivers/net/tulip/xircom_cb.c --- a/drivers/net/tulip/xircom_cb.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/xircom_cb.c Thu Jun 5 15:51:50 2003 @@ -338,7 +338,7 @@ } release_region(dev->base_addr, 128); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); leave("xircom_remove"); } diff -Nru a/drivers/net/tulip/xircom_tulip_cb.c b/drivers/net/tulip/xircom_tulip_cb.c --- a/drivers/net/tulip/xircom_tulip_cb.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/xircom_tulip_cb.c Thu Jun 5 15:51:50 2003 @@ -645,11 +645,11 @@ return 0; err_out_cleardev: + unregister_netdev(dev); pci_set_drvdata(pdev, NULL); pci_release_regions(pdev); err_out_free_netdev: - unregister_netdev(dev); - kfree(dev); + release_netdev(dev); return -ENODEV; } @@ -1702,7 +1702,7 @@ printk(KERN_INFO "xircom_remove_one(%s)\n", dev->name); unregister_netdev(dev); pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/typhoon.c b/drivers/net/typhoon.c --- a/drivers/net/typhoon.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/typhoon.c Thu Jun 5 15:51:50 2003 @@ -2476,7 +2476,7 @@ 
pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } static struct pci_driver typhoon_driver = { diff -Nru a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c --- a/drivers/net/via-rhine.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/via-rhine.c Thu Jun 5 15:51:50 2003 @@ -1872,7 +1872,7 @@ iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c --- a/drivers/net/wireless/airo.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/airo.c Thu Jun 5 15:51:50 2003 @@ -1573,7 +1573,7 @@ release_region( dev->base_addr, 64 ); } del_airo_dev( dev ); - kfree( dev ); + release_netdev( dev ); } EXPORT_SYMBOL(stop_airo_card); diff -Nru a/drivers/net/wireless/orinoco_cs.c b/drivers/net/wireless/orinoco_cs.c --- a/drivers/net/wireless/orinoco_cs.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_cs.c Thu Jun 5 15:51:50 2003 @@ -290,8 +290,9 @@ DEBUG(0, "orinoco_cs: About to unregister net device %p\n", dev); unregister_netdev(dev); - } - kfree(dev); + release_netdev(dev); + } else + kfree(dev); } /* orinoco_cs_detach */ /* diff -Nru a/drivers/net/wireless/orinoco_pci.c b/drivers/net/wireless/orinoco_pci.c --- a/drivers/net/wireless/orinoco_pci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_pci.c Thu Jun 5 15:51:50 2003 @@ -289,7 +289,7 @@ iounmap((unsigned char *) priv->hw.iobase); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); pci_disable_device(pdev); } diff -Nru a/drivers/net/wireless/orinoco_tmd.c b/drivers/net/wireless/orinoco_tmd.c --- a/drivers/net/wireless/orinoco_tmd.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_tmd.c Thu Jun 5 15:51:50 2003 @@ -182,7 +182,7 @@ pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); release_region(pci_resource_start(pdev, 2), pci_resource_len(pdev, 2)); diff 
-Nru a/drivers/net/yellowfin.c b/drivers/net/yellowfin.c --- a/drivers/net/yellowfin.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/yellowfin.c Thu Jun 5 15:51:50 2003 @@ -1486,7 +1486,7 @@ iounmap ((void *) dev->base_addr); #endif - kfree (dev); + release_netdev (dev); pci_set_drvdata(pdev, NULL); } From davem@redhat.com Sat Jun 7 02:08:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 02:08:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h579872x023757 for ; Sat, 7 Jun 2003 02:08:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA07413; Sat, 7 Jun 2003 02:05:28 -0700 Date: Sat, 07 Jun 2003 02:05:28 -0700 (PDT) Message-Id: <20030607.020528.68152135.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com, viro@parcelfarce.linux.theplanet.co.uk Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> References: <20030606145835.3a263df8.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2941 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 6 Jun 2003 14:58:35 -0700 Phase I: introduces release_netdev which is the hook to allow later changes to hold onto the net device after the device has potentially unloaded. Includes patch for the easy to fix devices. Besides naming (thought this was going to be named netdev_drop) I have no problems. Al? 
From davem@redhat.com Sat Jun 7 02:25:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 02:25:26 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h579PM2x025507 for ; Sat, 7 Jun 2003 02:25:22 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA07492; Sat, 7 Jun 2003 02:22:41 -0700
Date: Sat, 07 Jun 2003 02:22:41 -0700 (PDT)
Message-Id: <20030607.022241.98862720.davem@redhat.com>
To: mk@linux-ipv6.org
Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] fix esp6 extension headers handling
From: "David S. Miller"
In-Reply-To: <87wufzxe8p.wl@karaba.org>
References: <3EDF3EB4.8010105@tml.hut.fi> <873cioqxch.wl@karaba.org> <87wufzxe8p.wl@karaba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 2942
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Mitsuru KANDA / 神田 充
   Date: Sat, 07 Jun 2003 03:17:10 +0900

   The attached diff fixes the esp6 extension header handling bug
   reported by Henrik.

   I introduced ip6_find_1stfragopt() instead of get_offset().
   # ip6_found_nexthdr() is just renamed to ip6_find_1stfragopt()
   # in order to represent the correct functionality.

Patch applied, thank you.
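
For context, a user-space sketch of the kind of walk the name
ip6_find_1stfragopt() suggests: step over the extension headers that belong
to the unfragmentable part of an IPv6 packet, and report the offset where the
rest begins together with a pointer to the Next Header byte a caller would
patch. This is an illustration under assumptions, not the kernel code:
find_1stfragopt(), the raw byte-buffer interface, and the bounds handling are
hypothetical. The header numbers (0, 43, 60) and the (len + 1) * 8 length rule
come from the IPv6 specification.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN    40 /* fixed IPv6 header size in octets */
#define NH_OFFSET        6 /* offset of the Next Header byte within it */
#define NEXTHDR_HOP      0 /* hop-by-hop options header */
#define NEXTHDR_ROUTING 43 /* routing header */
#define NEXTHDR_DEST    60 /* destination options header */

/*
 * Hypothetical helper: returns the offset of the first header that is not
 * part of the unfragmentable chain, and stores in *prevhdr a pointer to the
 * Next Header byte that names it. Returns 0 if the packet is truncated.
 */
static size_t find_1stfragopt(uint8_t *pkt, size_t len, uint8_t **prevhdr)
{
	size_t offset = IPV6_HDR_LEN;
	int found_rhdr = 0;

	if (len < IPV6_HDR_LEN)
		return 0;
	*prevhdr = &pkt[NH_OFFSET];

	for (;;) {
		switch (**prevhdr) {
		case NEXTHDR_HOP:
			break;
		case NEXTHDR_ROUTING:
			found_rhdr = 1;
			break;
		case NEXTHDR_DEST:
			/* Destination options that follow a routing header
			 * are for the final destination only: stop here. */
			if (found_rhdr)
				return offset;
			break;
		default:
			return offset;
		}

		/* Need at least the two-octet nexthdr/length prefix. */
		if (offset + 2 > len)
			return 0;
		*prevhdr = &pkt[offset];
		/* Extension header length is (hdrlen + 1) * 8 octets. */
		offset += ((size_t)pkt[offset + 1] + 1) * 8;
		if (offset > len)
			return 0;
	}
}
```

A caller that inserts its own header would write the new protocol number
through *prevhdr, which is the pattern visible in the hunks of the patch
(for example *prevhdr = IPPROTO_COMP).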
From davem@redhat.com Sat Jun 7 03:35:14 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 03:35:27 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57AZD2x027804 for ; Sat, 7 Jun 2003 03:35:14 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA07711; Sat, 7 Jun 2003 03:30:59 -0700
Date: Sat, 07 Jun 2003 03:30:59 -0700 (PDT)
Message-Id: <20030607.033059.48393210.davem@redhat.com>
To: vnuorval@tcs.hut.fi
Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
From: "David S. Miller"
In-Reply-To:
References: <20030603.213458.112594590.davem@redhat.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2943
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Ville Nuorvala
   Date: Wed, 4 Jun 2003 15:40:02 +0300 (EEST)

   The revised version is attached to this mail,

Looks ok, but sorry two things need to be fixed up first:

1) Doesn't apply anymore, I think it's because of the struct sock
   member renames, just replace sk->foo with sk->sk_foo

2) Just export all those routines from net/ipv6/ipv6_syms.c
   always, remove the ifdefs.

I promise to apply it after you fix this stuff up :)))

Thank you.
From yoshfuji@linux-ipv6.org Sat Jun 7 03:41:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 03:41:12 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57Af72x028217 for ; Sat, 7 Jun 2003 03:41:07 -0700
Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h57AfpBo027249; Sat, 7 Jun 2003 19:41:51 +0900
Date: Sat, 07 Jun 2003 19:41:51 +0900 (JST)
Message-Id: <20030607.194151.112395246.yoshfuji@linux-ipv6.org>
To: davem@redhat.com, vnuorval@tcs.hut.fi
Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To: <20030607.033059.48393210.davem@redhat.com>
References: <20030603.213458.112594590.davem@redhat.com> <20030607.033059.48393210.davem@redhat.com>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2944
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

In article <20030607.033059.48393210.davem@redhat.com> (at Sat, 07 Jun 2003 03:30:59 -0700 (PDT)), "David S. Miller" says:

> I promise to apply it after you fix this stuff up :)))

Please be sure not to include "for MIPv6" in the changeset.
:-) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From shemminger@osdl.org Sat Jun 7 08:25:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 08:25:25 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57FPJ2x002720 for ; Sat, 7 Jun 2003 08:25:20 -0700 Received: from mylinux.hemminger.net (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h57FP3X20042; Sat, 7 Jun 2003 08:25:03 -0700 Date: Sat, 7 Jun 2003 08:25:15 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: jgarzik@pobox.com, netdev@oss.sgi.com, viro@parcelfarce.linux.theplanet.co.uk Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-Id: <20030607082515.6168be46.shemminger@osdl.org> In-Reply-To: <20030607.020528.68152135.davem@redhat.com> References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607.020528.68152135.davem@redhat.com> Organization: OSDL X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2945 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Sat, 07 Jun 2003 02:05:28 -0700 (PDT) "David S. Miller" wrote: > From: Stephen Hemminger > Date: Fri, 6 Jun 2003 14:58:35 -0700 > > Phase I: introduces release_netdev which is the hook to allow later > changes to hold onto the net device after the device has potentially > unloaded. Includes patch for the easy to fix devices. > > Besides naming (thought this was going to be named netdev_drop) > I have no problems. > > Al? 
My (admittedly weak) rationale for this was:
  - it seemed more like part of the register/unregister process and
    those functions are named {un}register_netdevice
  - RTNL should not be held, same as unregister_netdev (vs
    unregister_netdevice which requires it).
  - release rather than drop because release is used as name in the
    kobject callback hook

But it's easy to change now.

From bunk@fs.tum.de Sat Jun 7 12:12:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:12:54 -0700 (PDT) Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JCg2x011120 for ; Sat, 7 Jun 2003 12:12:43 -0700 Received: (qmail 5044 invoked from network); 7 Jun 2003 19:12:37 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 7 Jun 2003 19:12:37 -0000 Date: Sat, 7 Jun 2003 21:12:35 +0200 From: Adrian Bunk To: Jon Grimm Cc: Margit Schubert-While , lksctp-developers@lists.sourceforge.net, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: SCTP config 2.5.70(-bk) Message-ID: <20030607191235.GE13377@fs.tum.de> References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> <20030603130308.GC27168@fs.tum.de> <3EDD0DFC.4080806@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3EDD0DFC.4080806@us.ibm.com> User-Agent: Mutt/1.4.1i X-archive-position: 2946 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@fs.tum.de Precedence: bulk X-list: netdev

On Tue, Jun 03, 2003 at 04:07:08PM -0500, Jon Grimm wrote:

> Hi Adrian,

Hi Jon,

> Sorry for a bit of delay... We are away at an SCTP Interoperability
> event.

the delay before my answer was bigger...
> Adrian Bunk wrote:
> >On Mon, Jun 02, 2003 at 09:53:04AM +0200, Margit Schubert-While wrote:
> >
> >
> >>CONFIG_IPV6_SCTP__ is always being set to "y" even though
> >>not selected (CONFIG_IPV6 not set)
> >
> >
> >First, this doesn't do any harm since CONFIG_IPV6_SCTP__ alone doesn't
> >result in anything getting compiled.
> >
> >But besides, it seems a bit broken.
> >
> >From net/sctp/Kconfig:
> >
> ><-- snip -->
> >
> >...
> >
> >config IPV6_SCTP__
> >        tristate
> >        default y if IPV6=n
> >        default IPV6 if IPV6
> >
> >config IP_SCTP
> >        tristate "The SCTP Protocol (EXPERIMENTAL)"
> >        depends on IPV6_SCTP__
> >...
> >
> ><-- snip -->
> >
> >
> >Semantically equivalent is the following for IPV6_SCTP__:
> >
> >config IPV6_SCTP__
> >        tristate
> >        default y if IPV6=n || IPV6=y
> >        default m if IPV6=m
> >
> >
> >If it was intended to disallow a static IP_SCTP with a modular IPV6 it
> >doesn't work: It's perfectly allowed to set IPV6=n and IP_SCTP=y and
> >later compile and install a modular IPV6 for the same kernel.
> >
>
> Are you sure?  I vaguely remember one of the network structs having
> #ifdef'd fields for v6.  Consequently, if one compiles first without,
> but then later compiles/loads ipv6... bad things happen as the
> kernel has a different concept of what the sock is.

after reading this at net/Kconfig:

<-- snip -->

...
#   IPv6 as module will cause a CRASH if you try to unload it
config IPV6
        tristate "The IPv6 protocol (EXPERIMENTAL)"
...

<-- snip -->

I'm wondering whether it might be an idea to disallow the modular
building of IPv6 support?

> >Could someone from the SCTP developers comment on the intentions behind
> >IPV6_SCTP__ ?
> >
>
> Yes.  The intent was to at least discourage a configuration that will
> segfault.

It's currently discouraged but not completely impossible to select...

> Thanks,
> jon

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From garzik@gtf.org Sat Jun 7 12:15:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:15:27 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JFN2x011431 for ; Sat, 7 Jun 2003 12:15:23 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 595FF6611; Sat, 7 Jun 2003 15:15:22 -0400 (EDT) Date: Sat, 7 Jun 2003 15:15:22 -0400 From: Jeff Garzik To: Stephen Hemminger Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030607191522.GB3346@gtf.org> References: <20030606145835.3a263df8.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> User-Agent: Mutt/1.3.28i X-archive-position: 2947 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Fri, Jun 06, 2003 at 02:58:35PM -0700, Stephen Hemminger wrote: > This is the first phase of a sequence of patches to resolve network > device reference count issues exposed by the new sysfs interface. > > Phase I: introduces release_netdev which is the hook to allow later > changes to hold onto the net device after the device has potentially > unloaded. Includes patch for the easy to fix devices. > > Phase II: fixes devices that encapsulate network device structure > inside their own structure, or allocate private data in a way > that will break later. I would prefer to fix the drivers _before_ anything else. i.e. Phase 2 becomes Phase 1. These often need to be merged into 2.4 as well, and they can be applied to all drivers without any API changes. 
The changes are separated out from any refcounting/sysfs stuff, and can (potentially) be considered and reviewed by the respective maintainers. Jeff From garzik@gtf.org Sat Jun 7 12:16:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:16:32 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JGS2x011749 for ; Sat, 7 Jun 2003 12:16:29 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 984AA6611; Sat, 7 Jun 2003 15:16:28 -0400 (EDT) Date: Sat, 7 Jun 2003 15:16:28 -0400 From: Jeff Garzik To: Stephen Hemminger Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030607191628.GC3346@gtf.org> References: <20030606145835.3a263df8.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> User-Agent: Mutt/1.3.28i X-archive-position: 2948 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev IOW, release_netdev is basically a search-n-replace change that can be done to drivers anytime. Let's apply the "meat" changes to mainline first, the bug fixes / cleanups to use dynamic alloc. 
Jeff From ryan@michonline.com Sat Jun 7 18:49:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 18:49:41 -0700 (PDT) Received: from michonline.com (mail@pcp01184054pcs.strl301.mi.comcast.net [68.60.186.73]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h581nU2x022061 for ; Sat, 7 Jun 2003 18:49:30 -0700 Received: from mythical ([10.37.3.11] ident=mail) by michonline.com with esmtp (Exim 3.36 #1 (Debian)) id 19OpJG-0007g7-00; Sat, 07 Jun 2003 21:49:26 -0400 Received: from ryan by mythical with local (Exim 3.36 #1 (Debian)) id 19OpOl-0004yl-00; Sat, 07 Jun 2003 21:55:07 -0400 Date: Sat, 7 Jun 2003 21:55:07 -0400 From: Ryan Anderson To: linux-kernel@vger.kernel.org, Linus Torvalds , "David S. Miller" , kernel-janitor-discuss@lists.sourceforge.net Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Remove K&R prototypes in ppp_deflate.c Message-ID: <20030608015507.GA19133@michonline.com> Mail-Followup-To: linux-kernel@vger.kernel.org, Linus Torvalds , "David S. Miller" , kernel-janitor-discuss@lists.sourceforge.net, netdev@oss.sgi.com References: <20030608003916.GF20872@michonline.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608003916.GF20872@michonline.com> User-Agent: Mutt/1.5.4i X-archive-position: 2949 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ryan@michonline.com Precedence: bulk X-list: netdev I forgot to cc: netdev at first, sorry! On Sat, Jun 07, 2003 at 08:39:16PM -0400, Ryan Anderson wrote: > This patch removes the K&R initializers in ppp_deflate.c in favor of > more modern constructions. > > Once the other zlib cleanups appear to be stabilized, I'll look at > moving those cleanups into ppp_deflate.c as well. > > Dave, I think I sent this to you already once, if it's in your queue > already, please ignore this resend. 
> > # This is a BitKeeper generated patch for the following project: > # Project Name: Linux kernel tree > # This patch format is intended for GNU patch command version 2.5 or higher. > # This patch includes the following deltas: > # ChangeSet 1.1259 -> 1.1260 > # drivers/net/ppp_deflate.c 1.10 -> 1.11 > # > # The following is the BitKeeper ChangeSet Log > # -------------------------------------------- > # 03/06/02 ryan@mythryan2.(none) 1.1260 > # Remove the use of K&R prototypes from ppp_deflate.c > # -------------------------------------------- > # > diff -Nru a/drivers/net/ppp_deflate.c b/drivers/net/ppp_deflate.c > --- a/drivers/net/ppp_deflate.c Mon Jun 2 09:39:01 2003 > +++ b/drivers/net/ppp_deflate.c Mon Jun 2 09:39:01 2003 > @@ -78,8 +78,7 @@ > static void z_comp_stats __P((void *state, struct compstat *stats)); > > static void > -z_comp_free(arg) > - void *arg; > +z_comp_free(void *arg) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -95,9 +94,7 @@ > * Allocate space for a compressor. 
> */ > static void * > -z_comp_alloc(options, opt_len) > - unsigned char *options; > - int opt_len; > +z_comp_alloc(unsigned char *options, int opt_len) > { > struct ppp_deflate_state *state; > int w_size; > @@ -136,10 +133,8 @@ > } > > static int > -z_comp_init(arg, options, opt_len, unit, hdrlen, debug) > - void *arg; > - unsigned char *options; > - int opt_len, unit, hdrlen, debug; > +z_comp_init(void *arg, unsigned char *options, > + int opt_len, int unit, int hdrlen, int debug) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -161,8 +156,7 @@ > } > > static void > -z_comp_reset(arg) > - void *arg; > +z_comp_reset(void *arg) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -171,11 +165,9 @@ > } > > int > -z_compress(arg, rptr, obuf, isize, osize) > - void *arg; > - unsigned char *rptr; /* uncompressed packet (in) */ > - unsigned char *obuf; /* compressed packet (out) */ > - int isize, osize; > +z_compress(void *arg, unsigned char *rptr, /* uncompressed packet (in) */ > + unsigned char *obuf, /* compressed packet (out) */ > + int isize, int osize) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > int r, proto, off, olen, oavail; > @@ -252,9 +244,7 @@ > } > > static void > -z_comp_stats(arg, stats) > - void *arg; > - struct compstat *stats; > +z_comp_stats(void *arg, struct compstat *stats) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -262,8 +252,7 @@ > } > > static void > -z_decomp_free(arg) > - void *arg; > +z_decomp_free(void *arg) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -279,9 +268,7 @@ > * Allocate space for a decompressor. 
> */ > static void * > -z_decomp_alloc(options, opt_len) > - unsigned char *options; > - int opt_len; > +z_decomp_alloc(unsigned char *options, int opt_len) > { > struct ppp_deflate_state *state; > int w_size; > @@ -318,10 +305,8 @@ > } > > static int > -z_decomp_init(arg, options, opt_len, unit, hdrlen, mru, debug) > - void *arg; > - unsigned char *options; > - int opt_len, unit, hdrlen, mru, debug; > +z_decomp_init(void *arg, unsigned char *options, > + int opt_len, int unit, int hdrlen, int mru, int debug) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -344,8 +329,7 @@ > } > > static void > -z_decomp_reset(arg) > - void *arg; > +z_decomp_reset(void *arg) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > > @@ -370,12 +354,8 @@ > * compression, even though they are detected by inspecting the input. > */ > int > -z_decompress(arg, ibuf, isize, obuf, osize) > - void *arg; > - unsigned char *ibuf; > - int isize; > - unsigned char *obuf; > - int osize; > +z_decompress(void *arg, unsigned char *ibuf, int isize, > + unsigned char *obuf, int osize) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > int olen, seq, r; > @@ -478,10 +458,7 @@ > * Incompressible data has arrived - add it to the history. 
> */ > static void > -z_incomp(arg, ibuf, icnt) > - void *arg; > - unsigned char *ibuf; > - int icnt; > +z_incomp(void *arg, unsigned char *ibuf, int icnt) > { > struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg; > int proto, r; > > > -- > > Ryan Anderson > sometimes Pug Majere > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Ryan Anderson sometimes Pug Majere From davem@redhat.com Sun Jun 8 00:01:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 00:01:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5871F2x026737 for ; Sun, 8 Jun 2003 00:01:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA08997; Sat, 7 Jun 2003 23:58:26 -0700 Date: Sat, 07 Jun 2003 23:58:25 -0700 (PDT) Message-Id: <20030607.235825.71096085.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <20030607191522.GB3346@gtf.org> References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607191522.GB3346@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2950 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Sat, 7 Jun 2003 15:15:22 -0400 I would prefer to fix the drivers _before_ anything else. i.e. Phase 2 becomes Phase 1. 
These often need to be merged into 2.4 as well, and they can be applied to all drivers without any API changes. The changes are separated out from any refcounting/sysfs stuff, and can (potentially) be considered and reviewed by the respective maintainers. Have you extracted out all the init_etherdev() killings Al and myself did so you can backport them to 2.4.x too? If you're not going to do that, there is not much point in trying to sync other such things back to 2.4.x as well. From fw@deneb.enyo.de Sun Jun 8 04:40:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 04:40:19 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58Be42x010062 for ; Sun, 8 Jun 2003 04:40:06 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19OyWb-0001Jh-00; Sun, 08 Jun 2003 13:39:49 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19OyWb-0001RY-FS; Sun, 08 Jun 2003 13:39:49 +0200 To: "David S. Miller" Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress References: <87adda6uro.fsf@deneb.enyo.de> <20030526.002934.132904126.davem@redhat.com> <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> From: Florian Weimer Date: Sun, 08 Jun 2003 13:39:49 +0200 In-Reply-To: <20030526.233211.54217447.davem@redhat.com> (David S. Miller's message of "Mon, 26 May 2003 23:32:11 -0700 (PDT)") Message-ID: <87he70re62.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2951 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev "David S. 
Miller" writes:

> Of course, this will result in vastly decreased functionality (no
> arbitrary netmasks, no policy-based routing, code will be fine-tuned
> for typical Internet routing tables), so this proposal definitely
> comes at a price.
>
> As a general purpose operating system, where people DO in fact use
> these features quite regularly,

Even non-CIDR netmasks?  AFAIK, it's hard to find dedicated networking
devices (and routing protocols!) which support them. 8-/

Anyway, I've played a bit with something inspired by CEF (more
precisely speaking, one diagram in the IOS internals book and some IOS
diagnostic output).  Basically, it's a 256-way trie, with "adjacency
information" at the leaves (consisting of L2 addressing information
and the prefix length).  Each leaf carries a full table of 256 child
pointers that all refer back to the leaf itself.  This allows for
branch-free routing (see below).  (A further optimization would not
allocate the self-referencing pointers for leaves which are at the
fourth layer of the trie, but this is unlikely to have a huge
performance impact.)

The trie has 7862 internal nodes for my copy of the Internet routing
table, which amounts to 8113584 bytes (excluding memory management
overhead, twice the value for 64 bit architectures).  The number of
internal nodes does not depend on the number of interfaces/peerings,
and prefix filtering based on their lengths (/27 or even /24) doesn't
make a huge difference either.

For each adjacency, space for the L2 addressing information is
required plus 256 pointers for the self-references (of course, for
each relevant prefix length, so you have a few kilobytes for a typical
peering).
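The memory figures above can be checked with a little arithmetic.
8113584 bytes over 7862 internal nodes is exactly 1032 bytes per node,
which is consistent with 256 four-byte pointers plus 8 bytes of
per-node header.  The 8-byte header is only inferred from that
division; the message does not say what it holds.  The helper names
below are hypothetical and exist just for this check.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

enum { CEF_FANOUT = 256 };   /* one child pointer per octet value */

/* Bytes for one node: 256 pointers plus an assumed fixed header. */
size_t
cef_node_bytes (size_t ptr_size, size_t header_bytes)
{
  return CEF_FANOUT * ptr_size + header_bytes;
}

/* Bytes for all internal nodes of the trie. */
size_t
cef_trie_bytes (size_t nodes, size_t ptr_size, size_t header_bytes)
{
  return nodes * cef_node_bytes (ptr_size, header_bytes);
}

/* 32-bit case: 7862 nodes * (256 * 4 + 8) bytes matches the quoted total. */
static_assert (7862 * (256 * 4 + 8) == 8113584,
               "per-node size of 1032 bytes reproduces the quoted figure");
```

With a header of zero, cef_node_bytes (4, 0) gives 1024 bytes, which is
the cost of the 256 self-references that each leaf carries on a 32-bit
machine.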
The routing function looks like this:

    struct cef_entry *
    cef_route (struct cef_table *table, ipv4_t address)
    {
      unsigned char octet1 = address >> 24;
      unsigned char octet2 = (address >> 16) & 0xFF;
      unsigned char octet3 = (address >> 8) & 0xFF;
      unsigned char octet4 = address & 0xFF;

      struct cef_entry * entry1 = table->children[octet1];
      struct cef_entry * entry2 = entry1->table[octet2];
      struct cef_entry * entry3 = entry2->table[octet3];
      struct cef_entry * entry4 = entry3->table[octet4];

      return entry4;
    }

For the full routing table with "maximum" adjacency information
(different L2 addressing information for each origin AS) and
"real-world" addresses (captured at the border of a medium-size
network, local addresses filtered), the function needs about 82 cycles
per routing decision on my Athlon XP (including function call
overhead).  For random addresses, we have 155 cycles.

In a simulation of a moderate peering (only 94 adjacencies, simulated
interfaces to half a dozen AS concentrated in Germany), I measured 45
cycles per routing decision for real-world traffic, and 70 cycles for
random addresses.  (More peerings result in more adjacencies which
lead to fewer cache hits.)
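A minimal sketch of how a table with self-referencing leaves could be
set up so that the four-load lookup works without any test on the node
type.  Only the fields table, children and prefix_length are taken
from the code in this message; the adjacency field and the helpers
cef_leaf, cef_internal and cef_demo_table are hypothetical names made
up for illustration, and nothing is ever freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t ipv4_t;

struct cef_entry {
  int prefix_length;              /* >= 0: leaf; < 0: internal node */
  int adjacency;                  /* stand-in for the L2 addressing info */
  struct cef_entry *table[256];
};

struct cef_table {
  struct cef_entry *children[256];
};

/* A leaf: every child pointer refers back to the leaf itself. */
struct cef_entry *
cef_leaf (int prefix_length, int adjacency)
{
  struct cef_entry *e = malloc (sizeof *e);
  assert (prefix_length >= 0 && prefix_length <= 32);
  if (e == NULL)
    abort ();
  e->prefix_length = prefix_length;
  e->adjacency = adjacency;
  for (int i = 0; i < 256; i++)
    e->table[i] = e;
  return e;
}

/* An internal node whose slots all start out at the covering leaf. */
struct cef_entry *
cef_internal (struct cef_entry *covering)
{
  struct cef_entry *e = malloc (sizeof *e);
  if (e == NULL)
    abort ();
  e->prefix_length = -1;
  e->adjacency = -1;
  for (int i = 0; i < 256; i++)
    e->table[i] = covering;
  return e;
}

/* Branch-free lookup: always four loads, leaves absorb the extra steps. */
struct cef_entry *
cef_route (struct cef_table *table, ipv4_t address)
{
  struct cef_entry *e = table->children[address >> 24];
  e = e->table[(address >> 16) & 0xFF];
  e = e->table[(address >> 8) & 0xFF];
  return e->table[address & 0xFF];
}

/* Toy table: 0.0.0.0/0 -> 0, 10.0.0.0/8 -> 1, 192.168.0.0/16 -> 2. */
struct cef_table *
cef_demo_table (void)
{
  struct cef_table *t = malloc (sizeof *t);
  struct cef_entry *dflt = cef_leaf (0, 0);
  struct cef_entry *node192 = cef_internal (dflt);

  if (t == NULL)
    abort ();
  for (int i = 0; i < 256; i++)
    t->children[i] = dflt;
  t->children[10] = cef_leaf (8, 1);
  node192->table[168] = cef_leaf (16, 2);
  t->children[192] = node192;
  return t;
}
```

Looking up 10.1.2.3 in the toy table hits the /8 leaf on the first
load and then follows three self-references, while 192.168.5.6 passes
through one internal node before reaching the /16 leaf.  In both cases
the same four loads are executed.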
You can save 1K (or 2K on 64-bit architectures) per adjacency if you
introduce data-dependent branches:

    struct cef_entry *
    cef_route (struct cef_table *table, ipv4_t address)
    {
      unsigned char octet1 = address >> 24;
      struct cef_entry * entry1 = table->children[octet1];
      if (entry1->prefix_length < 0)
        {
          unsigned char octet2 = (address >> 16) & 0xFF;
          struct cef_entry * entry2 = entry1->table[octet2];
          if (entry2->prefix_length < 0)
            {
              unsigned char octet3 = (address >> 8) & 0xFF;
              struct cef_entry * entry3 = entry2->table[octet3];
              if (entry3->prefix_length < 0)
                {
                  unsigned char octet4 = address & 0xFF;
                  struct cef_entry * entry4 = entry3->table[octet4];
                  return entry4;
                }
              else
                {
                  return entry3;
                }
            }
          else
            {
              return entry2;
            }
        }
      else
        {
          return entry1;
        }
    }

However, this decreases performance (even on my Athlon XP with just
256 KB cache).

At the moment, I've got a userspace prototype for simulations which
can build the trie and make routing decisions.  Removing entries is a
bit tricky and requires more data because formerly overridden prefixes
might have to be resurrected.  I'm unsure which data structures should
be used to solve this problem.  Memory management is a related
question, too.  And locking. *sigh*

From davem@redhat.com Sun Jun 8 05:07:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 05:07:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58C7p2x010585 for ; Sun, 8 Jun 2003 05:07:52 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA15712; Sun, 8 Jun 2003 05:05:01 -0700 Date: Sun, 08 Jun 2003 05:05:00 -0700 (PDT) Message-Id: <20030608.050500.28795668.davem@redhat.com> To: fw@deneb.enyo.de Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S.
Miller" In-Reply-To: <87he70re62.fsf@deneb.enyo.de> References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2952 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Florian Weimer
   Date: Sun, 08 Jun 2003 13:39:49 +0200

   "David S. Miller" writes:

   > As a general purpose operating system, where people DO in fact use
   > these features quite regularly,

   Even non-CIDR netmasks?  AFAIK, it's hard to find dedicated networking
   devices (and routing protocols!) which support them. 8-/

Yes, people use source based routing to block specific IPs and
subnets; it's also needed for Mobile IPv4.

   Anyway, I've played a bit with something inspired by CEF (more
   precisely speaking, one diagram in the IOS internals book and some IOS
   diagnostic output).

Thanks, Alexey and myself will need to study this deeply.

Although, I hope it's not "too similar" to what CEF does because
undoubtedly Cisco has a bazillion patents in this area.  This is
actually an argument for coming up with our own algorithms without any
knowledge of what CEF does or might do. :(

From fw@deneb.enyo.de Sun Jun 8 06:10:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 06:10:45 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58DAc2x011651 for ; Sun, 8 Jun 2003 06:10:39 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19OzwH-0006NA-00; Sun, 08 Jun 2003 15:10:25 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19OzwH-0001c8-Lv; Sun, 08 Jun 2003 15:10:25 +0200 To: "David S.
Miller" Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com> From: Florian Weimer Date: Sun, 08 Jun 2003 15:10:25 +0200 In-Reply-To: <20030608.050500.28795668.davem@redhat.com> (David S. Miller's message of "Sun, 08 Jun 2003 05:05:00 -0700 (PDT)") Message-ID: <874r30r9z2.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2953 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev

"David S. Miller" writes:

> Although, I hope it's not "too similar" to what CEF does because
> undoubtedly Cisco has a bazillion patents in this area.

Most things in this area are patented, and the patents are extremely
fuzzy (e.g. policy-based routing with hierarchical sequence of
decisions has been patented countless times). 8-(

> This is actually an argument for coming up with our own algorithms
> without any knowledge of what CEF does or might do. :(

The branchless variant is not described in the IOS book, and I can't
tell if Cisco routers use it.  If this idea is really novel, we are in
pretty good shape because we no longer use trees, tries or whatever,
but a DFA. 8-)

A further parameter which could be tweaked is the kind of adjacency
information (where to store the L2 information, whether to include the
prefix length in the adjacency record etc.).
From jmorris@intercode.com.au Sun Jun 8 08:49:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 08:49:05 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:+znejfElQ/lNDmBoGnO9/mGdqkwwfswy@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58Fmw2x024215 for ; Sun, 8 Jun 2003 08:48:59 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h58Flrr27183; Mon, 9 Jun 2003 01:47:53 +1000 Date: Mon, 9 Jun 2003 01:47:52 +1000 (EST) From: James Morris To: Kazunori Miyazawa cc: davem@redhat.com, , , Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2954 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev

On Fri, 6 Jun 2003, Kazunori Miyazawa wrote:

> In output functions, dst is changed by xfrm_lookup if there is
> any matching policy. Therefore original dst which is held before
> calling xfrm_lookup will never be released.
> When xfrm_lookup succeeds and dst is changed, original dst should
> be released.
It is released in xfrm_lookup():

        *dst_p = dst;
        ip_rt_put(rt);
        xfrm_pol_put(policy);
        return 0;


- James
-- 
James Morris

From pekkas@netcore.fi Sun Jun 8 10:59:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 10:59:21 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58HxD2x025483 for ; Sun, 8 Jun 2003 10:59:14 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h58HwvY03749; Sun, 8 Jun 2003 20:58:58 +0300 Date: Sun, 8 Jun 2003 20:58:57 +0300 (EEST) From: Pekka Savola To: Florian Weimer cc: "David S. Miller" , , Subject: Re: Route cache performance under stress In-Reply-To: <87he70re62.fsf@deneb.enyo.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2955 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev

On Sun, 8 Jun 2003, Florian Weimer wrote:

> "David S. Miller" writes:
>
> > Of course, this will result in vastly decreased functionality (no
> > arbitrary netmasks, no policy-based routing, code will be fine-tuned
> > for typical Internet routing tables), so this proposal definitely
> > comes at a price.
> >
> > As a general purpose operating system, where people DO in fact use
> > these features quite regularly,
>
> Even non-CIDR netmasks?  AFAIK, it's hard to find dedicated networking
> devices (and routing protocols!) which support them. 8-/

Do you mean netmasks like "255.128.255.0" ?  Those are a real abomination
and probably not supported.. and I don't know of anything that would
require them.

Or do you mean netmasks such as 1.1.1.1/19?  I don't know of any credible
networking devices which wouldn't support them.  If so, please come out of
the cave.

-- 
Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R.
Martin: A Clash of Kings From sim@netnation.com Sun Jun 8 16:49:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 16:49:33 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58NnQ2x028846 for ; Sun, 8 Jun 2003 16:49:27 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19P9ug-00037G-9G; Sun, 08 Jun 2003 16:49:26 -0700 Date: Sun, 8 Jun 2003 16:49:26 -0700 From: Simon Kirby To: Florian Weimer Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030608234926.GA9453@netnation.com> References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <874r30r9z2.fsf@deneb.enyo.de> User-Agent: Mutt/1.5.4i X-archive-position: 2956 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 03:10:25PM +0200, Florian Weimer wrote: > Further parameters which could be tweaked is the kind of adjacency > information (where to store the L2 information, whether to include the > prefix length in the adjacency record etc.). What is the problem with the current approach? Does the overhead come from having to iterate through the hashes for each prefix? Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. 
] From xerox@foonet.net Sun Jun 8 17:20:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 17:20:49 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h590Kb2x029346 for ; Sun, 8 Jun 2003 17:20:38 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h58Nuveq028645; Sun, 8 Jun 2003 19:56:57 -0400 From: "CIT/Paul" To: "'Simon Kirby'" , "'Florian Weimer'" Cc: , Subject: RE: Route cache performance under stress Date: Sun, 8 Jun 2003 19:55:58 -0400 Organization: CIT Message-ID: <001001c32e19$81bc7ea0$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <20030608234926.GA9453@netnation.com> Importance: Normal X-archive-position: 2957 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev The problem with the route cache as it stands is that it adds every new packet that isn't in the route cache to the cache, say you have A denial of service attack going on, OR you just have millions of hosts going through the router (if you were an ISP). Anything with seeminly Random source ips (something like juno-z.101f.c will generate worst case scenario for forwarding packets) will cause the cache to constantly Add new entries at pretty much the rate of the attack.. This can stifle just about any linux router with a measly 10 megabits/second of traffic unless The router is tuned up to a large degree (NAPI, certain nics, route cache timings, etc.) and even then it can still be destroyed no matter what The cpu is with less than 100,000 packets per second and in mosts cases less than 30k.. 
That's why it's just no acceptable for companies using it as a replacement for say a cisco 7200 VXR series (npe300,400 nsf-1, etc.) which can do 300K+ packet per second of routing (and yes it can even route juno-z.101f.c at 300kpps, I have tested it). Linux has no problem doing 300kpps from a single source to a single destination provided you have NAPI or ITR or something limiting the interrupts.. The overhead is the route cache and the related systems that use it and also netfilter is very slow :/ One of these days they will fix it..... If anyone has any ideas or needs a test-bed to try out code on or would like me to test some of their code I would be happy to test it on our development platforms (single and dual processor with intel e1000 82545/6 and above, also e100 and tulip). Thanks for your time P.S. to answer your iteration question.. It does not seem to be such overhead on the cpu even if the route-cache is 600,000 in size.. I have tested this and while there is a definite increase in cpu it comes nothing close to the code that has to add every new arriving packet to the list. IMHO the best way to do this would be like CEF w/ adjacency lists and not have it add every new packet that comes along Paul xerox@foonet.net http://www.httpd.net
From kazunori@miyazawa.org Sun Jun 8 17:48:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 17:48:29 -0700 (PDT) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h590mO2x029798 for ; Sun, 8 Jun 2003 17:48:24 -0700 Received: from monza.miyazawa.org ([2001:200:1b0:1000:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Mon, 09 Jun 2003 09:46:01 +0900 Date: Mon, 9 Jun 2003 09:49:18 +0900 From: Kazunori Miyazawa To: James Morris Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Message-Id: <20030609094918.3c26d296.kazunori@miyazawa.org> In-Reply-To: References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2958 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev On Mon, 9 Jun 2003 01:47:52 +1000 (EST) James Morris wrote: > On Fri, 6 Jun 2003, Kazunori Miyazawa wrote: > > > In output functions, dst is changed by xfrm_lookup if there is > > any matching policy. Therefore original dst which is held before > > calling xfrm_lookup will be never released. > > When xfrm_lookup scceeds and dst is changed, original dst should > > be release. > > It is released in xfrm_lookup(): > > *dst_p = dst; > ip_rt_put(rt); > xfrm_pol_put(policy); > return 0; > > I overlooked it. Thank you.
--Kazunori Miyazawa (Yokogawa Electric Corporation) From hadi@shell.cyberus.ca Sun Jun 8 18:35:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 18:35:44 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h591Zb2x030372 for ; Sun, 8 Jun 2003 18:35:38 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PBZ0-0008eR-0q; Sun, 08 Jun 2003 21:35:10 -0400 Date: Sun, 8 Jun 2003 21:35:09 -0400 (EDT) From: Jamal Hadi To: Hisham Kotry cc: david-b@pacbell.net, rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program In-Reply-To: <20030603075742.34434.qmail@web14305.mail.yahoo.com> Message-ID: <20030608212033.Y33230@shell.cyberus.ca> References: <20030603075742.34434.qmail@web14305.mail.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2959 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 3 Jun 2003, Hisham Kotry wrote: > It was defenitly a nice read, but the netlink2 draft > is somewhat inconsistent, it mentions reducing the > 32-bit length field to 16-bits and equally > distributing the remaining 16-bits between the new > version and extended flags fields, but the draft makes > no further refrence to the version field. Infact the > netlink2 message header diagram on page 16, as well as > the pseudo message on page 28, show a 16-bits extended > flags field with no version field in the header. So > this is probably one of those cases in wich specs > aren't clear enough and code usually has the final > word in such situations. > > I mailed Jamal about this a while ago but never got a > reply back. 
> apologies, I actually have a unrelated daytime job that tends to keep me too occupied at times ;-> Netlink2 draft is work in progress. The draft tends to lag reality. I believe what you refer to has been fixed. Refer to the slides at: http://www.zurich.ibm.com/~rha/netlink2.pdf > BTW, is netlink2 support planned for linux in the near > future? > You will see code from us that is GPL. Consider netlink2 as a distributed netlink. netlink is already proven so why reinvent the wheel? Essentially you should be able to manager clusters of linux network devices (think firewalls, routers etc) with netlink/netlink2. There are some mechanisms for distributdness that are missing. These are the holes we are going to fill. Note some of the stuff i am working on at: www.cyberus.ca/~hadi/patches/action which fits the whole forces paradigm and works quiet well with netlink today and netlink2 next. (I stopped updating that web page for sometime now, talk to me if interested in the patches and if you would like to help in testing, coding, etc) cheers, jamal From hadi@shell.cyberus.ca Sun Jun 8 20:16:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 20:16:19 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h593G62x031157 for ; Sun, 8 Jun 2003 20:16:07 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PD8M-0008hG-U2; Sun, 08 Jun 2003 23:15:46 -0400 Date: Sun, 8 Jun 2003 23:15:46 -0400 (EDT) From: Jamal Hadi To: CIT/Paul cc: "'Simon Kirby'" , "'Florian Weimer'" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: RE: Route cache performance under stress In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass> Message-ID: <20030608230300.X33412@shell.cyberus.ca> References: <001001c32e19$81bc7ea0$4a00000a@badass> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2960 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Sun, 8 Jun 2003, CIT/Paul wrote: > The problem with the route cache as it stands is that it adds every new > packet that isn't in the route cache to the cache, say you have > A denial of service attack going on, OR you just have millions of hosts > going through the router (if you were an ISP). Anything with seeminly > Random source ips (something like juno-z.101f.c will generate worst case > scenario for forwarding packets) will cause the cache to constantly > Add new entries at pretty much the rate of the attack.. This can stifle > just about any linux router with a measly 10 megabits/second of traffic > unless foo have you tried the latest patches posted recently? get the latest kernel 2.5.x and try it out. BTW, i dont think it is true you can die with 10mbps. I was reading some emails where someone said it was a few 100 pps that will kill the linux sytem (theory mixed with nonsense;->) > The router is tuned up to a large degree (NAPI, certain nics, route > cache timings, etc.) and even then it can still be destroyed no matter > what > The cpu is with less than 100,000 packets per second and in mosts cases > less than 30k.. btw thats waay above 10Mbps. > That's why it's just no acceptable for companies using > it as a replacement for say a cisco 7200 VXR series (npe300,400 nsf-1, > etc.) which can do 300K+ packet per second of routing (and yes it can > even route juno-z.101f.c at 300kpps, I have tested it). Linux has no > problem doing 300kpps from a single source to a single destination > provided you have NAPI or ITR or something limiting the interrupts.. The > overhead is the route cache and the related systems that use it and also > netfilter is very slow :/ One of these days they will fix it..... 
If > anyone has any ideas or needs a test-bed to try out code on or would > like me to test some of their code I would be happy to test it on our > development platforms (single and dual processor with intel e1000 > 82545/6 and above, also e100 and tulip). > I think Robert has some numbers with the new patches with similar setups as you. Why dont you compare how much the cost of a CISCO npex devices with Linux PCs with e1000s as well while you are at it ?;-> I am sure there are people who will like to sell you linux devices at half the cisco prices doing Millions of PPS via hardware assists. Support these linux supporting companies instead ;-> The more i think about it the more i think CEF is a lame escape from route caches. What we need is multi-tries at the slow path and perhaps a binary tree on hash collisions buckets of the dst cache (instead of a linked list). You can avoid the packet drive cache generation event by being a little creative if it gets overwhelming. Fix zebra to resolve each BGP nexthop fully every periodic time. In any case who said forwarding by itself was sexy anymore? cheers, jamal > Thanks for your time > > P.S. to answer your iteration question.. It does not seem to be such > overhead on the cpu even if the route-cache is 600,000 in size.. I have > tested this and while there is a definite increase in cpu it comes > nothing close to the code that has to add every new arriving packet to > the list. 
IMHO the best way to do this would be like CEF w/ adjacency > lists and not have it add every new packet that comes along > > Paul xerox@foonet.net http://www.httpd.net > > > -----Original Message----- > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On > Behalf Of Simon Kirby > Sent: Sunday, June 08, 2003 7:49 PM > To: Florian Weimer > Cc: netdev@oss.sgi.com; linux-net@vger.kernel.org > Subject: Re: Route cache performance under stress > > > On Sun, Jun 08, 2003 at 03:10:25PM +0200, Florian Weimer wrote: > > > Further parameters which could be tweaked is the kind of adjacency > > information (where to store the L2 information, whether to include the > > > prefix length in the adjacency record etc.). > > What is the problem with the current approach? Does the overhead come > from having to iterate through the hashes for each prefix? > > Simon- > > [ Simon Kirby ][ Network Operations ] > [ sim@netnation.com ][ NetNation Communications Inc. ] > [ Opinions expressed are not necessarily those of my employer. ] > > > > From jgarzik@pobox.com Sun Jun 8 20:52:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 20:52:19 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h593q92x031603 for ; Sun, 8 Jun 2003 20:52:10 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19PDhY-00038h-GA; Mon, 09 Jun 2003 04:52:08 +0100 Message-ID: <3EE4045D.4040002@pobox.com> Date: Sun, 08 Jun 2003 23:51:57 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "David S. 
Miller" CC: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607191522.GB3346@gtf.org> <20030607.235825.71096085.davem@redhat.com> In-Reply-To: <20030607.235825.71096085.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2961 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev David S. Miller wrote: > Have you extracted out all the init_etherdev() killings Al and > myself did so you can backport them to 2.4.x too? That's the plan, yes. Jeff From xerox@foonet.net Sun Jun 8 22:28:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:29:07 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595Sq2x032490 for ; Sun, 8 Jun 2003 22:28:53 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h595Smeq001442; Mon, 9 Jun 2003 01:28:48 -0400 From: "CIT/Paul" To: "'Jamal Hadi'" Cc: "'Simon Kirby'" , "'Florian Weimer'" , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 01:27:48 -0400 Organization: CIT Message-ID: <000701c32e47$ddd25290$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <20030608230300.X33412@shell.cyberus.ca> Importance: Normal X-archive-position: 2962 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev 
Ahah Jamal!! Yes, I have tried.. It does absolutely nothing for the constant randomness of packets. It improves the overall distribution of the hash in the cache but it does nothing for the addition of new packets.. Try forwarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache.. Every one. And at 30,000 pps it destroys the cache, because every single packet coming in is NOT in the route cache since it comes from random IPs. There is nothing you can do about that except make the cache and everything related to it wicked faster, OR remove the per-packet additions to the cache. (I'm not even sure why this is necessary anyway.. Who would want to add every single src/dst flow to a cache? That's what conntrack does and we all know how much you despise that heheheh)

And yes, you can die with 10mbps...... Try putting in some netfilter rules and some basic traffic on it, then hit it with 10mbps of juno-z and see what happens to your cpu. Granted, if there is a linux router doing ABSOLUTELY NOTHING you might be able to hit 50kpps of juno with dual p3 cpus w/ 512k cache each and tricked-out settings for the hash and route cache, but you will also drop some packets along the way.. Still this is not acceptable yet :>

Point me at some decent cost linux hardware assist platforms..
IMHO the only thing that needs hardware assist is the darn route cache (in its entierty) BTW, Juno-z can send 12,000 packets per second or more and it's still 10mbps :> If anyone has any ideas please feel free to e-amil me direct :> Paul xerox@foonet.net http://www.httpd.net
From davem@redhat.com Sun Jun 8 22:41:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:41:34 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595fO2x000407 for ; Sun, 8 Jun 2003 22:41:25 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16685; Sun, 8 Jun 2003 22:38:25 -0700 Date: Sun, 08 Jun 2003 22:38:25 -0700 (PDT) Message-Id: <20030608.223825.104049415.davem@redhat.com> To: sim@netnation.com Cc: fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030608234926.GA9453@netnation.com> References: <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> <20030608234926.GA9453@netnation.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2963 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

From: Simon Kirby
Date: Sun, 8 Jun 2003 16:49:26 -0700

> On Sun, Jun 08, 2003 at 03:10:25PM +0200, Florian Weimer wrote:
> > Further parameters which could be tweaked is the kind of adjacency
> > information (where to store the L2 information, whether to include the
> > prefix length in the adjacency record etc.).
>
> What is the problem with the current approach? Does the overhead come from having to iterate through the hashes for each prefix?

It comes from doing the slow path, which actually had a bug (wouldn't grow the hash tables past a certain point). I bet most of Florian's performance problems go away if he runs with the fib_hash fix that I put into the tree.

In fact, the current slow path is _OPTIMAL_ for any sane routing table. The lookups are exactly O(n_prefixes) where n_prefixes is the number of unique subnet prefixes you've added to your routing table. This is precisely the same complexity as you'd get with a trie based approach with guaranteed depth not exceeding 32.

I think most people are unaware of how the slow path we have actually works.

The place I see bugs is in routing cache GC operation: it can't keep up with how fast we can create new routing cache entries, and this is merely because it isn't tuned, not because it is not capable of keeping equilibrium properly.
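[Archive note] The slow-path lookup described above can be sketched as follows. This is a minimal user-space model, not the kernel's fib_hash code: it assumes that "unique subnet prefixes" refers to the distinct prefix lengths present in the table, so there is one hash table ("zone") per prefix length and a lookup probes the zones from longest to shortest. The names `zone`, `route_add`, `route_lookup`, `ZONE_BUCKETS` and `MAX_ROUTES` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_BUCKETS 16          /* hypothetical size, tiny for the demo */
#define MAX_ROUTES   64          /* hypothetical static pool size */

struct route {
    uint32_t prefix;             /* network address, host byte order */
    uint32_t nexthop;            /* opaque next-hop id for the demo */
    struct route *next;          /* hash chain within one zone */
};

/* One "zone" (hash table) per prefix length, 0..32. */
struct zone {
    int in_use;                  /* set once any route of this length exists */
    struct route *bucket[ZONE_BUCKETS];
};

static struct zone zones[33];
static struct route pool[MAX_ROUTES];
static int pool_used;

static uint32_t mask_for(int len)
{
    /* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
    return len == 0 ? 0 : (uint32_t)(~(uint32_t)0 << (32 - len));
}

static unsigned bucket_of(uint32_t key)
{
    key ^= key >> 16;
    key ^= key >> 8;
    return key % ZONE_BUCKETS;
}

/* Insert prefix/len into the zone for that prefix length. */
static int route_add(uint32_t prefix, int len, uint32_t nexthop)
{
    assert(len >= 0 && len <= 32);
    if (pool_used == MAX_ROUTES)
        return -1;

    struct route *r = &pool[pool_used++];
    unsigned b;

    r->prefix = prefix & mask_for(len);
    r->nexthop = nexthop;
    b = bucket_of(r->prefix);
    r->next = zones[len].bucket[b];
    zones[len].bucket[b] = r;
    zones[len].in_use = 1;
    return 0;
}

/*
 * Longest-prefix match: one hash probe per prefix length in use,
 * longest first, so the outer loop runs at most 33 times no matter
 * how many routes are installed.
 */
static const struct route *route_lookup(uint32_t dst)
{
    for (int len = 32; len >= 0; len--) {
        if (!zones[len].in_use)
            continue;
        uint32_t key = dst & mask_for(len);
        for (const struct route *r = zones[len].bucket[bucket_of(key)];
             r != NULL; r = r->next)
            if (r->prefix == key)
                return r;
    }
    return NULL;
}
```

With 10.0.0.0/8, 10.1.0.0/16 and a default route installed, a lookup of 10.1.2.3 probes the /16 zone first and stops there. The cost per lookup is bounded by the number of prefix lengths in use, provided each zone's hash table is large enough to keep its chains short, which is where a limit on hash table growth would hurt.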
This is why I really wish Florian would explore this area instead of ripping the whole thing apart :-)

From davem@redhat.com Sun Jun 8 22:47:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:48:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595lo2x000740 for ; Sun, 8 Jun 2003 22:47:51 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16705; Sun, 8 Jun 2003 22:44:47 -0700 Date: Sun, 08 Jun 2003 22:44:46 -0700 (PDT) Message-Id: <20030608.224446.78724665.davem@redhat.com> To: xerox@foonet.net Cc: sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2964 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

From: "CIT/Paul"
Date: Sun, 8 Jun 2003 19:55:58 -0400

> The problem with the route cache as it stands is that it adds every new packet that isn't in the route cache to the cache, say you have A denial of service attack going on, OR you just have millions of hosts going through the router (if you were an ISP).

We now perform rather acceptably in such scenarios. Robert Olsson has demonstrated that even if the attacker could fill up your entire bandwidth with random source address packets, we'd still provide 50kpps routing speed.
And this can be made much higher because the performance limiter is the routing cache GC which isn't tuned properly. It can't keep up because it doesn't try to purge the right amount of entries each pass.

All the performance problems I've seen have been algorithmic or outright bugs. Bad hash functions and limits in how big the FIB hash tables would grow. And what's left is fixing GC.

There is nothing AT ALL fundamental about a routing cache that precludes it from behaving sanely in the presence of a random source address DoS load. Absolutely NOTHING.

> This can stifle just about any linux router with a measly 10 megabits/second of traffic unless

Not true, that happens because of BUGs. Not because routing caches cannot behave sanely in such situations.

> The router is tuned up to a large degree (NAPI, certain nics, route cache timings, etc.) and even then it can still be destroyed no matter what

And today, this is because of BUGs in how the GC works. You can design the GC process so that it does the right thing and recycles only the DoS entries (those being very non-localized).

You should interact with Robert Olsson who has been doing tests on the effect of gigabit rate full-on DoS runs where every packet creates a new routing cache entry.

Franks a lot,
David S. Miller
davem@redhat.com

From xerox@foonet.net Sun Jun 8 22:52:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:52:54 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595qm2x001135 for ; Sun, 8 Jun 2003 22:52:49 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h595qjeq027523; Mon, 9 Jun 2003 01:52:45 -0400 From: "CIT/Paul" To: "'David S.
Miller'" Cc: , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 01:51:45 -0400 Organization: CIT Message-ID: <001501c32e4b$35d67d60$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <20030608.224446.78724665.davem@redhat.com> Importance: Normal X-archive-position: 2965 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev I'd love to test this out.. If it could do full gigabit line rate with random ips that would be soooooooo nice :> We wouldn't have to have so many routers any more!! :) Paul xerox@foonet.net http://www.httpd.net
From davem@redhat.com Sun Jun 8 22:56:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:56:12 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595u82x001454 for ; Sun, 8 Jun 2003 22:56:08 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16763; Sun, 8 Jun 2003 22:53:10 -0700 Date: Sun, 08 Jun 2003 22:53:09 -0700 (PDT) Message-Id: <20030608.225309.39172149.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <3EE4045D.4040002@pobox.com> References: <20030607191522.GB3346@gtf.org> <20030607.235825.71096085.davem@redhat.com> <3EE4045D.4040002@pobox.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2966 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Sun, 08 Jun 2003 23:51:57 -0400 David S. Miller wrote: > Have you extracted out all the init_etherdev() killings Al and > myself did so you can backport them to 2.4.x too? That's the plan, yes. That's your plan, but did you do any of this yet? It'll keep going deeper and deeper into bitkeeper history the longer that you wait :-) From davem@redhat.com Sun Jun 8 23:01:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:01:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5961h2x001807 for ; Sun, 8 Jun 2003 23:01:44 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16798; Sun, 8 Jun 2003 22:58:37 -0700 Date: Sun, 08 Jun 2003 22:58:37 -0700 (PDT) Message-Id: <20030608.225837.115923841.davem@redhat.com> To: xerox@foonet.net Cc: hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <000701c32e47$ddd25290$4a00000a@badass> References: <20030608230300.X33412@shell.cyberus.ca> <000701c32e47$ddd25290$4a00000a@badass> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2967
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Mon, 9 Jun 2003 01:27:48 -0400

   Ahah Jamal!! Yes I have tried.. It does absolutely nothing for the constant randomness of packets. It increases the overall distribution of the hash in the cache but it does nothing for the addition of new packets.. Try forwarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache.. Every one. And at 30,000 pps It destroys the cache because every single packet coming in is NOT in the route cache because it's random ips.

So you make packets that do things like this GC the oldest (LRU) routing cache entry.

This isn't rocket science, and well behaved flows will still get all the benefits of the routing cache. The only person penalized will be the attacker since his routing cache entries will purge out quickly and as a response to HIS traffic.

   Nothing you can do

No, there are many things we can do. Prove to me that routing caches are unable to behave acceptably in random source address DoS situations.

   (I'm not Even sure why this is necessary anyway.. Who would want to add every single src/dst flow to a cache?

Because 99% of traffic is well behaved flows, trains of packets. Even the most loaded core routers see flow lifetimes of at least 8 or 9 packets.

Even if the flows lasted 3 packets, the input route lookup work saved (source address validation in particular, which requires access to a centralized global table and thus does not scale well on SMP) is entirely worth it.
From davem@redhat.com Sun Jun 8 23:06:33 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:06:37 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5966W2x002155
	for ; Sun, 8 Jun 2003 23:06:33 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16822;
	Sun, 8 Jun 2003 23:03:32 -0700
Date: Sun, 08 Jun 2003 23:03:32 -0700 (PDT)
Message-Id: <20030608.230332.48514434.davem@redhat.com>
To: xerox@foonet.net
Cc: sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <001501c32e4b$35d67d60$4a00000a@badass>
References: <20030608.224446.78724665.davem@redhat.com> <001501c32e4b$35d67d60$4a00000a@badass>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2968
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Mon, 9 Jun 2003 01:51:45 -0400

   I'd love to test this out.. If it could do full gigabit line rate with random ips that would be soooooooo nice :>

It isn't impossible with the current design, that I am quite sure of.

Here is a simple idea, make the routing cache miss case steal an entry sitting at the end of the hash chain this new one will map to. It only steals entries which have not been recently used.

The big problem area on SMP is fib_validate_source. I'm sure some clear thinking can wipe that off the profiles too.
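
The "steal an entry from the chain on a miss" idea in the message above can be sketched in plain userspace C. This is only an illustration, not the kernel code: the table size, the STALE_AGE threshold, the tick-based clock, and every name here (struct entry, cache_lookup_or_insert, and so on) are assumptions made up for the example. It also picks the least recently used stale entry anywhere in the chain, which is one possible reading of "an entry sitting at the end of the hash chain".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS  4    /* hypothetical: tiny table so chains collide quickly */
#define STALE_AGE 10   /* hypothetical: ticks of idleness before an entry may be stolen */

struct entry {
    unsigned key;            /* stands in for the (src, dst, ...) flow key */
    unsigned long lastuse;   /* tick of the most recent hit */
    struct entry *next;
};

static struct entry *table[NBUCKETS];

/*
 * Look up key. Returns 1 on a hit, 0 on a miss that was then cached,
 * -1 if a fresh entry was needed and allocation failed.
 * On a miss, *recycled tells whether an old entry was reused.
 */
static int cache_lookup_or_insert(unsigned key, unsigned long now, int *recycled)
{
    unsigned b = key % NBUCKETS;
    struct entry **pp = &table[b];
    struct entry **candp = NULL;      /* link pointing at the best victim */
    unsigned long cand_use = now;
    struct entry *e;

    assert(recycled != NULL);
    *recycled = 0;

    /* One walk of the chain both checks for a match and picks a victim. */
    for (; (e = *pp) != NULL; pp = &e->next) {
        if (e->key == key) {
            e->lastuse = now;
            return 1;
        }
        if (now - e->lastuse >= STALE_AGE && e->lastuse <= cand_use) {
            candp = pp;
            cand_use = e->lastuse;
        }
    }

    if (candp) {
        e = *candp;
        *candp = e->next;             /* unlink the victim and reuse its memory */
        *recycled = 1;
    } else {
        e = malloc(sizeof(*e));       /* nothing stale: allocate as before */
        if (e == NULL)
            return -1;
    }

    e->key = key;
    e->lastuse = now;
    e->next = table[b];               /* new entries go to the head of the chain */
    table[b] = e;
    return 0;
}
```

With this shape, a flood of never-repeated keys mostly recycles its own stale entries instead of growing the table, while a flow that keeps getting hits refreshes lastuse and is never picked as a victim.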
From xerox@foonet.net Sun Jun 8 23:29:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:29:39 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596TX2x002674 for ; Sun, 8 Jun 2003 23:29:33 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h596TTeq022703; Mon, 9 Jun 2003 02:29:29 -0400 From: "CIT/Paul" To: "'David S. Miller'" Cc: , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 02:28:30 -0400 Organization: CIT Message-ID: <001801c32e50$57ef0750$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <20030608.225837.115923841.davem@redhat.com> Importance: Normal X-archive-position: 2970 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev OK so let's try this.. If you can show me a linux router can can route 100mbps or more of juno-z.101f.c attack without dropping packets I will be thoroughly impressed :) I am willing to test out any code/patches and settings that you can think of and post some results.. Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: David S. Miller [mailto:davem@redhat.com] Sent: Monday, June 09, 2003 1:59 AM To: xerox@foonet.net Cc: hadi@shell.cyberus.ca; sim@netnation.com; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "CIT/Paul" Date: Mon, 9 Jun 2003 01:27:48 -0400 Ahah Jamal!! Yes I have tried.. It does absoutely nothing for the constant randomness of packets. 
It increases the overall distribution of the hash in the cache but it does nothing for the addition of new packets.. Try fowarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache.. Every one. And at 30,000 pps It destroys the cache because every single packet coming in is NOT in the route cache because it's random ips. So you make packets that do things like this GC the oldest (LRU) routing cache entry. This isn't rocket science, and well behaved flows will still get all the benefits of the routing cache. The only person penalized will be the attacker since his routing cache entries will purge out quickly and as a response to HIS traffic. Nothing you can do No, there are many things we can do. Prove to me that routing caches are unable to behave acceptibly in random source address DoS situations. (I'm not Even sure why this is necessary anyway.. Who would want to add every single src/dst flow to a cache? Because %99 of traffic is well behaved flows, trains of packets. Even the most loaded core routers see flow lifetimes of at least 8 or 9 packets. Even if the flows lasted 3 packets, the input route lookup work saved (source address validation in particular, which requires access to a centralized global table and thus does not scale well on SMP) is entriely worth it. 
From davem@redhat.com Sun Jun 8 23:28:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:28:50 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596Sd2x002595 for ; Sun, 8 Jun 2003 23:28:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16891; Sun, 8 Jun 2003 23:25:37 -0700 Date: Sun, 08 Jun 2003 23:25:37 -0700 (PDT) Message-Id: <20030608.232537.102562046.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030608230300.X33412@shell.cyberus.ca> References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2969 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Sun, 8 Jun 2003 23:15:46 -0400 (EDT) The more i think about it the more i think CEF is a lame escape from route caches. It is one perspective :-) What we need is multi-tries at the slow path and perhaps a binary tree on hash collisions buckets of the dst cache (instead of a linked list). I do not believe that slow path is slow. In fact after I fixed hash table growth in fib_hash.c Simon showed us clearly how DoS performance was _NOT_ tied to the number of routes loaded into the kernel. What is slow are things like fib_validate_source() on SMP and the GC (and some other things, I need to study Simon's profiles more deeply). 
The GC is apparently really badly behaved now during DoS like traffic.

My main current quick idea is to make rt_intern_hash() attempt to flush out entries in the same hash chain instead of allocating new entries.

I also question the setting of ip_rt_max_size in relation to the number of hash chains (it's set to n_hashchains * 16 currently, that sounds wrong, maybe something more like n_hashchains * 2 or even n_hashchains * 3).

I'll try to cook up a patch to test. We might even be able to kill off route cache GC entirely if this scheme works well.

From davem@redhat.com Sun Jun 8 23:31:30 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:31:35 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596VU2x003089
	for ; Sun, 8 Jun 2003 23:31:30 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16930;
	Sun, 8 Jun 2003 23:28:28 -0700
Date: Sun, 08 Jun 2003 23:28:27 -0700 (PDT)
Message-Id: <20030608.232827.88487519.davem@redhat.com>
To: xerox@foonet.net
Cc: hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <001801c32e50$57ef0750$4a00000a@badass>
References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2971
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Mon, 9 Jun 2003 02:28:30 -0400

   OK so let's try this..
If you can show me a linux router can can route 100mbps or more of juno-z.101f.c attack without dropping packets I will be thoroughly impressed :) I am willing to test out any code/patches and settings that you can think of and post some results.. Ok, Robert are you willing to help too? :-) From sim@netnation.com Sun Jun 8 23:47:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:47:25 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596lJ2x003568 for ; Sun, 8 Jun 2003 23:47:20 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGR5-00065R-4W; Sun, 08 Jun 2003 23:47:19 -0700 Date: Sun, 8 Jun 2003 23:47:19 -0700 From: Simon Kirby To: CIT/Paul Cc: "'Florian Weimer'" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609064719.GA20613@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2972 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 07:55:58PM -0400, CIT/Paul wrote: > A denial of service attack going on, OR you just have millions of hosts > going through the router (if you were an ISP). Anything with seeminly > Random source ips (something like juno-z.101f.c will generate worst case > scenario for forwarding packets) will cause the cache to constantly > Add new entries at pretty much the rate of the attack.. 
This can stifle > just about any linux router with a measly 10 megabits/second of traffic > unless > The router is tuned up to a large degree (NAPI, certain nics, route > cache timings, etc.) and even then it can still be destroyed no matter > what > The cpu is with less than 100,000 packets per second and in mosts cases > less than 30k.. That's why it's just no acceptable for companies using > it as a replacement for say a cisco 7200 VXR series (npe300,400 nsf-1, > etc.) which can do 300K+ packet per second of routing (and yes it can > even route juno-z.101f.c at 300kpps, I have tested it). Linux has no > problem doing 300kpps from a single source to a single destination > provided you have NAPI or ITR or something limiting the interrupts.. The > overhead is the route cache and the related systems that use it and also > netfilter is very slow :/ One of these days they will fix it..... If Whoa, wait a second. You got a 7200 VXR to do 300kpps? I would have liked to see that. We couldn't get our 7206 VXR routers to do anything more than about 12 Mbit/second of small packets, which I believe is about 40,000 packets per second. This is with CEF disabled, because it ended up duplicating packets and doing some other strange things with CEF enabled. Also, I remember trying with a bucketload of netfilter rules and finding that the performance difference was hardly noticeable. Linux can route small packets with random src/dst at much faster than 10 Mbit/sec. It depeends on the hardware as you say, but it shouldn't ever be that slow on reasonable hardware. I remember back even in 1998 with the 2.0 kernel (before the route cache existed) on a Celeron 300A with eepro100 cards (eepro100 driver, no interrupt coalescing, definitely no NAPI) was cable of routing at least 20 Mbit/second of SYN packets from random sources. In fact, I remember it happily choking some old 3Com switches we had at the time. 
I recently saw 90 Mbit/second of additional traffic (small packets with random sources) going through our routers (now single Athlon 1800MP (MP for APIC), tg3, NAPI, BGP routing tables), and they didn't seem to care. It's definitely not yet perfect, but it's not bad. The hashing fixes for large routing tables which Dave M. recently posted has made the situation much better -- it was very broken before. What did your routing table look like when you were doing tests? I have fiddled with the route cache garbage collection parameters a bit, but I haven't really been able to reduce the CPU usage by much at all. Really, though, shouldn't the route cache overhead be fairly small in comparison to everything else involved in forwarding? Simon- From sim@netnation.com Sun Jun 8 23:52:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:52:14 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596qB2x003901 for ; Sun, 8 Jun 2003 23:52:11 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGVn-00067g-2Y; Sun, 08 Jun 2003 23:52:11 -0700 Date: Sun, 8 Jun 2003 23:52:11 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609065211.GB20613@netnation.com> References: <20030608.224446.78724665.davem@redhat.com> <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.230332.48514434.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2973 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:03:32PM -0700, David S. 
Miller wrote: > I'd love to test this out.. If it could do full gigabit line rate with > random ips that would be soooooooo nice :> Agreed. :) > It isn't impossible with the current design, that I am > quire sure of. > > Here is a simple idea, make the routing cache miss case steal > an entry sitting at the end of the hash chain this new one will > map to. It only steals entries which have not been recently used. I just asked whether this was possible in a previous email, but you must have missed it. I am seeing a lot of memory management stuff in profiles, so I think recycling routing cache entries (if only when the table is full and the garbage collector would otherwise need to run) would be very helpful. Is it possible to get a good guess of what cache entry to recycle without walking for a while or without some kind of LRU? > The big problem area on SMP is fib_validate_source. I'm sure some > clear thinking can wipe that off the profiles too. Not running the important stuff with SMP yet, so I don't care about this at the moment. O:) Simon- From davem@redhat.com Sun Jun 8 23:52:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:52:51 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596ql2x003985 for ; Sun, 8 Jun 2003 23:52:48 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16999; Sun, 8 Jun 2003 23:49:46 -0700 Date: Sun, 08 Jun 2003 23:49:46 -0700 (PDT) Message-Id: <20030608.234946.35677224.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. 
Miller" In-Reply-To: <20030609064719.GA20613@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2974 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:47:19 -0700 Really, though, shouldn't the route cache overhead be fairly small in comparison to everything else involved in forwarding? If GC is just doing dumb things, it is possible. These costs can be hidden in non-rtcache places in the form of cache misses and displacement on rtcache objects which can show up as higher costs in other places. From davem@redhat.com Sun Jun 8 23:59:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:59:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596xO2x004566 for ; Sun, 8 Jun 2003 23:59:24 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA17051; Sun, 8 Jun 2003 23:56:23 -0700 Date: Sun, 08 Jun 2003 23:56:22 -0700 (PDT) Message-Id: <20030608.235622.38700262.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609065211.GB20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2975 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:52:11 -0700 On Sun, Jun 08, 2003 at 11:03:32PM -0700, David S. Miller wrote: > Here is a simple idea, make the routing cache miss case steal > an entry sitting at the end of the hash chain this new one will > map to. It only steals entries which have not been recently used. I just asked whether this was possible in a previous email, but you must have missed it. I am seeing a lot of memory management stuff in profiles, so I think recycling routing cache entries (if only when the table is full and the garbage collector would otherwise need to run) would be very helpful. Yes, indeed. Is it possible to get a good guess of what cache entry to recycle without walking for a while or without some kind of LRU? This is what my (and therefore your) suggested scheme is trying to do. We have to walk the entire destination hash chain _ANYWAYS_ to verify that a matching entry has not been put into the cache while we were procuring the new one. During this walk we can also choose a candidate rtcache entry to free. Something like the patch at the end of this email, doesn't compile it's just a work in progress. The trick is picking TIMEOUT1 and TIMEOUT2 :) Another point is that the default ip_rt_gc_min_interval is absolutely horrible for DoS like attacks. When DoS traffic can fill the rtcache multiple times per second, using a GC interval of 5 seconds is the worst possible choice. :) When I see things like this, I can only come to the conclusion that the tuning Alexey originally did when coding up the rtcache merely needs to be scaled up to modern day packet rates. 
--- net/ipv4/route.c.~1~	Sun Jun 8 23:28:00 2003
+++ net/ipv4/route.c	Sun Jun 8 23:45:47 2003
@@ -717,14 +717,15 @@
 
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
-	struct rtable	*rth, **rthp;
-	unsigned long	now = jiffies;
+	struct rtable	*rth, **rthp, *cand, **candp;
+	unsigned long	now = jiffies, cand_use = now;
 	int attempts = !in_softirq();
 
 restart:
 	rthp = &rt_hash_table[hash].chain;
 
 	spin_lock_bh(&rt_hash_table[hash].lock);
+	cand = NULL;
 	while ((rth = *rthp) != NULL) {
 		if (compare_keys(&rth->fl, &rt->fl)) {
 			/* Put it first */
@@ -753,7 +754,21 @@
 			return 0;
 		}
 
+		if (rt_may_expire(rth, TIMEOUT1, TIMEOUT2)) {
+			unsigned long this_use = rth->u.dst.lastuse;
+
+			if (time_before_eq(this_use, cand_use)) {
+				cand = rth;
+				candp = rthp;
+				cand_use = this_use;
+			}
+		}
 		rthp = &rth->u.rt_next;
+	}
+
+	if (cand) {
+		*candp = cand->u.rt_next;
+		rt_free(cand);
 	}
 
 	/* Try to bind route to arp only if it is output

From sim@netnation.com Sun Jun 8 23:59:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:00:01 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596xu2x004600
	for ; Sun, 8 Jun 2003 23:59:56 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20)
	id 19PGdH-0006FW-RS; Sun, 08 Jun 2003 23:59:55 -0700
Date: Sun, 8 Jun 2003 23:59:55 -0700
From: Simon Kirby
To: "David S.
Miller" Cc: hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609065955.GC20613@netnation.com> References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca> <20030608.232537.102562046.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.232537.102562046.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2976 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:25:37PM -0700, David S. Miller wrote: > I do not believe that slow path is slow. In fact after I fixed > hash table growth in fib_hash.c Simon showed us clearly how DoS > performance was _NOT_ tied to the number of routes loaded into > the kernel. Not anymore. :) Btw, that patch seems to be stable here. Will we be seeing it sneak into 2.4? > My main current quick idea is to make rt_intern_hash() attempt > to flush out entries in the same hash chain instead of allocating > new entries. > > I also question the setting of ip_rt_max_size in relation to the > number of hash chains (it's set to n_hashchains * 16 currently, > that sounds wrong, maybe something more like n_hashchains * 2 or > even n_hashchains * 3). The route cache on our routers here grows to several thousand entries most of the time because of the quantity of traffic we route, and then all gets happily blown away when the next BGP table change comes along, which seems to happen about 10-20 times per miunte (!). It would probably be beneficial for us to reduce the amount of work required when blowing it away and keep it as small as possible. > I'll try to cook up a patch to test. We might even be able to Woohoo! 
> kill of route cache GC entriely if this scheme works well. I asked Alexey about this before and he mentioned it was there because it made a big difference in processing latency to postpone cleanup to a GC run. It should be possible to do recycling only when the table is full (when the box is getting smashed). This way latencies would be lowest in the common case and it would recycle and not have spurts of GC latency in the DoS case. Simon- From davem@redhat.com Mon Jun 9 00:06:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:06:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597662x005309 for ; Mon, 9 Jun 2003 00:06:06 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA17107; Mon, 9 Jun 2003 00:03:01 -0700 Date: Mon, 09 Jun 2003 00:03:00 -0700 (PDT) Message-Id: <20030609.000300.35030075.davem@redhat.com> To: sim@netnation.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609065955.GC20613@netnation.com> References: <20030608230300.X33412@shell.cyberus.ca> <20030608.232537.102562046.davem@redhat.com> <20030609065955.GC20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2977 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:59:55 -0700 On Sun, Jun 08, 2003 at 11:25:37PM -0700, David S. Miller wrote: > I do not believe that slow path is slow. 
In fact after I fixed > hash table growth in fib_hash.c Simon showed us clearly how DoS > performance was _NOT_ tied to the number of routes loaded into > the kernel. Not anymore. :) Btw, that patch seems to be stable here. Will we be seeing it sneak into 2.4? Yes, 2.4.22-pre1 will get it or somewhere thereabouts. > I also question the setting of ip_rt_max_size in relation to the > number of hash chains (it's set to n_hashchains * 16 currently, > that sounds wrong, maybe something more like n_hashchains * 2 or > even n_hashchains * 3). The route cache on our routers here grows to several thousand entries most of the time because of the quantity of traffic we route, and then all gets happily blown away when the next BGP table change comes along, which seems to happen about 10-20 times per miunte (!). It would probably be beneficial for us to reduce the amount of work required when blowing it away and keep it as small as possible. This is simple, by using a generation count. When route lookup sees a matching entry with a stale generation count, we pass this entry as-is into ip_route_{input,output}_slow() and use it instead of allocating new entry. It is the same trick as used by the flow cache. I'll code this up as well. > kill of route cache GC entriely if this scheme works well. I asked Alexey about this before and he mentioned it was there because it made a big difference in processing latency to postpone cleanup to a GC run. The problem is that GC cannot currently keep up with DoS like traffic pattern. As a result, routing latency is not smooth at all, you get spikes because each GC run goes for up to an entire jiffie because it has so much work to do. Meanwhile, during this expensive GC processing, packet processing is frozen on UP system. net/core/flow.c:flow_cache_lookup() is instructive, it implements several of these ideas being discussed today. 
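
The generation-count trick described above (invalidate everything by bumping a counter, then refill stale entries in place when they are next looked up) might look roughly like the following userspace sketch. It is an assumption-laden illustration, not the kernel's flow cache or route cache: the direct-mapped table, the slow_resolve() stand-in for ip_route_{input,output}_slow(), the table_version variable, and all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8   /* hypothetical: direct-mapped cache, one entry per slot */

struct centry {
    int in_use;
    unsigned key;       /* stands in for the flow key */
    unsigned nexthop;   /* stands in for the cached routing decision */
    unsigned genid;     /* generation this entry was resolved under */
};

static struct centry cache[NSLOTS];
static unsigned cache_genid;      /* current generation of the whole cache */
static unsigned table_version;    /* hypothetical: changes when routes change */
static unsigned slow_lookups;     /* counts trips through the slow path */

/* Stand-in for the full route resolution (the "slow path"). */
static unsigned slow_resolve(unsigned key)
{
    slow_lookups++;
    return key + table_version;
}

/* A routing table change: O(1) invalidation, no walk over the cache. */
static void routes_changed(void)
{
    table_version++;
    cache_genid++;
}

static unsigned cache_route(unsigned key)
{
    struct centry *e = &cache[key % NSLOTS];

    /* Fast path: matching key resolved under the current generation. */
    if (e->in_use && e->key == key && e->genid == cache_genid)
        return e->nexthop;

    /*
     * Miss or stale generation: re-resolve into the same entry instead of
     * freeing it and allocating a new one.
     */
    e->in_use = 1;
    e->key = key;
    e->nexthop = slow_resolve(key);
    e->genid = cache_genid;
    assert(e->genid == cache_genid);
    return e->nexthop;
}
```

The point of the design is that a BGP update costs one increment in routes_changed() rather than a full flush, and each stale entry pays for its own refresh only if its flow is still active.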
From sim@netnation.com Mon Jun 9 00:13:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:13:51 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597DV2x006021 for ; Mon, 9 Jun 2003 00:13:31 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGqQ-0006RC-Je; Mon, 09 Jun 2003 00:13:30 -0700 Date: Mon, 9 Jun 2003 00:13:30 -0700 From: Simon Kirby To: CIT/Paul Cc: "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609071330.GD20613@netnation.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001801c32e50$57ef0750$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2978 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote: > OK so let's try this.. If you can show me a linux router can can route > 100mbps or more of juno-z.101f.c attack without dropping packets I will > be thoroughly impressed :) > > I am willing to test out any code/patches and settings that you can > think of and post some results.. I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. :) In the tests I've been doing so far, I've been dropping responses (in the INPUT chain), so I haven't been testing the forwarding through of packets (though it is testing the routing input). I'll see if I can set up a router, target, and DoS box. 
I haven't been able to get juno-z.101f.c to saturate 100 Mbit/sec outgoing, but I've only tried it on eepro100 boxes. Has anybody got it to send more? Mmm, need more tg3 cards... Simon- From sim@netnation.com Mon Jun 9 00:36:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:36:54 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597ai2x007239 for ; Mon, 9 Jun 2003 00:36:45 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHCu-0006gK-5x; Mon, 09 Jun 2003 00:36:44 -0700 Date: Mon, 9 Jun 2003 00:36:44 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609073644.GE20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.235622.38700262.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2979 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > We have to walk the entire destination hash chain _ANYWAYS_ to verify > that a matching entry has not been put into the cache while we were > procuring the new one. During this walk we can also choose a > candidate rtcache entry to free. Ah, neat. I should try reading this stuff. :) > Something like the patch at the end of this email, doesn't compile > it's just a work in progress. 
The trick is picking TIMEOUT1 and > TIMEOUT2 :) > > Another point is that the default ip_rt_gc_min_interval is > absolutely horrible for DoS like attacks. When DoS traffic > can fill the rtcache multiple times per second, using a GC > interval of 5 seconds is the worst possible choice. :) Yes, I've reduced the gc_min_interval to 1, and it has been that way for some time. BTW, you may be interested in this old email from Alexey: http://www.tux.org/hypermail/linux-kernel/1999week05/1113.html (This was back when the GC was limited so much that legitimate traffic was overflowing the table. DoS attacks must have been really effective then. :)) Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. ] From xerox@foonet.net Mon Jun 9 01:12:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:12:05 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598Bx2x008585 for ; Mon, 9 Jun 2003 01:12:00 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h598Bseq021814; Mon, 9 Jun 2003 04:11:55 -0400 From: "CIT/Paul" To: "'Simon Kirby'" Cc: "'David S. 
Miller'" , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 04:10:55 -0400 Organization: CIT Message-ID: <000401c32e5e$a707b6d0$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030609071330.GD20613@netnation.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 2980 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3 1.26 ghz routers.. I can't even send 50mbit of this though one of my routers Without it using 100% of both cpus because of the route cache.. It goes up to 500,000 entries if I let it and it adds 80,000 new entries per second and they are all cache misses.. I'd be glad to show you the setup sometime :) I showed it to jamal and we tested some stuff. Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: Simon Kirby [mailto:sim@netnation.com] Sent: Monday, June 09, 2003 3:14 AM To: CIT/Paul Cc: 'David S. Miller'; hadi@shell.cyberus.ca; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote: > OK so let's try this.. If you can show me a linux router can can route > 100mbps or more of juno-z.101f.c attack without dropping packets I > will be thoroughly impressed :) > > I am willing to test out any code/patches and settings that you can > think of and post some results.. I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. 
:) In the tests I've been doing so far, I've been dropping responses (in the INPUT chain), so I haven't been testing the forwarding through of packets (though it is testing the routing input). I'll see if I can set up a router, target, and DoS box. I haven't been able to get juno-z.101f.c to saturate 100 Mbit/sec outgoing, but I've only tried it on eepro100 boxes. Has anybody got it to send more? Mmm, need more tg3 cards... Simon- From sim@netnation.com Mon Jun 9 01:18:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:18:11 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598I32x009001 for ; Mon, 9 Jun 2003 01:18:04 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHqt-00075s-2l; Mon, 09 Jun 2003 01:18:03 -0700 Date: Mon, 9 Jun 2003 01:18:03 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609081803.GF20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.235622.38700262.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2981 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > + if (cand) { > + *candp = cand->u.rt_next; > + rt_free(cand); > } Hmm...It looks like this is still freeing the entry.. Is it possible to recycle the dst without reallocating it? 
This is the end of the time-sorted profile output of the test box saturated by incoming juno packets (firewalled in INPUT chain to avoid responses to spoofed src IPs), NAPI 100% of the time, tg3:

   158 tg3_poll                           0.5197
  1630 ip_rcv_finish                      2.8348
   142 ipv4_dst_destroy                   2.9583
   429 fib_rules_policy                   3.8304
  8959 ip_route_input_slow                3.8885
  2438 ip_rcv                             4.3536
  2504 alloc_skb                          5.2167
  1991 __kfree_skb                        5.4103
  2279 netif_receive_skb                  5.6975
   929 skb_release_data                   6.4514
   669 ip_local_deliver                   6.9688
  1175 __constant_c_and_count_memset      7.3438
  2367 tcp_match                          7.3969
   124 kmem_cache_alloc                   7.7500
  4535 fib_validate_source                8.0982
   598 __fib_res_prefsrc                  9.3438
  8896 rt_garbage_collect                 9.4237
  3582 inet_select_addr                   9.7337
  1747 kfree                              9.9261
   717 ipt_hook                          11.2031
   938 kmalloc                           11.7250
  1747 jhash_3words                      12.1319
  6879 nf_hook_slow                      12.6452
  2439 eth_type_trans                    12.7031
  1695 kfree_skbmem                      13.2422
  2358 nf_iterate                        13.3977
   872 rt_hash_code                      13.6250
  2933 fib_semantic_match                14.1010
 16553 ipt_do_table                      14.9937
 15339 tg3_rx                            16.2489
  2482 tg3_recycle_rx                    17.2361
  5967 __kmem_cache_alloc                18.6469
  1237 ipt_route_hook                    19.3281
  3120 do_gettimeofday                   21.6667
  8299 ip_packet_match                   24.6994
  8031 fib_lookup                        25.0969
  1877 fib_rule_put                      29.3281
  6088 dst_destroy                       34.5909
 26833 rt_intern_hash                    34.9388
 10666 kmem_cache_free                   66.6625
 20193 fn_hash_lookup                    70.1146
 10516 dst_alloc                         73.0278
 64803 ip_route_input                   150.0069

This is with a routing table of 300,000 entries (though only one prefix) and with your hash fix patch. ip_route_input is still highest, but dst_alloc is an obvious second. ip_route_input is actually always the highest (excluding the IRQ handling stuff), and doesn't seem to change at all based on routing table size.

http://blue.netnation.com/sim/ref/

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer.
] From davem@redhat.com Mon Jun 9 01:25:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:25:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598P52x009351 for ; Mon, 9 Jun 2003 01:25:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA17380; Mon, 9 Jun 2003 01:22:02 -0700 Date: Mon, 09 Jun 2003 01:22:02 -0700 (PDT) Message-Id: <20030609.012202.68055632.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609081803.GF20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2982 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 01:18:03 -0700 On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > + if (cand) { > + *candp = cand->u.rt_next; > + rt_free(cand); > } Hmm...It looks like this is still freeing the entry.. Is it possible to recycle the dst without reallocating it? Yes, can you test the patch I just sent you? We can modify that to recycle easily instead of freeing. Well... one problem is that in 2.5.x we have to kill off entries using RCU so such recycling may not be so easy there. This is with a routing table of 300,000 entries (though only one prefix) and with your hash fix patch. 
ip_route_input is still highest, but dst_alloc is an obvious second. ip_route_input is actually always the highest (excluding the IRQ handling stuff), and doesn't seem to change at all based on routing table size. We spend a decent amount of time mucking with fib rules, turning off multiple-tables support would kill that, although I suspect you're actually using that :) From sim@netnation.com Mon Jun 9 01:27:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:27:28 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598RJ2x009750 for ; Mon, 9 Jun 2003 01:27:19 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHzq-0007An-Pu; Mon, 09 Jun 2003 01:27:18 -0700 Date: Mon, 9 Jun 2003 01:27:18 -0700 From: Simon Kirby To: CIT/Paul Cc: "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609082718.GG20613@netnation.com> References: <20030609071330.GD20613@netnation.com> <000401c32e5e$a707b6d0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000401c32e5e$a707b6d0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2983 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 04:10:55AM -0400, CIT/Paul wrote: > I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3 > 1.26 ghz routers.. I can't even send 50mbit of this though one of my > routers > Without it using 100% of both cpus because of the route cache.. It goes > up to 500,000 entries if I let it and it adds 80,000 new entries per > second and they are all cache misses.. 
I'd be glad to show you the setup
> sometime :) I showed it to jamal and we tested some stuff.

Hmm.. We're running on single 1800MP Athlons here. Have you had a chance to profile it?

- add "profile=1" to the kernel command line
- reboot
- run juno-z.101f.c from remote box
- run "readprofile -r" on the router
- twiddle fingers for a while
- run "readprofile -n -m your_System.map > foo"
- stop juno :)
- run "sort -n +2 < foo > readprofile.time_sorted"

I'm interested to see if your profile results line up with what I'm seeing here on UP (though I have the kernel compiled SMP...Oops).

Wait a second... 500,000 entries in the route cache? WTF? What is your max_size set to? That will massively overfill the hash buckets and definitely take up way too much CPU. It shouldn't be able to get there at all unless you have raised max_size.

Here I have:

echo 4 > gc_elasticity    # Higher is weaker, 0 will nuke all [dfl: 8]
echo 1 > gc_interval      # Garbage collection interval (seconds) [dfl: 60]
echo 1 > gc_min_interval  # Garbage collection min interval (seconds) [dfl: 5]
echo 90 > gc_timeout      # Entry lifetime (seconds) [dfl: 300]

[sroot@r1:/proc/sys/net/ipv4/route]# grep . *
...
gc_elasticity:4
gc_interval:1
gc_min_interval:1
gc_thresh:4096
gc_timeout:90
max_delay:10
max_size:65536

Simon-

From sim@netnation.com Mon Jun 9 01:32:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:32:10 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598W02x011901 for ; Mon, 9 Jun 2003 01:32:00 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PI4N-0007DJ-Sw; Mon, 09 Jun 2003 01:31:59 -0700 Date: Mon, 9 Jun 2003 01:31:59 -0700 From: Simon Kirby To: "David S.
Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609083159.GH20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> <20030609.012202.68055632.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.012202.68055632.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2984 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 01:22:02AM -0700, David S. Miller wrote: > Hmm...It looks like this is still freeing the entry.. Is it possible to > recycle the dst without reallocating it? > > Yes, can you test the patch I just sent you? We can modify that > to recycle easily instead of freeing. Cool. I'll see if I can set something up to try that at work tomorrow. Insufficient hardware here at home. > We spend a decent amount of time mucking with fib rules, turning > off multiple-tables support would kill that, although I suspect > you're actually using that :) We use it occasionally for various things. I'll try profiling with it turned off to see how much of an impact it has. 
Simon-

From davem@redhat.com Mon Jun 9 01:59:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:00:02 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598xs2x014044 for ; Mon, 9 Jun 2003 01:59:54 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA17478; Mon, 9 Jun 2003 01:56:49 -0700 Date: Mon, 09 Jun 2003 01:56:48 -0700 (PDT) Message-Id: <20030609.015648.55736734.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609071330.GD20613@netnation.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030609071330.GD20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2985 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

From: Simon Kirby Date: Mon, 9 Jun 2003 00:13:30 -0700

On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote:
> I am willing to test out any code/patches and settings that you can
> think of and post some results..

I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. :)

BTW, ignoring juno, Robert Olsson has some pktgen hacks that allow it to generate new-dst-per-packet DoS-like traffic. It's much more effective than Juno-z. Robert, could you show these guys your hacks to do that?
Next, here is an interesting first pass patch to try. Once we hit gc_thresh, at every new DST allocation we try to shrink the destination hash chain. It ought to be very effective in the presence of poorly behaved traffic such as random-src-address DoS.

The patch is against 2.5.x current...

The next task is to try and handle rt_cache_flush more cheaply, given Simon's mention that he gets from 10 to 20 BGP updates per minute. Another idea to this dilemma is maybe to see if Zebra can batch things a little bit... but that kind of solution might not be possible since I don't know how that stuff works.

--- net/ipv4/route.c.~1~ Sun Jun 8 23:28:00 2003
+++ net/ipv4/route.c Mon Jun 9 01:09:45 2003
@@ -882,6 +882,42 @@ static void rt_del(unsigned hash, struct
         spin_unlock_bh(&rt_hash_table[hash].lock);
 }

+static void __rt_hash_shrink(unsigned int hash)
+{
+        struct rtable *rth, **rthp;
+        struct rtable *cand, **candp;
+        unsigned int min_use = ~(unsigned int) 0;
+
+        spin_lock_bh(&rt_hash_table[hash].lock);
+        cand = NULL;
+        candp = NULL;
+        rthp = &rt_hash_table[hash].chain;
+        while ((rth = *rthp) != NULL) {
+                if (!atomic_read(&rth->u.dst.__refcnt) &&
+                    ((unsigned int) rth->u.dst.__use) < min_use) {
+                        cand = rth;
+                        candp = rthp;
+                        min_use = rth->u.dst.__use;
+                }
+                rthp = &rth->u.rt_next;
+        }
+        if (cand) {
+                *candp = cand->u.rt_next;
+                rt_free(cand);
+        }
+
+        spin_unlock_bh(&rt_hash_table[hash].lock);
+}
+
+static inline struct rtable *ip_rt_dst_alloc(unsigned int hash)
+{
+        if (atomic_read(&ipv4_dst_ops.entries) >
+            ipv4_dst_ops.gc_thresh)
+                __rt_hash_shrink(hash);
+
+        return dst_alloc(&ipv4_dst_ops);
+}
+
 void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
                     u32 saddr, u8 tos, struct net_device *dev)
 {
@@ -912,9 +948,10 @@ void ip_rt_redirect(u32 old_gw, u32 dadd

         for (i = 0; i < 2; i++) {
                 for (k = 0; k < 2; k++) {
-                        unsigned hash = rt_hash_code(daddr,
-                                                     skeys[i] ^ (ikeys[k] << 5),
-                                                     tos);
+                        unsigned int hash = rt_hash_code(daddr,
+                                                         skeys[i] ^
+                                                         (ikeys[k] << 5),
+                                                         tos);

                         rthp=&rt_hash_table[hash].chain;

@@ -942,7 +979,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
                                 dst_hold(&rth->u.dst);
                                 rcu_read_unlock();

-                                rt = dst_alloc(&ipv4_dst_ops);
+                                rt = ip_rt_dst_alloc(hash);
                                 if (rt == NULL) {
                                         ip_rt_put(rth);
                                         in_dev_put(in_dev);
@@ -1352,7 +1389,7 @@ static void rt_set_nexthop(struct rtable
 static int ip_route_input_mc(struct sk_buff *skb, u32 daddr, u32 saddr,
                              u8 tos, struct net_device *dev, int our)
 {
-        unsigned hash;
+        unsigned int hash;
         struct rtable *rth;
         u32 spec_dst;
         struct in_device *in_dev = in_dev_get(dev);
@@ -1375,7 +1412,9 @@ static int ip_route_input_mc(struct sk_b
                                         dev, &spec_dst, &itag) < 0)
                 goto e_inval;

-        rth = dst_alloc(&ipv4_dst_ops);
+        hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos);
+
+        rth = ip_rt_dst_alloc(hash);
         if (!rth)
                 goto e_nobufs;

@@ -1421,7 +1460,6 @@ static int ip_route_input_mc(struct sk_b
         RT_CACHE_STAT_INC(in_slow_mc);

         in_dev_put(in_dev);
-        hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos);
         return rt_intern_hash(hash, rth, (struct rtable**) &skb->dst);

 e_nobufs:
@@ -1584,7 +1622,7 @@ int ip_route_input_slow(struct sk_buff *
                 goto e_inval;
         }

-        rth = dst_alloc(&ipv4_dst_ops);
+        rth = ip_rt_dst_alloc(hash);
         if (!rth)
                 goto e_nobufs;

@@ -1663,7 +1701,7 @@ brd_input:
         RT_CACHE_STAT_INC(in_brd);

 local_input:
-        rth = dst_alloc(&ipv4_dst_ops);
+        rth = ip_rt_dst_alloc(hash);
         if (!rth)
                 goto e_nobufs;

@@ -2048,7 +2086,10 @@ make_route:
                 }
         }

-        rth = dst_alloc(&ipv4_dst_ops);
+        hash = rt_hash_code(oldflp->fl4_dst,
+                            oldflp->fl4_src ^ (oldflp->oif << 5), tos);
+
+        rth = ip_rt_dst_alloc(hash);
         if (!rth)
                 goto e_nobufs;

@@ -2107,7 +2148,6 @@ make_route:

         rth->rt_flags = flags;

-        hash = rt_hash_code(oldflp->fl4_dst, oldflp->fl4_src ^ (oldflp->oif << 5), tos);
         err = rt_intern_hash(hash, rth, rp);
 done:
         if (free_res)

From davem@redhat.com Mon Jun 9 02:04:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:04:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net
[216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5994L2x014390 for ; Mon, 9 Jun 2003 02:04:21 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA17503; Mon, 9 Jun 2003 02:01:16 -0700 Date: Mon, 09 Jun 2003 02:01:16 -0700 (PDT) Message-Id: <20030609.020116.10308258.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609081803.GF20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2986 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 01:18:03 -0700 10516 dst_alloc 73.0278 Gross, we effectively initialize a new dst multiple times :( In fact, we modify the same cache lines at least 3 times. There's a lot more we can do in this area. But this patch below kills some of it. Again, patch is against 2.5.x-current. 
Actually, it is a relatively good sign, it means this is a relatively unexplored area of the networking :-)))

--- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003
+++ net/core/dst.c Mon Jun 9 01:53:41 2003
@@ -122,13 +122,31 @@ void * dst_alloc(struct dst_ops * ops)
         dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC);
         if (!dst)
                 return NULL;
-        memset(dst, 0, ops->entry_size);
+        dst->next = NULL;
         atomic_set(&dst->__refcnt, 0);
-        dst->ops = ops;
+        dst->__use = 0;
+        dst->child = NULL;
+        dst->dev = NULL;
+        dst->obsolete = 0;
+        dst->flags = 0;
         dst->lastuse = jiffies;
+        dst->expires = 0;
+        dst->header_len = 0;
+        dst->trailer_len = 0;
+        memset(dst->metrics, 0, sizeof(dst->metrics));
         dst->path = dst;
+        dst->rate_last = 0;
+        dst->rate_tokens = 0;
+        dst->error = 0;
+        dst->neighbour = NULL;
+        dst->hh = NULL;
+        dst->xfrm = NULL;
         dst->input = dst_discard;
         dst->output = dst_blackhole;
+        dst->ops = ops;
+        INIT_RCU_HEAD(&dst->rcu_head);
+        memset(dst->info, 0,
+               ops->entry_size - offsetof(struct dst_entry, info));
 #if RT_CACHE_DEBUG >= 2
         atomic_inc(&dst_total);
 #endif

From lpetande@morphine.tml.hut.fi Mon Jun 9 02:07:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:07:13 -0700 (PDT) Received: from tml-gw.tml.hut.fi (tml.hut.fi [130.233.44.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h599762x014739 for ; Mon, 9 Jun 2003 02:07:07 -0700 Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id MAA32560 for ; Mon, 9 Jun 2003 12:07:05 +0300 X-Authentication-Warning: tml-gw.tml.hut.fi: smap set sender to using -f Received: from mail.tml.hut.fi(130.233.45.70) by tml-gw.tml.hut.fi via smap (V2.0) id xma032548; Mon, 9 Jun 03 12:06:43 +0300 Received: from localhost (localhost [127.0.0.1]) by mail.tml.hut.fi (Postfix) with ESMTP id 7B84018C235; Mon, 9 Jun 2003 12:06:43 +0300 (EEST) Received: from mail.tml.hut.fi ([127.0.0.1]) by localhost (mail.tml.hut.fi [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 26944-01-3; Mon, 9 Jun 2003 12:06:42 +0300 (EEST)
Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by mail.tml.hut.fi (Postfix) with ESMTP id AC40F18C233; Mon, 9 Jun 2003 12:06:42 +0300 (EEST) Received: from tml.hut.fi (localhost [127.0.0.1]) by morphine.tml.hut.fi (8.12.2+Sun/8.12.2) with ESMTP id h5996gF5025239; Mon, 9 Jun 2003 12:06:42 +0300 (EEST) Received: from localhost (lpetande@localhost) by tml.hut.fi (8.12.2+Sun/8.12.2/Submit) with ESMTP id h5996Zfr025236; Mon, 9 Jun 2003 12:06:41 +0300 (EEST) Date: Mon, 9 Jun 2003 12:06:35 +0300 (EEST) From: Henrik Petander To: Masahide NAKAMURA Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2987 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@morphine.tml.hut.fi Precedence: bulk X-list: netdev On Fri, 6 Jun 2003, Masahide NAKAMURA wrote: > > We don't think we have to change the logic handling policy with > the reason because we can treat MIPv6 policy just like IPsec. > > When we want to apply both MIPv6 and IPsec to the same target, > we need one policy that has two or more of templates(e.g. one is > MIPv6's template and the other is IPsec's). Does this also mean that the IPSec and MIPv6 policies and SAs need to be configured at the same time or is it possible to add templates to an existing policy? 
> > Regarding above case, however, we have a problem like below: > > draft(9.3.1 in draft-ietf-mobileip-ipv6-22) says, > > When attempting to verify AH authentication data in a packet that > contains a Home Address option, the receiving node MUST calculate > the AH authentication data as if the following were true: The Home > Address option contains the care-of address, and the source IPv6 > address field of the IPv6 header contains the home address. Yes, and this also applies to routing header types 0 and 2. They also need to be processed by AH so that the addresses are as the receiver sees them after processing the headers: home address in destination address and care-of address in the routing header. This is just not said in the mipv6 spec as the routing header IPSec interactions are not specified by it. > > Because xfrm decides to call dst_output in the order of templates, > at first we had no idea which is the former template, MIPv6 or IPsec(Home > Address Option or AH). MIPv6 headers should be added first for AH to work. A different issue related to the different addresses is that the SPD lookup should be done with the original source address, i.e. home address, if home address option is used and with the final destination address, if routing header is used. SPD lookup works now for TCP (with RT header), but not for raw sockets, which the mipv6 daemon will use. We will provide a patch for fixing the SPD lookups with raw sockets, which add routing header and home address option from socket options. 
Henrik ---------------------------------- Henrik Petander Helsinki University of Technology, GO/Core Project Henrik.Petander@hut.fi Office: +358 (0)9 451 5846 GSM: +358 (0)40 741 5248 ---------------------------------- From ak@suse.de Mon Jun 9 02:47:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:47:49 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h599lf2x019157 for ; Mon, 9 Jun 2003 02:47:42 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 1D6BA1483F; Mon, 9 Jun 2003 11:47:36 +0200 (MEST) Date: Mon, 9 Jun 2003 11:47:34 +0200 From: Andi Kleen To: "David S. Miller" Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress Message-ID: <20030609094734.GD2728@wotan.suse.de> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.020116.10308258.davem@redhat.com> X-archive-position: 2988 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 02:01:16AM -0700, David S. Miller wrote: > From: Simon Kirby > Date: Mon, 9 Jun 2003 01:18:03 -0700 > > 10516 dst_alloc 73.0278 > > Gross, we effectively initialize a new dst multiple times :( > In fact, we modify the same cache lines at least 3 times. > > There's a lot more we can do in this area. But this patch below kills > some of it. Again, patch is against 2.5.x-current. 
> > Actually, it is a relatively good sign, it means this is a relatively > unexplored area of the networking :-))) > > --- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003 > +++ net/core/dst.c Mon Jun 9 01:53:41 2003 > @@ -122,13 +122,31 @@ void * dst_alloc(struct dst_ops * ops) > dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); > if (!dst) > return NULL; > - memset(dst, 0, ops->entry_size); > + dst->next = NULL; > atomic_set(&dst->__refcnt, 0); > - dst->ops = ops; > + dst->__use = 0; > + dst->child = NULL; > + dst->dev = NULL; > + dst->obsolete = 0; > + dst->flags = 0; > dst->lastuse = jiffies; > + dst->expires = 0; > + dst->header_len = 0; > + dst->trailer_len = 0; > + memset(dst->metrics, 0, sizeof(dst->metrics)); gcc will generate a lot better code for the memsets if you can tell it somehow they are long aligned and a multiple of 8 bytes. e.g. redeclare them as long instead of char. If it cannot figure out the alignment it often (or least on x86) calls to the external memset function. > dst->path = dst; > + dst->rate_last = 0; > + dst->rate_tokens = 0; > + dst->error = 0; > + dst->neighbour = NULL; > + dst->hh = NULL; > + dst->xfrm = NULL; > dst->input = dst_discard; > dst->output = dst_blackhole; > + dst->ops = ops; > + INIT_RCU_HEAD(&dst->rcu_head); > + memset(dst->info, 0, > + ops->entry_size - offsetof(struct dst_entry, info)); Same here. 
-Andi From davem@redhat.com Mon Jun 9 03:06:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:06:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59A6e2x020087 for ; Mon, 9 Jun 2003 03:06:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17659; Mon, 9 Jun 2003 03:03:35 -0700 Date: Mon, 09 Jun 2003 03:03:34 -0700 (PDT) Message-Id: <20030609.030334.02284330.davem@redhat.com> To: ak@suse.de Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609094734.GD2728@wotan.suse.de> References: <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> <20030609094734.GD2728@wotan.suse.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2989 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Mon, 9 Jun 2003 11:47:34 +0200 gcc will generate a lot better code for the memsets if you can tell it somehow they are long aligned and a multiple of 8 bytes. True, but the real bug is that we're initializing any of this crap here at all. Right now we write over the same cachelines 3 or so times. It should really just happen once. 
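The point about writing the same cache lines several times can be illustrated with a small userspace sketch. It is not the kernel's dst_alloc(): `struct entry`, `entry_init_twice()` and `entry_init_once()` are hypothetical names, and the field set is made up. The first helper zeroes the whole object and then overwrites some fields again; the second stores each header field exactly once and only memsets the private tail, in the spirit of the dst.c patch earlier in the thread. Whether this actually helps on a given CPU is exactly what is being debated here, so treat it as a shape, not a measured win.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dst-like object; not the kernel layout. */
struct entry {
	struct entry *next;
	struct entry *path;
	unsigned long lastuse;
	int refcnt;
	int error;
	unsigned char info[64];	/* protocol-private tail */
};

/* Zero everything, then write some of the same fields a second time. */
static void entry_init_twice(struct entry *e, unsigned long now)
{
	assert(e != NULL);
	memset(e, 0, sizeof(*e));
	e->lastuse = now;
	e->path = e;
}

/* Store every header field once; clear only the private tail in bulk. */
static void entry_init_once(struct entry *e, unsigned long now)
{
	assert(e != NULL);
	e->next = NULL;
	e->path = e;
	e->lastuse = now;
	e->refcnt = 0;
	e->error = 0;
	memset(e->info, 0, sizeof(e->info));
}
```

Both helpers leave every named field in the same state; the single-pass version leaves any struct padding untouched, so the two objects should be compared field by field rather than with memcmp() over the whole struct.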
From ak@suse.de Mon Jun 9 03:13:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:13:14 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AD82x020428 for ; Mon, 9 Jun 2003 03:13:09 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 49DA51480C; Mon, 9 Jun 2003 12:13:03 +0200 (MEST) Date: Mon, 9 Jun 2003 12:13:02 +0200 From: Andi Kleen To: "David S. Miller" Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress Message-ID: <20030609101302.GA9643@wotan.suse.de> References: <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> <20030609094734.GD2728@wotan.suse.de> <20030609.030334.02284330.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.030334.02284330.davem@redhat.com> X-archive-position: 2990 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 03:03:34AM -0700, David S. Miller wrote: > From: Andi Kleen > Date: Mon, 9 Jun 2003 11:47:34 +0200 > > gcc will generate a lot better code for the memsets if you can tell > it somehow they are long aligned and a multiple of 8 bytes. > > True, but the real bug is that we're initializing any of this > crap here at all. Right now we write over the same cachelines > 3 or so times. It should really just happen once. It's unlikely to be the reason for the profile hit on a modern x86. They are all really fast at reading/writing L1. More likely it is the cache miss for fetching the lines initially. Perhaps it is cache thrashing the dst_entry heads. 
Adding a strategic prefetch somewhere early may help a lot. -Andi From davem@redhat.com Mon Jun 9 03:16:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:16:50 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AGj2x020736 for ; Mon, 9 Jun 2003 03:16:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17734; Mon, 9 Jun 2003 03:13:41 -0700 Date: Mon, 09 Jun 2003 03:13:41 -0700 (PDT) Message-Id: <20030609.031341.77044985.davem@redhat.com> To: ak@suse.de Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609101302.GA9643@wotan.suse.de> References: <20030609094734.GD2728@wotan.suse.de> <20030609.030334.02284330.davem@redhat.com> <20030609101302.GA9643@wotan.suse.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2991 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Mon, 9 Jun 2003 12:13:02 +0200 On Mon, Jun 09, 2003 at 03:03:34AM -0700, David S. Miller wrote: > True, but the real bug is that we're initializing any of this > crap here at all. Right now we write over the same cachelines > 3 or so times. It should really just happen once. It's unlikely to be the reason for the profile hit on a modern x86. They are all really fast at reading/writing L1. It's store buffer compression that's being messed up. I've seen this on just about any processor. 
This is also why the net/core/skbuff.c initialization hacks are so effective as well. Trust me, this has every symptom of excess store buffer traffic :) From yoshfuji@wide.ad.jp Mon Jun 9 03:40:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:40:26 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AeH2x021150 for ; Mon, 9 Jun 2003 03:40:18 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h59AemBo007231; Mon, 9 Jun 2003 19:40:49 +0900 Date: Mon, 09 Jun 2003 19:40:46 +0900 (JST) Message-Id: <20030609.194046.29425359.yoshfuji@wide.ad.jp> To: davem@redhat.com Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030609.031341.77044985.davem@redhat.com> References: <20030609.030334.02284330.davem@redhat.com> <20030609101302.GA9643@wotan.suse.de> <20030609.031341.77044985.davem@redhat.com> X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2992 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@wide.ad.jp Precedence: bulk X-list: netdev In article <20030609.031341.77044985.davem@redhat.com> (at Mon, 09 Jun 2003 03:13:41 -0700 (PDT)), "David S. Miller" says: > It's unlikely to be the reason for the profile hit on a modern x86. 
> They are all really fast at reading/writing L1.
:
> This is also why the net/core/skbuff.c initialization hacks are so
> effective as well.
>
> Trust me, this has every symptom of excess store buffer traffic :)

Ok, how about this?

Index: linux25/include/net/dst.h
===================================================================
RCS file: /cvsroot/usagi/usagi/kernel/linux25/include/net/dst.h,v
retrieving revision 1.7
diff -u -r1.7 dst.h
--- linux25/include/net/dst.h 20 Apr 2003 14:55:48 -0000 1.7
+++ linux25/include/net/dst.h 9 Jun 2003 10:26:30 -0000
@@ -38,7 +38,7 @@
 struct dst_entry
 {
 	struct dst_entry *next;
-	atomic_t __refcnt; /* client references */
+
 	int __use;
 	struct dst_entry *child;
 	struct net_device *dev;
@@ -48,14 +48,12 @@
 #define DST_NOXFRM 2
 #define DST_NOPOLICY 4
 #define DST_NOHASH 8
-	unsigned long lastuse;
 	unsigned long expires;

 	unsigned short header_len; /* more space at head required */
 	unsigned short trailer_len; /* space to reserve at tail */

 	u32 metrics[RTAX_MAX];
-	struct dst_entry *path;

 	unsigned long rate_last; /* rate limiting for ICMP */
 	unsigned long rate_tokens;
@@ -66,16 +64,24 @@
 	struct hh_cache *hh;
 	struct xfrm_state *xfrm;

-	int (*input)(struct sk_buff*);
-	int (*output)(struct sk_buff*);
-
 #ifdef CONFIG_NET_CLS_ROUTE
 	__u32 tclassid;
 #endif

-	struct dst_ops *ops;
 	struct rcu_head rcu_head;
-
+
+	/* These elements should be at the end of dst_entry{};
+	 * see net/core/dst.c:dst_alloc() -- yoshfuji */
+	u32 __dst_memset_tail[0];
+
+	atomic_t __refcnt; /* client references */
+	unsigned long lastuse;
+
+	struct dst_entry *path;
+	int (*input)(struct sk_buff*);
+	int (*output)(struct sk_buff*);
+	struct dst_ops *ops;
+
 	char info[0];
 };

Index: linux25/net/core/dst.c
===================================================================
RCS file: /cvsroot/usagi/usagi/kernel/linux25/net/core/dst.c,v
retrieving revision 1.1.1.9
diff -u -r1.1.1.9 dst.c
--- linux25/net/core/dst.c 27 May 2003 02:59:54 -0000 1.1.1.9
+++ linux25/net/core/dst.c 9 Jun 2003 10:26:30 -0000
@@ -122,13 +122,16 @@
 	dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC);
 	if (!dst)
 		return NULL;
-	memset(dst, 0, ops->entry_size);
+	memset(dst, 0, offsetof(struct dst_entry, __dst_memset_tail));
 	atomic_set(&dst->__refcnt, 0);
-	dst->ops = ops;
 	dst->lastuse = jiffies;
 	dst->path = dst;
 	dst->input = dst_discard;
 	dst->output = dst_blackhole;
+	dst->ops = ops;
+	if (ops->entry_size > offsetof(struct dst_entry, info))
+		memset(&dst->info, 0, ops->entry_size - offsetof(struct dst_entry, info));
+
 #if RT_CACHE_DEBUG >= 2
 	atomic_inc(&dst_total);
 #endif

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From davem@redhat.com Mon Jun 9 03:43:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:43:53 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Ahn2x021455 for ; Mon, 9 Jun 2003 03:43:50 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17839; Mon, 9 Jun 2003 03:40:40 -0700 Date: Mon, 09 Jun 2003 03:40:39 -0700 (PDT) Message-Id: <20030609.034039.26980950.davem@redhat.com> To: yoshfuji@wide.ad.jp Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609.194046.29425359.yoshfuji@wide.ad.jp> References: <20030609101302.GA9643@wotan.suse.de> <20030609.031341.77044985.davem@redhat.com> <20030609.194046.29425359.yoshfuji@wide.ad.jp> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2993 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Mon, 09 Jun 2003 19:40:46 +0900 (JST)

   Ok, how about this?

The memset_tail thing is unnecessary; it is better to put the non-zero
members at the beginning, then you can just do:

	memset(&dst->${FIRST_ZERO_MEMBER}, 0,
	       ops->entry_size -
	       offsetof(struct dst_entry, ${FIRST_ZERO_MEMBER}));

But even _THIS_ is stupid. All this initialization really should move to
the caller.

We can provide a "dst_init()" helper for protocols that don't want to
bother optimizing this.

From hadi@shell.cyberus.ca Mon Jun 9 04:39:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:39:33 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59BdK2x022548 for ; Mon, 9 Jun 2003 04:39:21 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PKz7-0008yC-0u; Mon, 09 Jun 2003 07:38:45 -0400 Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT) From: Jamal Hadi To: CIT/Paul cc: "'Simon Kirby'" , "'David S.
Miller'" , fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: RE: Route cache performance under stress In-Reply-To: <000401c32e5e$a707b6d0$4a00000a@badass> Message-ID: <20030609072227.R34462@shell.cyberus.ca> References: <000401c32e5e$a707b6d0$4a00000a@badass> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2994 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev

On Mon, 9 Jun 2003, CIT/Paul wrote:

> I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3
> 1.26 ghz routers.. I can't even send 50mbit of this though one of my
> routers
> Without it using 100% of both cpus because of the route cache.. It goes
> up to 500,000 entries if I let it and it adds 80,000 new entries per
> second and they are all cache misses.. I'd be glad to show you the setup
> sometime :) I showed it to jamal and we tested some stuff.
>

Yes, you have a nice setup and that's why you should test all the patches
DaveM is posting. Dave, Paul is running in a real ISP environment; i think
he is very valuable in helping to test these patches and collect any stats
that might be needed. Now watch him disappear ;->

BTW, re: BGP, someone should fix zebra to do batching if it doesn't do it
already (i saw that in one of the emails). In addition, arp all the nexthops
right before installing the entries in the FIB. Repeat the arp every X
timeout. Nexthops failing ARPs should be removed. That should give you
something close to what i think CEF was designed for, i.e. when the packets
get to us, part of the route is resolved already.

Additional thought Dave: i think prefetching the rth would help in 2.5
at least when you have lotsa collisions.
call prefetch(nextrth) right after smp_read_barrier_depends() everywhere in route.c cheers, jamal PS:- this is one of those fun times i wish i had a setup and time ;-> From vnuorval@tcs.hut.fi Mon Jun 9 04:53:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:53:34 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59BrA2x022952 for ; Mon, 9 Jun 2003 04:53:11 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Mon, 9 Jun 2003 14:43:18 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h59BhHjH014978; Mon, 9 Jun 2003 14:43:17 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h59BhBQ7014972; Mon, 9 Jun 2003 14:43:11 +0300 Date: Mon, 9 Jun 2003 14:43:11 +0300 (EEST) From: Ville Nuorvala To: "David S. Miller" cc: kuznet@ms2.inr.ac.ru, , , , , , Subject: ipv6 tunnel patch (was: Re: [patch]: ipv6 tunnel for MIPv6) In-Reply-To: <20030607.033059.48393210.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-377318441-1375410448-1055158991=:13811" X-archive-position: 2995 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---377318441-1375410448-1055158991=:13811 Content-Type: TEXT/PLAIN; charset=US-ASCII On Sat, 7 Jun 2003, David S. 
Miller wrote: > Looks ok, but sorry two things need to be fixed up first: > > 1) Doesn't apply anymore, I think it's because of the > struct sock member renames, just replace sk->foo > with sk->sk_foo > Done... > 2) Just export all those routines from net/ipv6/ipv6_syms.c > always, remove the ifdefs. ...and done! > > I promise to apply it after you fix this stuff up :))) Ok here's the last revision of the patch :) It's done against ChangeSet 1.1308. Also available at: http://www.mipl.mediapoli.com/patches/ip6-tunnel-r3.patch Thanks! -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 ---377318441-1375410448-1055158991=:13811 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="ip6-tunnel-r3.patch" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="ip6-tunnel-r3.patch" ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvaW5jbHVkZS9saW51eC9p Zl9hcnAuaCBtZXJnZS0yLjUvaW5jbHVkZS9saW51eC9pZl9hcnAuaA0KLS0t IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lmX2FycC5oCVdlZCBKdW4gIDQg MTM6NDM6MDMgMjAwMw0KKysrIG1lcmdlLTIuNS9pbmNsdWRlL2xpbnV4L2lm X2FycC5oCU1vbiBKdW4gIDkgMTA6MTQ6MjQgMjAwMw0KQEAgLTYwLDcgKzYw LDcgQEANCiAjZGVmaW5lIEFSUEhSRF9SQVdIRExDCTUxOAkJLyogUmF3IEhE TEMJCQkqLw0KIA0KICNkZWZpbmUgQVJQSFJEX1RVTk5FTAk3NjgJCS8qIElQ SVAgdHVubmVsCQkJKi8NCi0jZGVmaW5lIEFSUEhSRF9UVU5ORUw2CTc2OQkJ LyogSVBJUDYgdHVubmVsCQkJKi8NCisjZGVmaW5lIEFSUEhSRF9UVU5ORUw2 CTc2OQkJLyogSVA2SVA2IHR1bm5lbCAgICAgICAJCSovDQogI2RlZmluZSBB UlBIUkRfRlJBRAk3NzAgICAgICAgICAgICAgLyogRnJhbWUgUmVsYXkgQWNj ZXNzIERldmljZSAgICAqLw0KICNkZWZpbmUgQVJQSFJEX1NLSVAJNzcxCQkv KiBTS0lQIHZpZgkJCSovDQogI2RlZmluZSBBUlBIUkRfTE9PUEJBQ0sJNzcy CQkvKiBMb29wYmFjayBkZXZpY2UJCSovDQpkaWZmIC1OdXIgLS1leGNsdWRl PVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0 IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lwNl90dW5uZWwuaCBtZXJnZS0y 
LjUvaW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgNCi0tLSBsaW51eC0yLjUv aW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDow MCAxOTcwDQorKysgbWVyZ2UtMi41L2luY2x1ZGUvbGludXgvaXA2X3R1bm5l bC5oCU1vbiBKdW4gIDkgMTA6MTQ6MjQgMjAwMw0KQEAgLTAsMCArMSwzMiBA QA0KKy8qDQorICogJElkJA0KKyAqLw0KKw0KKyNpZm5kZWYgX0lQNl9UVU5O RUxfSA0KKyNkZWZpbmUgX0lQNl9UVU5ORUxfSA0KKw0KKyNkZWZpbmUgSVBW Nl9UTFZfVE5MX0VOQ0FQX0xJTUlUIDQNCisjZGVmaW5lIElQVjZfREVGQVVM VF9UTkxfRU5DQVBfTElNSVQgNA0KKw0KKy8qIGRvbid0IGFkZCBlbmNhcHN1 bGF0aW9uIGxpbWl0IGlmIG9uZSBpc24ndCBwcmVzZW50IGluIGlubmVyIHBh Y2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX0lHTl9FTkNBUF9MSU1JVCAw eDENCisvKiBjb3B5IHRoZSB0cmFmZmljIGNsYXNzIGZpZWxkIGZyb20gdGhl IGlubmVyIHBhY2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX1VTRV9PUklH X1RDTEFTUyAweDINCisvKiBjb3B5IHRoZSBmbG93bGFiZWwgZnJvbSB0aGUg aW5uZXIgcGFja2V0ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfVVNFX09SSUdf RkxPV0xBQkVMIDB4NA0KKy8qIGJlaW5nIHVzZWQgZm9yIE1vYmlsZSBJUHY2 ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfTUlQNl9ERVYgMHg4DQorDQorc3Ry dWN0IGlwNl90bmxfcGFybSB7DQorCWNoYXIgbmFtZVtJRk5BTVNJWl07CS8q IG5hbWUgb2YgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgbGluazsJCS8qIGlm aW5kZXggb2YgdW5kZXJseWluZyBMMiBpbnRlcmZhY2UgKi8NCisJX191OCBw cm90bzsJCS8qIHR1bm5lbCBwcm90b2NvbCAqLw0KKwlfX3U4IGVuY2FwX2xp bWl0OwkvKiBlbmNhcHN1bGF0aW9uIGxpbWl0IGZvciB0dW5uZWwgKi8NCisJ X191OCBob3BfbGltaXQ7CQkvKiBob3AgbGltaXQgZm9yIHR1bm5lbCAqLw0K KwlfX3UzMiBmbG93aW5mbzsJCS8qIHRyYWZmaWMgY2xhc3MgYW5kIGZsb3ds YWJlbCBmb3IgdHVubmVsICovDQorCV9fdTMyIGZsYWdzOwkJLyogdHVubmVs IGZsYWdzICovDQorCXN0cnVjdCBpbjZfYWRkciBsYWRkcjsJLyogbG9jYWwg dHVubmVsIGVuZC1wb2ludCBhZGRyZXNzICovDQorCXN0cnVjdCBpbjZfYWRk ciByYWRkcjsJLyogcmVtb3RlIHR1bm5lbCBlbmQtcG9pbnQgYWRkcmVzcyAq Lw0KK307DQorDQorI2VuZGlmDQpkaWZmIC1OdXIgLS1leGNsdWRlPVNDQ1Mg LS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4 LTIuNS9pbmNsdWRlL25ldC9pcDZfdHVubmVsLmggbWVyZ2UtMi41L2luY2x1 ZGUvbmV0L2lwNl90dW5uZWwuaA0KLS0tIGxpbnV4LTIuNS9pbmNsdWRlL25l dC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDowMCAxOTcwDQorKysg 
bWVyZ2UtMi41L2luY2x1ZGUvbmV0L2lwNl90dW5uZWwuaAlNb24gSnVuICA5 IDEwOjE0OjI0IDIwMDMNCkBAIC0wLDAgKzEsNDQgQEANCisvKg0KKyAqICRJ ZCQNCisgKi8NCisNCisjaWZuZGVmIF9ORVRfSVA2X1RVTk5FTF9IDQorI2Rl ZmluZSBfTkVUX0lQNl9UVU5ORUxfSA0KKw0KKyNpbmNsdWRlIDxsaW51eC9p cHY2Lmg+DQorI2luY2x1ZGUgPGxpbnV4L25ldGRldmljZS5oPg0KKyNpbmNs dWRlIDxsaW51eC9pcDZfdHVubmVsLmg+DQorDQorLyogY2FwYWJsZSBvZiBz ZW5kaW5nIHBhY2tldHMgKi8NCisjZGVmaW5lIElQNl9UTkxfRl9DQVBfWE1J VCAweDEwMDAwDQorLyogY2FwYWJsZSBvZiByZWNlaXZpbmcgcGFja2V0cyAq Lw0KKyNkZWZpbmUgSVA2X1ROTF9GX0NBUF9SQ1YgMHgyMDAwMA0KKw0KKyNk ZWZpbmUgSVA2X1ROTF9NQVggMTI4DQorDQorLyogSVB2NiB0dW5uZWwgKi8N CisNCitzdHJ1Y3QgaXA2X3RubCB7DQorCXN0cnVjdCBpcDZfdG5sICpuZXh0 OwkvKiBuZXh0IHR1bm5lbCBpbiBsaXN0ICovDQorCXN0cnVjdCBuZXRfZGV2 aWNlICpkZXY7CS8qIHZpcnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0 dW5uZWwgKi8NCisJc3RydWN0IG5ldF9kZXZpY2Vfc3RhdHMgc3RhdDsJLyog c3RhdGlzdGljcyBmb3IgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgcmVjdXJz aW9uOwkJLyogZGVwdGggb2YgaGFyZF9zdGFydF94bWl0IHJlY3Vyc2lvbiAq Lw0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJtIHBhcm1zOwkvKiB0dW5uZWwgY29u ZmlndXJhdGlvbiBwYXJhbXRlcnMgKi8NCisJc3RydWN0IGZsb3dpIGZsOwkv KiBmbG93aSB0ZW1wbGF0ZSBmb3IgeG1pdCAqLw0KK307DQorDQorLyogVHVu bmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgZGVzdGluYXRpb24gc3ViLW9wdGlv biAqLw0KKw0KK3N0cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSB7DQorCV9f dTggdHlwZTsJCS8qIHR5cGUtY29kZSBmb3Igb3B0aW9uICAgICAgICAgKi8N CisJX191OCBsZW5ndGg7CQkvKiBvcHRpb24gbGVuZ3RoICAgICAgICAgICAg ICAgICovDQorCV9fdTggZW5jYXBfbGltaXQ7CS8qIHR1bm5lbCBlbmNhcHN1 bGF0aW9uIGxpbWl0ICAgKi8NCit9IF9fYXR0cmlidXRlX18gKChwYWNrZWQp KTsNCisNCisjaWZkZWYgX19LRVJORUxfXw0KKyNpZmRlZiBDT05GSUdfSVBW Nl9UVU5ORUwNCitleHRlcm4gaW50IF9faW5pdCBpcDZfdHVubmVsX2luaXQo dm9pZCk7DQorZXh0ZXJuIHZvaWQgaXA2X3R1bm5lbF9jbGVhbnVwKHZvaWQp Ow0KKyNlbmRpZg0KKyNlbmRpZg0KKyNlbmRpZg0KZGlmZiAtTnVyIC0tZXhj bHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5n ZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvS2NvbmZpZyBtZXJnZS0yLjUvbmV0 L2lwdjYvS2NvbmZpZw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9LY29uZmln 
CU1vbiBKdW4gIDkgMDk6MTE6MjQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQv aXB2Ni9LY29uZmlnCU1vbiBKdW4gIDkgMTA6MTQ6MjcgMjAwMw0KQEAgLTU1 LDQgKzU1LDEyIEBADQogDQogCSAgSWYgdW5zdXJlLCBzYXkgWS4NCiANCitj b25maWcgSVBWNl9UVU5ORUwNCisJdHJpc3RhdGUgIklQdjY6IElQdjYtaW4t SVB2NiB0dW5uZWwiDQorCWRlcGVuZHMgb24gSVBWNg0KKwktLS1oZWxwLS0t DQorCSAgU3VwcG9ydCBmb3IgSVB2Ni1pbi1JUHY2IHR1bm5lbHMgZGVzY3Jp YmVkIGluIFJGQyAyNDczLg0KKw0KKwkgIElmIHVuc3VyZSwgc2F5IE4uDQor DQogc291cmNlICJuZXQvaXB2Ni9uZXRmaWx0ZXIvS2NvbmZpZyINCmRpZmYg LU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhj bHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L01ha2VmaWxlIG1l cmdlLTIuNS9uZXQvaXB2Ni9NYWtlZmlsZQ0KLS0tIGxpbnV4LTIuNS9uZXQv aXB2Ni9NYWtlZmlsZQlXZWQgSnVuICA0IDEzOjQzOjA2IDIwMDMNCisrKyBt ZXJnZS0yLjUvbmV0L2lwdjYvTWFrZWZpbGUJTW9uIEp1biAgOSAxMDoxNDoy NyAyMDAzDQpAQCAtMTUsMyArMTUsNSBAQA0KIG9iai0kKENPTkZJR19JTkVU Nl9FU1ApICs9IGVzcDYubw0KIG9iai0kKENPTkZJR19JTkVUNl9JUENPTVAp ICs9IGlwY29tcDYubw0KIG9iai0kKENPTkZJR19ORVRGSUxURVIpCSs9IG5l dGZpbHRlci8NCisNCitvYmotJChDT05GSUdfSVBWNl9UVU5ORUwpICs9IGlw Nl90dW5uZWwubw0KZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVk ZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0 L2lwdjYvYWZfaW5ldDYuYyBtZXJnZS0yLjUvbmV0L2lwdjYvYWZfaW5ldDYu Yw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9hZl9pbmV0Ni5jCU1vbiBKdW4g IDkgMDk6MTE6MjQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9hZl9p bmV0Ni5jCU1vbiBKdW4gIDkgMTA6MTQ6MzYgMjAwMw0KQEAgLTU3LDYgKzU3 LDkgQEANCiAjaW5jbHVkZSA8bmV0L3RyYW5zcF92Ni5oPg0KICNpbmNsdWRl IDxuZXQvaXA2X3JvdXRlLmg+DQogI2luY2x1ZGUgPG5ldC9hZGRyY29uZi5o Pg0KKyNpZiBDT05GSUdfSVBWNl9UVU5ORUwNCisjaW5jbHVkZSA8bmV0L2lw Nl90dW5uZWwuaD4NCisjZW5kaWYNCiANCiAjaW5jbHVkZSA8YXNtL3VhY2Nl c3MuaD4NCiAjaW5jbHVkZSA8YXNtL3N5c3RlbS5oPg0KQEAgLTc3Niw2ICs3 NzksMTEgQEANCiAJZXJyID0gbmRpc2NfaW5pdCgmaW5ldDZfZmFtaWx5X29w cyk7DQogCWlmIChlcnIpDQogCQlnb3RvIG5kaXNjX2ZhaWw7DQorI2lmZGVm IENPTkZJR19JUFY2X1RVTk5FTA0KKwllcnIgPSBpcDZfdHVubmVsX2luaXQo KTsNCisJaWYgKGVycikNCisJCWdvdG8gaXA2X3R1bm5lbF9mYWlsOw0KKyNl 
bmRpZg0KIAllcnIgPSBpZ21wNl9pbml0KCZpbmV0Nl9mYW1pbHlfb3BzKTsN CiAJaWYgKGVycikNCiAJCWdvdG8gaWdtcF9mYWlsOw0KQEAgLTgzMCw2ICs4 MzgsMTAgQEANCiAJaWdtcDZfY2xlYW51cCgpOw0KICNlbmRpZg0KIGlnbXBf ZmFpbDoNCisjaWZkZWYgQ09ORklHX0lQVjZfVFVOTkVMDQorCWlwNl90dW5u ZWxfY2xlYW51cCgpOw0KK2lwNl90dW5uZWxfZmFpbDoNCisjZW5kaWYNCiAJ bmRpc2NfY2xlYW51cCgpOw0KIG5kaXNjX2ZhaWw6DQogCWljbXB2Nl9jbGVh bnVwKCk7DQpAQCAtODY1LDYgKzg3Nyw5IEBADQogCWlwNl9yb3V0ZV9jbGVh bnVwKCk7DQogCWlwdjZfcGFja2V0X2NsZWFudXAoKTsNCiAJaWdtcDZfY2xl YW51cCgpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUwNCisJaXA2X3R1 bm5lbF9jbGVhbnVwKCk7DQorI2VuZGlmDQogCW5kaXNjX2NsZWFudXAoKTsN CiAJaWNtcHY2X2NsZWFudXAoKTsNCiAjaWZkZWYgQ09ORklHX1NZU0NUTA0K ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1 bm5lbC5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVsLmMNCi0tLSBs aW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1bm5lbC5jCVRodSBKYW4gIDEgMDI6 MDA6MDAgMTk3MA0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVs LmMJTW9uIEp1biAgOSAxMDozOTo1MCAyMDAzDQpAQCAtMCwwICsxLDEyNjEg QEANCisvKg0KKyAqCUlQdjYgb3ZlciBJUHY2IHR1bm5lbCBkZXZpY2UNCisg KglMaW51eCBJTkVUNiBpbXBsZW1lbnRhdGlvbg0KKyAqDQorICoJQXV0aG9y czoNCisgKglWaWxsZSBOdW9ydmFsYQkJPHZudW9ydmFsQHRjcy5odXQuZmk+ CQ0KKyAqDQorICoJJElkJA0KKyAqDQorICogICAgICBCYXNlZCBvbjoNCisg KiAgICAgIGxpbnV4L25ldC9pcHY2L3NpdC5jDQorICoNCisgKiAgICAgIFJG QyAyNDczDQorICoNCisgKglUaGlzIHByb2dyYW0gaXMgZnJlZSBzb2Z0d2Fy ZTsgeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yDQorICogICAgICBt b2RpZnkgaXQgdW5kZXIgdGhlIHRlcm1zIG9mIHRoZSBHTlUgR2VuZXJhbCBQ dWJsaWMgTGljZW5zZQ0KKyAqICAgICAgYXMgcHVibGlzaGVkIGJ5IHRoZSBG cmVlIFNvZnR3YXJlIEZvdW5kYXRpb247IGVpdGhlciB2ZXJzaW9uDQorICog ICAgICAyIG9mIHRoZSBMaWNlbnNlLCBvciAoYXQgeW91ciBvcHRpb24pIGFu eSBsYXRlciB2ZXJzaW9uLg0KKyAqDQorICovDQorDQorI2luY2x1ZGUgPGxp bnV4L2NvbmZpZy5oPg0KKyNpbmNsdWRlIDxsaW51eC9tb2R1bGUuaD4NCisj aW5jbHVkZSA8bGludXgvZXJybm8uaD4NCisjaW5jbHVkZSA8bGludXgvdHlw ZXMuaD4NCisjaW5jbHVkZSA8bGludXgvc29ja2V0Lmg+DQorI2luY2x1ZGUg 
PGxpbnV4L3NvY2tpb3MuaD4NCisjaW5jbHVkZSA8bGludXgvaWYuaD4NCisj aW5jbHVkZSA8bGludXgvaW4uaD4NCisjaW5jbHVkZSA8bGludXgvaXAuaD4N CisjaW5jbHVkZSA8bGludXgvaWZfdHVubmVsLmg+DQorI2luY2x1ZGUgPGxp bnV4L25ldC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pbjYuaD4NCisjaW5jbHVk ZSA8bGludXgvbmV0ZGV2aWNlLmg+DQorI2luY2x1ZGUgPGxpbnV4L2lmX2Fy cC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pY21wdjYuaD4NCisjaW5jbHVkZSA8 bGludXgvaW5pdC5oPg0KKyNpbmNsdWRlIDxsaW51eC9yb3V0ZS5oPg0KKyNp bmNsdWRlIDxsaW51eC9ydG5ldGxpbmsuaD4NCisNCisjaW5jbHVkZSA8YXNt L3VhY2Nlc3MuaD4NCisjaW5jbHVkZSA8YXNtL2F0b21pYy5oPg0KKw0KKyNp bmNsdWRlIDxuZXQvaXAuaD4NCisjaW5jbHVkZSA8bmV0L3NvY2suaD4NCisj aW5jbHVkZSA8bmV0L2lwdjYuaD4NCisjaW5jbHVkZSA8bmV0L3Byb3RvY29s Lmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfcm91dGUuaD4NCisjaW5jbHVkZSA8 bmV0L2FkZHJjb25mLmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfdHVubmVsLmg+ DQorDQorTU9EVUxFX0FVVEhPUigiVmlsbGUgTnVvcnZhbGEiKTsNCitNT0RV TEVfREVTQ1JJUFRJT04oIklQdjYtaW4tSVB2NiB0dW5uZWwiKTsNCitNT0RV TEVfTElDRU5TRSgiR1BMIik7DQorDQorI2RlZmluZSBJUFY2X1RMVl9URUxf RFNUX1NJWkUgOA0KKw0KKyNpZmRlZiBJUDZfVE5MX0RFQlVHDQorI2RlZmlu ZSBJUDZfVE5MX1RSQUNFKHguLi4pIHByaW50ayhLRVJOX0RFQlVHICIlczoi IHggIlxuIiwgX19GVU5DVElPTl9fKQ0KKyNlbHNlDQorI2RlZmluZSBJUDZf VE5MX1RSQUNFKHguLi4pIGRvIHs7fSB3aGlsZSgwKQ0KKyNlbmRpZg0KKw0K KyNkZWZpbmUgSVBWNl9UQ0xBU1NfTUFTSyAoSVBWNl9GTE9XSU5GT19NQVNL ICYgfklQVjZfRkxPV0xBQkVMX01BU0spDQorDQorLyogc29ja2V0KHMpIHVz ZWQgYnkgaXA2aXA2X3RubF94bWl0KCkgZm9yIHJlc2VuZGluZyBwYWNrZXRz ICovDQorc3RhdGljIHN0cnVjdCBzb2NrZXQgKl9faXA2X3NvY2tldFtOUl9D UFVTXTsNCisjZGVmaW5lIGlwNl9zb2NrZXQgX19pcDZfc29ja2V0W3NtcF9w cm9jZXNzb3JfaWQoKV0NCisNCitzdGF0aWMgdm9pZCBpcDZfeG1pdF9sb2Nr KHZvaWQpDQorew0KKwlsb2NhbF9iaF9kaXNhYmxlKCk7DQorCWlmICh1bmxp a2VseSghc3Bpbl90cnlsb2NrKCZpcDZfc29ja2V0LT5zay0+c2tfbG9jay5z bG9jaykpKQ0KKwkJQlVHKCk7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlwNl94 bWl0X3VubG9jayh2b2lkKQ0KK3sNCisJc3Bpbl91bmxvY2tfYmgoJmlwNl9z b2NrZXQtPnNrLT5za19sb2NrLnNsb2NrKTsNCit9DQorDQorI2RlZmluZSBI QVNIX1NJWkUgIDMyDQorDQorI2RlZmluZSBIQVNIKGFkZHIpICgoKGFkZHIp 
LT5zNl9hZGRyMzJbMF0gXiAoYWRkciktPnM2X2FkZHIzMlsxXSBeIFwNCisJ ICAgICAgICAgICAgIChhZGRyKS0+czZfYWRkcjMyWzJdIF4gKGFkZHIpLT5z Nl9hZGRyMzJbM10pICYgXA0KKyAgICAgICAgICAgICAgICAgICAgKEhBU0hf U0laRSAtIDEpKQ0KKw0KK3N0YXRpYyBpbnQgaXA2aXA2X2ZiX3RubF9kZXZf aW5pdChzdHJ1Y3QgbmV0X2RldmljZSAqZGV2KTsNCitzdGF0aWMgaW50IGlw NmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldik7DQor DQorLyogdGhlIElQdjYgdHVubmVsIGZhbGxiYWNrIGRldmljZSAqLw0KK3N0 YXRpYyBzdHJ1Y3QgbmV0X2RldmljZSBpcDZpcDZfZmJfdG5sX2RldiA9IHsN CisJLm5hbWUgPSAiaXA2dG5sMCIsDQorCS5pbml0ID0gaXA2aXA2X2ZiX3Ru bF9kZXZfaW5pdA0KK307DQorDQorLyogdGhlIElQdjYgZmFsbGJhY2sgdHVu bmVsICovDQorc3RhdGljIHN0cnVjdCBpcDZfdG5sIGlwNmlwNl9mYl90bmwg PSB7DQorCS5kZXYgPSAmaXA2aXA2X2ZiX3RubF9kZXYsDQorCS5wYXJtcyA9 ey5uYW1lID0gImlwNnRubDAiLCAucHJvdG8gPSBJUFBST1RPX0lQVjZ9DQor fTsNCisNCisvKiBsaXN0cyBmb3Igc3RvcmluZyB0dW5uZWxzIGluIHVzZSAq Lw0KK3N0YXRpYyBzdHJ1Y3QgaXA2X3RubCAqdG5sc19yX2xbSEFTSF9TSVpF XTsNCitzdGF0aWMgc3RydWN0IGlwNl90bmwgKnRubHNfd2NbMV07DQorc3Rh dGljIHN0cnVjdCBpcDZfdG5sICoqdG5sc1syXSA9IHsgdG5sc193YywgdG5s c19yX2wgfTsNCisNCisvKiBsb2NrIGZvciB0aGUgdHVubmVsIGxpc3RzICov DQorc3RhdGljIHJ3bG9ja190IGlwNmlwNl9sb2NrID0gUldfTE9DS19VTkxP Q0tFRDsNCisNCisvKioNCisgKiBpcDZpcDZfdG5sX2xvb2t1cCAtIGZldGNo IHR1bm5lbCBtYXRjaGluZyB0aGUgZW5kLXBvaW50IGFkZHJlc3Nlcw0KKyAq ICAgQHJlbW90ZTogdGhlIGFkZHJlc3Mgb2YgdGhlIHR1bm5lbCBleGl0LXBv aW50IA0KKyAqICAgQGxvY2FsOiB0aGUgYWRkcmVzcyBvZiB0aGUgdHVubmVs IGVudHJ5LXBvaW50IA0KKyAqDQorICogUmV0dXJuOiAgDQorICogICB0dW5u ZWwgbWF0Y2hpbmcgZ2l2ZW4gZW5kLXBvaW50cyBpZiBmb3VuZCwNCisgKiAg IGVsc2UgZmFsbGJhY2sgdHVubmVsIGlmIGl0cyBkZXZpY2UgaXMgdXAsIA0K KyAqICAgZWxzZSAlTlVMTA0KKyAqKi8NCisNCitzdHJ1Y3QgaXA2X3RubCAq DQoraXA2aXA2X3RubF9sb29rdXAoc3RydWN0IGluNl9hZGRyICpyZW1vdGUs IHN0cnVjdCBpbjZfYWRkciAqbG9jYWwpDQorew0KKwl1bnNpZ25lZCBoMCA9 IEhBU0gocmVtb3RlKTsNCisJdW5zaWduZWQgaDEgPSBIQVNIKGxvY2FsKTsN CisJc3RydWN0IGlwNl90bmwgKnQ7DQorDQorCWZvciAodCA9IHRubHNfcl9s W2gwIF4gaDFdOyB0OyB0ID0gdC0+bmV4dCkgew0KKwkJaWYgKCFpcHY2X2Fk 
ZHJfY21wKGxvY2FsLCAmdC0+cGFybXMubGFkZHIpICYmDQorCQkgICAgIWlw djZfYWRkcl9jbXAocmVtb3RlLCAmdC0+cGFybXMucmFkZHIpICYmDQorCQkg ICAgKHQtPmRldi0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJCXJldHVybiB0Ow0K Kwl9DQorCWlmICgodCA9IHRubHNfd2NbMF0pICE9IE5VTEwgJiYgKHQtPmRl di0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJcmV0dXJuIHQ7DQorDQorCXJldHVy biBOVUxMOw0KK30NCisNCisvKioNCisgKiBpcDZpcDZfYnVja2V0IC0gZ2V0 IGhlYWQgb2YgbGlzdCBtYXRjaGluZyBnaXZlbiB0dW5uZWwgcGFyYW1ldGVy cw0KKyAqICAgQHA6IHBhcmFtZXRlcnMgY29udGFpbmluZyB0dW5uZWwgZW5k LXBvaW50cyANCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2 X2J1Y2tldCgpIHJldHVybnMgdGhlIGhlYWQgb2YgdGhlIGxpc3QgbWF0Y2hp bmcgdGhlIA0KKyAqICAgJnN0cnVjdCBpbjZfYWRkciBlbnRyaWVzIGxhZGRy IGFuZCByYWRkciBpbiBAcC4NCisgKg0KKyAqIFJldHVybjogaGVhZCBvZiBJ UHY2IHR1bm5lbCBsaXN0IA0KKyAqKi8NCisNCitzdGF0aWMgc3RydWN0IGlw Nl90bmwgKioNCitpcDZpcDZfYnVja2V0KHN0cnVjdCBpcDZfdG5sX3Bhcm0g KnApDQorew0KKwlzdHJ1Y3QgaW42X2FkZHIgKnJlbW90ZSA9ICZwLT5yYWRk cjsNCisJc3RydWN0IGluNl9hZGRyICpsb2NhbCA9ICZwLT5sYWRkcjsNCisJ dW5zaWduZWQgaCA9IDA7DQorCWludCBwcmlvID0gMDsNCisNCisJaWYgKCFp cHY2X2FkZHJfYW55KHJlbW90ZSkgfHwgIWlwdjZfYWRkcl9hbnkobG9jYWwp KSB7DQorCQlwcmlvID0gMTsNCisJCWggPSBIQVNIKHJlbW90ZSkgXiBIQVNI KGxvY2FsKTsNCisJfQ0KKwlyZXR1cm4gJnRubHNbcHJpb11baF07DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfbGluayAtIGFkZCB0dW5uZWwgdG8g aGFzaCB0YWJsZQ0KKyAqICAgQHQ6IHR1bm5lbCB0byBiZSBhZGRlZA0KKyAq Ki8NCisNCitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfbGluayhzdHJ1Y3Qg aXA2X3RubCAqdCkNCit7DQorCXN0cnVjdCBpcDZfdG5sICoqdHAgPSBpcDZp cDZfYnVja2V0KCZ0LT5wYXJtcyk7DQorDQorCXdyaXRlX2xvY2tfYmgoJmlw NmlwNl9sb2NrKTsNCisJdC0+bmV4dCA9ICp0cDsNCisJd3JpdGVfdW5sb2Nr X2JoKCZpcDZpcDZfbG9jayk7DQorCSp0cCA9IHQ7DQorfQ0KKw0KKy8qKg0K KyAqIGlwNmlwNl90bmxfdW5saW5rIC0gcmVtb3ZlIHR1bm5lbCBmcm9tIGhh c2ggdGFibGUNCisgKiAgIEB0OiB0dW5uZWwgdG8gYmUgcmVtb3ZlZA0KKyAq Ki8NCisNCitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfdW5saW5rKHN0cnVj dCBpcDZfdG5sICp0KQ0KK3sNCisJc3RydWN0IGlwNl90bmwgKip0cDsNCisN CisJZm9yICh0cCA9IGlwNmlwNl9idWNrZXQoJnQtPnBhcm1zKTsgKnRwOyB0 
cCA9ICYoKnRwKS0+bmV4dCkgew0KKwkJaWYgKHQgPT0gKnRwKSB7DQorCQkJ d3JpdGVfbG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwkJCSp0cCA9IHQtPm5l eHQ7DQorCQkJd3JpdGVfdW5sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQkJ YnJlYWs7DQorCQl9DQorCX0NCit9DQorDQorLyoqDQorICogaXA2X3RubF9j cmVhdGUoKSAtIGNyZWF0ZSBhIG5ldyB0dW5uZWwNCisgKiAgIEBwOiB0dW5u ZWwgcGFyYW1ldGVycw0KKyAqICAgQHB0OiBwb2ludGVyIHRvIG5ldyB0dW5u ZWwNCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgQ3JlYXRlIHR1bm5l bCBtYXRjaGluZyBnaXZlbiBwYXJhbWV0ZXJzLg0KKyAqIA0KKyAqIFJldHVy bjogDQorICogICAwIG9uIHN1Y2Nlc3MNCisgKiovDQorDQorc3RhdGljIGlu dA0KK2lwNl90bmxfY3JlYXRlKHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAsIHN0 cnVjdCBpcDZfdG5sICoqcHQpDQorew0KKwlzdHJ1Y3QgbmV0X2RldmljZSAq ZGV2Ow0KKwlpbnQgZXJyID0gLUVOT0JVRlM7DQorCXN0cnVjdCBpcDZfdG5s ICp0Ow0KKw0KKwlkZXYgPSBrbWFsbG9jKHNpemVvZiAoKmRldikgKyBzaXpl b2YgKCp0KSwgR0ZQX0tFUk5FTCk7DQorCWlmICghZGV2KQ0KKwkJcmV0dXJu IGVycjsNCisNCisJbWVtc2V0KGRldiwgMCwgc2l6ZW9mICgqZGV2KSArIHNp emVvZiAoKnQpKTsNCisJZGV2LT5wcml2ID0gKHZvaWQgKikgKGRldiArIDEp Ow0KKwl0ID0gKHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJdC0+ ZGV2ID0gZGV2Ow0KKwlkZXYtPmluaXQgPSBpcDZpcDZfdG5sX2Rldl9pbml0 Ow0KKwltZW1jcHkoJnQtPnBhcm1zLCBwLCBzaXplb2YgKCpwKSk7DQorCXQt PnBhcm1zLm5hbWVbSUZOQU1TSVogLSAxXSA9ICdcMCc7DQorCWlmICh0LT5w YXJtcy5ob3BfbGltaXQgPiAyNTUpDQorCQl0LT5wYXJtcy5ob3BfbGltaXQg PSAtMTsNCisJc3RyY3B5KGRldi0+bmFtZSwgdC0+cGFybXMubmFtZSk7DQor CWlmICghZGV2LT5uYW1lWzBdKSB7DQorCQlpbnQgaSA9IDA7DQorCQlpbnQg ZXhpc3RzID0gMDsNCisNCisJCWRvIHsNCisJCQlzcHJpbnRmKGRldi0+bmFt ZSwgImlwNnRubCVkIiwgKytpKTsNCisJCQlleGlzdHMgPSAoX19kZXZfZ2V0 X2J5X25hbWUoZGV2LT5uYW1lKSAhPSBOVUxMKTsNCisJCX0gd2hpbGUgKGkg PCBJUDZfVE5MX01BWCAmJiBleGlzdHMpOw0KKw0KKwkJaWYgKGkgPT0gSVA2 X1ROTF9NQVgpIHsNCisJCQlnb3RvIGZhaWxlZDsNCisJCX0NCisJCW1lbWNw eSh0LT5wYXJtcy5uYW1lLCBkZXYtPm5hbWUsIElGTkFNU0laKTsNCisJfQ0K KwlTRVRfTU9EVUxFX09XTkVSKGRldik7DQorCWlmICgoZXJyID0gcmVnaXN0 ZXJfbmV0ZGV2aWNlKGRldikpIDwgMCkgew0KKwkJZ290byBmYWlsZWQ7DQor CX0NCisJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkqcHQgPSB0Ow0KKwlyZXR1 
cm4gMDsNCitmYWlsZWQ6DQorCWtmcmVlKGRldik7DQorCXJldHVybiBlcnI7 DQorfQ0KKw0KKy8qKg0KKyAqIGlwNl90bmxfZGVzdHJveSgpIC0gZGVzdHJv eSBvbGQgdHVubmVsDQorICogICBAdDogdHVubmVsIHRvIGJlIGRlc3Ryb3ll ZA0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgd2hhdGV2ZXIgdW5yZWdpc3Rl cl9uZXRkZXZpY2UoKSByZXR1cm5zDQorICoqLw0KKw0KK3N0YXRpYyBpbmxp bmUgaW50DQoraXA2X3RubF9kZXN0cm95KHN0cnVjdCBpcDZfdG5sICp0KQ0K K3sNCisJcmV0dXJuIHVucmVnaXN0ZXJfbmV0ZGV2aWNlKHQtPmRldik7DQor fQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfbG9jYXRlIC0gZmluZCBvciBj cmVhdGUgdHVubmVsIG1hdGNoaW5nIGdpdmVuIHBhcmFtZXRlcnMNCisgKiAg IEBwOiB0dW5uZWwgcGFyYW1ldGVycyANCisgKiAgIEBjcmVhdGU6ICE9IDAg aWYgYWxsb3dlZCB0byBjcmVhdGUgbmV3IHR1bm5lbCBpZiBubyBtYXRjaCBm b3VuZA0KKyAqDQorICogRGVzY3JpcHRpb246DQorICogICBpcDZpcDZfdG5s X2xvY2F0ZSgpIGZpcnN0IHRyaWVzIHRvIGxvY2F0ZSBhbiBleGlzdGluZyB0 dW5uZWwNCisgKiAgIGJhc2VkIG9uIEBwYXJtcy4gSWYgdGhpcyBpcyB1bnN1 Y2Nlc3NmdWwsIGJ1dCBAY3JlYXRlIGlzIHNldCBhIG5ldw0KKyAqICAgdHVu bmVsIGRldmljZSBpcyBjcmVhdGVkIGFuZCByZWdpc3RlcmVkIGZvciB1c2Uu DQorICoNCisgKiBSZXR1cm46DQorICogICAwIGlmIHR1bm5lbCBsb2NhdGVk IG9yIGNyZWF0ZWQsDQorICogICAtRUlOVkFMIGlmIHBhcmFtZXRlcnMgaW5j b3JyZWN0LA0KKyAqICAgLUVOT0RFViBpZiBubyBtYXRjaGluZyB0dW5uZWwg YXZhaWxhYmxlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZfdG5s X2xvY2F0ZShzdHJ1Y3QgaXA2X3RubF9wYXJtICpwLCBzdHJ1Y3QgaXA2X3Ru bCAqKnB0LCBpbnQgY3JlYXRlKQ0KK3sNCisJc3RydWN0IGluNl9hZGRyICpy ZW1vdGUgPSAmcC0+cmFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAqbG9jYWwg PSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKw0KKwlpZiAo cC0+cHJvdG8gIT0gSVBQUk9UT19JUFY2KQ0KKwkJcmV0dXJuIC1FSU5WQUw7 DQorDQorCWZvciAodCA9ICppcDZpcDZfYnVja2V0KHApOyB0OyB0ID0gdC0+ bmV4dCkgew0KKwkJaWYgKCFpcHY2X2FkZHJfY21wKGxvY2FsLCAmdC0+cGFy bXMubGFkZHIpICYmDQorCQkgICAgIWlwdjZfYWRkcl9jbXAocmVtb3RlLCAm dC0+cGFybXMucmFkZHIpKSB7DQorCQkJKnB0ID0gdDsNCisJCQlyZXR1cm4g KGNyZWF0ZSA/IC1FRVhJU1QgOiAwKTsNCisJCX0NCisJfQ0KKwlpZiAoIWNy ZWF0ZSkgew0KKwkJcmV0dXJuIC1FTk9ERVY7DQorCX0NCisJcmV0dXJuIGlw Nl90bmxfY3JlYXRlKHAsIHB0KTsNCit9DQorDQorLyoqDQorICogaXA2aXA2 
X3RubF9kZXZfZGVzdHJ1Y3RvciAtIHR1bm5lbCBkZXZpY2UgZGVzdHJ1Y3Rv cg0KKyAqICAgQGRldjogdGhlIGRldmljZSB0byBiZSBkZXN0cm95ZWQNCisg KiovDQorDQorc3RhdGljIHZvaWQNCitpcDZpcDZfdG5sX2Rldl9kZXN0cnVj dG9yKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYpDQorew0KKwlrZnJlZShkZXYp Ow0KK30NCisNCisvKioNCisgKiBpcDZpcDZfdG5sX2Rldl91bmluaXQgLSB0 dW5uZWwgZGV2aWNlIHVuaW5pdGlhbGl6ZXINCisgKiAgIEBkZXY6IHRoZSBk ZXZpY2UgdG8gYmUgZGVzdHJveWVkDQorICogICANCisgKiBEZXNjcmlwdGlv bjoNCisgKiAgIGlwNmlwNl90bmxfZGV2X3VuaW5pdCgpIHJlbW92ZXMgdHVu bmVsIGZyb20gaXRzIGxpc3QNCisgKiovDQorDQorc3RhdGljIHZvaWQNCitp cDZpcDZfdG5sX2Rldl91bmluaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikN Cit7DQorCWlmIChkZXYgPT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQl3 cml0ZV9sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQl0bmxzX3djWzBdID0g TlVMTDsNCisJCXdyaXRlX3VubG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwl9 IGVsc2Ugew0KKwkJc3RydWN0IGlwNl90bmwgKnQgPSAoc3RydWN0IGlwNl90 bmwgKikgZGV2LT5wcml2Ow0KKwkJaXA2aXA2X3RubF91bmxpbmsodCk7DQor CX0NCit9DQorDQorLyoqDQorICogcGFyc2VfdHZsX3RubF9lbmNfbGltIC0g aGFuZGxlIGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICogICBAc2ti OiByZWNlaXZlZCBzb2NrZXQgYnVmZmVyDQorICoNCisgKiBSZXR1cm46IA0K KyAqICAgMCBpZiBub25lIHdhcyBmb3VuZCwgDQorICogICBlbHNlIGluZGV4 IHRvIGVuY2Fwc3VsYXRpb24gbGltaXQNCisgKiovDQorDQorc3RhdGljIF9f dTE2DQorcGFyc2VfdGx2X3RubF9lbmNfbGltKHN0cnVjdCBza19idWZmICpz a2IsIF9fdTggKiByYXcpDQorew0KKwlzdHJ1Y3QgaXB2NmhkciAqaXB2Nmgg PSAoc3RydWN0IGlwdjZoZHIgKikgcmF3Ow0KKwlfX3U4IG5leHRoZHIgPSBp cHY2aC0+bmV4dGhkcjsNCisJX191MTYgb2ZmID0gc2l6ZW9mICgqaXB2Nmgp Ow0KKw0KKwl3aGlsZSAoaXB2Nl9leHRfaGRyKG5leHRoZHIpICYmIG5leHRo ZHIgIT0gTkVYVEhEUl9OT05FKSB7DQorCQlfX3UxNiBvcHRsZW4gPSAwOw0K KwkJc3RydWN0IGlwdjZfb3B0X2hkciAqaGRyOw0KKwkJaWYgKHJhdyArIG9m ZiArIHNpemVvZiAoKmhkcikgPiBza2ItPmRhdGEgJiYNCisJCSAgICAhcHNr Yl9tYXlfcHVsbChza2IsIHJhdyAtIHNrYi0+ZGF0YSArIG9mZiArIHNpemVv ZiAoKmhkcikpKQ0KKwkJCWJyZWFrOw0KKw0KKwkJaGRyID0gKHN0cnVjdCBp cHY2X29wdF9oZHIgKikgKHJhdyArIG9mZik7DQorCQlpZiAobmV4dGhkciA9 PSBORVhUSERSX0ZSQUdNRU5UKSB7DQorCQkJc3RydWN0IGZyYWdfaGRyICpm 
cmFnX2hkciA9IChzdHJ1Y3QgZnJhZ19oZHIgKikgaGRyOw0KKwkJCWlmIChm cmFnX2hkci0+ZnJhZ19vZmYpDQorCQkJCWJyZWFrOw0KKwkJCW9wdGxlbiA9 IDg7DQorCQl9IGVsc2UgaWYgKG5leHRoZHIgPT0gTkVYVEhEUl9BVVRIKSB7 DQorCQkJb3B0bGVuID0gKGhkci0+aGRybGVuICsgMikgPDwgMjsNCisJCX0g ZWxzZSB7DQorCQkJb3B0bGVuID0gaXB2Nl9vcHRsZW4oaGRyKTsNCisJCX0N CisJCWlmIChuZXh0aGRyID09IE5FWFRIRFJfREVTVCkgew0KKwkJCV9fdTE2 IGkgPSBvZmYgKyAyOw0KKwkJCXdoaWxlICgxKSB7DQorCQkJCXN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKw0KKwkJCQkvKiBObyBtb3Jl IHJvb20gZm9yIGVuY2Fwc3VsYXRpb24gbGltaXQgKi8NCisJCQkJaWYgKGkg KyBzaXplb2YgKCp0ZWwpID4gb2ZmICsgb3B0bGVuKQ0KKwkJCQkJYnJlYWs7 DQorDQorCQkJCXRlbCA9IChzdHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0g KikgJnJhd1tpXTsNCisJCQkJLyogcmV0dXJuIGluZGV4IG9mIG9wdGlvbiBp ZiBmb3VuZCBhbmQgdmFsaWQgKi8NCisJCQkJaWYgKHRlbC0+dHlwZSA9PSBJ UFY2X1RMVl9UTkxfRU5DQVBfTElNSVQgJiYNCisJCQkJICAgIHRlbC0+bGVu Z3RoID09IDEpDQorCQkJCQlyZXR1cm4gaTsNCisJCQkJLyogZWxzZSBqdW1w IHRvIG5leHQgb3B0aW9uICovDQorCQkJCWlmICh0ZWwtPnR5cGUpDQorCQkJ CQlpICs9IHRlbC0+bGVuZ3RoICsgMjsNCisJCQkJZWxzZQ0KKwkJCQkJaSsr Ow0KKwkJCX0NCisJCX0NCisJCW5leHRoZHIgPSBoZHItPm5leHRoZHI7DQor CQlvZmYgKz0gb3B0bGVuOw0KKwl9DQorCXJldHVybiAwOw0KK30NCisNCisv KioNCisgKiBpcDZpcDZfZXJyIC0gdHVubmVsIGVycm9yIGhhbmRsZXINCisg Kg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2VycigpIHNob3Vs ZCBoYW5kbGUgZXJyb3JzIGluIHRoZSB0dW5uZWwgYWNjb3JkaW5nDQorICog ICB0byB0aGUgc3BlY2lmaWNhdGlvbnMgaW4gUkZDIDI0NzMuDQorICoqLw0K Kw0KK3ZvaWQgaXA2aXA2X2VycihzdHJ1Y3Qgc2tfYnVmZiAqc2tiLCBzdHJ1 Y3QgaW5ldDZfc2tiX3Bhcm0gKm9wdCwNCisJCSAgIGludCB0eXBlLCBpbnQg Y29kZSwgaW50IG9mZnNldCwgX191MzIgaW5mbykNCit7DQorCXN0cnVjdCBp cHY2aGRyICppcHY2aCA9IChzdHJ1Y3QgaXB2NmhkciAqKSBza2ItPmRhdGE7 DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKwlpbnQgcmVsX21zZyA9IDA7DQor CWludCByZWxfdHlwZSA9IElDTVBWNl9ERVNUX1VOUkVBQ0g7DQorCWludCBy ZWxfY29kZSA9IElDTVBWNl9BRERSX1VOUkVBQ0g7DQorCV9fdTMyIHJlbF9p bmZvID0gMDsNCisJX191MTYgbGVuOw0KKw0KKwkvKiBJZiB0aGUgcGFja2V0 IGRvZXNuJ3QgY29udGFpbiB0aGUgb3JpZ2luYWwgSVB2NiBoZWFkZXIgd2Ug 
YXJlIA0KKwkgICBpbiB0cm91YmxlIHNpbmNlIHdlIG1pZ2h0IG5lZWQgdGhl IHNvdXJjZSBhZGRyZXNzIGZvciBmdXJ0ZXIgDQorCSAgIHByb2Nlc3Npbmcg b2YgdGhlIGVycm9yLiAqLw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2Nr KTsNCisJaWYgKCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5kYWRk ciwgJmlwdjZoLT5zYWRkcikpID09IE5VTEwpDQorCQlnb3RvIG91dDsNCisN CisJc3dpdGNoICh0eXBlKSB7DQorCQlfX3UzMiB0ZWxpOw0KKwkJc3RydWN0 IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCQlfX3UzMiBtdHU7DQor CWNhc2UgSUNNUFY2X0RFU1RfVU5SRUFDSDoNCisJCWlmIChuZXRfcmF0ZWxp bWl0KCkpDQorCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCSAgICAgICAi JXM6IFBhdGggdG8gZGVzdGluYXRpb24gaW52YWxpZCAiDQorCQkJICAgICAg ICJvciBpbmFjdGl2ZSFcbiIsIHQtPnBhcm1zLm5hbWUpOw0KKwkJcmVsX21z ZyA9IDE7DQorCQlicmVhazsNCisJY2FzZSBJQ01QVjZfVElNRV9FWENFRUQ6 DQorCQlpZiAoY29kZSA9PSBJQ01QVjZfRVhDX0hPUExJTUlUKSB7DQorCQkJ aWYgKG5ldF9yYXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklO Rw0KKwkJCQkgICAgICAgIiVzOiBUb28gc21hbGwgaG9wIGxpbWl0IG9yICIN CisJCQkJICAgICAgICJyb3V0aW5nIGxvb3AgaW4gdHVubmVsIVxuIiwgDQor CQkJCSAgICAgICB0LT5wYXJtcy5uYW1lKTsNCisJCQlyZWxfbXNnID0gMTsN CisJCX0NCisJCWJyZWFrOw0KKwljYXNlIElDTVBWNl9QQVJBTVBST0I6DQor CQkvKiBpZ25vcmUgaWYgcGFyYW1ldGVyIHByb2JsZW0gbm90IGNhdXNlZCBi eSBhIHR1bm5lbA0KKwkJICAgZW5jYXBzdWxhdGlvbiBsaW1pdCBzdWItb3B0 aW9uICovDQorCQlpZiAoY29kZSAhPSBJQ01QVjZfSERSX0ZJRUxEKSB7DQor CQkJYnJlYWs7DQorCQl9DQorCQl0ZWxpID0gcGFyc2VfdGx2X3RubF9lbmNf bGltKHNrYiwgc2tiLT5kYXRhKTsNCisNCisJCWlmICh0ZWxpICYmIHRlbGkg PT0gaW5mbyAtIDIpIHsNCisJCQl0ZWwgPSAoc3RydWN0IGlwdjZfdGx2X3Ru bF9lbmNfbGltICopICZza2ItPmRhdGFbdGVsaV07DQorCQkJaWYgKHRlbC0+ ZW5jYXBfbGltaXQgPD0gMSkgew0KKwkJCQlpZiAobmV0X3JhdGVsaW1pdCgp KQ0KKwkJCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCQkJICAgICAgICIl czogVG9vIHNtYWxsIGVuY2Fwc3VsYXRpb24gIg0KKwkJCQkJICAgICAgICJs aW1pdCBvciByb3V0aW5nIGxvb3AgaW4gIg0KKwkJCQkJICAgICAgICJ0dW5u ZWwhXG4iLCB0LT5wYXJtcy5uYW1lKTsNCisJCQkJcmVsX21zZyA9IDE7DQor CQkJfQ0KKwkJfQ0KKwkJYnJlYWs7DQorCWNhc2UgSUNNUFY2X1BLVF9UT09C SUc6DQorCQltdHUgPSBpbmZvIC0gb2Zmc2V0Ow0KKwkJaWYgKG10dSA8PSBJ 
UFY2X01JTl9NVFUpIHsNCisJCQltdHUgPSBJUFY2X01JTl9NVFU7DQorCQl9 DQorCQl0LT5kZXYtPm10dSA9IG10dTsNCisNCisJCWlmICgobGVuID0gc2l6 ZW9mICgqaXB2NmgpICsgaXB2NmgtPnBheWxvYWRfbGVuKSA+IG10dSkgew0K KwkJCXJlbF90eXBlID0gSUNNUFY2X1BLVF9UT09CSUc7DQorCQkJcmVsX2Nv ZGUgPSAwOw0KKwkJCXJlbF9pbmZvID0gbXR1Ow0KKwkJCXJlbF9tc2cgPSAx Ow0KKwkJfQ0KKwkJYnJlYWs7DQorCX0NCisJaWYgKHJlbF9tc2cgJiYgIHBz a2JfbWF5X3B1bGwoc2tiLCBvZmZzZXQgKyBzaXplb2YgKCppcHY2aCkpKSB7 DQorCQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0Ow0KKwkJc3RydWN0IHNrX2J1ZmYg KnNrYjIgPSBza2JfY2xvbmUoc2tiLCBHRlBfQVRPTUlDKTsNCisJCWlmICgh c2tiMikNCisJCQlnb3RvIG91dDsNCisNCisJCWRzdF9yZWxlYXNlKHNrYjIt PmRzdCk7DQorCQlza2IyLT5kc3QgPSBOVUxMOw0KKwkJc2tiX3B1bGwoc2ti Miwgb2Zmc2V0KTsNCisJCXNrYjItPm5oLnJhdyA9IHNrYjItPmRhdGE7DQor DQorCQkvKiBUcnkgdG8gZ3Vlc3MgaW5jb21pbmcgaW50ZXJmYWNlICovDQor CQlydCA9IHJ0Nl9sb29rdXAoJnNrYjItPm5oLmlwdjZoLT5zYWRkciwgTlVM TCwgMCwgMCk7DQorDQorCQlpZiAocnQgJiYgcnQtPnJ0NmlfZGV2KQ0KKwkJ CXNrYjItPmRldiA9IHJ0LT5ydDZpX2RldjsNCisNCisJCWljbXB2Nl9zZW5k KHNrYjIsIHJlbF90eXBlLCByZWxfY29kZSwgcmVsX2luZm8sIHNrYjItPmRl dik7DQorDQorCQlpZiAocnQpDQorCQkJZHN0X2ZyZWUoJnJ0LT51LmRzdCk7 DQorDQorCQlrZnJlZV9za2Ioc2tiMik7DQorCX0NCitvdXQ6DQorCXJlYWRf dW5sb2NrKCZpcDZpcDZfbG9jayk7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlw Nl9yY3YgLSBkZWNhcHN1bGF0ZSBJUHY2IHBhY2tldCBhbmQgcmV0cmFuc21p dCBpdCBsb2NhbGx5DQorICogICBAc2tiOiByZWNlaXZlZCBzb2NrZXQgYnVm ZmVyDQorICoNCisgKiBSZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlw Nl9yY3Yoc3RydWN0IHNrX2J1ZmYgKipwc2tiLCB1bnNpZ25lZCBpbnQgKm5o b2ZmcCkNCit7DQorCXN0cnVjdCBza19idWZmICpza2IgPSAqcHNrYjsNCisJ c3RydWN0IGlwdjZoZHIgKmlwdjZoOw0KKwlzdHJ1Y3QgaXA2X3RubCAqdDsN CisNCisJaWYgKCFwc2tiX21heV9wdWxsKHNrYiwgc2l6ZW9mICgqaXB2Nmgp KSkNCisJCWdvdG8gZGlzY2FyZDsNCisNCisJaXB2NmggPSBza2ItPm5oLmlw djZoOw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisNCisJaWYg KCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5zYWRkciwgJmlwdjZo LT5kYWRkcikpICE9IE5VTEwpIHsNCisJCWlmICghKHQtPnBhcm1zLmZsYWdz ICYgSVA2X1ROTF9GX0NBUF9SQ1YpKSB7DQorCQkJdC0+c3RhdC5yeF9kcm9w 
cGVkKys7DQorCQkJcmVhZF91bmxvY2soJmlwNmlwNl9sb2NrKTsNCisJCQln b3RvIGRpc2NhcmQ7DQorCQl9DQorCQlza2ItPm1hYy5yYXcgPSBza2ItPm5o LnJhdzsNCisJCXNrYi0+bmgucmF3ID0gc2tiLT5kYXRhOw0KKwkJc2tiLT5w cm90b2NvbCA9IGh0b25zKEVUSF9QX0lQVjYpOw0KKwkJc2tiLT5wa3RfdHlw ZSA9IFBBQ0tFVF9IT1NUOw0KKwkJbWVtc2V0KHNrYi0+Y2IsIDAsIHNpemVv ZihzdHJ1Y3QgaW5ldDZfc2tiX3Bhcm0pKTsNCisJCXNrYi0+ZGV2ID0gdC0+ ZGV2Ow0KKwkJZHN0X3JlbGVhc2Uoc2tiLT5kc3QpOw0KKwkJc2tiLT5kc3Qg PSBOVUxMOw0KKwkJdC0+c3RhdC5yeF9wYWNrZXRzKys7DQorCQl0LT5zdGF0 LnJ4X2J5dGVzICs9IHNrYi0+bGVuOw0KKwkJbmV0aWZfcngoc2tiKTsNCisJ CXJlYWRfdW5sb2NrKCZpcDZpcDZfbG9jayk7DQorCQlyZXR1cm4gMDsNCisJ fQ0KKwlyZWFkX3VubG9jaygmaXA2aXA2X2xvY2spOw0KKwlpY21wdjZfc2Vu ZChza2IsIElDTVBWNl9ERVNUX1VOUkVBQ0gsIElDTVBWNl9BRERSX1VOUkVB Q0gsIDAsIHNrYi0+ZGV2KTsNCitkaXNjYXJkOg0KKwlrZnJlZV9za2Ioc2ti KTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIHR4b3B0X2xlbiAt IGdldCBuZWNlc3Nhcnkgc2l6ZSBmb3IgbmV3ICZzdHJ1Y3QgaXB2Nl90eG9w dGlvbnMNCisgKiAgIEBvcmlnX29wdDogb2xkIG9wdGlvbnMNCisgKg0KKyAq IFJldHVybjoNCisgKiAgIFNpemUgb2Ygb2xkIG9uZSBwbHVzIHNpemUgb2Yg dHVubmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICoqLw0KKw0K K3N0YXRpYyBpbmxpbmUgaW50DQordHhvcHRfbGVuKHN0cnVjdCBpcHY2X3R4 b3B0aW9ucyAqb3JpZ19vcHQpDQorew0KKwlpbnQgbGVuID0gc2l6ZW9mICgq b3JpZ19vcHQpICsgODsNCisNCisJaWYgKG9yaWdfb3B0ICYmIG9yaWdfb3B0 LT5kc3Qwb3B0KQ0KKwkJbGVuICs9IGlwdjZfb3B0bGVuKG9yaWdfb3B0LT5k c3Qwb3B0KTsNCisJcmV0dXJuIGxlbjsNCit9DQorDQorLyoqDQorICogbWVy Z2Vfb3B0aW9ucyAtIGFkZCBlbmNhcHN1bGF0aW9uIGxpbWl0IHRvIG9yaWdp bmFsIG9wdGlvbnMNCisgKiAgIEBlbmNhcF9saW1pdDogbnVtYmVyIG9mIGFs bG93ZWQgZW5jYXBzdWxhdGlvbiBsaW1pdHMNCisgKiAgIEBvcmlnX29wdDog b3JpZ2luYWwgb3B0aW9ucw0KKyAqIA0KKyAqIFJldHVybjoNCisgKiAgIFBv aW50ZXIgdG8gbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgY29udGFpbmlu ZyB0aGUgdHVubmVsDQorICogICBlbmNhcHN1bGF0aW9uIGxpbWl0DQorICoq Lw0KKw0KK3N0YXRpYyBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKg0KK21lcmdl X29wdGlvbnMoc3RydWN0IHNvY2sgKnNrLCBfX3U4IGVuY2FwX2xpbWl0LA0K KwkgICAgICBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKm9yaWdfb3B0KQ0KK3sN 
CisJc3RydWN0IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCXN0cnVj dCBpcHY2X3R4b3B0aW9ucyAqb3B0Ow0KKwlfX3U4ICpyYXc7DQorCV9fdTgg cGFkX3RvID0gODsNCisJaW50IG9wdF9sZW4gPSB0eG9wdF9sZW4ob3JpZ19v cHQpOw0KKw0KKwlpZiAoIShvcHQgPSBzb2NrX2ttYWxsb2Moc2ssIG9wdF9s ZW4sIEdGUF9BVE9NSUMpKSkgew0KKwkJcmV0dXJuIE5VTEw7DQorCX0NCisN CisJbWVtc2V0KG9wdCwgMCwgb3B0X2xlbik7DQorCW9wdC0+dG90X2xlbiA9 IG9wdF9sZW47DQorCW9wdC0+ZHN0MG9wdCA9IChzdHJ1Y3QgaXB2Nl9vcHRf aGRyICopIChvcHQgKyAxKTsNCisJb3B0LT5vcHRfbmZsZW4gPSA4Ow0KKw0K KwlyYXcgPSAoX191OCAqKSBvcHQtPmRzdDBvcHQ7DQorDQorCXRlbCA9IChz dHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0gKikgKG9wdC0+ZHN0MG9wdCAr IDEpOw0KKwl0ZWwtPnR5cGUgPSBJUFY2X1RMVl9UTkxfRU5DQVBfTElNSVQ7 DQorCXRlbC0+bGVuZ3RoID0gMTsNCisJdGVsLT5lbmNhcF9saW1pdCA9IGVu Y2FwX2xpbWl0Ow0KKw0KKwlpZiAob3JpZ19vcHQpIHsNCisJCV9fdTggKm9y aWdfcmF3Ow0KKw0KKwkJb3B0LT5ob3BvcHQgPSBvcmlnX29wdC0+aG9wb3B0 Ow0KKw0KKwkJLyogS2VlcCB0aGUgb3JpZ2luYWwgZGVzdGluYXRpb24gb3B0 aW9ucyBwcm9wZXJseQ0KKwkJICAgYWxpZ25lZCBhbmQgbWVyZ2UgcG9zc2li bGUgb2xkIHBhZGRpbmdzIHRvIHRoZQ0KKwkJICAgbmV3IHBhZGRpbmcgb3B0 aW9uICovDQorCQlpZiAoKG9yaWdfcmF3ID0gKF9fdTggKikgb3JpZ19vcHQt PmRzdDBvcHQpICE9IE5VTEwpIHsNCisJCQlfX3U4IHR5cGU7DQorCQkJaW50 IGkgPSBzaXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXBhZF90 byArPSBzaXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXdoaWxl IChpIDwgaXB2Nl9vcHRsZW4ob3JpZ19vcHQtPmRzdDBvcHQpKSB7DQorCQkJ CXR5cGUgPSBvcmlnX3Jhd1tpKytdOw0KKwkJCQlpZiAodHlwZSA9PSBJUFY2 X1RMVl9QQUQwKQ0KKwkJCQkJcGFkX3RvKys7DQorCQkJCWVsc2UgaWYgKHR5 cGUgPT0gSVBWNl9UTFZfUEFETikgew0KKwkJCQkJaW50IGxlbiA9IG9yaWdf cmF3W2krK107DQorCQkJCQlpICs9IGxlbjsNCisJCQkJCXBhZF90byArPSBs ZW4gKyAyOw0KKwkJCQl9IGVsc2Ugew0KKwkJCQkJYnJlYWs7DQorCQkJCX0N CisJCQl9DQorCQkJb3B0LT5kc3Qwb3B0LT5oZHJsZW4gPSBvcmlnX29wdC0+ ZHN0MG9wdC0+aGRybGVuICsgMTsNCisJCQltZW1jcHkocmF3ICsgcGFkX3Rv LCBvcmlnX3JhdyArIHBhZF90byAtIDgsDQorCQkJICAgICAgIG9wdF9sZW4g LSBzaXplb2YgKCpvcHQpIC0gcGFkX3RvKTsNCisJCX0NCisJCW9wdC0+c3Jj cnQgPSBvcmlnX29wdC0+c3JjcnQ7DQorCQlvcHQtPm9wdF9uZmxlbiArPSBv 
cmlnX29wdC0+b3B0X25mbGVuOw0KKw0KKwkJb3B0LT5kc3Qxb3B0ID0gb3Jp Z19vcHQtPmRzdDFvcHQ7DQorCQlvcHQtPmF1dGggPSBvcmlnX29wdC0+YXV0 aDsNCisJCW9wdC0+b3B0X2ZsZW4gPSBvcmlnX29wdC0+b3B0X2ZsZW47DQor CX0NCisJcmF3WzVdID0gSVBWNl9UTFZfUEFETjsNCisNCisJLyogc3VidHJh Y3QgbGVuZ3RocyBvZiBkZXN0aW5hdGlvbiBzdWJvcHRpb24gaGVhZGVyLA0K KwkgICB0dW5uZWwgZW5jYXBzdWxhdGlvbiBsaW1pdCBhbmQgcGFkIE4gaGVh ZGVyICovDQorCXJhd1s2XSA9IHBhZF90byAtIDc7DQorDQorCXJldHVybiBv cHQ7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfYWRkcl9jb25mbGlj dCAtIGNvbXBhcmUgcGFja2V0IGFkZHJlc3NlcyB0byB0dW5uZWwncyBvd24N CisgKiAgIEB0OiB0aGUgb3V0Z29pbmcgdHVubmVsIGRldmljZQ0KKyAqICAg QGhkcjogSVB2NiBoZWFkZXIgZnJvbSB0aGUgaW5jb21pbmcgcGFja2V0IA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBBdm9pZCB0cml2aWFsIHR1 bm5lbGluZyBsb29wIGJ5IGNoZWNraW5nIHRoYXQgdHVubmVsIGV4aXQtcG9p bnQgDQorICogICBkb2Vzbid0IG1hdGNoIHNvdXJjZSBvZiBpbmNvbWluZyBw YWNrZXQuDQorICoNCisgKiBSZXR1cm46IA0KKyAqICAgMSBpZiBjb25mbGlj dCwNCisgKiAgIDAgZWxzZQ0KKyAqKi8NCisNCitzdGF0aWMgaW5saW5lIGlu dA0KK2lwNmlwNl90bmxfYWRkcl9jb25mbGljdChzdHJ1Y3QgaXA2X3RubCAq dCwgc3RydWN0IGlwdjZoZHIgKmhkcikNCit7DQorCXJldHVybiAhaXB2Nl9h ZGRyX2NtcCgmdC0+cGFybXMucmFkZHIsICZoZHItPnNhZGRyKTsNCit9DQor DQorLyoqDQorICogaXA2aXA2X3RubF94bWl0IC0gZW5jYXBzdWxhdGUgcGFj a2V0IGFuZCBzZW5kIA0KKyAqICAgQHNrYjogdGhlIG91dGdvaW5nIHNvY2tl dCBidWZmZXINCisgKiAgIEBkZXY6IHRoZSBvdXRnb2luZyB0dW5uZWwgZGV2 aWNlIA0KKyAqDQorICogRGVzY3JpcHRpb246DQorICogICBCdWlsZCBuZXcg aGVhZGVyIGFuZCBkbyBzb21lIHNhbml0eSBjaGVja3Mgb24gdGhlIHBhY2tl dCBiZWZvcmUgc2VuZGluZw0KKyAqICAgaXQgdG8gaXA2X2J1aWxkX3htaXQo KS4NCisgKg0KKyAqIFJldHVybjogDQorICogICAwDQorICoqLw0KKw0KK2lu dCBpcDZpcDZfdG5sX3htaXQoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3RydWN0 IG5ldF9kZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0g KHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJc3RydWN0IG5ldF9k ZXZpY2Vfc3RhdHMgKnN0YXRzID0gJnQtPnN0YXQ7DQorCXN0cnVjdCBpcHY2 aGRyICppcHY2aCA9IHNrYi0+bmguaXB2Nmg7DQorCXN0cnVjdCBpcHY2X3R4 b3B0aW9ucyAqb3JpZ19vcHQgPSBOVUxMOw0KKwlzdHJ1Y3QgaXB2Nl90eG9w 
dGlvbnMgKm9wdCA9IE5VTEw7DQorCV9fdTggZW5jYXBfbGltaXQgPSAwOw0K KwlfX3UxNiBvZmZzZXQ7DQorCXN0cnVjdCBmbG93aSBmbDsNCisJc3RydWN0 IGlwNl9mbG93bGFiZWwgKmZsX2xibCA9IE5VTEw7DQorCWludCBlcnIgPSAw Ow0KKwlzdHJ1Y3QgZHN0X2VudHJ5ICpkc3Q7DQorCWludCBsaW5rX2ZhaWx1 cmUgPSAwOw0KKwlzdHJ1Y3Qgc29jayAqc2sgPSBpcDZfc29ja2V0LT5zazsN CisJc3RydWN0IGlwdjZfcGluZm8gKm5wID0gaW5ldDZfc2soc2spOw0KKwlp bnQgbXR1Ow0KKw0KKwlpZiAodC0+cmVjdXJzaW9uKyspIHsNCisJCXN0YXRz LT5jb2xsaXNpb25zKys7DQorCQlnb3RvIHR4X2VycjsNCisJfQ0KKwlpZiAo c2tiLT5wcm90b2NvbCAhPSBodG9ucyhFVEhfUF9JUFY2KSB8fA0KKwkgICAg ISh0LT5wYXJtcy5mbGFncyAmIElQNl9UTkxfRl9DQVBfWE1JVCkgfHwNCisJ ICAgIGlwNmlwNl90bmxfYWRkcl9jb25mbGljdCh0LCBpcHY2aCkpIHsNCisJ CWdvdG8gdHhfZXJyOw0KKwl9DQorCWlmICgob2Zmc2V0ID0gcGFyc2VfdGx2 X3RubF9lbmNfbGltKHNrYiwgc2tiLT5uaC5yYXcpKSA+IDApIHsNCisJCXN0 cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKwkJdGVsID0gKHN0 cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSAqKSAmc2tiLT5uaC5yYXdbb2Zm c2V0XTsNCisJCWlmICh0ZWwtPmVuY2FwX2xpbWl0IDw9IDEpIHsNCisJCQlp Y21wdjZfc2VuZChza2IsIElDTVBWNl9QQVJBTVBST0IsDQorCQkJCSAgICBJ Q01QVjZfSERSX0ZJRUxELCBvZmZzZXQgKyAyLCBza2ItPmRldik7DQorCQkJ Z290byB0eF9lcnI7DQorCQl9DQorCQllbmNhcF9saW1pdCA9IHRlbC0+ZW5j YXBfbGltaXQgLSAxOw0KKwl9IGVsc2UgaWYgKCEodC0+cGFybXMuZmxhZ3Mg JiBJUDZfVE5MX0ZfSUdOX0VOQ0FQX0xJTUlUKSkgew0KKwkJZW5jYXBfbGlt aXQgPSB0LT5wYXJtcy5lbmNhcF9saW1pdDsNCisJfQ0KKwlpcDZfeG1pdF9s b2NrKCk7DQorDQorCW1lbWNweSgmZmwsICZ0LT5mbCwgc2l6ZW9mIChmbCkp Ow0KKw0KKwlpZiAoKHQtPnBhcm1zLmZsYWdzICYgSVA2X1ROTF9GX1VTRV9P UklHX1RDTEFTUykpDQorCQlmbC5mbDZfZmxvd2xhYmVsIHw9ICgqKF9fdTMy ICopIGlwdjZoICYgSVBWNl9UQ0xBU1NfTUFTSyk7DQorCWlmICgodC0+cGFy bXMuZmxhZ3MgJiBJUDZfVE5MX0ZfVVNFX09SSUdfRkxPV0xBQkVMKSkNCisJ CWZsLmZsNl9mbG93bGFiZWwgfD0gKCooX191MzIgKikgaXB2NmggJiBJUFY2 X0ZMT1dMQUJFTF9NQVNLKTsNCisNCisJaWYgKGZsLmZsNl9mbG93bGFiZWwp IHsNCisJCWZsX2xibCA9IGZsNl9zb2NrX2xvb2t1cChzaywgZmwuZmw2X2Zs b3dsYWJlbCk7DQorCQlpZiAoZmxfbGJsKQ0KKwkJCW9yaWdfb3B0ID0gZmxf bGJsLT5vcHQ7DQorCX0NCisJaWYgKGVuY2FwX2xpbWl0ID4gMCkgew0KKwkJ 
aWYgKCEob3B0ID0gbWVyZ2Vfb3B0aW9ucyhzaywgZW5jYXBfbGltaXQsIG9y aWdfb3B0KSkpIHsNCisJCQlnb3RvIHR4X2Vycl9mcmVlX2ZsX2xibDsNCisJ CX0NCisJfSBlbHNlIHsNCisJCW9wdCA9IG9yaWdfb3B0Ow0KKwl9DQorCWRz dCA9IF9fc2tfZHN0X2NoZWNrKHNrLCBucC0+ZHN0X2Nvb2tpZSk7DQorDQor CWlmIChkc3QpIHsNCisJCWlmIChucC0+ZGFkZHJfY2FjaGUgPT0gTlVMTCB8 fA0KKwkJICAgIGlwdjZfYWRkcl9jbXAoJmZsLmZsNl9kc3QsIG5wLT5kYWRk cl9jYWNoZSkgfHwNCisJCSAgICAoZmwub2lmICYmIGZsLm9pZiAhPSBkc3Qt PmRldi0+aWZpbmRleCkpIHsNCisJCQlkc3QgPSBOVUxMOw0KKwkJfQ0KKwl9 DQorCWlmIChkc3QgPT0gTlVMTCkgew0KKwkJZHN0ID0gaXA2X3JvdXRlX291 dHB1dChzaywgJmZsKTsNCisJCWlmIChkc3QtPmVycm9yKSB7DQorCQkJc3Rh dHMtPnR4X2NhcnJpZXJfZXJyb3JzKys7DQorCQkJbGlua19mYWlsdXJlID0g MTsNCisJCQlnb3RvIHR4X2Vycl9kc3RfcmVsZWFzZTsNCisJCX0NCisJCS8q IGxvY2FsIHJvdXRpbmcgbG9vcCAqLw0KKwkJaWYgKGRzdC0+ZGV2ID09IGRl dikgew0KKwkJCXN0YXRzLT5jb2xsaXNpb25zKys7DQorCQkJaWYgKG5ldF9y YXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORyANCisJCQkJ ICAgICAgICIlczogTG9jYWwgcm91dGluZyBsb29wIGRldGVjdGVkIVxuIiwN CisJCQkJICAgICAgIHQtPnBhcm1zLm5hbWUpOw0KKwkJCWdvdG8gdHhfZXJy X2RzdF9yZWxlYXNlOw0KKwkJfQ0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5k YWRkciwgJmZsLmZsNl9kc3QpOw0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5z YWRkciwgJmZsLmZsNl9zcmMpOw0KKwl9DQorCW10dSA9IGRzdF9wbXR1KGRz dCkgLSBzaXplb2YgKCppcHY2aCk7DQorCWlmIChvcHQpIHsNCisJCW10dSAt PSAob3B0LT5vcHRfbmZsZW4gKyBvcHQtPm9wdF9mbGVuKTsNCisJfQ0KKwlp ZiAobXR1IDwgSVBWNl9NSU5fTVRVKQ0KKwkJbXR1ID0gSVBWNl9NSU5fTVRV Ow0KKwlpZiAoc2tiLT5kc3QgJiYgbXR1IDwgZHN0X3BtdHUoc2tiLT5kc3Qp KSB7DQorCQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gKHN0cnVjdCBydDZfaW5m byAqKSBza2ItPmRzdDsNCisJCXJ0LT5ydDZpX2ZsYWdzIHw9IFJURl9NT0RJ RklFRDsNCisJCXJ0LT51LmRzdC5tZXRyaWNzW1JUQVhfTVRVLTFdID0gbXR1 Ow0KKwl9DQorCWlmIChza2ItPmxlbiA+IG10dSkgew0KKwkJaWNtcHY2X3Nl bmQoc2tiLCBJQ01QVjZfUEtUX1RPT0JJRywgMCwgbXR1LCBkZXYpOw0KKwkJ Z290byB0eF9lcnJfb3B0X3JlbGVhc2U7DQorCX0NCisJZXJyID0gaXA2X2Fw cGVuZF9kYXRhKHNrLCBpcF9nZW5lcmljX2dldGZyYWcsIHNrYi0+bmgucmF3 LCBza2ItPmxlbiwgMCwNCisJCQkgICAgICB0LT5wYXJtcy5ob3BfbGltaXQs 
IG9wdCwgJmZsLCANCisJCQkgICAgICAoc3RydWN0IHJ0Nl9pbmZvICopZHN0 LCBNU0dfRE9OVFdBSVQpOw0KKw0KKwlpZiAoZXJyKSB7DQorCQlpcDZfZmx1 c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwl9IGVsc2Ugew0KKwkJZXJyID0g aXA2X3B1c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwkJZXJyID0gKGVyciA8 IDAgPyBlcnIgOiAwKTsNCisJfQ0KKwlpZiAoIWVycikgew0KKwkJc3RhdHMt PnR4X2J5dGVzICs9IHNrYi0+bGVuOw0KKwkJc3RhdHMtPnR4X3BhY2tldHMr KzsNCisJfSBlbHNlIHsNCisJCXN0YXRzLT50eF9lcnJvcnMrKzsNCisJCXN0 YXRzLT50eF9hYm9ydGVkX2Vycm9ycysrOw0KKwl9DQorCWlmIChvcHQgJiYg b3B0ICE9IG9yaWdfb3B0KQ0KKwkJc29ja19rZnJlZV9zKHNrLCBvcHQsIG9w dC0+dG90X2xlbik7DQorDQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsN CisJaXA2X2RzdF9zdG9yZShzaywgZHN0LCAmbnAtPmRhZGRyKTsNCisJaXA2 X3htaXRfdW5sb2NrKCk7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1 cnNpb24tLTsNCisJcmV0dXJuIDA7DQordHhfZXJyX2RzdF9yZWxlYXNlOg0K Kwlkc3RfcmVsZWFzZShkc3QpOw0KK3R4X2Vycl9vcHRfcmVsZWFzZToNCisJ aWYgKG9wdCAmJiBvcHQgIT0gb3JpZ19vcHQpDQorCQlzb2NrX2tmcmVlX3Mo c2ssIG9wdCwgb3B0LT50b3RfbGVuKTsNCit0eF9lcnJfZnJlZV9mbF9sYmw6 DQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2X3htaXRfdW5s b2NrKCk7DQorCWlmIChsaW5rX2ZhaWx1cmUpDQorCQlkc3RfbGlua19mYWls dXJlKHNrYik7DQordHhfZXJyOg0KKwlzdGF0cy0+dHhfZXJyb3JzKys7DQor CXN0YXRzLT50eF9kcm9wcGVkKys7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0 LT5yZWN1cnNpb24tLTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyB2 b2lkIGlwNl90bmxfc2V0X2NhcChzdHJ1Y3QgaXA2X3RubCAqdCkNCit7DQor CXN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAgPSAmdC0+cGFybXM7DQorCXN0cnVj dCBpbjZfYWRkciAqbGFkZHIgPSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpbjZf YWRkciAqcmFkZHIgPSAmcC0+cmFkZHI7DQorCWludCBsdHlwZSA9IGlwdjZf YWRkcl90eXBlKGxhZGRyKTsNCisJaW50IHJ0eXBlID0gaXB2Nl9hZGRyX3R5 cGUocmFkZHIpOw0KKw0KKwlwLT5mbGFncyAmPSB+KElQNl9UTkxfRl9DQVBf WE1JVHxJUDZfVE5MX0ZfQ0FQX1JDVik7DQorDQorCWlmIChsdHlwZSAhPSBJ UFY2X0FERFJfQU5ZICYmIHJ0eXBlICE9IElQVjZfQUREUl9BTlkgJiYNCisJ ICAgICgobHR5cGV8cnR5cGUpICYNCisJICAgICAoSVBWNl9BRERSX1VOSUNB U1R8DQorCSAgICAgIElQVjZfQUREUl9MT09QQkFDS3xJUFY2X0FERFJfTElO S0xPQ0FMfA0KKwkgICAgICBJUFY2X0FERFJfTUFQUEVEfElQVjZfQUREUl9S 
RVNFUlZFRCkpID09IElQVjZfQUREUl9VTklDQVNUKSB7DQorCQlzdHJ1Y3Qg bmV0X2RldmljZSAqbGRldiA9IE5VTEw7DQorCQlpbnQgbF9vayA9IDE7DQor CQlpbnQgcl9vayA9IDE7DQorDQorCQlpZiAocC0+bGluaykNCisJCQlsZGV2 ID0gZGV2X2dldF9ieV9pbmRleChwLT5saW5rKTsNCisJCQ0KKwkJaWYgKChs dHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkgJiYgIWlwdjZfY2hrX2FkZHIobGFk ZHIsIGxkZXYpKQ0KKwkJCWxfb2sgPSAwOw0KKwkJDQorCQlpZiAoKHJ0eXBl JklQVjZfQUREUl9VTklDQVNUKSAmJiBpcHY2X2Noa19hZGRyKHJhZGRyLCBO VUxMKSkNCisJCQlyX29rID0gMDsNCisJCQ0KKwkJaWYgKGxfb2sgJiYgcl9v aykgew0KKwkJCWlmIChsdHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJ cC0+ZmxhZ3MgfD0gSVA2X1ROTF9GX0NBUF9YTUlUOw0KKwkJCWlmIChydHlw ZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+ZmxhZ3MgfD0gSVA2X1RO TF9GX0NBUF9SQ1Y7DQorCQl9DQorCQlpZiAobGRldikNCisJCQlkZXZfcHV0 KGxkZXYpOw0KKwl9DQorfQ0KKw0KKw0KK3N0YXRpYyB2b2lkIGlwNmlwNl90 bmxfbGlua19jb25maWcoc3RydWN0IGlwNl90bmwgKnQpDQorew0KKwlzdHJ1 Y3QgbmV0X2RldmljZSAqZGV2ID0gdC0+ZGV2Ow0KKwlzdHJ1Y3QgaXA2X3Ru bF9wYXJtICpwID0gJnQtPnBhcm1zOw0KKwlzdHJ1Y3QgZmxvd2kgKmZsOw0K KwkvKiBTZXQgdXAgZmxvd2kgdGVtcGxhdGUgKi8NCisJZmwgPSAmdC0+Zmw7 DQorCWlwdjZfYWRkcl9jb3B5KCZmbC0+Zmw2X3NyYywgJnAtPmxhZGRyKTsN CisJaXB2Nl9hZGRyX2NvcHkoJmZsLT5mbDZfZHN0LCAmcC0+cmFkZHIpOw0K KwlmbC0+b2lmID0gcC0+bGluazsNCisJZmwtPmZsNl9mbG93bGFiZWwgPSAw Ow0KKw0KKwlpZiAoIShwLT5mbGFncyZJUDZfVE5MX0ZfVVNFX09SSUdfVENM QVNTKSkNCisJCWZsLT5mbDZfZmxvd2xhYmVsIHw9IElQVjZfVENMQVNTX01B U0sgJiBodG9ubChwLT5mbG93aW5mbyk7DQorCWlmICghKHAtPmZsYWdzJklQ Nl9UTkxfRl9VU0VfT1JJR19GTE9XTEFCRUwpKQ0KKwkJZmwtPmZsNl9mbG93 bGFiZWwgfD0gSVBWNl9GTE9XTEFCRUxfTUFTSyAmIGh0b25sKHAtPmZsb3dp bmZvKTsNCisNCisJaXA2X3RubF9zZXRfY2FwKHQpOw0KKw0KKwlpZiAocC0+ ZmxhZ3MmSVA2X1ROTF9GX0NBUF9YTUlUICYmIHAtPmZsYWdzJklQNl9UTkxf Rl9DQVBfUkNWKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7 DQorCWVsc2UNCisJCWRldi0+ZmxhZ3MgJj0gfklGRl9QT0lOVE9QT0lOVDsN CisNCisJaWYgKHAtPmZsYWdzICYgSVA2X1ROTF9GX0NBUF9YTUlUKSB7DQor CQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gcnQ2X2xvb2t1cCgmcC0+cmFkZHIs ICZwLT5sYWRkciwNCisJCQkJCQkgcC0+bGluaywgMCk7DQorCQlpZiAocnQp 
IHsNCisJCQlzdHJ1Y3QgbmV0X2RldmljZSAqcnRkZXY7DQorCQkJaWYgKCEo cnRkZXYgPSBydC0+cnQ2aV9kZXYpIHx8DQorCQkJICAgIHJ0ZGV2LT50eXBl ID09IEFSUEhSRF9UVU5ORUw2KSB7DQorCQkJCS8qIGFzIGxvbmcgYXMgdHVu bmVscyB1c2UgdGhlIHNhbWUgc29ja2V0IA0KKwkJCQkgICBmb3IgdHJhbnNt aXNzaW9uLCBsb2NhbGx5IG5lc3RlZCB0dW5uZWxzIA0KKwkJCQkgICB3b24n dCB3b3JrICovDQorCQkJCWRzdF9yZWxlYXNlKCZydC0+dS5kc3QpOw0KKwkJ CQlnb3RvIG5vX2xpbms7DQorCQkJfSBlbHNlIHsNCisJCQkJZGV2LT5pZmxp bmsgPSBydGRldi0+aWZpbmRleDsNCisJCQkJZGV2LT5oYXJkX2hlYWRlcl9s ZW4gPSBydGRldi0+aGFyZF9oZWFkZXJfbGVuICsNCisJCQkJCXNpemVvZiAo c3RydWN0IGlwdjZoZHIpOw0KKwkJCQlkZXYtPm10dSA9IHJ0ZGV2LT5tdHUg LSBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCQkJaWYgKGRldi0+bXR1 IDwgSVBWNl9NSU5fTVRVKQ0KKwkJCQkJZGV2LT5tdHUgPSBJUFY2X01JTl9N VFU7DQorCQkJCQ0KKwkJCQlkc3RfcmVsZWFzZSgmcnQtPnUuZHN0KTsNCisJ CQl9DQorCQl9DQorCX0gZWxzZSB7DQorCW5vX2xpbms6DQorCQlkZXYtPmlm bGluayA9IDA7DQorCQlkZXYtPmhhcmRfaGVhZGVyX2xlbiA9IExMX01BWF9I RUFERVIgKyBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCWRldi0+bXR1 ID0gRVRIX0RBVEFfTEVOIC0gc2l6ZW9mIChzdHJ1Y3QgaXB2Nmhkcik7DQor CX0NCit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9jaGFuZ2UgLSB1cGRh dGUgdGhlIHR1bm5lbCBwYXJhbWV0ZXJzDQorICogICBAdDogdHVubmVsIHRv IGJlIGNoYW5nZWQNCisgKiAgIEBwOiB0dW5uZWwgY29uZmlndXJhdGlvbiBw YXJhbWV0ZXJzDQorICogICBAYWN0aXZlOiAhPSAwIGlmIHR1bm5lbCBpcyBy ZWFkeSBmb3IgdXNlDQorICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIGlw NmlwNl90bmxfY2hhbmdlKCkgdXBkYXRlcyB0aGUgdHVubmVsIHBhcmFtZXRl cnMNCisgKiovDQorDQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfY2hhbmdl KHN0cnVjdCBpcDZfdG5sICp0LCBzdHJ1Y3QgaXA2X3RubF9wYXJtICpwKQ0K K3sNCisJaXB2Nl9hZGRyX2NvcHkoJnQtPnBhcm1zLmxhZGRyLCAmcC0+bGFk ZHIpOw0KKwlpcHY2X2FkZHJfY29weSgmdC0+cGFybXMucmFkZHIsICZwLT5y YWRkcik7DQorCXQtPnBhcm1zLmZsYWdzID0gcC0+ZmxhZ3M7DQorCXQtPnBh cm1zLmhvcF9saW1pdCA9IChwLT5ob3BfbGltaXQgPD0gMjU1ID8gcC0+aG9w X2xpbWl0IDogLTEpOw0KKwl0LT5wYXJtcy5lbmNhcF9saW1pdCA9IHAtPmVu Y2FwX2xpbWl0Ow0KKwl0LT5wYXJtcy5mbG93aW5mbyA9IHAtPmZsb3dpbmZv Ow0KKwlpcDZpcDZfdG5sX2xpbmtfY29uZmlnKHQpOw0KKwlyZXR1cm4gMDsN 
Cit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9pb2N0bCAtIGNvbmZpZ3Vy ZSBpcHY2IHR1bm5lbHMgZnJvbSB1c2Vyc3BhY2UgDQorICogICBAZGV2OiB2 aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICogICBA aWZyOiBwYXJhbWV0ZXJzIHBhc3NlZCBmcm9tIHVzZXJzcGFjZQ0KKyAqICAg QGNtZDogY29tbWFuZCB0byBiZSBwZXJmb3JtZWQNCisgKg0KKyAqIERlc2Ny aXB0aW9uOg0KKyAqICAgaXA2aXA2X3RubF9pb2N0bCgpIGlzIHVzZWQgZm9y IG1hbmFnaW5nIElQdjYgdHVubmVscyANCisgKiAgIGZyb20gdXNlcnNwYWNl LiANCisgKg0KKyAqICAgVGhlIHBvc3NpYmxlIGNvbW1hbmRzIGFyZSB0aGUg Zm9sbG93aW5nOg0KKyAqICAgICAlU0lPQ0dFVFRVTk5FTDogZ2V0IHR1bm5l bCBwYXJhbWV0ZXJzIGZvciBkZXZpY2UNCisgKiAgICAgJVNJT0NBRERUVU5O RUw6IGFkZCB0dW5uZWwgbWF0Y2hpbmcgZ2l2ZW4gdHVubmVsIHBhcmFtZXRl cnMNCisgKiAgICAgJVNJT0NDSEdUVU5ORUw6IGNoYW5nZSB0dW5uZWwgcGFy YW1ldGVycyB0byB0aG9zZSBnaXZlbg0KKyAqICAgICAlU0lPQ0RFTFRVTk5F TDogZGVsZXRlIHR1bm5lbA0KKyAqDQorICogICBUaGUgZmFsbGJhY2sgZGV2 aWNlICJpcDZ0bmwwIiwgY3JlYXRlZCBkdXJpbmcgbW9kdWxlIA0KKyAqICAg aW5pdGlhbGl6YXRpb24sIGNhbiBiZSB1c2VkIGZvciBjcmVhdGluZyBvdGhl ciB0dW5uZWwgZGV2aWNlcy4NCisgKg0KKyAqIFJldHVybjoNCisgKiAgIDAg b24gc3VjY2VzcywNCisgKiAgICUtRUZBVUxUIGlmIHVuYWJsZSB0byBjb3B5 IGRhdGEgdG8gb3IgZnJvbSB1c2Vyc3BhY2UsDQorICogICAlLUVQRVJNIGlm IGN1cnJlbnQgcHJvY2VzcyBoYXNuJ3QgJUNBUF9ORVRfQURNSU4gc2V0DQor ICogICAlLUVJTlZBTCBpZiBwYXNzZWQgdHVubmVsIHBhcmFtZXRlcnMgYXJl IGludmFsaWQsDQorICogICAlLUVFWElTVCBpZiBjaGFuZ2luZyBhIHR1bm5l bCdzIHBhcmFtZXRlcnMgd291bGQgY2F1c2UgYSBjb25mbGljdA0KKyAqICAg JS1FTk9ERVYgaWYgYXR0ZW1wdGluZyB0byBjaGFuZ2Ugb3IgZGVsZXRlIGEg bm9uZXhpc3RpbmcgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitp cDZpcDZfdG5sX2lvY3RsKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYsIHN0cnVj dCBpZnJlcSAqaWZyLCBpbnQgY21kKQ0KK3sNCisJaW50IGVyciA9IDA7DQor CWludCBjcmVhdGU7DQorCXN0cnVjdCBpcDZfdG5sX3Bhcm0gcDsNCisJc3Ry dWN0IGlwNl90bmwgKnQgPSBOVUxMOw0KKw0KKwlzd2l0Y2ggKGNtZCkgew0K KwljYXNlIFNJT0NHRVRUVU5ORUw6DQorCQlpZiAoZGV2ID09ICZpcDZpcDZf ZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21fdXNlcigmcCwNCisJ CQkJCSAgIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLA0KKwkJCQkJICAgc2l6 
ZW9mIChwKSkpIHsNCisJCQkJZXJyID0gLUVGQVVMVDsNCisJCQkJYnJlYWs7 DQorCQkJfQ0KKwkJCWlmICgoZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAs ICZ0LCAwKSkgPT0gLUVOT0RFVikNCisJCQkJdCA9IChzdHJ1Y3QgaXA2X3Ru bCAqKSBkZXYtPnByaXY7DQorCQkJZWxzZSBpZiAoZXJyKQ0KKwkJCQlicmVh azsNCisJCX0gZWxzZQ0KKwkJCXQgPSAoc3RydWN0IGlwNl90bmwgKikgZGV2 LT5wcml2Ow0KKw0KKwkJbWVtY3B5KCZwLCAmdC0+cGFybXMsIHNpemVvZiAo cCkpOw0KKwkJaWYgKGNvcHlfdG9fdXNlcihpZnItPmlmcl9pZnJ1LmlmcnVf ZGF0YSwgJnAsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsN CisJCX0NCisJCWJyZWFrOw0KKwljYXNlIFNJT0NBRERUVU5ORUw6DQorCWNh c2UgU0lPQ0NIR1RVTk5FTDoNCisJCWVyciA9IC1FUEVSTTsNCisJCWNyZWF0 ZSA9IChjbWQgPT0gU0lPQ0FERFRVTk5FTCk7DQorCQlpZiAoIWNhcGFibGUo Q0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorCQlpZiAoY29weV9mcm9t X3VzZXIoJnAsIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLCBzaXplb2YgKHAp KSkgew0KKwkJCWVyciA9IC1FRkFVTFQ7DQorCQkJYnJlYWs7DQorCQl9DQor CQlpZiAoIWNyZWF0ZSAmJiBkZXYgIT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7 DQorCQkJdCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9 DQorCQlpZiAoIXQgJiYgKGVyciA9IGlwNmlwNl90bmxfbG9jYXRlKCZwLCAm dCwgY3JlYXRlKSkpIHsNCisJCQlicmVhazsNCisJCX0NCisJCWlmIChjbWQg PT0gU0lPQ0NIR1RVTk5FTCkgew0KKwkJCWlmICh0LT5kZXYgIT0gZGV2KSB7 DQorCQkJCWVyciA9IC1FRVhJU1Q7DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJ CQlpcDZpcDZfdG5sX3VubGluayh0KTsNCisJCQllcnIgPSBpcDZpcDZfdG5s X2NoYW5nZSh0LCAmcCk7DQorCQkJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkJ CW5ldGRldl9zdGF0ZV9jaGFuZ2UoZGV2KTsNCisJCX0NCisJCWlmIChjb3B5 X3RvX3VzZXIoaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCSAmdC0+ cGFybXMsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJ CX0gZWxzZSB7DQorCQkJZXJyID0gMDsNCisJCX0NCisJCWJyZWFrOw0KKwlj YXNlIFNJT0NERUxUVU5ORUw6DQorCQllcnIgPSAtRVBFUk07DQorCQlpZiAo IWNhcGFibGUoQ0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorDQorCQlp ZiAoZGV2ID09ICZpcDZpcDZfZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5 X2Zyb21fdXNlcigmcCwgaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJ CQkgICBzaXplb2YgKHApKSkgew0KKwkJCQllcnIgPSAtRUZBVUxUOw0KKwkJ CQlicmVhazsNCisJCQl9DQorCQkJZXJyID0gaXA2aXA2X3RubF9sb2NhdGUo 
JnAsICZ0LCAwKTsNCisJCQlpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJCQlp ZiAodCA9PSAmaXA2aXA2X2ZiX3RubCkgew0KKwkJCQllcnIgPSAtRVBFUk07 DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJCX0gZWxzZSB7DQorCQkJdCA9IChz dHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQllcnIgPSBp cDZfdG5sX2Rlc3Ryb3kodCk7DQorCQlicmVhazsNCisJZGVmYXVsdDoNCisJ CWVyciA9IC1FSU5WQUw7DQorCX0NCisJcmV0dXJuIGVycjsNCit9DQorDQor LyoqDQorICogaXA2aXA2X3RubF9nZXRfc3RhdHMgLSByZXR1cm4gdGhlIHN0 YXRzIGZvciB0dW5uZWwgZGV2aWNlIA0KKyAqICAgQGRldjogdmlydHVhbCBk ZXZpY2UgYXNzb2NpYXRlZCB3aXRoIHR1bm5lbA0KKyAqDQorICogUmV0dXJu OiBzdGF0cyBmb3IgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBzdHJ1Y3Qg bmV0X2RldmljZV9zdGF0cyAqDQoraXA2aXA2X3RubF9nZXRfc3RhdHMoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCXJldHVybiAmKCgoc3RydWN0 IGlwNl90bmwgKikgZGV2LT5wcml2KS0+c3RhdCk7DQorfQ0KKw0KKy8qKg0K KyAqIGlwNmlwNl90bmxfY2hhbmdlX210dSAtIGNoYW5nZSBtdHUgbWFudWFs bHkgZm9yIHR1bm5lbCBkZXZpY2UNCisgKiAgIEBkZXY6IHZpcnR1YWwgZGV2 aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiAgIEBuZXdfbXR1OiB0 aGUgbmV3IG10dQ0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgMCBvbiBzdWNj ZXNzLA0KKyAqICAgJS1FSU5WQUwgaWYgbXR1IHRvbyBzbWFsbA0KKyAqKi8N CisNCitzdGF0aWMgaW50DQoraXA2aXA2X3RubF9jaGFuZ2VfbXR1KHN0cnVj dCBuZXRfZGV2aWNlICpkZXYsIGludCBuZXdfbXR1KQ0KK3sNCisJaWYgKG5l d19tdHUgPCBJUFY2X01JTl9NVFUpIHsNCisJCXJldHVybiAtRUlOVkFMOw0K Kwl9DQorCWRldi0+bXR1ID0gbmV3X210dTsNCisJcmV0dXJuIDA7DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfZGV2X2luaXRfZ2VuIC0gZ2VuZXJh bCBpbml0aWFsaXplciBmb3IgYWxsIHR1bm5lbCBkZXZpY2VzDQorICogICBA ZGV2OiB2aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQor ICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIFNldCBmdW5jdGlvbiBwb2lu dGVycyBhbmQgaW5pdGlhbGl6ZSB0aGUgJnN0cnVjdCBmbG93aSB0ZW1wbGF0 ZSB1c2VkDQorICogICBieSB0aGUgdHVubmVsLg0KKyAqKi8NCisNCitzdGF0 aWMgdm9pZA0KK2lwNmlwNl90bmxfZGV2X2luaXRfZ2VuKHN0cnVjdCBuZXRf ZGV2aWNlICpkZXYpDQorew0KKwlzdHJ1Y3QgaXA2X3RubCAqdCA9IChzdHJ1 Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCXN0cnVjdCBmbG93aSAqZmwg PSAmdC0+Zmw7DQorDQorCW1lbXNldChmbCwgMCwgc2l6ZW9mICgqZmwpKTsN 
CisJZmwtPnByb3RvID0gSVBQUk9UT19JUFY2Ow0KKw0KKwlkZXYtPmRlc3Ry dWN0b3IgPSBpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yOw0KKwlkZXYtPnVu aW5pdCA9IGlwNmlwNl90bmxfZGV2X3VuaW5pdDsNCisJZGV2LT5oYXJkX3N0 YXJ0X3htaXQgPSBpcDZpcDZfdG5sX3htaXQ7DQorCWRldi0+Z2V0X3N0YXRz ID0gaXA2aXA2X3RubF9nZXRfc3RhdHM7DQorCWRldi0+ZG9faW9jdGwgPSBp cDZpcDZfdG5sX2lvY3RsOw0KKwlkZXYtPmNoYW5nZV9tdHUgPSBpcDZpcDZf dG5sX2NoYW5nZV9tdHU7DQorCWRldi0+dHlwZSA9IEFSUEhSRF9UVU5ORUw2 Ow0KKwlkZXYtPmZsYWdzIHw9IElGRl9OT0FSUDsNCisJaWYgKGlwdjZfYWRk cl90eXBlKCZ0LT5wYXJtcy5yYWRkcikgJiBJUFY2X0FERFJfVU5JQ0FTVCAm Jg0KKwkgICAgaXB2Nl9hZGRyX3R5cGUoJnQtPnBhcm1zLmxhZGRyKSAmIElQ VjZfQUREUl9VTklDQVNUKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRP UE9JTlQ7DQorCS8qIEhtbS4uLiBNQVhfQUREUl9MRU4gaXMgOCwgc28gdGhl IGlwdjYgYWRkcmVzc2VzIGNhbid0IGJlIA0KKwkgICBjb3BpZWQgdG8gZGV2 LT5kZXZfYWRkciBhbmQgZGV2LT5icm9hZGNhc3QsIGxpa2UgdGhlIGlwdjQN CisJICAgYWRkcmVzc2VzIHdlcmUgaW4gaXBpcC5jLCBpcF9ncmUuYyBhbmQg c2l0LmMuICovDQorCWRldi0+YWRkcl9sZW4gPSAwOw0KK30NCisNCisvKioN CisgKiBpcDZpcDZfdG5sX2Rldl9pbml0IC0gaW5pdGlhbGl6ZXIgZm9yIGFs bCBub24gZmFsbGJhY2sgdHVubmVsIGRldmljZXMNCisgKiAgIEBkZXY6IHZp cnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiovDQor DQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5l dF9kZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0 cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJaXA2aXA2X3RubF9kZXZf aW5pdF9nZW4oZGV2KTsNCisJaXA2aXA2X3RubF9saW5rX2NvbmZpZyh0KTsN CisJcmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9mYl90bmxf ZGV2X2luaXQgLSBpbml0aWFsaXplciBmb3IgZmFsbGJhY2sgdHVubmVsIGRl dmljZQ0KKyAqICAgQGRldjogZmFsbGJhY2sgZGV2aWNlDQorICoNCisgKiBS ZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlwNl9mYl90bmxfZGV2X2lu aXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCWlwNmlwNl90bmxf ZGV2X2luaXRfZ2VuKGRldik7DQorCXRubHNfd2NbMF0gPSAmaXA2aXA2X2Zi X3RubDsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyBzdHJ1Y3QgaW5l dDZfcHJvdG9jb2wgaXA2aXA2X3Byb3RvY29sID0gew0KKwkuaGFuZGxlciA9 IGlwNmlwNl9yY3YsDQorCS5lcnJfaGFuZGxlciA9IGlwNmlwNl9lcnIsDQor 
CS5mbGFncyA9IElORVQ2X1BST1RPX0ZJTkFMDQorfTsNCisNCisvKioNCisg KiBpcDZfdHVubmVsX2luaXQgLSByZWdpc3RlciBwcm90b2NvbCBhbmQgcmVz ZXJ2ZSBuZWVkZWQgcmVzb3VyY2VzDQorICoNCisgKiBSZXR1cm46IDAgb24g c3VjY2Vzcw0KKyAqKi8NCisNCitpbnQgX19pbml0IGlwNl90dW5uZWxfaW5p dCh2b2lkKQ0KK3sNCisJaW50IGksIGosIGVycjsNCisJc3RydWN0IHNvY2sg KnNrOw0KKwlzdHJ1Y3QgaXB2Nl9waW5mbyAqbnA7DQorDQorCWlwNmlwNl9m Yl90bmxfZGV2LnByaXYgPSAodm9pZCAqKSAmaXA2aXA2X2ZiX3RubDsNCisN CisJZm9yIChpID0gMDsgaSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFj cHVfcG9zc2libGUoaSkpDQorCQkJY29udGludWU7DQorDQorCQllcnIgPSBz b2NrX2NyZWF0ZShQRl9JTkVUNiwgU09DS19SQVcsIElQUFJPVE9fSVBWNiwg DQorCQkJCSAgJl9faXA2X3NvY2tldFtpXSk7DQorCQlpZiAoZXJyIDwgMCkg ew0KKwkJCXByaW50ayhLRVJOX0VSUiANCisJCQkgICAgICAgIkZhaWxlZCB0 byBjcmVhdGUgdGhlIElQdjYgdHVubmVsIHNvY2tldCAiDQorCQkJICAgICAg ICIoZXJyICVkKS5cbiIsIA0KKwkJCSAgICAgICBlcnIpOw0KKwkJCWdvdG8g ZmFpbDsNCisJCX0NCisJCXNrID0gX19pcDZfc29ja2V0W2ldLT5zazsNCisJ CXNrLT5za19hbGxvY2F0aW9uID0gR0ZQX0FUT01JQzsNCisNCisJCW5wID0g aW5ldDZfc2soc2spOw0KKwkJbnAtPmhvcF9saW1pdCA9IDI1NTsNCisJCW5w LT5tY19sb29wID0gMDsNCisNCisJCXNrLT5za19wcm90LT51bmhhc2goc2sp Ow0KKwl9DQorCWlmICgoZXJyID0gaW5ldDZfYWRkX3Byb3RvY29sKCZpcDZp cDZfcHJvdG9jb2wsIElQUFJPVE9fSVBWNikpIDwgMCkgew0KKwkJcHJpbnRr KEtFUk5fRVJSICJGYWlsZWQgdG8gcmVnaXN0ZXIgSVB2NiBwcm90b2NvbFxu Iik7DQorCQlnb3RvIGZhaWw7DQorCX0NCisNCisJU0VUX01PRFVMRV9PV05F UigmaXA2aXA2X2ZiX3RubF9kZXYpOw0KKwlyZWdpc3Rlcl9uZXRkZXYoJmlw NmlwNl9mYl90bmxfZGV2KTsNCisNCisJcmV0dXJuIDA7DQorZmFpbDoNCisJ Zm9yIChqID0gMDsgaiA8IGk7IGorKykgew0KKwkJaWYgKCFjcHVfcG9zc2li bGUoaikpDQorCQkJY29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZf c29ja2V0W2pdKTsNCisJCV9faXA2X3NvY2tldFtqXSA9IE5VTEw7DQorCX0N CisJcmV0dXJuIGVycjsNCit9DQorDQorLyoqDQorICogaXA2X3R1bm5lbF9j bGVhbnVwIC0gZnJlZSByZXNvdXJjZXMgYW5kIHVucmVnaXN0ZXIgcHJvdG9j b2wNCisgKiovDQorDQordm9pZCBpcDZfdHVubmVsX2NsZWFudXAodm9pZCkN Cit7DQorCWludCBpOw0KKw0KKwl1bnJlZ2lzdGVyX25ldGRldigmaXA2aXA2 X2ZiX3RubF9kZXYpOw0KKw0KKwlpbmV0Nl9kZWxfcHJvdG9jb2woJmlwNmlw 
Nl9wcm90b2NvbCwgSVBQUk9UT19JUFY2KTsNCisNCisJZm9yIChpID0gMDsg aSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaSkp DQorCQkJY29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0 W2ldKTsNCisJCV9faXA2X3NvY2tldFtpXSA9IE5VTEw7DQorCX0NCit9DQor DQorI2lmZGVmIE1PRFVMRQ0KK21vZHVsZV9pbml0KGlwNl90dW5uZWxfaW5p dCk7DQorbW9kdWxlX2V4aXQoaXA2X3R1bm5lbF9jbGVhbnVwKTsNCisjZW5k aWYNCmRpZmYgLU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2Vl cGVyIC0tZXhjbHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L2lw djZfc3ltcy5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYw0KLS0t IGxpbnV4LTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYwlNb24gSnVuICA5IDA5 OjExOjI1IDIwMDMNCisrKyBtZXJnZS0yLjUvbmV0L2lwdjYvaXB2Nl9zeW1z LmMJTW9uIEp1biAgOSAxMDozNjo0NiAyMDAzDQpAQCAtMzgsMyArMzgsOSBA QA0KIEVYUE9SVF9TWU1CT0woaXA2X2ZpbmRfMXN0ZnJhZ29wdCk7DQogRVhQ T1JUX1NZTUJPTCh4ZnJtNl9yY3YpOw0KIEVYUE9SVF9TWU1CT0woeGZybTZf Y2xlYXJfbXV0YWJsZV9vcHRpb25zKTsNCitFWFBPUlRfU1lNQk9MKHJ0Nl9s b29rdXApOw0KK0VYUE9SVF9TWU1CT0woZmw2X3NvY2tfbG9va3VwKTsNCitF WFBPUlRfU1lNQk9MKGlwdjZfZXh0X2hkcik7DQorRVhQT1JUX1NZTUJPTChp cDZfYXBwZW5kX2RhdGEpOw0KK0VYUE9SVF9TWU1CT0woaXA2X2ZsdXNoX3Bl bmRpbmdfZnJhbWVzKTsNCitFWFBPUlRfU1lNQk9MKGlwNl9wdXNoX3BlbmRp bmdfZnJhbWVzKTsNCmRpZmYgLU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1 ZGU9Qml0S2VlcGVyIC0tZXhjbHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25l dC9uZXRzeW1zLmMgbWVyZ2UtMi41L25ldC9uZXRzeW1zLmMNCi0tLSBsaW51 eC0yLjUvbmV0L25ldHN5bXMuYwlNb24gSnVuICA5IDA4OjQ1OjU3IDIwMDMN CisrKyBtZXJnZS0yLjUvbmV0L25ldHN5bXMuYwlNb24gSnVuICA5IDEwOjM4 OjE2IDIwMDMNCkBAIC00NzcsOCArNDc3LDEwIEBADQogRVhQT1JUX1NZTUJP TChzeXNjdGxfbWF4X3N5bl9iYWNrbG9nKTsNCiAjZW5kaWYNCiANCi1FWFBP UlRfU1lNQk9MKGlwX2dlbmVyaWNfZ2V0ZnJhZyk7DQorI2VuZGlmDQogDQor I2lmIGRlZmluZWQgKENPTkZJR19JUFY2X01PRFVMRSkgfHwgZGVmaW5lZCAo Q09ORklHX0lQX1NDVFBfTU9EVUxFKSB8fCBkZWZpbmVkIChDT05GSUdfSVBW Nl9UVU5ORUxfTU9EVUxFKQ0KK0VYUE9SVF9TWU1CT0woaXBfZ2VuZXJpY19n ZXRmcmFnKTsNCiAjZW5kaWYNCiANCiBFWFBPUlRfU1lNQk9MKHRjcF9yZWFk X3NvY2spOw0K ---377318441-1375410448-1055158991=:13811-- From davem@redhat.com Mon Jun 9 04:58:53 2003 
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:58:58 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Bwr2x023287 for ; Mon, 9 Jun 2003 04:58:53 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id EAA18251; Mon, 9 Jun 2003 04:55:48 -0700
Date: Mon, 09 Jun 2003 04:55:47 -0700 (PDT)
Message-Id: <20030609.045547.91327851.davem@redhat.com>
To: hadi@shell.cyberus.ca
Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609072227.R34462@shell.cyberus.ca>
References: <000401c32e5e$a707b6d0$4a00000a@badass> <20030609072227.R34462@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2996
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jamal Hadi
   Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT)

   Yes, you have a nice setup and that's why you should test all the patches
   DaveM is posting. Dave, Paul is running in a real ISP environment; I think
   he is very valuable in helping to test these patches and collect
   any stats that might be needed. Now watch him disappear ;->

If he doesn't test my patches he isn't very useful,
so we'll see :-)

   Additional thought Dave: I think prefetching the rth would help in 2.5 at
   least when you have lotsa collisions.
   call prefetch(nextrth) right after smp_read_barrier_depends() everywhere
   in route.c

You're going to prefetch "nextrth" when the first thing we're
going to access is "&nextrth->fl"?
:-)

It only makes sense to prefetch the 'fl' member of the first hash
chain entry and that's what I've done in my tree. This points out
that it would make sense to put the struct flowi up into the dst
entry.

From hch@lst.de Mon Jun 9 05:06:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:06:18 -0700 (PDT)
Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59C662x023725 for ; Mon, 9 Jun 2003 05:06:07 -0700
Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h59C64DC031415 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Mon, 9 Jun 2003 14:06:04 +0200
Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h59C63tA031413 for netdev@oss.sgi.com; Mon, 9 Jun 2003 14:06:03 +0200
Date: Mon, 9 Jun 2003 14:06:03 +0200
From: Christoph Hellwig
To: netdev@oss.sgi.com
Subject: [PATCH] switch skfp over to initcalls
Message-ID: <20030609120603.GA31393@lst.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.28i
X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT
X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang)
X-archive-position: 2997
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hch@lst.de
Precedence: bulk
X-list: netdev

This is a PCI driver and has no business in Space.c.
Also allows to kill all the fddi code in there (and the stale reference to the long gone apfddi driver) --- 1.20/drivers/net/Space.c Wed May 21 03:56:26 2003 +++ edited/drivers/net/Space.c Tue Jun 3 22:17:09 2003 @@ -105,9 +105,6 @@ /* Detachable devices ("pocket adaptors") */ extern int de620_probe(struct net_device *); -/* FDDI adapters */ -extern int skfp_probe(struct net_device *dev); - /* Fibre Channel adapters */ extern int iph5526_probe(struct net_device *dev); @@ -401,29 +398,6 @@ return -ENODEV; } -#ifdef CONFIG_FDDI -static int __init fddiif_probe(struct net_device *dev) -{ - unsigned long base_addr = dev->base_addr; - - if (base_addr == 1) - return 1; /* ENXIO */ - - if (1 -#ifdef CONFIG_APFDDI - && apfddi_init(dev) -#endif -#ifdef CONFIG_SKFP - && skfp_probe(dev) -#endif - && 1 ) { - return 1; /* -ENODEV or -EAGAIN would be more accurate. */ - } - return 0; -} -#endif - - #ifdef CONFIG_NET_FC static int fcif_probe(struct net_device *dev) { @@ -614,52 +588,6 @@ #define NEXT_DEV (&tr0_dev) #endif - -#ifdef CONFIG_FDDI -static struct net_device fddi7_dev = { - .name = "fddi7", - .next = NEXT_DEV, - .init = fddiif_probe -}; -static struct net_device fddi6_dev = { - .name = "fddi6", - .next = &fddi7_dev, - .init = fddiif_probe -}; -static struct net_device fddi5_dev = { - .name = "fddi5", - .next = &fddi6_dev, - .init = fddiif_probe -}; -static struct net_device fddi4_dev = { - .name = "fddi4", - .next = &fddi5_dev, - .init = fddiif_probe -}; -static struct net_device fddi3_dev = { - .name = "fddi3", - .next = &fddi4_dev, - .init = fddiif_probe -}; -static struct net_device fddi2_dev = { - .name = "fddi2", - .next = &fddi3_dev, - .init = fddiif_probe -}; -static struct net_device fddi1_dev = { - .name = "fddi1", - .next = &fddi2_dev, - .init = fddiif_probe -}; -static struct net_device fddi0_dev = { - .name = "fddi0", - .next = &fddi1_dev, - .init = fddiif_probe -}; -#undef NEXT_DEV -#define NEXT_DEV (&fddi0_dev) -#endif - #ifdef CONFIG_NET_FC static struct 
net_device fc1_dev = { --- 1.12/drivers/net/skfp/skfddi.c Fri May 9 02:40:17 2003 +++ edited/drivers/net/skfp/skfddi.c Tue Jun 3 22:19:04 2003 @@ -2539,72 +2539,25 @@ } // drv_reset_indication - -//--------------- functions for use as a module ---------------- - -#ifdef MODULE -/************************ - * - * Note now that module autoprobing is allowed under PCI. The - * IRQ lines will not be auto-detected; instead I'll rely on the BIOSes - * to "do the right thing". - * - ************************/ -#define LP(a) ((struct s_smc*)(a)) static struct net_device *mdev; -/************************ - * - * init_module - * - * If compiled as a module, find - * adapters and initialize them. - * - ************************/ -int init_module(void) +static int __init skfd_init(void) { struct net_device *p; - PRINTK(KERN_INFO "FDDI init module\n"); if ((mdev = insert_device(NULL, skfp_probe)) == NULL) return -ENOMEM; - for (p = mdev; p != NULL; p = LP(p->priv)->os.next_module) { - PRINTK(KERN_INFO "device to register: %s\n", p->name); + for (p = mdev; p != NULL; p = ((struct s_smc *)p->priv)->os.next_module) { if (register_netdev(p) != 0) { printk("skfddi init_module failed\n"); return -EIO; } } - PRINTK(KERN_INFO "+++++ exit with success +++++\n"); return 0; -} // init_module +} -/************************ - * - * cleanup_module - * - * Release all resources claimed by this module. - * - ************************/ -void cleanup_module(void) -{ - PRINTK(KERN_INFO "cleanup_module\n"); - while (mdev != NULL) { - mdev = unlink_modules(mdev); - } - return; -} // cleanup_module - - -/************************ - * - * unlink_modules - * - * Unregister devices and release their memory. 
- * - ************************/ static struct net_device *unlink_modules(struct net_device *p) { struct net_device *next = NULL; @@ -2638,5 +2591,11 @@ return next; } // unlink_modules +static void __exit skfd_exit(void) +{ + while (mdev) + mdev = unlink_modules(mdev); +} -#endif /* MODULE */ +module_init(skfd_init); +module_exit(skfd_exit); From hadi@shell.cyberus.ca Mon Jun 9 05:19:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:19:24 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59CJE2x024134 for ; Mon, 9 Jun 2003 05:19:15 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PLbu-0008zh-Lh; Mon, 09 Jun 2003 08:18:50 -0400 Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT) From: Jamal Hadi To: "David S. Miller" cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609.045547.91327851.davem@redhat.com> Message-ID: <20030609080430.I34540@shell.cyberus.ca> References: <000401c32e5e$a707b6d0$4a00000a@badass> <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2998 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, David S. Miller wrote: > From: Jamal Hadi > Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT) > > Yes, you have a nice setup and thats why you should test all the patches > DaveM is posting. Dave, Paul is running in a real ISP environment i think > he is very valuable in helping to test these patches and collect > any says that might be needed. 
Now watch him disappear ;->
>
> If he doesn't test my patches he isn't very useful,
> so we'll see :-)

Ok foo, the pressure is on you now ;-> You wanna see things fixed, then run
the damn tests or stop bitching ;->

> You're going to prefetch "nextrth" when the first thing we're
> going to access is "&nextrth->fl"? :-)
>
> It only makes sense to prefetch the 'fl' member of the first hash
> chain entry and that's what I've done in my tree. This points out
> that it would make sense to put the struct flowi up into the dst
> entry.

Yes, moving the flowi up makes more sense.
I found in my tests with an Ethernet driver that prefetching the _next_
dma descriptor gave better numbers than prefetching the current one, but
I didn't spend too much time. I am going to revisit this.
Good thought on rearranging the structure; it may help with the descriptors
as well.

cheers,
jamal

From davem@redhat.com Mon Jun 9 05:35:32 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:35:40 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59CZT2x024615 for ; Mon, 9 Jun 2003 05:35:31 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA18415; Mon, 9 Jun 2003 05:32:19 -0700
Date: Mon, 09 Jun 2003 05:32:18 -0700 (PDT)
Message-Id: <20030609.053218.54202815.davem@redhat.com>
To: hadi@shell.cyberus.ca
Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609080430.I34540@shell.cyberus.ca>
References: <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> <20030609080430.I34540@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2999
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jamal Hadi
   Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT)

   I found in my tests with an Ethernet driver that prefetching the
   _next_ dma descriptor gave better numbers than prefetching the
   current one, but I didn't spend too much time.

Two issues:

1) We have some cycles to borrow for head entry, we can make
   prefetch right before rcu_read_lock()

2) Ideally, hash chains will not exceed 1 (2 at the max)
   entries.

Just some thinking...

From nakam@linux-ipv6.org Mon Jun 9 05:39:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:39:47 -0700 (PDT)
Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Cdb2x024938 for ; Mon, 9 Jun 2003 05:39:38 -0700
Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19PKxT-00017F-00; Mon, 09 Jun 2003 20:37:03 +0900
From: Masahide NAKAMURA
To: Henrik Petander
Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , , , ,
Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6
Message-Id: <20030609203659.089b241b.nakam@linux-ipv6.org>
In-Reply-To:
References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org>
Organization: USAGI Project
X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu)
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Date: Mon, 09 Jun 2003 20:37:03 +0900
X-archive-position: 3000
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: nakam@linux-ipv6.org
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003 12:06:35 +0300 (EEST)
Henrik Petander wrote:

> On Fri, 6 Jun 2003, Masahide NAKAMURA wrote:
> >
> > We don't think we have to change the logic handling policy with
> > the reason because we can treat MIPv6 policy just like IPsec.
> >
> > When we want to apply both MIPv6 and IPsec to the same target,
> > we need one policy that has two or more of templates(e.g. one is
> > MIPv6's template and the other is IPsec's).
>
> Does this also mean that the IPSec and MIPv6 policies and SAs need to be
> configured at the same time or is it possible to add templates to an
> existing policy?

Currently there is no interface to add templates directly to it.

:
> A different issue related to the different addresses is that the SPD
> lookup should be done with the original source address, i.e. home address,
> if home address option is used and with the final destination address, if
> routing header is used. SPD lookup works now for TCP (with RT header), but
> not for raw sockets, which the mipv6 daemon will use. We will provide a
> patch for fixing the SPD lookups with raw sockets, which add routing
> header and home address option from socket options.
>

Ok, I want to see your patch when it is provided, because right now it is
not clear to me how the socket option would be used in the above case.
Regards,

--
Masahide NAKAMURA

From ralph@istop.com Mon Jun 9 06:04:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:04:27 -0700 (PDT)
Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59D472x026506 for ; Mon, 9 Jun 2003 06:04:18 -0700
Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 4393636AEB; Mon, 9 Jun 2003 09:04:06 -0400 (EDT)
Date: Mon, 9 Jun 2003 09:04:06 -0400 (EDT)
From: Ralph Doncaster
Reply-To: ralph+d@istop.com
To: Jamal Hadi
Cc: CIT/Paul , "'Simon Kirby'" , "'Florian Weimer'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: RE: Route cache performance under stress
In-Reply-To: <20030608230300.X33412@shell.cyberus.ca>
Message-ID:
References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3001
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ralph@istop.com
Precedence: bulk
X-list: netdev

On Sun, 8 Jun 2003, Jamal Hadi wrote:

> I am sure there are people who will like to sell you linux devices
> at half the cisco prices doing Millions of PPS via hardware assists.
> Support these linux supporting companies instead ;->

Are you serious? Who is making these boxes?

-Ralph

From hadi@shell.cyberus.ca Mon Jun 9 06:22:37 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:22:43 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59DMa2x028671 for ; Mon, 9 Jun 2003 06:22:37 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PMbD-00091q-Ju; Mon, 09 Jun 2003 09:22:11 -0400
Date: Mon, 9 Jun 2003 09:22:11 -0400 (EDT)
From: Jamal Hadi
To: "David S. Miller"
cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-Reply-To: <20030609.053218.54202815.davem@redhat.com>
Message-ID: <20030609091907.Y34702@shell.cyberus.ca>
References: <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> <20030609080430.I34540@shell.cyberus.ca> <20030609.053218.54202815.davem@redhat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3002
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003, David S. Miller wrote:

>    From: Jamal Hadi
>    Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT)
>
>    I found in my tests with an Ethernet driver that prefetching the
>    _next_ dma descriptor gave better numbers than prefetching the
>    current one, but I didn't spend too much time.
>
> Two issues:
>
> 1) We have some cycles to borrow for head entry, we can make
>    prefetch right before rcu_read_lock()
>
> 2) Ideally, hash chains will not exceed 1 (2 at the max)
>    entries.
>

I don't think you'll see much benefit with 1 or 2 entries.
I was thinking more along the lines of people with over 100K entries total.
Let me run with this and get back to you.

cheers,
jamal

From davem@redhat.com Mon Jun 9 06:25:23 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:25:27 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59DPM2x029047 for ; Mon, 9 Jun 2003 06:25:23 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id GAA18572; Mon, 9 Jun 2003 06:22:17 -0700
Date: Mon, 09 Jun 2003 06:22:17 -0700 (PDT)
Message-Id: <20030609.062217.48383829.davem@redhat.com>
To: hadi@shell.cyberus.ca
Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609091907.Y34702@shell.cyberus.ca>
References: <20030609080430.I34540@shell.cyberus.ca> <20030609.053218.54202815.davem@redhat.com> <20030609091907.Y34702@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3003
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jamal Hadi
   Date: Mon, 9 Jun 2003 09:22:11 -0400 (EDT)

   I don't think you'll see much benefit with 1 or 2 entries.
   I was thinking more along the lines of people with over 100K entries total.

You simply don't want the chains to get that long.

In my experience, even with prefetching tricks, once hash chains
are more than 2 or 3 entries deep you run into serious problems.

TCP has the same issue BTW, in fact DoS-like behavior is the
common thing there. Every time you create a new TCP connection
on a server it's exactly like a routing cache miss.

   Let me run with this and get back to you.

Ok.
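
[Archive annotation] The prefetch point argued in this thread (warm the cache line holding the lookup key of the head chain entry, because the key is the first thing the walk compares) can be sketched in user-space C. This is only an illustration under stated assumptions: `struct flow_key`, `struct cache_entry`, `hash_key()`, `cache_insert()` and `cache_lookup()` are hypothetical names, the hash is a toy rather than the kernel's rt_hash_code(), there is no RCU or locking, and `__builtin_prefetch` is the GCC/Clang builtin standing in for the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256u	/* hypothetical size; must be a power of two */

/* Hypothetical lookup key, standing in for the kernel's struct flowi. */
struct flow_key {
	uint32_t dst;
	uint32_t src;
	uint8_t  tos;
};

/* Hypothetical cache entry; "key" plays the role of rth->fl. */
struct cache_entry {
	struct cache_entry *next;
	struct flow_key     key;
	unsigned int        use;	/* hit counter */
};

static struct cache_entry *table[TABLE_SIZE];

/* Toy hash for illustration only, not the kernel's rt_hash_code(). */
static unsigned int hash_key(const struct flow_key *k)
{
	uint32_t h = k->dst ^ k->src ^ k->tos;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & (TABLE_SIZE - 1);
}

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->dst == b->dst && a->src == b->src && a->tos == b->tos;
}

/* Push an entry onto the head of its hash chain. */
static void cache_insert(struct cache_entry *e)
{
	unsigned int h;

	assert(e != NULL);
	h = hash_key(&e->key);
	e->next = table[h];
	table[h] = e;
}

/* Look up a key, prefetching the head entry's key before the chain walk. */
static struct cache_entry *cache_lookup(const struct flow_key *k)
{
	struct cache_entry *e = table[hash_key(k)];

	/* The key is the first field compared, so that is what we prefetch. */
	if (e)
		__builtin_prefetch(&e->key);

	for (; e; e = e->next) {
		if (key_equal(&e->key, k)) {
			e->use++;
			return e;
		}
	}
	return NULL;
}
```

With chains kept to one or two entries, as suggested above, only the head prefetch can hide any latency; prefetching further entries would need a longer chain to pay off.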

From ralph@istop.com Mon Jun 9 07:06:25 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:06:31 -0700 (PDT)
Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59E6E2x030168 for ; Mon, 9 Jun 2003 07:06:15 -0700
Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 1D94736952; Mon, 9 Jun 2003 09:28:59 -0400 (EDT)
Date: Mon, 9 Jun 2003 09:28:59 -0400 (EDT)
From: Ralph Doncaster
Reply-To: ralph+d@istop.com
To: Simon Kirby
Cc: CIT/Paul , "'Florian Weimer'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Re: Route cache performance under stress
In-Reply-To: <20030609064719.GA20613@netnation.com>
Message-ID:
References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3004
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ralph@istop.com
Precedence: bulk
X-list: netdev

On Sun, 8 Jun 2003, Simon Kirby wrote:

> You got a 7200 VXR to do 300kpps? I would have liked to see that.
> We couldn't get our 7206 VXR routers to do anything more than about 12
> Mbit/second of small packets, which I believe is about 40,000 packets
> per second. This is with CEF disabled, because it ended up duplicating
> packets and doing some other strange things with CEF enabled.

The trick is finding the good IOS revs. 12.0(7)T and 12.2(11)T have been
good ones for me. Finding other ISPs running ciscos to exchange tips and
ideas has been much easier than finding folks running linux. A sure-fire
way to get flamed is to post to NANOG asking what's the best Linux router
setup!

For most ISPs it's better to spend $20K on a 7206VXR/NPE-G1 than to spend
days trying to figure out what kernel + patch set, NIC, and motherboard
combination will squeeze the best performance out of a PC router. And
once you've done that you still have zebra quirks to worry about...

-Ralph

From davem@redhat.com Mon Jun 9 07:17:58 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:18:06 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59EHv2x030561 for ; Mon, 9 Jun 2003 07:17:58 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA18814; Mon, 9 Jun 2003 07:14:52 -0700
Date: Mon, 09 Jun 2003 07:14:51 -0700 (PDT)
Message-Id: <20030609.071451.108794109.davem@redhat.com>
To: sim@netnation.com
Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, hadi@shell.cyberus.ca, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609081803.GF20613@netnation.com>
References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3005
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

Ok Simon/Robert/Mr.Foo :), give this a try, it's my final installment
for the evening :-)

If this shows improvement, we can make even larger strides by moving
the struct flowi up into struct dst_entry.
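
[Archive annotation] The chain-shrinking idea in the patch that follows (when the cache is over its threshold, evict the least-used entry that nobody holds a reference to from the chain about to receive a new entry) can be sketched outside the kernel roughly like this. It is a minimal sketch under stated assumptions: `struct entry` and `chain_shrink()` are hypothetical names, the reference count is a plain int rather than an atomic, and the caller is assumed to hold whatever lock protects the chain and to free the returned entry itself.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical chain entry; refcnt and use mirror dst.__refcnt and dst.__use. */
struct entry {
	struct entry *next;
	int           refcnt;	/* entries still referenced are never evicted */
	unsigned int  use;	/* hit counter */
};

/*
 * Unlink and return the unreferenced entry with the smallest use count,
 * or NULL if every entry on the chain is still referenced.
 * The pointer-to-pointer walk lets us unlink without tracking a "prev" node.
 */
static struct entry *chain_shrink(struct entry **head)
{
	struct entry *e, **ep;
	struct entry *cand = NULL, **candp = NULL;
	unsigned int min_use = UINT_MAX;

	assert(head != NULL);
	for (ep = head; (e = *ep) != NULL; ep = &e->next) {
		if (e->refcnt == 0 && e->use < min_use) {
			cand = e;
			candp = ep;
			min_use = e->use;
		}
	}
	if (cand) {
		*candp = cand->next;	/* splice the candidate out of the chain */
		cand->next = NULL;
	}
	return cand;
}
```

Because the comparison is strict, an entry whose counter has reached UINT_MAX is never selected; the patch's own loop has the same property with its `min_use` starting at all ones.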
--- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003 +++ net/core/dst.c Mon Jun 9 03:13:56 2003 @@ -122,13 +122,34 @@ void * dst_alloc(struct dst_ops * ops) dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); if (!dst) return NULL; - memset(dst, 0, ops->entry_size); + dst->next = NULL; atomic_set(&dst->__refcnt, 0); - dst->ops = ops; + dst->__use = 0; + dst->child = NULL; + dst->dev = NULL; + dst->obsolete = 0; + dst->flags = 0; dst->lastuse = jiffies; + dst->expires = 0; + dst->header_len = 0; + dst->trailer_len = 0; + memset(dst->metrics, 0, sizeof(dst->metrics)); dst->path = dst; + dst->rate_last = 0; + dst->rate_tokens = 0; + dst->error = 0; + dst->neighbour = NULL; + dst->hh = NULL; + dst->xfrm = NULL; dst->input = dst_discard; dst->output = dst_blackhole; +#ifdef CONFIG_NET_CLS_ROUTE + dst->tclassid = 0; +#endif + dst->ops = ops; + INIT_RCU_HEAD(&dst->rcu_head); + memset(dst->info, 0, + ops->entry_size - offsetof(struct dst_entry, info)); #if RT_CACHE_DEBUG >= 2 atomic_inc(&dst_total); #endif --- net/ipv4/route.c.~1~ Sun Jun 8 23:28:00 2003 +++ net/ipv4/route.c Mon Jun 9 06:49:15 2003 @@ -88,6 +88,7 @@ #include #include #include +#include #include #include #include @@ -882,6 +883,60 @@ static void rt_del(unsigned hash, struct spin_unlock_bh(&rt_hash_table[hash].lock); } +static void __rt_hash_shrink(unsigned int hash) +{ + struct rtable *rth, **rthp; + struct rtable *cand, **candp; + unsigned int min_use = ~(unsigned int) 0; + + spin_lock_bh(&rt_hash_table[hash].lock); + cand = NULL; + candp = NULL; + rthp = &rt_hash_table[hash].chain; + while ((rth = *rthp) != NULL) { + if (!atomic_read(&rth->u.dst.__refcnt) && + ((unsigned int) rth->u.dst.__use) < min_use) { + cand = rth; + candp = rthp; + min_use = rth->u.dst.__use; + } + rthp = &rth->u.rt_next; + } + if (cand) { + *candp = cand->u.rt_next; + rt_free(cand); + } + + spin_unlock_bh(&rt_hash_table[hash].lock); +} + +static inline struct rtable *ip_rt_dst_alloc(unsigned int hash) +{ + if 
(atomic_read(&ipv4_dst_ops.entries) > + ipv4_dst_ops.gc_thresh) + __rt_hash_shrink(hash); + + return dst_alloc(&ipv4_dst_ops); +} + +static void ip_rt_copy(struct rtable *rt, struct rtable *old) +{ + memcpy(rt, old, sizeof(*rt)); + + INIT_RCU_HEAD(&rt->u.dst.rcu_head); + rt->u.dst.__use = 1; + atomic_set(&rt->u.dst.__refcnt, 1); + rt->u.dst.child = NULL; + if (rt->u.dst.dev) + dev_hold(rt->u.dst.dev); + rt->u.dst.obsolete = 0; + rt->u.dst.lastuse = jiffies; + rt->u.dst.path = &rt->u.dst; + rt->u.dst.neighbour = NULL; + rt->u.dst.hh = NULL; + rt->u.dst.xfrm = NULL; +} + void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, u32 saddr, u8 tos, struct net_device *dev) { @@ -912,9 +967,10 @@ void ip_rt_redirect(u32 old_gw, u32 dadd for (i = 0; i < 2; i++) { for (k = 0; k < 2; k++) { - unsigned hash = rt_hash_code(daddr, - skeys[i] ^ (ikeys[k] << 5), - tos); + unsigned int hash = rt_hash_code(daddr, + skeys[i] ^ + (ikeys[k] << 5), + tos); rthp=&rt_hash_table[hash].chain; @@ -942,7 +998,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd dst_hold(&rth->u.dst); rcu_read_unlock(); - rt = dst_alloc(&ipv4_dst_ops); + rt = ip_rt_dst_alloc(hash); if (rt == NULL) { ip_rt_put(rth); in_dev_put(in_dev); @@ -950,19 +1006,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd } /* Copy all the information. 
*/ - *rt = *rth; - INIT_RCU_HEAD(&rt->u.dst.rcu_head); - rt->u.dst.__use = 1; - atomic_set(&rt->u.dst.__refcnt, 1); - rt->u.dst.child = NULL; - if (rt->u.dst.dev) - dev_hold(rt->u.dst.dev); - rt->u.dst.obsolete = 0; - rt->u.dst.lastuse = jiffies; - rt->u.dst.path = &rt->u.dst; - rt->u.dst.neighbour = NULL; - rt->u.dst.hh = NULL; - rt->u.dst.xfrm = NULL; + ip_rt_copy(rt, rth); rt->rt_flags |= RTCF_REDIRECTED; @@ -1352,7 +1396,7 @@ static void rt_set_nexthop(struct rtable static int ip_route_input_mc(struct sk_buff *skb, u32 daddr, u32 saddr, u8 tos, struct net_device *dev, int our) { - unsigned hash; + unsigned int hash; struct rtable *rth; u32 spec_dst; struct in_device *in_dev = in_dev_get(dev); @@ -1375,7 +1419,9 @@ static int ip_route_input_mc(struct sk_b dev, &spec_dst, &itag) < 0) goto e_inval; - rth = dst_alloc(&ipv4_dst_ops); + hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos); + + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; @@ -1421,7 +1467,6 @@ static int ip_route_input_mc(struct sk_b RT_CACHE_STAT_INC(in_slow_mc); in_dev_put(in_dev); - hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos); return rt_intern_hash(hash, rth, (struct rtable**) &skb->dst); e_nobufs: @@ -1584,45 +1629,42 @@ int ip_route_input_slow(struct sk_buff * goto e_inval; } - rth = dst_alloc(&ipv4_dst_ops); + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; atomic_set(&rth->u.dst.__refcnt, 1); - rth->u.dst.flags= DST_HOST; - if (in_dev->cnf.no_policy) - rth->u.dst.flags |= DST_NOPOLICY; - if (in_dev->cnf.no_xfrm) - rth->u.dst.flags |= DST_NOXFRM; - rth->fl.fl4_dst = daddr; + rth->u.dst.dev = out_dev->dev; + dev_hold(out_dev->dev); + rth->u.dst.flags= (DST_HOST | + (in_dev->cnf.no_policy ? DST_NOPOLICY : 0) | + (in_dev->cnf.no_xfrm ? 
DST_NOXFRM : 0)); + rth->u.dst.input = ip_forward; + rth->u.dst.output = ip_output; + + rth->rt_flags = flags; + rth->rt_src = saddr; rth->rt_dst = daddr; - rth->fl.fl4_tos = tos; + rth->rt_iif = dev->ifindex; + rth->rt_gateway = daddr; + + rth->fl.iif = dev->ifindex; + rth->fl.fl4_dst = daddr; + rth->fl.fl4_src = saddr; #ifdef CONFIG_IP_ROUTE_FWMARK rth->fl.fl4_fwmark= skb->nfmark; #endif - rth->fl.fl4_src = saddr; - rth->rt_src = saddr; - rth->rt_gateway = daddr; + rth->fl.fl4_tos = tos; + rth->rt_spec_dst= spec_dst; #ifdef CONFIG_IP_ROUTE_NAT rth->rt_src_map = fl.fl4_src; rth->rt_dst_map = fl.fl4_dst; - if (flags&RTCF_DNAT) + if (flags & RTCF_DNAT) rth->rt_gateway = fl.fl4_dst; #endif - rth->rt_iif = - rth->fl.iif = dev->ifindex; - rth->u.dst.dev = out_dev->dev; - dev_hold(rth->u.dst.dev); - rth->fl.oif = 0; - rth->rt_spec_dst= spec_dst; - - rth->u.dst.input = ip_forward; - rth->u.dst.output = ip_output; rt_set_nexthop(rth, &res, itag); - rth->rt_flags = flags; - #ifdef CONFIG_NET_FASTROUTE if (netdev_fastroute && !(flags&(RTCF_NAT|RTCF_MASQ|RTCF_DOREDIRECT))) { struct net_device *odev = rth->u.dst.dev; @@ -1663,45 +1705,45 @@ brd_input: RT_CACHE_STAT_INC(in_brd); local_input: - rth = dst_alloc(&ipv4_dst_ops); + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; + atomic_set(&rth->u.dst.__refcnt, 1); + rth->u.dst.dev = &loopback_dev; + dev_hold(&loopback_dev); + rth->u.dst.flags= (DST_HOST | + (in_dev->cnf.no_policy ? 
DST_NOPOLICY : 0)); + rth->u.dst.input= ip_local_deliver; rth->u.dst.output= ip_rt_bug; +#ifdef CONFIG_NET_CLS_ROUTE + rth->u.dst.tclassid = itag; +#endif - atomic_set(&rth->u.dst.__refcnt, 1); - rth->u.dst.flags= DST_HOST; - if (in_dev->cnf.no_policy) - rth->u.dst.flags |= DST_NOPOLICY; - rth->fl.fl4_dst = daddr; + rth->rt_flags = flags|RTCF_LOCAL; + rth->rt_type = res.type; + rth->rt_src = saddr; rth->rt_dst = daddr; - rth->fl.fl4_tos = tos; + rth->rt_iif = dev->ifindex; + rth->rt_gateway = daddr; + + rth->fl.iif = dev->ifindex; + rth->fl.fl4_dst = daddr; + rth->fl.fl4_src = saddr; #ifdef CONFIG_IP_ROUTE_FWMARK rth->fl.fl4_fwmark= skb->nfmark; #endif - rth->fl.fl4_src = saddr; - rth->rt_src = saddr; + rth->fl.fl4_tos = tos; + rth->rt_spec_dst= spec_dst; #ifdef CONFIG_IP_ROUTE_NAT rth->rt_dst_map = fl.fl4_dst; rth->rt_src_map = fl.fl4_src; #endif -#ifdef CONFIG_NET_CLS_ROUTE - rth->u.dst.tclassid = itag; -#endif - rth->rt_iif = - rth->fl.iif = dev->ifindex; - rth->u.dst.dev = &loopback_dev; - dev_hold(rth->u.dst.dev); - rth->rt_gateway = daddr; - rth->rt_spec_dst= spec_dst; - rth->u.dst.input= ip_local_deliver; - rth->rt_flags = flags|RTCF_LOCAL; if (res.type == RTN_UNREACHABLE) { rth->u.dst.input= ip_error; rth->u.dst.error= -err; rth->rt_flags &= ~RTCF_LOCAL; } - rth->rt_type = res.type; goto intern; no_route: @@ -1767,6 +1809,8 @@ int ip_route_input(struct sk_buff *skb, tos &= IPTOS_RT_MASK; hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos); + prefetch(&rt_hash_table[hash].chain->fl); + rcu_read_lock(); for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { smp_read_barrier_depends(); @@ -2048,7 +2092,10 @@ make_route: } } - rth = dst_alloc(&ipv4_dst_ops); + hash = rt_hash_code(oldflp->fl4_dst, + oldflp->fl4_src ^ (oldflp->oif << 5), tos); + + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; @@ -2104,10 +2151,6 @@ make_route: rt_set_nexthop(rth, &res, 0); - - rth->rt_flags = flags; - - hash = rt_hash_code(oldflp->fl4_dst, oldflp->fl4_src ^ 
(oldflp->oif << 5), tos); err = rt_intern_hash(hash, rth, rp); done: if (free_res) @@ -2132,6 +2175,8 @@ int __ip_route_output_key(struct rtable struct rtable *rth; hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos); + + prefetch(&rt_hash_table[hash].chain->fl); rcu_read_lock(); for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { From babydr@baby-dragons.com Mon Jun 9 07:37:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:37:12 -0700 (PDT) Received: from filesrv1.baby-dragons.com (filesrv1.system-techniques.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Eb42x001024 for ; Mon, 9 Jun 2003 07:37:05 -0700 Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h59Eb17W006564; Mon, 9 Jun 2003 10:37:01 -0400 Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h59Eb1SL006561; Mon, 9 Jun 2003 10:37:01 -0400 X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs Date: Mon, 9 Jun 2003 10:37:01 -0400 (EDT) From: "Mr. James W. Laferriere" To: Jamal Hadi cc: Linux networking maillist , netdev@oss.sgi.com Subject: Re: netlink tester program In-Reply-To: <20030608212033.Y33230@shell.cyberus.ca> Message-ID: References: <20030603075742.34434.qmail@web14305.mail.yahoo.com> <20030608212033.Y33230@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3006 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: babydr@baby-dragons.com Precedence: bulk X-list: netdev Hello Jamal , Is there a time frame to update the draft ? Many people do not see these lists . Yours & Mr. Haas's work is only viewable to these thru the ietf draft/rfc/... process . Rather narrow veiw but necessary at times . 
Tia , JimL On Sun, 8 Jun 2003, Jamal Hadi wrote: ...snip... > apologies, I actually have a unrelated daytime job that tends to keep me > too occupied at times ;-> > > Netlink2 draft is work in progress. The draft tends to lag reality. > I believe what you refer to has been fixed. Refer to the slides at: > http://www.zurich.ibm.com/~rha/netlink2.pdf > > > BTW, is netlink2 support planned for linux in the near > > future? > > You will see code from us that is GPL. Consider netlink2 as a distributed > netlink. netlink is already proven so why reinvent the wheel? > Essentially you should be able to manager clusters of linux network > devices (think firewalls, routers etc) with netlink/netlink2. > There are some mechanisms for distributdness that are missing. These are > the holes we are going to fill. > > Note some of the stuff i am working on at: > www.cyberus.ca/~hadi/patches/action which fits the whole forces paradigm > and works quiet well with netlink today and netlink2 next. > (I stopped updating that web page for sometime now, talk to me if > interested in the patches and if you would like to help in testing, > coding, etc) -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O. 
Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ From davem@redhat.com Mon Jun 9 08:00:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 08:00:28 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59F0I2x002275 for ; Mon, 9 Jun 2003 08:00:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA18915; Mon, 9 Jun 2003 07:55:35 -0700 Date: Mon, 09 Jun 2003 07:55:34 -0700 (PDT) Message-Id: <20030609.075534.124074402.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: ipv6 tunnel patch From: "David S. Miller" In-Reply-To: References: <20030607.033059.48393210.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3007 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ville Nuorvala Date: Mon, 9 Jun 2003 14:43:11 +0300 (EEST) Ok here's the last revision of the patch :) It's done against ChangeSet 1.1308. Applied, thank you. 
From shemminger@osdl.org Mon Jun 9 09:24:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:24:42 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GOV2x003418 for ; Mon, 9 Jun 2003 09:24:32 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59GNRX06913; Mon, 9 Jun 2003 09:23:27 -0700 Date: Mon, 9 Jun 2003 09:23:27 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: xerox@foonet.net, hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress Message-Id: <20030609092327.41899cb5.shemminger@osdl.org> In-Reply-To: <20030608.232827.88487519.davem@redhat.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030608.232827.88487519.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3008 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Has anyone looked into using Judy array's to speedup the route cache. HP has opened it up (see http://sourceforge.net/projects/judy ) and it should have better scaling for these type of attacks. 
From sim@netnation.com Mon Jun 9 09:30:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:30:37 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GUA2x003792 for ; Mon, 9 Jun 2003 09:30:11 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PPX8-0005GM-6M; Mon, 09 Jun 2003 09:30:10 -0700 Date: Mon, 9 Jun 2003 09:30:10 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030609163010.GA11509@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3009 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 09:28:59AM -0400, Ralph Doncaster wrote: > The trick is finding the good IOS revs. 12.0(7)T and 12.2(11)T have been > good ones for me. Finding other ISPs running ciscos to exchange tips and > ideas has been much easier than finding folks running linux. A sure-fire > way to get flamed is to post to NANOG asking what's the best Linux router > setup! > > For most ISPs it's better to spend $20K on a 7206VXR/NPE-G1 than to spend > days trying to figure out what kernel + patch set, NIC, and motherboard > combination will squeeze the best performance out of a PC router. And > once you've done that you still have zebra quirks to worry about... I beg to differ. We had much more pain trying to get those things to work properly than putting together two boxes that have been up now for almost a year without incident. 
Running Zebra, keepalived, etc., without any problems at all. What Zebra quirks? There has not yet been one crash or failure, which is much better than we could say for the 7206s. And I wouldn't exactly call it difficult to "squeeze" performance out of a PC when the 7206 VXRs have a 200 MHz processor. The main reason we switched is when we realized we could set up a powerful Linux box full of gigabit NICs for less than the price of one gigabit interface. At the time we purchased the NICs (3C996B-T) for less than $150 CDN each, and they're probably cheaper now. Simon- From davem@redhat.com Mon Jun 9 09:41:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:41:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GfD2x004186 for ; Mon, 9 Jun 2003 09:41:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA19368; Mon, 9 Jun 2003 09:37:56 -0700 Date: Mon, 09 Jun 2003 09:37:55 -0700 (PDT) Message-Id: <20030609.093755.51688774.davem@redhat.com> To: shemminger@osdl.org Cc: xerox@foonet.net, hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609092327.41899cb5.shemminger@osdl.org> References: <001801c32e50$57ef0750$4a00000a@badass> <20030608.232827.88487519.davem@redhat.com> <20030609092327.41899cb5.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3010 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 9 Jun 2003 09:23:27 -0700 Has anyone looked into using Judy array's to speedup the route cache. HP has opened it up (see http://sourceforge.net/projects/judy ) and it should have better scaling for these type of attacks. Like all such seemingly promising schemes, insert/retrieve are optimized at the expense of delete. I normally don't even look at such algorithms anymore, they all are amazing if you only build tables and look for things in them but are unusable when O(1) insert/delete/lookup are absolutely required. From davem@redhat.com Mon Jun 9 09:54:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:54:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GsB2x004619 for ; Mon, 9 Jun 2003 09:54:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA19441; Mon, 9 Jun 2003 09:51:06 -0700 Date: Mon, 09 Jun 2003 09:51:06 -0700 (PDT) Message-Id: <20030609.095106.22030084.davem@redhat.com> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch skfp over to initcalls From: "David S. Miller" In-Reply-To: <20030609120603.GA31393@lst.de> References: <20030609120603.GA31393@lst.de> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3011 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Christoph Hellwig Date: Mon, 9 Jun 2003 14:06:03 +0200 This is a PCI driver and has no business in Space.c. Also allows to kill all the fddi code in there (and the stale reference to the long gone apfddi driver) Applied, thanks. From shemminger@osdl.org Mon Jun 9 10:10:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:10:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HAU2x005409; Mon, 9 Jun 2003 10:10:30 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59HAJX22232; Mon, 9 Jun 2003 10:10:19 -0700 Date: Mon, 9 Jun 2003 10:10:18 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: ralf@oss.sgi.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-Id: <20030609101018.0ca2e1f9.shemminger@osdl.org> In-Reply-To: <20030607.013010.116359540.davem@redhat.com> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3012 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

On Sat, 07 Jun 2003 01:30:10 -0700 (PDT) "David S. Miller" wrote:
> From: Stephen Hemminger
> Date: Fri, 6 Jun 2003 16:16:58 -0700
>
> This driver never calls unregister in its module exit function:
>
> static void __exit ioc3_cleanup_module(void)
> {
> 	pci_unregister_driver(&ioc3_driver);
> }
>
> pci_unregister_driver() invokes, for each PCI driver instance
> registered, the ->remove() method for that driver.
>
> What is the problem? tg3.c and many other drivers work exactly
> like this, using the PCI registry mechanism as a helper to do
> all the grunge work or device iteration.

pci_unregister_driver does the iteration but not the net device cleanup. The problem is that the driver never calls unregister_netdev; it just frees the device structure. If the remove path ever runs, the net device list would be left corrupt. I don't have the hardware to actually test this, though.
Looks like the right fix is:

--- ioc3-eth.c.orig	2003-06-09 10:05:45.000000000 -0700
+++ ioc3-eth.c	2003-06-09 10:04:45.000000000 -0700
@@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s
 	struct ioc3 *ioc3 = ip->regs;
 
 	iounmap(ioc3);
+	unregister_netdev(dev);
 	pci_release_regions(pdev);
 	kfree(dev);
 }

From garzik@gtf.org Mon Jun 9 10:12:28 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:32 -0700 (PDT)
Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged))
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCP2x005765; Mon, 9 Jun 2003 10:12:28 -0700
Received: by havoc.gtf.org (Postfix, from userid 500) id A73FE666D; Mon, 9 Jun 2003 13:12:24 -0400 (EDT)
Date: Mon, 9 Jun 2003 13:12:24 -0400
From: Jeff Garzik
To: Stephen Hemminger
Cc: "David S. Miller" , ralf@oss.sgi.com, netdev@oss.sgi.com
Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5
Message-ID: <20030609171224.GA14623@gtf.org>
References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030609101018.0ca2e1f9.shemminger@osdl.org>
User-Agent: Mutt/1.3.28i
X-archive-position: 3014
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

On Mon, Jun 09, 2003 at 10:10:18AM -0700, Stephen Hemminger wrote:
> Looks like the right fix is:
> --- ioc3-eth.c.orig	2003-06-09 10:05:45.000000000 -0700
> +++ ioc3-eth.c	2003-06-09 10:04:45.000000000 -0700
> @@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s
>  	struct ioc3 *ioc3 = ip->regs;
>
>  	iounmap(ioc3);
> +	unregister_netdev(dev);
>  	pci_release_regions(pdev);
>  	kfree(dev);
>  }

You want to unregister before iounmap.
Jeff From davem@redhat.com Mon Jun 9 10:12:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCF2x005736; Mon, 9 Jun 2003 10:12:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19545; Mon, 9 Jun 2003 10:09:14 -0700 Date: Mon, 09 Jun 2003 10:09:13 -0700 (PDT) Message-Id: <20030609.100913.67895069.davem@redhat.com> To: shemminger@osdl.org Cc: ralf@oss.sgi.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 From: "David S. Miller" In-Reply-To: <20030609101018.0ca2e1f9.shemminger@osdl.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3013 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 9 Jun 2003 10:10:18 -0700 pci_unregister_driver does the iteration but not the net device cleanup. You're absolutely right. 
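
[Archive note] The teardown order discussed in the ioc3 thread (add the missing unregister_netdev() call, and make it before iounmap() as Jeff suggests) can be modeled in userspace. This is only a sketch: every function ending in _stub is a hypothetical stand-in that records its name, not the real kernel call, and remove_one_sketch() is not the actual ioc3_remove_one().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel calls named in the thread.
 * Each stub only records that it ran, so the ordering can be checked
 * outside the kernel. None of this is the real ioc3 driver. */
#define MAX_STEPS 8

static const char *teardown_log[MAX_STEPS];
static int teardown_steps;

static void record(const char *step)
{
	assert(teardown_steps < MAX_STEPS);
	teardown_log[teardown_steps++] = step;
}

static void unregister_netdev_stub(void)   { record("unregister_netdev"); }
static void iounmap_stub(void)             { record("iounmap"); }
static void pci_release_regions_stub(void) { record("pci_release_regions"); }
static void kfree_stub(void)               { record("kfree"); }

/* Assumed remove path: take the device off the net device list first,
 * while its registers are still mapped, then release the resources,
 * and free the structure last. */
static void remove_one_sketch(void)
{
	unregister_netdev_stub();
	iounmap_stub();
	pci_release_regions_stub();
	kfree_stub();
}

/* Position of a step in the recorded log, or -1 if it never ran. */
static int step_index(const char *step)
{
	for (int i = 0; i < teardown_steps; i++)
		if (strcmp(teardown_log[i], step) == 0)
			return i;
	return -1;
}
```

Calling remove_one_sketch() once and comparing step_index("unregister_netdev") with step_index("iounmap") confirms that the device is unregistered while its registers are still mapped.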
From davem@redhat.com Mon Jun 9 10:12:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCr2x006045 for ; Mon, 9 Jun 2003 10:12:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19560; Mon, 9 Jun 2003 10:09:49 -0700 Date: Mon, 09 Jun 2003 10:09:48 -0700 (PDT) Message-Id: <20030609.100948.84378474.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <3EE4BF39.2020503@pobox.com> References: <3EE4045D.4040002@pobox.com> <20030608.225309.39172149.davem@redhat.com> <3EE4BF39.2020503@pobox.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3015 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Mon, 09 Jun 2003 13:09:13 -0400 David S. Miller wrote: > That's your plan, but did you do any of this yet? It'll keep > going deeper and deeper into bitkeeper history the longer that > you wait :-) Yes, I have been following my plan. You will see when Marcelo opens 2.4.22-pre1 that I have been committing these to my net-drivers-2.4 queue. Awesome. Now why was I asking this to begin with? You wanted to do something, what was that? 
:-) From garzik@gtf.org Mon Jun 9 10:14:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:14:50 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HEl2x006644 for ; Mon, 9 Jun 2003 10:14:47 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id DA97C666D; Mon, 9 Jun 2003 13:14:46 -0400 (EDT) Date: Mon, 9 Jun 2003 13:14:46 -0400 From: Jeff Garzik To: "David S. Miller" Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030609171446.GA15239@gtf.org> References: <3EE4045D.4040002@pobox.com> <20030608.225309.39172149.davem@redhat.com> <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.100948.84378474.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3016 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 10:09:48AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Mon, 09 Jun 2003 13:09:13 -0400 > > David S. Miller wrote: > > That's your plan, but did you do any of this yet? It'll keep > > going deeper and deeper into bitkeeper history the longer that > > you wait :-) > > Yes, I have been following my plan. You will see when Marcelo opens > 2.4.22-pre1 that I have been committing these to my net-drivers-2.4 queue. > > Awesome. > > Now why was I asking this to begin with? You wanted to do something, > what was that? :-) I wanted to wait on the s/kfree/release_netdev/ patch until the other stuff is done. Said patch can be applied anytime, and doing it in this order reduces 2.4 backport merge pain. 
:) Jeff From davem@redhat.com Mon Jun 9 10:21:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:21:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HLD2x007011 for ; Mon, 9 Jun 2003 10:21:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19625; Mon, 9 Jun 2003 10:18:07 -0700 Date: Mon, 09 Jun 2003 10:18:07 -0700 (PDT) Message-Id: <20030609.101807.119873069.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <20030609171446.GA15239@gtf.org> References: <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> <20030609171446.GA15239@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3017 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Mon, 9 Jun 2003 13:14:46 -0400 I wanted to wait on the s/kfree/release_netdev/ patch until the other stuff is done. Said patch can be applied anytime, and doing it in this order reduces 2.4 backport merge pain. :) No problem, but once all the init_etherdev() etc. crap is abolished, Stephen's work or something similar goes in... Ok? 
From davem@redhat.com Mon Jun 9 10:21:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:21:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HL72x007006 for ; Mon, 9 Jun 2003 10:21:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19607; Mon, 9 Jun 2003 10:16:15 -0700 Date: Mon, 09 Jun 2003 10:16:15 -0700 (PDT) Message-Id: <20030609.101615.133898193.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: etsh_cucu@yahoo.com, david-b@pacbell.net, rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <20030608212033.Y33230@shell.cyberus.ca> References: <20030603075742.34434.qmail@web14305.mail.yahoo.com> <20030608212033.Y33230@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3018 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Sun, 8 Jun 2003 21:35:09 -0400 (EDT) Netlink2 draft is work in progress. The draft tends to lag reality. I believe what you refer to has been fixed. Refer to the slides at: http://www.zurich.ibm.com/~rha/netlink2.pdf ... Consider netlink2 as a distributed netlink. Beautiful, if you're going to allow this protocol to go over the wire, you have to choose a network byte order and swap in/out of it. Should be fun :) Unfortunately I see no mention of this issue in the slides, should I be scared? 
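
[Archive note] The byte-order point above can be illustrated with a small sketch. It assumes a made-up 8-byte header, struct demo_hdr, which is NOT the real struct nlmsghdr layout and is not taken from the netlink2 draft; demo_hdr_pack() and demo_hdr_unpack() are hypothetical helper names. The only real API used is htonl()/htons()/ntohl()/ntohs() from <arpa/inet.h>.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header used only for illustration. */
struct demo_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

#define DEMO_HDR_WIRE_LEN 8

/* Convert each field to network (big-endian) byte order and copy it
 * into the wire buffer at a fixed offset. */
static void demo_hdr_pack(const struct demo_hdr *h, unsigned char *buf)
{
	uint32_t len = htonl(h->len);
	uint16_t type = htons(h->type);
	uint16_t flags = htons(h->flags);

	memcpy(buf, &len, sizeof(len));
	memcpy(buf + 4, &type, sizeof(type));
	memcpy(buf + 6, &flags, sizeof(flags));
}

/* Reverse of demo_hdr_pack(): read the wire fields and swap them back
 * to host byte order. */
static void demo_hdr_unpack(const unsigned char *buf, struct demo_hdr *h)
{
	uint32_t len;
	uint16_t type, flags;

	memcpy(&len, buf, sizeof(len));
	memcpy(&type, buf + 4, sizeof(type));
	memcpy(&flags, buf + 6, sizeof(flags));

	h->len = ntohl(len);
	h->type = ntohs(type);
	h->flags = ntohs(flags);
}
```

As long as both ends go through helpers like these, the bytes on the wire are the same regardless of host endianness. A purely local protocol can skip the swap, which is why the question only comes up once the messages leave the machine.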
From davem@redhat.com Mon Jun 9 10:23:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:23:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HN52x007608 for ; Mon, 9 Jun 2003 10:23:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19658; Mon, 9 Jun 2003 10:20:00 -0700 Date: Mon, 09 Jun 2003 10:19:59 -0700 (PDT) Message-Id: <20030609.101959.62355600.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16091.11735.721251.925522@robur.slu.se> References: <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3019 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Mon, 2 Jun 2003 12:58:31 +0200 And later GC has to remove all entries with spin_lock_bh held (no packet processing runs). I see packet drops exactly when GC runs. Tuning GC might help but it's something to observe. Please note, in 2.5.x, holding this lock on one cpu does not prevent packet processing (even for routes on the same hash chain) on another cpu, because we use RCU there.
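
[Archive note] A rough userspace sketch of why a writer holding the chain lock need not stall readers. It is written with C11 atomics and is NOT the kernel RCU API: struct chain, chain_init(), chain_insert() and chain_lookup() are hypothetical names, the writer lock is a plain spin flag, and removal is left out entirely, since freeing an entry safely requires waiting for all readers (a grace period) and this sketch does not attempt that.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct entry {
	uint32_t key;
	uint32_t value;
	_Atomic(struct entry *) next;
};

struct chain {
	_Atomic(struct entry *) head;
	atomic_flag writer_lock;	/* taken by writers only */
};

static void chain_init(struct chain *c)
{
	atomic_init(&c->head, NULL);
	atomic_flag_clear(&c->writer_lock);
}

/* Writer: serialize against other writers, fully initialize the new
 * entry, then publish it with a release store so that a reader which
 * sees the pointer also sees the entry's contents. */
static void chain_insert(struct chain *c, struct entry *e)
{
	while (atomic_flag_test_and_set_explicit(&c->writer_lock,
						 memory_order_acquire))
		;	/* spin until the writer lock is free */

	atomic_store_explicit(&e->next,
			      atomic_load_explicit(&c->head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&c->head, e, memory_order_release);

	atomic_flag_clear_explicit(&c->writer_lock, memory_order_release);
}

/* Reader: never touches writer_lock, so a writer holding it does not
 * block lookups on the same chain. */
static struct entry *chain_lookup(struct chain *c, uint32_t key)
{
	struct entry *e = atomic_load_explicit(&c->head, memory_order_acquire);

	while (e) {
		if (e->key == key)
			return e;
		e = atomic_load_explicit(&e->next, memory_order_acquire);
	}
	return NULL;
}
```

The missing piece, reclaiming an entry that a lock-free reader may still be walking over, is what makes recycling cache entries awkward once readers stop taking the lock.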
From davem@redhat.com Mon Jun 9 10:24:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:24:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HOZ2x007920 for ; Mon, 9 Jun 2003 10:24:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19665; Mon, 9 Jun 2003 10:21:32 -0700 Date: Mon, 09 Jun 2003 10:21:31 -0700 (PDT) Message-Id: <20030609.102131.73669235.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16091.32021.75335.227150@robur.slu.se> References: <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com> <16091.32021.75335.227150@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3020 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Mon, 2 Jun 2003 18:36:37 +0200 Simon Kirby writes: > Is it possible to have a dst LRU or a simpler approximation of such and > recycle dst entries rather than deallocating/reallocating them? This > would relieve a lot of work from the garbage collector and avoid the > periodic large garbage collection latency. It could be tuned to only > occur in an attack (I remember Alexey saying that the deferred garbage > collection was implemented to reduce latency in normal operation). I don't see how this can be done. Others may?
Full recycle is very doable in 2.4.x; in 2.5.x it is an enormously hard problem because we use RCU there (readers run completely without locks). From shemminger@osdl.org Mon Jun 9 10:51:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:51:23 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HpJ2x008863 for ; Mon, 9 Jun 2003 10:51:20 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Hp6X03779; Mon, 9 Jun 2003 10:51:06 -0700 Date: Mon, 9 Jun 2003 10:51:06 -0700 From: Stephen Hemminger To: Jeff Garzik , "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] warning in ethtool ixgb Message-Id: <20030609105106.0330bbec.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3021 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Looks like ethtool now knows about 10G devices in 2.5.70 bk-latest, so ixgb generates a warning.
This should fix that: diff -Nru a/drivers/net/ixgb/ixgb_ethtool.c b/drivers/net/ixgb/ixgb_ethtool.c --- a/drivers/net/ixgb/ixgb_ethtool.c Mon Jun 9 10:49:01 2003 +++ b/drivers/net/ixgb/ixgb_ethtool.c Mon Jun 9 10:49:01 2003 @@ -50,9 +50,6 @@ return (IXGB_EEPROM_SIZE << 1); } -#define SUPPORTED_10000baseT_Full (1 << 11) -#define SPEED_10000 10000 - static void ixgb_ethtool_gset(struct ixgb_adapter *adapter, struct ethtool_cmd *ecmd) { From shemminger@osdl.org Mon Jun 9 11:09:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:09:21 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59I9A2x009422; Mon, 9 Jun 2003 11:09:11 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59I8uX09307; Mon, 9 Jun 2003 11:08:56 -0700 Date: Mon, 9 Jun 2003 11:08:55 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: davem@redhat.com, ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-Id: <20030609110855.2e264ce1.shemminger@osdl.org> In-Reply-To: <20030609171224.GA14623@gtf.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> <20030609171224.GA14623@gtf.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3022 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Mon, 9 Jun 2003 13:12:24 -0400 Jeff Garzik wrote: > On Mon, Jun 09, 2003 at 10:10:18AM 
-0700, Stephen Hemminger wrote: > > Looks like the right fix is: > > --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 > > +++ ioc3-eth.c 2003-06-09 10:04:45.000000000 -0700 > > @@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s > > struct ioc3 *ioc3 = ip->regs; > > > > iounmap(ioc3); > > + unregister_netdev(dev); > > pci_release_regions(pdev); > > kfree(dev); > > } > > You want to unregister before iounmap. > > Jeff > > Okay: --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 +++ ioc3-eth.c 2003-06-09 11:08:01.000000000 -0700 @@ -1613,6 +1613,7 @@ static void __devexit ioc3_remove_one (s struct ioc3_private *ip = dev->priv; struct ioc3 *ioc3 = ip->regs; + unregister_netdev(dev); iounmap(ioc3); pci_release_regions(pdev); kfree(dev); From garzik@gtf.org Mon Jun 9 11:11:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:11:58 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IBq2x009776 for ; Mon, 9 Jun 2003 11:11:52 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 963BA666E; Mon, 9 Jun 2003 14:11:51 -0400 (EDT) Date: Mon, 9 Jun 2003 14:11:51 -0400 From: Jeff Garzik To: "David S. 
Miller" Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030609181151.GA20308@gtf.org> References: <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> <20030609171446.GA15239@gtf.org> <20030609.101807.119873069.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.101807.119873069.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3023 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 10:18:07AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Mon, 9 Jun 2003 13:14:46 -0400 > > I wanted to wait on the s/kfree/release_netdev/ patch until the other > stuff is done. Said patch can be applied anytime, and doing it in this > order reduces 2.4 backport merge pain. :) > > No problem, but once all the init_etherdev() etc. crap is abolished, > Stephen's work or something similar goes in... > > Ok? Right. That's what I want :) Jeff From shemminger@osdl.org Mon Jun 9 11:51:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:51:19 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IpD2x011074 for ; Mon, 9 Jun 2003 11:51:13 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Ip1X25445; Mon, 9 Jun 2003 11:51:01 -0700 Date: Mon, 9 Jun 2003 11:51:01 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] expose alloc_netdev for use by drivers. 
Message-Id: <20030609115101.1f875e0c.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3024 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Several network drivers (tun, bridge, slip, vlan, ...) need to allocate private data structures in a like manner to ether_allocdev. This exposes the net_init hook for them to use. diff -Nru a/drivers/net/net_init.c b/drivers/net/net_init.c --- a/drivers/net/net_init.c Mon Jun 9 11:42:25 2003 +++ b/drivers/net/net_init.c Mon Jun 9 11:42:25 2003 @@ -70,7 +70,7 @@ */ -static struct net_device *alloc_netdev(int sizeof_priv, const char *mask, +struct net_device *alloc_netdev(int sizeof_priv, const char *mask, void (*setup)(struct net_device *)) { struct net_device *dev; @@ -96,6 +96,7 @@ return dev; } +EXPORT_SYMBOL(alloc_netdev); static struct net_device *init_alloc_dev(int sizeof_priv) { diff -Nru a/include/linux/etherdevice.h b/include/linux/etherdevice.h --- a/include/linux/etherdevice.h Mon Jun 9 11:42:25 2003 +++ b/include/linux/etherdevice.h Mon Jun 9 11:42:25 2003 @@ -40,7 +40,6 @@ unsigned char *haddr); extern struct net_device *init_etherdev(struct net_device *dev, int sizeof_priv); extern struct net_device *alloc_etherdev(int sizeof_priv); - static inline void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) { memcpy (dest->data, src, len); diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h Mon Jun 9 11:42:25 2003 +++ b/include/linux/netdevice.h Mon Jun 9 11:42:25 2003 @@ -815,6 +815,8 @@ 
extern void fc_setup(struct net_device *dev); extern void fc_freedev(struct net_device *dev); /* Support for loadable net-drivers */ +extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, + void (*setup)(struct net_device *)); extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); /* Functions used for multicast support */ From shemminger@osdl.org Mon Jun 9 11:54:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:54:28 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IsM2x011688 for ; Mon, 9 Jun 2003 11:54:23 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Is8X25887; Mon, 9 Jun 2003 11:54:08 -0700 Date: Mon, 9 Jun 2003 11:54:08 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] bridge using alloc_netdev Message-Id: <20030609115408.6b90dc4e.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3025 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This changes the bridge driver to encapsulate its private information the same way that ether drivers do. This allows later delayed release to work properly. Tested on my machine.
Since release_netdev isn't in yet, the destructor is just set to kfree -- that is why I wanted the release_netdev hook to make it in part I, so all these sub drivers got patched only once. diff -Nru a/net/bridge/br_device.c b/net/bridge/br_device.c --- a/net/bridge/br_device.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_device.c Mon Jun 9 11:42:38 2003 @@ -110,10 +110,6 @@ return -1; } -static void br_dev_destruct(struct net_device *dev) -{ - kfree(dev->priv); -} void br_dev_setup(struct net_device *dev) { @@ -124,10 +120,13 @@ dev->hard_start_xmit = br_dev_xmit; dev->open = br_dev_open; dev->set_multicast_list = br_dev_set_multicast_list; - dev->destructor = br_dev_destruct; + dev->destructor = (void (*)(struct net_device *))kfree; SET_MODULE_OWNER(dev); dev->stop = br_dev_stop; dev->accept_fastpath = br_dev_accept_fastpath; dev->tx_queue_len = 0; dev->set_mac_address = NULL; + dev->priv_flags = IFF_EBRIDGE; + + ether_setup(dev); } diff -Nru a/net/bridge/br_if.c b/net/bridge/br_if.c --- a/net/bridge/br_if.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_if.c Mon Jun 9 11:42:38 2003 @@ -78,17 +78,14 @@ struct net_bridge *br; struct net_device *dev; - if ((br = kmalloc(sizeof(*br), GFP_KERNEL)) == NULL) + dev = alloc_netdev(sizeof(struct net_bridge), name, + br_dev_setup); + + if (!dev) return NULL; - memset(br, 0, sizeof(*br)); - dev = &br->dev; - - strlcpy(dev->name, name, sizeof(dev->name)); - dev->priv = br; - dev->priv_flags = IFF_EBRIDGE; - ether_setup(dev); - br_dev_setup(dev); + br = dev->priv; + br->dev = dev; br->lock = SPIN_LOCK_UNLOCKED; INIT_LIST_HEAD(&br->port_list); @@ -159,9 +156,9 @@ if ((br = new_nb(name)) == NULL) return -ENOMEM; - ret = register_netdev(&br->dev); + ret = register_netdev(br->dev); if (ret) - kfree(br); + kfree(br->dev); return ret; } @@ -219,7 +216,7 @@ br_stp_recalculate_bridge_id(br); br_fdb_insert(br, p, dev->dev_addr, 1); - if ((br->dev.flags & IFF_UP) && (dev->flags & IFF_UP)) + if ((br->dev->flags & IFF_UP) && (dev->flags & 
IFF_UP)) br_stp_enable_port(p); spin_unlock_bh(&br->lock); diff -Nru a/net/bridge/br_input.c b/net/bridge/br_input.c --- a/net/bridge/br_input.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_input.c Mon Jun 9 11:42:38 2003 @@ -40,7 +40,7 @@ br->statistics.rx_bytes += skb->len; indev = skb->dev; - skb->dev = &br->dev; + skb->dev = br->dev; NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL, br_pass_frame_up_finish); @@ -67,7 +67,7 @@ br = p->br; passedup = 0; - if (br->dev.flags & IFF_PROMISC) { + if (br->dev->flags & IFF_PROMISC) { struct sk_buff *skb2; skb2 = skb_clone(skb, GFP_ATOMIC); @@ -140,7 +140,7 @@ return -1; } - if (!memcmp(p->br->dev.dev_addr, dest, ETH_ALEN)) + if (!memcmp(p->br->dev->dev_addr, dest, ETH_ALEN)) skb->pkt_type = PACKET_HOST; NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL, diff -Nru a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c --- a/net/bridge/br_netfilter.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_netfilter.c Mon Jun 9 11:42:38 2003 @@ -37,7 +37,7 @@ sizeof(struct bridge_skb_cb))) #define has_bridge_parent(device) ((device)->br_port != NULL) -#define bridge_parent(device) (&((device)->br_port->br->dev)) +#define bridge_parent(device) ((device)->br_port->br->dev) /* We need these fake structures to make netfilter happy -- * lots of places assume that skb->dst != NULL, which isn't diff -Nru a/net/bridge/br_notify.c b/net/bridge/br_notify.c --- a/net/bridge/br_notify.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_notify.c Mon Jun 9 11:42:38 2003 @@ -52,7 +52,7 @@ break; case NETDEV_DOWN: - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { spin_lock_bh(&br->lock); br_stp_disable_port(p); spin_unlock_bh(&br->lock); @@ -60,7 +60,7 @@ break; case NETDEV_UP: - if (!(br->dev.flags & IFF_UP)) { + if (!(br->dev->flags & IFF_UP)) { spin_lock_bh(&br->lock); br_stp_enable_port(p); spin_unlock_bh(&br->lock); diff -Nru a/net/bridge/br_private.h b/net/bridge/br_private.h --- a/net/bridge/br_private.h Mon Jun 9 11:42:38 
2003 +++ b/net/bridge/br_private.h Mon Jun 9 11:42:38 2003 @@ -81,7 +81,7 @@ { spinlock_t lock; struct list_head port_list; - struct net_device dev; + struct net_device *dev; struct net_device_stats statistics; rwlock_t hash_lock; struct hlist_head hash[BR_HASH_SIZE]; diff -Nru a/net/bridge/br_stp.c b/net/bridge/br_stp.c --- a/net/bridge/br_stp.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp.c Mon Jun 9 11:42:38 2003 @@ -26,7 +26,7 @@ void br_log_state(const struct net_bridge_port *p) { pr_info("%s: port %d(%s) entering %s state\n", - p->br->dev.name, p->port_no, p->dev->name, + p->br->dev->name, p->port_no, p->dev->name, br_port_state_names[p->state]); } @@ -130,7 +130,7 @@ br_topology_change_detection(br); del_timer(&br->tcn_timer); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); mod_timer(&br->hello_timer, jiffies + br->hello_time); } @@ -289,10 +289,10 @@ /* called under bridge lock */ void br_topology_change_detection(struct net_bridge *br) { - if (!(br->dev.flags & IFF_UP)) + if (!(br->dev->flags & IFF_UP)) return; - pr_info("%s: topology change detected", br->dev.name); + pr_info("%s: topology change detected", br->dev->name); if (br_is_root_bridge(br)) { printk(", propagating"); br->topology_change = 1; @@ -446,7 +446,7 @@ { if (br_is_designated_port(p)) { pr_info("%s: received tcn bpdu on port %i(%s)\n", - p->br->dev.name, p->port_no, p->dev->name); + p->br->dev->name, p->port_no, p->dev->name); br_topology_change_detection(p->br); br_topology_change_acknowledge(p); diff -Nru a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c --- a/net/bridge/br_stp_bpdu.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_bpdu.c Mon Jun 9 11:42:38 2003 @@ -145,7 +145,7 @@ spin_lock_bh(&br->lock); if (p->state == BR_STATE_DISABLED - || !(br->dev.flags & IFF_UP) + || !(br->dev->flags & IFF_UP) || !br->stp_enabled || memcmp(buf, header, 6)) goto out; diff -Nru a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c --- 
a/net/bridge/br_stp_if.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_if.c Mon Jun 9 11:42:38 2003 @@ -93,7 +93,7 @@ br = p->br; printk(KERN_INFO "%s: port %i(%s) entering %s state\n", - br->dev.name, p->port_no, p->dev->name, "disabled"); + br->dev->name, p->port_no, p->dev->name, "disabled"); wasroot = br_is_root_bridge(br); br_become_designated_port(p); @@ -124,7 +124,7 @@ memcpy(oldaddr, br->bridge_id.addr, ETH_ALEN); memcpy(br->bridge_id.addr, addr, ETH_ALEN); - memcpy(br->dev.dev_addr, addr, ETH_ALEN); + memcpy(br->dev->dev_addr, addr, ETH_ALEN); list_for_each_entry(p, &br->port_list, list) { if (!memcmp(p->designated_bridge.addr, oldaddr, ETH_ALEN)) diff -Nru a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c --- a/net/bridge/br_stp_timer.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_timer.c Mon Jun 9 11:42:38 2003 @@ -38,9 +38,9 @@ { struct net_bridge *br = (struct net_bridge *)arg; - pr_debug("%s: hello timer expired\n", br->dev.name); + pr_debug("%s: hello timer expired\n", br->dev->name); spin_lock_bh(&br->lock); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); br->hello_timer.expires = jiffies + br->hello_time; @@ -61,7 +61,7 @@ pr_info("%s: neighbor %.2x%.2x.%.2x:%.2x:%.2x:%.2x:%.2x:%.2x lost on port %d(%s)\n", - br->dev.name, + br->dev->name, id->prio[0], id->prio[1], id->addr[0], id->addr[1], id->addr[2], id->addr[3], id->addr[4], id->addr[5], @@ -89,7 +89,7 @@ struct net_bridge *br = p->br; pr_debug("%s: %d(%s) forward delay timer\n", - br->dev.name, p->port_no, p->dev->name); + br->dev->name, p->port_no, p->dev->name); spin_lock_bh(&br->lock); if (p->state == BR_STATE_LISTENING) { p->state = BR_STATE_LEARNING; @@ -108,9 +108,9 @@ { struct net_bridge *br = (struct net_bridge *) arg; - pr_debug("%s: tcn timer expired\n", br->dev.name); + pr_debug("%s: tcn timer expired\n", br->dev->name); spin_lock_bh(&br->lock); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { 
br_transmit_tcn(br); br->tcn_timer.expires = jiffies + br->bridge_hello_time; @@ -123,7 +123,7 @@ { struct net_bridge *br = (struct net_bridge *) arg; - pr_debug("%s: topo change timer expired\n", br->dev.name); + pr_debug("%s: topo change timer expired\n", br->dev->name); spin_lock_bh(&br->lock); br->topology_change_detected = 0; br->topology_change = 0; @@ -135,7 +135,7 @@ struct net_bridge_port *p = (struct net_bridge_port *) arg; pr_debug("%s: %d(%s) hold timer expired\n", - p->br->dev.name, p->port_no, p->dev->name); + p->br->dev->name, p->port_no, p->dev->name); spin_lock_bh(&p->br->lock); if (p->config_pending) diff -Nru a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c --- a/net/bridge/netfilter/ebt_redirect.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/netfilter/ebt_redirect.c Mon Jun 9 11:42:38 2003 @@ -22,7 +22,7 @@ if (hooknr != NF_BR_BROUTING) memcpy((**pskb).mac.ethernet->h_dest, - in->br_port->br->dev.dev_addr, ETH_ALEN); + in->br_port->br->dev->dev_addr, ETH_ALEN); else { memcpy((**pskb).mac.ethernet->h_dest, in->dev_addr, ETH_ALEN); diff -Nru a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c --- a/net/bridge/netfilter/ebtables.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/netfilter/ebtables.c Mon Jun 9 11:42:38 2003 @@ -135,10 +135,10 @@ if (FWINV2(ebt_dev_check(e->out, out), EBT_IOUT)) return 1; if ((!in || !in->br_port) ? 0 : FWINV2(ebt_dev_check( - e->logical_in, &in->br_port->br->dev), EBT_ILOGICALIN)) + e->logical_in, in->br_port->br->dev), EBT_ILOGICALIN)) return 1; if ((!out || !out->br_port) ? 
0 : FWINV2(ebt_dev_check( - e->logical_out, &out->br_port->br->dev), EBT_ILOGICALOUT)) + e->logical_out, out->br_port->br->dev), EBT_ILOGICALOUT)) return 1; if (e->bitmask & EBT_SOURCEMAC) { From shemminger@osdl.org Mon Jun 9 11:55:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:56:02 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Itv2x012008 for ; Mon, 9 Jun 2003 11:55:58 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59IthX26254; Mon, 9 Jun 2003 11:55:43 -0700 Date: Mon, 9 Jun 2003 11:55:43 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] vlan network device using alloc_netdev Message-Id: <20030609115543.42092c96.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3026 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change how vlan driver allocates private data so that it is like ether devices, and will support later changes for delayed free. diff -Nru a/net/8021q/vlan.c b/net/8021q/vlan.c --- a/net/8021q/vlan.c Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan.c Mon Jun 9 11:43:20 2003 @@ -334,6 +334,33 @@ return ret; } +static void vlan_setup(struct net_device *new_dev) +{ + SET_MODULE_OWNER(new_dev); + + /* new_dev->ifindex = 0; it will be set when added to + * the global list. + * iflink is set as well. 
+ */ + new_dev->get_stats = vlan_dev_get_stats; + + /* Make this thing known as a VLAN device */ + new_dev->priv_flags |= IFF_802_1Q_VLAN; + + /* Set us up to have no queue, as the underlying Hardware device + * can do all the queueing we could want. + */ + new_dev->tx_queue_len = 0; + + /* set up method calls */ + new_dev->change_mtu = vlan_dev_change_mtu; + new_dev->open = vlan_dev_open; + new_dev->stop = vlan_dev_stop; + new_dev->set_mac_address = vlan_dev_set_mac_address; + new_dev->set_multicast_list = vlan_dev_set_multicast_list; + new_dev->destructor = (void (*)(struct net_device *)) kfree; +} + /* Attach a VLAN device to a mac address (ie Ethernet Card). * Returns the device that was created, or NULL if there was * an error of some kind. @@ -344,8 +371,8 @@ struct vlan_group *grp; struct net_device *new_dev; struct net_device *real_dev; /* the ethernet device */ - int malloc_size = 0; int r; + char name[IFNAMSIZ]; #ifdef VLAN_DEBUG printk(VLAN_DBG "%s: if_name -:%s:- vid: %i\n", @@ -403,21 +430,6 @@ goto out_unlock; } - malloc_size = (sizeof(struct net_device)); - new_dev = (struct net_device *) kmalloc(malloc_size, GFP_KERNEL); - VLAN_MEM_DBG("net_device malloc, addr: %p size: %i\n", - new_dev, malloc_size); - - if (new_dev == NULL) - goto out_unlock; - - memset(new_dev, 0, malloc_size); - - /* Set us up to have no queue, as the underlying Hardware device - * can do all the queueing we could want. - */ - new_dev->tx_queue_len = 0; - /* Gotta set up the fields for the device. */ #ifdef VLAN_DEBUG printk(VLAN_DBG "About to allocate name, vlan_name_type: %i\n", @@ -426,54 +438,44 @@ switch (vlan_name_type) { case VLAN_NAME_TYPE_RAW_PLUS_VID: /* name will look like: eth1.0005 */ - sprintf(new_dev->name, "%s.%.4i", real_dev->name, VLAN_ID); + snprintf(name, IFNAMSIZ, "%s.%.4i", real_dev->name, VLAN_ID); break; case VLAN_NAME_TYPE_PLUS_VID_NO_PAD: /* Put our vlan.VID in the name. 
* Name will look like: vlan5 */ - sprintf(new_dev->name, "vlan%i", VLAN_ID); + snprintf(name, IFNAMSIZ, "vlan%i", VLAN_ID); break; case VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD: /* Put our vlan.VID in the name. * Name will look like: eth0.5 */ - sprintf(new_dev->name, "%s.%i", real_dev->name, VLAN_ID); + snprintf(name, IFNAMSIZ, "%s.%i", real_dev->name, VLAN_ID); break; case VLAN_NAME_TYPE_PLUS_VID: /* Put our vlan.VID in the name. * Name will look like: vlan0005 */ default: - sprintf(new_dev->name, "vlan%.4i", VLAN_ID); + snprintf(name, IFNAMSIZ, "vlan%.4i", VLAN_ID); }; + new_dev = alloc_netdev(sizeof(struct vlan_dev_info), name, + vlan_setup); + if (new_dev == NULL) + goto out_unlock; + #ifdef VLAN_DEBUG printk(VLAN_DBG "Allocated new name -:%s:-\n", new_dev->name); #endif - /* set up method calls */ - new_dev->init = vlan_dev_init; - new_dev->destructor = vlan_dev_destruct; - SET_MODULE_OWNER(new_dev); - - /* new_dev->ifindex = 0; it will be set when added to - * the global list. - * iflink is set as well. - */ - new_dev->get_stats = vlan_dev_get_stats; - /* IFF_BROADCAST|IFF_MULTICAST; ??? */ new_dev->flags = real_dev->flags; new_dev->flags &= ~IFF_UP; - /* Make this thing known as a VLAN device */ - new_dev->priv_flags |= IFF_802_1Q_VLAN; - /* need 4 bytes for extra VLAN header info, * hope the underlying device can handle it. */ new_dev->mtu = real_dev->mtu; - new_dev->change_mtu = vlan_dev_change_mtu; /* TODO: maybe just assign it to be ETHERNET? 
*/ new_dev->type = real_dev->type; @@ -484,24 +486,14 @@ new_dev->hard_header_len += VLAN_HLEN; } - new_dev->priv = kmalloc(sizeof(struct vlan_dev_info), - GFP_KERNEL); VLAN_MEM_DBG("new_dev->priv malloc, addr: %p size: %i\n", new_dev->priv, sizeof(struct vlan_dev_info)); - if (new_dev->priv == NULL) - goto out_free_newdev; - - memset(new_dev->priv, 0, sizeof(struct vlan_dev_info)); - memcpy(new_dev->broadcast, real_dev->broadcast, real_dev->addr_len); memcpy(new_dev->dev_addr, real_dev->dev_addr, real_dev->addr_len); new_dev->addr_len = real_dev->addr_len; - new_dev->open = vlan_dev_open; - new_dev->stop = vlan_dev_stop; - if (real_dev->features & NETIF_F_HW_VLAN_TX) { new_dev->hard_header = real_dev->hard_header; new_dev->hard_start_xmit = vlan_dev_hwaccel_hard_start_xmit; @@ -512,8 +504,6 @@ new_dev->rebuild_header = vlan_dev_rebuild_header; } new_dev->hard_header_parse = real_dev->hard_header_parse; - new_dev->set_mac_address = vlan_dev_set_mac_address; - new_dev->set_multicast_list = vlan_dev_set_multicast_list; VLAN_DEV_INFO(new_dev)->vlan_id = VLAN_ID; /* 1 through VLAN_VID_MASK */ VLAN_DEV_INFO(new_dev)->real_dev = real_dev; @@ -526,7 +516,7 @@ #endif if (register_netdevice(new_dev)) - goto out_free_newdev_priv; + goto out_free_newdev; /* So, got the sucker initialized, now lets place * it into our local structure. 
@@ -572,9 +562,7 @@ out_free_unregister: unregister_netdev(new_dev); - -out_free_newdev_priv: - kfree(new_dev->priv); + goto out_put_dev; out_free_newdev: kfree(new_dev); diff -Nru a/net/8021q/vlan.h b/net/8021q/vlan.h --- a/net/8021q/vlan.h Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan.h Mon Jun 9 11:43:20 2003 @@ -65,8 +65,6 @@ int vlan_dev_set_mac_address(struct net_device *dev, void* addr); int vlan_dev_open(struct net_device* dev); int vlan_dev_stop(struct net_device* dev); -int vlan_dev_init(struct net_device* dev); -void vlan_dev_destruct(struct net_device* dev); int vlan_dev_set_ingress_priority(char* dev_name, __u32 skb_prio, short vlan_prio); int vlan_dev_set_egress_priority(char* dev_name, __u32 skb_prio, short vlan_prio); int vlan_dev_set_vlan_flag(char* dev_name, __u32 flag, short flag_val); diff -Nru a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c --- a/net/8021q/vlan_dev.c Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan_dev.c Mon Jun 9 11:43:20 2003 @@ -766,28 +766,6 @@ vlan_flush_mc_list(dev); return 0; } - -int vlan_dev_init(struct net_device *dev) -{ - /* TODO: figure this out, maybe do nothing?? */ - return 0; -} - -void vlan_dev_destruct(struct net_device *dev) -{ - if (dev) { - vlan_flush_mc_list(dev); - if (dev->priv) { - if (VLAN_DEV_INFO(dev)->dent) - BUG(); - - kfree(dev->priv); - dev->priv = NULL; - } - kfree(dev); - } -} - /** Taken from Gleb + Lennert's VLAN code, and modified... 
*/ void vlan_dev_set_multicast_list(struct net_device *vlan_dev) { From shemminger@osdl.org Mon Jun 9 11:59:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:59:15 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IxA2x012340 for ; Mon, 9 Jun 2003 11:59:11 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59IwvX27144; Mon, 9 Jun 2003 11:58:57 -0700 Date: Mon, 9 Jun 2003 11:58:57 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] tun using alloc_netdev Message-Id: <20030609115857.38bb31d6.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3027 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change how TUN allocates private data, to be like ethernet devices. This allows later changes that make network device structure persist if sysfs hooks are open. Compiles and loads, but don't know how to test it. One gratuitous change was to add a C99 initializer for tun_miscdev.
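The allocation scheme these patches move the drivers to can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct fake_netdev`, `struct fake_priv`, `fake_alloc_netdev`, `fake_setup` and `fake_free` are hypothetical stand-ins, not the kernel's definitions, and the alignment rule is a guess chosen for portability. The idea it shows is the one the patches rely on: the device structure and the driver's private area come from a single allocation, the private structure keeps a pointer back to the device instead of embedding it, and one free of the device releases both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_NAME_LEN 16

/* Hypothetical stand-in for struct net_device. */
struct fake_netdev {
    char name[FAKE_NAME_LEN];
    void *priv;                                /* points into the same block */
    void (*destructor)(struct fake_netdev *);
};

/* Hypothetical driver-private state, in the role of tun_struct or net_bridge. */
struct fake_priv {
    struct fake_netdev *dev;                   /* back pointer, not an embedded device */
    unsigned long flags;
};

/* Freeing the device also frees the private area: it is one allocation. */
static void fake_free(struct fake_netdev *dev)
{
    free(dev);
}

/* Allocate device plus private area together, then let the driver set it up. */
static struct fake_netdev *fake_alloc_netdev(size_t sizeof_priv, const char *name,
                                             void (*setup)(struct fake_netdev *))
{
    /* Round the device size up so the private area is suitably aligned. */
    size_t align = _Alignof(max_align_t);
    size_t dev_size = (sizeof(struct fake_netdev) + align - 1) / align * align;
    struct fake_netdev *dev = calloc(1, dev_size + sizeof_priv);

    if (!dev)
        return NULL;
    if (sizeof_priv)
        dev->priv = (char *)dev + dev_size;
    setup(dev);
    snprintf(dev->name, sizeof(dev->name), "%s", name);
    return dev;
}

/* Driver setup callback: wire the back pointer and the destructor. */
static void fake_setup(struct fake_netdev *dev)
{
    struct fake_priv *p = dev->priv;

    p->dev = dev;
    dev->destructor = fake_free;
}
```

A caller would allocate with `fake_alloc_netdev(sizeof(struct fake_priv), "tun0", fake_setup)` and release with `dev->destructor(dev)`. Keeping the device as the outer allocation is what lets its lifetime be extended later without the driver's private data being freed underneath it.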
diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c --- a/drivers/net/tun.c Mon Jun 9 11:43:37 2003 +++ b/drivers/net/tun.c Mon Jun 9 11:43:37 2003 @@ -122,12 +122,6 @@ DBG(KERN_INFO "%s: tun_net_init\n", tun->name); - SET_MODULE_OWNER(dev); - dev->open = tun_net_open; - dev->hard_start_xmit = tun_net_xmit; - dev->stop = tun_net_close; - dev->get_stats = tun_net_stats; - switch (tun->flags & TUN_TYPE_MASK) { case TUN_TUN_DEV: /* Point-to-Point TUN Device */ @@ -199,14 +193,14 @@ skb_reserve(skb, 2); memcpy_fromiovec(skb_put(skb, len), iv, len); - skb->dev = &tun->dev; + skb->dev = tun->dev; switch (tun->flags & TUN_TYPE_MASK) { case TUN_TUN_DEV: skb->mac.raw = skb->data; skb->protocol = pi.proto; break; case TUN_TAP_DEV: - skb->protocol = eth_type_trans(skb, &tun->dev); + skb->protocol = eth_type_trans(skb, tun->dev); break; }; @@ -325,7 +319,7 @@ schedule(); continue; } - netif_start_queue(&tun->dev); + netif_start_queue(tun->dev); ret = tun_put_user(tun, skb, (struct iovec *) iv, len); @@ -347,6 +341,24 @@ return tun_chr_readv(file, &iv, 1, pos); } +static void tun_setup(struct net_device *dev) +{ + struct tun_struct *tun = dev->priv; + + skb_queue_head_init(&tun->readq); + init_waitqueue_head(&tun->read_wait); + + tun->owner = -1; + dev->init = tun_net_init; + tun->name = dev->name; + SET_MODULE_OWNER(dev); + dev->open = tun_net_open; + dev->hard_start_xmit = tun_net_xmit; + dev->stop = tun_net_close; + dev->get_stats = tun_net_stats; + dev->destructor = (void (*)(struct net_device *))kfree; +} + static int tun_set_iff(struct file *file, struct ifreq *ifr) { struct tun_struct *tun; @@ -367,30 +379,18 @@ return -EPERM; } else { char *name; - - /* Allocate new device */ - if (!(tun = kmalloc(sizeof(struct tun_struct), GFP_KERNEL)) ) - return -ENOMEM; - memset(tun, 0, sizeof(struct tun_struct)); - - skb_queue_head_init(&tun->readq); - init_waitqueue_head(&tun->read_wait); - - tun->owner = -1; - tun->dev.init = tun_net_init; - tun->dev.priv = tun; - 
SET_MODULE_OWNER(&tun->dev); + unsigned long flags = 0; err = -EINVAL; /* Set dev type */ if (ifr->ifr_flags & IFF_TUN) { /* TUN device */ - tun->flags |= TUN_TUN_DEV; + flags |= TUN_TUN_DEV; name = "tun%d"; } else if (ifr->ifr_flags & IFF_TAP) { /* TAP device */ - tun->flags |= TUN_TAP_DEV; + flags |= TUN_TAP_DEV; name = "tap%d"; } else goto failed; @@ -398,12 +398,19 @@ if (*ifr->ifr_name) name = ifr->ifr_name; - if ((err = dev_alloc_name(&tun->dev, name)) < 0) - goto failed; - if ((err = register_netdevice(&tun->dev))) + dev = alloc_netdev(sizeof(struct tun_struct), name, + tun_setup); + if (!dev) + return -ENOMEM; + + tun = dev->priv; + tun->flags = flags; + + if ((err = register_netdevice(tun->dev))) { + kfree(dev); goto failed; + } - tun->name = tun->dev.name; } DBG(KERN_INFO "%s: tun_set_iff\n", tun->name); @@ -419,9 +426,7 @@ strcpy(ifr->ifr_name, tun->name); return 0; - -failed: - kfree(tun); + failed: return err; } @@ -548,10 +553,8 @@ /* Drop read queue */ skb_queue_purge(&tun->readq); - if (!(tun->flags & TUN_PERSIST)) { - dev_close(&tun->dev); - unregister_netdevice(&tun->dev); - } + if (!(tun->flags & TUN_PERSIST)) + unregister_netdevice(tun->dev); rtnl_unlock(); @@ -574,11 +577,10 @@ .fasync = tun_chr_fasync }; -static struct miscdevice tun_miscdev= -{ - TUN_MINOR, - "net/tun", - &tun_fops +static struct miscdevice tun_miscdev = { + .minor = TUN_MINOR, + .name = "net/tun", + .fops = &tun_fops }; int __init tun_init(void) diff -Nru a/include/linux/if_tun.h b/include/linux/if_tun.h --- a/include/linux/if_tun.h Mon Jun 9 11:43:37 2003 +++ b/include/linux/if_tun.h Mon Jun 9 11:43:37 2003 @@ -40,7 +40,7 @@ wait_queue_head_t read_wait; struct sk_buff_head readq; - struct net_device dev; + struct net_device *dev; struct net_device_stats stats; struct fasync_struct *fasync; From davem@redhat.com Mon Jun 9 12:03:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 12:03:15 -0700 (PDT) Received: from pizda.ninka.net 
(IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59J3B2x012707 for ; Mon, 9 Jun 2003 12:03:12 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA20323; Mon, 9 Jun 2003 12:00:02 -0700 Date: Mon, 09 Jun 2003 12:00:02 -0700 (PDT) Message-Id: <20030609.120002.129768701.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev From: "David S. Miller" In-Reply-To: <20030609115857.38bb31d6.shemminger@osdl.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3028 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev All this stuff looks great Stephen, I'll apply this tomorrow unless there are major objections. From davem@redhat.com Mon Jun 9 14:33:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 14:34:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59LXv2x017152 for ; Mon, 9 Jun 2003 14:33:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA20856; Mon, 9 Jun 2003 14:30:46 -0700 Date: Mon, 09 Jun 2003 14:30:46 -0700 (PDT) Message-Id: <20030609.143046.48683937.davem@redhat.com> To: xerox@foonet.net Cc: sim@netnation.com, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. 
Miller" In-Reply-To: <004f01c32ebe$b4bd88d0$4a00000a@badass> References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3029 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "CIT/Paul" Date: Mon, 9 Jun 2003 15:38:30 -0400 I've tried other settings, secret-interval 1 which seems to 'flush' the cache every second or 60 seconds as I have it here.. If I have secret interval set to 1 the GC never runs because the cache never gets > my gc thresh.. Set secret interval to infinity. Even the default setting of 10 minutes is overly anal. It's only picking a new random secret for the hash so that algorithmic attacks are less likely even if the attacker finds a method by which to determine the secret key on your system. It is impossible for an attacker to do this as far as I am aware. Also tried with max_size 16000 but juno pegs the route cache What do you mean, specifically, by "pegs"? This seems to be a good compromise for now.. Setting the secret interval smaller than its default serves no purpose. I would recommend increasing it instead. Ok you see this happening but during this the router is almost unusable.. PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 3 root 20 -1 0 0 0 RW< 48.5 0.0 34:04 ksoftirqd_CPU0 4 root 20 -1 0 0 0 RW< 46.7 0.0 34:14 ksoftirqd_CPU1 Both cpus are slammed at 100% by the ksoftirqds. ksoftirqd kicks in WAY too early, try my patch below. This is using e1000 with interrupts limited to ~ 4000/second (ITR), no NAPI.. 
NAPI messes it up big time and drops more packets than without :> Something is very wrong, NAPI can only give your system more CPU time by which to do packet processing. Some good kernel profiles would be nice too. Anyways, here is the patch to make ksoftirqd no kick in so quickly, it's based upon a 2.4.x patch from Ingo Molnar: --- kernel/softirq.c.~1~ Mon Jun 9 14:28:02 2003 +++ kernel/softirq.c Mon Jun 9 14:29:28 2003 @@ -52,11 +52,22 @@ wake_up_process(tsk); } +/* + * We restart softirq processing MAX_SOFTIRQ_RESTART times, + * and we fall back to softirqd after that. + * + * This number has been established via experimentation. + * The two things to balance is latency against fairness - + * we want to handle softirqs as soon as possible, but they + * should not be able to lock up the box. + */ +#define MAX_SOFTIRQ_RESTART 10 + asmlinkage void do_softirq(void) { + int max_restart = MAX_SOFTIRQ_RESTART; __u32 pending; unsigned long flags; - __u32 mask; if (in_interrupt()) return; @@ -68,7 +79,6 @@ if (pending) { struct softirq_action *h; - mask = ~pending; local_bh_disable(); restart: /* Reset the pending bitmask before enabling irqs */ @@ -88,10 +98,8 @@ local_irq_disable(); pending = local_softirq_pending(); - if (pending & mask) { - mask &= ~pending; + if (pending && --max_restart) goto restart; - } if (pending) wakeup_softirqd(smp_processor_id()); __local_bh_enable(); From sim@netnation.com Mon Jun 9 15:19:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:19:19 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59MJC2x017843 for ; Mon, 9 Jun 2003 15:19:12 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PUyt-00037Z-97; Mon, 09 Jun 2003 15:19:11 -0700 Date: Mon, 9 Jun 2003 15:19:11 -0700 From: Simon Kirby To: CIT/Paul Cc: "'David S. 
Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609221911.GF11509@netnation.com> References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <004f01c32ebe$b4bd88d0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 3030 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 03:38:30PM -0400, CIT/Paul wrote: > gc_elasticity:1 > gc_interval:600 > gc_min_interval:1 > gc_thresh:60000 > gc_timeout:15 > max_delay:10 > max_size:512000 ^^^ EEP, no! Even the default of 65536 is too big. No wonder you have no CPU left. This should never be bigger than 65536 (unless the hash is increased), but even then it should be set smaller and the GC interval should be fixed. With a table that large, it's going to be walking the buckets all of the time. > I've tried other settings, secret-interval 1 which seems to 'flush' the > cache every second or 60 seconds as I have it here.. That's only for permutating the hash table to avoid remote hash exploits. Ideally, you don't want anything clearing the route cache except for the regular garbage collection (where the gc_elasticity controls how much of it gets nuked). > If I have secret interval set to 1 the GC never runs because the cache > never gets > my gc thresh.. I've also tried this with > Gc_thresh 2000 and more aggressive settings (timeout 5, interval 10).. > Also tried with max_size 16000 but juno pegs the route cache > And I get massive amounts of dst_cache_overflow messages .. 
Try setting gc_min_interval to 0 and gc_elasticity to 4 (so that it doesn't entirely nuke it all the time, but so that it runs fairly often and prunes quite a bit). gc_min_interval:0 will actually make it clear as it allocates, if I remember correctly. > This is 'normal' traffic on the router (using the rtstat program) > > ./rts -i 1 > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot > mc GC: tot ignored goal_miss ovrf > 59272 26954 1826 0 0 0 0 0 6 0 > 0 0 0 0 0 Yes, your route cache is way too large for the hash. Ours looks like this: [sroot@r2:/root]# rtstat -i 1 size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc 870721946 16394 1013 8 4 4 0 0 38 12 0 870722937 16278 1007 8 0 10 0 0 32 6 0 870723935 16362 999 5 0 6 0 0 34 8 0 870725083 16483 1158 1 0 0 0 2 26 6 0 870726047 16634 974 0 0 4 0 0 42 0 0 870726168 14315 2338 13 10 8 0 0 34 44 2 870726168 14683 1383 0 8 2 0 0 30 12 2 870726864 16172 1155 0 6 2 0 0 28 4 0 870728079 17842 1234 0 0 0 0 0 28 12 0 870729106 17545 1036 2 0 2 0 0 30 6 0 ...Hmm, the size is a bit off there. I'm not sure what that's all about. Did you have to hack on rtstat.c at all? Alternative: [sroot@r2:/root]# while (1) [sroot@r2:(while)]# sleep 1 [sroot@r2:(while)]# ip -o route show cache | wc -l [sroot@r2:(while)]# end 8064 8706 9299 9939 10277 10857 11426 11731 12328 12796 13096 13623 1139 2712 4233 561 2468 3948 5075 5459 6114 6768 7502 7815 8303 8969 9602 10090 10566 11194 11765 11987 12678 12920 13563 14136 14693 2336 3652 4814 5954 6449 6741 7412 8036 ....Hmm, even that is growing a bit large. Pfft. I guess we were doing less traffic last time I checked this. :) Maybe you have a bit more traffic than us in normal operation and it's growing faster because of that. Still, with a gc_elasticity of 1 it should be clearing it out very quickly. ...Though I just tried that, and it's not. In fact, the gc_elasticity doesn't seem to be making much of a difference at all. 
The only thing that seems to really change it is if I set gc_min_interval to 0: [sroot@r2:/proc/sys/net/ipv4/route]# echo 0 > gc_min_interval [sroot@r2:/proc/sys/net/ipv4/route]# while ( 1 ) [sroot@r2:(while)]# sleep 1 [sroot@r2:(while)]# ip -o route show cache | wc -l [sroot@r2:(while)]# end 9674 9547 9678 9525 9625 9544 9385 497 2579 3820 4083 4099 4068 4054 4089 4095 4137 4072 4071 4137 2141 3414 4044 2487 3759 4047 4085 4092 4156 4089 4008 475 2497 3729 4146 4085 4116 It seems to regulate it after it gets cleared the first time. If I set gc_elasticity to 1 it seems to bounce around a lot more -- 4 is much smoother. It didn't seem to make a difference with gc_min_interval set to 1, though... hmmm. We've been running normally with gc_min_interval set to 1, but it looks like the BGP table updates have kept the cache from growing too large. > Check what happens when I load up juno.. Yeah... Juno's just going to hit it harder and show the problems with it having to walk through such large hash buckets. How big is your routing table on this box? Is it running BGP? > slammed at 100% by the ksoftirqds. This is using e1000 with interrups > limited to ~ 4000/second (ITR), no NAPI.. NAPI messes it up big time and > drops more packets than without :> Hmm, that's weird. It works quite well here on a single CPU box with tg3 cards. Simon- From Robert.Olsson@data.slu.se Mon Jun 9 15:40:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:41:01 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Mer2x018302 for ; Mon, 9 Jun 2003 15:40:55 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id AAA14450; Tue, 10 Jun 2003 00:39:56 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.3260.654259.708727@robur.slu.se> Date: Tue, 10 Jun 2003 00:39:56 +0200 To: "David S. 
Miller" Cc: sim@netnation.com, xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress In-Reply-To: <20030609.015648.55736734.davem@redhat.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030609071330.GD20613@netnation.com> <20030609.015648.55736734.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3031 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > BTW, ignoring juno, Robert Olsson has some pktgen hacks that allow > that to generate new-dst-per-packet DoS like traffic. It's much > more effective than Juno-z > > Robert could you show these guys your hacks to do that? Sure. What a discussion... Well I'm happy for the past lazy days. I've included some references in the experiment from last week and it should be interesting for people in this discussion. Summary: Forwarding experiment with different rates of new incoming destinations/sec. Ranging from DoS attack to single destination flow. With full 123k routes. http://robur.slu.se/Linux/net-development/experiments/router-flow-test.html Your latest patch looks interesting... good thinking. Operations and tuning would be simpler. Hope to have time for a test tomorrow. Testing is very manual work still. Cheers. 
--ro From Robert.Olsson@data.slu.se Mon Jun 9 15:55:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:55:18 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Mt92x018711 for ; Mon, 9 Jun 2003 15:55:10 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id AAA14703; Tue, 10 Jun 2003 00:54:32 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.4136.328760.955758@robur.slu.se> Date: Tue, 10 Jun 2003 00:54:32 +0200 To: Simon Kirby Cc: CIT/Paul , "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609221911.GF11509@netnation.com> References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3032 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Simon Kirby writes: > [sroot@r2:/root]# rtstat -i 1 > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc > 870721946 16394 1013 8 4 4 0 0 38 12 0 > 870722937 16278 1007 8 0 10 0 0 32 6 0 > ...Hmm, the size is a bit off there. I'm not sure what that's all about. Seems you have an older version of rtstat. There are stats for the GC process there too. You can get recent rtstat from: robur.slu.se:/pub/Linux/net-development/rt_cache_stat/rtstat.c I'm about to propose some stats even for hash spinning.... 
--- linux/include/net/route.h.orig 2003-03-24 22:59:53.000000000 +0100 +++ linux/include/net/route.h 2003-05-16 11:04:07.000000000 +0200 @@ -102,6 +102,8 @@ unsigned int gc_ignored; unsigned int gc_goal_miss; unsigned int gc_dst_overflow; + unsigned int in_hlist_search; + unsigned int out_hlist_search; }; extern struct rt_cache_stat *rt_cache_stat; --- linux/net/ipv4/route.c.orig 2003-03-24 23:01:48.000000000 +0100 +++ linux/net/ipv4/route.c 2003-05-16 11:18:54.000000000 +0200 @@ -321,7 +321,7 @@ for (i = 0; i < NR_CPUS; i++) { if (!cpu_possible(i)) continue; - len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n", + len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n", dst_entries, per_cpu_ptr(rt_cache_stat, i)->in_hit, per_cpu_ptr(rt_cache_stat, i)->in_slow_tot, @@ -338,7 +338,9 @@ per_cpu_ptr(rt_cache_stat, i)->gc_total, per_cpu_ptr(rt_cache_stat, i)->gc_ignored, per_cpu_ptr(rt_cache_stat, i)->gc_goal_miss, - per_cpu_ptr(rt_cache_stat, i)->gc_dst_overflow + per_cpu_ptr(rt_cache_stat, i)->gc_dst_overflow, + per_cpu_ptr(rt_cache_stat, i)->in_hlist_search, + per_cpu_ptr(rt_cache_stat, i)->out_hlist_search ); } @@ -1771,6 +1773,7 @@ skb->dst = (struct dst_entry*)rth; return 0; } + RT_CACHE_STAT_INC(in_hlist_search); } rcu_read_unlock(); @@ -2137,6 +2140,7 @@ *rp = rth; return 0; } + RT_CACHE_STAT_INC(out_hlist_search); } rcu_read_unlock(); Cheers. 
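The two counters added by the patch above are bumped once for every chain entry that is examined and does not match, so they measure how much "hash spinning" each lookup costs. A minimal user-space sketch of that accounting follows. It assumes a plain singly linked chain; struct cache_entry, chain_lookup() and the hlist_search counter are hypothetical names made up for illustration and are not the kernel's own types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one cached route on a hash chain. */
struct cache_entry {
	unsigned int key;
	struct cache_entry *next;
};

/* Hypothetical counter playing the role of in_hlist_search. */
static unsigned long hlist_search;

/* Walk one chain; count every entry that is inspected but does not match. */
struct cache_entry *chain_lookup(struct cache_entry *head, unsigned int key)
{
	struct cache_entry *e;

	for (e = head; e != NULL; e = e->next) {
		if (e->key == key)
			return e;	/* hit: this entry is not counted */
		hlist_search++;		/* miss on this entry, keep walking */
	}
	return NULL;
}

unsigned long hlist_search_count(void)
{
	return hlist_search;
}

/* Usage sketch: a three-entry chain 10 -> 20 -> 30. */
void chain_lookup_demo(void)
{
	struct cache_entry c = { 30, NULL };
	struct cache_entry b = { 20, &c };
	struct cache_entry a = { 10, &b };

	hlist_search = 0;
	assert(chain_lookup(&a, 30) == &c);	/* skips 10 and 20 */
	assert(hlist_search == 2);
	assert(chain_lookup(&a, 99) == NULL);	/* walks all three */
	assert(hlist_search == 5);
}
```

Dividing such a counter by the number of lookups gives a rough average of how many entries are skipped per lookup, which is the figure that grows when chains get long.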
--ro From xerox@foonet.net Mon Jun 9 15:57:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:57:33 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59MvM2x019029 for ; Mon, 9 Jun 2003 15:57:22 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h59MvHeq019566; Mon, 9 Jun 2003 18:57:17 -0400 From: "CIT/Paul" To: "'Simon Kirby'" Cc: "'David S. Miller'" , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 18:56:18 -0400 Organization: CIT Message-ID: <008001c32eda$56760830$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030609221911.GF11509@netnation.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 3033 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss under high load.. So I find that the e1000 ITR works just as well And there is no reason for NAPI at this point. I will try your settings :) net.ipv4.route.secret_interval = 600 net.ipv4.route.min_adv_mss = 256 net.ipv4.route.min_pmtu = 552 net.ipv4.route.mtu_expires = 600 net.ipv4.route.gc_elasticity = 4 net.ipv4.route.error_burst = 500 net.ipv4.route.error_cost = 100 net.ipv4.route.redirect_silence = 2048 net.ipv4.route.redirect_number = 9 net.ipv4.route.redirect_load = 2 net.ipv4.route.gc_interval = 600 net.ipv4.route.gc_timeout = 15 net.ipv4.route.gc_min_interval = 0 net.ipv4.route.max_size = 32768 net.ipv4.route.gc_thresh = 2000 net.ipv4.route.max_delay = 10 net.ipv4.route.min_delay = 5 Current settings.... 
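As a rough way to relate these numbers to Simon's earlier remark that the cache can be "too large for the hash": if entries spread evenly over the buckets, the average chain a lookup may have to walk is about the entry count divided by the bucket count. A minimal sketch of that arithmetic follows. The thread does not state the hash size on this box, so the figure of 4096 buckets used in the note below is purely a hypothetical value, and avg_chain_len() is an illustrative helper, not a kernel function.

```c
/*
 * Average number of entries per hash chain, rounded up,
 * assuming entries are spread evenly over the buckets.
 * Returns 0 when there are no buckets to avoid dividing by zero.
 */
unsigned long avg_chain_len(unsigned long entries, unsigned long buckets)
{
	if (buckets == 0)
		return 0;
	return (entries + buckets - 1) / buckets;
}
```

With a hypothetical 4096 buckets, a max_size of 32768 would allow an average of 8 entries per chain, while the earlier max_size of 512000 would allow 125.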
Rtstat output: size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 2010 9014 14039 0 0 0 0 0 0 6 2 14038 0 49 0 2008 8675 13999 0 0 0 1 0 1 5 2 13992 0 56 0 2002 8529 16484 0 0 0 1 0 0 7 2 16483 0 43 0 2009 8549 15304 0 0 0 0 0 1 10 2 15303 0 55 0 2007 8491 16118 0 0 0 0 0 0 10 2 16117 0 50 0 2024 8219 18306 0 0 0 1 0 0 7 2 18309 0 14 0 2005 8586 15536 0 0 0 0 0 0 9 2 15536 0 42 0 2007 8804 15797 0 0 0 0 0 0 7 2 15796 0 42 0 2012 8535 16519 0 0 0 1 0 0 7 2 16518 0 28 0 2004 8348 15709 0 0 0 0 1 0 8 2 15707 0 42 0 ... 2043 8600 18278 0 0 0 0 0 0 12 2 18285 0 15 0 2030 8631 17731 0 0 0 1 0 0 9 2 17737 0 7 0 2002 8489 14653 0 0 0 1 0 2 5 2 14650 0 35 0 2015 8147 15004 0 0 0 0 0 0 9 2 15003 0 57 0 2015 8352 17303 0 0 0 2 0 0 8 2 17308 0 7 0 2025 8451 16768 0 0 0 0 0 0 6 2 16768 0 35 0 2013 8531 16464 0 0 0 0 0 0 13 2 16476 0 7 0 2013 8117 15202 0 0 0 1 1 0 7 2 15198 0 35 0 size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 2019 7913 15054 0 0 0 1 0 0 9 2 15057 0 42 0 2008 8258 16019 0 0 0 0 0 1 9 2 16020 0 43 0 2025 8211 17897 0 0 0 1 0 0 5 2 17902 0 0 0 CPU NORMAL: CPU0 states: 36.0% user, 29.0% system, 0.0% nice, 33.0% idle CPU1 states: 18.0% user, 61.0% system, 0.0% nice, 19.0% idle CPU0 states: 21.0% user, 44.0% system, 0.0% nice, 35.0% idle CPU1 states: 18.0% user, 47.0% system, 0.0% nice, 35.0% idle 3 root 10 -1 0 0 0 SW< 0.0 0.0 35:29 ksoftirqd_CPU0 4 root 10 -1 0 0 0 SW< 0.0 0.0 35:35 ksoftirqd_CPU1 Rtstat under light juno: 2315 7955 51691 0 0 0 1 1 1 5 1 51695 0 0 0 2336 6620 47387 0 0 0 1 0 1 5 1 47393 0 0 0 2371 5630 49726 0 0 0 0 0 1 12 2 49737 0 0 0 2372 5420 53458 0 0 0 1 0 0 2 1 53460 0 0 0 2369 4891 48983 0 0 0 0 0 1 5 2 48988 0 0 0 2389 4529 50525 0 0 0 0 1 1 8 1 50532 0 0 0 2334 4645 49092 0 0 1 1 0 0 1 1 49093 0 0 0 2358 5033 48971 0 0 0 1 0 1 6 2 48977 0 0 0 2366 4864 51411 0 0 0 2 0 1 8 1 51419 0 0 0 2370 5035 49444 0 0 0 0 0 0 4 2 49448 0 0 0 size IN: hit tot mc no_rt 
bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 2391 5328 49098 0 0 0 1 0 3 12 3 49110 0 0 0 2363 5586 50687 0 0 0 2 0 0 7 1 50693 0 0 0 2361 4571 49243 0 0 0 0 0 0 2 1 49243 0 0 0 2356 5758 56664 0 0 1 1 0 1 5 1 56666 0 0 0 2375 5581 62098 0 0 0 2 0 0 8 2 62103 0 0 0 2393 3895 50762 0 0 0 1 0 0 5 0 50764 0 0 0 2335 4066 56659 0 0 0 1 0 0 10 2 56667 0 0 0 2315 3607 49990 0 0 0 1 0 0 4 1 49992 0 0 0 2339 4369 54149 0 0 0 1 0 0 7 1 54153 0 0 0 CPU under JUNO: CPU0 states: 0.0% user, 99.3% system, 0.2% nice, 0.0% idle CPU1 states: 0.2% user, 99.3% system, 0.1% nice, 0.0% idle 4 root 14 -1 0 0 0 SW< 21.0 0.0 35:33 ksoftirqd_CPU1 3 root 15 -1 0 0 0 SW< 20.1 0.0 35:27 ksoftirqd_CPU0 This is 10mbit of juno....... Or around 9.6 or so... RTS normal with 8000 thresh: size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 8003 11474 9076 0 0 0 2 0 0 4 2 9071 0 10 0 8010 11425 9205 0 0 0 0 0 0 7 2 9203 0 14 0 8006 11393 12516 0 0 0 1 0 4 5 0 12509 0 20 0 8005 12082 9188 0 0 0 2 0 0 5 2 9184 0 14 0 8004 11447 8893 0 0 0 0 0 0 8 2 8890 0 12 0 8004 12346 8898 0 0 0 1 0 2 5 2 8891 0 10 0 8003 11557 8944 0 0 0 2 0 1 7 1 8942 0 14 0 8004 12812 9890 0 0 0 0 0 1 5 1 9878 0 16 0 8004 12166 11363 0 0 0 1 0 2 3 2 11349 0 23 0 8012 11933 8881 0 0 0 2 0 0 6 2 8874 0 15 0 8003 11938 9024 0 0 0 0 0 1 5 1 9017 0 12 0 8003 12107 8682 0 0 0 1 0 2 3 2 8674 0 13 0 8008 11328 8945 0 0 0 1 0 2 6 1 8942 0 10 0 CPU: CPU0 states: 0.0% user, 50.0% system, 0.0% nice, 49.0% idle CPU1 states: 1.0% user, 57.0% system, 0.0% nice, 40.0% idle CPU0 states: 0.0% user, 27.0% system, 0.0% nice, 72.0% idle CPU1 states: 0.0% user, 41.0% system, 0.0% nice, 58.0% idle 3 root 12 -1 0 0 0 SW< 0.0 0.0 35:29 ksoftirqd_CPU0 4 root 9 -1 0 0 0 SW< 0.0 0.0 35:35 ksoftirqd_CPU1 I've mucked with TONNnss of settings.. I've even had the route-cache up to over 600,000 entries and the CPU still has room left for more.. 
It can't possibly be the size of the cache, it simply has to be the constant creation and teardown of entries .. I can't hit anywhere NEAR 100kpps On this router with the amount of load on it.. The routing table: ip ro ls | wc 516 2598 21032 Doesn't have too much in it.. It's running bgp but im not taking the full routes right now.. We will later though. There are some ip rules Also some netfilters iptables-save | wc 1154 7658 46126 Of course there isn't 1154 entries because some of that is the chains and things but there are a lot of rules in netfilter also.. Everything seems to slow it down :/ especially the mangle table.. If I add 1000 entries to the mangle table in netfilter it uses massive cpu .. Netfilter seems to be a hog. Like I said I've tested this with NO netfilter and nothing else on a test box except for the kernel, e1000 , ITR set to ~4000 and all sorts of changing the settings and I still can't hit 100kpps routing with juno-z Paul xerox@foonet.net http://www.httpd.net 
From davem@redhat.com Mon Jun 9 16:08:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 16:09:05 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59N8v2x019498 for ; Mon, 9 Jun 2003 16:08:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA21213; Mon, 9 Jun 2003 16:05:48 -0700 Date: Mon, 09 Jun 2003 16:05:47 -0700 (PDT) Message-Id: <20030609.160547.41648991.davem@redhat.com> To: xerox@foonet.net Cc: sim@netnation.com, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <008001c32eda$56760830$4a00000a@badass> References: <20030609221911.GF11509@netnation.com> <008001c32eda$56760830$4a00000a@badass> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3034 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "CIT/Paul" Date: Mon, 9 Jun 2003 18:56:18 -0400 And there is no reason for NAPI at this point. Intel's ITR give you high latency, NAPI is far superior than any hardware based interrupt mitigation scheme whatsoever. You have some system specific problem with NAPI and we need to analyze that. I've mucked with TONNnss of settings.. I've even had the route-cache up to over 600,000 entries and the CPU still has room left for more.. 
It can't possibly be the size of the cache, You are letting your hash chains reach the size of "max_size" divided by the number of hash chains. This means that every packet into your machine has to walk a hash chain of up to that many entries. You can keep doing some shaman's dance saying that the size you have chosen doesn't matter, but the people who have written this code and work with it every day know that it does. From hadi@shell.cyberus.ca Mon Jun 9 17:03:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:03:56 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A03k2x020464 for ; Mon, 9 Jun 2003 17:03:47 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PWbf-0009IC-Jy; Mon, 09 Jun 2003 20:03:19 -0400 Date: Mon, 9 Jun 2003 20:03:19 -0400 (EDT) From: Jamal Hadi To: CIT/Paul cc: "'Simon Kirby'" , "'David S. Miller'" , fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: RE: Route cache performance under stress In-Reply-To: <008001c32eda$56760830$4a00000a@badass> Message-ID: <20030609195652.E35696@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3035 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, CIT/Paul wrote: > NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss > under high load.. So I find that the e1000 ITR works just as well > And there is no reason for NAPI at this point. > Foo, you on cheap crack again? Please just try the tests as described if you want to help. It doesn't help anyone when you wildly wave your hands like that. Why don't we take you offline - give me access to your machine, I have a couple of hours to kill. 
cheers, jamal From ralph@istop.com Mon Jun 9 17:32:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:32:59 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A0Wn2x021061 for ; Mon, 9 Jun 2003 17:32:50 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id B186E3699E; Mon, 9 Jun 2003 20:32:45 -0400 (EDT) Date: Mon, 9 Jun 2003 20:32:48 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030609195652.E35696@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3036 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Jamal Hadi wrote: > On Mon, 9 Jun 2003, CIT/Paul wrote: > > > NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss > > under high load.. So I find that the e1000 ITR works just as well > > And there is no reason for NAPI at this point. > > > > Foo, you on cheap crack again? > Please just try the tests as described if you want to help. It doesnt help > anyone when you wildly wave your hands like that. From personal experience, after trying numerous things for over a year one can get very frustrated. Although your contribution has been useful, you are also guilty of wildly waving your hands around too. Many moons ago when I lamented that my 2.2.19 kernel, 750Mhz duron, 3c59x core router performance sucked you told me NAPI would solve the performance problems. It didn't. 
And Rob's latest numbers seem to show that even with the latest and greatest patches 148kpps is still a dream. It's good to see that people are finally doing tests to simulate real-world routing (instead of just pretending the problem doesn't exist because they were able to get 148kpps in some contrived test). Here's my CPU graphs for the box; it's only doing routing and firewalling isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html eth1 and eth2 are both sending and receiving ~30mbps of traffic (at 8-10kpps in and out on each interface). The other variable that I haven't seen people discuss but have anecdotal evidence will measurably impact performance is the motherboard used (chipset and chipset configuration/timing). Lastly from the software side Linux doesn't seem to have anything like BSD's parameter to control user/system CPU sharing. Once my CPU load reaches 70-80%, I'd rather have some dropped packets than let the CPU hit 100% and end up with my BGP sessions drop. -Ralph From krkumar@us.ibm.com Mon Jun 9 17:55:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:56:05 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A0tk2x021487 for ; Mon, 9 Jun 2003 17:55:56 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5A0tesZ173504; Mon, 9 Jun 2003 20:55:40 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5A0tbSe129102; Mon, 9 Jun 2003 20:55:38 -0400 Message-ID: <3EE52C92.4060509@us.ibm.com> Date: Mon, 09 Jun 2003 17:55:46 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru, "David S. 
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: [PATCH] Panic in ipv6_add_dev Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3037 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev

Hi,

I am using 2.5.70 and using VLAN to configure many interfaces, and after some are configured, the system panics in unregister_sysctl_table called from (STACK) neigh_sysctl_unregister, neigh_parms_release, ipv6_add_dev.

The problem is that we have called neigh_parms_alloc, but not neigh_sysctl_register. Hence calling neigh_parms_release() in the middle frees up the sysctl_header entry for the nd_table as a side-effect (due to the memcpy in neigh_parms_alloc). We need to initialize sysctl_table to NULL in neigh_parms_alloc so that a release can be called safely at any time.

Thanks,

- KK

diff -ruN linux-2.5.70.org/net/core/neighbour.c linux-2.5.70/net/core/neighbour.c
--- linux-2.5.70.org/net/core/neighbour.c 2003-06-09 17:32:10.000000000 -0700
+++ linux-2.5.70/net/core/neighbour.c 2003-06-09 17:36:22.000000000 -0700
@@ -1094,6 +1094,7 @@
 			kfree(p);
 			return NULL;
 		}
+		p->sysctl_table = NULL;
 		write_lock_bh(&tbl->lock);
 		p->next = tbl->parms.next;
 		tbl->parms.next = p;

From ralph@istop.com Mon Jun 9 18:30:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 18:30:41 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A1UC2x022793 for ; Mon, 9 Jun 2003 18:30:13 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 3A11136A81; Mon, 9 Jun 2003 20:56:41 -0400 (EDT) Date: Mon, 9 Jun 2003 20:56:43 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: CIT/Paul , "'David S.
Miller'" , "hadi@shell.cyberus.ca" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030609221911.GF11509@netnation.com> Message-ID: References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3038 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > [sroot@r2:/root]# while (1) > [sroot@r2:(while)]# sleep 1 > [sroot@r2:(while)]# ip -o route show cache | wc -l > [sroot@r2:(while)]# end I considered doing the same test on my box, but I don't have enough juice left to do it every second: root@tor-router# time ip -o route show cache | wc -l 15023 real 0m1.563s user 0m0.380s sys 0m1.180s So instead... root@tor-router# while (true); do sleep 5; ip -o route show cache | wc -l; done 12630 15659 17951 20733 8875 9282 11913 4216 9437 11973 14503 17088 -Ralph From sim@netnation.com Mon Jun 9 18:53:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 18:53:24 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A1rD2x023445 for ; Mon, 9 Jun 2003 18:53:13 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PYK0-0006cC-0Q; Mon, 09 Jun 2003 18:53:12 -0700 Date: Mon, 9 Jun 2003 18:53:12 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030610015311.GB23009@netnation.com> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3039 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 08:32:48PM -0400, Ralph Doncaster wrote: > Here's my CPU graphs for the box; it's only doing routing and firewalling > isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) > http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html > > eth1 and eth2 are both sending and receiving ~30mbps of traffic (at > 8-10kpps in and out on each interface). Interesting! Your CPU use is quite a bit higher than ours. It looks like we have fairly similar network configurations. We're advertising a /24 and a /20 of which about 60% of the IPs are in use. Each router forwards about 60 Mbit/second (16 kpps) during the day, and the CPU load is usually around 18-25%. This is with a single CPU, though I accidentally compiled the kernel SMP. I had forgotten to add CPU utilization to the cricket graphs, so I'll have a better idea from now on, but I've never seen it above 30% (from "vmstat 1") except in attack cases. The difference is probably just the fact that this is running on slightly faster hardware (single Athlon 1800MP, Tyan Tiger MPX board). > Lastly from the software side Linux doesn't seem to have anything like > BSD's parameter to control user/system CPU sharing. Once my CPU load > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > 100% and end up with my BGP sessions drop. Hmm. 
I found that once NAPI was happening, userspace seemed to get a fairly decent amount of time. I'm not exactly sure what the settings are, but I was able to run things through SSH quite easily (not without noticeable slowness, though). Actually, the slowness appeared to be mostly the result of incoming packet drops ("vmstat 1" output where it was _sending_ data and getting the ACKs some time later was perfectly smooth). We just set up a dual Opteron box today with dual onboard Tigon3s, so I'll see if I can do some profiling. I hooked it up via crossover to a Xeon 2.4 GHz box with onboard e1000, so I should be able to do some remote profiling tonight. Simon- From ralph@istop.com Mon Jun 9 19:45:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 19:45:42 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A2jS2x025147 for ; Mon, 9 Jun 2003 19:45:31 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 6857936A7A; Mon, 9 Jun 2003 22:45:27 -0400 (EDT) Date: Mon, 9 Jun 2003 22:45:29 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030609204257.L35799@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3040 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Jamal Hadi wrote: > Problem is people disappear real quick when asked to run tests that > could validate certain concepts.
I wish everyone would emulate S Kirby > he actually gives good info. The test results Rob posted today show that the testing can be done in a lab environment. Most of the people I know that would actually see 50kpps in the real world don't have the time to apply various patches and run a bunch of tests; pretending the problem doesn't exist when someone doesn't run tests to prove it is a poor excuse. > > Here's my CPU graphs for the box; it's only doing routing and firewalling > > isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) > > http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html > > > > eth1 and eth2 are both sending and receiving ~30mbps of traffic (at > > 8-10kpps in and out on each interface). > > Is this still the duron 750Mhz? Are you running zebra? Did you > check out some of the ideas i talked about earlier? Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra 0.93b. If the ideas you're referring to are changing the zebra source to arp the next-hops, then no, I haven't tried it (and am not likely to any time soon). > Robert has a good collection for what is good hardware. I am so outdated > i dont keep track anymore. My fastest machine is still an ASuse dual > 450Mhz. There are still more dead-end suggestions than good ones (e.g. the O'Reilly high performance routing book). > > Lastly from the software side Linux doesn't seem to have anything like > > BSD's parameter to control user/system CPU sharing. Once my CPU load > > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > > 100% and end up with my BGP sessions drop. > > > > Well, heres a good example: With NAPI, have your sessions been dropped? Yup, twice in the last 2 weeks. > Have you tried a different NIC? Not sure how well the 3com is maintained > for example. > Try a tulip or tg3 or e1000 or the dlink gige. Initially I was looking for tulip cards but almost nobody is producing them any more.
Almost a year ago I came across the following list, which is why I went with the 3com (at the time it indicated rx/tx irqmit for the 3com, until I emailed the author that I found out it was tx only) http://www.fefe.de/linuxeth/ I had joined the vortex list last fall looking for some tips and that didn't help much (other than telling me that the 3com wasn't the best choice). I've since bought a couple tg3 and a bunch of e1000 cards that I'm planning to put into production. Rob's test results seem to show that even if I replace my 3c905cx cards with e1000's I'll still get killed with a 50kpps synflood with my current CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I can't do that in a 1U rack-mount box. Yeah, I could probably do it with water cooling, but that's not an option in a telco hotel like 151 Front St. (Toronto). A couple weeks ago I got one of my techs to test freeBSD/polling with full routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to suggest it will handle a 50kpps synflood DOS. It would be nice if Linux could do the same. Despite the BSD bashing (to be expected on a Linux list, I guess), I will be using BSD as well as Linux for core routing. The plan is 1 linux router and 1 bsd router each running zebra, connected to separate upstream transit providers, running ibgp between them, and both advertising a default route into OSPF. Then if I get hit with a DOS that kills Linux, the BSD box will have a much better chance of staying up than if I just used a second Linux box for redundancy. If the BSD boxes turn out to have twice the performance of the linux boxes, it may be better for me to dump linux for routing altogether. 
:-( -Ralph From slblake@petri-meat.com Mon Jun 9 20:05:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:06:02 -0700 (PDT) Received: from server26.totalchoicehosting.com (rs-207-44-248-87.ev1.net [207.44.248.87] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A35o2x025994 for ; Mon, 9 Jun 2003 20:05:51 -0700 Received: from rdu74-174-070.nc.rr.com ([24.74.174.70]) by server26.totalchoicehosting.com with esmtp (Exim 3.36 #1) id 19PZS9-00032B-00; Mon, 09 Jun 2003 22:05:41 -0500 Subject: Re: Route cache performance under stress From: Steven Blake To: Florian Weimer Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: <874r30r9z2.fsf@deneb.enyo.de> References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> Content-Type: text/plain Organization: Message-Id: <1055214346.1199.65.camel@photon> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 09 Jun 2003 23:05:47 -0400 Content-Transfer-Encoding: 7bit X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server26.totalchoicehosting.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [0 0] X-AntiAbuse: Sender Address Domain - petri-meat.com X-archive-position: 3041 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: slblake@petri-meat.com Precedence: bulk X-list: netdev On Sun, 2003-06-08 at 09:10, Florian Weimer wrote: > "David S. Miller" writes: > > > Although, I hope it's not "too similar" to what CEF does because > > undoubtedly Cisco has a bazillion patents in this area. > > Most things in this area are patented, and the patents are extremely > fuzzy (e.g. 
policy-based routing with hierarchical sequence of > decisions has been patented countless times). 8-( > > > This is actually an argument for coming up with out own algorithms > > without any knowledge of what CEF does or might do. :( > > The branchless variant is not described in the IOS book, and I can't > tell if Cisco routers use it. If this idea is really novel, we are in > pretty good shape because we no longer use trees, tries or whatever, > but a DFA. 8-) Based on my quick reading of your code sample, I think you have just reinvented multibit trees; in your case with a fixed stride of 8 bits. > Further parameters which could be tweaked is the kind of adjacency > information (where to store the L2 information, whether to include the > prefix length in the adjacency record etc.). If you are curious, or just have a lot of time on your hands, you might find the following set of references useful: http://www.petri-meat.com/slblake/networking/refs/lpm_pkt-class/ IMHO, the best LPM algorithm (in terms of balancing lookup speed vs. memory consumption vs. update rate) is CRT, described in the first paper [ASIK]. It is patented, but there is hope that it might get released under GPL in the near future. Regards, // Steve From ralph@istop.com Mon Jun 9 20:18:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:18:52 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A3Ih2x026731 for ; Mon, 9 Jun 2003 20:18:44 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id B5F0D36A2B; Mon, 9 Jun 2003 23:18:42 -0400 (EDT) Date: Mon, 9 Jun 2003 23:18:45 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610015311.GB23009@netnation.com> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030610015311.GB23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3042 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > "vmstat 1") except in attack cases. The difference is probably just the > fact that this is running on slightly faster hardware (single Athlon > 1800MP, Tyan Tiger MPX board). What happened to Linux users being able to brag about how much they could do with CPUs that were useless for running Windows? On a 1Ghz CPU you've got almost 7,000 cycles to route a packet in order to handle 148kpps. I can't see why the slow path should be more than 2,000 cycles. I know some people's attitude is don't talk if you're not going to write the code. If I had the time I would; from my earliest days of programming I've been optimizing performance to the maximum. I can still remember using page 0 on my c64 to store an 8-bit register in 3 cycles instead of four... So to put a stake in the ground, I'd like to see a 1Ghz celeron with e1000 cards handle 148kpps of DOS traffic at <50% CPU utilization (with full routing tables & no firewalling). If that's not a reasonable expectation, someone please let me know. Even if my time was only worth $500/day, in the past year and a half I spent enough time working on Linux routers to buy a Cisco NPE-G1. 
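As a rough check of the cycle budget quoted above, the arithmetic can be sketched in C. The helper names (line_rate_pps, cycles_per_packet) are hypothetical and come from no kernel source. The sketch assumes that the 148kpps figure is the 100 Mbit/s Ethernet line rate for minimum-size 64-byte frames, with the usual 20 bytes of per-frame wire overhead (8-byte preamble plus 12-byte inter-frame gap).

```c
#include <stdint.h>

/* Hypothetical helper: packets per second at line rate for a given
 * link speed (bits/s) and Ethernet frame size (bytes, FCS included).
 * Assumes 20 bytes of per-frame overhead on the wire:
 * 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
uint64_t line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    const uint64_t wire_overhead_bytes = 20;

    return link_bps / ((frame_bytes + wire_overhead_bytes) * 8);
}

/* Hypothetical helper: CPU cycles available per packet if the CPU
 * does nothing but forward packets at the given rate.
 * Returns 0 for a zero packet rate to avoid dividing by zero. */
uint64_t cycles_per_packet(uint64_t cpu_hz, uint64_t pps)
{
    return pps ? cpu_hz / pps : 0;
}
```

With these definitions, cycles_per_packet(1000000000, 148000) gives 6756, in line with the "almost 7,000 cycles" figure, and line_rate_pps(100000000, 64) gives 148809. A target of <50% CPU utilization would halve the budget to roughly 3,400 cycles per packet.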
:-( -Ralph From greearb@candelatech.com Mon Jun 9 20:24:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:24:16 -0700 (PDT) Received: from grok.yi.org (IDENT:Kec0++FnlFSJ3MW+qk3egPD7hn3dYXea@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A3O92x027209 for ; Mon, 9 Jun 2003 20:24:10 -0700 Received: from candelatech.com (IDENT:keqxvO2uK1Z9NpHE2qFZHfwyAQ+0s8ct@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5A3Nvl29701; Mon, 9 Jun 2003 20:23:57 -0700 Message-ID: <3EE54F4D.50909@candelatech.com> Date: Mon, 09 Jun 2003 20:23:57 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: ralph+d@istop.com CC: "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3043 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Ralph Doncaster wrote: > Initially I was looking for tulip cards but almost nobody is producing > them any more. Almost a year ago I came across the following list, which > is why I went with the 3com (at the time it indicated rx/tx irqmit for the > 3com, until I emailed the author that I found out it was tx only) > http://www.fefe.de/linuxeth/ If you want 4-port tulip NICs, I've had decent luck with the Phobox p430tx NICs ($350 or so per NIC, so not cheap). That said, the e1000s are definitely better as far as my own testing has been concerned. (I'm doing packet pushing & reception, no significant routing, though).
One warning about e1000's, make sure you have active airflow across the NICs if you put two together. Otherwise, buy a dual port NIC...it has a single chip and you will have fewer cooling issues. Ben > > I had joined the vortex list last fall looking for some tips and that > didn't help much (other than telling me that the 3com wasn't the best > choice). I've since bought a couple tg3 and a bunch of e1000 cards that > I'm planning to put into production. > > Rob's test results seem to show that even if I replace my 3c905cx cards > with e1000's I'll still get killed with a 50kpps synflood with my current > CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I > can't do that in a 1U rack-mount box. Yeah, I could probably do it with > water cooling, but that's not an option in a telco hotel like 151 Front > St. (Toronto). > > A couple weeks ago I got one of my techs to test freeBSD/polling with full > routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to > suggest it will handle a 50kpps synflood DOS. It would be nice if Linux > could do the same. > > Despite the BSD bashing (to be expected on a Linux list, I guess), I will > be using BSD as well as Linux for core routing. The plan is 1 linux > router and 1 bsd router each running zebra, connected to separate upstream > transit providers, running ibgp between them, and both advertising a > default route into OSPF. Then if I get hit with a DOS that kills Linux, > the BSD box will have a much better chance of staying up than if I just > used a second Linux box for redundancy. If the BSD boxes turn out to have > twice the performance of the linux boxes, it may be better for me to dump > linux for routing altogether.
:-( > > -Ralph > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From ralph@istop.com Mon Jun 9 21:17:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:17:31 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4HC2x029435 for ; Mon, 9 Jun 2003 21:17:15 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 2FC4B369E5; Mon, 9 Jun 2003 23:41:05 -0400 (EDT) Date: Mon, 9 Jun 2003 23:41:07 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Ben Greear Cc: "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress In-Reply-To: <3EE54F4D.50909@candelatech.com> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <3EE54F4D.50909@candelatech.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3044 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Ben Greear wrote: > One warning about e1000's, make sure you have active airflow across the NICs > if you put two together. Otherwise, buy a dual port NIC...it has a single > chip and you will have fewer cooling issues. I liked how easy the e1000's are to come by; even more so than the 3com cards. Intel seems to be grabbing market share by aggressive pricing (bought 4 last week for C$50 ea), so almost every computer equipment distributor carries the intel cards. Since I already have the single-port cards, I guess I'll install them with a couple empty PCI slots between them to help with the cooling.
-Ralph From sim@netnation.com Mon Jun 9 21:34:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:35:01 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4Yr2x030207 for ; Mon, 9 Jun 2003 21:34:54 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PaqT-0007tW-JC; Mon, 09 Jun 2003 21:34:53 -0700 Date: Mon, 9 Jun 2003 21:34:53 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030610043453.GC23009@netnation.com> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3045 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 11:18:45PM -0400, Ralph Doncaster wrote: > What happened to Linux users being able to brag about how much they could > do with CPUs that were useless for running Windows? On a 1Ghz CPU you've > got almost 7,000 cycles to route a packet in order to handle 148kpps. I > can't see why the slow path should be more than 2,000 cycles. We're still here. I want the code to be fast and efficient as much as you do. I'd be willing to bet that a lot of this will get fixed now, though. Broken parts of the code only get fixed if enough people whine or especially if somebody decides to actually fix it. My guess is that the "using Linux as an Internet router with more than 10 Mbit/sec of bandwidth" user base is relatively small. > I know some people's attitude is don't talk if you're not going to write > the code. 
If I had the time I would; from my earliest days of programming > I've been optimizing performance to the maximum. I can still remember > using page 0 on my c64 to store an 8-bit register in 3 cycles instead of > four... I wrote an entire game in TASM once. :) > So to put a stake in the ground, I'd like to see a 1Ghz celeron with e1000 > cards handle 148kpps of DOS traffic at <50% CPU utilization (with full > routing tables & no firewalling). Sounds reasonable. The routing table size issue has now been eliminated, so that should make no difference to the equation. > If that's not a reasonable expectation, someone please let me know. > Even if my time was only worth $500/day, in the past year and a half I > spent enough time working on Linux routers to buy a Cisco NPE-G1. :-( But in the end you'll end up with a system that you'll know the inner workings of and that will be open source, maintainable, scalable, easy to replicate, and easy to upgrade. And it'll have tcpdump, damn it. :) On Mon, Jun 09, 2003 at 10:45:29PM -0400, Ralph Doncaster wrote: > A couple weeks ago I got one of my techs to test freeBSD/polling with full > routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to > suggest it will handle a 50kpps synflood DOS. It would be nice if Linux > could do the same. I was going to ask before, and it's probably not even possible anymore, but have you tried on a 2.0 kernel before? 2.0 kernels probably have a lot of other problems and don't support the new hardware, but it would be interesting to see how it scales to many srcs/dsts before the route cache was integrated. It probably scales a lot more like FreeBSD does. You'd probably have to use eepro100s or something, though. > Despite the BSD bashing (to be expected on a Linux list, I guess), I will > be using BSD as well as Linux for core routing. 
The plan is 1 linux > router and 1 bsd router each running zebra, connected to separate upstream > transit providers, running ibgp between them, and both advertising a > default route into OSPF. Then if I get hit with a DOS that kills Linux, > the BSD box will have a much better chance of staying up than if I just > used a second Linux box for redundancy. Good idea. Others have also suggested using Zebra on one and another of the BGP routing daemons on another to avoid routing-daemon-specific DoS issues (or accidental remote crash bugs). Anyway, the performance issues should be fixable. It is going to take some work, but there seem to be some interested people. I'm going to try to set up something that will allow for easy comparisons of patches so that we can measure progress, and perhaps reach an eventual goal. Simon- From yoshfuji@linux-ipv6.org Mon Jun 9 21:55:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:55:23 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4tB2x003991 for ; Mon, 9 Jun 2003 21:55:12 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5A4u1Bo012511; Tue, 10 Jun 2003 13:56:01 +0900 Date: Tue, 10 Jun 2003 13:56:01 +0900 (JST) Message-Id: <20030610.135601.20565349.yoshfuji@linux-ipv6.org> To: netdev@oss.sgi.com, linux-net@vger.kernel.org Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, krkumar@us.ibm.com Subject: Re: [PATCH] Panic in ipv6_add_dev From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <3EE52C92.4060509@us.ibm.com> References: <3EE52C92.4060509@us.ibm.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: 
"5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3046 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <3EE52C92.4060509@us.ibm.com> (at Mon, 09 Jun 2003 17:55:46 -0700), Krishna Kumar says: > We need to initialize sysctl_table to NULL in neigh_parms_alloc so that a > release can be called safely at any time. It solves the problem, patch should be applied. Well, it is also the problem that the tasks of neigh_parms_alloc() / neigh_sysctl_register() and neigh_parms_release() / neigh_sysctl_unregister() were not symmetric. We have neigh_parms_alloc() - neigh_parms_release() pair and neigh_sysctl_register() - neigh_sysctl_unregister() pair. Memory for sysctl table is allocated by neigh_sysctl_register(). While it was/is very natural to free it by neigh_sysctl_unregister(), it was freed by neigh_parms_release(), in rather different context... Here's the fix. (This patch alone also solve the problem.) 
Index: linux25-LINUS/net/netsyms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/netsyms.c,v retrieving revision 1.1.1.29 diff -u -r1.1.1.29 netsyms.c --- linux25-LINUS/net/netsyms.c 31 May 2003 07:30:46 -0000 1.1.1.29 +++ linux25-LINUS/net/netsyms.c 10 Jun 2003 04:25:32 -0000 @@ -190,6 +190,7 @@ #endif #ifdef CONFIG_SYSCTL EXPORT_SYMBOL(neigh_sysctl_register); +EXPORT_SYMBOL(neigh_sysctl_unregister); #endif EXPORT_SYMBOL(pneigh_lookup); EXPORT_SYMBOL(pneigh_enqueue); Index: linux25-LINUS/net/core/neighbour.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/core/neighbour.c,v retrieving revision 1.1.1.7 diff -u -r1.1.1.7 neighbour.c --- linux25-LINUS/net/core/neighbour.c 26 May 2003 08:04:08 -0000 1.1.1.7 +++ linux25-LINUS/net/core/neighbour.c 10 Jun 2003 04:25:32 -0000 @@ -1113,9 +1113,6 @@ if (*p == parms) { *p = parms->next; write_unlock_bh(&tbl->lock); -#ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(parms); -#endif kfree(parms); return; } @@ -1178,9 +1175,6 @@ } } write_unlock(&neigh_tbl_lock); -#ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(&tbl->parms); -#endif return 0; } Index: linux25-LINUS/net/ipv4/devinet.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv4/devinet.c,v retrieving revision 1.1.1.10 diff -u -r1.1.1.10 devinet.c --- linux25-LINUS/net/ipv4/devinet.c 26 May 2003 08:04:08 -0000 1.1.1.10 +++ linux25-LINUS/net/ipv4/devinet.c 10 Jun 2003 04:25:32 -0000 @@ -197,7 +197,9 @@ /* in_dev_put following below will kill the in_device */ write_unlock_bh(&inetdev_lock); - +#ifdef CONFIG_SYSCTL + neigh_sysctl_unregister(in_dev->arp_parms); +#endif neigh_parms_release(&arp_tbl, in_dev->arp_parms); in_dev_put(in_dev); } Index: linux25-LINUS/net/ipv6/addrconf.c =================================================================== RCS file: 
/cvsroot/usagi/usagi-backport/linux25/net/ipv6/addrconf.c,v retrieving revision 1.1.1.20 diff -u -r1.1.1.20 addrconf.c --- linux25-LINUS/net/ipv6/addrconf.c 5 Jun 2003 07:47:43 -0000 1.1.1.20 +++ linux25-LINUS/net/ipv6/addrconf.c 10 Jun 2003 04:25:33 -0000 @@ -1925,10 +1925,11 @@ /* Shot the device (if unregistered) */ if (how == 1) { - neigh_parms_release(&nd_tbl, idev->nd_parms); #ifdef CONFIG_SYSCTL addrconf_sysctl_unregister(&idev->cnf); + neigh_sysctl_unregister(&idev->nd_parms); #endif + neigh_parms_release(&nd_tbl, idev->nd_parms); in6_dev_put(idev); } return 0; Index: linux25-LINUS/net/ipv6/ndisc.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ndisc.c,v retrieving revision 1.1.1.17 diff -u -r1.1.1.17 ndisc.c --- linux25-LINUS/net/ipv6/ndisc.c 31 May 2003 07:30:52 -0000 1.1.1.17 +++ linux25-LINUS/net/ipv6/ndisc.c 10 Jun 2003 04:25:33 -0000 @@ -1487,6 +1487,9 @@ void ndisc_cleanup(void) { +#ifdef CONFIG_SYSCTL + neigh_sysctl_unregister(&nd_tbl.parms); +#endif neigh_table_clear(&nd_tbl); sock_release(ndisc_socket); ndisc_socket = NULL; /* For safety. */ -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From sim@netnation.com Tue Jun 10 00:57:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 00:57:47 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A7vX2x011756 for ; Tue, 10 Jun 2003 00:57:34 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19Pe0a-0001fd-HO; Tue, 10 Jun 2003 00:57:32 -0700 Date: Tue, 10 Jun 2003 00:57:32 -0700 From: Simon Kirby To: ralph+d@istop.com, Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de"
Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Route cache performance tests
Message-ID: <20030610075732.GD23009@netnation.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.4i
X-archive-position: 3047
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sim@netnation.com
Precedence: bulk
X-list: netdev

Okay, I got a chance to run some first tests and have found some simple results that might be worth a read.

The test setup is as follows (I'll probably be using this setup for a number of other tests):

[ My work desktop, other test boxes on network ]
        |    |    |    |    |
           [ 100 Mbit Switch ]
                   |
                   | (100 Mbit)
                   |
[ Dual tg3 dual 1.4 GHz Opteron box, 1 GB RAM ]
                   |
                   | (1000 MBit)
                   |
   [ Single e1000 single 2.4 GHz Xeon box ]

I have a route added on the test boxes to stuff traffic destined for the Xeon box through the Opteron box. Forwarding is enabled on the Opteron box, and it has a route for the Xeon box.

I am testing with Juno right now because it generates the (pseudo-)random IP traffic, which is where the problem is right now. We already know Linux can do hundreds of thousands of pps of ip<->ip traffic, so we can test that later. Juno seems to be able to send about 150,000 pps from my Celery desktop.

Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward an amazing 39,000 packets per second. Woohoo! NAPI definitely kicks in and seems to work even on SMP (blink?).

The output of "rtstat -i 1" is somewhat interesting.
The "GC: tot" field seems to almost exactly match the forwarded packet count, which is handy:

 size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
    8       4      4  0     0     0     0     0        0   0  0       0       0         0    0
    8       3      3  0     0     0     0     0        0   0  0       0       0         0    0
    8       5      6  0     0     0     0     0        0   0  0       0       0         0    0
    8       4      4  0     0     0     0     0        0   0  0       0       0         0    0
    8       5      5  0     0     0     0     0        0   0  0       0       0         0    0
    9       3      5  0     0     1     0     0        0   0  0       0       0         0    0
33549      11  65533  0     0     0     0     0        0   0  0   57347   57345         1    0
53499      13  65200  0     0     1     0     0        0   0  0   65196   65194         1    0
65536      19  65540  0     0     1     0     0        0   0  0   65538   64879         0    0
65536      11  33980  0     0     0     0     0        0   0  0   33978    6123         0    0
65536       9  37491  0     0     1     0     0        0   0  0   37489     930         0    0
65536      13  40487  0     0     0     0     0        0   0  0   40484     991         0    0
65536      13  39287  0     0     1     0     0        0   0  0   39284     933         0    0
65536      10  40790  0     0     1     0     0        0   0  0   40789    1006         0    0
65536      17  37783  0     0     0     0     0        0   0  0   37781     866         0    0
65536       8  38092  0     0     0     0     0        0   0  0   38090     880         0    0
65536      14  38086  0     0     1     0     0        0   0  0   38085     877         0    0
65536      13  39587  0     0     0     0     0        0   0  0   39586     922         0    0
65536      18  39882  0     0     1     0     0        0   0  0   39880     908         0    0
65536       8  39292  0     0     0     0     0        0   0  0   39290     894         0    0
65536      10  38390  0     0     4     0     0        0   0  0   38389     879         0    0
65536      13  38087  0     0     0     0     0        0   0  0   38086     830         0    0
65536      10  38692  0     0     0     0     0        0   0  0   38690     845         0    0
65536      16  38982  0     0     1     0     0        0   0  0   38981     899         0    0

The above is with stock settings. Note how the table completely fills up causing the forward rate to suffer.
In an attempt to improve performance, I tried "echo 0 > gc_min_interval":

 size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
65536      15  39585  0     0     0     0     0        0   0  0   39585     909         0    0
65535      13  39587  0     0     1     0     0        0   0  0   39587     877         0    0
32027      10  70044  0     0     0     0     0        0   0  0   70043       0         6    0
32013       8  71092  0     0     0     0     0        0   0  0   71091       0         0    0
31995      10  72290  0     0     1     0     0        0   0  0   72290       0         0    0
31969      13  71087  0     0     2     0     0        0   0  0   71083       0         0    0
31950       5  71695  0     0     0     0     0        0   0  0   71693       0         0    0
31937      10  71690  0     0     2     0     0        0   0  0   71690       0         0    0
31927      10  71390  0     0     0     0     0        0   0  0   71389       0         0    0
31915      18  71382  0     0     0     0     0        0   0  0   71381       0         0    0
31897       5  71395  0     0     0     0     0        0   0  0   71394       0         0    0
31881       7  70793  0     0     0     0     0        0   0  0   70793       0         0    0
31869       5  71095  0     0     0     0     0        0   0  0   71094       0         0    0
31863      16  71084  0     0     0     0     0        0   0  0   71082       0         0    0
31846      22  70778  0     0     0     0     0        0   0  0   70776       0         0    0
31825       5  70795  0     0     1     0     0        0   0  0   70795       0         0    0
31816      10  70490  0     0     0     0     0        0   0  0   70488       0         0    0

And then decided to try "ip route flush cache":

 size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
31768       8  70192  0     0     0     0     0        0   0  0   70190       0         0    0
31757      15  70185  0     0     1     0     0        0   0  0   70184       0         0    0
31743       5  70495  0     0     1     0     0        0   0  0   70491       0         0    0
 8204       2  83314  0     0     0     0     0        1   2  0   75524       0        89    0
 8204       2  88859  0     0     0     0     0        1   0  0   88449       0        84    0
 8203       3  85797  0     0     1     0     0        0   0  0   85795       0         0    0
 8203       0  86100  0     0     0     0     0        0   0  0   86098       0         0    0

...And then I tried reducing gc_thresh:

 size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
 8200       7  85793  0     0     1     0     0        0   0  0   85790       0         0    0
 8200       4  85796  0     0     1     0     0        0   0  0   85792       0         0    0
 8200      13  86087  0     0     0     0     0        0   0  0   86086       0         0    0
 8200       3  86097  0     0     0     0     0        0   0  0   86096       0         0    0
 1530       4  87896  0     0     0     0     0        0   0  0   87277       0       562    0
 1370       0 135832  0     0     0     0     0        0   0  0  135829       0       617    0
 1348       0 135952  0     0     2     0     0        0   0  0  135952       0       543    0
 1341       0 135740  0     0     0     0     0        0   0  0  135739       0       529    0
 1348       1 135817  0     0     1     0     0        0   0  0  135817       0       567    0

I tried fiddling with more settings, even setting gc_thresh to 1, but I wasn't able to get the route cache much smaller than that or get it to forward any
more packets per second. In any case, setting gc_min_interval to 0 definitely helped, but I suspect Dave's patches will make a bigger difference. Next up is 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday. Simon- From hadi@shell.cyberus.ca Tue Jun 10 03:53:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 03:53:48 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AArX2x027234 for ; Tue, 10 Jun 2003 03:53:34 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PgkS-0009dD-Js; Tue, 10 Jun 2003 06:53:04 -0400 Date: Tue, 10 Jun 2003 06:53:04 -0400 (EDT) From: Jamal Hadi To: ralph+d@istop.com cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610061010.Y36963@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3048 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Ralph Doncaster wrote: > On Mon, 9 Jun 2003, Jamal Hadi wrote: > > The test results Rob posted today show that the testing can be done in a > lab environment. I thought you were saying those were _not_ real world traffic patterns. Robert is just doing a worst case scenario testing. What would be useful is we actually test on real environments or maybe even collect real world traffic patterns and run them in the lab. Typically, real world is less intense than the lab. Ex: noone sends 100Mbps at 64 byte packet size. Typical packet is around 500 bytes average. 
If linux can handle that forwarding capacity, it should easily be doing close to Gige real world capacity. Have you seen how the big boys advertise? when tuning specs they talk about bits/sec. Juniper just announced a blade at supercom that can do firewalling at 500Mbps. > Most of the people I know that would actually see 50kpps > in the real world don't have the time to apply various patches and run a Now thats one big dilema, isnt it? Do you think i have time? Let me assure you that I dont get paid by anybody to do any of this stuff. Infact i havent been paid to do any of this stuff since 1994. Thats a lot of man hours in corporate speak. The point i am making is as a community we gotta put the hours together; the coder, the user etc. As someone who is not maintaining anything (lucky bastard that i am, my name is not even in the credits file - by choice) so i have the luxury to disappear once in a while. Imagine Davems reaction to a message like the above. > bunch of tests; pretending the problem doesn't exist when someone doesn't > run tests to prove is a poor excuse. > I think you _may_ be right theres a problem. However, as a defensive mechanism it is easier to tell someone to go away and come back with solid data. For example, you CPU graphs are very strange: Theres a few hundred variables that may be involved. I have spent many hours investigating peoples problems sshing to their machines only to find out they didnt follow instructions. After the 10th person doing the same thing, what do you expect my reaction to be? Please see the view from this side as well because it is almost a thankless task. > Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra > 0.93b. If the ideas you're referring to are changing the zebra source to > arp the next-nops, then no, I haven't tried it (and am not likely to any > time soon). > I think you may be suffering from the "too low" traffic NAPI syndrome. 
Under low traffic (1-2 Mbps) on lower end machines NAPI will consume more CPU because of an extra PCI operation per packet that is performed. As for the zebra thing, if you post my message to the Zebra list i am sure someone will be excited enough to do it. I need a few hours to do it but like you i dont have much time. > > Robert has a good collection for what is good hardware. I am so outdated > > i dont keep track anymore. My fastest machine is still an ASuse dual > > 450Mhz. > > There's still more dead-end suggestions than good ones (i.e. the > O'Reilley high performance routing book). > URL? > > Well, heres a good example: With NAPI, have your sessions been dropped? > Yup, twice in the last 2 weeks. > I have seen NAPI slow down throughput because of an intensive user space app. > > Have you tried a different NIC? Not sure how well the 3com is maintained > > for example. > > Try a tulip or tg3 or e1000 or the dlink gige. > > Initially I was looking for tulip cards but almost nobody is producing > them any more. Almost a year ago I came across the following list, which Thats not true. You could buy them off znyx. Yes, intel has EOLed the chips so i dont think Znyx will be doing this for much longer. Get yourself the giges instead. > is why I went with the 3com (at the time it indicated rx/tx irqmit for the > 3com, until I emailed the author that I found out it was tx only) > http://www.fefe.de/linuxeth/ > > I had joined the vortex list last fall looking for some tips and that > didn't help much (other than telling me that the 3com wasn't the best > choice). I've since bought a couple tg3 and a bunch of e1000 cards that > I'm planning to put into production. > yes, move to the giges then lets talk again. I think your main problem is that 3com NAPI is not well supported. Lennert disappeared right after he released the patch and noone else has the interest of maintaining it. 
> Rob's test results seem to show that even if I replace my 3c905cx cards > with e1000's I'll still get killed with a 50kpps synflood with my current > CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I > can't do that in a 1U rack-mount box. Yeah, I could probably do it with > water cooling, but that's not an option in a telco hotel like 151 Front > St. (Toronto). > where are you getting the 50Kpps data from? I see him talkking of input rate of no less than 200Kpps. > A couple weeks ago I got one of my techs to test freeBSD/polling with full > routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to > suggest it will handle a 50kpps synflood DOS. It would be nice if Linux > could do the same. > > Despite the BSD bashing (to be expected on a Linux list, I guess), I will > be using BSD as well as Linux for core routing. The plan is 1 linux > router and 1 bsd router each running zebra, connected to separate upstream > transit providers, running ibgp between them, and both advertising a > default route into OSPF. Then if I get hit with a DOS that kills Linux, > the BSD box will have a much better chance of staying up than if I just > used a second Linux box for redundancy. If the BSD boxes turn out to have > twice the performance of the linux boxes, it may be better for me to dump > linux for routing altogether. :-( > This is why you dont get very positivre reaction. You use religious scripture and you expect that people will help prove you are wrong. Let the person who showed that BSD can do better publish the data. If they are in town, let me know because i am willing to walk to meet the challenge. Maybe we'll learn something. 
cheers, jamal From hadi@shell.cyberus.ca Tue Jun 10 04:01:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:01:58 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AB1s2x028215 for ; Tue, 10 Jun 2003 04:01:55 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Pgsi-0009di-C3; Tue, 10 Jun 2003 07:01:36 -0400 Date: Tue, 10 Jun 2003 07:01:36 -0400 (EDT) From: Jamal Hadi To: Simon Kirby cc: ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610043453.GC23009@netnation.com> Message-ID: <20030610070045.N37047@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3049 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > Anyway, the performance issues should be fixable. It is going to take > some work, but there seem to be some interested people. I'm going to try > to set up something that will allow for easy comparisons of patches so > that we can measure progress, and perhaps reach an eventual goal. > Now heres the right spirit. 
cheers,
jamal

From hadi@shell.cyberus.ca Tue Jun 10 04:23:54 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:23:59 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABNs2x029563 for ; Tue, 10 Jun 2003 04:23:54 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PhDs-0009eM-V6; Tue, 10 Jun 2003 07:23:28 -0400
Date: Tue, 10 Jun 2003 07:23:28 -0400 (EDT)
From: Jamal Hadi
To: Simon Kirby
cc: ralph+d@istop.com, CIT/Paul , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Re: Route cache performance tests
In-Reply-To: <20030610075732.GD23009@netnation.com>
Message-ID: <20030610071638.R37090@shell.cyberus.ca>
References: <20030610075732.GD23009@netnation.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3050
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Tue, 10 Jun 2003, Simon Kirby wrote:

[some good stuff deleted]

Simon,

I haven't looked at your data in detail; i will. Someone like Robert would be able to snuff it much faster than i do. I just wanna say thanks for the effort, I will spend time catching up with you folks. It is clear that our next hurdle is gc. Do you have profiles for your data? Profiles would be nice to collect as well.

> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference. Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>

Also, since you are doing all that work, post the kernels somewhere so people like foo can grab them and test as well.
cheers, jamal From hadi@shell.cyberus.ca Tue Jun 10 04:28:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:28:39 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABSZ2x030044 for ; Tue, 10 Jun 2003 04:28:35 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PhIW-0009ed-Pu; Tue, 10 Jun 2003 07:28:16 -0400 Date: Tue, 10 Jun 2003 07:28:16 -0400 (EDT) From: Jamal Hadi To: Simon Kirby cc: ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610043453.GC23009@netnation.com> Message-ID: <20030610072444.Q37105@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3051 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > I was going to ask before, and it's probably not even possible anymore, > but have you tried on a 2.0 kernel before? 2.0 kernels probably have a > lot of other problems and don't support the new hardware, but it would be > interesting to see how it scales to many srcs/dsts before the route cache > was integrated. It probably scales a lot more like FreeBSD does. You'd > probably have to use eepro100s or something, though. > As a side note, note that stateless forwarding like BSD patricie tries is no longer sufficient. Its no longer just looking up a nexthop, dec ttl, recompute csum that we are optimizing for. The dst cache/flowi is the way to go, so theres no going back;-> - we just gotta make what we have work better. 
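
For readers who want the "dec ttl, recompute csum" step spelled out, here is a minimal sketch. It is not kernel code: it assumes a 20-byte IPv4 header without options, held as ten 16-bit words already converted to host order, and the names csum_full, csum_update and forward_dec_ttl are made up for this illustration. The incremental update follows the form given in RFC 1624, HC' = ~(~HC + ~m + m').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 20-byte IPv4 header as ten host-order 16-bit words. */
enum { IP_WORDS = 10, TTL_WORD = 4, CSUM_WORD = 5 };

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full header checksum. With the checksum word zeroed this yields the
 * value to store; over a valid header (checksum included) it yields 0. */
static uint16_t csum_full(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return (uint16_t)~fold16(sum);
}

/* Incremental update when one 16-bit word changes from old_word to
 * new_word: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}

/* Decrement the TTL and patch the checksum in place.
 * Returns 0 if the packet must be dropped (TTL would expire), else 1. */
static int forward_dec_ttl(uint16_t hdr[IP_WORDS])
{
    assert(hdr != NULL);
    uint16_t old_word = hdr[TTL_WORD];        /* TTL is the high byte */
    if ((old_word >> 8) <= 1)
        return 0;
    uint16_t new_word = (uint16_t)(old_word - 0x0100);
    hdr[CSUM_WORD] = csum_update(hdr[CSUM_WORD], old_word, new_word);
    hdr[TTL_WORD] = new_word;
    return 1;
}
```

The per-packet cost of this step is a handful of additions, which is why the route lookup (and whatever state hangs off it) is the part being argued about in this thread.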
cheers, jamal From pekkas@netcore.fi Tue Jun 10 04:42:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:42:13 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABg42x030794 for ; Tue, 10 Jun 2003 04:42:05 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5ABf9H21359; Tue, 10 Jun 2003 14:41:09 +0300 Date: Tue, 10 Jun 2003 14:41:08 +0300 (EEST) From: Pekka Savola To: Jamal Hadi cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610061010.Y36963@shell.cyberus.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3052 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > Typically, real world is less intense than the lab. Ex: noone sends > 100Mbps at 64 byte packet size. Some attackers do, and if your box dies because of that.. well, you don't like it and your managers certainly don't :-) > Typical packet is around 500 bytes > average. Not sure that's really the case. I have the impression the traffic is basically something like: - close to 1500 bytes (data transfers) - between 40-100 bytes (TCP acks, simple UDP requests, etc.) - something in between > If linux can handle that forwarding capacity, it should easily > be doing close to Gige real world capacity. Yes, but not the worst case capacity you really have to plan for :-( > Have you seen how the big boys advertise? when tuning specs they talk > about bits/sec. Juniper just announced a blade at supercom that can do > firewalling at 500Mbps. 
May be for some, but they *DO* give their pps figures also; many operators do, in fact, *explicitly* check the pps figures especially when there are some slower-path features in use (ACL's, IPv6, multicast, RPF, etc.): that's much more important than the optimal figures which are great for advertising material and press releases :-). -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 04:43:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:43:30 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABhL2x031390 for ; Tue, 10 Jun 2003 04:43:22 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5ABgssG004210; Tue, 10 Jun 2003 07:42:54 -0400 (EDT) Message-Id: <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> To: Jamal Hadi cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-reply-to: Your message of "Tue, 10 Jun 2003 06:53:04 EDT." <20030610061010.Y36963@shell.cyberus.ca> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 10 Jun 2003 07:41:01 -0400 From: chas williams X-Spam-Score: () hits=-0.9 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . 
com / mimedefang)
X-archive-position: 3053
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

In message <20030610061010.Y36963@shell.cyberus.ca>,Jamal Hadi writes:
>is we actually test on real environments or maybe even collect real
>world traffic patterns and run them in the lab.
>Typically, real world is less intense than the lab. Ex: noone sends
>100Mbps at 64 byte packet size. Typical packet is around 500 bytes
>average. If linux can handle that forwarding capacity, it should easily

i was curious at one point and collected some packet size stats on our border router. while the average packet size is close to 500, the bulk (by count) of the traffic seems to be in the 64-95 byte range. (the length here is the link level size as given by tcpdump -e)

# 27100000 packets average length = 747

length (bytes)   packets
0-31                5271
32-63                  0
64-95           12143442
96-127            934314
128-159           202984
160-191            98772
192-223            49279
224-255            37826
256-287            28276
288-319            41675
320-351            42359
352-383            93709
384-415            24557
416-447            73969
448-479            25100
480-511            23210
512-543            86515
544-575            77779
576-607           146066
608-639            23967
640-671            23005
672-703            87471
704-735            13154
736-767             8818
768-799            20850
800-831             7678
832-863             7379
864-895             7920
896-927             5789
928-959            48122
960-991            35512
992-1023           26081
1024-1055          63541
1056-1087          23673
1088-1119           8397
1120-1151           5780
1152-1183           5133
1184-1215           8820
1216-1247          40251
1248-1279           6295
1280-1311          11420
1312-1343          31610
1344-1375          21802
1376-1407          22442
1408-1439        4932071
1440-1471         594385
1472-1503         439460
1504-1535        6434071

From jsd@monmouth.com Tue Jun 10 04:58:45 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:58:49 -0700 (PDT)
Received: from tadenker.com (tadenker.com [65.103.215.217]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABwh2x032259 for ; Tue, 10 Jun 2003 04:58:44 -0700
Received: (qmail 16143 invoked from network); 10 Jun 2003 11:58:36 -0000
Received:
from unknown (HELO av8n.net) (10.200.2.1) by jeeves.office.tad.private with SMTP; 10 Jun 2003 11:58:36 -0000 Received: (qmail 638 invoked from network); 10 Jun 2003 11:58:34 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 10 Jun 2003 11:58:34 -0000 Message-ID: <3EE5C7E9.6090401@monmouth.com> Date: Tue, 10 Jun 2003 07:58:33 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Pekka Savola CC: Jamal Hadi , ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3054 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev On 06/10/2003 07:41 AM, Pekka Savola wrote: > >>Typical packet is around 500 bytes >>average. > > Not sure that's really the case. I have the impression the traffic is > basically something like: > - close to 1500 bytes (data transfers) > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > - something in between It helps to take a more sophisticated view of things. In typical networks: Most of the packet-count is to be found in small packets. Most of the byte-count is to be found in large packets. Some things (e.g. routing) depend mainly on the packet-count. Other things (e.g. encryption, layer-1 hardware requirements, memory bandwidth usage, ISP contracts) are sensitive to the byte-count. We shouldn't optimize one at the expense of the other. 
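
The packet-count versus byte-count distinction can be checked against the border-router histogram posted earlier in this thread. The sketch below is only an illustration: it uses the three largest buckets from that histogram, and it assumes every packet in a bucket has the bucket's midpoint length, which the original data does not say. The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct bucket {
    unsigned lo, hi;            /* inclusive length range in bytes */
    unsigned long long count;   /* packets observed in that range */
};

/* The three largest buckets from the posted histogram. */
static const struct bucket top3[] = {
    {   64,   95, 12143442ULL },
    { 1408, 1439,  4932071ULL },
    { 1504, 1535,  6434071ULL },
};

/* Approximate bytes in a bucket, ASSUMING midpoint-length packets. */
static double bucket_bytes(const struct bucket *b)
{
    return (b->lo + b->hi) / 2.0 * (double)b->count;
}

/* Fraction of all packets that fall in bucket i. */
static double packet_share(const struct bucket *b, size_t n, size_t i)
{
    double total = 0.0;
    assert(i < n);
    for (size_t k = 0; k < n; k++)
        total += (double)b[k].count;
    return total > 0.0 ? (double)b[i].count / total : 0.0;
}

/* Fraction of all bytes that fall in bucket i. */
static double byte_share(const struct bucket *b, size_t n, size_t i)
{
    double total = 0.0;
    assert(i < n);
    for (size_t k = 0; k < n; k++)
        total += bucket_bytes(&b[k]);
    return total > 0.0 ? bucket_bytes(&b[i]) / total : 0.0;
}
```

Restricted to these three buckets, the 64-95 byte range holds a bit over half of the packets but only around 5% of the bytes, so per-packet work such as route lookups is dominated by small packets while per-byte work is dominated by large ones.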
From hadi@shell.cyberus.ca Tue Jun 10 05:08:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 05:08:29 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AC8I2x000469 for ; Tue, 10 Jun 2003 05:08:19 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Phuf-0009fj-TO; Tue, 10 Jun 2003 08:07:41 -0400 Date: Tue, 10 Jun 2003 08:07:41 -0400 (EDT) From: Jamal Hadi To: Pekka Savola cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610075702.I37165@shell.cyberus.ca> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3055 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Pekka Savola wrote: > On Tue, 10 Jun 2003, Jamal Hadi wrote: > > Typically, real world is less intense than the lab. Ex: noone sends > > 100Mbps at 64 byte packet size. > > Some attackers do, and if your box dies because of that.. well, you don't > like it and your managers certainly don't :-) > Assuming the attacker has a 100mbps link to you, yes ;-> I am not trying to say we should ignore it; infact all our tests have been worst case scenarios. > > Typical packet is around 500 bytes > > average. > > Not sure that's really the case. I have the impression the traffic is > basically something like: > - close to 1500 bytes (data transfers) > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > - something in between > Its is typically trimodal (the ACKs, something in the 500 bytes and the 1500 byte end). 
The 500 average is derived from staring at cdf graphs:

Slightly dated but more thorough:
http://www.nlanr.net/NA/Learn/packetsizes.html

Frequent collections by sprint:
http://ipmon.sprint.com/packstat/packet.php?030407

so 500 bytes does sound reasonable. There are a lot of papers that have been written on this subject.

> > If linux can handle that forwarding capacity, it should easily
> > be doing close to Gige real world capacity.
>
> Yes, but not the worst case capacity you really have to plan for :-(
>

agreed.

> > Have you seen how the big boys advertise? when tuning specs they talk
> > about bits/sec. Juniper just announced a blade at supercom that can do
> > firewalling at 500Mbps.
>
> May be for some, but they *DO* give their pps figures also; many operators
> do, in fact, *explicitly* check the pps figures especially when there are
> some slower-path features in use (ACL's, IPv6, multicast, RPF, etc.):
> that's much more important than the optimal figures which are great for
> advertising material and press releases :-).
>

The announce in question i saw in some post supercom2003. I kept looking for the conditions that apply to get that 500Mbps but couldn't find any. A lot of people fall for the big brand name, so granted some people will check, quite a few don't have that expertise and will buy because it reads "juniper".

cheers,
jamal

From hadi@shell.cyberus.ca Tue Jun 10 05:13:26 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 05:13:31 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ACDP2x000859 for ; Tue, 10 Jun 2003 05:13:26 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Phzm-0009g1-1l; Tue, 10 Jun 2003 08:12:58 -0400
Date: Tue, 10 Jun 2003 08:12:58 -0400 (EDT)
From: Jamal Hadi
To: "John S. Denker"
cc: Pekka Savola , ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S.
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <3EE5C7E9.6090401@monmouth.com> Message-ID: <20030610080901.M37190@shell.cyberus.ca> References: <3EE5C7E9.6090401@monmouth.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3056 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, John S. Denker wrote: > On 06/10/2003 07:41 AM, Pekka Savola wrote: > > > >>Typical packet is around 500 bytes > >>average. > > > > Not sure that's really the case. I have the impression the traffic is > > basically something like: > > - close to 1500 bytes (data transfers) > > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > > - something in between > > It helps to take a more sophisticated view of things. > In typical networks: > Most of the packet-count is to be found in small packets. > Most of the byte-count is to be found in large packets. > > Some things (e.g. routing) depend mainly on the packet-count. > Other things (e.g. encryption, layer-1 hardware requirements, > memory bandwidth usage, ISP contracts) are sensitive to the > byte-count. > > We shouldn't optimize one at the expense of the other. You bring a good point. Theres another dimension actually: mostly driven by BSD mbuff style packet allocation; some tests show that some vendors are optimized for certain packet sizes, Linux skbuffs dont have this problem. We dont optimize for packet sizes given the linear nature of skbuffs. Donalds ether drivers tend to amortize some of the costs by reallocating skbs when the packet <= 100 bytes, but this is no longer valid with skb recycling and the magazine layer appearing in the slab. 
cheers, jamal From ralph@istop.com Tue Jun 10 06:11:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:11:33 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADBG2x005746 for ; Tue, 10 Jun 2003 06:11:17 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id E669F36D0F; Tue, 10 Jun 2003 09:10:38 -0400 (EDT) Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610061010.Y36963@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3057 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > I thought you were saying those were _not_ real world traffic patterns. I'm saying the tests that you and Rob did in the past did not reflect real-world use of Linux as a core router (i.e. small routing table and not many different traffic flows). The tests he posted yesterday are a big step forward. > Typically, real world is less intense than the lab. Ex: noone sends > 100Mbps at 64 byte packet size. Typical packet is around 500 bytes > average. If linux can handle that forwarding capacity, it should easily > be doing close to Gige real world capacity. No, it needs to work in the worst case. If some script kiddie can peg my CPU with a synflood then there's still a problem. 
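To put rough numbers on the average-case versus worst-case argument, here is a minimal sketch in C. It is an illustration only, under assumptions that are not stated in the thread: standard Ethernet framing (8 bytes of preamble plus 12 bytes of inter-frame gap per frame on the wire), and the 500-byte figure treated as a full frame size. The helper name `wire_pps` is hypothetical.

```c
#include <assert.h>

/* Assumed per-frame wire overhead on Ethernet, in bytes:
 * 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Hypothetical helper: the maximum number of frames per second a link
 * can carry. link_bps is the link speed in bits per second; frame_bytes
 * is the frame size including header and FCS, but excluding the
 * preamble and inter-frame gap. */
double wire_pps(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes >= 64); /* minimum Ethernet frame size */
    return link_bps / (8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}
```

Under these assumptions, `wire_pps(100e6, 64)` is about 148.8 kpps and `wire_pps(1e9, 64)` is about 1.49 Mpps, while `wire_pps(1e9, 500)` is only about 240 kpps. A box sized for the 500-byte average at gigabit speed can therefore still fall well short of a minimum-size packet flood on the same link.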
> > Most of the people I know that would actually see 50kpps > > in the real world don't have the time to apply various patches and run a > > Now thats one big dilema, isnt it? Do you think i have time? Let me > assure you that I dont get paid by anybody to do any of this stuff. Sure I realize that. The problem I've seen occur is that Linux developers with big egos say "linux can route as well as a cisco 3640", or "linux routing is beats BSD any day". Then guys like me decide to give it a try, not realizing we're walking into a tarpit. If I had been told in the first place that running linux as a high-throughput router in a service provider environment was an unknown, things would have been different. > I have spent many hours investigating peoples problems sshing to their > machines only to find out they didnt follow instructions. After the > 10th person doing the same thing, what do you expect my reaction to be? Take 15 minutes and write a web page with the magic settings required to make things work. > > Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra > > 0.93b. If the ideas you're referring to are changing the zebra source to > > arp the next-nops, then no, I haven't tried it (and am not likely to any > > time soon). > > > > I think you may be suffering from the "too low" traffic NAPI syndrome. > Under low traffic (1-2 Mbps) on lower end machines NAPI will consume > more CPU because of an extra PCI operation per packet that is performed. No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx cards. > As for the zebra thing, if you post my message to the Zebra list i am sure > someone will be excited enough to do it. I need a few hours to do it > but like you i dont have much time. The last time I looked at the zebra list things seemed pretty dead. Most of the new work is now happening on the commercial zebra development. > > > Well, heres a good example: With NAPI, have your sessions been dropped? 
> > Yup, twice in the last 2 weeks. > > > > I have seen NAPI slow down throughput because of an intensive user space > app. This is a router with just zebra (zebra, ospfd, bgpd) running. > > I had joined the vortex list last fall looking for some tips and that > > didn't help much (other than telling me that the 3com wasn't the best > > choice). I've since bought a couple tg3 and a bunch of e1000 cards that > > I'm planning to put into production. > > yes, move to the giges then lets talk again. I think your main problem is > that 3com NAPI is not well supported. Lennert disappeared right after he > released the patch and noone else has the interest of maintaining it. Yes, and it would be nice if you mentioned in your NAPI docs that people should use a tulip, tg3, or e1000 if they want it to work well. In making your sales pitches for NAPI you made it sound like any high-performance card should do fine (i.e. anything but a Realtek). > > Rob's test results seem to show that even if I replace my 3c905cx cards > > with e1000's I'll still get killed with a 50kpps synflood with my current > > CPU. > > where are you getting the 50Kpps data from? I see him talkking of > input rate of no less than 200Kpps. On his first graph, for 50k new incoming dst/sec throughput looks to be ~175kpps. And he's running a 1.8Ghz Xenon vs my 750Mhz Duron. > > used a second Linux box for redundancy. If the BSD boxes turn out to have > > twice the performance of the linux boxes, it may be better for me to dump > > linux for routing altogether. :-( > > > > This is why you dont get very positivre reaction. You use religious > scripture and you expect that people will help prove you are wrong. You don't seem to get it. There's at least a dozen things more important to me than seeing Linux routing performance compete with Cisco and BSD. I'm annoyed that people like you have told me linux is up to the task, and then when it's not I'm left SOL. 
I thought I was talking to competent techies, but now I see most of the techies were also Linux evangelists. Now that people like Rob and Dave are taking a hard look at it I think it's worth my while to ante up for a couple more rounds. I still fell like a sucker that should have walked away from the table a long time ago though. Jim Mercer and Marc Ackley at 151.net/tht.net told me they tried Linux/Zebra and gave up (and went with 7206vxr routers). And they're very pro-unix (still do all their netflow collection and billing on Unix). They're not likely to go back and give Linux another try. If the linux evangelists had just said Linux would be ready for core routing in a year (or whatever) instead, I think network operators would look at it more seriously rather than they joke that they see it as now. -Ralph From ralph@istop.com Tue Jun 10 06:34:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:34:20 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADYE2x006400 for ; Tue, 10 Jun 2003 06:34:14 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id C7E3E36C48; Tue, 10 Jun 2003 09:34:10 -0400 (EDT) Date: Tue, 10 Jun 2003 09:34:15 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance tests In-Reply-To: <20030610075732.GD23009@netnation.com> Message-ID: References: <20030610075732.GD23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3058 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Simon Kirby wrote: > Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward > an amazing 39,000 packets per second. Woohoo! I hope that's sarcasm. I know if you posted to NANOG saying it took a dual 1.4Ghz Opteron to route 39kpps under linux you'd be laughed off the list. Maybe I should be bragging about my 3-minute lap times on the Shannonville track in my M5! -Ralph From hadi@shell.cyberus.ca Tue Jun 10 06:37:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:37:12 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADb22x006711 for ; Tue, 10 Jun 2003 06:37:03 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PjId-0009iP-LR; Tue, 10 Jun 2003 09:36:31 -0400 Date: Tue, 10 Jun 2003 09:36:31 -0400 (EDT) From: Jamal Hadi To: ralph+d@istop.com cc: CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610091736.V37313@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3059 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Ralph Doncaster wrote: > On Tue, 10 Jun 2003, Jamal Hadi wrote: > > > I thought you were saying those were _not_ real world traffic patterns. > > I'm saying the tests that you and Rob did in the past did not reflect > real-world use of Linux as a core router (i.e. small routing table and not > many different traffic flows). The tests he posted yesterday are a big > step forward. > I think at a minimal define what "real world" means. Is it 100 flows/sec at 20Kpps? what is it? > > Typically, real world is less intense than the lab. Ex: noone sends > > 100Mbps at 64 byte packet size. Typical packet is around 500 bytes > > average. If linux can handle that forwarding capacity, it should easily > > be doing close to Gige real world capacity. > > No, it needs to work in the worst case. If some script kiddie can peg my > CPU with a synflood then there's still a problem. > Lets work on defining "real world". Factor in the script kiddie. > Sure I realize that. The problem I've seen occur is that Linux developers > with big egos say "linux can route as well as a cisco 3640", or "linux > routing is beats BSD any day". Then guys like me decide to give it a try, > not realizing we're walking into a tarpit. 
If I had been told in the > first place that running linux as a high-throughput router in a service > provider environment was an unknown, things would have been different. > Heres where the problem is: If you interact at this low level then you oughta produce low level input. Provide people with data to help. Otherwise its a high maintanance task. > > I have spent many hours investigating peoples problems sshing to their > > machines only to find out they didnt follow instructions. After the > > 10th person doing the same thing, what do you expect my reaction to be? > > Take 15 minutes and write a web page with the magic settings required to > make things work. > I have many times. I still do. It is also a thankless task. > > I think you may be suffering from the "too low" traffic NAPI syndrome. > > Under low traffic (1-2 Mbps) on lower end machines NAPI will consume > > more CPU because of an extra PCI operation per packet that is performed. > > No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx > cards. > Change your NICs. I dont know what else to suggest. > > As for the zebra thing, if you post my message to the Zebra list i am sure > > someone will be excited enough to do it. I need a few hours to do it > > but like you i dont have much time. > > The last time I looked at the zebra list things seemed pretty dead. Most > of the new work is now happening on the commercial zebra development. > Maybe its time to fork Zebra into something that has the same momentum it had in the earlier days. > > yes, move to the giges then lets talk again. I think your main problem is > > that 3com NAPI is not well supported. Lennert disappeared right after he > > released the patch and noone else has the interest of maintaining it. > > Yes, and it would be nice if you mentioned in your NAPI docs that people > should use a tulip, tg3, or e1000 if they want it to work well. 
In making > your sales pitches for NAPI you made it sound like any high-performance > card should do fine (i.e. anything but a Realtek). > Theres a URL which points people to where the various NICS supported are. > On his first graph, for 50k new incoming dst/sec throughput looks to be > ~175kpps. And he's running a 1.8Ghz Xenon vs my 750Mhz Duron. > i think what would be interesting is to show CPU utilization as well. > > This is why you dont get very positivre reaction. You use religious > > scripture and you expect that people will help prove you are wrong. > > You don't seem to get it. There's at least a dozen things more important > to me than seeing Linux routing performance compete with Cisco and BSD. Again, if you wanna complain about it at the level you are i think its only fair you help. I actually dont care about CISCO or BSD. We dont win because someone else looses. We simply want to be the best. If you tell me BSD works better, i told you i will walk all the way downtown in the hope i'll find somethuing we can improve on. > I'm annoyed that people like you have told me linux is up to the task, and > then when it's not I'm left SOL. I thought I was talking to competent > techies, but now I see most of the techies were also Linux evangelists. > > Now that people like Rob and Dave are taking a hard look at it I think > it's worth my while to ante up for a couple more rounds. I still fell > like a sucker that should have walked away from the table a long time ago > though. > I think your setup maybe the question. Like i said theres probably a hunderd variables involved. It is up to you to isolate things. Yes, theres a support line in open source, but it is rewarded more when people show some effort. > Jim Mercer and Marc Ackley at 151.net/tht.net told me they tried > Linux/Zebra and gave up (and went with 7206vxr routers). And they're very > pro-unix (still do all their netflow collection and billing on Unix). 
> They're not likely to go back and give Linux another try. If the linux > evangelists had just said Linux would be ready for core routing in a year > (or whatever) instead, I think network operators would look at it more > seriously rather than they joke that they see it as now. > Theres a lot of BSD bigots in a lot of ISPS and IETF. It's human nature to be comfortable with what they know best. Most of the people i have met that put Linux down or consider it a joke come from the old BSD camp. Its their loss and i dismiss anything they have to say. Lets work on facts. What is it that we can do to improve Linux? Provide data. If you want to compare against BSD, what is it that _ you have facts on_ and not heard from other people that BSD does better? cheers, jamal From Robert.Olsson@data.slu.se Tue Jun 10 06:41:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:41:56 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADfk2x007045 for ; Tue, 10 Jun 2003 06:41:51 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id PAA28797; Tue, 10 Jun 2003 15:41:09 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.57333.369129.622540@robur.slu.se> Date: Tue, 10 Jun 2003 15:41:09 +0200 To: "David S. 
Miller" Cc: hadi@shell.cyberus.ca, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609.160547.41648991.davem@redhat.com> References: <20030609221911.GF11509@netnation.com> <008001c32eda$56760830$4a00000a@badass> <20030609.160547.41648991.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3060 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev

First run...

Worst scenario: 1 dst/pkt w. 64 byte pkts. 2*10 million packets injected
on eth0 and eth2. Input rate 2*190 kpps, clone_skb=1. Routing table of
123946 routes. UP. NAPI gives fairness between both DoS attackers. :-)
But more testing to be done.

plain       w. DaveM patch
----------------------------------
72          114               kpps throughput
30271883    12246290          hash misses (second last in my rt_cache_stat)

58% better... and it can be further improved.

Iface   MTU Met    RX-OK   RX-ERR   RX-DRP   RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0  1964858  9793618  9793618  8035147       16      0      0      0 BRU
eth1   1500   0       19        0        0        0  1887577      0      0      0 BRU
eth2   1500   0  1964698  9793419  9793419  8035305        3      0      0      0 BRU
eth3   1500   0        1        0        0        0  1886904      0      0      0 BRU

/proc/net/rt_cache_stat
000004ba 00000e27 003be7ba 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000000 003869c1 00360b4d 00025dcb 00025dca 01cde98b 00000000

With DaveM hash-list limit patch.
Input rate 2*190 kpps clone_skb=1

Iface   MTU Met    RX-OK   RX-ERR   RX-DRP   RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0  2990462  9680257  9680257  7009542       12      0      0      0 BRU
eth1   1500   0       12        0        0        0  2990467      0      0      0 BRU
eth2   1500   0  2990460  9673421  9673421  7009544        4      0      0      0 BRU
eth3   1500   0        1        0        0        0  2990459      0      0      0 BRU

/proc/net/rt_cache_stat
00000000 00000607 005b3cfb 00000000 00000000 00000000 00000000 00000000 00000000 00000002 00000000 005b2cfa 005b2ced 00000008 00000000 00badd12 00000003

Cheers.
					--ro
From ralph@istop.com Tue Jun 10 07:00:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 07:00:42 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AE0O2x007515 for ; Tue, 10 Jun 2003 07:00:25 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id E3ADC374AD; Tue, 10 Jun 2003 09:18:32 -0400 (EDT) Date: Tue, 10 Jun 2003 09:18:37 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: Simon Kirby , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610072444.Q37105@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> <20030610072444.Q37105@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3061 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev

On Tue, 10 Jun 2003, Jamal Hadi wrote:

> As a side note, note that stateless forwarding like BSD patricie tries
> is no longer sufficient. Its no longer just looking up a nexthop, dec ttl,
> recompute csum that we are optimizing for.

It would certainly be sufficient for core routing.
If I can have flow manipulation at no extra cost, I'll take it. If it's going to double the horsepower requirements, I don't want it. -Ralph From ralph@istop.com Tue Jun 10 07:33:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 07:33:54 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AEXV2x008077 for ; Tue, 10 Jun 2003 07:33:32 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 2442638360; Tue, 10 Jun 2003 10:03:29 -0400 (EDT) Date: Tue, 10 Jun 2003 10:03:33 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610091736.V37313@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> <20030610091736.V37313@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3062 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > > No, it needs to work in the worst case. If some script kiddie can peg my > > CPU with a synflood then there's still a problem. > > > > Lets work on defining "real world". Factor in the script kiddie. "real world" is the worst-case DOS tool available. Synflood tools like juno seem to fit that category. If you think juno is not a good real-world test, then keep pissing people off and you'll find out how real it is. 
;-) > > > I have spent many hours investigating peoples problems sshing to their > > > machines only to find out they didnt follow instructions. After the > > > 10th person doing the same thing, what do you expect my reaction to be? > > > > Take 15 minutes and write a web page with the magic settings required to > > make things work. > > > > I have many times. I still do. It is also a thankless task. URL? I've looked at almost everything on your web page since you were involved in the pppoe client software. I haven't seen anything that says how to sprinkle the pixie dust so my router works well. > > No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx > > cards. > > > > Change your NICs. I dont know what else to suggest. Yup. It just takes a bit of time and planning when the box is deployed in a POP 400km away... > > The last time I looked at the zebra list things seemed pretty dead. Most > > of the new work is now happening on the commercial zebra development. > > > > Maybe its time to fork Zebra into something that has the same momentum it > had in the earlier days. Hmmm... maybe we can both bug MCR to try your suggested changes... > > You don't seem to get it. There's at least a dozen things more important > > to me than seeing Linux routing performance compete with Cisco and BSD. > > Again, if you wanna complain about it at the level you are i think its > only fair you help. I actually dont care about CISCO or BSD. We dont win > because someone else looses. We simply want to be the best. You can want to be the best, but I don't think it's fair to sucker people into using Linux as a core router with false claims. > > Now that people like Rob and Dave are taking a hard look at it I think > > it's worth my while to ante up for a couple more rounds. I still fell > > like a sucker that should have walked away from the table a long time ago > > though. > > > > I think your setup maybe the question. 
Like i said theres probably a > hunderd variables involved. It is up to you to isolate things. > Yes, theres a support line in open source, but it is rewarded more > when people show some effort. Fuck, if you think I haven't put any effort into it already then there's no point in even trying any more. > to be comfortable with what they know best. Most of the people i have > met that put Linux down or consider it a joke come from the old > BSD camp. Its their loss and i dismiss anything they have to say. In my case I would have been better off to dismiss your advice a year ago. How does that help the Linux cause? -Ralph From ralph@istop.com Tue Jun 10 08:29:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:29:19 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFTD2x008866 for ; Tue, 10 Jun 2003 08:29:13 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id CFB8636AAF; Tue, 10 Jun 2003 11:29:12 -0400 (EDT) Date: Tue, 10 Jun 2003 11:29:18 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: Pekka Savola , CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610075702.I37165@shell.cyberus.ca> Message-ID: References: <20030610075702.I37165@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3063 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > Assuming the attacker has a 100mbps link to you, yes ;-> A script kiddie 0wning a box with a FE connection is nothing. 
During what was probably the worst DOS I got hit with, one of my upstream providers said they were seeing about 600mbps of traffic related to the attack. -Ralph From nakam@linux-ipv6.org Tue Jun 10 08:45:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:45:58 -0700 (PDT) Received: from localhost (p2162-ipbf07hodogaya.kanagawa.ocn.ne.jp [220.104.10.162]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFji2x009551 for ; Tue, 10 Jun 2003 08:45:45 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19PlEq-000139-00; Wed, 11 Jun 2003 00:40:44 +0900 From: Masahide NAKAMURA To: Henrik Petander Cc: YOSHIFUJI Hideaki , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030611004035.40027642.nakam@linux-ipv6.org> In-Reply-To: <3EE5F85E.9080006@tml.hut.fi> References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Date: Wed, 11 Jun 2003 00:40:44 +0900 X-archive-position: 3064 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev On Tue, 10 Jun 2003 18:25:18 +0300 Henrik Petander wrote: > Then the policies for mipv6 would need to be specified at the same time > as the ipsec policies. This is not a problem as long as the policies are > loaded at start up. 
However, this could lead to problems with
> applications which specify their own policies, e.g. racoon.

How about providing an interface for handling templates, to update an
existing policy in the kernel?

Regards,

--
Masahide NAKAMURA
From davem@redhat.com Tue Jun 10 08:53:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:53:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFr52x010028 for ; Tue, 10 Jun 2003 08:53:06 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22890; Tue, 10 Jun 2003 08:49:41 -0700 Date: Tue, 10 Jun 2003 08:49:40 -0700 (PDT) Message-Id: <20030610.084940.74727904.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3065 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Ralph Doncaster
   Date: Mon, 9 Jun 2003 20:32:48 -0400 (EDT)

   Lastly from the software side Linux doesn't seem to have anything like
   BSD's parameter to control user/system CPU sharing. Once my CPU load
   reaches 70-80%, I'd rather have some dropped packets than let the CPU
   hit 100% and end up with my BGP sessions drop.

When packet (more specifically, software interrupt) processing reaches
a certain level, we offload the work into process context.
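The offload described in the reply above can be illustrated with a toy model in C. This is not kernel code and makes no claim about the actual thresholds: the names `rx_result` and `poll_with_budget`, and the idea of a single numeric budget, are assumptions for illustration. In the real kernel the leftover work goes to a per-CPU kernel thread (ksoftirqd), which the scheduler can then balance against user processes such as a routing daemon.

```c
#include <assert.h>

/* Toy model only: how much pending receive work is handled directly in
 * softirq context and how much is left for process context. */
struct rx_result {
    unsigned handled_inline; /* packets processed in softirq context */
    unsigned deferred;       /* packets left for the kernel thread */
};

/* Hypothetical helper: process at most `budget` packets inline and
 * defer the remainder, so that a flood cannot monopolize the CPU. */
struct rx_result poll_with_budget(unsigned pending, unsigned budget)
{
    struct rx_result r;

    assert(budget > 0); /* a zero budget would never make progress */
    r.handled_inline = pending < budget ? pending : budget;
    r.deferred = pending - r.handled_inline;
    return r;
}
```

With a budget of 300, for example, a burst of 1000 pending packets would see 300 handled inline and 700 deferred, while a burst of 50 would be handled entirely inline.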
From chas@relax.cmf.nrl.navy.mil Tue Jun 10 08:55:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:55:07 -0700 (PDT) Received: from relax.cmf.nrl.navy.mil (relax.cmf.nrl.navy.mil [134.207.10.227]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFt32x010377 for ; Tue, 10 Jun 2003 08:55:03 -0700 Received: (from chas@localhost) by relax.cmf.nrl.navy.mil (8.11.6/8.11.6) id h5AFtbQ00899 for netdev@oss.sgi.com; Tue, 10 Jun 2003 11:55:37 -0400 Date: Tue, 10 Jun 2003 11:55:37 -0400 From: chas williams Message-Id: <200306101555.h5AFtbQ00899@relax.cmf.nrl.navy.mil> To: netdev@oss.sgi.com Subject: [RFC] suggest changes cleanup to atm svc/pvc family X-archive-position: 3066 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev

i was hoping some people might take a look at the following changes
and let me know what they think.

ftp://galileo.cmf.nrl.navy.mil/pub/chas/linux-atm/2_5_70_vcc_sklist_diffs

a quick summary (since the diff is rather lengthy):

- vcc's are now in a global list protected by a rw lock (much like other
  protocol families). this means the atm devices don't hold a list of
  vcc's. this makes some things much easier to write.

- a few things were renamed to vcc_XXX from atm_XXX. eventually routines
  that deal with vcc's will be vcc_, svc_, or pvc_. atm device functions
  should be called atm_dev_XXX. this makes things a bit easier to read.

- vcc's are now reference counted properly (or so i think). this doesn't
  mean all the atm drivers understand this yet. the he driver should do
  the right thing though, holding a read lock on the vcc sklist during
  recv operations to keep vcc's from prematurely disappearing.

- SOCKOPS_WRAP was removed and lock_sock's introduced in the appropriate
  locations. i might have missed some.
- atm_ioctl was split into vcc_ioctl and atm_dev_ioctl

- recvmsg was rewritten to take advantage of some of the existing kernel
  routines that make datagram manipulation so much easier.

- sendmsg needs to be rewritten, but the ip components will need to
  skb_clone so they can skb_set_owner_w on skb's that might already be
  owned by another socket. right?

- changed add_wait_queue to prepare_to_wait and finish_wait. is this the
  accepted interface?
From davem@redhat.com Tue Jun 10 08:57:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:57:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFv02x010729 for ; Tue, 10 Jun 2003 08:57:00 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22919; Tue, 10 Jun 2003 08:53:42 -0700 Date: Tue, 10 Jun 2003 08:53:42 -0700 (PDT) Message-Id: <20030610.085342.41654796.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609204257.L35799@shell.cyberus.ca> References: <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3067 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Jamal Hadi
   Date: Mon, 9 Jun 2003 21:15:18 -0400 (EDT)

   Have you tried a different NIC? Not sure how well the 3com is
   maintained for example.

Actually, the main issue with 3c59x is that it still uses PIO accesses.
This basically makes it useless for routing or anything wanting serious latency. Andrew Morton knows this, but he is such a good maintainer that he doesn't want to change over the MEM I/O accesses for fear of breaking something. It's actually a simple change to make if someone wants to spend a few cycles on it, then you can see what kind of performance you'll get with that. From davem@redhat.com Tue Jun 10 08:59:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:59:23 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFxJ2x011097 for ; Tue, 10 Jun 2003 08:59:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22933; Tue, 10 Jun 2003 08:56:01 -0700 Date: Tue, 10 Jun 2003 08:56:00 -0700 (PDT) Message-Id: <20030610.085600.71109220.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030610015311.GB23009@netnation.com> References: <20030609195652.E35696@shell.cyberus.ca> <20030610015311.GB23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3068 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 18:53:12 -0700 Your CPU use is quite a bit higher than ours. Yeah, but his faster cpu is all being burnt to a crisp doing PIO accesses to the 3c59x card. I found that once NAPI was happening, userspace seemed to get a fairly decent amount of time. 
Unfortunately, NAPI won't help him with the current way the 3c59x driver works. It needs to provide a way to use MEM I/O before NAPI would start to be of use to him. From davem@redhat.com Tue Jun 10 09:10:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:10:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGAG2x011788 for ; Tue, 10 Jun 2003 09:10:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA22960; Tue, 10 Jun 2003 09:06:56 -0700 Date: Tue, 10 Jun 2003 09:06:56 -0700 (PDT) Message-Id: <20030610.090656.104052471.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: sim@netnation.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610015311.GB23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3069 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Mon, 9 Jun 2003 23:18:45 -0400 (EDT) Even if my time was only worth $500/day, in the past year and a half I spent enough time working on Linux routers to buy a Cisco NPE-G1. :-( Slapping different machines together and mucking with zebra config files is not going to fix the kind of issues you are talking about. It is pure wasted effort. Someone needs to apply brains to the code and improve the algorithms and schemes we use. So far I see approximately 1 person doing something for every 1,000 guys complaining. 
So shut your yap and open up an editor and some algorithms books and papers. :) If you stop using Linux right now, I won't cry nor will I lose sleep tonight, I've never felt threatened by such things so I wouldn't advise using them to coerce me into somehow "working harder". :) See, I know the reasonable people will stick around and back me up as I continue to improve the code. Because I'm actually doing something about the problems. From davem@redhat.com Tue Jun 10 09:13:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:14:01 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGDv2x012407 for ; Tue, 10 Jun 2003 09:13:57 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA22983; Tue, 10 Jun 2003 09:10:44 -0700 Date: Tue, 10 Jun 2003 09:10:44 -0700 (PDT) Message-Id: <20030610.091044.78724912.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030610043453.GC23009@netnation.com> References: <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3070 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 21:34:53 -0700 Broken parts of the code only get fixed if enough people whine This isn't how I operate... or especially if somebody decides to actually fix it. This is. I hack on something because I want to and it seems interesting to me at the moment.
Not because someone is shitting their pants in public about it. :) So, for future reference, you'll get more using honey than vinegar from me :) Franks a lot, David S. Miller davem@redhat.com From bogdan.costescu@iwr.uni-heidelberg.de Tue Jun 10 09:15:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:15:30 -0700 (PDT) Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGFM2x012857 for ; Tue, 10 Jun 2003 09:15:25 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:q2RTacGrgKO+YS81+Qyx5C81bBI5qhL+@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.12.9/8.12.9) with ESMTP id h5AGF7F4014193; Tue, 10 Jun 2003 18:15:08 +0200 (MET DST) Received: from kenzo.iwr.uni-heidelberg.de (localhost.localdomain [127.0.0.1]) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8) with ESMTP id h5AGF8f0027696; Tue, 10 Jun 2003 18:15:08 +0200 Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8/Submit) with ESMTP id h5AGF72r027692; Tue, 10 Jun 2003 18:15:07 +0200 Date: Tue, 10 Jun 2003 18:15:07 +0200 (CEST) From: Bogdan Costescu To: "David S. Miller" cc: hadi@shell.cyberus.ca, , , , , , Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030610.085342.41654796.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3071 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bogdan.costescu@iwr.uni-heidelberg.de Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > Acutally, the main issue with 3c59x is that it still > uses PIO accesses. This basically makes it useless > for routing or anything wanting serious latency. I did try about 2 years ago and converted the driver to MMIO. 
I wasn't able to see _any_ kind of improvement and I was using it in parallel computation where latency counts. I have to say though that I wasn't interested at that time in obtaining profiles and such because only the end-user performance was important. > Andrew Morton knows this, ... and knows about my MMIO trial too (mentioned also on vortex-list)... > but he is such a good maintainer that he doesn't want to change over the > MEM I/O accesses for fear of breaking something. Given that the 3c59x driver supports several generations of cards most of them being EOL-ed years ago, it's pretty hard to do such change. If a new driver would be forked that serviced only the latest generations (Cyclone = 905B and Tornado = 905C(X)), switching to MMIO would probably make sense along with lots of others small changes (large MTU/VLAN, polling descriptors, MII-only media selection etc.) and maybe have NAPI in the mix as well... > It's actually a simple change to make if someone wants to > spend a few cycles on it, Not if you include testing in those cycles :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From lpetande@tml.hut.fi Tue Jun 10 09:17:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:17:17 -0700 (PDT) Received: from smtp-1.hut.fi (root@smtp-1.hut.fi [130.233.228.91]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGH92x013263 for ; Tue, 10 Jun 2003 09:17:12 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-1.hut.fi (8.12.9/8.12.9) with ESMTP id h5AFHFji009028; Tue, 10 Jun 2003 18:17:17 +0300 Message-ID: <3EE5F85E.9080006@tml.hut.fi> Date: Tue, 10 Jun 2003 18:25:18 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 
To: Masahide NAKAMURA CC: Henrik Petander , YOSHIFUJI Hideaki , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> <20030609203659.089b241b.nakam@linux-ipv6.org> In-Reply-To: <20030609203659.089b241b.nakam@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-1.hut.fi) X-DCC-HUTCC-Metrics: smtp-1.hut.fi 1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 3072 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Masahide NAKAMURA wrote: > On Mon, 9 Jun 2003 12:06:35 +0300 (EEST) > Henrik Petander wrote: > > >>On Fri, 6 Jun 2003, Masahide NAKAMURA wrote: >> >>>We don't think we have to change the logic handling policy with >>>the reason because we can treat MIPv6 policy just like IPsec. >>> >>>When we want to apply both MIPv6 and IPsec to the same target, >>>we need one policy that has two or more of templates(e.g. one is >>>MIPv6's template and the other is IPsec's). >> >>Does this also mean that the IPSec and MIPv6 policies and SAs need to be >>configured at the same time or is it possible to add templates to an >>existing policy? > > > Currently no interface to add templates directly to it. Then the policies for mipv6 would need to be specified at the same time as the ipsec policies. This is not a problem as long as the policies are loaded at start up. However, this could lead to problems with applications which specify their own policies, e.g. racoon. 
Henrik From ak@suse.de Tue Jun 10 09:20:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:20:41 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGKa2x013671 for ; Tue, 10 Jun 2003 09:20:37 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 02B7C142CE; Tue, 10 Jun 2003 18:20:31 +0200 (MEST) Date: Tue, 10 Jun 2003 18:20:29 +0200 From: Andi Kleen To: Bogdan Costescu Cc: "David S. Miller" , hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610162029.GA8168@wotan.suse.de> References: <20030610.085342.41654796.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 3073 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > > > but he is such a good maintainer that he doesn't want to change over the > > MEM I/O accesses for fear of breaking something. > > Given that the 3c59x driver supports several generations of cards most of > them being EOL-ed years ago, it's pretty hard to do such change. If a new > driver would be forked that serviced only the latest generations (Cyclone > = 905B and Tornado = 905C(X)), switching to MMIO would probably make sense > along with lots of others small changes (large MTU/VLAN, polling > descriptors, MII-only media selection etc.) and maybe have NAPI in the mix > as well... Can't you just wrap it in a few macros and offer a config for those who want the best performance and a runtime test for the others? Then switch between PIO and mmio dynamically. 
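A minimal userspace sketch of the macro-wrapping idea in the paragraph above: register accesses go through one wrapper that either tests a per-device flag at run time or, when a compile-time option is set, always takes the MMIO path. All names here (`nic_io`, `nic_read32`, `nic_write32`, `nic_io_init`, `NIC_ALWAYS_MMIO`, the two fake register windows) are hypothetical and are not taken from the 3c59x driver. A real driver would call inl()/outl() for PIO and readl()/writel() for MMIO instead of indexing arrays, and the whitelist/force inputs are only an assumption about how the choice might be made at probe time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NIC_REG_COUNT 16

/* Stand-ins for the two register windows of one card (assumption:
 * plain arrays replace the I/O port range and the ioremap()ed region
 * so that the sketch can run outside the kernel). */
uint32_t fake_pio_window[NIC_REG_COUNT];
uint32_t fake_mmio_window[NIC_REG_COUNT];

struct nic_io {
    bool use_mmio;   /* decided once, when the device is probed */
};

/* Compile-time option for people who want no per-access test at all;
 * everyone else pays for one flag test per register access. */
#ifdef NIC_ALWAYS_MMIO
#define nic_uses_mmio(io) ((void)(io), true)
#else
#define nic_uses_mmio(io) ((io)->use_mmio)
#endif

/* Pick the access method: MMIO if the card is known to be safe
 * (whitelisted) or if the user forces it; PIO otherwise. */
void nic_io_init(struct nic_io *io, bool on_whitelist, bool force_mmio)
{
    io->use_mmio = on_whitelist || force_mmio;
}

uint32_t nic_read32(const struct nic_io *io, unsigned int reg)
{
    assert(reg < NIC_REG_COUNT);
    return nic_uses_mmio(io) ? fake_mmio_window[reg] : fake_pio_window[reg];
}

void nic_write32(struct nic_io *io, unsigned int reg, uint32_t val)
{
    assert(reg < NIC_REG_COUNT);
    if (nic_uses_mmio(io))
        fake_mmio_window[reg] = val;
    else
        fake_pio_window[reg] = val;
}
```

The rest of such a driver would call only nic_read32()/nic_write32(), so switching access methods touches one flag rather than every call site. The cost on the run-time path is a single predictable branch per access.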
Even runtime test should be pretty painless these days, the CPU normally can execute hundreds or even thousands of tests in the time it takes to wait for an mmio or even PIO. > > > It's actually a simple change to make if someone wants to > > spend a few cycles on it, > > Not if you include testing in those cycles :-) Just make it a whitelist + a force module param. -Andi (who has a 3c980 and could do it, but already has too much on his todo list..) From garzik@gtf.org Tue Jun 10 09:23:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:23:50 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGNk2x014074 for ; Tue, 10 Jun 2003 09:23:47 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 48F006641; Tue, 10 Jun 2003 12:23:42 -0400 (EDT) Date: Tue, 10 Jun 2003 12:23:42 -0400 From: Jeff Garzik To: Andi Kleen Cc: Bogdan Costescu , "David S. Miller" , hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610162342.GB1959@gtf.org> References: <20030610.085342.41654796.davem@redhat.com> <20030610162029.GA8168@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610162029.GA8168@wotan.suse.de> User-Agent: Mutt/1.3.28i X-archive-position: 3074 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 06:20:29PM +0200, Andi Kleen wrote: > Can't you just wrap it in a few macros and offer a config for those > who want the best performance and a runtime test for the others? > Then switch between PIO and mmio dynamically. 
> > Even runtime test should be pretty painless these days, the CPU normally > can execute hundreds or even thousands of tests in the time it takes to > wait for an mmio or even PIO. I prefer a compile-time test. But yes, this is what several other net drivers do: offer a config option for MMIO (or PIO), and the default is MMIO unless that is known to be unsafe on certain boards (which, unfortunately, it is). Jeff From davem@redhat.com Tue Jun 10 09:31:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:31:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGVH2x014605 for ; Tue, 10 Jun 2003 09:31:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23117; Tue, 10 Jun 2003 09:27:48 -0700 Date: Tue, 10 Jun 2003 09:27:48 -0700 (PDT) Message-Id: <20030610.092748.115929981.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> References: <20030610061010.Y36963@shell.cyberus.ca> <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3075 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 10 Jun 2003 07:41:01 -0400 the bulk (by count) of the traffic seems to be in the 64-95 byte range. 
Ok, time to deploy ATM everywhere to replace our IP routers :) Sorry Chas, I couldn't resist... :) Regardless, there are some sites on the net that publish things like BGP tables and traffic samples that people can use to do performance testing on new algorithms. I've read about it in papers by Vern Paxson (he used it to do his Bro thing) and others. I don't have a reference handy, anyone? I think it's called the IPMA project... From davem@redhat.com Tue Jun 10 09:37:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:37:19 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGbF2x015080 for ; Tue, 10 Jun 2003 09:37:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23136; Tue, 10 Jun 2003 09:33:50 -0700 Date: Tue, 10 Jun 2003 09:33:49 -0700 (PDT) Message-Id: <20030610.093349.48511220.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: jsd@monmouth.com, pekkas@netcore.fi, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030610080901.M37190@shell.cyberus.ca> References: <3EE5C7E9.6090401@monmouth.com> <20030610080901.M37190@shell.cyberus.ca> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3076 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Tue, 10 Jun 2003 08:12:58 -0400 (EDT) There's another dimension actually: mostly driven by BSD mbuf style packet allocation; some tests show that some vendors are optimized for certain packet sizes, Linux skbuffs don't have this problem. Well, the most amusing part for me is that if you read all the papers on TCP congestion algorithms you'd think that routers dropped based upon packet sizes since the majority work on multiple of MSS this and multiple of MSS that. :) Routers drop packets, period. They do so using a variety of selection schemes (RED, CBQ, actually just egrep net/sched/sch_*.c :) but your contribution to the router's work is measured in terms of packets and time when you come right down to it. From davem@redhat.com Tue Jun 10 09:41:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:41:39 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGfZ2x015536 for ; Tue, 10 Jun 2003 09:41:36 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23156; Tue, 10 Jun 2003 09:38:12 -0700 Date: Tue, 10 Jun 2003 09:38:11 -0700 (PDT) Message-Id: <20030610.093811.08342771.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S.
Miller" In-Reply-To: References: <20030610061010.Y36963@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3077 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT) No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx cards. This is because the driver still uses PIO, I am rather sure of this. From davem@redhat.com Tue Jun 10 09:42:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:42:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGgk2x015785 for ; Tue, 10 Jun 2003 09:42:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23165; Tue, 10 Jun 2003 09:39:27 -0700 Date: Tue, 10 Jun 2003 09:39:27 -0700 (PDT) Message-Id: <20030610.093927.21906828.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610061010.Y36963@shell.cyberus.ca> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3078 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT) Yes, and it would be nice if you mentioned in your NAPI docs that people should use a tulip, tg3, or e1000 if they want it to work well. In making your sales pitches for NAPI you made it sound like any high-performance card should do fine (i.e. anything but a Realtek). The problems the 3c59x has have nothing to do with NAPI vs. non-NAPI. Your routing rate is limited by how much time a PIO to the PCI device takes :) From bogdan.costescu@iwr.uni-heidelberg.de Tue Jun 10 09:45:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:45:20 -0700 (PDT) Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGjE2x016234 for ; Tue, 10 Jun 2003 09:45:15 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:q5UrjJhLGKaJ/W/fmsQF9/+ZUycREyip@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.12.9/8.12.9) with ESMTP id h5AGj3F4014940; Tue, 10 Jun 2003 18:45:03 +0200 (MET DST) Received: from kenzo.iwr.uni-heidelberg.de (localhost.localdomain [127.0.0.1]) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8) with ESMTP id h5AGj3f0027954; Tue, 10 Jun 2003 18:45:03 +0200 Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8/Submit) with ESMTP id h5AGj3Vm027950; Tue, 10 Jun 2003 18:45:03 +0200 Date: Tue, 10 Jun 2003 18:45:03 +0200 (CEST) From: Bogdan Costescu To: "David S.
Miller" cc: sim@netnation.com, , , , , , Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030610.085600.71109220.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3079 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bogdan.costescu@iwr.uni-heidelberg.de Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > Unfortunately, NAPI won't help him with the current way the 3c59x > driver works. It needs to provide a way to use MEM I/O before NAPI > would start to be of use to him. I don't really want to sound like I'm defending the 3c59x driver, but... The 3c90x driver released by 3Com uses some mechanism "similar" to NAPI which is based on the on-board timer; these timer interrupts are scheduled dynamically. With this driver I would typically get TCP bandwidth figures 4-5 Mbps lower than those obtained with 3c59x and a noticeable difference in the parallel jobs timing (using MPI over TCP). I'm not saying that NAPI will perform the same way, just that there might be also hardware limits somewhere... But the real question is: does it make sense to spend time now in trying to improve a driver with hope for only a marginal speed increase ? After using these cards and the 3c59x driver with very good results for the past 4 years, I'm looking for GigE replacements. Shouldn't anybody concerned with performance do the same ? Does it make sense to pair a very fast CPU and memory with a 33MHz-32bit PCI bus ? And another important question: how much improvement can be gained from the driver ? Folks that do parallel computation over TCP over Ethernet know very well that the software in the kernel is the bottleneck (extra copies, TCP, IRQ management, etc).
Packages that throw away TCP and use another communication protocol can typically achieve much better ping-pong times (they do have some other problems though) which shows that the hardware and NIC driver are capable enough. So until I see a profile showing that the CPU is spending most of the time in the driver, I won't be convinced that these changes are needed.... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From ak@suse.de Tue Jun 10 09:50:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:50:23 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGoJ2x016605 for ; Tue, 10 Jun 2003 09:50:20 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 978631505F; Tue, 10 Jun 2003 18:49:49 +0200 (MEST) Date: Tue, 10 Jun 2003 18:49:49 +0200 From: Andi Kleen To: Bogdan Costescu Cc: "David S. Miller" , sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610164949.GB13246@wotan.suse.de> References: <20030610.085600.71109220.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 3080 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > > And another important question: how much improvement can be gained from > the driver ? 
Folks that do parallel computation over TCP over Ethernet You can play some tricks with the driver to make eth_type_trans disappear from the profiles. This usually helps a lot because it avoids one full "fetch from cache cold memory" roundtrip per packet, which is slow on any CPU. -Andi From davem@redhat.com Tue Jun 10 09:56:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:56:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGuT2x017079 for ; Tue, 10 Jun 2003 09:56:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23255; Tue, 10 Jun 2003 09:51:36 -0700 Date: Tue, 10 Jun 2003 09:51:35 -0700 (PDT) Message-Id: <20030610.095135.28806569.davem@redhat.com> To: lpetande@tml.hut.fi Cc: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: "David S. Miller" In-Reply-To: <3EE5F85E.9080006@tml.hut.fi> References: <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3081 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Henrik Petander Date: Tue, 10 Jun 2003 18:25:18 +0300 Then the policies for mipv6 would need to be specified at the same time as the ipsec policies. This is not a problem as long as the policies are loaded at start up. 
However, this could lead to problems with applications which specify their own policies, e.g. racoon. It is an important point. Ask yourself this, why do we have tunnel devices and don't implement them with cool routing or XFRM rules? We don't do this because as soon as you type "zebra" all your by-hand routes are gone, and as soon as you type "racoon" all your by-hand xfrm rules are gone. If you want to do these things using routes or xfrm rules, you must integrate the creation of them into either zebra or racoon. You cannot have a setup where mipv6d and racoon/zebra fight each other flushing each other's settings. It doesn't work. From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 09:59:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:59:15 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGxC2x017482 for ; Tue, 10 Jun 2003 09:59:13 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5AGwtsG008215; Tue, 10 Jun 2003 12:58:55 -0400 (EDT) Message-Id: <200306101658.h5AGwtsG008215@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-reply-to: Your message of "Tue, 10 Jun 2003 09:27:48 PDT." <20030610.092748.115929981.davem@redhat.com> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 10 Jun 2003 12:57:02 -0400 From: chas williams X-Spam-Score: (*) hits=1.7 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin .
com / mimedefang) X-archive-position: 3082 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev In message <20030610.092748.115929981.davem@redhat.com>,"David S. Miller" writes: >Ok, time to deploy ATM everywhere to replace our IP routers :) >Sorry Chas, I couldn't resist... :) i see a lot of crying about the 'atm tax' but it seems to me that the 'ip tax' is typically much steeper (except when you graph packet_count*packet_size then you will see that the bulk of the data is carried by larger packets where the tax isn't as high). so for some applications, like voice, atm might actually be a winner as far as the tax goes (as long as you aren't doing voice over ip over atm) honestly i needed real numbers to tune the atm driver on our linux-router. i have two recv buffer pools--small and large (duh). i needed an idea of what to use for the small value. From davem@redhat.com Tue Jun 10 10:00:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:01:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AH0v2x017881 for ; Tue, 10 Jun 2003 10:00:57 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23285; Tue, 10 Jun 2003 09:56:07 -0700 Date: Tue, 10 Jun 2003 09:56:06 -0700 (PDT) Message-Id: <20030610.095606.23033683.davem@redhat.com> To: nakam@linux-ipv6.org Cc: lpetande@tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: "David S.
Miller" In-Reply-To: <20030611004035.40027642.nakam@linux-ipv6.org> References: <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> <20030611004035.40027642.nakam@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3083 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Masahide NAKAMURA Date: Wed, 11 Jun 2003 00:40:44 +0900 How about providing interface of handling templates to update existing policy in kernel? Who will manage mipv6 policies? racoon? See my other email on why any other setup simply will not work. From davem@redhat.com Tue Jun 10 10:05:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:05:53 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AH5n2x022527 for ; Tue, 10 Jun 2003 10:05:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23322; Tue, 10 Jun 2003 10:02:09 -0700 Date: Tue, 10 Jun 2003 10:02:09 -0700 (PDT) Message-Id: <20030610.100209.70199702.davem@redhat.com> To: jgarzik@pobox.com Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <20030610162342.GB1959@gtf.org> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3084 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 10 Jun 2003 12:23:42 -0400 I prefer a compile-time test. This means end users don't see the benefit, so I definitely prefer Andi's idea. From davem@redhat.com Tue Jun 10 10:15:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:15:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHFe2x023005 for ; Tue, 10 Jun 2003 10:15:41 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23355; Tue, 10 Jun 2003 10:12:19 -0700 Date: Tue, 10 Jun 2003 10:12:19 -0700 (PDT) Message-Id: <20030610.101219.38691038.davem@redhat.com> To: bogdan.costescu@iwr.uni-heidelberg.de Cc: sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: References: <20030610.085600.71109220.davem@redhat.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3085
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Bogdan Costescu
   Date: Tue, 10 Jun 2003 18:45:03 +0200 (CEST)

   With this driver I would typically get TCP bandwidth figures 4-5
   Mbps lower than those obtained with 3c59x and noticeable difference
   in the parallel jobs timing (using MPI over TCP). I'm not saying
   that NAPI will perform the same way, just that there might be also
   hardware limits somewhere...

I think it won't, hardware interrupt mitigation schemes have lots of
problems that NAPI is more apt to deal with.

   But the real question is: does it make sense to spend time now in
   trying to improve a driver with hope for only a marginal speed
   increase ?

People who have the cards care, and I think PIO-->MMIO is more than
marginal.

Your attempt to get "latency" was ill founded :) Your limits have to
do with the wire speed, not all the cpu cycles being eaten by PIO
accesses. On a DoS'd router, it's another situation altogether.

   And another important question: how much improvement can be gained
   from the driver ? Folks that do parallel computation over TCP over
   Ethernet know very well that the software in the kernel is the
   bottleneck (extra copies, TCP, IRQ management, etc).

Your limitations in parallel computation have to do with how TCP
behaves more than how TCP is implemented. For starters try:

	echo "1" >/proc/sys/net/ipv4/tcp_low_latency

That's the kind of thing that will help parallel computation folks,
not driver hacks.
From garzik@gtf.org Tue Jun 10 10:16:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:16:21 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHGH2x023193 for ; Tue, 10 Jun 2003 10:16:18 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 38D3F6641; Tue, 10 Jun 2003 13:16:17 -0400 (EDT) Date: Tue, 10 Jun 2003 13:16:17 -0400 From: Jeff Garzik To: "David S. Miller" Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x Message-ID: <20030610171617.GC1959@gtf.org> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610.100209.70199702.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3086 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Tue, 10 Jun 2003 12:23:42 -0400 > > I prefer a compile-time test. > > This means end users don't see the benefit, so I definitely > prefer Andi's idea. Making every IO a conditional branch? Ug. 
Jeff From ak@suse.de Tue Jun 10 10:18:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:18:26 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHIM2x023942 for ; Tue, 10 Jun 2003 10:18:22 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id ED74814456; Tue, 10 Jun 2003 19:18:16 +0200 (MEST) Date: Tue, 10 Jun 2003 19:18:16 +0200 From: Andi Kleen To: Jeff Garzik Cc: "David S. Miller" , ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x Message-ID: <20030610171816.GA24640@wotan.suse.de> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> <20030610171617.GC1959@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610171617.GC1959@gtf.org> X-archive-position: 3088 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 01:16:17PM -0400, Jeff Garzik wrote: > On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S. Miller wrote: > > From: Jeff Garzik > > Date: Tue, 10 Jun 2003 12:23:42 -0400 > > > > I prefer a compile-time test. > > > > This means end users don't see the benefit, so I definitely > > prefer Andi's idea. > > Making every IO a conditional branch? Ug. An IO takes hundreds or even thousands of cycles. The test and branch is completely lost in the noise. I bet you won't be able to measure a difference on any modern CPU. 
-Andi From davem@redhat.com Tue Jun 10 10:18:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:18:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHIH2x023946 for ; Tue, 10 Jun 2003 10:18:17 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23389; Tue, 10 Jun 2003 10:14:52 -0700 Date: Tue, 10 Jun 2003 10:14:52 -0700 (PDT) Message-Id: <20030610.101452.35033262.davem@redhat.com> To: jgarzik@pobox.com Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <20030610171617.GC1959@gtf.org> References: <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> <20030610171617.GC1959@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3087 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 10 Jun 2003 13:16:17 -0400 On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S. Miller wrote: > This means end users don't see the benefit, so I definitely > prefer Andi's idea. Making every IO a conditional branch? Ug. A PIO costs hundreds if not thousands of instructions! 
Come on Jeff, get real :-)

From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 10:25:28 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:25:33 -0700 (PDT)
Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHPR2x024927 for ; Tue, 10 Jun 2003 10:25:27 -0700
Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5AHPNsG008625 for ; Tue, 10 Jun 2003 13:25:23 -0400 (EDT)
Received: (from chas@localhost) by locutus.cmf.nrl.navy.mil (8.12.7/8.12.7/Submit) id h5AHNT9x017552 for netdev@oss.sgi.com; Tue, 10 Jun 2003 13:23:30 -0400
Date: Tue, 10 Jun 2003 13:23:30 -0400
From: chas williams
Message-Id: <200306101723.h5AHNT9x017552@locutus.cmf.nrl.navy.mil>
To: netdev@oss.sgi.com
Subject: [RFC] assorted changes to atm protocol stack
X-Spam-Score: (**) hits=2.4
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-archive-position: 3089
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

i would appreciate it if some people could review the following
proposed changes for the atm protocol families:

ftp://ftp.cmf.nrl.navy.mil/pub/chas/linux-atm/2_5_70_vcc_sklist_diffs

- vcc's are kept in a global list (vcc_sklist) much like the other
  protocol stacks.

- renames various functions to vcc_XXX instead of calling everything
  atm_XXX. hopefully this will make things clearer at some point.

- the he driver tries to do proper locking during the recv operation
  by holding a read_lock on the vcc sklist. other drivers will need
  to do this as well. (or otherwise hold reference counts)

- got rid of SOCKOPS_WRAPPED and would like someone to tell me if
  i have enough lock_sock()'s in place.

- rewrote recvmsg to use the existing skb network functions

- rewriting sendmsg will need to wait since i believe skb from the
  network stack will need to be cloned so that they can be properly
  skb_set_owner_w()

- split atm_ioctl into vcc_ioctl and atm_dev_ioctl.

- changed to prepare_to_wait() and finish_wait() in svc.c. i assume
  that this is more correct than directly using add_wait_queue and
  remove_wait_queue?

other comments are welcome of course.

From garzik@gtf.org Tue Jun 10 10:25:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:26:02 -0700 (PDT)
Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHPv2x025114 for ; Tue, 10 Jun 2003 10:25:57 -0700
Received: by havoc.gtf.org (Postfix, from userid 500) id B6C826641; Tue, 10 Jun 2003 13:25:56 -0400 (EDT)
Date: Tue, 10 Jun 2003 13:25:56 -0400
From: Jeff Garzik
To: "David S. Miller"
Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: 3c59x
Message-ID: <20030610172556.GD1959@gtf.org>
References: <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> <20030610171617.GC1959@gtf.org> <20030610.101452.35033262.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030610.101452.35033262.davem@redhat.com>
User-Agent: Mutt/1.3.28i
X-archive-position: 3090
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

On Tue, Jun 10, 2003 at 10:14:52AM -0700, David S. Miller wrote:
> From: Jeff Garzik
> Date: Tue, 10 Jun 2003 13:16:17 -0400
>
> On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S.
Miller wrote: > > This means end users don't see the benefit, so I definitely > > prefer Andi's idea. > > Making every IO a conditional branch? Ug. > > A PIO costs hundreds if not thousands of instructions! > Come on Jeff, get real :-) MMIO isn't as bad as PIO, and the 'runtime choice' setup implies slowing down the MMIO path, too. Jeff From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 10:32:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:32:07 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHW22x025754 for ; Tue, 10 Jun 2003 10:32:03 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5AHVlsG008694; Tue, 10 Jun 2003 13:31:47 -0400 (EDT) Message-Id: <200306101731.h5AHVlsG008694@ginger.cmf.nrl.navy.mil> To: Jeff Garzik cc: "David S. Miller" , ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x In-reply-to: Your message of "Tue, 10 Jun 2003 13:16:17 EDT." <20030610171617.GC1959@gtf.org> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 10 Jun 2003 13:29:54 -0400 From: chas williams X-Spam-Score: () hits=-0.9 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3091 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev In message <20030610171617.GC1959@gtf.org>,Jeff Garzik writes: >Making every IO a conditional branch? Ug. you could just test once during driver init and setup an indirection to the appropriate function. its a little better than test and branch. 
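
The suggestion above (test once at driver init, then go through an indirection) can be sketched roughly as follows. This is a userspace illustration only, not 3c59x code: struct nic, nic_init() and the nic_read_*/nic_write_* helpers are hypothetical names, and the regs[] array stands in for the card's register window where a real driver would call inl()/outl() or readl()/writel().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NIC_NREGS 16

struct nic;
typedef uint32_t (*nic_read_fn)(struct nic *n, size_t reg);
typedef void (*nic_write_fn)(struct nic *n, size_t reg, uint32_t val);

/* Hypothetical device state; regs[] stands in for the card's registers. */
struct nic {
	uint32_t regs[NIC_NREGS];
	unsigned long pio_ops;	/* accesses that took the PIO path */
	unsigned long mmio_ops;	/* accesses that took the MMIO path */
	nic_read_fn read;	/* both pointers are chosen once in nic_init() */
	nic_write_fn write;
};

static uint32_t nic_read_pio(struct nic *n, size_t reg)
{
	assert(reg < NIC_NREGS);
	n->pio_ops++;
	return n->regs[reg];	/* a real driver would use inl() here */
}

static void nic_write_pio(struct nic *n, size_t reg, uint32_t val)
{
	assert(reg < NIC_NREGS);
	n->pio_ops++;
	n->regs[reg] = val;	/* a real driver would use outl() here */
}

static uint32_t nic_read_mmio(struct nic *n, size_t reg)
{
	assert(reg < NIC_NREGS);
	n->mmio_ops++;
	return n->regs[reg];	/* a real driver would use readl() here */
}

static void nic_write_mmio(struct nic *n, size_t reg, uint32_t val)
{
	assert(reg < NIC_NREGS);
	n->mmio_ops++;
	n->regs[reg] = val;	/* a real driver would use writel() here */
}

/* Test once at init time and install the matching accessors. */
void nic_init(struct nic *n, int use_mmio)
{
	memset(n, 0, sizeof(*n));
	n->read = use_mmio ? nic_read_mmio : nic_read_pio;
	n->write = use_mmio ? nic_write_mmio : nic_write_pio;
}
```

After nic_init(), every access goes through n->read() / n->write() with no per-access test, at the price of an indirect call per register access.
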
From davem@redhat.com Tue Jun 10 10:33:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:34:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHXu2x026104 for ; Tue, 10 Jun 2003 10:33:57 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23492; Tue, 10 Jun 2003 10:30:31 -0700 Date: Tue, 10 Jun 2003 10:30:30 -0700 (PDT) Message-Id: <20030610.103030.15245573.davem@redhat.com> To: jgarzik@pobox.com Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <20030610172556.GD1959@gtf.org> References: <20030610171617.GC1959@gtf.org> <20030610.101452.35033262.davem@redhat.com> <20030610172556.GD1959@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3092 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 10 Jun 2003 13:25:56 -0400 MMIO isn't as bad as PIO, and the 'runtime choice' setup implies slowing down the MMIO path, too. No end user will see the change then, no distribution vendor worth their salt will ship with MMIO enabled. Right now we get only PIO and everybody suffers. Runtime MMIO selection is a net win for everyone. 
From davem@redhat.com Tue Jun 10 10:34:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:34:51 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHYl2x026341 for ; Tue, 10 Jun 2003 10:34:48 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23511; Tue, 10 Jun 2003 10:31:12 -0700 Date: Tue, 10 Jun 2003 10:31:11 -0700 (PDT) Message-Id: <20030610.103111.26297722.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: jgarzik@pobox.com, ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <200306101731.h5AHVlsG008694@ginger.cmf.nrl.navy.mil> References: <20030610171617.GC1959@gtf.org> <200306101731.h5AHVlsG008694@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3093 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 10 Jun 2003 13:29:54 -0400 In message <20030610171617.GC1959@gtf.org>,Jeff Garzik writes: >Making every IO a conditional branch? Ug. you could just test once during driver init and setup an indirection to the appropriate function. its a little better than test and branch. Function calls are actually more expensive, you eat an entry in the cpu's return address cache. 
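
For comparison, the per-access test-and-branch variant argued for above might look roughly like this. Again a userspace sketch under stated assumptions: struct tb_nic, tb_init(), tb_read() and tb_write() are made-up names, regs[] stands in for the hardware, and nothing here measures which variant is actually cheaper on a given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TB_NREGS 16

/* Hypothetical device state; regs[] stands in for the card's registers. */
struct tb_nic {
	uint32_t regs[TB_NREGS];
	int use_mmio;		/* decided once, e.g. at probe time */
	unsigned long pio_ops;	/* accesses that took the PIO path */
	unsigned long mmio_ops;	/* accesses that took the MMIO path */
};

/* Every read tests the flag and branches; there is no indirect call. */
static inline uint32_t tb_read(struct tb_nic *n, size_t reg)
{
	assert(reg < TB_NREGS);
	if (n->use_mmio) {
		n->mmio_ops++;
		return n->regs[reg];	/* a real driver would use readl() */
	}
	n->pio_ops++;
	return n->regs[reg];		/* a real driver would use inl() */
}

/* Same pattern for writes. */
static inline void tb_write(struct tb_nic *n, size_t reg, uint32_t val)
{
	assert(reg < TB_NREGS);
	if (n->use_mmio) {
		n->mmio_ops++;
		n->regs[reg] = val;	/* a real driver would use writel() */
		return;
	}
	n->pio_ops++;
	n->regs[reg] = val;		/* a real driver would use outl() */
}

void tb_init(struct tb_nic *n, int use_mmio)
{
	memset(n, 0, sizeof(*n));
	n->use_mmio = use_mmio;
}
```

Because the flag never changes after tb_init(), the branch is highly predictable, which is the property the test-and-branch argument in this thread relies on.
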
From davem@redhat.com Tue Jun 10 10:35:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:35:58 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHZs2x026717 for ; Tue, 10 Jun 2003 10:35:54 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23537; Tue, 10 Jun 2003 10:32:35 -0700 Date: Tue, 10 Jun 2003 10:32:34 -0700 (PDT) Message-Id: <20030610.103234.116374169.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610.084940.74727904.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3094 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 13:33:01 -0400 (EDT) On Tue, 10 Jun 2003, David S. Miller wrote: > When packet (more specifically, software interrupt) processing > reaches a certain level, we offload the work into process context. That sounds good. Adjust the nice value of the ksoftirqd tasks, that's the only thing available. But your problem has to do with all the PIO accesses, that absolutely kills the machine. 
From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 10:41:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:41:28 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHfP2x027407 for ; Tue, 10 Jun 2003 10:41:25 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5AHf8sG008844; Tue, 10 Jun 2003 13:41:08 -0400 (EDT) Message-Id: <200306101741.h5AHf8sG008844@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: jgarzik@pobox.com, ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x In-reply-to: Your message of "Tue, 10 Jun 2003 10:31:11 PDT." <20030610.103111.26297722.davem@redhat.com> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 10 Jun 2003 13:39:15 -0400 From: chas williams X-Spam-Score: (*) hits=1.7 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3095 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev >Function calls are actually more expensive, you eat an entry in >the cpu's return address cache. i was thinking you could do it higher up, like around the hard_start_xmit level. this would create a bit of replicated code, but you could abuse the preprocessor a bit i imagine. 
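
The "do it higher up and abuse the preprocessor" idea can be sketched as below: one macro stamps out a PIO copy and an MMIO copy of a whole transmit routine, and the choice is made once per device rather than once per register access. This is an illustration under assumptions, not the 3c59x transmit path: struct pp_nic, PP_DEFINE_XMIT, the register offsets and the two-register "transmit" are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP_NREGS	8
#define PP_REG_LEN	0	/* hypothetical "frame length" register */
#define PP_REG_KICK	1	/* hypothetical "start transmit" register */

/* Hypothetical device state; regs[] stands in for the card's registers. */
struct pp_nic {
	uint32_t regs[PP_NREGS];
	unsigned long pio_ops;
	unsigned long mmio_ops;
	int (*xmit)(struct pp_nic *n, uint32_t len);	/* set once at init */
};

/* Stand-ins for outl() and writel(); they only differ in the counter. */
#define PP_WRITE_PIO(n, reg, val)  ((n)->pio_ops++,  (n)->regs[(reg)] = (val))
#define PP_WRITE_MMIO(n, reg, val) ((n)->mmio_ops++, (n)->regs[(reg)] = (val))

/* Generate one full copy of the transmit routine per access method. */
#define PP_DEFINE_XMIT(name, WRITE)				\
static int name(struct pp_nic *n, uint32_t len)			\
{								\
	if (len == 0)						\
		return -1;	/* nothing to send */		\
	WRITE(n, PP_REG_LEN, len);				\
	WRITE(n, PP_REG_KICK, 1);				\
	return 0;						\
}

PP_DEFINE_XMIT(pp_xmit_pio, PP_WRITE_PIO)
PP_DEFINE_XMIT(pp_xmit_mmio, PP_WRITE_MMIO)

/* One test at init time selects the whole routine. */
void pp_nic_init(struct pp_nic *n, int use_mmio)
{
	assert(n != NULL);
	memset(n, 0, sizeof(*n));
	n->xmit = use_mmio ? pp_xmit_mmio : pp_xmit_pio;
}
```

The trade-off is the one named in the message: register accesses inside each copy carry no test and no indirect call, but every routine treated this way exists twice in the object code.
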
From davem@redhat.com Tue Jun 10 10:46:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:46:39 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHkZ2x027865 for ; Tue, 10 Jun 2003 10:46:36 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23611; Tue, 10 Jun 2003 10:43:06 -0700 Date: Tue, 10 Jun 2003 10:43:05 -0700 (PDT) Message-Id: <20030610.104305.10318104.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: jgarzik@pobox.com, ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <200306101741.h5AHf8sG008844@ginger.cmf.nrl.navy.mil> References: <20030610.103111.26297722.davem@redhat.com> <200306101741.h5AHf8sG008844@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3096 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 10 Jun 2003 13:39:15 -0400 i was thinking you could do it higher up, like around the hard_start_xmit level. this would create a bit of replicated code, but you could abuse the preprocessor a bit i imagine. 3c59x already does this, so now we'd have 4 different copies. 
From ralph@istop.com Tue Jun 10 10:52:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:52:25 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHqB2x028331 for ; Tue, 10 Jun 2003 10:52:11 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 8193D36D08; Tue, 10 Jun 2003 13:19:35 -0400 (EDT) Date: Tue, 10 Jun 2003 13:19:41 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S. Miller" Cc: "sim@netnation.com" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.085600.71109220.davem@redhat.com> Message-ID: References: <20030609195652.E35696@shell.cyberus.ca> <20030610015311.GB23009@netnation.com> <20030610.085600.71109220.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3097 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > From: Simon Kirby > Date: Mon, 9 Jun 2003 18:53:12 -0700 > > Your CPU use is quite a bit higher than ours. > > Yeah, but his faster cpu is all being burnt to a crisp > doing PIO accesses to the 3c59x card. > > I found that once NAPI was happening, userspace seemed to get a > fairly decent amount of time. > > Unfortunately, NAPI won't help him with the current way the 3c59x > driver works. It needs to provide a way to use MEM I/O before NAPI > would start to be of use to him. Well, I've already decided to retire the 3c905cx cards and drop in a couple of the Pro/1000 cards I recently bought. 
Considering the Intel GigE cards cost me ~$50 now and the 3Coms are
~$45, I'd say anyone willing to update 3c59x.c has misplaced
priorities or too much time on their hands...

-Ralph

From ralph@istop.com Tue Jun 10 11:10:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 11:10:41 -0700 (PDT)
Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AIAb2x029674 for ; Tue, 10 Jun 2003 11:10:38 -0700
Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 0608836E8A; Tue, 10 Jun 2003 14:10:26 -0400 (EDT)
Date: Tue, 10 Jun 2003 14:10:32 -0400 (EDT)
From: Ralph Doncaster
Reply-To: ralph+d@istop.com
To: Ben Greear
Cc: "'netdev@oss.sgi.com'"
Subject: Re: Route cache performance under stress
In-Reply-To: <3EE54F4D.50909@candelatech.com>
Message-ID:
References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <3EE54F4D.50909@candelatech.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3098
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ralph@istop.com
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003, Ben Greear wrote:

> One warning about e1000's, make sure you have active airflow across the NICs
> if you put two together. Otherwise, buy a dual port NIC...it has a single
> chip and you will have less cooling issues.

I just took a closer look at my e1000's. They've got a small RC82540EM
bga chip on them, manufactured 25th week of '02. If these things do get
hot enough to cause problems why wouldn't Intel have them manufactured
with heatsinks attached?
-Ralph From ralph@istop.com Tue Jun 10 11:13:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 11:13:13 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AICx2x030032 for ; Tue, 10 Jun 2003 11:13:00 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id D291836E28; Tue, 10 Jun 2003 13:32:55 -0400 (EDT) Date: Tue, 10 Jun 2003 13:33:01 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S. Miller" Cc: "ralph+d@istop.com" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.084940.74727904.davem@redhat.com> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030610.084940.74727904.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3099 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > From: Ralph Doncaster > Date: Mon, 9 Jun 2003 20:32:48 -0400 (EDT) > > Lastly from the software side Linux doesn't seem to have anything like > BSD's parameter to control user/system CPU sharing. Once my CPU load > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > 100% and end up with my BGP sessions drop. > > When packet (more specifically, software interrupt) processing > reaches a certain level, we offload the work into process context. That sounds good. Is there a sysctl I can use to define "certain level"? 
-Ralph From greearb@candelatech.com Tue Jun 10 11:22:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 11:22:08 -0700 (PDT) Received: from grok.yi.org (IDENT:r0ot2V6Or7Oko18QYLLRnw35yHAJTGMr@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AIM22x030639 for ; Tue, 10 Jun 2003 11:22:03 -0700 Received: from candelatech.com (IDENT:NEOotbv9W5p4DDn05xPnBJQSKC48sOBi@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5AILoc04465; Tue, 10 Jun 2003 11:21:50 -0700 Message-ID: <3EE621BE.6070900@candelatech.com> Date: Tue, 10 Jun 2003 11:21:50 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: ralph+d@istop.com CC: "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <3EE54F4D.50909@candelatech.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3100 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Ralph Doncaster wrote: > On Mon, 9 Jun 2003, Ben Greear wrote: > > >>One waring about e1000's, make sure you have active airflow across the NICs >>if you put two together. Otherwise, buy a dual port NIC...it has a single >>chip and you will have less cooling issues. > > > I just took a closer look at my e1000's. They've got a small RC82540EM > bga chip on them, manufactured 25th week of '02. If these things do get > hot enough to cause problems why wouldn't Intel have them manufactured > with heatsinks attached? Dunno... I wish they had. My machine had fairly bad cooling (2U, open case). 
However, when I put a fan on them, no reboots, whereas before I could crash the machine with nasty memory corruption after about 1 hour of sustained > 100Mbps bi-directional traffic. The temp probe I used showed them to be at about their operating max, though I forget what that was now... Maybe your chipset or cooling is better and you won't hit it..but if you do see crashes, try a fan :) Ben > > -Ralph > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From Robert.Olsson@data.slu.se Tue Jun 10 11:35:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 11:35:38 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AIZT2x031335 for ; Tue, 10 Jun 2003 11:35:30 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id UAA01245; Tue, 10 Jun 2003 20:34:50 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16102.9418.43884.336925@robur.slu.se> Date: Tue, 10 Jun 2003 20:34:50 +0200 To: "David S. Miller" Cc: ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030610.103234.116374169.davem@redhat.com> References: <20030610.084940.74727904.davem@redhat.com> <20030610.103234.116374169.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3101 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Dave! I ripped out the route hash just to test the slow path. Seems like your patch was very good as we see the same performance w/o dst hash ~114 kpps. 
So my test system drops from 450 kpps to 114 kpps when every incoming interface carries 100% traffic which has 1 dst/pkt which is very unlikely senario I would say. It is not that bad.... Conclusions: * Your patch is good. (I played with some variants) * We need to focus on slow path if we feel improving the 1 dst/pkt scenario. Input rate 2*190 kpps clone_skb=1 Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flags eth0 1500 0 3001546 9684614 9684614 6998518 53 0 0 0 BRU eth1 1500 0 13 0 0 0 3001497 0 0 0 BRU eth2 1500 0 3001114 9678333 9678333 6998889 3 0 0 0 BRU eth3 1500 0 2 0 0 0 3001115 0 0 0 BRU rt_cache_stat 00009146 00000000 005b97ed 00000000 00000000 00000000 00000000 00000000 00000004 00000006 00000000 005b107e 005b1071 00000006 00000000 00000000 00000001 --- net/ipv4/route.c.030610.2 2003-06-10 18:55:32.000000000 +0200 +++ net/ipv4/route.c 2003-06-10 19:09:23.000000000 +0200 @@ -722,44 +722,10 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) { - struct rtable *rth, **rthp; - unsigned long now = jiffies; int attempts = !in_softirq(); restart: - rthp = &rt_hash_table[hash].chain; - spin_lock_bh(&rt_hash_table[hash].lock); - while ((rth = *rthp) != NULL) { - if (compare_keys(&rth->fl, &rt->fl)) { - /* Put it first */ - *rthp = rth->u.rt_next; - /* - * Since lookup is lockfree, the deletion - * must be visible to another weakly ordered CPU before - * the insertion at the start of the hash chain. - */ - smp_wmb(); - rth->u.rt_next = rt_hash_table[hash].chain; - /* - * Since lookup is lockfree, the update writes - * must be ordered for consistency on SMP. - */ - smp_wmb(); - rt_hash_table[hash].chain = rth; - - rth->u.dst.__use++; - dst_hold(&rth->u.dst); - rth->u.dst.lastuse = now; - spin_unlock_bh(&rt_hash_table[hash].lock); - - rt_drop(rt); - *rp = rth; - return 0; - } - - rthp = &rth->u.rt_next; - } /* Try to bind route to arp only if it is output route or unicast forwarding path. 
@@ -916,10 +882,7 @@ static inline struct rtable *ip_rt_dst_alloc(unsigned int hash) { - if (atomic_read(&ipv4_dst_ops.entries) > - ipv4_dst_ops.gc_thresh) - __rt_hash_shrink(hash); - + __rt_hash_shrink(hash); return dst_alloc(&ipv4_dst_ops); } @@ -1801,37 +1764,6 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr, u8 tos, struct net_device *dev) { - struct rtable * rth; - unsigned hash; - int iif = dev->ifindex; - - tos &= IPTOS_RT_MASK; - hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos); - - prefetch(&rt_hash_table[hash].chain->fl); - - rcu_read_lock(); - for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { - smp_read_barrier_depends(); - if (rth->fl.fl4_dst == daddr && - rth->fl.fl4_src == saddr && - rth->fl.iif == iif && - rth->fl.oif == 0 && -#ifdef CONFIG_IP_ROUTE_FWMARK - rth->fl.fl4_fwmark == skb->nfmark && -#endif - rth->fl.fl4_tos == tos) { - rth->u.dst.lastuse = jiffies; - dst_hold(&rth->u.dst); - rth->u.dst.__use++; - RT_CACHE_STAT_INC(in_hit); - rcu_read_unlock(); - skb->dst = (struct dst_entry*)rth; - return 0; - } - RT_CACHE_STAT_INC(in_hlist_search); - } - rcu_read_unlock(); /* Multicast recognition logic is moved from route cache to here. Cheers. --ro From fw@deneb.enyo.de Tue Jun 10 11:42:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 11:42:21 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AIgE2x031816 for ; Tue, 10 Jun 2003 11:42:16 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19Po3I-00037m-00; Tue, 10 Jun 2003 20:41:00 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19Po3I-00021N-Co; Tue, 10 Jun 2003 20:41:00 +0200 To: Jamal Hadi Cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> From: Florian Weimer Mail-Followup-To: Jamal Hadi , ralph+d@istop.com, CIT/Paul , 'Simon Kirby' , "'David S. Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Date: Tue, 10 Jun 2003 20:41:00 +0200 In-Reply-To: <20030610061010.Y36963@shell.cyberus.ca> (Jamal Hadi's message of "Tue, 10 Jun 2003 06:53:04 -0400 (EDT)") Message-ID: <87el21wzb7.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3102 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev Jamal Hadi writes: > Typically, real world is less intense than the lab. Ex: no one sends > 100Mbps at 64 byte packet size. Unfortunately, compromised hosts do send such traffic, and DoS victims receive it. 8-( You don't want your core routers to break down just because a couple of the 150,000 hosts in your regional network have been compromised (think of Slammer) or you are running an IRC server. > Have you seen how the big boys advertise? Typical GSR linecards for OC-48 are specified to handle 2 Mpps, but the switch fabric is reportedly somewhat inert and the router might choke before that if there are too many linecards involved (I haven't observed this personally, this is just chatter from someone who works daily with those beasts). A couple of hundred kpps aren't a problem for those routers, though, nor are 300 Mbit (or was it 400?) of Slammer traffic (with random destination addresses).
In general, the forwarding performance is nowadays specified in pps and even flows per second if you look carefully at the data sheets. Most vendors have learnt that people want routers with comforting worst-case behavior. However, you have to read carefully, e.g. a Catalyst 6500 with Supervisor Engine 1 (instead of 2) can only create 650,000 flows per second, even if it has a much, much higher peak IP forwarding rate. (The times of routers which died when confronted with a rapid ICMP sweep across a /16 are gone for good, I hope.) From davem@redhat.com Tue Jun 10 12:01:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 12:01:39 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AJ1Y2x000300 for ; Tue, 10 Jun 2003 12:01:35 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id LAA23912; Tue, 10 Jun 2003 11:58:00 -0700 Date: Tue, 10 Jun 2003 11:57:59 -0700 (PDT) Message-Id: <20030610.115759.26513736.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16102.9418.43884.336925@robur.slu.se> References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3103 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Tue, 10 Jun 2003 20:34:50 +0200 I ripped out the route hash just to test the slow path. Seems like your patch was very good as we see the same performance w/o dst hash ~114 kpps. How did you "rip it out"? Just never look into the routing cache hash and never add entries there? If so, then yes it is an excellent simulation for pure slow path. This is not purely an algorithmic problem. The highest cost thing we do in the slow path of input route processing is source validation. This requires real brains to eliminate. Actually, that's a good idea, if someone is brave, just rip out fib_validate_source (just don't call it, should work for valid traffic) and see what happens :) From garzik@gtf.org Tue Jun 10 12:20:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 12:20:57 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AJKr2x001093 for ; Tue, 10 Jun 2003 12:20:54 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id C63466641; Tue, 10 Jun 2003 15:20:52 -0400 (EDT) Date: Tue, 10 Jun 2003 15:20:52 -0400 From: Jeff Garzik To: "David S.
Miller" Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x Message-ID: <20030610192052.GA31962@gtf.org> References: <20030610171617.GC1959@gtf.org> <20030610.101452.35033262.davem@redhat.com> <20030610172556.GD1959@gtf.org> <20030610.103030.15245573.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610.103030.15245573.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3104 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 10:30:30AM -0700, David S. Miller wrote: > No end user will see the change then, no distribution vendor worth > their salt will ship with MMIO enabled. > > Right now we get only PIO and everybody suffers. Runtime MMIO > selection is a net win for everyone. For 3c59x and select other drivers, I agree 100% If we are making a general rule, I do not agree... 
Jeff From davem@redhat.com Tue Jun 10 12:25:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 12:25:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AJPH2x001469 for ; Tue, 10 Jun 2003 12:25:17 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA24035; Tue, 10 Jun 2003 12:21:27 -0700 Date: Tue, 10 Jun 2003 12:21:26 -0700 (PDT) Message-Id: <20030610.122126.52183571.davem@redhat.com> To: jgarzik@pobox.com Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <20030610192052.GA31962@gtf.org> References: <20030610172556.GD1959@gtf.org> <20030610.103030.15245573.davem@redhat.com> <20030610192052.GA31962@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3105 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 10 Jun 2003 15:20:52 -0400 On Tue, Jun 10, 2003 at 10:30:30AM -0700, David S. Miller wrote: > Right now we get only PIO and everybody suffers. Runtime MMIO > selection is a net win for everyone. For 3c59x and select other drivers, I agree 100% If we are making a general rule, I do not agree... I think we agree then. 
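To illustrate the runtime PIO/MMIO selection discussed in this thread, here is a minimal user-space sketch. It is not the 3c59x code; it only assumes that a driver can pick one register-access table at probe time and fall back to PIO when MMIO is not usable. All names (reg_ops, select_reg_ops, the use_mmio and mmio_bar_ok flags) are hypothetical, and both back ends are simulated with an ordinary array so the example runs outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register-access table; none of these names come from 3c59x.c. */
struct reg_ops {
    uint16_t (*read16)(void *base, unsigned int off);
    void (*write16)(void *base, unsigned int off, uint16_t val);
    const char *name;
};

/*
 * Simulated accessors. A real driver would use port I/O for the PIO table
 * and memory-mapped accessors for the MMIO table; here both operate on a
 * plain array so the sketch can run in user space.
 */
static uint16_t sim_read16(void *base, unsigned int off)
{
    assert(base != NULL);
    return ((uint16_t *)base)[off / 2];
}

static void sim_write16(void *base, unsigned int off, uint16_t val)
{
    assert(base != NULL);
    ((uint16_t *)base)[off / 2] = val;
}

static const struct reg_ops pio_ops = { sim_read16, sim_write16, "pio" };
static const struct reg_ops mmio_ops = { sim_read16, sim_write16, "mmio" };

/*
 * Choose the access method once, at probe time. MMIO is used only when it
 * was requested (assumed here to come from something like a module option)
 * and the device actually exposes a usable memory BAR; otherwise PIO.
 */
static const struct reg_ops *select_reg_ops(int use_mmio, int mmio_bar_ok)
{
    return (use_mmio && mmio_bar_ok) ? &mmio_ops : &pio_ops;
}
```

The rest of such a driver would call ops->read16() and ops->write16() everywhere, so the same binary could serve both cases and the default could stay on PIO.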
From Robert.Olsson@data.slu.se Tue Jun 10 12:54:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 12:54:21 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AJsC2x002567 for ; Tue, 10 Jun 2003 12:54:13 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id VAA02655; Tue, 10 Jun 2003 21:53:33 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16102.14141.524918.556121@robur.slu.se> Date: Tue, 10 Jun 2003 21:53:33 +0200 To: "David S. Miller" Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030610.115759.26513736.davem@redhat.com> References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3106 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > How did you "rip it out"? Just never look into the routing > cache hash and never add entries there? If so, then yes it is > excellent simulation for pure slow path. Look at the patch... hash lookup is bypassed -> always slow path. no lookup in rt_intern_hash but we keep the entry in the hash and ip_rt_dst_alloc is changed to run do __rt_hash_shrink for each call. > This is not purely an algorithmic problem. The highest cost thing we > do in the slow path of input route processing is source validation. > This requires real brains to eliminate. 
> > Actually, that's a good idea, if someone if brave just rip out > fib_validate_source (just don't call it, should work for valid > traffic) and see what happens :) It should be easy to do... Cheers. --ro From shemminger@osdl.org Tue Jun 10 13:35:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 13:35:28 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AKZK2x003983 for ; Tue, 10 Jun 2003 13:35:21 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5AKZ8X06742; Tue, 10 Jun 2003 13:35:08 -0700 Date: Tue, 10 Jun 2003 13:35:08 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] net-sysfs parent ref count Message-Id: <20030610133508.3d0bfffc.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3107 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev When creating network sysfs entries, we grab an extra reference to the parent. Not a big deal now, since it just gets blown away on unregister anyway, but when kobject reference counts are used for release, things break. 
diff -Nru a/net/core/net-sysfs.c b/net/core/net-sysfs.c --- a/net/core/net-sysfs.c Tue Jun 10 11:11:42 2003 +++ b/net/core/net-sysfs.c Tue Jun 10 11:11:42 2003 @@ -299,17 +299,11 @@ goto out_unreg; } - net->stats_kobj.parent = NULL; if (net->get_stats) { struct kobject *k = &net->stats_kobj; - k->parent = kobject_get(&class_dev->kobj); - if (!k->parent) { - ret = -EBUSY; - goto out_unreg; - } - + k->parent = &class_dev->kobj; strlcpy(k->name, "statistics", KOBJ_NAME_LEN); k->ktype = &netstat_ktype; From xerox@foonet.net Tue Jun 10 13:37:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 13:38:06 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AKbv2x004320 for ; Tue, 10 Jun 2003 13:37:58 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h5AKbgeq025158; Tue, 10 Jun 2003 16:37:42 -0400 From: "CIT/Paul" To: "'Jamal Hadi'" , "'Simon Kirby'" Cc: , "'David S. Miller'" , , , Subject: RE: Route cache performance tests Date: Tue, 10 Jun 2003 16:36:43 -0400 Organization: CIT Message-ID: <00e001c32f90$016072c0$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030610071638.R37090@shell.cyberus.ca> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 3108 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev I'd be happy to set up a repository ftp site or maybe even some cvs servers so all of us can test All these things and share data. 
We are an ISP so it wouldn't be too hard to just pop up another server to store all this :> Let me know Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: Jamal Hadi [mailto:hadi@shell.cyberus.ca] Sent: Tuesday, June 10, 2003 7:23 AM To: Simon Kirby Cc: ralph+d@istop.com; CIT/Paul; 'David S. Miller'; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance tests On Tue, 10 Jun 2003, Simon Kirby wrote: [some good stuff deleted] Simon, I haven't looked at your data in detail; I will. Someone like Robert would be able to snuff it much faster than I can. I just wanna say thanks for the effort, I will spend time catching up with you folks. It is clear that our next hurdle is gc. Do you have profiles for your data? Profiles would be nice to collect as well. > In any case, setting gc_min_interval to 0 definitely helped, but I > suspect Dave's patches will make a bigger difference. Next up is > 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday. > Also, since you are doing all that work, post the kernels somewhere so people like foo can grab them and test as well. cheers, jamal From shemminger@osdl.org Tue Jun 10 13:39:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 13:39:12 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AKd52x004732 for ; Tue, 10 Jun 2003 13:39:06 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5AKcsX07904; Tue, 10 Jun 2003 13:38:54 -0700 Date: Tue, 10 Jun 2003 13:38:54 -0700 From: Stephen Hemminger To: "David S.
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] Cleanup net-sysfs show and change functions Message-Id: <20030610133854.42713231.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3109 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This cleans up the network sysfs code to use helper functions to unify the show/change functions, by using common code in functions rather than template macros. The function always checks for dead devices, so I/O will fail. diff -Nru a/net/core/net-sysfs.c b/net/core/net-sysfs.c --- a/net/core/net-sysfs.c Tue Jun 10 13:32:42 2003 +++ b/net/core/net-sysfs.c Tue Jun 10 13:32:42 2003 @@ -3,10 +3,6 @@ * * Copyright (c) 2003 Stephen Hemminber * - * - * TODO: - * last_tx - * last_rx */ #include @@ -16,33 +12,61 @@ #include #include -#define to_net_dev(class) container_of((class), struct net_device, class_dev) +#define to_class_dev(obj) container_of(obj,struct class_device,kobj) +#define to_net_dev(class) container_of(class, struct net_device, class_dev) -/* generate a show function for simple field */ +/* use same locking rules as GIF* ioctl's */ +static ssize_t netdev_show(const struct class_device *cd, char *buf, + ssize_t (*format)(const struct net_device *, char *)) +{ + struct net_device *net = to_net_dev(cd); + ssize_t ret = -EINVAL; + + read_lock(&dev_base_lock); + if (!net->deadbeaf) + ret = (*format)(net, buf); + read_unlock(&dev_base_lock); + + return ret; +} + +/* generate a show function for simple field */ #define NETDEVICE_SHOW(field, format_string) \ -static 
ssize_t show_##field(struct class_device *dev, char *buf) \ +static ssize_t format_##field(const struct net_device *net, char *buf) \ { \ - return sprintf(buf, format_string, to_net_dev(dev)->field); \ + return sprintf(buf, format_string, net->field); \ +} \ +static ssize_t show_##field(struct class_device *cd, char *buf) \ +{ \ + return netdev_show(cd, buf, format_##field); \ } -/* generate a store function for a field with locking */ -#define NETDEVICE_STORE(field) \ -static ssize_t \ -store_##field(struct class_device *dev, const char *buf, size_t len) \ -{ \ - char *endp; \ - long new = simple_strtol(buf, &endp, 16); \ - \ - if (endp == buf || new < 0) \ - return -EINVAL; \ - \ - if (!capable(CAP_NET_ADMIN)) \ - return -EPERM; \ - \ - rtnl_lock(); \ - to_net_dev(dev)->field = new; \ - rtnl_unlock(); \ - return len; \ + +/* use same locking and permission rules as SIF* ioctl's */ +static ssize_t netdev_store(struct class_device *dev, + const char *buf, size_t len, + int (*set)(struct net_device *, unsigned long)) +{ + struct net_device *net = to_net_dev(dev); + char *endp; + unsigned long new; + int ret = -EINVAL; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + new = simple_strtoul(buf, &endp, 0); + if (endp == buf) + goto err; + + rtnl_lock(); + if (!net->deadbeaf) { + if ((ret = (*set)(net, new)) == 0) + ret = len; + } + rtnl_unlock(); + err: + return ret; } /* generate a read-only network device class attribute */ @@ -56,6 +80,7 @@ NETDEVICE_ATTR(features, "%#x\n"); NETDEVICE_ATTR(type, "%d\n"); +/* use same locking rules as GIFHWADDR ioctl's */ static ssize_t format_addr(char *buf, const unsigned char *addr, int len) { int i; @@ -72,12 +97,16 @@ static ssize_t show_address(struct class_device *dev, char *buf) { struct net_device *net = to_net_dev(dev); + if (net->deadbeaf) + return -EINVAL; return format_addr(buf, net->dev_addr, net->addr_len); } static ssize_t show_broadcast(struct class_device *dev, char *buf) { struct net_device *net = 
to_net_dev(dev); + if (net->deadbeaf) + return -EINVAL; return format_addr(buf, net->broadcast, net->addr_len); } @@ -87,54 +116,45 @@ /* read-write attributes */ NETDEVICE_SHOW(mtu, "%d\n"); -static ssize_t store_mtu(struct class_device *dev, const char *buf, size_t len) +static int change_mtu(struct net_device *net, unsigned long new_mtu) { - char *endp; - int new_mtu; - int err; - - new_mtu = simple_strtoul(buf, &endp, 10); - if (endp == buf) - return -EINVAL; - - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - - rtnl_lock(); - err = dev_set_mtu(to_net_dev(dev), new_mtu); - rtnl_unlock(); + return dev_set_mtu(net, (int) new_mtu); +} - return err == 0 ? len : err; +static ssize_t store_mtu(struct class_device *dev, const char *buf, size_t len) +{ + return netdev_store(dev, buf, len, change_mtu); } static CLASS_DEVICE_ATTR(mtu, S_IRUGO | S_IWUSR, show_mtu, store_mtu); NETDEVICE_SHOW(flags, "%#x\n"); +static int change_flags(struct net_device *net, unsigned long new_flags) +{ + return dev_change_flags(net, (unsigned) new_flags); +} + static ssize_t store_flags(struct class_device *dev, const char *buf, size_t len) { - unsigned long new_flags; - char *endp; - int err = 0; + return netdev_store(dev, buf, len, change_flags); +} - new_flags = simple_strtoul(buf, &endp, 16); - if (endp == buf) - return -EINVAL; +static CLASS_DEVICE_ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags); - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - - rtnl_lock(); - err = dev_change_flags(to_net_dev(dev), new_flags); - rtnl_unlock(); +NETDEVICE_SHOW(tx_queue_len, "%lu\n"); - return err ? 
err : len; +static int change_tx_queue_len(struct net_device *net, unsigned long new_len) +{ + net->tx_queue_len = new_len; + return 0; } -static CLASS_DEVICE_ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags); +static ssize_t store_tx_queue_len(struct class_device *dev, const char *buf, size_t len) +{ + return netdev_store(dev, buf,len, change_tx_queue_len); +} -NETDEVICE_SHOW(tx_queue_len, "%lu\n"); -NETDEVICE_STORE(tx_queue_len); static CLASS_DEVICE_ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len, store_tx_queue_len); @@ -237,16 +257,17 @@ { struct netstat_fs_entry *entry = container_of(attr, struct netstat_fs_entry, attr); - struct class_device *class_dev - = container_of(kobj->parent, struct class_device, kobj); struct net_device *dev - = to_net_dev(class_dev); - struct net_device_stats *stats - = dev->get_stats ? dev->get_stats(dev) : NULL; - - if (stats && entry->show) - return entry->show(stats, buf); - return -EINVAL; + = to_net_dev(to_class_dev(kobj->parent)); + struct net_device_stats *stats; + ssize_t ret = -EINVAL; + + read_lock(&dev_base_lock); + if (!dev->deadbeaf && entry->show && dev->get_stats && + (stats = (*dev->get_stats)(dev))) + ret = entry->show(stats, buf); + read_unlock(&dev_base_lock); + return ret; } static struct sysfs_ops netstat_sysfs_ops = { From xerox@foonet.net Tue Jun 10 14:39:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 14:39:25 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ALdH2x006829 for ; Tue, 10 Jun 2003 14:39:18 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h5ALbDeq020744; Tue, 10 Jun 2003 17:37:13 -0400 From: "CIT/Paul" To: "'David S. 
Miller'" , Cc: , , , , , , Subject: RE: Route cache performance under stress Date: Tue, 10 Jun 2003 17:36:14 -0400 Organization: CIT Message-ID: <012301c32f98$52043dd0$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030610.115759.26513736.davem@redhat.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 3110 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev Why do you need source validation if we are going to use it for a core router :) Is there anything else in there that may or may not be necessary depending on the circumstances that we are using the router for? Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: David S. Miller [mailto:davem@redhat.com] Sent: Tuesday, June 10, 2003 2:58 PM To: Robert.Olsson@data.slu.se Cc: ralph+d@istop.com; ralph@istop.com; hadi@shell.cyberus.ca; xerox@foonet.net; sim@netnation.com; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: Robert Olsson Date: Tue, 10 Jun 2003 20:34:50 +0200 I ripped out the route hash just to test the slow path. Seems like your patch was very good as we see the same performance w/o dst hash ~114 kpps. How did you "rip it out"? Just never look into the routing cache hash and never add entries there? If so, then yes it is excellent simulation for pure slow path. This is not purely an algorithmic problem. The highest cost thing we do in the slow path of input route processing is source validation. This requires real brains to eliminate. 
Actually, that's a good idea, if someone if brave just rip out fib_validate_source (just don't call it, should work for valid traffic) and see what happens :) From ralph@istop.com Tue Jun 10 14:39:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 14:39:46 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ALdd2x006868 for ; Tue, 10 Jun 2003 14:39:42 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 8917436AD4; Tue, 10 Jun 2003 17:39:38 -0400 (EDT) Date: Tue, 10 Jun 2003 17:39:44 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S. Miller" Cc: "Robert.Olsson@data.slu.se" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.115759.26513736.davem@redhat.com> Message-ID: References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3111 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > Actually, that's a good idea, if someone if brave just rip out > fib_validate_source (just don't call it, should work for valid > traffic) and see what happens :) Looking at Simon's profile numbers, these seem to contribute a lot more than fib_validate_source: 1237 ipt_route_hook 19.3281 3120 do_gettimeofday 21.6667 8299 ip_packet_match 24.6994 8031 fib_lookup 25.0969 1877 fib_rule_put 29.3281 What's the do_gettimeofday for? 
Is that just a bogus one that shows up for kernel profiling (I can recall using the profiling tool quantify had a similar problem of showing gettimeofday calls that it was doing on its own). I traced back the fib_lookup to ip_forward_finish, which seems to only call ip_forward_options when there's IP options (go figure!), which would make sense for a SYN packet (juno). What I can't think of is why we'd want to have special routing considerations for TCP SYN packets (and other IP packets with options set). -Ralph From davem@redhat.com Tue Jun 10 15:23:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 15:23:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AMNn2x008501 for ; Tue, 10 Jun 2003 15:23:50 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA24614; Tue, 10 Jun 2003 15:20:21 -0700 Date: Tue, 10 Jun 2003 15:20:20 -0700 (PDT) Message-Id: <20030610.152020.59678979.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3112 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 17:39:44 -0400 (EDT) Looking at Simon's profile numbers, these seem to contribute a lot more than fib_validate_source: Ignore all the fib_rule*() and associated overhead; Simon is going to turn off policy routing support so that stuff drops out of the profiles. The fib_lookup() will decrease significantly from the profiles if fib_validate_source() is deleted, and this is what I want confirmed from such an experiment.
1237 ipt_route_hook 19.3281
3120 do_gettimeofday 21.6667
8299 ip_packet_match 24.6994
8031 fib_lookup 25.0969
1877 fib_rule_put 29.3281
What's the do_gettimeofday for? Every packet records a timestamp. I traced back the fib_lookup to ip_forward_finish, which seems to only call ip_forward_options when there's IP options (go figure!), which would make sense for a SYN packet (juno). You're thinking TCP options, not IP options. What I can't think of is why we'd want to have special routing considerations for TCP SYN packets (and other IP packets with options set). ip_forward_options() has to update things like record-route IP options, which record hop-by-hop information for diagnostic tools like traceroute.
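The IP options versus TCP options distinction made in the reply above can be sketched in user-space C. This is only an illustration, not kernel code: the helper names (ipv4_header_len, ipv4_has_options, tcp_header_len, tcp_has_options) are hypothetical and operate on raw header bytes. A typical SYN carries TCP options (for example MSS) in a TCP header longer than 20 bytes, while its IPv4 header is still the plain 20 bytes with no IP options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; none of these exist in the kernel. */

/* IPv4 header length in bytes: the low nibble of byte 0 (IHL) counts 32-bit words. */
size_t ipv4_header_len(const unsigned char *ip)
{
    assert(ip != NULL);
    return (size_t)(ip[0] & 0x0f) * 4;
}

/* IP options are present only when the IPv4 header exceeds the fixed 20 bytes. */
int ipv4_has_options(const unsigned char *ip)
{
    return ipv4_header_len(ip) > 20;
}

/* TCP header length in bytes: the high nibble of byte 12 (data offset) counts 32-bit words. */
size_t tcp_header_len(const unsigned char *tcp)
{
    assert(tcp != NULL);
    return (size_t)(tcp[12] >> 4) * 4;
}

/* TCP options are present only when the TCP header exceeds the fixed 20 bytes. */
int tcp_has_options(const unsigned char *tcp)
{
    return tcp_header_len(tcp) > 20;
}
```

With an IPv4 first byte of 0x45 and a TCP data-offset byte of 0x60, ipv4_has_options() reports no IP options while tcp_has_options() reports TCP options, which is the SYN case being discussed.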
From jgarzik@pobox.com Tue Jun 10 15:59:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 15:59:36 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AMxS2x009785 for ; Tue, 10 Jun 2003 15:59:29 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19PQ9H-0005JG-Jw; Mon, 09 Jun 2003 18:09:35 +0100 Message-ID: <3EE4BF39.2020503@pobox.com> Date: Mon, 09 Jun 2003 13:09:13 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup References: <20030607191522.GB3346@gtf.org> <20030607.235825.71096085.davem@redhat.com> <3EE4045D.4040002@pobox.com> <20030608.225309.39172149.davem@redhat.com> In-Reply-To: <20030608.225309.39172149.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3113 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Jeff Garzik > Date: Sun, 08 Jun 2003 23:51:57 -0400 > > David S. Miller wrote: > > Have you extracted out all the init_etherdev() killings Al and > > myself did so you can backport them to 2.4.x too? > > That's the plan, yes. > > That's your plan, but did you do any of this yet? It'll keep > going deeper and deeper into bitkeeper history the longer that > you wait :-) Yes, I have been following my plan. You will see when Marcelo opens 2.4.22-pre1 that I have been committing these to my net-drivers-2.4 queue. 
Jeff From toml@us.ibm.com Tue Jun 10 16:32:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 16:33:02 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ANWo2x010710 for ; Tue, 10 Jun 2003 16:32:58 -0700 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5ANWBsZ085616; Tue, 10 Jun 2003 19:32:11 -0400 Received: from d01ml072.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5ANW8nJ204774; Tue, 10 Jun 2003 19:32:09 -0400 Subject: IPSec: Policy dst bundles exhausting storage To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Tue, 10 Jun 2003 18:32:00 -0500 X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/10/2003 07:32:08 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3114 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev I've discovered a bug in IPv6 policy bundle creation/searching (xfrm6_policy.c: __xfrm6_bundle_create and __xfrm6_find_bundle) during some stress testing using udp (it happens with tcp also) in tunnel mode (it happens in transport also). Every time a udp packet is sent a new dst bundle is created and chained to the policy. Eventually after enough packets are sent, the dst_alloc fails and no more packets can be sent. In IPv4, the first bundle that is created is used repeatedly as it should be. 
In the __xfrm6_find_bundle function, the xdst->u.rt6.rt6i_src.addr appears to not have been set correctly (it has a value of 0000:0000:0000:0000:0000:0001:0000:0000) and never matches the fl->fl6_src value and so a match is never found causing the creation of a new bundle. It would appear that some values aren't being set, or set correctly, during the __xfrm6_bundle_create function. One other thing I did notice in both the v4 and v6 bundle create functions is the line x->u.rt.fl = *fl. Shouldn't this be a memcpy? Thanks, Tom From davem@redhat.com Tue Jun 10 16:36:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 16:36:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ANa62x011055 for ; Tue, 10 Jun 2003 16:36:06 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA24888; Tue, 10 Jun 2003 16:32:50 -0700 Date: Tue, 10 Jun 2003 16:32:50 -0700 (PDT) Message-Id: <20030610.163250.70197099.davem@redhat.com> To: toml@us.ibm.com Cc: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3115 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Tom Lendacky" Date: Tue, 10 Jun 2003 18:32:00 -0500 One other thing I did notice in both the v4 and v6 bundle create functions is the line x->u.rt.fl = *fl. Shouldn't this be a memcpy? Gcc emits a memcpy(); check the assembly. Structure assignment is a perfectly legal way to do this.
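The point about structure assignment can be checked with a small user-space sketch. The struct flow_key type and the three helper functions below are hypothetical stand-ins invented for this illustration (the real kernel structure is struct flowi, whose layout is not reproduced here). The sketch only shows that, in standard C, assigning one structure to another copies every member, just as a memcpy() of sizeof(struct) bytes would.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a flow key; field names are illustrative only. */
struct flow_key {
    unsigned int src[4];
    unsigned int dst[4];
    int oif;
};

/* Copy by plain structure assignment, in the style of "x->u.rt.fl = *fl". */
void copy_by_assignment(struct flow_key *to, const struct flow_key *from)
{
    assert(to != NULL && from != NULL);
    *to = *from;
}

/* Copy by an explicit memcpy() of the whole structure. */
void copy_by_memcpy(struct flow_key *to, const struct flow_key *from)
{
    assert(to != NULL && from != NULL);
    memcpy(to, from, sizeof(*to));
}

/* Returns 1 when both copy styles yield identical member values, 0 otherwise. */
int copies_match(const struct flow_key *from)
{
    struct flow_key a, b;

    copy_by_assignment(&a, from);
    copy_by_memcpy(&b, from);
    return memcmp(a.src, b.src, sizeof(a.src)) == 0 &&
           memcmp(a.dst, b.dst, sizeof(a.dst)) == 0 &&
           a.oif == b.oif;
}
```

The members are compared one by one rather than with a single memcmp() over the whole structure, because structure assignment is not required to preserve any padding bytes.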
From davem@redhat.com Tue Jun 10 17:01:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 17:01:27 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B01N2x011675 for ; Tue, 10 Jun 2003 17:01:23 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA25086; Tue, 10 Jun 2003 16:57:59 -0700 Date: Tue, 10 Jun 2003 16:57:59 -0700 (PDT) Message-Id: <20030610.165759.78731321.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610.152020.59678979.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3116 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 19:58:47 -0400 (EDT) On Tue, 10 Jun 2003, David S. Miller wrote: > Every packet records a timestamp. I'm not aware of anything in IP routing that requires a timestamp for every packet. To me it sounds like we could rip that out too. Guess you never run tcpdump nor use packet schedulers. 
From ralph@istop.com Tue Jun 10 17:32:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 17:32:25 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B0W92x014517 for ; Tue, 10 Jun 2003 17:32:10 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id E242036F4E; Tue, 10 Jun 2003 19:58:40 -0400 (EDT) Date: Tue, 10 Jun 2003 19:58:47 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S. Miller" Cc: "Robert.Olsson@data.slu.se" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.152020.59678979.davem@redhat.com> Message-ID: References: <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> <20030610.152020.59678979.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3117 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > From: Ralph Doncaster > What's the do_gettimeofday for? > > Every packet records a timestamp. I'm not aware of anything in IP routing that requires a timestamp for every packet. To me it sounds like we could rip that out too. 
-Ralph From greearb@candelatech.com Tue Jun 10 17:52:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 17:52:47 -0700 (PDT) Received: from grok.yi.org (IDENT:3f8lzfBXBxY932h5VsIWVLN0F7AmAJMz@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B0qb2x015582 for ; Tue, 10 Jun 2003 17:52:38 -0700 Received: from candelatech.com (IDENT:1OZhvqSaxkU2HXzZZfuwDtlha7J3KNaq@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5B0pvc11184; Tue, 10 Jun 2003 17:51:57 -0700 Message-ID: <3EE67D2D.80608@candelatech.com> Date: Tue, 10 Jun 2003 17:51:57 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: ralph+d@istop.com CC: "David S. Miller" , "Robert.Olsson@data.slu.se" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> <20030610.152020.59678979.davem@redhat.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3118 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Ralph Doncaster wrote: > On Tue, 10 Jun 2003, David S. Miller wrote: > > >> From: Ralph Doncaster >> What's the do_gettimeofday for? >> >>Every packet records a timestamp. > > > I'm not aware of anything in IP routing that requires a timestamp for > every packet. To me it sounds like we could rip that out too. > > -Ralph > Maybe as a configurable option, since it would make tcpdump less useful. 
Seems like we could kludge it up so that we used the TSC (or whatever that really fast hardware clock is) to provide some relative stamp that could be converted to a time_val later? It does seem a bit wasteful to do the gettimeofday when most of the time the result is ignored. (Or, are there things other than tcpdump that need the gettimeofday stamp?) Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From davem@redhat.com Tue Jun 10 18:02:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:02:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B12I2x016313 for ; Tue, 10 Jun 2003 18:02:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA25353; Tue, 10 Jun 2003 17:58:55 -0700 Date: Tue, 10 Jun 2003 17:58:54 -0700 (PDT) Message-Id: <20030610.175854.41658344.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610.165759.78731321.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3120 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 20:41:13 -0400 (EDT) This sounded so unbelievable to me that I took a quick look at the code to see what I'd have to do to get rid of it. 
It appears that gettimeofday is not called for every packet; just for ICMP timestamp requests and for IP options (ip_options_build and ip_options_compile). Stop looking in the IP code. Look at where we get the packet from the device, which is one layer up. From davem@redhat.com Tue Jun 10 18:01:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:01:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B11q2x016195 for ; Tue, 10 Jun 2003 18:01:52 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA25345; Tue, 10 Jun 2003 17:58:13 -0700 Date: Tue, 10 Jun 2003 17:58:13 -0700 (PDT) Message-Id: <20030610.175813.74726237.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610.165759.78731321.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3119 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 20:41:13 -0400 (EDT) On Tue, 10 Jun 2003, David S. Miller wrote: > Guess you never run tcpdump nor use packet schedulers. So because some (in the case of a core router almost none) of the packets will need a timestamp, you do it for every single one of them? In order to be accurate, we must obtain the timestamp exactly when we receive the packet.
But until we know that the packet is for us or not (which requires a route lookup), we don't know if we actually need the timestamp or not. This is not some arbitrary thing, this is how you have to implement this. It's not like we said "screw everyone, let's get a timestamp all the time whether we need it or not." :-) From davem@redhat.com Tue Jun 10 18:04:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:04:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B14l2x016932 for ; Tue, 10 Jun 2003 18:04:47 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA25384; Tue, 10 Jun 2003 18:01:20 -0700 Date: Tue, 10 Jun 2003 18:01:20 -0700 (PDT) Message-Id: <20030610.180120.71112140.davem@redhat.com> To: greearb@candelatech.com Cc: ralph+d@istop.com, Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <3EE67D2D.80608@candelatech.com> References: <20030610.152020.59678979.davem@redhat.com> <3EE67D2D.80608@candelatech.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3121 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Tue, 10 Jun 2003 17:51:57 -0700 Maybe as a configurable option, since it would make tcpdump less useful. Seems like we could kludge it up so that we used the TSC (or whatever that really fast hardware clock is) to provide some relative stamp that could be converted to a time_val later? 
I have a strange feeling that Ralph's system isn't using TSC and that's why it shows up so high on the profiles :-) TSC do_gettimeofday() is REALLY cheap (TSC read plus a multiply which x86 does in like 5 cycles). Yes, this idea has been tossed around before. But what's funny is that on the bigger boxes, you don't use TSC because amongst the different nodes of the machine they are skewed, so you have to use the ACPI timer or something like that for timestamping. It does seem a bit wasteful to do the gettimeofday when most of the time the result is ignored. (Or, are there things other than tcpdump that need the gettimeofday stamp?) SO_RECVSTAMP, any socket on the machine can ask for this. From greearb@candelatech.com Tue Jun 10 18:15:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:15:47 -0700 (PDT) Received: from grok.yi.org (IDENT:u0jJfD8262ExsIRxrD0jns+wz07PoQDe@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1Fd2x017626 for ; Tue, 10 Jun 2003 18:15:40 -0700 Received: from candelatech.com (IDENT:MgCsb0s7fgJ0gvtiYZpquFQbEB5dMrtz@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5B1Fac14270; Tue, 10 Jun 2003 18:15:37 -0700 Message-ID: <3EE682B8.8060708@candelatech.com> Date: Tue, 10 Jun 2003 18:15:36 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" , "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress References: <20030610.152020.59678979.davem@redhat.com> <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> In-Reply-To: <20030610.180120.71112140.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3122 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Ben Greear > Date: Tue, 10 Jun 2003 17:51:57 -0700 > > Maybe as a configurable option, since it would make tcpdump less useful. > Seems like we could kludge it up so that we used the TSC (or whatever that > really fast hardware clock is) to provide some relative stamp that could be > converted to a time_val later? > > I have a strange feeling that Ralph's system isn't using > TSC and that's why it shows up so high on the profiles :-) > TSC do_gettimeofday() is REALLY cheap (TSC read plus a multiply which > x86 does in like 5 cycles). > > Yes, this idea has been tossed around before. But what's funny > is that on the bigger boxes, you don't use TSC because amongst > the different nodes of the machine they are skewed, so you have > to use the ACPI timer or something like that for timestamping. What determines whether or not we use the "TSC do_gettimeofday". Does it automagically happen when you compile for P-III or something like that? And how big of a "bigger box" are you talking about...regular old SMP, or NUMA? > > It does seem a bit wasteful to do the gettimeofday when most of the time > the result is ignored. > > (Or, are there things other than tcpdump that need the gettimeofday stamp?) > > SO_RECVSTAMP, any socket on the machine can ask for this. Do we know when we are being asked for this value? Ie, could we do the TSC/ACPI timer -> time_val conversion here? 
If TSC means cheap gettimeofday anyway, then my last question is moot. Thanks, Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From ralph@istop.com Tue Jun 10 18:22:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:22:23 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1M92x019781 for ; Tue, 10 Jun 2003 18:22:10 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id A382D36A08; Tue, 10 Jun 2003 20:41:07 -0400 (EDT) Date: Tue, 10 Jun 2003 20:41:13 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S. Miller" Cc: "Robert.Olsson@data.slu.se" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.165759.78731321.davem@redhat.com> Message-ID: References: <20030610.152020.59678979.davem@redhat.com> <20030610.165759.78731321.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3123 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > From: Ralph Doncaster > Date: Tue, 10 Jun 2003 19:58:47 -0400 (EDT) > > On Tue, 10 Jun 2003, David S. Miller wrote: > > > Every packet records a timestamp. > > I'm not aware of anything in IP routing that requires a timestamp for > every packet. To me it sounds like we could rip that out too. > > Guess you never run tcpdump nor use packet schedulers. So because some (in the case of a core router almost none) of the packets will need a timestamp, you do it for every single one of them? 
This sounded so unbelievable to me that I took a quick look at the code to see what I'd have to do to get rid of it. It appears that gettimeofday is not called for every packet; just for ICMP timestamp requests and for IP options (ip_options_build and ip_options_compile). -Ralph From davem@redhat.com Tue Jun 10 18:25:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:25:54 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1Po2x020158 for ; Tue, 10 Jun 2003 18:25:50 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA25508; Tue, 10 Jun 2003 18:22:34 -0700 Date: Tue, 10 Jun 2003 18:22:34 -0700 (PDT) Message-Id: <20030610.182234.74725315.davem@redhat.com> To: greearb@candelatech.com Cc: netdev@oss.sgi.com Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <3EE682B8.8060708@candelatech.com> References: <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> <3EE682B8.8060708@candelatech.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3124 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Tue, 10 Jun 2003 18:15:36 -0700 What determines whether or not we use the "TSC do_gettimeofday". Does it automagically happen when you compile for P-III or something like that? The 2.5.x kernel has x86 platform drivers that decide this. And how big of a "bigger box" are you talking about...regular old SMP, or NUMA? Many laptops cannot even use TSC reliably because of power management etc. issues. 
> SO_RECVSTAMP, any socket on the machine can ask for this. Do we know when we are being asked for this value? We have to take the timestamp at netif_receive_skb() for it to be accurate. We don't even know if this packet is for this host until a long time later, let alone whether any local sockets want SO_RECVSTAMP or whether any IP options want timestamp or whether tcpdump is listening etc. From davem@redhat.com Tue Jun 10 18:27:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:27:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1R42x020446 for ; Tue, 10 Jun 2003 18:27:04 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA25515; Tue, 10 Jun 2003 18:23:39 -0700 Date: Tue, 10 Jun 2003 18:23:38 -0700 (PDT) Message-Id: <20030610.182338.41657455.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: greearb@candelatech.com, Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3125 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 21:17:28 -0400 (EDT) Aren't the read_lock_irqsave and restore expensive? If x86 has an inefficient implementation, well... 
:-) This can be done without locks, nobody has done the x86 implementation of that that's all. I think the x86_64 folks did a lockless version, I know I did for sparc64 :) From greearb@candelatech.com Tue Jun 10 18:51:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:51:32 -0700 (PDT) Received: from grok.yi.org (IDENT:HcdCPHlohuTuV37SdGVSK2851Y4Nv96h@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1pL2x028876 for ; Tue, 10 Jun 2003 18:51:22 -0700 Received: from candelatech.com (IDENT:1YjRRHrrzAagSsIXQA2WlEhdzIYs/Bp4@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5B1pIc18932; Tue, 10 Jun 2003 18:51:18 -0700 Message-ID: <3EE68B15.60802@candelatech.com> Date: Tue, 10 Jun 2003 18:51:17 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: Route cache performance under stress References: <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> <3EE682B8.8060708@candelatech.com> <20030610.182234.74725315.davem@redhat.com> In-Reply-To: <20030610.182234.74725315.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3126 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > Do we know when we are being asked for this value? > > We have to take the timestamp at netif_receive_skb() for it to > be accurate. > > We don't even know if this packet is for this host until a long > time later, let alone whether any local sockets want SO_RECVSTAMP > or whether any IP options want timestamp or whether tcpdump is > listening etc. 
Yes, I understand why we want a time-stamp very early...but if we can get _some_ sort of time stamp very cheap (TSC, for example) then we can potentially defer the more expensive conversion of this stamp into the equivalent of whatever do_gettimeofday will give us. We could set an 'is-timestamp-converted-already' flag on the skb and have a macro that gets the timestamp. This macro can do the conversion as needed and return the value to calling code. Platforms that do not support TSC or its equivalent can just use gettimeofday for the original stamp and set the flag. -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From ralph@istop.com Tue Jun 10 18:55:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 18:55:48 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B1tX2x029313 for ; Tue, 10 Jun 2003 18:55:34 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 9B67636E4D; Tue, 10 Jun 2003 21:17:22 -0400 (EDT) Date: Tue, 10 Jun 2003 21:17:28 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "David S.
Miller" Cc: "greearb@candelatech.com" , "Robert.Olsson@data.slu.se" , "hadi@shell.cyberus.ca" , "xerox@foonet.net" , "sim@netnation.com" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610.180120.71112140.davem@redhat.com> Message-ID: References: <20030610.152020.59678979.davem@redhat.com> <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3127 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > TSC do_gettimeofday() is REALLY cheap (TSC read plus a multiply which > x86 does in like 5 cycles). Aren't the read_lock_irqsave and restore expensive? read_lock_irqsave(&xtime_lock, flags); usec = do_gettimeoffset(); { unsigned long lost = jiffies - wall_jiffies; if (lost) usec += lost * (1000000 / HZ); } sec = xtime.tv_sec; usec += xtime.tv_usec; read_unlock_irqrestore(&xtime_lock, flags); From davem@redhat.com Tue Jun 10 20:36:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 20:36:47 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B3ag2x031214 for ; Tue, 10 Jun 2003 20:36:42 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25877; Tue, 10 Jun 2003 20:33:25 -0700 Date: Tue, 10 Jun 2003 20:33:25 -0700 (PDT) Message-Id: <20030610.203325.41658167.davem@redhat.com> To: greearb@candelatech.com Cc: netdev@oss.sgi.com Subject: Re: Route cache performance under stress From: "David S. 
Miller" In-Reply-To: <3EE68B15.60802@candelatech.com> References: <3EE682B8.8060708@candelatech.com> <20030610.182234.74725315.davem@redhat.com> <3EE68B15.60802@candelatech.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3128 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Yes, I understand why we want a time-stamp very early...but if we can get _some_ sort of time stamp very cheap (TSC, for example) then we can potentially defer the more expensive conversion of this stamp into the equivalent of whatever do_gettimeofday will give us. I fully understand your idea, I've talked about it with Alexey many times. Someone just has to implement it. pkt_sched.h is probably the place to play, maybe make an asm/pkt_sched.h header. From hadi@shell.cyberus.ca Wed Jun 11 00:10:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 00:10:48 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B7AE2x005835 for ; Wed, 11 Jun 2003 00:10:15 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PXjK-0009KH-KW; Mon, 09 Jun 2003 21:15:18 -0400 Date: Mon, 9 Jun 2003 21:15:18 -0400 (EDT) From: Jamal Hadi To: ralph+d@istop.com cc: CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030609204257.L35799@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3129 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Ralph Doncaster wrote: > From personal experience, after trying numerous things for over a year one > can get very frustrated. Although your contribution has been useful, you > are also guilty of wildly waving your hands around too. Many moons ago > when I lamented that my 2.2.19 kernel, 750Mhz duron, 3c59x core router > performance sucked you told me NAPI would solve the performance problems. > It didn't. And Rob's latest numbers seem to show that even with the > latest and greatest patches 148kpps is still a dream. It's good to see > that people are finally doing tests to simulate real-world routing > (instead of just pretending the problem doesn't exist because they were > able to get 148kpps in some contrived test). > I am not sure that foo's tests are not contrived ;-> The man just hammers away at his routers with DoS tools;-> I feel like a shrink calming him down to stop doing that. hehe. I am actually not against using the DoS tools because they test the worst case. However, to solve a problem you need first to isolate it and methodically squash the cockroaches. For example, in 2.2.x you wouldn't even see the problems that we have today because we had bigger problems, namely interrupt issues. NAPI resolves that. When i told you that, i was basing it on facts. We are now exposed to dst cache problems. Dave's patches isolate and resolve what's causing all this noise. 
First it was the cache distribution, which is now resolved. Next it is garbage collection, which it seems to me is being resolved. When someone working so hard like Dave is putting out these fires we need to help him. If he tells foo to run a specific test then that's what he should run ... I don't think we should just add Cisco's CEF just because someone thinks it works better. We need to systematically isolate and fix. For example, just turning on netfilter is polluting the results. Problem is people disappear real quick when asked to run tests that could validate certain concepts. I wish everyone would emulate S Kirby; he actually gives good info. > Here's my CPU graphs for the box; it's only doing routing and firewalling > isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) > http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html > > eth1 and eth2 are both sending and receiving ~30mbps of traffic (at > 8-10kpps in and out on each interface). > Is this still the duron 750Mhz? Are you running zebra? Did you check out some of the ideas i talked about earlier? > The other variable that I haven't seen people discuss but have anecdotal > evidence will measurably impact performance is the motherboard used > (chipset and chipset configuration/timing). > Robert has a good collection for what is good hardware. I am so outdated i don't keep track anymore. My fastest machine is still an ASUS dual 450Mhz. > Lastly from the software side Linux doesn't seem to have anything like > BSD's parameter to control user/system CPU sharing. Once my CPU load > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > 100% and end up with my BGP sessions drop. > Well, here's a good example: With NAPI, have your sessions been dropped? Have you tried a different NIC? Not sure how well the 3com is maintained for example. Try a tulip or tg3 or e1000 or the dlink gige. 
cheers, jamal From ak@suse.de Wed Jun 11 00:25:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 00:25:38 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B7PR2x006575 for ; Wed, 11 Jun 2003 00:25:28 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 7E3281476E; Wed, 11 Jun 2003 09:25:22 +0200 (MEST) Date: Wed, 11 Jun 2003 09:25:19 +0200 From: Andi Kleen To: "David S. Miller" Cc: greearb@candelatech.com, ralph+d@istop.com, Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030611072519.GB27144@wotan.suse.de> References: <20030610.152020.59678979.davem@redhat.com> <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610.180120.71112140.davem@redhat.com> X-archive-position: 3130 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > I have a strange feeling that Ralph's system isn't using > TSC and that's why it shows up so high on the profiles :-) > TSC do_gettimeofday() is REALLY cheap (TSC read plus a multiply which > x86 does in like 5 cycles). On a P4 rdtsc takes 90+ cycles (probably because it's flushing the complete pipeline). Of course it's still relatively fast if you run that at 3Ghz, but on slower P4s it may hurt. On Athlons/Hammers it is quite fast, but at least on Hammer it needs a pipeline flush again for accuracy (otherwise the CPU can speculate it around) One bigger cost is normally the rw lock or the two memory barriers for the seqlock (on 2.5). 
On a UP compiled kernel it should not be a problem though. -Andi From yoshfuji@linux-ipv6.org Wed Jun 11 00:28:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 00:28:13 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B7S02x006938 for ; Wed, 11 Jun 2003 00:28:01 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5B7SnBo023817; Wed, 11 Jun 2003 16:28:49 +0900 Date: Wed, 11 Jun 2003 16:28:49 +0900 (JST) Message-Id: <20030611.162849.52863261.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, krkumar@us.ibm.com Subject: [PATCH/RFC] IPV6: Remember Manage/OtherConfig flags From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3131 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Well, I reread the spec. Managed flag and OtherConfig flag are maintained on a per-interface basis (RFC2462 5.2). So, let's store them in inet6_dev{}. Well, I haven't done kernel/userspace API for this; We should do it via rtnetlink with IFLA_IF6INFO (struct ifla_if6info { u32 ifla_if6flags; }) or something like this. 
(Note: if_flags is of type u8, but it would be good to reserve 24 bits for future extensions.) If this idea seems ok, any volunteers? (I don't have enough time for writing this for now.) Thanks. Index: linux-2.5/include/net/if_inet6.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/if_inet6.h,v retrieving revision 1.8 diff -u -r1.8 if_inet6.h --- linux-2.5/include/net/if_inet6.h 16 May 2003 00:25:11 -0000 1.8 +++ linux-2.5/include/net/if_inet6.h 11 Jun 2003 05:40:20 -0000 @@ -17,6 +17,9 @@ #include +/* inet6_dev.if_flags */ +#define IF_RA_OTHERCONF 0x80 /* OtherConfig flag in RA */ +#define IF_RA_MANAGED 0x40 /* Managed flag in RA */ #define IF_RA_RCVD 0x20 #define IF_RS_SENT 0x10 Index: linux-2.5/net/ipv6/ndisc.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ndisc.c,v retrieving revision 1.38 diff -u -r1.38 ndisc.c --- linux-2.5/net/ipv6/ndisc.c 7 Jun 2003 00:22:34 -0000 1.38 +++ linux-2.5/net/ipv6/ndisc.c 11 Jun 2003 05:40:20 -0000 @@ -1043,6 +1043,14 @@ in6_dev->if_flags |= IF_RA_RCVD; } + /* + * Remember the managed / otherconf flags from the most recently + * received RA message (RFC2462) -- yoshfuji + */ + in6_dev->if_flags = (in6_dev->if_flags & ~(IF_RA_MANAGED|IF_RA_OTHERCONF)) | + (ra_msg->icmph.icmp6_addrconf_managed ? IF_RA_MANAGED : 0) | + (ra_msg->icmph.icmp6_addrconf_other ? 
IF_RA_OTHERCONF : 0); + lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); rt = rt6_get_dflt_router(&skb->nh.ipv6h->saddr, skb->dev); -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From ak@suse.de Wed Jun 11 00:28:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 00:28:50 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B7Sc2x007095 for ; Wed, 11 Jun 2003 00:28:38 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id C59EE14341; Wed, 11 Jun 2003 09:28:32 +0200 (MEST) Date: Wed, 11 Jun 2003 09:28:32 +0200 From: Andi Kleen To: "David S. Miller" Cc: ralph+d@istop.com, ralph@istop.com, greearb@candelatech.com, Robert.Olsson@data.slu.se, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030611072832.GC27144@wotan.suse.de> References: <3EE67D2D.80608@candelatech.com> <20030610.180120.71112140.davem@redhat.com> <20030610.182338.41657455.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610.182338.41657455.davem@redhat.com> X-archive-position: 3132 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 06:23:38PM -0700, David S. Miller wrote: > From: Ralph Doncaster > Date: Tue, 10 Jun 2003 21:17:28 -0400 (EDT) > > Aren't the read_lock_irqsave and restore expensive? > > If x86 has an inefficient implementation, well... :-) sti/cli is normally fast on x86, a bit slower on P3 core (a few cycles or so) read_lock_irqsave does a pushfl though, that's rather slow on P4, but still not that bad. read_lock_irq would be faster, but too risky here. 
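For readers following the lock-cost discussion: a minimal userspace sketch of a sequence-counter (seqlock-style) read, which is one way a time-of-day read can avoid taking a lock. This is only an illustration under stated assumptions, not the 2.5 kernel code: `struct timebase`, `timebase_write()` and `timebase_read()` are made-up names, it uses C11 atomics rather than kernel primitives, and it assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared time base; not a kernel structure. */
struct timebase {
    atomic_uint     seq;   /* even: stable, odd: update in progress */
    _Atomic int64_t sec;
    _Atomic int64_t usec;
};

/* Writer side (assumed to be the only writer, e.g. a timer tick). */
static void timebase_write(struct timebase *tb, int64_t sec, int64_t usec)
{
    unsigned s = atomic_load_explicit(&tb->seq, memory_order_relaxed);

    assert((s & 1u) == 0);  /* no other update may be in flight */
    atomic_store_explicit(&tb->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* first barrier */
    atomic_store_explicit(&tb->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&tb->usec, usec, memory_order_relaxed);
    atomic_store_explicit(&tb->seq, s + 2, memory_order_release);
}

/* Reader side: takes no lock, retries if a write overlapped the read. */
static void timebase_read(struct timebase *tb, int64_t *sec, int64_t *usec)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&tb->seq, memory_order_acquire);
        *sec = atomic_load_explicit(&tb->sec, memory_order_relaxed);
        *usec = atomic_load_explicit(&tb->usec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* second barrier */
        s2 = atomic_load_explicit(&tb->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```

The reader pays for two ordering points and a possible retry instead of a lock acquire and release, and never blocks the writer.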
> > This can be done without locks, nobody has done the x86 implementation > of that that's all. I think the x86_64 folks did a lockless version, > I know I did for sparc64 :) 2.5 i386 gettimeofday is lockless. But on UP it should not make any difference anyways. -Andi From lpetande@tml.hut.fi Wed Jun 11 02:33:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 02:33:49 -0700 (PDT) Received: from smtp-1.hut.fi (root@smtp-1.hut.fi [130.233.228.91]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B9Xa2x013016 for ; Wed, 11 Jun 2003 02:33:37 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-1.hut.fi (8.12.9/8.12.9) with ESMTP id h5B8eDji019348; Wed, 11 Jun 2003 11:40:14 +0300 Message-ID: <3EE6ECD3.6050103@tml.hut.fi> Date: Wed, 11 Jun 2003 11:48:19 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> <20030610.095135.28806569.davem@redhat.com> In-Reply-To: <20030610.095135.28806569.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-1.hut.fi) X-DCC-HUTCC-Metrics: smtp-1.hut.fi 1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 3133 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev David S. 
Miller wrote: > > If you want to do these things using routes or xfrm rules, you must > integrate the creation of them into either zebra or racoon. You > cannot have a setup where mipv6d and racoon/zebra fight each other > flushing each other's settings. It doesn't work. > In the routing based approach there should not be any conflicts between mipv6 and zebra: We would create cached host routes based on the existing routes. Thus if zebra was running, the mipv6 daemon would not change the routes created by zebra, but only cached host routes. If zebra changed any routes, it would cause the deletion of any invalid cached routes. The mipv6 daemon would listen to netlink messages for route deletion and would then reinsert the mipv6 state into a new cached route. Does this make sense to you? Thanks, Henrik From Robert.Olsson@data.slu.se Wed Jun 11 02:55:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 02:55:23 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5B9tC2x016198 for ; Wed, 11 Jun 2003 02:55:13 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id LAA16196; Wed, 11 Jun 2003 11:54:34 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16102.64602.19145.131439@robur.slu.se> Date: Wed, 11 Jun 2003 11:54:34 +0200 To: Andi Kleen Cc: Bogdan Costescu , "David S. 
Miller" , sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030610164949.GB13246@wotan.suse.de> References: <20030610.085600.71109220.davem@redhat.com> <20030610164949.GB13246@wotan.suse.de> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3134 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Andi Kleen writes: > You can play some tricks with the driver to make eth_type_trans disappear > from the profiles. This usually helps a lot because it avoids one > full "fetch from cache cold memory" roundtrip per packet, which is slow on > any CPU. Andi! Interesting. Can we get into details? Cheers. --ro From ak@suse.de Wed Jun 11 03:05:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 03:05:42 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BA5V2x017458 for ; Wed, 11 Jun 2003 03:05:31 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id A61A414643; Wed, 11 Jun 2003 12:05:25 +0200 (MEST) Date: Wed, 11 Jun 2003 12:05:20 +0200 From: Andi Kleen To: Robert Olsson Cc: Andi Kleen , Bogdan Costescu , "David S. 
Miller" , sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030611100520.GB27119@oldwotan.suse.de> References: <20030610.085600.71109220.davem@redhat.com> <20030610164949.GB13246@wotan.suse.de> <16102.64602.19145.131439@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16102.64602.19145.131439@robur.slu.se> X-archive-position: 3135 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Wed, Jun 11, 2003 at 11:54:34AM +0200, Robert Olsson wrote: > > > Andi Kleen writes: > > > You can play some tricks with the driver to make eth_type_trans disappear > > from the profiles. This usually helps a lot because it avoids one > > full "fetch from cache cold memory" roundtrip per packet, which is slow on > > any CPU. > > > Andi! > Interesting. Can we get into details? eth_type_trans checks the ethernet protocol ID and sets the broadcast/multicast/ unicast L2 type. Some NICs have bits in the RX descriptor for most of them. They have a "packet is TCP or UDP or IP" bit and also a bit for unicast or sometimes even multicast/broadcast. So when you have the RX descriptor you can just derive these values from there and put them into the skb without calling eth_type_trans or looking at the cache cold header. Then you do a prefetch on the header. When the packet reaches the network stack later the header has already reached cache and it can be processed without a memory round trip latency. Caveats: On some cards it doesn't work for all packets or can be only done if you don't have any multicast addresses hashed (that's the case for the e1000 if I read the header bits correctly). The lxt1001 (old EOLed card) can do it for all packet types. 
Often prefetch size is limited so you should not prefetch more than what you can store until the packet reaches the stack. -Andi From Robert.Olsson@data.slu.se Wed Jun 11 03:39:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 03:39:19 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BAd92x018549 for ; Wed, 11 Jun 2003 03:39:10 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id MAA16911; Wed, 11 Jun 2003 12:38:33 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16103.1705.531076.869230@robur.slu.se> Date: Wed, 11 Jun 2003 12:38:33 +0200 To: Andi Kleen Cc: Robert Olsson , Bogdan Costescu , "David S. Miller" , sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030611100520.GB27119@oldwotan.suse.de> References: <20030610.085600.71109220.davem@redhat.com> <20030610164949.GB13246@wotan.suse.de> <16102.64602.19145.131439@robur.slu.se> <20030611100520.GB27119@oldwotan.suse.de> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3136 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Andi Kleen writes: > eth_type_trans checks the ethernet protocol ID and sets the broadcast/multicast/ > unicast L2 type. > > Some NICs have bits in the RX descriptor for most of them. They have a > "packet is TCP or UDP or IP" bit and also a bit for unicast or sometimes > even multicast/broadcast. So when you have the RX descriptor you > can just derive these values from there and put them into the skb > without calling eth_type_trans or looking at the cache cold header. 
> > Then you do a prefetch on the header. When the packet reaches the > network stack later the header has already reached cache and it can be > processed without a memory round trip latency. > > Caveats: > On some cards it doesn't work for all packets or can be only done > if you don't have any multicast addresses hashed (that's the case > for the e1000 if I read the header bits correctly). The lxt1001 > (old EOLed card) can do it for all packet types. Thanks! Yes. Like to give this a try when I got chance. It should be something for the driver authors. Any patch handy? e1000? Cheers. --ro From hadi@shell.cyberus.ca Wed Jun 11 04:48:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 04:48:20 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BBmA2x000760 for ; Wed, 11 Jun 2003 04:48:11 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Q44u-000ALb-O5; Wed, 11 Jun 2003 07:47:44 -0400 Date: Wed, 11 Jun 2003 07:47:44 -0400 (EDT) From: Jamal Hadi To: Florian Weimer cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Was (Re: Route cache performance under stress In-Reply-To: <87el21wzb7.fsf@deneb.enyo.de> Message-ID: <20030611074007.S39760@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> <87el21wzb7.fsf@deneb.enyo.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3137 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Florian Weimer wrote: > In general, the forwarding performance is nowadays specified in pps > and even flows per second if you look carefully at the data sheets. Ok, this is interesting. I have never seen flows per second used for simple L3 forwarding. I have seen it used for NAT or firewalling. Looking at the sprint traffic patterns, i think flows/sec is a meaningful metric. > Most vendors have learnt that people want routers with comforting > worst-case behavior. However, you have to read carefully, e.g. a > Catalyst 6500 with Supervisor Engine 1 (instead of 2) can only create > 650,000 flows per second, even if it has a much, much higher peak IP > forwarding rate. > So 2Mpps of 650Kflows/sec ? > (The times of routers which died when confronted with a rapid ICMP > sweep across a /16 are gone for good, I hope.) We should be able to punish specific misbehaving flows. Do you know if any routers are implementing proper DoS tracebacks to allow for inserting drop filters? 
cheers, jamal From ralf@linux-mips.org Wed Jun 11 04:51:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 04:51:12 -0700 (PDT) Received: from dea.linux-mips.net (p508B75E7.dip.t-dialin.net [80.139.117.231]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BBp42x001165 for ; Wed, 11 Jun 2003 04:51:05 -0700 Received: from dea.linux-mips.net (localhost [127.0.0.1]) by dea.linux-mips.net (8.12.8/8.12.8) with ESMTP id h5BBovbY027624; Wed, 11 Jun 2003 04:50:57 -0700 Received: (from ralf@localhost) by dea.linux-mips.net (8.12.8/8.12.8/Submit) id h5BBotcU027623; Wed, 11 Jun 2003 13:50:55 +0200 Date: Wed, 11 Jun 2003 13:50:55 +0200 From: Ralf Baechle To: Stephen Hemminger Cc: Jeff Garzik , davem@redhat.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-ID: <20030611115055.GB26751@linux-mips.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> <20030609171224.GA14623@gtf.org> <20030609110855.2e264ce1.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609110855.2e264ce1.shemminger@osdl.org> User-Agent: Mutt/1.4.1i X-archive-position: 3138 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 11:08:55AM -0700, Stephen Hemminger wrote: > Okay: > --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 > +++ ioc3-eth.c 2003-06-09 11:08:01.000000000 -0700 > @@ -1613,6 +1613,7 @@ static void __devexit ioc3_remove_one (s > struct ioc3_private *ip = dev->priv; > struct ioc3 *ioc3 = ip->regs; > > + unregister_netdev(dev); > iounmap(ioc3); > pci_release_regions(pdev); > kfree(dev); Thanks, applied. Same is also needed for 2.4. 
Ralf From hadi@shell.cyberus.ca Wed Jun 11 04:55:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 04:55:30 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BBtL2x003154 for ; Wed, 11 Jun 2003 04:55:22 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Q4Bp-000ALs-7h; Wed, 11 Jun 2003 07:54:53 -0400 Date: Wed, 11 Jun 2003 07:54:53 -0400 (EDT) From: Jamal Hadi To: "David S. Miller" cc: greearb@candelatech.com, netdev@oss.sgi.com Subject: gettime: Was (Re: Route cache performance under stress In-Reply-To: <20030610.203325.41658167.davem@redhat.com> Message-ID: <20030611065255.L39678@shell.cyberus.ca> References: <3EE682B8.8060708@candelatech.com> <20030610.182234.74725315.davem@redhat.com> <3EE68B15.60802@candelatech.com> <20030610.203325.41658167.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3139 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev Ok, time to go into another separate thread ;-> Sounds like a good idea. if (skbneedstimestamp) do_gettimeofday(&skb->stamp); else defertimestamp() For defertimestamp() would it be feasible to store only the jiffies value in the skb, then get timeofday later and somehow compensate for the difference? Seems very doable to me. Question is when do you decide skbneedstimestamp? Is it when the device is in promiscuous mode, or do you decide it in ip or icmp etc? cheers, jamal On Tue, 10 Jun 2003, David S. Miller wrote: > From: Ben Greear > > Yes, I understand why we want a time-stamp very early...but if > we can get _some_ sort of time stamp very cheap (TSC, for example) > then we can potentially defer the more expensive conversion of > this stamp into the equivalent of whatever do_gettimeofday will > give us. 
> > I fully understand your idea, I've talked about it with Alexey many > times. Someone just has to implement it. > > pkt_sched.h is probably the place to play, maybe make an > asm/pkt_sched.h header. > > > From ak@suse.de Wed Jun 11 05:08:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 05:08:18 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BC892x003843 for ; Wed, 11 Jun 2003 05:08:10 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 41A7414963; Wed, 11 Jun 2003 14:08:04 +0200 (MEST) Date: Wed, 11 Jun 2003 14:08:03 +0200 From: Andi Kleen To: Jamal Hadi Cc: "David S. Miller" , greearb@candelatech.com, netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress Message-ID: <20030611120803.GB22720@wotan.suse.de> References: <3EE682B8.8060708@candelatech.com> <20030610.182234.74725315.davem@redhat.com> <3EE68B15.60802@candelatech.com> <20030610.203325.41658167.davem@redhat.com> <20030611065255.L39678@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030611065255.L39678@shell.cyberus.ca> X-archive-position: 3140 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Wed, Jun 11, 2003 at 07:54:53AM -0400, Jamal Hadi wrote: > > Ok, time to go into another separate thread ;-> > > Sounds like a good idea. > > if (skbneedstimestamp) > do_gettimeofday(&skb->stamp); > else > defertimestamp() Another way is to just store jiffies (= 10 or 1ms accuracy) This should be nearly zero cost and accurate enough at least for TCP. 
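A rough userspace sketch of the deferred-stamp idea from this thread, in case it helps make it concrete: record only a cheap tick count at receive time, and convert it to a time of day on first use by subtracting the packet's age from the current time. Everything here is an assumption for illustration: `struct pkt_stamp`, `stamp_defer()` and `stamp_get()` are made-up names (not skb fields or kernel functions), the tick source stands in for jiffies or a TSC read, and the caller supplies the current time instead of calling do_gettimeofday.

```c
#include <assert.h>
#include <stdint.h>

/* Time of day as seconds plus microseconds, like struct timeval. */
struct tod {
    int64_t sec;
    int64_t usec;
};

/* Hypothetical per-packet stamp; stands in for the skb fields. */
struct pkt_stamp {
    uint64_t   ticks;      /* cheap counter value taken at receive time */
    int        converted;  /* the 'is-timestamp-converted-already' flag */
    struct tod tv;         /* valid only once converted is nonzero */
};

/* Receive path: record the cheap value and defer the conversion. */
static void stamp_defer(struct pkt_stamp *s, uint64_t ticks_now)
{
    s->ticks = ticks_now;
    s->converted = 0;
}

/*
 * Consumer path: convert on first use, then return the cached value.
 * 'now' is the current time of day, 'ticks_now' the current counter,
 * and 'usec_per_tick' the tick length (10000 for HZ=100).
 */
static struct tod stamp_get(struct pkt_stamp *s, uint64_t ticks_now,
                            struct tod now, uint32_t usec_per_tick)
{
    assert(usec_per_tick > 0);
    if (!s->converted) {
        /* Age of the packet, compensated back from the current time. */
        int64_t age = (int64_t)((ticks_now - s->ticks) * usec_per_tick);
        int64_t total = now.sec * 1000000 + now.usec - age;

        s->tv.sec = total / 1000000;
        s->tv.usec = total % 1000000;
        s->converted = 1;
    }
    return s->tv;
}
```

With a jiffies-like tick the result is only as accurate as the tick length (10 ms at HZ=100); a finer counter improves that without changing the structure.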
-Andi From hadi@shell.cyberus.ca Wed Jun 11 05:08:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 05:08:42 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BC8a2x003908 for ; Wed, 11 Jun 2003 05:08:37 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Q4OX-000AML-0z; Wed, 11 Jun 2003 08:08:01 -0400 Date: Wed, 11 Jun 2003 08:08:00 -0400 (EDT) From: Jamal Hadi To: Andi Kleen cc: Robert Olsson , Bogdan Costescu , "David S. Miller" , sim@netnation.com, ralph+d@istop.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030611100520.GB27119@oldwotan.suse.de> Message-ID: <20030611075703.R39786@shell.cyberus.ca> References: <20030610.085600.71109220.davem@redhat.com> <20030610164949.GB13246@wotan.suse.de> <16102.64602.19145.131439@robur.slu.se> <20030611100520.GB27119@oldwotan.suse.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3141 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Wed, 11 Jun 2003, Andi Kleen wrote: > eth_type_trans checks the ethernet protocol ID and sets the broadcast/multicast/ > unicast L2 type. > > Some NICs have bits in the RX descriptor for most of them. They have a > "packet is TCP or UDP or IP" bit and also a bit for unicast or sometimes > even multicast/broadcast. So when you have the RX descriptor you > can just derive these values from there and put them into the skb > without calling eth_type_trans or looking at the cache cold header. > > Then you do a prefetch on the header. When the packet reaches the > network stack later the header has already reached cache and it can be > processed without a memory round trip latency. 
>

I have done prefetching experiments with a NAPIezed sb1250.c driver on MIPS.
I never got rid of eth_type_trans(); I just prefetched skb->data a few lines
before calling it. I did see eth_type_trans() almost disappear from the
profile (it was way too low to be important). Andi's idea is even more
interesting.
I think I saw about 10Kpps more in throughput.
Robert, this means our biggest bottleneck right now is cache misses.
The MIPS processor I am playing with is SMP and has a large shared L2 cache.
What I am observing is that this is quite useful for SMP.
I am limited by how much traffic I can generate right now to test it more.
I can do 295Kpps L3 easily.
This board is an excuse for you to come down to Ottawa in July ;->

> Caveats:
> On some cards it doesn't work for all packets or can be only done
> if you don't have any multicast addresses hashed (that's the case
> for the e1000 if I read the header bits correctly). The lxt1001
> (old EOLed card) can do it for all packet types.
>

So can the sb1250. I'll try this out.

> Often prefetch size is limited so you should not prefetch more
> than what you can store until the packet reaches the stack.
>

Good point. So is there a systematic way to find out the effects of the
prefetch size, or do you just have to keep trying until you get it right?
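A minimal user-space sketch of the prefetch-ahead pattern discussed above, not the sb1250 or e1000 driver code. It assumes GCC or Clang (for __builtin_prefetch) and uses a hypothetical struct pkt and ring layout invented for illustration. While the current frame is classified, a prefetch is issued for the next frame's header so that it is more likely to be in cache when its turn comes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive-ring entry: pointer to frame data plus its length. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Read the EtherType from bytes 12..13 of an Ethernet frame, 0 if too short. */
static uint16_t ether_type(const struct pkt *p)
{
    if (p->len < 14)
        return 0;
    return (uint16_t)((p->data[12] << 8) | p->data[13]);
}

/* Count IPv4 frames in a batch, prefetching the next header one step ahead. */
static size_t count_ipv4(const struct pkt *ring, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(ring[i + 1].data, 0, 3); /* read, keep in cache */
        if (ether_type(&ring[i]) == 0x0800)
            hits++;
    }
    return hits;
}
```

The prefetch distance here is fixed at one entry; how far ahead is useful depends on the hardware and, as the thread suggests, probably has to be found by measurement.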
cheers, jamal From hadi@shell.cyberus.ca Wed Jun 11 05:10:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 05:10:17 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BCAB2x004295 for ; Wed, 11 Jun 2003 05:10:12 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Q4QK-000AMY-Mb for netdev@oss.sgi.com; Wed, 11 Jun 2003 08:09:52 -0400 Date: Wed, 11 Jun 2003 08:09:52 -0400 (EDT) From: Jamal Hadi To: netdev@oss.sgi.com Subject: Real World Traffic WAS(Re: Route cache performance tests (fwd) Message-ID: <20030611080924.D39831@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3142 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev I think is interesting enough for general consumption ---------- Forwarded message ---------- Date: Wed, 11 Jun 2003 07:38:41 -0400 (EDT) From: Jamal Hadi To: Florian Weimer Cc: Pekka Savola , ralph+d@istop.com, Simon Kirby , CIT/Paul , 'David S. Miller' Subject: Real World Traffic WAS(Re: Route cache performance tests Ok time to start changing topics; this thread is already over two hundred emails ;-> On Tue, 10 Jun 2003, Florian Weimer wrote: > Pekka Savola writes: > > > I really hope you're kidding about Sprint core routers. > > Some of the IP carriers have far too little traffic in their backbone, > to a degree that it's an economic disaster. You need some spare > capacity in order to deal with some aspects of IP traffic, but not a > factor of 10 or something similar. > I think this is what might be happening at those sprint routers. Note that though the pps is low the flows per secs is very high. Look at the info on nyc-21: The 95th percentile of traffic is around 63Kpps. 
Then look at the 95th percentile for number of flows: 72K flows.

This implies that each flow is getting to send less than 1pps.

I think the number of flows is also interesting. It may help in formulizing
the garbage collection.
376K flows (flowi/dst cache) are stored in the worst case. In the worst case
88K flows are active every second.

Pekka, do you have similar data you collect as well?

> I once asked a networking engineer at another rather large NSP to do
> some DoS backtracking for me. The network apparently used GSRs
> throughout its core. Unfortunately, GSRs are very shy in the standard
> configuration and they won't reveal much data about the traffic they
> are routing. So this guy issued "ip route-cache flow" on the
> interface on which the traffic was leaving his AS. It took him over
> an hour to get that line card up again. Ouch. The funny thing is,
> however, that the engineer insisted that this had always worked for
> him on all other routers. ("ip route-cache flow" turns on flow-based
> process switching, and the line card CPU and its buses are certainly
> not powerful enough for that! If it causes no problems, you could
> certainly use a 72xx or 75xx, maybe even a PC running Linux 8-) to
> route that traffic.)

Isn't that equivalent to what we do with route caching?
cheers, jamal From greearb@candelatech.com Wed Jun 11 08:57:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 08:57:53 -0700 (PDT) Received: from grok.yi.org (IDENT:pEhkkIsi6rpoXHIl1bQltQ+MwN1oRyGJ@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BFvg2x012073 for ; Wed, 11 Jun 2003 08:57:43 -0700 Received: from candelatech.com (IDENT:ZjVOhyXum6udToYaFBHXsY+plgk+p2Is@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5BFvOc32712; Wed, 11 Jun 2003 08:57:25 -0700 Message-ID: <3EE75164.8030500@candelatech.com> Date: Wed, 11 Jun 2003 08:57:24 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jamal Hadi CC: "David S. Miller" , netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress References: <3EE682B8.8060708@candelatech.com> <20030610.182234.74725315.davem@redhat.com> <3EE68B15.60802@candelatech.com> <20030610.203325.41658167.davem@redhat.com> <20030611065255.L39678@shell.cyberus.ca> In-Reply-To: <20030611065255.L39678@shell.cyberus.ca> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3143 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Jamal Hadi wrote: > Ok, time to go into another separate thread ;-> > > Sounds like a good idea. > > if (skbneedstimestamp) > do_gettimeofday(&skb->stamp); > else > defertimestamp() > > For defertimestamp() would it be feasible that you store only the > jiffies value in the skb then get timeofday later and somehow > compensate for the difference? Seems very doable to me. > > Question is when do you decide skbneedstimestamp? 
> Is it when the device is in promiscous mode or do it in ip or icmp etc? > > cheers, > jamal Jiffies is not nearly precise enough. You need something with usec precision at least. If we make a macro to read the value (converting as needed), and just change all the readers to use that macro, then we don't have to make any interesting decisions in the networking core. Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From pekkas@netcore.fi Wed Jun 11 10:01:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:01:33 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BH1R2x014724 for ; Wed, 11 Jun 2003 10:01:28 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5BH0lG02274; Wed, 11 Jun 2003 20:00:47 +0300 Date: Wed, 11 Jun 2003 20:00:46 +0300 (EEST) From: Pekka Savola To: Jamal Hadi cc: netdev@oss.sgi.com Subject: Re: Real World Traffic WAS(Re: Route cache performance tests (fwd) In-Reply-To: <20030611080924.D39831@shell.cyberus.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3144 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Wed, 11 Jun 2003, Jamal Hadi wrote: > Look at the info on nyc-21: > The 95th percentile of traffic is around 63Kpps. > Then look at the 95th percentile for number of flows: 72K flows > > this implies that each flow is getting to send less than 1pps. > > I think the number of flows is also interesting. May help in formulizing > the garbage collection. > 376K flows(flowi/dst cache) stored at the worst case. In the worst case > 88K flows are active every second. > > Pekka, do you have similar data you collect as well? 
I don't have data on flows but I can say some data from our perspective; On OC48 backbone interfaces, here: - about 600 Mbit/s of traffic equals about 120 kpps - a DoS attack of 1.2 Gbit/s was 150 kpps - about 1200 Mbit/s of traffic equals about 150 kpps A significant portion of these is large filetransfers, though (as the average packet size is like 600-1000 bytes, calculated from above)... -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings From yoshfuji@linux-ipv6.org Wed Jun 11 10:06:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:06:24 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BH6G2x015091 for ; Wed, 11 Jun 2003 10:06:19 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5BH7HBo027742; Thu, 12 Jun 2003 02:07:17 +0900 Date: Thu, 12 Jun 2003 02:07:16 +0900 (JST) Message-Id: <20030612.020716.37975763.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: fix payload length of reassembled packet From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3145 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. I've introduced a bug, which calculates payload length incorrectly when reassembling. Bug was introduced in ChangeSet 1.1229.7.40. (This patch also eliminates redundancy.) Thanks in advance. Index: linux-2.5/net/ipv6/reassembly.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/reassembly.c,v retrieving revision 1.15 diff -u -r1.15 reassembly.c --- linux-2.5/net/ipv6/reassembly.c 30 May 2003 17:46:04 -0000 1.15 +++ linux-2.5/net/ipv6/reassembly.c 11 Jun 2003 15:49:44 -0000 @@ -596,10 +596,8 @@ BUG_TRAP(FRAG6_CB(head)->offset == 0); /* Unfragmented part is taken from the first segment. */ - payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len; - nhoff = head->h.raw - head->nh.raw; - - if (payload_len > 65535 + 8) + payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - 8; + if (payload_len > 65535) goto out_oversize; /* Head of list must not be cloned. 
*/ -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From pekkas@netcore.fi Wed Jun 11 10:16:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:16:11 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHG52x015485 for ; Wed, 11 Jun 2003 10:16:06 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5BHFki02374; Wed, 11 Jun 2003 20:15:46 +0300 Date: Wed, 11 Jun 2003 20:15:45 +0300 (EEST) From: Pekka Savola To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: fix payload length of reassembled packet In-Reply-To: <20030612.020716.37975763.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3146 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Thu, 12 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > I've introduced a bug, which calculates payload length > incorrectly when reassembling. > Bug was introduced in ChangeSet 1.1229.7.40. > (This patch also eliminates redundancy.) > > Thanks in advance. > > Index: linux-2.5/net/ipv6/reassembly.c > =================================================================== > RCS file: /home/cvs/linux-2.5/net/ipv6/reassembly.c,v > retrieving revision 1.15 > diff -u -r1.15 reassembly.c > --- linux-2.5/net/ipv6/reassembly.c 30 May 2003 17:46:04 -0000 1.15 > +++ linux-2.5/net/ipv6/reassembly.c 11 Jun 2003 15:49:44 -0000 > @@ -596,10 +596,8 @@ > BUG_TRAP(FRAG6_CB(head)->offset == 0); > > /* Unfragmented part is taken from the first segment. 
*/
> -    payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len;
> -    nhoff = head->h.raw - head->nh.raw;
> -
> -    if (payload_len > 65535 + 8)
> +    payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - 8;

s/8/sizeof(struct frag_hdr)/ ?

> +    if (payload_len > 65535)
>          goto out_oversize;
>
>      /* Head of list must not be cloned. */
>
>

--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

From toml@us.ibm.com Wed Jun 11 10:20:53 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:20:58 -0700 (PDT)
Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHKq2x015818 for ; Wed, 11 Jun 2003 10:20:53 -0700
Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5BHK2td198190; Wed, 11 Jun 2003 13:20:02 -0400
Received: from tomlt2.austin.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5BHK0rE216104; Wed, 11 Jun 2003 13:20:00 -0400
Subject: Re: IPSec: Policy dst bundles exhausting storage
From: Tom Lendacky
To: netdev@oss.sgi.com
Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, toml@us.ibm.com
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10)
Date: 11 Jun 2003 12:20:33 -0500
Message-Id: <1055352036.2610.42.camel@tomlt2.tomloffice.austin.ibm.com>
Mime-Version: 1.0
X-archive-position: 3147
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: toml@us.ibm.com
Precedence: bulk
X-list: netdev

> Gcc emits a memcpy(); check the assembly. Structure assignment is a
> perfectly legal way to do this.

Yes, that makes sense since the compiler doesn't complain. I didn't think
to check the assembly.
As for the bug though, it appears that the "x->u.rt.fl = *fl" statement
shouldn't be performed in the IPv6 __xfrm6_bundle_create function. Since
the xfrm_dst structure contains a union of the rtable structure and the
rt6_info structure (among others), the value stored in "x->u.rt.fl" is
later overwritten when statements such as "x->u.rt6.rt6i_... = ..." are
executed.

Should the IPv6 code be more like the IPv4 code? A flowi structure,
rt6i_fl, can be added to the rt6_info structure, which would then allow
"x->u.rt.fl = *fl" to be changed to "x->u.rt6.rt6i_fl = *fl". Then
__xfrm6_find_bundle could make basically the same checks as
__xfrm4_find_bundle:

    if (xdst->u.rt6.rt6i_fl.oif == fl->oif &&
        !ipv6_addr_cmp(&xdst->u.rt6.rt6i_fl.fl6_dst, &fl->fl6_dst) &&
        !ipv6_addr_cmp(&xdst->u.rt6.rt6i_fl.fl6_src, &fl->fl6_src) &&
        ...

I've tested this briefly and it appears to work, but I don't know all of
the intricacies of how this might affect other parts of the code. I've
attached a patch for review. Let me know if this is ok (although I'm
leaving this afternoon for a long weekend and won't be back until Monday).
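The overwrite described above can be reproduced with a small stand-alone sketch. All types here (flow_key, v4_route, v6_route, v6_route_fixed, route_u) are hypothetical, heavily simplified stand-ins whose field layout is an assumption chosen so that the members overlap; they are not the kernel's rtable, rt6_info, or xfrm_dst definitions.

```c
#include <stdint.h>
#include <string.h>

struct flow_key { int32_t oif; };

struct v4_route {              /* plays the role of struct rtable */
    int32_t other[2];
    struct flow_key fl;        /* sits at byte offset 8 */
};

struct v6_route {              /* plays the role of struct rt6_info */
    int32_t dst[4];            /* dst[2] also sits at byte offset 8 */
    int32_t src[4];
};

struct v6_route_fixed {        /* IPv6 route carrying its own flow key */
    int32_t dst[4];
    int32_t src[4];
    struct flow_key fl;        /* placed after the fields written later */
};

union route_u {
    struct v4_route rt;
    struct v6_route rt6;
    struct v6_route_fixed rt6_fixed;
};

/* Store the key through the IPv4 view, then fill in IPv6 fields. */
static int32_t oif_after_v4_store(int32_t oif)
{
    union route_u u;

    memset(&u, 0, sizeof(u));
    u.rt.fl.oif = oif;
    for (int i = 0; i < 4; i++)
        u.rt6.dst[i] = 0;      /* clobbers the bytes that held rt.fl */
    return u.rt.fl.oif;        /* no longer the value stored above */
}

/* Store the key in the IPv6 view itself, then fill in the same fields. */
static int32_t oif_after_v6_store(int32_t oif)
{
    union route_u u;

    memset(&u, 0, sizeof(u));
    u.rt6_fixed.fl.oif = oif;
    for (int i = 0; i < 4; i++)
        u.rt6_fixed.dst[i] = 0;
    return u.rt6_fixed.fl.oif; /* survives, since nothing overlaps it */
}
```

With these layouts the first function loses the stored key and the second keeps it, which is the reasoning behind moving the flow key into the IPv6 structure.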
Thanks, Tom diff -ur linux-2.5.70-orig/include/net/ip6_fib.h linux-2.5.70-new/include/net/ip6_fib.h --- linux-2.5.70-orig/include/net/ip6_fib.h 2003-06-11 11:56:21.000000000 -0500 +++ linux-2.5.70-new/include/net/ip6_fib.h 2003-06-11 11:58:42.000000000 -0500 @@ -73,6 +73,8 @@ struct rt6key rt6i_src; u8 rt6i_protocol; + + struct flowi rt6i_fl; }; struct fib6_walker_t diff -ur linux-2.5.70-orig/net/ipv6/xfrm6_policy.c linux-2.5.70-new/net/ipv6/xfrm6_policy.c --- linux-2.5.70-orig/net/ipv6/xfrm6_policy.c 2003-06-11 11:56:22.000000000 -0500 +++ linux-2.5.70-new/net/ipv6/xfrm6_policy.c 2003-06-11 11:58:52.000000000 -0500 @@ -60,8 +60,9 @@ read_lock_bh(&policy->lock); for (dst = policy->bundles; dst; dst = dst->next) { struct xfrm_dst *xdst = (struct xfrm_dst*)dst; - if (!ipv6_addr_cmp(&xdst->u.rt6.rt6i_dst.addr, &fl->fl6_dst) && - !ipv6_addr_cmp(&xdst->u.rt6.rt6i_src.addr, &fl->fl6_src) && + if (xdst->u.rt6.rt6i_fl.oif == fl->oif && + !ipv6_addr_cmp(&xdst->u.rt6.rt6i_fl.fl6_dst, &fl->fl6_dst) && + !ipv6_addr_cmp(&xdst->u.rt6.rt6i_fl.fl6_src, &fl->fl6_src) && __xfrm6_bundle_ok(xdst, fl)) { dst_clone(dst); break; @@ -133,7 +134,7 @@ dst_prev->child = &rt->u.dst; for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; - x->u.rt.fl = *fl; + x->u.rt6.rt6i_fl = *fl; dst_prev->dev = rt->u.dst.dev; if (rt->u.dst.dev) From yoshfuji@linux-ipv6.org Wed Jun 11 10:26:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:27:02 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHQv2x016157 for ; Wed, 11 Jun 2003 10:26:58 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5BHRsBo027886; Thu, 12 Jun 2003 02:27:54 +0900 Date: Thu, 12 Jun 2003 02:27:53 +0900 (JST) Message-Id: <20030612.022753.56899094.yoshfuji@linux-ipv6.org> To: 
pekkas@netcore.fi Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: fix payload length of reassembled packet From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030612.020716.37975763.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3148 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Wed, 11 Jun 2003 20:15:45 +0300 (EEST)), Pekka Savola says: > > + payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - 8; > > s/8/sizeof(struct frag_hdr)/ ? Yes, sizeof(struct frag_hdr). I, however, use 8 for now to focus on the bug itself. (We have more "8"s there which should be substituted.) 
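A small stand-alone sketch of the corrected length computation, for illustration only. The function name, parameter names, and the locally defined constants are assumptions made here; the sketch also assumes that the span from the start of the IPv6 header to the first byte of fragment data includes the 8-byte fragment header, which is why that header is subtracted before the 65535 check.

```c
#include <stddef.h>

#define IPV6_HDR_LEN    40     /* fixed IPv6 header, defined locally */
#define FRAG_HDR_LEN     8     /* IPv6 fragment header, defined locally */
#define MAX_PAYLOAD  65535     /* largest value a 16-bit length field holds */

/*
 * hdr_span: bytes from the start of the IPv6 header to the first byte of
 *           fragment data (assumed to include the fragment header).
 * frag_len: total length of the reassembled fragment data.
 * Returns the payload length of the reassembled packet, or -1 if the
 * inputs are inconsistent or the result would be oversize.
 */
static long reassembled_payload_len(size_t hdr_span, size_t frag_len)
{
    size_t payload_len;

    if (hdr_span < IPV6_HDR_LEN + FRAG_HDR_LEN)
        return -1;

    /* The fragment header is dropped on reassembly, so do not count it. */
    payload_len = hdr_span - IPV6_HDR_LEN + frag_len - FRAG_HDR_LEN;
    if (payload_len > MAX_PAYLOAD)
        return -1;

    return (long)payload_len;
}
```

Subtracting the fragment header up front lets the limit be the plain 65535 rather than 65535 + 8, which matches the shape of the patch under discussion.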
-- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Wed Jun 11 10:38:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:38:28 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHcN2x016567 for ; Wed, 11 Jun 2003 10:38:23 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5BHdKBo028285; Thu, 12 Jun 2003 02:39:20 +0900 Date: Thu, 12 Jun 2003 02:39:19 +0900 (JST) Message-Id: <20030612.023919.126807101.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, pekkas@netcore.fi, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: eliminating magic number for sizeof(struct frag_hdr) (Re: [PATCH] IPV6: fix payload length of reassembled packet) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030612.022753.56899094.yoshfuji@linux-ipv6.org> References: <20030612.020716.37975763.yoshfuji@linux-ipv6.org> <20030612.022753.56899094.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3149 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030612.022753.56899094.yoshfuji@linux-ipv6.org> (at Thu, 12 Jun 2003 02:27:53 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B 
says: > > s/8/sizeof(struct frag_hdr)/ ? > > Yes, sizeof(struct frag_hdr). > I, however, use 8 for now to focus on the bug itself. > (We have more "8"s there which should be substituted.) s/8/sizeof(struct frag_hdr)/; please apply this on top of the original patch. Thanks. --- linux-2.5+fix/net/ipv6/reassembly.c Thu Jun 12 02:33:42 2003 +++ linux-2.5+fix+edited/net/ipv6/reassembly.c Thu Jun 12 02:34:27 2003 @@ -596,7 +596,7 @@ BUG_TRAP(FRAG6_CB(head)->offset == 0); /* Unfragmented part is taken from the first segment. */ - payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - 8; + payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - sizeof(struct frag_hdr); if (payload_len > 65535) goto out_oversize; @@ -631,9 +631,10 @@ * header in order to calculate ICV correctly. */ nhoff = fq->nhoffset; head->nh.raw[nhoff] = head->h.raw[0]; - memmove(head->head+8, head->head, (head->data-head->head)-8); - head->mac.raw += 8; - head->nh.raw += 8; + memmove(head->head + sizeof(struct frag_hdr), head->head, + (head->data - head->head) - sizeof(struct frag_hdr)); + head->mac.raw += sizeof(struct frag_hdr); + head->nh.raw += sizeof(struct frag_hdr); skb_shinfo(head)->frag_list = head->next; head->h.raw = head->data; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From Robert.Olsson@data.slu.se Wed Jun 11 10:41:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:41:35 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHfQ2x016926 for ; Wed, 11 Jun 2003 10:41:27 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id TAA23776; Wed, 11 Jun 2003 19:40:47 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16103.27039.204333.952703@robur.slu.se> Date: Wed, 11 Jun 2003 19:40:47 +0200 To: "David S. 
Miller"
Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-Reply-To: <20030610.115759.26513736.davem@redhat.com>
References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 3150
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

David S. Miller writes:

 > Actually, that's a good idea, if someone is brave just rip out
 > fib_validate_source (just don't call it, should work for valid
 > traffic) and see what happens :)

Just about 9% better, a bit of a surprise...

Still 1 dst/pkt. Input rate 2*189 kpps. All slow path with
fib_validate_source removed. Now 121 kpps. (114 kpps before)

Iface   MTU Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 3212017 9661983 9661983 6787987       8      0      0      0 BRU
eth1   1500   0       9       0       0       0 3212020      0      0      0 BRU
eth2   1500   0 3212714 9656726 9656726 6787290       4      0      0      0 BRU
eth3   1500   0       1       0       0       0 3212713      0      0      0 BRU

rt_cache_stat
00008b63 00000000 0062089f 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00617a8b 00617a7f 00000005 00000000 00000000 00000002

So I added fib_validate_source again and profiled the 1 dst/pkt case.
So this is just a profile of the slow path with some different performance
counters. I guess the first is the most interesting.
Cpu type: P4 / Xeon
Cpu speed was (MHz estimation) : 1799.55
Counter 0 counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 180000
vma      samples  %-age       symbol name
c023c038 107340   33.143      fn_hash_lookup
c013154c 17399    5.37223     free_block
c0211364 16502    5.09527     __rt_hash_shrink
c01316e4 12854    3.96889     kmem_cache_alloc
c01b86dc 11719    3.61844     e1000_clean_rx_irq
c02033a0 11557    3.56842     alloc_skb
c0212330 11378    3.51315     ip_route_input_slow
c020cc98 9765     3.01511     eth_type_trans
c0208860 7986     2.46581     dst_alloc
c0216d98 7733     2.38769     ip_output
c021200c 6940     2.14284     rt_set_nexthop
c0213a9c 6331     1.9548      dst_free
c0126998 6272     1.93659     rcu_do_batch
c02035cc 6164     1.90324     skb_release_data
c02036c4 6068     1.8736      __kfree_skb
c01b8558 5532     1.7081      e1000_clean_tx_irq
c01b7678 4970     1.53457     e1000_xmit_frame
c020905c 4965     1.53303     neigh_lookup
c013179c 4819     1.48795     kmem_cache_free
c01317e0 4441     1.37123     kfree
c020cb30 4002     1.23568     eth_header
c0131728 3522     1.08748     kmalloc
c0131384 3434     1.06031     cache_alloc_refill
c023a5fc 3392     1.04734     fib_validate_source
c023d814 2989     0.922904    fib_lookup
c0113368 2190     0.676199    mark_offset_tsc

Cpu type: P4 / Xeon
Cpu speed was (MHz estimation) : 1799.55
Counter 7 counted MISPRED_BRANCH_RETIRED events (retired mispredicted branches) with a unit mask of 0x01 (retired instruction is non-bogus) count 18000
vma      samples  %-age       symbol name
c023c038 5246     85.0933     fn_hash_lookup
c020905c 194      3.1468      neigh_lookup
c0131384 99       1.60584     cache_alloc_refill
c02036c4 66       1.07056     __kfree_skb
c020ce70 51       0.827251    qdisc_restart
c02033a0 51       0.827251    alloc_skb
c0211364 44       0.713706    __rt_hash_shrink
c01b86dc 32       0.519059    e1000_clean_rx_irq
c023d814 28       0.454177    fib_lookup
c0213a9c 25       0.405515    dst_free
c0210ce8 25       0.405515    rt_garbage_collect
c020ef04 23       0.373074    pfifo_dequeue
c01b8558 20       0.324412    e1000_clean_tx_irq
c0206dcc 19       0.308191    netif_receive_skb
c0206880 18       0.291971    dev_queue_xmit
c01b8ab0 18       0.291971    e1000_alloc_rx_buffers
c02155e0 17       0.27575     ip_forward
c021200c 15       0.243309    rt_set_nexthop
c020cc98 13       0.210868    eth_type_trans
c01b7678 13       0.210868    e1000_xmit_frame
c0212330 12       0.194647    ip_route_input_slow
c0131728 12       0.194647    kmalloc
c010f3d0 12       0.194647    do_gettimeofday
c020a12c 9        0.145985    neigh_resolve_output
c010c350 9        0.145985    do_IRQ
c0216d98 8        0.129765    ip_output

Cpu type: P4 / Xeon
Cpu speed was (MHz estimation) : 1799.55
Counter 0 counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x100 (Not set) count 18000
vma      samples  %-age       symbol name
c023c038 2361     31.3047     fn_hash_lookup
c013154c 686      9.09573     free_block
c0211364 507      6.72235     __rt_hash_shrink
c0208860 502      6.65606     dst_alloc
c01b86dc 433      5.74118     e1000_clean_rx_irq
c0213a9c 393      5.21082     dst_free
c0126998 378      5.01193     rcu_do_batch
c020cc98 262      3.47388     eth_type_trans
c02036c4 237      3.1424      __kfree_skb
c0126970 234      3.10263     call_rcu
c01b8558 212      2.81093     e1000_clean_tx_irq
c0216d98 208      2.75789     ip_output
c02035cc 202      2.67833     skb_release_data
c01b7678 189      2.50597     e1000_xmit_frame
c01b8ab0 141      1.86953     e1000_alloc_rx_buffers
c02033a0 118      1.56457     alloc_skb
c0131384 73       0.967913    cache_alloc_refill
c020ce70 46       0.609918    qdisc_restart
c0212330 36       0.477327    ip_route_input_slow
c01317e0 33       0.43755     kfree
c0206880 28       0.371254    dev_queue_xmit
c0210ce8 26       0.344736    rt_garbage_collect
c020ef04 17       0.225404    pfifo_dequeue
c02109d4 16       0.212145    rt_may_expire
c01316e4 16       0.212145    kmem_cache_alloc
c02155e0 12       0.159109    ip_forward

Cpu type: P4 / Xeon
Cpu speed was (MHz estimation) : 1799.55
Counter 7 counted MACHINE_CLEAR events (cycles with entire machine pipeline cleared) with a unit mask of 0x01 (count a portion of cycles the machine is cleared for any cause) count 18000
vma      samples  %-age       symbol name
c010a738 326      55.4422     irq_entries_start
c010afd8 128      21.7687     apic_timer_interrupt
c023c038 45       7.65306     fn_hash_lookup
c013154c 9        1.53061     free_block
c010b208 9        1.53061     page_fault
c01b86dc 8        1.36054     e1000_clean_rx_irq
c0131384 8        1.36054     cache_alloc_refill
c0208860 7        1.19048     dst_alloc
c0213a9c 6        1.02041     dst_free
c0126970 6        1.02041     call_rcu
c0216d98 5        0.85034     ip_output
c0126998 5        0.85034     rcu_do_batch
c0211364 4        0.680272    __rt_hash_shrink
c020cc98 4        0.680272    eth_type_trans
c02036c4 4        0.680272    __kfree_skb
c02035cc 3        0.510204    skb_release_data
c02033a0 3        0.510204    alloc_skb
c01b7678 3        0.510204    e1000_xmit_frame
c01b8ab0 2        0.340136    e1000_alloc_rx_buffers
c01b8558 2        0.340136    e1000_clean_tx_irq
c020ce70 1        0.170068    qdisc_restart
c02f940c 0        0           ipsec_pfkey_init
c02f93cc 0        0           packet_init
c02f9354 0        0           af_unix_init
c02f9320 0        0           xfrm4_input_init
c02f9304 0        0           xfrm4_state_init

Cheers.
						--ro

From Robert.Olsson@data.slu.se Wed Jun 11 10:53:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 10:53:16 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BHr72x017491 for ; Wed, 11 Jun 2003 10:53:08 -0700
Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id TAA23972; Wed, 11 Jun 2003 19:52:25 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16103.27737.627713.123821@robur.slu.se>
Date: Wed, 11 Jun 2003 19:52:25 +0200
To: Jamal Hadi
Cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S.
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030609204257.L35799@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3151 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Jamal Hadi writes: > Robert has a good collection for what is good hardware. I am so outdated > i dont keep track anymore. My fastest machine is still an ASuse dual > 450Mhz. Well giving HW recommendations is very risky... :-) Anyway what we currently use: ftp://robur.slu.se/pub/Linux/bifrost/hardware.txt Cheers. --ro From fw@deneb.enyo.de Wed Jun 11 11:42:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 11:43:10 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BIgt2x020073 for ; Wed, 11 Jun 2003 11:42:57 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19QAXf-0004mt-00; Wed, 11 Jun 2003 20:41:51 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19QAXf-0001oq-9s; Wed, 11 Jun 2003 20:41:51 +0200 To: Jamal Hadi Cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Real World Routers 8-) References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> <87el21wzb7.fsf@deneb.enyo.de> <20030611074007.S39760@shell.cyberus.ca> From: Florian Weimer Mail-Followup-To: Jamal Hadi , ralph+d@istop.com, CIT/Paul , 'Simon Kirby' , "'David S. 
Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Date: Wed, 11 Jun 2003 20:41:51 +0200 In-Reply-To: <20030611074007.S39760@shell.cyberus.ca> (Jamal Hadi's message of "Wed, 11 Jun 2003 07:47:44 -0400 (EDT)") Message-ID: <877k7scv80.fsf_-_@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5BIgt2x020073 X-archive-position: 3152 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev Jamal Hadi writes: > Ok, this is interesting. I have never seen the flows per second > used for simple L3 forwading. I have seen them being used for NAT or > firewalling. Some vendors still sell flow-based routers, and you should be able to get these numbers if the vendor doesn't try to scam you. > Looking at the sprint traffic patterns, i think flows/sec is a > meaningful metric. It's important to look at this number when buying a router, but I still think that stateless IP forwarding is the way to go even if you haven't got specialized hardware (TCAM). >> Most vendors have learnt that people want routers with comforting >> worst-case behavior. However, you have to read carefully, e.g. a >> Catalyst 6500 with Supervisor Engine 1 (instead of 2) can only create >> 650,000 flows per second, even if it has a much, much higher peak IP >> forwarding rate. >> > > So 2Mpps of 650Kflows/sec ? Exactly. (You can use a different Supervisor Engine and get stateless IP switching at 2 Mpps, at least according to the data sheets.) > We should be able to punish specific misbehaving flows. This is quite difficult because misbehaving flows often consist of a single packet.
Managing state for such flows is a waste, but you can hardly know this when you have to decide whether you want to create a new flow or not. If you want to punish per-interface flows, forget it. Most routers are not sufficiently multi-homed to make a difference, and attacks often hit routers on multiple interfaces. > Do you know if any routers are implementing proper DOS tracebacks to > allow for inserting drop filters? You mean IP Pushback? I haven't seen it on production routers, and I'm pretty sure that no one uses it yet. Flow-based traffic monitoring is available on most routers nowadays (often sampled, though), even on routers that perform stateless IP forwarding. Anyway, just dropping packets locally doesn't help you *that* much, you need cooperation of your upstream (and automated cooperation à la IP Pushback is still far, far away, I presume). From maxk@qualcomm.com Wed Jun 11 12:21:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 12:21:58 -0700 (PDT) Received: from numenor.qualcomm.com (numenor.qualcomm.com [129.46.51.58]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BJLr2x021717 for ; Wed, 11 Jun 2003 12:21:53 -0700 Received: from sabrina.qualcomm.com (sabrina.qualcomm.com [129.46.61.150]) by numenor.qualcomm.com (8.12.9/8.12.5/1.0) with ESMTP id h5BJLmxO010791 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 11 Jun 2003 12:21:48 -0700 (PDT) Received: from MAXK.qualcomm.com (maxk.qualcomm.com [129.46.176.80]) by sabrina.qualcomm.com (8.12.9/8.12.5/1.0) with ESMTP id h5BJLjBC006806; Wed, 11 Jun 2003 12:21:45 -0700 (PDT) Message-Id: <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> X-Sender: maxk@unixmail.qualcomm.com X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Wed, 11 Jun 2003 12:21:44 -0700 To: Stephen Hemminger , "David S.
Miller" , Jeff Garzik From: Max Krasnyansky Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev Cc: netdev@oss.sgi.com In-Reply-To: <20030609115857.38bb31d6.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 3153 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maxk@qualcomm.com Precedence: bulk X-list: netdev At 11:58 AM 6/9/2003, Stephen Hemminger wrote: >- if ((err = dev_alloc_name(&tun->dev, name)) < 0) >- goto failed; >- if ((err = register_netdevice(&tun->dev))) >+ dev = alloc_netdev(sizeof(struct tun_struct), name, >+ tun_setup); >+ if (!dev) >+ return -ENOMEM; >+ >+ tun = dev->priv; >+ tun->flags = flags; >+ >+ if ((err = register_netdevice(tun->dev))) { >+ kfree(dev); > goto failed; >+ } This is wrong. register_netdevice() does not expand name (ie %d stuff). So dev_alloc_name() is still needed. i.e. dev = alloc_netdev(sizeof(struct tun_struct), name, tun_setup); if (!dev) return -ENOMEM; err = dev_alloc_name(dev, name); if (err < 0) { kfree(dev); return err; } Max From garzik@gtf.org Wed Jun 11 12:43:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 12:43:25 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BJhJ2x024654 for ; Wed, 11 Jun 2003 12:43:20 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 66D15665D; Wed, 11 Jun 2003 15:43:18 -0400 (EDT) Date: Wed, 11 Jun 2003 15:43:18 -0400 From: Jeff Garzik To: Max Krasnyansky Cc: Stephen Hemminger , "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev Message-ID: <20030611194317.GE31051@gtf.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> User-Agent: Mutt/1.3.28i X-archive-position: 3154 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Wed, Jun 11, 2003 at 12:21:44PM -0700, Max Krasnyansky wrote: > At 11:58 AM 6/9/2003, Stephen Hemminger wrote: > > > >- if ((err = dev_alloc_name(&tun->dev, name)) < 0) > >- goto failed; > >- if ((err = register_netdevice(&tun->dev))) > >+ dev = alloc_netdev(sizeof(struct tun_struct), name, > >+ tun_setup); > >+ if (!dev) > >+ return -ENOMEM; > >+ > >+ tun = dev->priv; > >+ tun->flags = flags; > >+ > >+ if ((err = register_netdevice(tun->dev))) { > >+ kfree(dev); > > goto failed; > >+ } > > > This is wrong. register_netdevice() does not expand name (ie %d stuff). > So dev_alloc_name() is still needed. i.e. Correct. But, register_netdev() is preferred precisely for this reason. Jeff From fw@deneb.enyo.de Wed Jun 11 12:49:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 12:49:13 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BJn62x025605 for ; Wed, 11 Jun 2003 12:49:07 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19QBZs-0007nT-00; Wed, 11 Jun 2003 21:48:12 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19QBZs-00020c-7V; Wed, 11 Jun 2003 21:48:12 +0200 To: ralph+d@istop.com Cc: Jamal Hadi , Pekka Savola , CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: <20030610075702.I37165@shell.cyberus.ca> From: Florian Weimer Mail-Followup-To: ralph+d@istop.com, Jamal Hadi , Pekka Savola , CIT/Paul , 'Simon Kirby' , "'David S. Miller'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Date: Wed, 11 Jun 2003 21:48:12 +0200 In-Reply-To: (Ralph Doncaster's message of "Tue, 10 Jun 2003 11:29:18 -0400 (EDT)") Message-ID: <87he6wbdkz.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3155 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev Ralph Doncaster writes: >> Assuming the attacker has a 100mbps link to you, yes ;-> > > A script kiddie 0wning a box with a FE connection is nothing. During what > was probably the worst DOS I got hit with, one of my upstream providers > said they were seeing about 600mbps of traffic related to the attack. Yes, these numbers keep growing. By today's standards, 6000 Mbps shouldn't be too surprising. 8-( One of the servers I keep running was recently flooded with 1500-byte UDP packets, Fast Ethernet line rate. It definitely happens if your pipes are fat enough. From xerox@foonet.net Wed Jun 11 12:53:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 12:53:29 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BJrM2x026035 for ; Wed, 11 Jun 2003 12:53:23 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h5BJr1eq005498; Wed, 11 Jun 2003 15:53:01 -0400 From: "CIT/Paul" To: "'Florian Weimer'" , Cc: "'Jamal Hadi'" , "'Pekka Savola'" , "'Simon Kirby'" , "'David S. 
Miller'" , , Subject: RE: Route cache performance under stress Date: Wed, 11 Jun 2003 15:40:47 -0400 Organization: CIT Message-ID: <000901c33051$5ae64330$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <87he6wbdkz.fsf@deneb.enyo.de> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 3156 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev Wait until you see a DoS attack at 2 million pps with random source ips and ports and dst ports and tcp flags and the only consistant thing about the entire attack is the destination ip :> can we say.. Null route quick!! Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: Florian Weimer [mailto:fw@deneb.enyo.de] Sent: Wednesday, June 11, 2003 3:48 PM To: ralph+d@istop.com Cc: Jamal Hadi; Pekka Savola; CIT/Paul; 'Simon Kirby'; 'David S. Miller'; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Ralph Doncaster writes: >> Assuming the attacker has a 100mbps link to you, yes ;-> > > A script kiddie 0wning a box with a FE connection is nothing. During > what was probably the worst DOS I got hit with, one of my upstream > providers said they were seeing about 600mbps of traffic related to > the attack. Yes, these numbers keep growing. By today's standards, 6000 Mbps shouldn't be too surprising. 8-( One of the servers I keep running was recently flooded with 1500-byte UDP packets, Fast Ethernet line rate. It definitely happens if your pipes are fat enough. 
From shemminger@osdl.org Wed Jun 11 13:27:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 13:27:48 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BKRf2x027001 for ; Wed, 11 Jun 2003 13:27:42 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5BKRGX29919; Wed, 11 Jun 2003 13:27:18 -0700 Date: Wed, 11 Jun 2003 13:27:15 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: maxk@qualcomm.com, davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev Message-Id: <20030611132715.76a485c7.shemminger@osdl.org> In-Reply-To: <20030611194317.GE31051@gtf.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> <20030611194317.GE31051@gtf.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3157 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Wed, 11 Jun 2003 15:43:18 -0400 Jeff Garzik wrote: > On Wed, Jun 11, 2003 at 12:21:44PM -0700, Max Krasnyansky wrote: > > At 11:58 AM 6/9/2003, Stephen Hemminger wrote: > > > > > > >- if ((err = dev_alloc_name(&tun->dev, name)) < 0) > > >- goto failed; > > >- if ((err = register_netdevice(&tun->dev))) > > >+ dev = alloc_netdev(sizeof(struct tun_struct), name, > > >+ tun_setup); > > >+ if (!dev) > > >+ return -ENOMEM; > > >+ > > >+ tun = dev->priv; > > >+ tun->flags = flags; > > >+ > > >+ if ((err = register_netdevice(tun->dev))) { > > >+ 
kfree(dev); > > > goto failed; > > >+ } > > > > > > This is wrong. register_netdevice() does not expand name (ie %d stuff). > > So dev_alloc_name() is still needed. i.e. > > Correct. > > But, register_netdev() is preferred precisely for this reason. > Not possible in this case because device is created off socket ioctl so it is called with rtnl_lock From maxk@qualcomm.com Wed Jun 11 14:03:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 14:03:22 -0700 (PDT) Received: from numenor.qualcomm.com (numenor.qualcomm.com [129.46.51.58]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BL3F2x028028 for ; Wed, 11 Jun 2003 14:03:15 -0700 Received: from neophyte.qualcomm.com (neophyte.qualcomm.com [129.46.61.149]) by numenor.qualcomm.com (8.12.9/8.12.5/1.0) with ESMTP id h5BL3BxO016031 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 11 Jun 2003 14:03:11 -0700 (PDT) Received: from MAXK.qualcomm.com (maxk.qualcomm.com [129.46.176.80]) by neophyte.qualcomm.com (8.12.9/8.12.5/1.0) with ESMTP id h5BL38uh026957; Wed, 11 Jun 2003 14:03:09 -0700 (PDT) Message-Id: <5.1.0.14.2.20030611140042.0b8b5950@unixmail.qualcomm.com> X-Sender: maxk@unixmail.qualcomm.com X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Wed, 11 Jun 2003 14:03:08 -0700 To: Stephen Hemminger , Jeff Garzik From: Max Krasnyansky Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev Cc: davem@redhat.com, netdev@oss.sgi.com In-Reply-To: <20030611132715.76a485c7.shemminger@osdl.org> References: <20030611194317.GE31051@gtf.org> <20030609115857.38bb31d6.shemminger@osdl.org> <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> <20030611194317.GE31051@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 3158 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maxk@qualcomm.com Precedence: bulk X-list: netdev At 01:27 PM 6/11/2003, Stephen Hemminger wrote: >On Wed, 
11 Jun 2003 15:43:18 -0400 >Jeff Garzik wrote: > >> > >> > This is wrong. register_netdevice() does not expand name (ie %d stuff). >> > So dev_alloc_name() is still needed. i.e. >> >> Correct. >> >> But, register_netdev() is preferred precisely for this reason. >> > >Not possible in this case because device is created off socket ioctl so it is >called with rtnl_lock Yep. But not because it's created from socket ioctl. Because it has to guarantee atomicity. Max From fw@deneb.enyo.de Wed Jun 11 14:09:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 14:09:15 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BL9A2x028397 for ; Wed, 11 Jun 2003 14:09:11 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19QCq8-00033P-00; Wed, 11 Jun 2003 23:09:04 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19QCq8-0002IC-7m; Wed, 11 Jun 2003 23:09:04 +0200 To: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: <20030610075702.I37165@shell.cyberus.ca> <87he6wbdkz.fsf@deneb.enyo.de> From: Florian Weimer Mail-Followup-To: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Date: Wed, 11 Jun 2003 23:09:04 +0200 In-Reply-To: <87he6wbdkz.fsf@deneb.enyo.de> (Florian Weimer's message of "Wed, 11 Jun 2003 21:48:12 +0200") Message-ID: <874r2wb9u7.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3159 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev Florian Weimer writes: > Yes, these numbers keep growing. By today's standards, 6000 Mbps Oops, that's one "0" too many. 6 Gbps is definitely still surprising.
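On the tun thread above: Max's point is that register_netdevice() takes the name as given, so a template such as "tun%d" has to be expanded by dev_alloc_name() first. The following is a minimal userspace sketch of that division of labour, not kernel code. The names model_alloc_name(), model_register(), name_in_use() and the fixed-size registry are hypothetical stand-ins invented for illustration, and the locking that the thread discusses (rtnl_lock) is not modelled at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 8   /* toy limit, an assumption for this sketch */
#define NAME_LEN 16

/* Toy registry standing in for the kernel's list of registered devices. */
static char registry[MAX_DEVS][NAME_LEN];
static int nregistered;

static int name_in_use(const char *name)
{
	int i;

	for (i = 0; i < nregistered; i++)
		if (strcmp(registry[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical model of the dev_alloc_name() step: expand a "%d"
 * template to the first free unit number. Returns the unit number
 * (0 for a name without "%d"), or -1 on failure.
 */
static int model_alloc_name(char *out, const char *tmpl)
{
	const char *p = strstr(tmpl, "%d");
	int i;

	if (p == NULL) {
		if (strlen(tmpl) >= NAME_LEN || name_in_use(tmpl))
			return -1;
		strcpy(out, tmpl);
		return 0;
	}
	for (i = 0; i < MAX_DEVS; i++) {
		char buf[NAME_LEN];
		int n = snprintf(buf, sizeof(buf), "%.*s%d%s",
				 (int)(p - tmpl), tmpl, i, p + 2);

		if (n < 0 || n >= (int)sizeof(buf))
			return -1;
		if (!name_in_use(buf)) {
			strcpy(out, buf);
			return i;
		}
	}
	return -1;
}

/*
 * Hypothetical model of the register_netdevice() behaviour described
 * in the thread: the name is stored verbatim, with no "%d" expansion.
 */
static int model_register(const char *name)
{
	assert(name != NULL);
	if (nregistered == MAX_DEVS || strlen(name) >= NAME_LEN ||
	    name_in_use(name))
		return -1;
	strcpy(registry[nregistered++], name);
	return 0;
}
```

Calling model_alloc_name() and then model_register() on the result yields "tun0", then "tun1", and so on. Calling model_register("tun%d") directly stores the literal template, which is the failure mode Max describes.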
From shemminger@osdl.org Wed Jun 11 14:43:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 14:43:04 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BLgw2x031095 for ; Wed, 11 Jun 2003 14:42:59 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5BLgnX29203; Wed, 11 Jun 2003 14:42:49 -0700 Date: Wed, 11 Jun 2003 14:42:49 -0700 From: Stephen Hemminger To: Jeff Garzik , Jes Sorenson Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] acenic -- update to use alloc_etherdev Message-Id: <20030611144249.7cd63c1c.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3160 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Updated acenic driver to use alloc_etherdev to hold private data structure. Uses register_netdev() to get the name right this time ;-) Tested against 2.5.70 bk latest on Tigon III board. 
diff -Nru a/drivers/net/acenic.c b/drivers/net/acenic.c
--- a/drivers/net/acenic.c Wed Jun 11 14:36:47 2003
+++ b/drivers/net/acenic.c Wed Jun 11 14:36:47 2003
@@ -642,8 +642,7 @@
 		      (pdev->device == PCI_DEVICE_ID_SGI_ACENIC)))
 			continue;
 
-		dev = init_etherdev(NULL, sizeof(struct ace_private));
-
+		dev = alloc_etherdev(sizeof(struct ace_private));
 		if (dev == NULL) {
 			printk(KERN_ERR "acenic: Unable to allocate "
 			       "net_device structure!\n");
@@ -653,13 +652,6 @@
 		SET_MODULE_OWNER(dev);
 		SET_NETDEV_DEV(dev, &pdev->dev);
 
-		if (!dev->priv)
-			dev->priv = kmalloc(sizeof(*ap), GFP_KERNEL);
-		if (!dev->priv) {
-			printk(KERN_ERR "acenic: Unable to allocate memory\n");
-			return -ENOMEM;
-		}
-
 		ap = dev->priv;
 		ap->pdev = pdev;
 
@@ -737,6 +729,12 @@
 			       "AceNIC %i will be disabled.\n",
 			       dev->name, boards_found);
 			break;
+		}
+
+		if (register_netdev(dev)) {
+			printk(KERN_ERR "acenic: device registration failed\n");
+			kfree(dev);
+			continue;
 		}
 
 		switch(pdev->vendor) {
From jes@wildopensource.com Wed Jun 11 15:21:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 15:21:17 -0700 (PDT) Received: from trained-monkey.org (trained-monkey.org [209.217.122.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5BMLA2x002877 for ; Wed, 11 Jun 2003 15:21:11 -0700 Received: from trained-monkey.org (trained-monkey.org [127.0.0.1]) by trained-monkey.org (8.12.9/8.12.8) with ESMTP id h5BMLATU003413; Wed, 11 Jun 2003 18:21:10 -0400 Received: (from jes@localhost) by trained-monkey.org (8.12.9/8.12.9/Submit) id h5BMLArN003409; Wed, 11 Jun 2003 18:21:10 -0400 X-Authentication-Warning: trained-monkey.org: jes set sender to jes@wildopensource.com using -f To: Stephen Hemminger Cc: Jeff Garzik , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev References: <20030611144249.7cd63c1c.shemminger@osdl.org> From: Jes Sorensen Date: 11 Jun 2003 18:21:09 -0400 In-Reply-To: <20030611144249.7cd63c1c.shemminger@osdl.org> Message-ID: Lines: 15 User-Agent: Gnus/5.0808 (Gnus
v5.8.8) Emacs/20.7 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3161 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jes@wildopensource.com Precedence: bulk X-list: netdev >>>>> "Stephen" == Stephen Hemminger writes: Stephen> Updated acenic driver to use alloc_etherdev to hold private Stephen> data structure. Uses register_netdev() to get the name right Stephen> this time ;-) Please provide a compat macro for 2.4.18 and younger as well. Stephen> Tested against 2.5.70 bk latest on Tigon III board. Pretty sure it wasn't a Tigon III ;-) Jes PS: The name is Sorensen From davem@redhat.com Wed Jun 11 20:25:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 20:25:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C3PG2x014252 for ; Wed, 11 Jun 2003 20:25:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA28824; Wed, 11 Jun 2003 20:20:03 -0700 Date: Wed, 11 Jun 2003 20:20:03 -0700 (PDT) Message-Id: <20030611.202003.74721468.davem@redhat.com> To: lpetande@tml.hut.fi Cc: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: "David S. Miller" In-Reply-To: <3EE6ECD3.6050103@tml.hut.fi> References: <3EE5F85E.9080006@tml.hut.fi> <20030610.095135.28806569.davem@redhat.com> <3EE6ECD3.6050103@tml.hut.fi> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3162 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Henrik Petander Date: Wed, 11 Jun 2003 11:48:19 +0300 Does this make sense to you? No it doesn't. When you startup zebra, it may flush the entire routing table. You must make zebra aware of any static or dynamic routes you care about. It manages entire routing table and that is the end of the story. From davem@redhat.com Wed Jun 11 20:32:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 20:32:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C3Wi2x014888 for ; Wed, 11 Jun 2003 20:32:44 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA28867; Wed, 11 Jun 2003 20:29:15 -0700 Date: Wed, 11 Jun 2003 20:29:15 -0700 (PDT) Message-Id: <20030611.202915.71117016.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: greearb@candelatech.com, netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030611065255.L39678@shell.cyberus.ca> References: <3EE68B15.60802@candelatech.com> <20030610.203325.41658167.davem@redhat.com> <20030611065255.L39678@shell.cyberus.ca> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3163 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Wed, 11 Jun 2003 07:54:53 -0400 (EDT) Sounds like a good idea. if (skbneedstimestamp) do_gettimeofday(&skb->stamp); else defertimestamp() Damn, read the thread Jamal :( This is not possible at all. We do not know the value of 'skbneedstimestamp' until much later, but we MUST make the timestamp now in order for it to be accurate. From davem@redhat.com Wed Jun 11 20:33:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 20:33:51 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C3Xk2x015158 for ; Wed, 11 Jun 2003 20:33:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA28876; Wed, 11 Jun 2003 20:30:15 -0700 Date: Wed, 11 Jun 2003 20:30:15 -0700 (PDT) Message-Id: <20030611.203015.104061804.davem@redhat.com> To: ak@suse.de Cc: hadi@shell.cyberus.ca, greearb@candelatech.com, netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030611120803.GB22720@wotan.suse.de> References: <20030610.203325.41658167.davem@redhat.com> <20030611065255.L39678@shell.cyberus.ca> <20030611120803.GB22720@wotan.suse.de> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3164 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Wed, 11 Jun 2003 14:08:03 +0200 Another way is to just store jiffies (= 10 or 1ms accuracy) This should be nearly zero cost and accurate enough at least for TCP. TCP doesn't use it Andi. SO_RECVSTAMP etc. uses it and that MUST be accurate. People, start approaching this from an actually implementable angle, not ones that have no basis in reality :) From davem@redhat.com Wed Jun 11 21:00:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 21:01:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C40w2x015991 for ; Wed, 11 Jun 2003 21:00:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA28969; Wed, 11 Jun 2003 20:57:22 -0700 Date: Wed, 11 Jun 2003 20:57:21 -0700 (PDT) Message-Id: <20030611.205721.115935520.davem@redhat.com> To: jgarzik@pobox.com Cc: maxk@qualcomm.com, shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev From: "David S. Miller" In-Reply-To: <20030611194317.GE31051@gtf.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> <20030611194317.GE31051@gtf.org> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3165 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Wed, 11 Jun 2003 15:43:18 -0400 On Wed, Jun 11, 2003 at 12:21:44PM -0700, Max Krasnyansky wrote: > This is wrong. register_netdevice() does not expand name (ie %d stuff). > So dev_alloc_name() is still needed. i.e. Correct. But, register_netdev() is preferred precisely for this reason. Right. From davem@redhat.com Wed Jun 11 21:03:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 21:03:20 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C43G2x016353 for ; Wed, 11 Jun 2003 21:03:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA28983; Wed, 11 Jun 2003 20:59:43 -0700 Date: Wed, 11 Jun 2003 20:59:43 -0700 (PDT) Message-Id: <20030611.205943.48506382.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, maxk@qualcomm.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev From: "David S. Miller" In-Reply-To: <20030611132715.76a485c7.shemminger@osdl.org> References: <5.1.0.14.2.20030611121155.0b659e20@unixmail.qualcomm.com> <20030611194317.GE31051@gtf.org> <20030611132715.76a485c7.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3166 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Wed, 11 Jun 2003 13:27:15 -0700 > But, register_netdev() is preferred precisely for this reason. Not possible in this case because device is created off socket ioctl so it is called with rtnl_lock So do it by hand. From davem@redhat.com Wed Jun 11 21:08:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 21:08:20 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C48F2x016947 for ; Wed, 11 Jun 2003 21:08:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA29027; Wed, 11 Jun 2003 21:04:45 -0700 Date: Wed, 11 Jun 2003 21:04:45 -0700 (PDT) Message-Id: <20030611.210445.21901735.davem@redhat.com> To: jes@wildopensource.com Cc: shemminger@osdl.org, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev From: "David S. Miller" In-Reply-To: References: <20030611144249.7cd63c1c.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3167 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jes Sorensen Date: 11 Jun 2003 18:21:09 -0400 >>>>> "Stephen" == Stephen Hemminger writes: Stephen> Updated acenic driver to use alloc_etherdev to hold private Stephen> data structure. 
Uses register_netdev() to get the name right Stephen> this time ;-) Please provide a compat macro for 2.4.18 and younger as well. How actively are you maintaining acenic, Jes? :-) This is a very serious question, I haven't seen a 2.5.x change go back to 2.4.x since its inception. All this compat nonsense is becoming useless. Other drivers fare just fine 2.4.x/2.5.x without all this ifdef mumbo-jumbo that litters acenic.c and makes it nearly impossible to read. In fact all these localized compat macros make acenic.c HARDER to maintain. From shemminger@osdl.org Wed Jun 11 22:44:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 22:44:51 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C5id2x020155 for ; Wed, 11 Jun 2003 22:44:40 -0700 Received: from osdl.org (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h5C5iOX11399; Wed, 11 Jun 2003 22:44:24 -0700 Message-ID: <3EE81345.6080009@osdl.org> Date: Wed, 11 Jun 2003 22:44:37 -0700 From: Stephen Hemminger User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: jes@wildopensource.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev References: <20030611144249.7cd63c1c.shemminger@osdl.org> <20030611.210445.21901735.davem@redhat.com> In-Reply-To: <20030611.210445.21901735.davem@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3168 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev David S.
Miller wrote:

> From: Jes Sorensen
> Date: 11 Jun 2003 18:21:09 -0400
>
> >>>>> "Stephen" == Stephen Hemminger writes:
>
> Stephen> Updated acenic driver to use alloc_etherdev to hold private
> Stephen> data structure. Uses register_netdev() to get the name right
> Stephen> this time ;-)
>
> Please provide a compat macro for 2.4.18 and younger as well.
>
>How actively are you maintaining acenic. Jes? :-) This is
>a very serious question, I haven't seen a 2.5.x change go back
>to 2.4.x since it's inception.
>
>All this compat nonsense is becoming useless. Other drivers
>fair just fine 2.4.x/2.5.x without all this ifdef mumbo-jumbo
>that litters acenic.c and makes it nearly impossible to read.
>
>In fact all these localized compat macros make acenic.c HARDER
>to maintain.

The funny thing is this alloc_etherdev patch did not change the compatibility one bit. Just for grins, I took the 2.5 driver back into 2.4.18 and it doesn't build. The problem is that 2.4.18 doesn't know what irqreturn_t is. The enclosed patch, cribbed from atm/he.c, fixes that, but the driver still redefines local_irq_save etc.

Maybe it is time to stop the insanity.
diff -Nru a/drivers/net/acenic.c b/drivers/net/acenic.c
--- a/drivers/net/acenic.c Wed Jun 11 22:36:43 2003
+++ b/drivers/net/acenic.c Wed Jun 11 22:36:43 2003
@@ -188,6 +188,13 @@
 #define ACE_MOD_DEC_USE_COUNT do{} while(0)
 #endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,69)
+typedef void irqreturn_t;
+#define IRQ_NONE
+#define IRQ_HANDLED
+#define IRQ_RETVAL(x)
+#endif
+
 #ifndef SET_NETDEV_DEV
 #define SET_NETDEV_DEV(net, pdev) do{} while(0)
 #endif

From davem@redhat.com Wed Jun 11 23:10:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:10:26 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6AI2x021352 for ; Wed, 11 Jun 2003 23:10:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29230; Wed, 11 Jun 2003 22:43:05 -0700 Date: Wed, 11 Jun 2003 22:43:05 -0700 (PDT) Message-Id: <20030611.224305.68068281.davem@redhat.com> To: shemminger@osdl.org Cc: jes@wildopensource.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev From: "David S. Miller" In-Reply-To: <3EE81263.4040205@osdl.org> References: <20030611.210445.21901735.davem@redhat.com> <3EE81263.4040205@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3169 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Stephen Hemminger
   Date: Wed, 11 Jun 2003 22:40:51 -0700

   The problem is it doesn't know what irqreturn_t is. The enclosed cribbed from atm/he.c fixes that, but it still redefines local_irq_save etc.

Such compat macros belong in include/linux/interrupt.h and someone needs to merge that to Marcelo. From davem@redhat.com Wed Jun 11 23:26:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:27:08 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6Qt2x022168 for ; Wed, 11 Jun 2003 23:26:59 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29268; Wed, 11 Jun 2003 23:22:52 -0700 Date: Wed, 11 Jun 2003 23:22:52 -0700 (PDT) Message-Id: <20030611.232252.68134856.davem@redhat.com> To: krkumar@us.ibm.com Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Panic in ipv6_add_dev From: "David S. Miller" In-Reply-To: <3EE52C92.4060509@us.ibm.com> References: <3EE52C92.4060509@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3170 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Krishna Kumar Date: Mon, 09 Jun 2003 17:55:46 -0700 We need to initialize sysctl_table to NULL in neigh_parms_alloc so that a release can be called safely at any time. Patch applied, thanks. 
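
[Archive note] The fix applied above (setting sysctl_table to NULL in neigh_parms_alloc so that a release is safe at any time) follows a common allocation pattern. The following is only a minimal user-space sketch of that pattern, not the kernel code: every demo_* name is a hypothetical stand-in, and malloc/free take the place of the real allocation and sysctl registration calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel's neigh_parms or sysctl types. */
struct demo_sysctl_table {
    int registered;
};

struct demo_parms {
    struct demo_sysctl_table *sysctl_table; /* NULL until registration succeeds */
};

static int demo_live_tables; /* number of tables currently allocated */

/* Allocate parms with the optional pointer explicitly cleared. */
struct demo_parms *demo_parms_alloc(void)
{
    struct demo_parms *p = malloc(sizeof(*p));

    if (p)
        p->sysctl_table = NULL; /* the point of the fix: never leave this uninitialized */
    return p;
}

/* Optional second step; may never be called for a given object. */
int demo_sysctl_register(struct demo_parms *p)
{
    assert(p != NULL);
    if (p->sysctl_table)
        return 0; /* already registered */
    p->sysctl_table = malloc(sizeof(*p->sysctl_table));
    if (!p->sysctl_table)
        return -1;
    p->sysctl_table->registered = 1;
    demo_live_tables++;
    return 0;
}

/* Safe whether or not demo_sysctl_register() ever ran. */
void demo_parms_release(struct demo_parms *p)
{
    if (!p)
        return;
    if (p->sysctl_table) {
        free(p->sysctl_table);
        p->sysctl_table = NULL;
        demo_live_tables--;
    }
    free(p);
}
```

Because the allocator clears the pointer, the release path needs only a NULL check to decide whether there is anything to unregister, which also keeps the alloc/register and release/unregister pairs symmetric.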
From greearb@candelatech.com Wed Jun 11 23:32:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:33:06 -0700 (PDT) Received: from grok.yi.org (IDENT:CgDU1qPzRQAeHLEhC3pFdLKGo7sxAbUF@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6Wt2x022702 for ; Wed, 11 Jun 2003 23:32:56 -0700 Received: from candelatech.com (IDENT:hhc5eBTtuB6qacKEP2y+uK0ZTlBLR9Q4@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5C6Wfc01649; Wed, 11 Jun 2003 23:32:41 -0700 Message-ID: <3EE81E89.1040004@candelatech.com> Date: Wed, 11 Jun 2003 23:32:41 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: ak@suse.de, hadi@shell.cyberus.ca, netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress References: <20030610.203325.41658167.davem@redhat.com> <20030611065255.L39678@shell.cyberus.ca> <20030611120803.GB22720@wotan.suse.de> <20030611.203015.104061804.davem@redhat.com> In-Reply-To: <20030611.203015.104061804.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3171 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Andi Kleen > Date: Wed, 11 Jun 2003 14:08:03 +0200 > > Another way is to just store jiffies (= 10 or 1ms accuracy) > > This should be nearly zero cost and accurate enough at least for TCP. > > TCP doesn't use it Andi. SO_RECVSTAMP etc. uses it and that > MUST be accurate. > > People, start approaching this from an actually implementable > angle, not one's that have no basis in reality :) I think we need a generic method to get something like the TSC..ie very fast, very precise. 
Then, we need a way to turn this into the time-of-day. After that, we can calculate time-of-day in a lazy manner. Something like:

/* In driver or as early as possible */
skb->rx_stamp = getCurTSC();
skb->flags |= (RX_STAMP_IS_NOT_YET_CONVERTED);

....

/* somebody wants to know what time of day rx-stamp was */
if (skb->flags & (RX_STAMP_IS_NOT_YET_CONVERTED)) {
    skb->rx_stamp = do_gettimeofday() - ((getCurTSC() - skb->rx_stamp) * (magic-conversion-to-timeval-units));
    skb->flags &= ~(RX_STAMP_IS_NOT_YET_CONVERTED);
}
/* rx_stamp is now relative to time-of-day */

But, Dave mentioned TSC is not always good to use, and it won't work at all on older cpus, so the getCurTSC() thing probably needs to be a macro...

Seems like this macro would be useful in lots of places...pktgen for instance :)

Ben

>
>
>

--
Ben Greear
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

From davem@redhat.com Wed Jun 11 23:34:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:34:58 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6Yr2x023085 for ; Wed, 11 Jun 2003 23:34:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29279; Wed, 11 Jun 2003 23:31:22 -0700 Date: Wed, 11 Jun 2003 23:31:22 -0700 (PDT) Message-Id: <20030611.233122.27808829.davem@redhat.com> To: slblake@petri-meat.com Cc: fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <1055214346.1199.65.camel@photon> References: <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> <1055214346.1199.65.camel@photon> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3172 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Steven Blake Date: 09 Jun 2003 23:05:47 -0400 http://www.petri-meat.com/slblake/networking/refs/lpm_pkt-class/ Interesting link, thanks for mentioning it. IMHO, the best LPM algorithm (in terms of balancing lookup speed vs. memory consumption vs. update rate) is CRT, described in the first paper [ASIK]. It is patented, but there is hope that it might get released under GPL in the near future. It would be nice if this actually was a "paper", but it's a patent entry, such things are always so cryptic. Is there a real paper on this scheme? From davem@redhat.com Wed Jun 11 23:38:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:39:02 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6cx2x023528 for ; Wed, 11 Jun 2003 23:38:59 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29293; Wed, 11 Jun 2003 23:35:29 -0700 Date: Wed, 11 Jun 2003 23:35:28 -0700 (PDT) Message-Id: <20030611.233528.26511069.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, krkumar@us.ibm.com Subject: Re: [PATCH] Panic in ipv6_add_dev From: "David S. Miller" In-Reply-To: <20030610.135601.20565349.yoshfuji@linux-ipv6.org> References: <3EE52C92.4060509@us.ibm.com> <20030610.135601.20565349.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3173 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / $B5HF#1QL@(B Date: Tue, 10 Jun 2003 13:56:01 +0900 (JST) Well, it is also the problem that the tasks of neigh_parms_alloc() / neigh_sysctl_register() and neigh_parms_release() / neigh_sysctl_unregister() were not symmetric. ... Here's the fix. Patch applied, thanks. From davem@redhat.com Wed Jun 11 23:49:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 11 Jun 2003 23:49:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C6nH2x024127 for ; Wed, 11 Jun 2003 23:49:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29329; Wed, 11 Jun 2003 23:45:35 -0700 Date: Wed, 11 Jun 2003 23:45:34 -0700 (PDT) Message-Id: <20030611.234534.52193216.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16102.9418.43884.336925@robur.slu.se> References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3174 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Tue, 10 Jun 2003 20:34:50 +0200 I ripped out the route hash just to test the slow path. I want to point out an error in such simulations. It doesn't eliminate some of the most expensive part of the routing cache, the 'dst' management. All of that still happens even after your patch. A better simulation of a "pure slowpath" would be to move the DST entry into the fib entries themselves. That is a lot more work, but it would validate the various ideas and claims being made. For example, it would say for sure whether eliminating the routing cache is a win or not for DoS traffic. From davem@redhat.com Thu Jun 12 00:01:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 00:01:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C71r2x025187 for ; Thu, 12 Jun 2003 00:01:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29352; Wed, 11 Jun 2003 23:58:24 -0700 Date: Wed, 11 Jun 2003 23:58:23 -0700 (PDT) Message-Id: <20030611.235823.91336972.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] net-sysfs parent ref count From: "David S. Miller" In-Reply-To: <20030610133508.3d0bfffc.shemminger@osdl.org> References: <20030610133508.3d0bfffc.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3175 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Tue, 10 Jun 2003 13:35:08 -0700 When creating network sysfs entries, we grab an extra reference to the parent. Not a big deal now, since it just gets blown away on unregister anyway, but when kobject reference counts are used for release, things break. Applied, thanks. From davem@redhat.com Thu Jun 12 00:04:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 00:04:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C7452x025667 for ; Thu, 12 Jun 2003 00:04:06 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA29360; Thu, 12 Jun 2003 00:00:33 -0700 Date: Thu, 12 Jun 2003 00:00:33 -0700 (PDT) Message-Id: <20030612.000033.45899350.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] Cleanup net-sysfs show and change functions From: "David S. Miller" In-Reply-To: <20030610133854.42713231.shemminger@osdl.org> References: <20030610133854.42713231.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3176 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Tue, 10 Jun 2003 13:38:54 -0700 This cleans up the network sysfs code to use helper functions to unify the show/change functions, by using common code in functions rather than template macros. The function always checks for dead devices, so I/O will fail. Applied, thanks. From davem@redhat.com Thu Jun 12 00:41:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 00:41:44 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C7fW2x027121 for ; Thu, 12 Jun 2003 00:41:32 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA29439; Thu, 12 Jun 2003 00:38:01 -0700 Date: Thu, 12 Jun 2003 00:38:00 -0700 (PDT) Message-Id: <20030612.003800.71577627.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, krkumar@us.ibm.com Subject: Re: [PATCH/RFC] IPV6: Remember Manage/OtherConfig flags From: "David S. Miller" In-Reply-To: <20030611.162849.52863261.yoshfuji@linux-ipv6.org> References: <20030611.162849.52863261.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3177 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / $B5HF#1QL@(B Date: Wed, 11 Jun 2003 16:28:49 +0900 (JST) Managed flag and OtherConfig flag are maintained on a per-interface basis (RFC2462 5.2). So, let's store them in inet6_dev{}. Let's put this in when something actually uses it ok? From davem@redhat.com Thu Jun 12 00:58:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 00:58:36 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C7wS2x027928 for ; Thu, 12 Jun 2003 00:58:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA29468; Thu, 12 Jun 2003 00:54:27 -0700 Date: Thu, 12 Jun 2003 00:54:26 -0700 (PDT) Message-Id: <20030612.005426.118603842.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, pekkas@netcore.fi Subject: Re: [PATCH] IPV6: eliminating magic number for sizeof(struct frag_hdr) From: "David S. Miller" In-Reply-To: <20030612.023919.126807101.yoshfuji@linux-ipv6.org> References: <20030612.022753.56899094.yoshfuji@linux-ipv6.org> <20030612.023919.126807101.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3178 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / $B5HF#1QL@(B Date: Thu, 12 Jun 2003 02:39:19 +0900 (JST) In article <20030612.022753.56899094.yoshfuji@linux-ipv6.org> (at Thu, 12 Jun 2003 02:27:53 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > > s/8/sizeof(struct frag_hdr)/ ? > > Yes, sizeof(struct frag_hdr). > I, however, use 8 for now to focus on the bug itself. > (We have more "8"s there which should be substituted.) s/8/sizeof(struct frag_hdr)/; please apply this on top of the original patch. I've applied both patches, thanks. From davem@redhat.com Thu Jun 12 01:15:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 01:15:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C8F32x028814 for ; Thu, 12 Jun 2003 01:15:03 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA29492; Thu, 12 Jun 2003 01:11:28 -0700 Date: Thu, 12 Jun 2003 01:11:27 -0700 (PDT) Message-Id: <20030612.011127.55861847.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev From: "David S. Miller" In-Reply-To: <20030609115857.38bb31d6.shemminger@osdl.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3179 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Stephen, I applied all of your alloc_netdev() changes. Even the TUN one, except I added the necessary dev_alloc_name() call right before register_netdevice(). From davem@redhat.com Thu Jun 12 01:24:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 01:24:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C8OV2x029408 for ; Thu, 12 Jun 2003 01:24:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA29514; Thu, 12 Jun 2003 01:21:00 -0700 Date: Thu, 12 Jun 2003 01:21:00 -0700 (PDT) Message-Id: <20030612.012100.83594413.davem@redhat.com> To: toml@us.ibm.com Cc: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S. Miller" In-Reply-To: <1055352036.2610.42.camel@tomlt2.tomloffice.austin.ibm.com> References: <1055352036.2610.42.camel@tomlt2.tomloffice.austin.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3180 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Tom Lendacky Date: 11 Jun 2003 12:20:33 -0500 As for the bug though, it appears that the "x->u.rt.fl = *fl" statement shouldn't be performed in the IPv6 __xfrm6_bundle_create function. 
I have a better suggestion for fix: 1) Delete the "x->u.rt.fl = *fl;" line completely. 2) Fix the test in __xfrm6_find_bundle() to do a proper prefix-mask based address comparison. rt6->rt6i_{dst,src} are masked addresses, so direct comparison is wrong. Can someone code this up? Thanks. From lpetande@tml.hut.fi Thu Jun 12 01:37:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 01:37:37 -0700 (PDT) Received: from smtp-3.hut.fi (root@smtp-3.hut.fi [130.233.228.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C8bO2x030047 for ; Thu, 12 Jun 2003 01:37:25 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-3.hut.fi (8.12.9/8.12.9) with ESMTP id h5C8ajn6011333; Thu, 12 Jun 2003 11:36:45 +0300 Message-ID: <3EE83D81.5030605@tml.hut.fi> Date: Thu, 12 Jun 2003 11:44:49 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <3EE5F85E.9080006@tml.hut.fi> <20030610.095135.28806569.davem@redhat.com> <3EE6ECD3.6050103@tml.hut.fi> <20030611.202003.74721468.davem@redhat.com> In-Reply-To: <20030611.202003.74721468.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-3.hut.fi) X-DCC-HUTCC-Metrics: smtp-3.hut.fi 1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 3181 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev David S. 
Miller wrote: > From: Henrik Petander > Date: Wed, 11 Jun 2003 11:48:19 +0300 > > Does this make sense to you? > > No it doesn't. When you startup zebra, it may flush the entire > routing table. I don't see a problem in that. It would only result in a short period of missing mipv6 route optimization information, until MIPv6 daemon reinserted the mipv6 information. MIPv6 daemon would do this after getting a notification of the deletion of the old mipv6 related cached routes. This would relate to zebra in the same way as pmtu discovery. Regards, Henrik From davem@redhat.com Thu Jun 12 01:49:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 01:49:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C8nj2x030684 for ; Thu, 12 Jun 2003 01:49:45 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA29741; Thu, 12 Jun 2003 01:46:11 -0700 Date: Thu, 12 Jun 2003 01:46:11 -0700 (PDT) Message-Id: <20030612.014611.32747925.davem@redhat.com> To: greearb@candelatech.com Cc: ak@suse.de, hadi@shell.cyberus.ca, netdev@oss.sgi.com Subject: Re: gettime: Was (Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <3EE81E89.1040004@candelatech.com> References: <20030611120803.GB22720@wotan.suse.de> <20030611.203015.104061804.davem@redhat.com> <3EE81E89.1040004@candelatech.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3182 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Ben Greear
   Date: Wed, 11 Jun 2003 23:32:41 -0700

   skb->rx_stamp = getCurTSC();

Thanks for mentioning this idea for the 10th time in the past 2 days :-)

From davem@redhat.com Thu Jun 12 01:54:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 01:54:54 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5C8sp2x031141 for ; Thu, 12 Jun 2003 01:54:51 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA29767; Thu, 12 Jun 2003 01:49:40 -0700 Date: Thu, 12 Jun 2003 01:49:39 -0700 (PDT) Message-Id: <20030612.014939.56032563.davem@redhat.com> To: lpetande@tml.hut.fi Cc: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: "David S. Miller" In-Reply-To: <3EE83D81.5030605@tml.hut.fi> References: <3EE6ECD3.6050103@tml.hut.fi> <20030611.202003.74721468.davem@redhat.com> <3EE83D81.5030605@tml.hut.fi> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3183 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Henrik Petander Date: Thu, 12 Jun 2003 11:44:49 +0300 MIPv6 daemon would do this after getting a notification of the deletion of the old mipv6 related cached routes. Ok. From Robert.Olsson@data.slu.se Thu Jun 12 06:57:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 06:57:44 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5CDvW2x012143 for ; Thu, 12 Jun 2003 06:57:34 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id PAA11003; Thu, 12 Jun 2003 15:56:47 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16104.34463.60472.750699@robur.slu.se> Date: Thu, 12 Jun 2003 15:56:47 +0200 To: "David S. Miller" Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030611.234534.52193216.davem@redhat.com> References: <20030610.103234.116374169.davem@redhat.com> <16102.9418.43884.336925@robur.slu.se> <20030611.234534.52193216.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3184 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > I want to point out an error in such simulations. 
>
> It doesn't eliminate some of the most expensive part of the routing
> cache, the 'dst' management. All of that still happens even after
> your patch.
>
> A better simulation of a "pure slowpath" would be to move the DST
> entry into the fib entries themselves.
>
> That is a lot more work, but it would validate the various ideas and
> claims being made. For example, it would say for sure whether
> eliminating the routing cache is a win or not for DoS traffic.

Well, it's true. But do we need to? From the profile we can actually see the 'dst' management cost. It is not much, I would say, and we will still need some administration, i.e. refcounting, even if we remove the route hash.

c023c038 107340   33.143     fn_hash_lookup
c013154c 17399    5.37223    free_block
c0211364 16502    5.09527    __rt_hash_shrink
c01316e4 12854    3.96889    kmem_cache_alloc
c01b86dc 11719    3.61844    e1000_clean_rx_irq
c02033a0 11557    3.56842    alloc_skb
c0212330 11378    3.51315    ip_route_input_slow
c020cc98 9765     3.01511    eth_type_trans
c0208860 7986     2.46581    dst_alloc
c0216d98 7733     2.38769    ip_output
c021200c 6940     2.14284    rt_set_nexthop
c0213a9c 6331     1.9548     dst_free
c0126998 6272     1.93659    rcu_do_batch
c02035cc 6164     1.90324    skb_release_data
c02036c4 6068     1.8736     __kfree_skb
c01b8558 5532     1.7081     e1000_clean_tx_irq
c01b7678 4970     1.53457    e1000_xmit_frame

From what I understand now, removing the route hash is not a good idea. It seems we can control the hash pretty well, even under very extreme conditions. I think the people suggesting this think that we can achieve the same performance without it. I don't think we can.

So the question is: should we tune routing to do 120 kpps regardless of input, or have a performance span of 112-420 kpps (numbers from my tests), where most of the time we are close to the higher limit?

Anyway, fib_lookup seems to be something to look into regardless of this question.

Cheers.
--ro From jes@trained-monkey.org Thu Jun 12 10:01:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 10:01:31 -0700 (PDT) Received: from trained-monkey.org (trained-monkey.org [209.217.122.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5CH1M2x025389 for ; Thu, 12 Jun 2003 10:01:23 -0700 Received: from trained-monkey.org (trained-monkey.org [127.0.0.1]) by trained-monkey.org (8.12.9/8.12.8) with ESMTP id h5CH1PTU006541; Thu, 12 Jun 2003 13:01:25 -0400 Received: (from jes@localhost) by trained-monkey.org (8.12.9/8.12.9/Submit) id h5CH1Ode006537; Thu, 12 Jun 2003 13:01:24 -0400 To: "David S. Miller" Cc: shemminger@osdl.org, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev References: <20030611144249.7cd63c1c.shemminger@osdl.org> <20030611.210445.21901735.davem@redhat.com> From: Jes Sorensen Date: 12 Jun 2003 13:01:23 -0400 In-Reply-To: <20030611.210445.21901735.davem@redhat.com> Message-ID: Lines: 41 User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3185 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jes@trained-monkey.org Precedence: bulk X-list: netdev >>>>> "David" == David S Miller writes: David> How actively are you maintaining acenic. Jes? :-) This is a David> very serious question, I haven't seen a 2.5.x change go back to David> 2.4.x since it's inception. Hi David, I clearly haven't been doing the job on acenic recently as I would have liked nor on 2.5 for that sake, so I don't see all the changes that just go into 2.5. David> All this compat nonsense is becoming useless. Other drivers David> fair just fine 2.4.x/2.5.x without all this ifdef mumbo-jumbo David> that litters acenic.c and makes it nearly impossible to read. Depends on how you look at it. 
First of all, the primary goal of the macros is not to make it easier to
integrate the driver with the latest state-of-the-art 2.4.x kernel from
Marcelo, but rather to make it possible for people to take the driver
and drop it into an earlier kernel they are running, upgrading only the
driver. There have been quite a lot of acenic users in the past who were
not willing to upgrade their kernels for various reasons and who relied
on this. Putting the compat macros into include/linux/interrupt.h in
Marcelo's tree, as you suggested in a later email, won't solve this
specific problem.

Nowadays it's probably reasonable to assume that the majority of users
are at 2.4.17+, so I think it's valid to go in and get rid of some of
the compat macros that are there to support kernels older than that.

David> In fact all these localized compat macros make acenic.c HARDER
David> to maintain.

I think we will just have to agree to disagree on this. I find it a lot
easier to read the code when it uses the 2.5 syntax and provides 2.4
compat macros than when it has a ton of #ifdef's throughout the code
itself.

Cheers,
Jes

From jes@trained-monkey.org Thu Jun 12 10:05:12 2003
To: Stephen Hemminger
Cc: "David S. Miller", jgarzik@pobox.com, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] acenic -- update to use alloc_etherdev
References: <20030611144249.7cd63c1c.shemminger@osdl.org> <20030611.210445.21901735.davem@redhat.com> <3EE81263.4040205@osdl.org>
From: Jes Sorensen
Date: 12 Jun 2003 13:05:11 -0400
In-Reply-To: <3EE81263.4040205@osdl.org>

>>>>> "Stephen" == Stephen Hemminger writes:

Stephen> The funny thing is this alloc_etherdev patch did not change
Stephen> the compatibility one bit. Just for grins, took the 2.5
Stephen> driver back into the 2.4.18 and it doesn't build. The
Stephen> problem is it doesn't know what irqreturn_t is. The enclosed
Stephen> cribbed from atm/he.c fixes that, but it still redefines
Stephen> local_irq_save etc.

Hi Stephen,

I went back and looked at the comment in 2.4 for when alloc_etherdev was
introduced, but clearly I got the mapping of this to the 2.4.x release
dates wrong. That was my bad.

As for local_irq_save(), those patches will still be needed if it is not
present in 2.4.17 (I think this is probably a reasonable cut-off
release); if it's in 2.4.17+, I'll agree we can pull it. But the
irqreturn_t compat stuff still needs to go in, since it's clearly not
going to be present in 2.4.18 etc.
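
For readers unfamiliar with the irqreturn_t issue, here is a minimal sketch of the kind of driver-local compat shim under discussion. It is not the code Stephen enclosed and not what acenic.c carries; it assumes the common convention of keying on IRQ_HANDLED, and it runs in user space with a simplified handler signature. The names toy_nic, toy_interrupt and toy_fire are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed shim: where the kernel headers already supply irqreturn_t and
 * IRQ_HANDLED (2.5), this block is skipped. Elsewhere the handler
 * quietly becomes a void function again, as in 2.4.
 */
#ifndef IRQ_HANDLED
typedef void irqreturn_t;
#define IRQ_NONE
#define IRQ_HANDLED
#define IRQ_RETVAL(x)
#endif

/* Hypothetical device state standing in for a driver's private data. */
struct toy_nic {
	unsigned int pending;	/* non-zero when the "hardware" raised an event */
	unsigned int handled;	/* number of events serviced so far */
};

/* Written once in 2.5 style; compiles against either definition. */
static irqreturn_t toy_interrupt(int irq, void *dev_id)
{
	struct toy_nic *nic = dev_id;

	(void)irq;
	if (!nic->pending)
		return IRQ_NONE;	/* a bare "return;" under the shim */

	nic->pending = 0;
	nic->handled++;
	return IRQ_HANDLED;
}

/* Raise one simulated interrupt and report the running total. */
static unsigned int toy_fire(struct toy_nic *nic)
{
	assert(nic != NULL);
	nic->pending = 1;
	toy_interrupt(0, nic);
	return nic->handled;
}
```

The design point is the one Jes argues for: the handler body uses the 2.5 spelling everywhere, and the only conditional code is the small block at the top.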
Cheers,
Jes

From garzik@gtf.org Thu Jun 12 12:49:31 2003
Date: Thu, 12 Jun 2003 15:49:26 -0400
From: Jeff Garzik
To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: [PATCHES] 2.4.x net driver updates
Message-ID: <20030612194926.GA7653@gtf.org>

BK users may issue a

	bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4

Others may download the patch from

ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-rc8-netdrvr2.patch.bz2

This will update the following files:

 drivers/net/bonding.c                | 3434 -------------------------
 Documentation/Configure.help         |    9
 Documentation/networking/bonding.txt |  537 ++-
 Documentation/networking/ifenslave.c |  496 ++-
 drivers/net/8139cp.c                 |    9
 drivers/net/8139too.c                |    6
 drivers/net/Config.in                |    3
 drivers/net/Makefile                 |    8
 drivers/net/amd8111e.c               | 1075 +++++--
 drivers/net/amd8111e.h               |  968 +++----
 drivers/net/arcnet/arcnet.c          |    2
 drivers/net/arcnet/rfc1201.c         |    6
 drivers/net/bonding.c                |  266 +
 drivers/net/bonding/Makefile         |   18
 drivers/net/bonding/bond_3ad.c       | 2667 ++++++++++++++++++-
 drivers/net/bonding/bond_3ad.h       |  342 ++
 drivers/net/bonding/bond_alb.c       | 1585 +++++++++++
 drivers/net/bonding/bond_alb.h       |  129
 drivers/net/bonding/bond_main.c      | 4795 ++++++++++++++++++++++++++++++++---
 drivers/net/bonding/bonding.h        |  209 +
 drivers/net/dl2k.h                   |    1
 drivers/net/e100/e100.h              |   30
 drivers/net/e100/e100_main.c         |  389 ++
 drivers/net/e100/e100_phy.c          |    7
 drivers/net/e100/e100_test.c         |  155 -
 drivers/net/e1000/Makefile           |    2
 drivers/net/e1000/e1000.h            |    8
 drivers/net/e1000/e1000_ethtool.c    |  959 ++++++-
 drivers/net/e1000/e1000_hw.c         |   23
 drivers/net/e1000/e1000_hw.h         |    8
 drivers/net/e1000/e1000_main.c       |  312 +-
 drivers/net/e1000/e1000_osdep.h      |    2
 drivers/net/eepro.c                  |    2
 drivers/net/ns83820.c                |    2
 drivers/net/pci-skeleton.c           |    4
 drivers/net/pcnet32.c                |    7
 drivers/net/r8169.c                  |   52
 drivers/net/sk98lin/skge.c           |    2
 drivers/net/sundance.c               |  144 -
 drivers/net/tg3.c                    |    2
 drivers/net/tlan.c                   |  258 +
 drivers/net/tlan.h                   |    7
 drivers/net/tokenring/olympic.c      |    3
 drivers/net/tulip/tulip_core.c       |    7
 drivers/net/typhoon.c                |    4
 drivers/net/via-rhine.c              |    2
 drivers/net/wireless/airo.c          |    2
 include/linux/ethtool.h              |   27
 include/linux/if_arcnet.h            |    4
 include/linux/if_bonding.h           |  101
 include/linux/if_vlan.h              |    1
 include/linux/skbuff.h               |    4
 include/net/if_inet6.h               |    5
 include/net/irda/irlan_common.h      |    2
 net/core/dev.c                       |    4
 net/core/skbuff.c                    |    3
 net/ipv6/addrconf.c                  |   13
 net/ipv6/ndisc.c                     |    3
 net/irda/irlan/irlan_eth.c           |    6
 59 files changed, 13293 insertions(+), 5838 deletions(-)

through these ChangeSets:

(03/06/08 1.1226)
   [netdrvr amd8111e] bug fix: move stats update after irq free

(03/06/08 1.1225)
   [e1000] Whitespace cleanup

   * Whitespace cleanup

(03/06/08 1.1224)
   [e1000] Miscellaneous code cleanup

   * Added Change Log entries
   * Miscellaneous code cleanup

(03/06/08 1.1223)
   [e1000] Fixed LED coloring on 82541/82547 controllers

   * LED colors on 82541 and 82547 controllers were incorrect. The LED
     mode register didn't have the proper configuration.

(03/06/08 1.1222)
   [e1000] Removed strong branded device ids

   * Removed strong branded device ids from the device id table along
     with the associated branding strings.

(03/06/08 1.1221)
   [e1000] Added support for 82546 Quad-port adapter

   * Added support for 82546 Quad-port adapter

(03/06/08 1.1220)
   [e1000] Added ethtool test ioctl

   * Added routines for the Ethtool Test ioctl.
   * Added more statistics for the Ethtool statistics dump.
   * Added more registers for the register dump.

(03/06/08 1.1219)
   [e1000] TSO fix

   * Premature write-back of descriptors during TSO causing resources
     to be returned too early on ppc64. Fix is to wait until last
     descriptor of frame is written back, then return resources back
     to OS.
   * 82544 hang caused by setting RS bit in context descriptor. Exposes
     known hang in 82544. Fix is same as above - set RS bit only in
     last descriptor.

(03/06/08 1.1218)
   [e100] misc

   * Removed leftovers from removal of /proc support and IDIAG support
   * Cleaned up reporting of h/w init failure messages
   * Add 1/2 second delay after PHY reset to allow link partner to see
     and respond to reset, per IEEE 802.3.

(03/06/08 1.1217)
   [e100] set netdev members before registration

   * Bug fix: set netdev members before netdev registration to avoid
     races.

(03/06/08 1.1216)
   [e100] use skb_headlen() rather than rolling own.

   * Cleanup: use skb_headlen() rather than rolling own. Sync w/ 2.5
     driver.

(03/06/08 1.1215)
   [e100] VLAN configuration was lost after ethtool diags run

   * Bug fix: ethtool diags would call e100_up/e100_down, which
     overwrite current VLAN settings. Move initialization of config
     regs out of up/down.

(03/06/08 1.1214)
   [e100] fixed stalled stats collection

   * Bug fix: In the rare event of a failed command to dump stats, stat
     collection would stop, giving the illusion that traffic had
     stopped. Fixed by issuing stat dump in watchdog regardless of the
     status of previous attempt to dump stats.

(03/06/08 1.1213)
   [e100] full stop/start on ethtool set speed/duplex/autoneg

   * Cleanup ethtool/mii_ioctl sets of speed/duplex/autoneg by
     stop/set/start driver to ensure sets stick. Must hold xmit_lock
     around stop/start.

(03/06/08 1.1212)
   [e100] cleanup Tx resources before running ethtool diags

   * Bug fix: clean up Tx resources before running ethtool diags.

(03/06/08 1.1211)
   [e100] Add MDI/MDI-X status to ethtool reg dump

   * Add MDI/MDI-X (crossover cable) status to ethtool reg dump.

(03/06/08 1.1210)
   [e100] Add ethtool cable diag test

   * Feature add: ethtool cable diag test.
   * Some cleanup of the ethtool diags.
   * Fixed bug in return code for ethtool diag results.

(03/06/08 1.1209)
   [e100] Add ethtool parameter support

   * Feature add: ethtool parameter support: Tx/Rx ring size, Rx xsum
     offloading, flow control.

(03/06/08 1.1208)
   [e100] move e100_asf_enable under CONFIG_PM to avoid warning

   * Bug fix: move e100_asf_enable under CONFIG_PM to avoid compile
     warning. [Stephen Rothwell (sfr@canb.auug.org.ua)]

(03/06/08 1.1207)
   [e100] Remove "Freeing alive device" warning

   * Bug fix: don't call any netif_carrier_* until netdev is
     registered. [Andrew Morton (akpm@dideo.com)]

(03/06/06 1.1205)
   [PATCH] Bonding 2.4 update patch 6

   Fix to the ifenslave -c fix, fix to version control (plus change log
   update). I've got an additional fix for version control that I'll
   send you on Monday.

   Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c

(03/06/06 1.1204)
   [PATCH] Bonding 2.4 update patch 5

   Fix to prevent routes on the bonding device from being lost during
   enslavement processing.

   Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c

(03/06/06 1.1203)
   [PATCH] Bonding 2.4 update patch 4

   A fix for ifenslave -c. Later patches have fixes for this fix.

   Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c

(03/06/06 1.1202)
   [PATCH] Bonding 2.4 update patch 3

   A patch with some miscellaneous little stuff (comments, mode names,
   fix a printk).

   Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c

(03/06/06 1.1201)
   [PATCH] Bonding 2.4 update patch 2

   Small patch to fix endless failover problem in the ARP monitor.

   Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c

(03/06/06 1.1200)
   [PATCH] Bonding 2.4 update patch 1

   Documentation.

   Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/bonding.txt

(03/06/06 1.1199)
   [PATCH] remove ethtool privileged references

   dev_ioctl already checks capable(CAP_NET_ADMIN) for SIOCETHTOOL, so
   privileged references are not necessary.

(03/06/06 1.1198)
   [PATCH] 10GbE ethtool support

   Add 10GbE support for ethtool.

(03/06/05 1.1197)
   [netdrvr amd8111e] link against mii lib

(03/06/04 1.1196)
   [netdrvr] gcc 3.3 cleanups

   Mostly marking 64-bit constants as ULL.

(03/05/29 1.1185.1.52)
   [netdrvr amd8111e] remove out-of-tree feature that snuck in

(03/05/29 1.1185.1.51)
   [netdrvr amd8111e] interrupt coalescing, libmii, bug fixes

   * Dynamic interrupt coalescing
   * mii lib support
   * dynamic IPG support (disabled by default)
   * jumbo frame fix
   * vlan fix
   * rx irq coalescing fix

(03/05/29 1.1185.1.50)
   [netdrvr tlan] fix 64-bit issues

(03/05/29 1.1185.1.49)
   [netdrvr r8169] sync with 2.5 (backport whitespace cleanups)

(03/05/29 1.1185.1.48)
   [netdrvr r8169] use alloc_etherdev (fix race), pci_disable_device

(03/05/29 1.1185.1.47)
   [netdrvr olympic] fix build with gcc 3.3

(03/05/29 1.1185.6.3)
   [netdrvr 8139too] add comment, whitespace cleanup

(03/05/28 1.1185.6.2)
   [netdrvr] s/init_etherdev/alloc_etherdev/ in code comments,
   in 8139too and pci-skeleton drivers.

(03/05/28 1.1185.6.1)
   [netdrvr tlan] backport fixes and cleanups from 2.5

   * alloc_etherdev (fixes race)
   * PCI DMA API
   * C99 initializers
   * speling fixes
   * use pci_{request,release}_regions for PCI devices
   * propagate error returns back from pci_xxx functions
   * call pci_set_dma_mask
   * use keventd for adapter error reset (2.5 uses workqueue)

(03/05/27 1.1185.1.45)
   [netdrvr pcnet32] bug fixes

   I would like to see a couple of the pcnet32 changes that I think we
   can agree on be put into the trees so a couple of the potential
   defects can be avoided. The following patch contains just these
   pieces.

   The only controversial one is an arbitrary change in the number of
   iterations in a while loop spinning on hardware state. No matter how
   this is done, I am not especially fond of this bit of code as it has
   no reasonable error recovery path -- however, as a half-way,
   incremental solution, increasing the polling time should help as the
   100 value was certainly found to be insufficient. 1000 may not be
   sufficient either, but it is certainly no worse.

   Both of the other changes were hit in testing (and I believe the
   wmb() at a customer even), so it would help reduce some debug if
   these go in.

   Any feedback is appreciated - thanks.

(03/05/27 1.1185.1.44)
   [netdrvr eepro] update MODULE_AUTHOR per old-author request

(03/05/27 1.1185.1.43)
   [netdrvr sundance] fix another flow control bug

(03/05/27 1.1185.1.42)
   [netdrvr sundance] fix flow control bug

(03/05/27 1.1185.1.41)
   [netdrvr bonding] fix ABI version control problem

   This fix makes bonding not commit to a specific ABI version if the
   ioctl command is not supported by bonding.

   (It also removes the '\n' in the continuous printk reporting the
   link down event in bond_mii_monitor - it got in there by mistake in
   our previous patch set and caused log messages to appear funny in
   some situations).

(03/05/27 1.1185.1.40)
   [netdrvr bonding] fix long failover in 802.3ad mode

   This patch fixes the bug reported by Jay on April 3rd regarding long
   failover time when releasing the last slave in the active
   aggregator. The fix, as suggested by Jay, is to follow the spec
   recommendation and send a LACPDU to the partner saying this port is
   no longer aggregatable and therefore trigger an immediate
   re-selection of a new aggregator instead of waiting the entire
   expiration timeout.

(03/05/25 1.1185.1.39)
   IPv6 over ARCnet (RFC2497) support, IPv6 part.

(03/05/25 1.1185.1.38)
   IPv6 over ARCnet (RFC2497) support, driver part

(03/05/25 1.1185.1.37)
   [irda] module refcounts for irlan

(03/05/23 1.1185.3.7)
   [bonding] small cleanups

(03/05/23 1.1185.3.6)
   [bonding] add rcv load balancing mode

   This patch adds a new mode that enables receive load balancing for
   IPv4 traffic on top of the transmit load balancing mode. This
   capability is achieved by intercepting and manipulating the ARP
   negotiation to teach clients several MAC addresses for the bond and
   thus distribute incoming traffic among all slaves with the highest
   link speed.

   In order to function properly, slaves are required to be able to
   have their MAC address set even while the interface is up, since
   once the primary slave loses its link, the new primary slave (and
   only it) must be able to take over and receive the incoming traffic
   instead. If a non-primary slave loses its link, ARP packets will be
   sent to all clients communicating through it in order to teach them
   a replacement MAC address, and the primary slave will be put in
   promiscuous mode for 10 seconds for fault tolerance reasons.

   This patch is against bonding-20030415, but must come only after the
   locking scheme changing patch since it uses dev_set_promiscuity()
   that would otherwise cause a system hang.

(03/05/23 1.1185.3.5)
   [bonding] support xmit load balancing mode

(03/05/23 1.1185.3.4)
   [bonding] much improved locking

   This patch replaces the use of lock_irqsave/unlock_irqrestore in
   bonding with lock/unlock or lock_bh/unlock_bh as appropriate
   according to context. This change is based on a previous discussion
   regarding the fact that holding a lock_irqsave doesn't prevent
   softirqs from running which can cause deadlocks in certain
   situations. This new locking scheme has already undergone a massive
   testing cycle by our QA group and we feel it is ready for release
   (some new modes and enhancements will not work properly without it).

(03/05/23 1.1185.3.3)
   [bonding] better 802.3ad mode control, some cleanup

   This patch adds the lacp_rate module param to enable better control
   over the IEEE 802.3ad mode. This param controls the rate at which
   the partner system is asked to send LACPDUs to bonding. Two options
   exist:

   - slow (or 0) - LACPDUs are 30 seconds apart
   - fast (or 1) - LACPDUs are 1 second apart

   The default is slow (like most switches around).

   There are also some code beautifications (mainly converting comments
   to C style in code segments we added in the past).

(03/05/23 1.1185.3.2)
   [bonding] ABI versioning

   This patch adds user-land to kernel ABI version control in bonding
   to restore backward compatibility between different versions of
   ifenslave and the bonding module. It uses ethtool's GDRVINFO ioctl
   to pass the ABI version number between ifenslave and the bonding
   module in both directions so both the driver and the application can
   tell which partner they're working against and take the appropriate
   measures when enslaving/releasing an interface.

   The bonding module remembers the ABI version received from the
   application, and from that moment on will deny enslave and release
   commands from an application using a different ABI version, which
   means that if you want to switch to an ifenslave with a different
   ABI version (or with none at all), you'll first have to re-load the
   bonding module.

   This patch also changes the driver/application versioning scheme to
   contain 3 fields X.Y.Z with the following meaning:

   X - Major version - big behavior changes
   Y - Minor version - addition of features
   Z - Extra version - minor changes and bug fixes

   There are also three minor bug fixes:

   1. Prevent enslaving an interface that is already a slave.
   2. Prevent enslaving if the bond is down.
   3. In bond_release_all, save old value of current_slave before
      assigning NULL to it to enable using its original value later
      on.

   This patch is against bonding-20030415.

(03/04/27 1.1137.1.6)
   [netdrvr e1000] add TSO support -- disabled

   * Copy TSO support from 2.5 e1000. Wrapped with NETIF_F_TSO, so not
     currently enabled in 2.4. Done to keep 2.4 and 2.5 drivers in-sync
     as much as possible.

(03/04/27 1.1137.1.5)
   [netdrvr e1000] add support for NAPI

   * Copy NAPI support from 2.5 e1000 driver
   * Add CONFIG_E1000_NAPI option

(03/04/27 1.1137.1.4)
   [netdrvr tulip] support DM910x chip from ALi

(03/04/27 1.1137.1.3)
   Remove duplicate CONFIG_TULIP_MWI entry in Configure.help

   Noticed by Geert Uytterhoeven

(03/04/27 1.1137.1.2)
   [netdrvr 8139cp] enable MWI via pci_set_mwi, rather than manually

(03/04/26 1.1131.2.6)
   [netdrvr typhoon] s/#if/#ifdef/ for a CONFIG_ var

(03/04/25 1.1131.2.5)
   [netdrvr sundance] small cleanups from 2.5

   - s/long flag/unsigned long flag/
   - C99 initializers

(03/04/25 1.1131.2.4)
   [netdrvr sundance] bug fixes, VLAN support

   - Fix tx bugs in big-endian machines
   - Remove unused max_interrupt_work module parameter, the new
     NAPI-like rx scheme doesn't need it.
   - Remove redundant get_stats() in intr_handler(); those I/O accesses
     could affect performance on ARM-based systems
   - Add Linux software VLAN support
   - Fix bug of custom mac address (StationAddr register only accepts
     word writes)

(03/04/25 1.1131.2.3)
   [netdrvr via-rhine] fix promisc mode

   I found a via-rhine bug: it can't receive BPDUs (mac: 0180c2000000)
   in promiscuous mode. Fill the hash table with all 1s to fix this
   problem in promiscuous mode. (RCR remains 0x1c; writing it as 0x1f
   doesn't work.)

(03/04/25 1.1131.2.2)
   [wireless airo] fix end-of-array test

   FYI statsLabels[] is an array of char*, so the fix below is pretty
   obvious.

(03/04/25 1.1131.2.1)
   [PATCH] fix .text.exit error in drivers/net/r8169.c

   In drivers/net/r8169.c the function rtl8169_remove_one is __devexit
   but the pointer to it didn't use __devexit_p, resulting in a
   .text.exit compile error when !CONFIG_HOTPLUG.

   The fix is simple:

(03/04/17 1.1101.8.7)
   [bonding] add support for IEEE 802.3ad Dynamic link aggregation

   Contributed by Shmulik Hen @ Intel, merge by Jay Vosburgh @ IBM

(03/04/17 1.1101.8.6)
   [bonding] move private decls into new drv/net/bonding/bonding.h file

(03/04/17 1.1101.8.5)
   [bonding] move driver into new drivers/net/bonding directory

(03/04/17 1.1101.8.4)
   [bonding] Moved setting slave mac addr, and open, from app to the driver

   This patch enables support of modes that need to use the unique mac
   address of each slave. It moves setting the slave's mac address and
   opening it from the application to the driver. This breaks backward
   compatibility between the new driver and older applications !

   It also blocks possibility of enslaving before the master is up (to
   prevent putting the system in an unstable state), and removes the
   code that unconditionally restores all base driver's flags (flags
   are automatically restored once all undo stages are done in proper
   order).

   Contributed by Shmulik Hen @ Intel

(03/04/17 1.1101.8.3)
   [bonding] add support for getting slave's speed and duplex via ethtool

   Contributed by Shmulik Hen @ Intel

(03/04/17 1.1101.8.2)
   [bonding] fix comment to prevent future merge difficulties

   Contributed by Jay Vosburgh @ IBM

(03/04/17 1.1101.8.1)
   [net] store physical device a packet arrives in on

   (Needed for bonding)

   Contributed by Jay Vosburgh @ IBM, Shmulik Hen @ Intel, and others.

From davem@redhat.com Thu Jun 12 14:39:27 2003
Date: Thu, 12 Jun 2003 14:35:40 -0700 (PDT)
Message-Id: <20030612.143540.41663883.davem@redhat.com>
To: Robert.Olsson@data.slu.se
Cc: ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <16104.34463.60472.750699@robur.slu.se>
References: <16102.9418.43884.336925@robur.slu.se> <20030611.234534.52193216.davem@redhat.com> <16104.34463.60472.750699@robur.slu.se>

   From: Robert Olsson
   Date: Thu, 12 Jun 2003 15:56:47 +0200

   David S. Miller writes:

    > That is a lot more work, but it would validate the various ideas and
    > claims being made. For example, it would say for sure whether
    > eliminating the routing cache is a win or not for DoS traffic.

   Well, it's true. But do we need to? From the profile we can actually
   see the 'dst' management cost. It is not much, I would say, and we
   would still need some administration, i.e. refcounting, even if we
   removed the route hash.

But Robert, do you know "why" the dst management doesn't show up in your
profiles when you rip out the rtcache?

It's because the total number of DST entries is so small that they all
fit in the cpu cache.

When the rtcache is enabled and we thus have up to "max_size" DST
entries in flight at all times, the dst management routines show up
very clearly because they have a high probability of missing the cpu
cache.

In particular, have a good look at Simon's profiles. dst_alloc() is
quite near the top there.

From shemminger@osdl.org Thu Jun 12 15:05:25 2003
Date: Thu, 12 Jun 2003 15:05:11 -0700
From: Stephen Hemminger
To: "David S. Miller", Jeff Garzik
Cc: netdev@oss.sgi.com
Subject: [PATCH] dynamic allocation for dummy net device
Message-Id: <20030612150511.7ed28548.shemminger@osdl.org>

Change the dummy driver to dynamically allocate its net_device using
alloc_netdev. This prevents problems later when network device objects
persist after unregister.
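
As a rough illustration of the allocation pattern the patch below moves to, here is a user-space sketch, not the kernel's alloc_netdev itself. It assumes the helper places the private area directly after the device structure in one zeroed block, so a single free on the error path releases both. Every toy_* name and the simulate_failure parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NAMSIZ 16

/* Hypothetical stand-in for struct net_device. */
struct toy_dev {
	char name[TOY_NAMSIZ];
	void *priv;		/* points into the same allocation */
	int flags;
};

/* Hypothetical private data, like the stats block in the dummy driver. */
struct toy_stats {
	unsigned long tx_packets;
};

/* One zeroed allocation holds the device followed by its private area. */
static struct toy_dev *toy_alloc_dev(size_t sizeof_priv, const char *name,
				     void (*setup)(struct toy_dev *))
{
	/* Round the device size up so the private area stays aligned. */
	size_t base = (sizeof(struct toy_dev) + 31) & ~(size_t)31;
	struct toy_dev *dev;

	assert(setup != NULL);
	dev = calloc(1, base + sizeof_priv);
	if (!dev)
		return NULL;
	if (sizeof_priv)
		dev->priv = (char *)dev + base;
	setup(dev);
	/* calloc zeroed the buffer, so the last byte stays a terminator. */
	strncpy(dev->name, name, TOY_NAMSIZ - 1);
	return dev;
}

static void toy_setup(struct toy_dev *dev)
{
	dev->flags = 0x80;	/* arbitrary marker showing setup ran */
}

/* Pretend registration; fails on request to exercise the error path. */
static int toy_register(struct toy_dev *dev, int simulate_failure)
{
	(void)dev;
	return simulate_failure ? -1 : 0;
}

static struct toy_dev *toy_create(int simulate_failure)
{
	struct toy_dev *dev = toy_alloc_dev(sizeof(struct toy_stats),
					    "dummy0", toy_setup);

	if (!dev)
		return NULL;
	if (toy_register(dev, simulate_failure) != 0) {
		free(dev);	/* releases device and private area together */
		return NULL;
	}
	return dev;
}
```

Compared with a statically allocated device plus a separately kmalloc'ed priv, there is one object with one lifetime, which is the property the patch description relies on.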
--- linux-2.5/drivers/net/dummy.c 2003-06-12 13:35:41.000000000 -0700 +++ linux-2.5-dyn/drivers/net/dummy.c 2003-06-12 15:00:15.000000000 -0700 @@ -51,15 +51,9 @@ static int dummy_accept_fastpath(struct } #endif -static int __init dummy_init(struct net_device *dev) +static void __init dummy_setup(struct net_device *dev) { /* Initialize the device structure. */ - - dev->priv = kmalloc(sizeof(struct net_device_stats), GFP_KERNEL); - if (dev->priv == NULL) - return -ENOMEM; - memset(dev->priv, 0, sizeof(struct net_device_stats)); - dev->get_stats = dummy_get_stats; dev->hard_start_xmit = dummy_xmit; dev->set_multicast_list = set_multicast_list; @@ -72,8 +66,7 @@ static int __init dummy_init(struct net_ dev->tx_queue_len = 0; dev->flags |= IFF_NOARP; dev->flags &= ~IFF_MULTICAST; - - return 0; + SET_MODULE_OWNER(dev); } static int dummy_xmit(struct sk_buff *skb, struct net_device *dev) @@ -92,32 +85,30 @@ static struct net_device_stats *dummy_ge return dev->priv; } -static struct net_device dev_dummy; +static struct net_device *dev_dummy; static int __init dummy_init_module(void) { int err; - dev_dummy.init = dummy_init; - SET_MODULE_OWNER(&dev_dummy); + dev_dummy = alloc_netdev(sizeof(struct net_device_stats), + "dummy%d", dummy_setup); - /* Find a name for this unit */ - err=dev_alloc_name(&dev_dummy,"dummy%d"); - if(err<0) - return err; - err = register_netdev(&dev_dummy); - if (err<0) - return err; - return 0; + if (!dev_dummy) + return -ENOMEM; + + if ((err = register_netdev(dev_dummy))) { + kfree(dev_dummy); + dev_dummy = NULL; + } + return err; } static void __exit dummy_cleanup_module(void) { - unregister_netdev(&dev_dummy); - kfree(dev_dummy.priv); - - memset(&dev_dummy, 0, sizeof(dev_dummy)); - dev_dummy.init = dummy_init; + unregister_netdev(dev_dummy); + kfree(dev_dummy); + dev_dummy = NULL; } module_init(dummy_init_module); From shemminger@osdl.org Thu Jun 12 15:20:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 15:20:21 -0700 (PDT) 
Date: Thu, 12 Jun 2003 15:20:08 -0700
From: Stephen Hemminger
To: netdev@oss.sgi.com
Subject: [PATCH] IPV6 tunnel (sit) using alloc_netdev
Message-Id: <20030612152008.1bf2c9e1.shemminger@osdl.org>

Change the IPv6 tunnel (sit) pseudo-device to allocate its net_device
dynamically using alloc_netdev. This prevents problems with later
changes where all net_devices need to be dynamically allocated.

Tested by starting, configuring sit0, and adding/deleting sit1, but I
can't really test unregister because IPv6 always seems to hold
refcounts.
--- linux-2.5/net/ipv6/sit.c 2003-06-05 10:04:53.000000000 -0700 +++ linux-2.5-dyn/net/ipv6/sit.c 2003-06-12 13:43:10.000000000 -0700 @@ -61,16 +61,9 @@ static int ipip6_fb_tunnel_init(struct net_device *dev); static int ipip6_tunnel_init(struct net_device *dev); +static void ipip6_tunnel_setup(struct net_device *dev); -static struct net_device ipip6_fb_tunnel_dev = { - .name = "sit0", - .init = ipip6_fb_tunnel_init -}; - -static struct ip_tunnel ipip6_fb_tunnel = { - .dev = &ipip6_fb_tunnel_dev, - .parms = { .name = "sit0" } -}; +static struct net_device *ipip6_fb_tunnel_dev; static struct ip_tunnel *tunnels_r_l[HASH_SIZE]; static struct ip_tunnel *tunnels_r[HASH_SIZE]; @@ -154,6 +147,7 @@ static struct ip_tunnel * ipip6_tunnel_l struct net_device *dev; unsigned h = 0; int prio = 0; + char name[IFNAMSIZ]; if (remote) { prio |= 2; @@ -168,54 +162,47 @@ static struct ip_tunnel * ipip6_tunnel_l return t; } if (!create) - return NULL; - - dev = kmalloc(sizeof(*dev) + sizeof(*t), GFP_KERNEL); - if (dev == NULL) - return NULL; + goto failed; - memset(dev, 0, sizeof(*dev) + sizeof(*t)); - dev->priv = (void*)(dev+1); - nt = (struct ip_tunnel*)dev->priv; - nt->dev = dev; - dev->init = ipip6_tunnel_init; - memcpy(&nt->parms, parms, sizeof(*parms)); - nt->parms.name[IFNAMSIZ-1] = '\0'; - strcpy(dev->name, nt->parms.name); - if (dev->name[0] == 0) { + if (parms->name[0]) + strlcpy(name, parms->name, IFNAMSIZ); + else { int i; for (i=1; i<100; i++) { - sprintf(dev->name, "sit%d", i); - if (__dev_get_by_name(dev->name) == NULL) + sprintf(name, "sit%d", i); + if (__dev_get_by_name(name) == NULL) break; } if (i==100) goto failed; - memcpy(nt->parms.name, dev->name, IFNAMSIZ); } - SET_MODULE_OWNER(dev); - if (register_netdevice(dev) < 0) + + dev = alloc_netdev(sizeof(*t), name, ipip6_tunnel_setup); + if (dev == NULL) + return NULL; + + nt = dev->priv; + dev->init = ipip6_tunnel_init; + nt->parms = *parms; + + if (register_netdevice(dev) < 0) { + kfree(dev); goto failed; + } 
dev_hold(dev); + ipip6_tunnel_link(nt); /* Do not decrement MOD_USE_COUNT here. */ return nt; failed: - kfree(dev); return NULL; } -static void ipip6_tunnel_destructor(struct net_device *dev) -{ - if (dev != &ipip6_fb_tunnel_dev) - kfree(dev); -} - static void ipip6_tunnel_uninit(struct net_device *dev) { - if (dev == &ipip6_fb_tunnel_dev) { + if (dev == ipip6_fb_tunnel_dev) { write_lock_bh(&ipip6_lock); tunnels_wc[0] = NULL; write_unlock_bh(&ipip6_lock); @@ -621,7 +608,7 @@ ipip6_tunnel_ioctl (struct net_device *d switch (cmd) { case SIOCGETTUNNEL: t = NULL; - if (dev == &ipip6_fb_tunnel_dev) { + if (dev == ipip6_fb_tunnel_dev) { if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) { err = -EFAULT; break; @@ -654,8 +641,7 @@ ipip6_tunnel_ioctl (struct net_device *d t = ipip6_tunnel_locate(&p, cmd == SIOCADDTUNNEL); - if (dev != &ipip6_fb_tunnel_dev && cmd == SIOCCHGTUNNEL && - t != &ipip6_fb_tunnel) { + if (dev != ipip6_fb_tunnel_dev && cmd == SIOCCHGTUNNEL) { if (t != NULL) { if (t->dev != dev) { err = -EEXIST; @@ -695,7 +681,7 @@ ipip6_tunnel_ioctl (struct net_device *d if (!capable(CAP_NET_ADMIN)) goto done; - if (dev == &ipip6_fb_tunnel_dev) { + if (dev == ipip6_fb_tunnel_dev) { err = -EFAULT; if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) goto done; @@ -703,7 +689,7 @@ ipip6_tunnel_ioctl (struct net_device *d if ((t = ipip6_tunnel_locate(&p, 0)) == NULL) goto done; err = -EPERM; - if (t == &ipip6_fb_tunnel) + if (t == ipip6_fb_tunnel_dev->priv) goto done; dev = t->dev; } @@ -731,12 +717,11 @@ static int ipip6_tunnel_change_mtu(struc return 0; } -static void ipip6_tunnel_init_gen(struct net_device *dev) +static void ipip6_tunnel_setup(struct net_device *dev) { - struct ip_tunnel *t = (struct ip_tunnel*)dev->priv; - - dev->destructor = ipip6_tunnel_destructor; + SET_MODULE_OWNER(dev); dev->uninit = ipip6_tunnel_uninit; + dev->destructor = (void (*)(struct net_device *))kfree; dev->hard_start_xmit = ipip6_tunnel_xmit; dev->get_stats = 
ipip6_tunnel_get_stats; dev->do_ioctl = ipip6_tunnel_ioctl; @@ -748,8 +733,6 @@ static void ipip6_tunnel_init_gen(struct dev->flags = IFF_NOARP; dev->iflink = 0; dev->addr_len = 4; - memcpy(dev->dev_addr, &t->parms.iph.saddr, 4); - memcpy(dev->broadcast, &t->parms.iph.daddr, 4); } static int ipip6_tunnel_init(struct net_device *dev) @@ -760,8 +743,9 @@ static int ipip6_tunnel_init(struct net_ tunnel = (struct ip_tunnel*)dev->priv; iph = &tunnel->parms.iph; - - ipip6_tunnel_init_gen(dev); + tunnel->dev = dev; + memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4); + memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4); if (iph->daddr) { struct flowi fl = { .nl_u = { .ip4_u = @@ -793,18 +777,16 @@ static int ipip6_tunnel_init(struct net_ int __init ipip6_fb_tunnel_init(struct net_device *dev) { - struct iphdr *iph; + struct ip_tunnel *tunnel = dev->priv; + struct iphdr *iph = &tunnel->parms.iph; - ipip6_tunnel_init_gen(dev); - - iph = &ipip6_fb_tunnel.parms.iph; iph->version = 4; iph->protocol = IPPROTO_IPV6; iph->ihl = 5; iph->ttl = 64; dev_hold(dev); - tunnels_wc[0] = &ipip6_fb_tunnel; + tunnels_wc[0] = tunnel; return 0; } @@ -817,12 +799,14 @@ static struct inet_protocol sit_protocol void sit_cleanup(void) { inet_del_protocol(&sit_protocol, IPPROTO_IPV6); - unregister_netdev(&ipip6_fb_tunnel_dev); + unregister_netdev(ipip6_fb_tunnel_dev); } #endif int __init sit_init(void) { + int err; + printk(KERN_INFO "IPv6 over IPv4 tunneling driver\n"); if (inet_add_protocol(&sit_protocol, IPPROTO_IPV6) < 0) { @@ -830,9 +814,22 @@ int __init sit_init(void) return -EAGAIN; } - ipip6_fb_tunnel_dev.priv = (void*)&ipip6_fb_tunnel; - strcpy(ipip6_fb_tunnel_dev.name, ipip6_fb_tunnel.parms.name); - SET_MODULE_OWNER(&ipip6_fb_tunnel_dev); - register_netdev(&ipip6_fb_tunnel_dev); - return 0; + ipip6_fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", + ipip6_tunnel_setup); + if (!ipip6_fb_tunnel_dev) { + err = -ENOMEM; + goto fail; + } + + ipip6_fb_tunnel_dev->init = 
ipip6_fb_tunnel_init; + + if ((err = register_netdev(ipip6_fb_tunnel_dev))) + goto fail; + + out: + return err; + fail: + inet_del_protocol(&sit_protocol, IPPROTO_IPV6); + kfree(ipip6_fb_tunnel_dev); + goto out; } From shemminger@osdl.org Thu Jun 12 15:27:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 15:27:49 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5CMRh2x007282 for ; Thu, 12 Jun 2003 15:27:43 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5CMRVX12505; Thu, 12 Jun 2003 15:27:31 -0700 Date: Thu, 12 Jun 2003 15:27:31 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] ipip tunnel dynamic net_device's Message-Id: <20030612152731.14d3f23d.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3191 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Convert ipip tunnel pseudo-driver to use dynamically allocated network devices. This depends on the alloc_netdev earlier patch. Tested by creating/deleting and unloading ipip tunnels. 
--- linux-2.5/net/ipv4/ipip.c 2003-06-05 10:04:53.000000000 -0700 +++ linux-2.5-dyn/net/ipv4/ipip.c 2003-06-12 14:02:34.000000000 -0700 @@ -122,16 +122,9 @@ static int ipip_fb_tunnel_init(struct net_device *dev); static int ipip_tunnel_init(struct net_device *dev); +static void ipip_tunnel_setup(struct net_device *dev); -static struct net_device ipip_fb_tunnel_dev = { - .name = "tunl0", - .init = ipip_fb_tunnel_init, -}; - -static struct ip_tunnel ipip_fb_tunnel = { - .dev = &ipip_fb_tunnel_dev, - .parms ={ .name = "tunl0", } -}; +static struct net_device *ipip_fb_tunnel_dev; static struct ip_tunnel *tunnels_r_l[HASH_SIZE]; static struct ip_tunnel *tunnels_r[HASH_SIZE]; @@ -216,6 +209,7 @@ static struct ip_tunnel * ipip_tunnel_lo struct net_device *dev; unsigned h = 0; int prio = 0; + char name[IFNAMSIZ]; if (remote) { prio |= 2; @@ -232,32 +226,33 @@ static struct ip_tunnel * ipip_tunnel_lo if (!create) return NULL; - dev = kmalloc(sizeof(*dev) + sizeof(*t), GFP_KERNEL); - if (dev == NULL) - return NULL; - - memset(dev, 0, sizeof(*dev) + sizeof(*t)); - dev->priv = (void*)(dev+1); - nt = (struct ip_tunnel*)dev->priv; - nt->dev = dev; - dev->init = ipip_tunnel_init; - memcpy(&nt->parms, parms, sizeof(*parms)); - nt->parms.name[IFNAMSIZ-1] = '\0'; - strcpy(dev->name, nt->parms.name); - if (dev->name[0] == 0) { + if (parms->name[0]) + strlcpy(name, parms->name, IFNAMSIZ); + else { int i; for (i=1; i<100; i++) { - sprintf(dev->name, "tunl%d", i); - if (__dev_get_by_name(dev->name) == NULL) + sprintf(name, "tunl%d", i); + if (__dev_get_by_name(name) == NULL) break; } if (i==100) goto failed; - memcpy(nt->parms.name, dev->name, IFNAMSIZ); } + + dev = alloc_netdev(sizeof(*t), name, ipip_tunnel_setup); + if (dev == NULL) + return NULL; + + nt = dev->priv; SET_MODULE_OWNER(dev); - if (register_netdevice(dev) < 0) + dev->init = ipip_tunnel_init; + dev->destructor = (void (*)(struct net_device *))kfree; + nt->parms = *parms; + + if (register_netdevice(dev) < 0) { + 
kfree(dev); goto failed; + } dev_hold(dev); ipip_tunnel_link(nt); @@ -265,19 +260,12 @@ static struct ip_tunnel * ipip_tunnel_lo return nt; failed: - kfree(dev); return NULL; } -static void ipip_tunnel_destructor(struct net_device *dev) -{ - if (dev != &ipip_fb_tunnel_dev) - kfree(dev); -} - static void ipip_tunnel_uninit(struct net_device *dev) { - if (dev == &ipip_fb_tunnel_dev) { + if (dev == ipip_fb_tunnel_dev) { write_lock_bh(&ipip_lock); tunnels_wc[0] = NULL; write_unlock_bh(&ipip_lock); @@ -682,7 +670,7 @@ ipip_tunnel_ioctl (struct net_device *de switch (cmd) { case SIOCGETTUNNEL: t = NULL; - if (dev == &ipip_fb_tunnel_dev) { + if (dev == ipip_fb_tunnel_dev) { if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) { err = -EFAULT; break; @@ -715,8 +703,7 @@ ipip_tunnel_ioctl (struct net_device *de t = ipip_tunnel_locate(&p, cmd == SIOCADDTUNNEL); - if (dev != &ipip_fb_tunnel_dev && cmd == SIOCCHGTUNNEL && - t != &ipip_fb_tunnel) { + if (dev != ipip_fb_tunnel_dev && cmd == SIOCCHGTUNNEL) { if (t != NULL) { if (t->dev != dev) { err = -EEXIST; @@ -757,7 +744,7 @@ ipip_tunnel_ioctl (struct net_device *de if (!capable(CAP_NET_ADMIN)) goto done; - if (dev == &ipip_fb_tunnel_dev) { + if (dev == ipip_fb_tunnel_dev) { err = -EFAULT; if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) goto done; @@ -765,7 +752,7 @@ ipip_tunnel_ioctl (struct net_device *de if ((t = ipip_tunnel_locate(&p, 0)) == NULL) goto done; err = -EPERM; - if (t == &ipip_fb_tunnel) + if (t->dev == ipip_fb_tunnel_dev) goto done; dev = t->dev; } @@ -793,12 +780,10 @@ static int ipip_tunnel_change_mtu(struct return 0; } -static void ipip_tunnel_init_gen(struct net_device *dev) +static void ipip_tunnel_setup(struct net_device *dev) { - struct ip_tunnel *t = (struct ip_tunnel*)dev->priv; - + SET_MODULE_OWNER(dev); dev->uninit = ipip_tunnel_uninit; - dev->destructor = ipip_tunnel_destructor; dev->hard_start_xmit = ipip_tunnel_xmit; dev->get_stats = ipip_tunnel_get_stats; dev->do_ioctl = 
ipip_tunnel_ioctl; @@ -810,8 +795,6 @@ static void ipip_tunnel_init_gen(struct dev->flags = IFF_NOARP; dev->iflink = 0; dev->addr_len = 4; - memcpy(dev->dev_addr, &t->parms.iph.saddr, 4); - memcpy(dev->broadcast, &t->parms.iph.daddr, 4); } static int ipip_tunnel_init(struct net_device *dev) @@ -822,8 +805,9 @@ static int ipip_tunnel_init(struct net_d tunnel = (struct ip_tunnel*)dev->priv; iph = &tunnel->parms.iph; - - ipip_tunnel_init_gen(dev); + tunnel->dev = dev; + memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4); + memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4); if (iph->daddr) { struct flowi fl = { .oif = tunnel->parms.link, @@ -854,17 +838,15 @@ static int ipip_tunnel_init(struct net_d static int __init ipip_fb_tunnel_init(struct net_device *dev) { - struct iphdr *iph; - - ipip_tunnel_init_gen(dev); + struct ip_tunnel *tunnel = dev->priv; + struct iphdr *iph = &tunnel->parms.iph; - iph = &ipip_fb_tunnel.parms.iph; iph->version = 4; iph->protocol = IPPROTO_IPIP; iph->ihl = 5; dev_hold(dev); - tunnels_wc[0] = &ipip_fb_tunnel; + tunnels_wc[0] = tunnel; return 0; } @@ -878,6 +860,8 @@ static char banner[] __initdata = int __init ipip_init(void) { + int err; + printk(banner); if (xfrm4_tunnel_register(&ipip_handler) < 0) { @@ -885,10 +869,24 @@ int __init ipip_init(void) return -EAGAIN; } - ipip_fb_tunnel_dev.priv = (void*)&ipip_fb_tunnel; - SET_MODULE_OWNER(&ipip_fb_tunnel_dev); - register_netdev(&ipip_fb_tunnel_dev); - return 0; + ipip_fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), + "tunl0", + ipip_tunnel_setup); + if (!ipip_fb_tunnel_dev) { + err = -ENOMEM; + goto fail; + } + + ipip_fb_tunnel_dev->init = ipip_fb_tunnel_init; + + if ((err = register_netdev(ipip_fb_tunnel_dev))) + goto fail; + out: + return err; + fail: + xfrm4_tunnel_deregister(&ipip_handler); + kfree(ipip_fb_tunnel_dev); + goto out; } static void __exit ipip_fini(void) @@ -896,7 +894,7 @@ static void __exit ipip_fini(void) if (xfrm4_tunnel_deregister(&ipip_handler) < 0) 
printk(KERN_INFO "ipip close: can't deregister tunnel\n"); - unregister_netdev(&ipip_fb_tunnel_dev); + unregister_netdev(ipip_fb_tunnel_dev); } #ifdef MODULE From shemminger@osdl.org Thu Jun 12 15:45:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 15:46:06 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5CMjt2x008176 for ; Thu, 12 Jun 2003 15:45:55 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5CMjhX16627; Thu, 12 Jun 2003 15:45:43 -0700 Date: Thu, 12 Jun 2003 15:45:43 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] ipgre tunnel dynamic net device's Message-Id: <20030612154543.16314928.shemminger@osdl.org> In-Reply-To: <20030612152731.14d3f23d.shemminger@osdl.org> References: <20030612152731.14d3f23d.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3192 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Similar change to the earlier ipip tunnel patch. Allocate network devices dynamically using alloc_netdev. Likewise tested by create/remove/add/delete with iptunnel.
--- linux-2.5/net/ipv4/ip_gre.c 2003-06-05 10:04:52.000000000 -0700 +++ linux-2.5-dyn/net/ipv4/ip_gre.c 2003-06-12 14:02:34.000000000 -0700 @@ -114,20 +114,13 @@ */ static int ipgre_tunnel_init(struct net_device *dev); +static void ipgre_tunnel_setup(struct net_device *dev); /* Fallback tunnel: no source, no destination, no key, no options */ static int ipgre_fb_tunnel_init(struct net_device *dev); -static struct net_device ipgre_fb_tunnel_dev = { - .name = "gre0", - .init = ipgre_fb_tunnel_init -}; - -static struct ip_tunnel ipgre_fb_tunnel = { - .dev = &ipgre_fb_tunnel_dev, - .parms ={ .name = "gre0" } -}; +static struct net_device *ipgre_fb_tunnel_dev; /* Tunnel hash table */ @@ -190,8 +183,9 @@ static struct ip_tunnel * ipgre_tunnel_l if (t->parms.i_key == key && (t->dev->flags&IFF_UP)) return t; } - if (ipgre_fb_tunnel_dev.flags&IFF_UP) - return &ipgre_fb_tunnel; + + if (ipgre_fb_tunnel_dev->flags&IFF_UP) + return ipgre_fb_tunnel_dev->priv; return NULL; } @@ -246,6 +240,7 @@ static struct ip_tunnel * ipgre_tunnel_l struct net_device *dev; unsigned h = HASH(key); int prio = 0; + char name[IFNAMSIZ]; if (local) prio |= 1; @@ -262,32 +257,28 @@ static struct ip_tunnel * ipgre_tunnel_l if (!create) return NULL; - dev = kmalloc(sizeof(*dev) + sizeof(*t), GFP_KERNEL); - if (dev == NULL) - return NULL; - - memset(dev, 0, sizeof(*dev) + sizeof(*t)); - dev->priv = (void*)(dev+1); - nt = (struct ip_tunnel*)dev->priv; - nt->dev = dev; - dev->init = ipgre_tunnel_init; - memcpy(&nt->parms, parms, sizeof(*parms)); - nt->parms.name[IFNAMSIZ-1] = '\0'; - strcpy(dev->name, nt->parms.name); - if (dev->name[0] == 0) { + if (parms->name[0]) + strlcpy(name, parms->name, IFNAMSIZ); + else { int i; for (i=1; i<100; i++) { - sprintf(dev->name, "gre%d", i); - if (__dev_get_by_name(dev->name) == NULL) + sprintf(name, "gre%d", i); + if (__dev_get_by_name(name) == NULL) break; } if (i==100) goto failed; - memcpy(nt->parms.name, dev->name, IFNAMSIZ); } - SET_MODULE_OWNER(dev); - if 
(register_netdevice(dev) < 0) + + dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup); + if (register_netdevice(dev) < 0) { + kfree(dev); goto failed; + } + + nt = dev->priv; + dev->init = ipgre_tunnel_init; + nt->parms = *parms; dev_hold(dev); ipgre_tunnel_link(nt); @@ -295,16 +286,9 @@ static struct ip_tunnel * ipgre_tunnel_l return nt; failed: - kfree(dev); return NULL; } -static void ipgre_tunnel_destructor(struct net_device *dev) -{ - if (dev != &ipgre_fb_tunnel_dev) - kfree(dev); -} - static void ipgre_tunnel_uninit(struct net_device *dev) { ipgre_tunnel_unlink((struct ip_tunnel*)dev->priv); @@ -916,7 +900,7 @@ ipgre_tunnel_ioctl (struct net_device *d switch (cmd) { case SIOCGETTUNNEL: t = NULL; - if (dev == &ipgre_fb_tunnel_dev) { + if (dev == ipgre_fb_tunnel_dev) { if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) { err = -EFAULT; break; @@ -955,8 +939,7 @@ ipgre_tunnel_ioctl (struct net_device *d t = ipgre_tunnel_locate(&p, cmd == SIOCADDTUNNEL); - if (dev != &ipgre_fb_tunnel_dev && cmd == SIOCCHGTUNNEL && - t != &ipgre_fb_tunnel) { + if (dev != ipgre_fb_tunnel_dev && cmd == SIOCCHGTUNNEL) { if (t != NULL) { if (t->dev != dev) { err = -EEXIST; @@ -1006,7 +989,7 @@ ipgre_tunnel_ioctl (struct net_device *d if (!capable(CAP_NET_ADMIN)) goto done; - if (dev == &ipgre_fb_tunnel_dev) { + if (dev == ipgre_fb_tunnel_dev) { err = -EFAULT; if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) goto done; @@ -1014,7 +997,7 @@ ipgre_tunnel_ioctl (struct net_device *d if ((t = ipgre_tunnel_locate(&p, 0)) == NULL) goto done; err = -EPERM; - if (t == &ipgre_fb_tunnel) + if (t == ipgre_fb_tunnel_dev->priv) goto done; dev = t->dev; } @@ -1140,12 +1123,11 @@ static int ipgre_close(struct net_device #endif -static void ipgre_tunnel_init_gen(struct net_device *dev) +static void ipgre_tunnel_setup(struct net_device *dev) { - struct ip_tunnel *t = (struct ip_tunnel*)dev->priv; - + SET_MODULE_OWNER(dev); dev->uninit = ipgre_tunnel_uninit; - dev->destructor = 
ipgre_tunnel_destructor; + dev->destructor = (void (*)(struct net_device *))kfree; dev->hard_start_xmit = ipgre_tunnel_xmit; dev->get_stats = ipgre_tunnel_get_stats; dev->do_ioctl = ipgre_tunnel_ioctl; @@ -1157,8 +1139,6 @@ static void ipgre_tunnel_init_gen(struct dev->flags = IFF_NOARP; dev->iflink = 0; dev->addr_len = 4; - memcpy(dev->dev_addr, &t->parms.iph.saddr, 4); - memcpy(dev->broadcast, &t->parms.iph.daddr, 4); } static int ipgre_tunnel_init(struct net_device *dev) @@ -1173,7 +1153,9 @@ static int ipgre_tunnel_init(struct net_ tunnel = (struct ip_tunnel*)dev->priv; iph = &tunnel->parms.iph; - ipgre_tunnel_init_gen(dev); + tunnel->dev = dev; + memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4); + memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4); /* Guess output device to choose reasonable mtu and hard_header_len */ @@ -1231,18 +1213,15 @@ static int ipgre_tunnel_init(struct net_ int __init ipgre_fb_tunnel_init(struct net_device *dev) { struct ip_tunnel *tunnel = (struct ip_tunnel*)dev->priv; - struct iphdr *iph; - - ipgre_tunnel_init_gen(dev); + struct iphdr *iph = &tunnel->parms.iph; - iph = &ipgre_fb_tunnel.parms.iph; iph->version = 4; iph->protocol = IPPROTO_GRE; iph->ihl = 5; tunnel->hlen = sizeof(struct iphdr) + 4; dev_hold(dev); - tunnels_wc[0] = &ipgre_fb_tunnel; + tunnels_wc[0] = tunnel; return 0; } @@ -1259,6 +1238,8 @@ static struct inet_protocol ipgre_protoc int __init ipgre_init(void) { + int err = -EINVAL; + printk(KERN_INFO "GRE over IPv4 tunneling driver\n"); if (inet_add_protocol(&ipgre_protocol, IPPROTO_GRE) < 0) { @@ -1266,10 +1247,23 @@ int __init ipgre_init(void) return -EAGAIN; } - ipgre_fb_tunnel_dev.priv = (void*)&ipgre_fb_tunnel; - SET_MODULE_OWNER(&ipgre_fb_tunnel_dev); - register_netdev(&ipgre_fb_tunnel_dev); - return 0; + ipgre_fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "gre0", + ipgre_tunnel_setup); + if (!ipgre_fb_tunnel_dev) { + err = -ENOMEM; + goto fail; + } + + ipgre_fb_tunnel_dev->init = ipgre_fb_tunnel_init; 
+ + if ((err = register_netdev(ipgre_fb_tunnel_dev))) + goto fail; +out: + return err; +fail: + inet_del_protocol(&ipgre_protocol, IPPROTO_GRE); + kfree(ipgre_fb_tunnel_dev); + goto out; } void ipgre_fini(void) @@ -1277,7 +1271,7 @@ void ipgre_fini(void) if (inet_del_protocol(&ipgre_protocol, IPPROTO_GRE) < 0) printk(KERN_INFO "ipgre close: can't remove protocol\n"); - unregister_netdev(&ipgre_fb_tunnel_dev); + unregister_netdev(ipgre_fb_tunnel_dev); } #ifdef MODULE From shemminger@osdl.org Thu Jun 12 16:13:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 16:13:05 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5CNCx2x009397 for ; Thu, 12 Jun 2003 16:13:00 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5CNCiX24389; Thu, 12 Jun 2003 16:12:44 -0700 Date: Thu, 12 Jun 2003 16:12:43 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: viro@parcelfarce.linux.theplanet.co.uk, jgarzik@pobox.com, netdev@oss.sgi.com Subject: [PATCH] handle slip module unload race better. Message-Id: <20030612161243.01052de0.shemminger@osdl.org> In-Reply-To: <20030612.145803.112592736.davem@redhat.com> References: <20030612132714.6b8e1267.shemminger@osdl.org> <20030612.145803.112592736.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3193 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Thu, 12 Jun 2003 14:58:03 -0700 (PDT) "David S. 
Miller" wrote: > From: Stephen Hemminger > Date: Thu, 12 Jun 2003 13:27:14 -0700 > > The following won't work. > > ... > unregister_netdev(&slc->dev); > if (slc->ctrl.tty) { > printk(KERN_ERR "%s: tty discipline is still running\n", slc->dev.name); > /* Pin module forever */ > MOD_INC_USE_COUNT; > > Because it is in the exit code for the module, and by that time > the module code has decided it is going to remove it and no > longer looks at the ref count. Sorry, don't have a patch to > fix since this is related more to the problematic tty code, > than the network code. > > I think a reasonable thing for it to do is leak the netdevice > and print a message when this happens. Okay, this patch does that. I could start and unload slip, but could not actually trigger the failure. diff -Nru a/drivers/net/slip.c b/drivers/net/slip.c --- a/drivers/net/slip.c Thu Jun 12 16:10:30 2003 +++ b/drivers/net/slip.c Thu Jun 12 16:10:30 2003 @@ -1381,27 +1381,23 @@ local_bh_enable(); } while (busy && time_before(jiffies, timeout)); - busy = 0; for (i = 0; i < slip_maxdev; i++) { struct slip_ctrl *slc = slip_ctrls[i]; if (slc) { unregister_netdev(&slc->dev); if (slc->ctrl.tty) { printk(KERN_ERR "%s: tty discipline is still running\n", slc->dev.name); - /* Pin module forever */ - MOD_INC_USE_COUNT; - busy++; - continue; + /* Intentionally leak the control block. 
*/ + } else { + sl_free_bufs(&slc->ctrl); + kfree(slc); } - sl_free_bufs(&slc->ctrl); - kfree(slc); slip_ctrls[i] = NULL; } } - if (!busy) { - kfree(slip_ctrls); - slip_ctrls = NULL; - } + + kfree(slip_ctrls); + slip_ctrls = NULL; } if ((i = tty_register_ldisc(N_SLIP, NULL))) { From davem@redhat.com Thu Jun 12 17:59:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 17:59:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5D0xD2x013033 for ; Thu, 12 Jun 2003 17:59:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA32241; Thu, 12 Jun 2003 17:55:32 -0700 Date: Thu, 12 Jun 2003 17:55:32 -0700 (PDT) Message-Id: <20030612.175532.78736834.davem@redhat.com> To: shemminger@osdl.org Cc: viro@parcelfarce.linux.theplanet.co.uk, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH] handle slip module unload race better. From: "David S. Miller" In-Reply-To: <20030612161243.01052de0.shemminger@osdl.org> References: <20030612132714.6b8e1267.shemminger@osdl.org> <20030612.145803.112592736.davem@redhat.com> <20030612161243.01052de0.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3195 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Thu, 12 Jun 2003 16:12:43 -0700 Okay, this patch does that. I could start and unload slip, but could not actually trigger the failure. Applied, thanks. 
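For readers following the slip thread: the cleanup policy agreed on above (free every control block that is idle, deliberately leak any block whose tty discipline is still attached, then drop the table unconditionally) can be sketched in userspace C roughly as follows. This is only an illustration, not the driver code; `sketch_ctrl` and `sketch_cleanup` are made-up names, and the tty is reduced to an opaque pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device control block. */
struct sketch_ctrl {
    void *tty;   /* non-NULL while a line discipline still references us */
    char *bufs;  /* buffers owned by the control block */
};

/*
 * Walk the table, free idle entries, and intentionally leak busy ones,
 * since freeing memory that is still referenced would be worse than
 * leaking it. The table itself is always freed.
 * Returns the number of entries that were leaked.
 */
static int sketch_cleanup(struct sketch_ctrl ***table, int maxdev)
{
    struct sketch_ctrl **ctrls;
    int leaked = 0;
    int i;

    assert(table != NULL && *table != NULL);
    ctrls = *table;

    for (i = 0; i < maxdev; i++) {
        struct sketch_ctrl *c = ctrls[i];

        if (c == NULL)
            continue;
        if (c->tty) {
            leaked++;            /* still in use: do not free */
        } else {
            free(c->bufs);
            free(c);
        }
        ctrls[i] = NULL;
    }

    free(ctrls);
    *table = NULL;
    return leaked;
}
```

The point of the design is that module exit can no longer be vetoed, so the only safe choices left are to free what is provably unused and to report (rather than reclaim) what is not.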
From davem@redhat.com Thu Jun 12 17:58:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 17:58:56 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5D0wp2x012988 for ; Thu, 12 Jun 2003 17:58:52 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA32234; Thu, 12 Jun 2003 17:55:12 -0700 Date: Thu, 12 Jun 2003 17:55:12 -0700 (PDT) Message-Id: <20030612.175512.104063805.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] ipgre tunnel dynamic net device's From: "David S. Miller" In-Reply-To: <20030612154543.16314928.shemminger@osdl.org> References: <20030612152731.14d3f23d.shemminger@osdl.org> <20030612154543.16314928.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3194 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev I've applied all of your dynamic net device changes, thanks. 
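For reference, the allocation pattern that the sit, ipip and ipgre patches in this thread converge on (one allocation holding the device plus its private tunnel state, a setup callback for the fixed fields, and per-instance fields filled in afterwards) can be sketched in userspace C as below. This is an assumption-laden model, not the kernel's alloc_netdev: every `fake_*` name is hypothetical, and the real helper's alignment handling and registration step are not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_IFNAMSIZ  16
#define FAKE_IFF_NOARP 0x80

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct fake_netdev {
    char name[FAKE_IFNAMSIZ];
    void *priv;   /* points just past this struct, into the same allocation */
    int flags;
};

/* Hypothetical stand-in for struct ip_tunnel. */
struct fake_tunnel {
    struct fake_netdev *dev;
    unsigned int ttl;
};

/* One zeroed allocation for device + private area, then run the setup hook. */
static struct fake_netdev *fake_alloc_netdev(size_t sizeof_priv, const char *name,
                                             void (*setup)(struct fake_netdev *))
{
    struct fake_netdev *dev;

    assert(name != NULL && setup != NULL);

    dev = calloc(1, sizeof(*dev) + sizeof_priv);
    if (dev == NULL)
        return NULL;

    if (sizeof_priv)
        dev->priv = dev + 1;
    /* calloc zeroed the buffer, so the name stays NUL-terminated. */
    strncpy(dev->name, name, FAKE_IFNAMSIZ - 1);
    setup(dev);
    return dev;
}

/* Fields that are the same for every tunnel device go in the setup hook. */
static void fake_tunnel_setup(struct fake_netdev *dev)
{
    dev->flags = FAKE_IFF_NOARP;
}

/* Per-instance state is filled in after allocation, before registration. */
static struct fake_netdev *fake_tunnel_create(const char *name, unsigned int ttl)
{
    struct fake_netdev *dev;
    struct fake_tunnel *nt;

    dev = fake_alloc_netdev(sizeof(struct fake_tunnel), name, fake_tunnel_setup);
    if (dev == NULL)
        return NULL;   /* check before touching dev->priv */

    nt = dev->priv;
    nt->dev = dev;
    nt->ttl = ttl;
    return dev;
}
```

Because the private area lives in the same block as the device, a single free of the device pointer releases both, which is why the patches can use plain kfree as the destructor.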
From krkumar@us.ibm.com Thu Jun 12 18:12:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 18:12:49 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5D1CZ2x014082 for ; Thu, 12 Jun 2003 18:12:42 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5D1CTsZ145102; Thu, 12 Jun 2003 21:12:29 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5D1CRgP232776; Thu, 12 Jun 2003 21:12:28 -0400 Message-ID: <3EE924E8.8080003@us.ibm.com> Date: Thu, 12 Jun 2003 18:12:08 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Problem while using VLAN Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3196 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi, I am trying to configure a few thousand (< 5000) VLAN devices on 2.5.70, and use following commands (in a loop) : vconfig add eth0 $i$j ifconfig eth0.$i$j 10.0.$i.$j broadcast <> netmask <> up ifconfig eth0.$i$j add fec0:1:2:$i::$j/64 I get following problems : 1. After some time, I lose the link local and site local addresses on my eth0 and eth1 interfaces which were added through autoconfig. 2. After some time, I lose link local and site addresses on all VLAN devices (after creating a few thousand devices). 3. I find two devices for each created device, eg two eth0.232 or eth0.45 devices. So if I create 100 devices, I get 200. Is there any reason for this to happen, and am I doing something wrong ? 
Alternatively, any fixes for this ? Thanks, - KK From davem@redhat.com Thu Jun 12 22:42:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 22:42:20 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5D5g92x022660 for ; Thu, 12 Jun 2003 22:42:09 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA00928; Thu, 12 Jun 2003 22:38:19 -0700 Date: Thu, 12 Jun 2003 22:38:18 -0700 (PDT) Message-Id: <20030612.223818.74754074.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16103.27039.204333.952703@robur.slu.se> References: <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> <16103.27039.204333.952703@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3197 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Wed, 11 Jun 2003 19:40:47 +0200 vma samples %-age symbol name c023c038 107340 33.143 fn_hash_lookup Ok, let's optimize our datastructures for how we actually use them :-) Also, fn_zone shrunk by 8 bytes. 
Try this: --- ./include/net/ip_fib.h.~1~ Thu Jun 12 22:18:33 2003 +++ ./include/net/ip_fib.h Thu Jun 12 22:19:57 2003 @@ -89,13 +89,12 @@ struct fib_info struct fib_rule; #endif -struct fib_result -{ - unsigned char prefixlen; - unsigned char nh_sel; +struct fib_result { + struct fib_info *fi; unsigned char type; unsigned char scope; - struct fib_info *fi; + unsigned char prefixlen; + unsigned char nh_sel; #ifdef CONFIG_IP_MULTIPLE_TABLES struct fib_rule *r; #endif --- ./net/ipv4/fib_hash.c.~1~ Thu Jun 12 21:47:11 2003 +++ ./net/ipv4/fib_hash.c Thu Jun 12 22:08:27 2003 @@ -65,16 +65,15 @@ typedef struct { u32 datum; } fn_hash_idx_t; -struct fib_node -{ - struct fib_node *fn_next; - struct fib_info *fn_info; -#define FIB_INFO(f) ((f)->fn_info) +struct fib_node { fn_key_t fn_key; u8 fn_tos; - u8 fn_type; - u8 fn_scope; u8 fn_state; + u8 fn_scope; + u8 fn_type; + struct fib_node *fn_next; + struct fib_info *fn_info; +#define FIB_INFO(f) ((f)->fn_info) }; #define FN_S_ZOMBIE 1 @@ -82,29 +81,19 @@ struct fib_node static int fib_hash_zombies; -struct fn_zone -{ - struct fn_zone *fz_next; /* Next not empty zone */ - struct fib_node **fz_hash; /* Hash table pointer */ - int fz_nent; /* Number of entries */ - +struct fn_zone { int fz_divisor; /* Hash divisor */ - u32 fz_hashmask; /* (fz_divisor - 1) */ -#define FZ_HASHMASK(fz) ((fz)->fz_hashmask) - +#define FZ_HASHMASK(fz) ((fz)->fz_divisor - 1) int fz_order; /* Zone order */ - u32 fz_mask; -#define FZ_MASK(fz) ((fz)->fz_mask) +#define FZ_MASK(fz) (inet_make_mask((fz)->fz_order)) + struct fib_node **fz_hash; /* Hash table pointer */ + struct fn_zone *fz_next; /* Next not empty zone */ + int fz_nent; /* Number of entries */ }; -/* NOTE. On fast computers evaluation of fz_hashmask and fz_mask - can be cheaper than memory lookup, so that FZ_* macros are used. 
- */ - -struct fn_hash -{ - struct fn_zone *fn_zones[33]; +struct fn_hash { struct fn_zone *fn_zone_list; + struct fn_zone *fn_zones[33]; }; static __inline__ fn_hash_idx_t fn_hash(fn_key_t key, struct fn_zone *fz) @@ -197,7 +186,6 @@ static void fn_rehash_zone(struct fn_zon { struct fib_node **ht, **old_ht; int old_divisor, new_divisor; - u32 new_hashmask; old_divisor = fz->fz_divisor; @@ -217,8 +205,6 @@ static void fn_rehash_zone(struct fn_zon break; } - new_hashmask = (new_divisor - 1); - #if RT_CACHE_DEBUG >= 2 printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor); #endif @@ -231,7 +217,6 @@ static void fn_rehash_zone(struct fn_zon write_lock_bh(&fib_hash_lock); old_ht = fz->fz_hash; fz->fz_hash = ht; - fz->fz_hashmask = new_hashmask; fz->fz_divisor = new_divisor; fn_rebuild_zone(fz, old_ht, old_divisor); write_unlock_bh(&fib_hash_lock); @@ -261,7 +246,6 @@ fn_new_zone(struct fn_hash *table, int z } else { fz->fz_divisor = 1; } - fz->fz_hashmask = (fz->fz_divisor - 1); fz->fz_hash = fz_hash_alloc(fz->fz_divisor); if (!fz->fz_hash) { kfree(fz); @@ -269,7 +253,6 @@ fn_new_zone(struct fn_hash *table, int z } memset(fz->fz_hash, 0, fz->fz_divisor*sizeof(struct fib_node*)); fz->fz_order = z; - fz->fz_mask = inet_make_mask(z); /* Find the first not empty zone with more specific mask */ for (i=z+1; i<=32; i++) @@ -312,10 +295,15 @@ fn_hash_lookup(struct fib_table *tb, con if (f->fn_tos && f->fn_tos != flp->fl4_tos) continue; #endif - f->fn_state |= FN_S_ACCESSED; + { + u8 state = f->fn_state; - if (f->fn_state&FN_S_ZOMBIE) - continue; + if (!(state & FN_S_ACCESSED)) + f->fn_state = state | FN_S_ACCESSED; + + if (state & FN_S_ZOMBIE) + continue; + } if (f->fn_scope < flp->fl4_scope) continue; From davem@redhat.com Thu Jun 12 23:25:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 23:25:08 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with 
SMTP id h5D6P22x025236 for ; Thu, 12 Jun 2003 23:25:03 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA01143; Thu, 12 Jun 2003 23:21:15 -0700 Date: Thu, 12 Jun 2003 23:21:14 -0700 (PDT) Message-Id: <20030612.232114.71088346.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16101.4136.328760.955758@robur.slu.se> References: <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> <16101.4136.328760.955758@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3198 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Tue, 10 Jun 2003 00:54:32 +0200 I'm about to propose some stats even for hash spinning.... Do you mind if I apply this? It looks fine. 
Next, we should put similar metrics into fib_hash.c From davem@redhat.com Thu Jun 12 23:28:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 12 Jun 2003 23:28:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5D6Sr2x025614 for ; Thu, 12 Jun 2003 23:28:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA01131; Thu, 12 Jun 2003 23:20:03 -0700 Date: Thu, 12 Jun 2003 23:20:02 -0700 (PDT) Message-Id: <20030612.232002.41633789.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030610075732.GD23009@netnation.com> References: <20030610075732.GD23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3199 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Tue, 10 Jun 2003 00:57:32 -0700 In any case, setting gc_min_interval to 0 definitely helped, but I suspect Dave's patches will make a bigger difference. Next up is 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday. Did you get stuck in some mud? :-) It's been two days. 
I even posted new patches for you to test, get on it :)))

From Robert.Olsson@data.slu.se Fri Jun 13 03:23:46 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 03:23:51 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DANi2x004396
	for ; Fri, 13 Jun 2003 03:23:45 -0700
Received: (from robert@localhost)
	by robur.slu.se (8.9.3p2/8.9.3) id MAA30816;
	Fri, 13 Jun 2003 12:22:35 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16105.42475.451956.130322@robur.slu.se>
Date: Fri, 13 Jun 2003 12:22:35 +0200
To: "David S. Miller"
Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com,
	hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com,
	fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org,
	kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
In-Reply-To: <20030612.223818.74754074.davem@redhat.com>
References: <16102.9418.43884.336925@robur.slu.se>
	<20030610.115759.26513736.davem@redhat.com>
	<16103.27039.204333.952703@robur.slu.se>
	<20030612.223818.74754074.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 3200
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

David S. Miller writes:
 > From: Robert Olsson
 > Date: Wed, 11 Jun 2003 19:40:47 +0200
 >
 >    vma      samples  %-age   symbol name
 >    c023c038 107340   33.143  fn_hash_lookup
 >
 > Ok, let's optimize our datastructures for how we actually
 > use them :-) Also, fn_zone shrunk by 8 bytes.
 >
 > Try this:

I'll pass by the university lab on my way home later today and hope to
give it a try.

Cheers.
--ro

From Robert.Olsson@data.slu.se Fri Jun 13 03:41:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 03:41:22 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DAfD2x005253
	for ; Fri, 13 Jun 2003 03:41:14 -0700
Received: (from robert@localhost)
	by robur.slu.se (8.9.3p2/8.9.3) id MAA31111;
	Fri, 13 Jun 2003 12:40:23 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16105.43543.826589.672148@robur.slu.se>
Date: Fri, 13 Jun 2003 12:40:23 +0200
To: "David S. Miller"
Cc: Robert.Olsson@data.slu.se, sim@netnation.com, xerox@foonet.net,
	hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com,
	linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-Reply-To: <20030612.232114.71088346.davem@redhat.com>
References: <004f01c32ebe$b4bd88d0$4a00000a@badass>
	<20030609221911.GF11509@netnation.com>
	<16101.4136.328760.955758@robur.slu.se>
	<20030612.232114.71088346.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 3201
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

David S. Miller writes:
 > From: Robert Olsson
 > Date: Tue, 10 Jun 2003 00:54:32 +0200
 >
 >    I'm about to propose some stats even for hash spinning....
 >
 > Do you mind if I apply this? It looks fine.

No, please do. There is an updated rtstat already.

 > Next, we should put similar metrics into fib_hash.c

Yes. Also, the "candidate" selection in __rt_hash_shrink can be done in
rt_intern_hash. That way we avoid the extra spinning over the hash chain.
Eventually we can save work here and have the candidate always ready.

Cheers.
--ro

From Robert.Olsson@data.slu.se Fri Jun 13 03:51:43 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 03:51:47 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DApg2x005864
	for ; Fri, 13 Jun 2003 03:51:43 -0700
Received: (from robert@localhost)
	by robur.slu.se (8.9.3p2/8.9.3) id MAA31262;
	Fri, 13 Jun 2003 12:50:51 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16105.44171.178489.492922@robur.slu.se>
Date: Fri, 13 Jun 2003 12:50:51 +0200
To: "David S. Miller"
Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com,
	hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com,
	fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-Reply-To: <20030612.143540.41663883.davem@redhat.com>
References: <16102.9418.43884.336925@robur.slu.se>
	<20030611.234534.52193216.davem@redhat.com>
	<16104.34463.60472.750699@robur.slu.se>
	<20030612.143540.41663883.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 3202
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

David S. Miller writes:
 > But Robert, do you know "why" the dst management doesn't show up in
 > your profiles when you rip-out the rtcache?
 >
 > It's because the total number of DST entries is so small that they all
 > fit in the cpu cache. When the rtcache is enabled and we thus have up
 > to "max_size" DST entries in flight at all times, the dst management
 > routines show up very clearly because they have a high probability of
 > missing the cpu cache.
 >
 > In particular, have a good look at Simon's profiles. dst_alloc() is
 > quite near the top there.

Yes, and that was the intention: to get pretty close to pure slowpath.
As a result I/we now appreciate the hash better... Cheers. --ro From nakam@linux-ipv6.org Fri Jun 13 04:33:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 04:33:45 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DBXT2x007936 for ; Fri, 13 Jun 2003 04:33:32 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19Qmhu-0000ji-00; Fri, 13 Jun 2003 20:26:58 +0900 From: Masahide NAKAMURA To: Henrik Petander Cc: "David S. Miller" , lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030613202652.1d64ed6f.nakam@linux-ipv6.org> In-Reply-To: <3EE83D81.5030605@tml.hut.fi> References: <3EE5F85E.9080006@tml.hut.fi> <20030610.095135.28806569.davem@redhat.com> <3EE6ECD3.6050103@tml.hut.fi> <20030611.202003.74721468.davem@redhat.com> <3EE83D81.5030605@tml.hut.fi> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Date: Fri, 13 Jun 2003 20:26:58 +0900 X-archive-position: 3203 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev On Thu, 12 Jun 2003 11:44:49 +0300 Henrik Petander wrote: > > No it doesn't. When you startup zebra, it may flush the entire > > routing table. > > > I don't see a problem in that. It would only result in a short period of > missing mipv6 route optimization information, until MIPv6 daemon > reinserted the mipv6 information. 
MIPv6 daemon would do this after
> getting a notification of the deletion of the old mipv6 related cached
> routes. This would relate to zebra in the same way as pmtu discovery.

When routing information is deleted by zebra, packets from user
applications will be sent incorrectly until the MIPv6 daemon re-inserts
the special host route.

Anyway, in the xfrm-based approach, I suggested that MIPv6 should use the
same policy as IPsec (racoon); however, that causes problems, as David and
Henrik said. So I suggest a slightly different plan:

How about making new policy(MIPv6 policy) in the similar way of IPsec?
MIPv6 and IPsec policy are managed by separated list.
With that design, we have to take both kinds of policy, and the order of
the states, into account when looking up xfrm or creating a bundle.
I think there is no such period of incorrect routing as described above.

I think that, for networking operation, using a stackable destination is
the natural way, and Henrik's patch (mip6-exthdr.patch; maybe sent to
netdev on 5 Jun) is almost the same.

The first step is to have two kinds of policy, IPsec and MIP6.
In this step, we make the MIP6 stack use stackable destinations and some
XFRM with MIP6 policy. Probably the second step will be to modify
xfrm_policy so that it is a generic policy database rather than only
IPsec's SPD.

What do you think?
Regards, -- Masahide NAKAMURA From lpetande@tml.hut.fi Fri Jun 13 07:20:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 07:20:45 -0700 (PDT) Received: from smtp-3.hut.fi (root@smtp-3.hut.fi [130.233.228.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DEKS2x017168 for ; Fri, 13 Jun 2003 07:20:31 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-3.hut.fi (8.12.9/8.12.9) with ESMTP id h5DEJhn6022363; Fri, 13 Jun 2003 17:19:46 +0300 Message-ID: <3EE9DF66.1090409@tml.hut.fi> Date: Fri, 13 Jun 2003 17:27:50 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Masahide NAKAMURA CC: "David S. Miller" , lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <3EE5F85E.9080006@tml.hut.fi> <20030610.095135.28806569.davem@redhat.com> <3EE6ECD3.6050103@tml.hut.fi> <20030611.202003.74721468.davem@redhat.com> <3EE83D81.5030605@tml.hut.fi> <20030613202652.1d64ed6f.nakam@linux-ipv6.org> In-Reply-To: <20030613202652.1d64ed6f.nakam@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-3.hut.fi) X-DCC-HUTCC-Metrics: smtp-3.hut.fi 1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 3204 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Masahide NAKAMURA wrote: > On Thu, 12 Jun 2003 11:44:49 +0300 > Henrik Petander wrote: >> >>I don't see a problem in that. 
It would only result in a short period of >>missing mipv6 route optimization information, until MIPv6 daemon >>reinserted the mipv6 information. MIPv6 daemon would do this after >>getting a notification of the deletion of the old mipv6 related cached >>routes. This would relate to zebra in the same way as pmtu discovery. > > > When routing information is deleted by zebra, the packets from user > application will be sent incorrectly until MIPv6 daemon re-inserts > the special host route. The packets between Mobile Node and Correspondent Node would be just sent through the tunnel via Home Agent. Only parts of the direct traffic between MN and HA would be temporarily lost. Since HA is conceptually a router, the traffic between MN and HA should be very limited and Mobile IP / IPSec signaling would not be affected. This is why IMO the use of soft state is acceptable. > How about making new policy(MIPv6 policy) in the similar way of IPsec? > MIPv6 and IPsec policy are managed by separated list. It would be a good solution from our POV, since it should work well with IPSec. 
Regards, Henrik From carl@bookmanassociates.com Fri Jun 13 08:00:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 08:00:42 -0700 (PDT) Received: from gulf.vosn.net (gulf.vosn.net [209.197.240.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DF0a2x018991 for ; Fri, 13 Jun 2003 08:00:36 -0700 Received: from [212.18.229.220] (helo=BILL) by gulf.vosn.net with esmtp (Exim 3.36 #1) id 19Qq2b-0000xx-00 for netdev@oss.sgi.com; Fri, 13 Jun 2003 16:00:34 +0100 From: carl@bookmanassociates.com To: netdev@oss.sgi.com Date: Fri, 13 Jun 2003 16:00:32 +0100 MIME-Version: 1.0 Subject: Linux kernel hanging with certain networking configuration - uncertain of relevant maintainer Message-ID: <3EE9F520.13447.9611F8@localhost> Priority: normal X-mailer: Pegasus Mail for Windows (v4.11) Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Content-description: Mail message body X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - gulf.vosn.net X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [0 0] X-AntiAbuse: Sender Address Domain - bookmanassociates.com X-archive-position: 3205 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carl@bookmanassociates.com Precedence: bulk X-list: netdev Hello, Sorry to bother you if you're the wrong person to contact, I got your name from the MAINTAINERS list in /usr/src/linux on my distribution (kernel 2.4.4). If you are the wrong person could you tell me where I should go please? We have successfully used a linux server in our small office for a year now and it has been almost entirely stable. Recently I configured a VPN to connect to one of our client sites. 
I installed a linux server at that site also, because they are behind a firewall over which I have no control (being on a medical school's network) I used the method described in the Firewall Piercing mini-howto. Namely I configured pppd to run ssh to connect from that machine to ours here and run pppd on this end, once the tunnel is set up both ends add routes through it to each other's subnets in the ip-up scripts on either end (neither end is using PPP for any other connections, both are connected to the "outside world" via ethernet NICs). This tunnel is largely stable and satisfactory but if intensive load is placed on it (the reliable, reproducible load I use is a remote control software package called PC Duo) then the server machine at our end hangs and has to be rebooted using the reset button on the front or a power off/on cycle. The machine becomes completely unresponsive to pinging from other machines on our LAN, the keyboard becomes unresponsive and the screen freezes. I understand that this is called a "kernel panic" but, unlike other reports I have seen on the internet, there is no "oops" dump thing that comes up on the screen - the screen simply freezes. To add to the mystery I have found this problem with both the OpenSSH package and the non-commercial SSH package (v3.2.3) from ftp.ssh.com PLUS I have also seen something similar where I ran an SSH client on a Windows PC here that connected to the SSH daemon on their firewall with the relevant port for PCDuo control forwarded and then ran an ssh client on that firewall machine through to an SSH daemon on the linux box I support in our client's office, again forwarding the port. This configuration doesn't use ppp at any point yet when I ran PCDuo control over that connection it worked briefly then hung their server, which again had to be rebooted! Can you help at all? Is there some way of getting this "oops" output from linux if that would help? 
Yours with regards, Carl Peto From carl@bookmanassociates.com Fri Jun 13 08:12:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 08:12:50 -0700 (PDT) Received: from gulf.vosn.net (gulf.vosn.net [209.197.240.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DFCe2x019758 for ; Fri, 13 Jun 2003 08:12:40 -0700 Received: from [212.18.229.220] (helo=BILL) by gulf.vosn.net with esmtp (Exim 3.36 #1) id 19QqEJ-0001Dh-00 for netdev@oss.sgi.com; Fri, 13 Jun 2003 16:12:40 +0100 From: carl@bookmanassociates.com To: netdev@oss.sgi.com Date: Fri, 13 Jun 2003 16:12:38 +0100 MIME-Version: 1.0 Subject: (Fwd) Linux kernel hanging with certain networking configurati Message-ID: <3EE9F7F6.27521.A125F7@localhost> Priority: normal X-mailer: Pegasus Mail for Windows (v4.11) Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Content-description: Mail message body X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - gulf.vosn.net X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [0 0] X-AntiAbuse: Sender Address Domain - bookmanassociates.com X-archive-position: 3206 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carl@bookmanassociates.com Precedence: bulk X-list: netdev Should have included some debug messages (as recommended in REPORTING- BUGS in /usr/src/linux). 
carl@flanger:~ > cat /proc/version Linux version 2.4.4-4GB (root@Pentium.suse.de) (gcc version 2.95.3 20010315 (SuSE)) #1 Fri May 18 14:11:12 GMT 2001 carl@flanger:~ > cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 2 model name : Intel(R) Pentium(R) 4 CPU 2.40GHz stepping : 7 cpu MHz : 2405.504 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm bogomips : 4797.23 carl@flanger:~ > cat /proc/modules ipv6 126272 -1 (autoclean) printer 4960 0 (unused) mousedev 4032 0 (unused) usb-uhci 21840 0 (unused) 8139too 11520 1 (autoclean) reiserfs 156432 1 hid 11760 0 (unused) input 3168 0 [mousedev hid] usbcore 47120 1 [printer usb-uhci hid] carl@flanger:~ > cat /proc/ioports /proc/iomem 0000-001f : dma1 0020-003f : pic1 0040-005f : timer 0060-006f : keyboard 0070-007f : rtc 0080-008f : dma page reg 00a0-00bf : pic2 00c0-00df : dma2 00f0-00ff : fpu 0170-0177 : ide1 01f0-01f7 : ide0 02f8-02ff : serial(auto) 0376-0376 : ide1 03c0-03df : vga+ 03f6-03f6 : ide0 03f8-03ff : serial(auto) 0500-051f : PCI device 8086:24c3 0cf8-0cff : PCI conf1 c000-cfff : PCI Bus #01 c000-c0ff : PCI device 10ec:8139 c000-c0ff : 8139too c400-c47f : PCI device 1106:3044 d000-d01f : PCI device 8086:24c4 d000-d01f : usb-uhci d400-d41f : PCI device 8086:24c7 d400-d41f : usb-uhci d800-d81f : PCI device 8086:24c2 d800-d81f : usb-uhci e000-e0ff : PCI device 8086:24c5 e400-e43f : PCI device 8086:24c5 f000-f00f : PCI device 8086:24cb f000-f007 : ide0 f008-f00f : ide1 00000000-0009fbff : System RAM 0009fc00-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000f0000-000fffff : System ROM 00100000-0f7effff : System RAM 00100000-002327d1 : Kernel code 002327d2-0031bdcb : Kernel data 0f7f0000-0f7f2fff : ACPI Non-volatile Storage 0f7f3000-0f7fffff : ACPI 
Tables 10000000-100003ff : PCI device 8086:24cb e0000000-e7ffffff : PCI device 8086:2562 e8000000-ebffffff : PCI device 8086:2560 ec000000-ec0fffff : PCI Bus #01 ec000000-ec0000ff : PCI device 10ec:8139 ec000000-ec0000ff : 8139too ec001000-ec0017ff : PCI device 1106:3044 ec100000-ec17ffff : PCI device 8086:2562 ec180000-ec1803ff : PCI device 8086:24cd ec181000-ec1811ff : PCI device 8086:24c5 ec182000-ec1820ff : PCI device 8086:24c5 fec00000-ffffffff : reserved carl@flanger:~ > cat /proc/scsi/scsi Attached devices: none Attached devices: none carl@flanger:~ > su Password: USER, n.: The word computer professionals use when they mean "idiot." -- Dave Barry, "Claw Your Way to the Top" DANGER!! THIS IS FLANGER!!:/home/carl # lspci -vvv 00:00.0 Host bridge: Intel Corporation: Unknown device 2560 (rev 03) Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer: Unknown device fb50 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B- 00:1f.0 ISA bridge: Intel Corporation: Unknown device 24c0 (rev 02) Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- [size=8] Region 1: I/O ports at [size=4] Region 2: I/O ports at [size=8] Region 3: I/O ports at [size=4] Region 4: I/O ports at f000 [size=16] Region 5: Memory at 10000000 (32-bit, non-prefetchable) [size=1K] 00:1f.3 SMBus: Intel Corporation: Unknown device 24c3 (rev 02) Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer: Unknown device fb50 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- X-RS-ID: X-RS-Flags: 1,0,1,1,0,0,0 X-RS-Sigset: 0 To: 
netdev@oss.sgi.com Subject: Linux kernel hanging with certain networking configuration - uncertain of relevant maintainer Comments: Confirmation of reading was requested. MIME-Version: 1.0 Content-type: text/plain; charset=ISO-8859-1 Content-transfer-encoding: 8BIT Date: Fri, 13 Jun 2003 15:59:01 +0100 Hello, Sorry to bother you if you're the wrong person to contact, I got your name from the MAINTAINERS list in /usr/src/linux on my distribution (kernel 2.4.4). If you are the wrong person could you tell me where I should go please? We have successfully used a linux server in our small office for a year now and it has been almost entirely stable. Recently I configured a VPN to connect to one of our client sites. I installed a linux server at that site also, because they are behind a firewall over which I have no control (being on a medical school's network) I used the method described in the Firewall Piercing mini-howto. Namely I configured pppd to run ssh to connect from that machine to ours here and run pppd on this end, once the tunnel is set up both ends add routes through it to each other's subnets in the ip-up scripts on either end (neither end is using PPP for any other connections, both are connected to the "outside world" via ethernet NICs). This tunnel is largely stable and satisfactory but if intensive load is placed on it (the reliable, reproducible load I use is a remote control software package called PC Duo) then the server machine at our end hangs and has to be rebooted using the reset button on the front or a power off/on cycle. The machine becomes completely unresponsive to pinging from other machines on our LAN, the keyboard becomes unresponsive and the screen freezes. I understand that this is called a "kernel panic" but, unlike other reports I have seen on the internet, there is no "oops" dump thing that comes up on the screen - the screen simply freezes. 
To add to the mystery I have found this problem with both the OpenSSH package and the non-commercial SSH package (v3.2.3) from ftp.ssh.com PLUS I have also seen something similar where I ran an SSH client on a Windows PC here that connected to the SSH daemon on their firewall with the relevant port for PCDuo control forwarded and then ran an ssh client on that firewall machine through to an SSH daemon on the linux box I support in our client's office, again forwarding the port. This configuration doesn't use ppp at any point yet when I ran PCDuo control over that connection it worked briefly then hung their server, which again had to be rebooted! Can you help at all? Is there some way of getting this "oops" output from linux if that would help? Yours with regards, Carl Peto ------- End of forwarded message ------- From mk@karaba.org Fri Jun 13 09:04:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 09:04:42 -0700 (PDT) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DG4P2x022056 for ; Fri, 13 Jun 2003 09:04:28 -0700 Received: from [3ffe:501:1057:710::1] (helo=hyakusiki.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 19Qr1q-0005oS-00; Sat, 14 Jun 2003 01:03:50 +0900 Date: Sat, 14 Jun 2003 01:03:50 +0900 Message-ID: <87smqerml5.wl@karaba.org> From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: "David S. 
Miller"
Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com,
	usagi@linux-ipv6.org
Subject: [PATCH] xfrm ip6ip6 (revised)
In-Reply-To: <20030601.013040.116362760.davem@redhat.com>
References: <87fzmv5ejc.wl@karaba.org>
	<20030601.013040.116362760.davem@redhat.com>
MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi")
Content-Type: text/plain; charset=US-ASCII
X-archive-position: 3207
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mk@karaba.org
Precedence: bulk
X-list: netdev

Hello,

I have reworked the xfrm ip6ip6 patch.

The changes from the previous patch are:
 - to allocate unique spi values in xfrm6_tunnel.c
   by using just a simple open addressing hash,
 - to introduce device-like ip6ip6 handling, and
 - to fix some bugs.

This patch is against CS1.1304.

Could you check this?

Regards,
-mk

At Sun, 01 Jun 2003 01:30:40 -0700 (PDT),
"David S. Miller" wrote:
(snipped)
> I would suggest following implementation:
>
> 1) Implement something similar to xfrm_alloc_spi(t, 1, ~(u32)0)
>
>    It just needs to allocate unique SPI numbers local to
>    xfrm6_tunnel.c We mark "SPI" value zero as reserved and
>    to indicate failed lookup.
>
> 2) Create hash table, it is keyed by ipv6 address and hash table
>    entries give SPI values.
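The two quoted steps (locally unique SPI values with zero reserved, and a table keyed by IPv6 address) can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the names addr6, spi_slot, spi_table, addr6_hash, spi_lookup and spi_alloc are hypothetical, the "SPI = slot index + 1" convention is an assumption chosen so that zero stays reserved, and locking and entry removal are left out.

```c
#include <stdint.h>
#include <string.h>

#define SPI_TABLE_SIZE 1024u            /* illustrative table size */

/* Hypothetical stand-in for a 128-bit IPv6 address. */
struct addr6 {
	uint32_t a6[4];
};

/* One open-addressing slot: the address that owns it, if any. */
struct spi_slot {
	int used;
	struct addr6 addr;
};

static struct spi_slot spi_table[SPI_TABLE_SIZE];

/* Fold the four 32-bit words of the address into a table index. */
static unsigned addr6_hash(const struct addr6 *a)
{
	uint32_t h = a->a6[0] ^ a->a6[1] ^ a->a6[2] ^ a->a6[3];

	h ^= h >> 16;
	return h % SPI_TABLE_SIZE;
}

/* Return the SPI mapped to this address, or 0 (reserved) if none. */
static uint32_t spi_lookup(const struct addr6 *a)
{
	unsigned start = addr6_hash(a);
	unsigned n;

	for (n = 0; n < SPI_TABLE_SIZE; n++) {
		unsigned i = (start + n) % SPI_TABLE_SIZE;

		if (!spi_table[i].used)
			return 0;       /* empty slot ends the probe sequence */
		if (memcmp(&spi_table[i].addr, a, sizeof(*a)) == 0)
			return i + 1;   /* slot index + 1, so 0 is never returned */
	}
	return 0;
}

/*
 * Return the existing SPI for this address, or claim the first free
 * slot found by linear probing. Returns 0 only if the table is full.
 */
static uint32_t spi_alloc(const struct addr6 *a)
{
	unsigned start = addr6_hash(a);
	unsigned n;

	for (n = 0; n < SPI_TABLE_SIZE; n++) {
		unsigned i = (start + n) % SPI_TABLE_SIZE;

		if (!spi_table[i].used) {
			spi_table[i].used = 1;
			spi_table[i].addr = *a;
			return i + 1;
		}
		if (memcmp(&spi_table[i].addr, a, sizeof(*a)) == 0)
			return i + 1;
	}
	return 0;
}
```

Because each SPI is derived from the slot that holds the address, two different source addresses can never receive the same value, even when their hashes collide. Removing entries would need tombstones or re-insertion of the following slots, which this sketch does not attempt.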
> > So on input you would say something like: > > u32 spi; > > spi = spihash_lookup(&iph->saddr); > if (!spi) > goto drop; > x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, spi, > IPPROTO_IPV6, AF_INET6); (snipped) Index: linux25-XFRM6_TUNNEL-20030612/include/net/xfrm.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/xfrm.h,v retrieving revision 1.1.1.25 retrieving revision 1.1.1.25.2.1 diff -u -r1.1.1.25 -r1.1.1.25.2.1 --- linux25-XFRM6_TUNNEL-20030612/include/net/xfrm.h 10 Jun 2003 13:21:39 -0000 1.1.1.25 +++ linux25-XFRM6_TUNNEL-20030612/include/net/xfrm.h 12 Jun 2003 15:30:16 -0000 1.1.1.25.2.1 @@ -494,10 +494,6 @@ return 0; } -/* placeholder until xfrm6_tunnel.c is written */ -static inline int xfrm6_tunnel_check_size(struct sk_buff *skb) -{ return 0; } - /* A struct encoding bundle of transformations to apply to some set of flow. * * dst->child points to the next element of bundle. @@ -748,6 +744,12 @@ void (*err_handler)(struct sk_buff *skb, void *info); }; +struct xfrm6_tunnel { + int (*handler)(struct sk_buff **pskb, unsigned int *nhoffp); + void (*err_handler)(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info); +}; + extern void xfrm_init(void); extern void xfrm4_init(void); extern void xfrm4_fini(void); @@ -781,6 +783,11 @@ extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler); extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler); extern int xfrm4_tunnel_check_size(struct sk_buff *skb); +extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler); +extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler); +extern int xfrm6_tunnel_check_size(struct sk_buff *skb); +extern u32 xfrm6_tunnel_alloc_spi(xfrm_address_t *saddr); +extern u32 xfrm6_tunnel_spi_lookup(xfrm_address_t *saddr); extern int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp); extern int xfrm6_clear_mutable_options(struct sk_buff 
*skb, u16 *nh_offset, int dir); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); Index: linux25-XFRM6_TUNNEL-20030612/net/ipv6/Makefile =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/Makefile,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.2.1 diff -u -r1.1.1.13 -r1.1.1.13.2.1 --- linux25-XFRM6_TUNNEL-20030612/net/ipv6/Makefile 10 Jun 2003 13:21:55 -0000 1.1.1.13 +++ linux25-XFRM6_TUNNEL-20030612/net/ipv6/Makefile 12 Jun 2003 15:29:53 -0000 1.1.1.13.2.1 @@ -9,7 +9,7 @@ protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \ exthdrs.o sysctl_net_ipv6.o datagram.o proc.o \ ip6_flowlabel.o ipv6_syms.o \ - xfrm6_policy.o xfrm6_state.o xfrm6_input.o + xfrm6_policy.o xfrm6_state.o xfrm6_input.o xfrm6_tunnel.o obj-$(CONFIG_INET6_AH) += ah6.o obj-$(CONFIG_INET6_ESP) += esp6.o Index: linux25-XFRM6_TUNNEL-20030612/net/ipv6/ip6_tunnel.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_tunnel.c,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -u -r1.1.1.1 -r1.1.1.1.2.1 --- linux25-XFRM6_TUNNEL-20030612/net/ipv6/ip6_tunnel.c 10 Jun 2003 13:21:55 -0000 1.1.1.1 +++ linux25-XFRM6_TUNNEL-20030612/net/ipv6/ip6_tunnel.c 12 Jun 2003 15:29:53 -0000 1.1.1.1.2.1 @@ -48,6 +48,7 @@ #include #include #include +#include MODULE_AUTHOR("Ville Nuorvala"); MODULE_DESCRIPTION("IPv6-in-IPv6 tunnel"); @@ -1174,10 +1175,9 @@ return 0; } -static struct inet6_protocol ip6ip6_protocol = { +static struct xfrm_tunnel ip6ip6_handler = { .handler = ip6ip6_rcv, .err_handler = ip6ip6_err, - .flags = INET6_PROTO_FINAL }; /** @@ -1192,6 +1192,11 @@ struct sock *sk; struct ipv6_pinfo *np; + if (xfrm6_tunnel_register(&ip6ip6_handler) < 0) { + printk(KERN_INFO "ip6ip6 init: can't register tunnel\n"); + return -EAGAIN; + } + ip6ip6_fb_tnl_dev.priv = (void *) &ip6ip6_fb_tnl; for (i = 0; i < NR_CPUS; 
i++) { @@ -1216,10 +1221,6 @@ sk->sk_prot->unhash(sk); } - if ((err = inet6_add_protocol(&ip6ip6_protocol, IPPROTO_IPV6)) < 0) { - printk(KERN_ERR "Failed to register IPv6 protocol\n"); - goto fail; - } SET_MODULE_OWNER(&ip6ip6_fb_tnl_dev); register_netdev(&ip6ip6_fb_tnl_dev); @@ -1243,9 +1244,10 @@ { int i; - unregister_netdev(&ip6ip6_fb_tnl_dev); + if (xfrm6_tunnel_deregister(&ip6ip6_handler) < 0) + printk(KERN_INFO "ip6ip6 close: can't deregister tunnel\n"); - inet6_del_protocol(&ip6ip6_protocol, IPPROTO_IPV6); + unregister_netdev(&ip6ip6_fb_tnl_dev); for (i = 0; i < NR_CPUS; i++) { if (!cpu_possible(i)) Index: linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipcomp6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipcomp6.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 ipcomp6.c --- linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipcomp6.c 10 Jun 2003 13:21:54 -0000 1.1.1.3 +++ linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipcomp6.c 13 Jun 2003 09:44:59 -0000 @@ -254,6 +254,66 @@ xfrm_state_put(x); } +static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x) +{ + struct xfrm_state *t = NULL; + + t = xfrm_state_alloc(); + if (!t) + goto out; + + t->id.proto = IPPROTO_IPV6; + t->id.spi = xfrm6_tunnel_alloc_spi((xfrm_address_t *)&x->props.saddr); + memcpy(t->id.daddr.a6, x->id.daddr.a6, sizeof(struct in6_addr)); + memcpy(&t->sel, &x->sel, sizeof(t->sel)); + t->props.family = AF_INET6; + t->props.mode = 1; + memcpy(t->props.saddr.a6, x->props.saddr.a6, sizeof(struct in6_addr)); + + t->type = xfrm_get_type(IPPROTO_IPV6, t->props.family); + if (t->type == NULL) + goto error; + + if (t->type->init_state(t, NULL)) + goto error; + + t->km.state = XFRM_STATE_VALID; + atomic_set(&t->tunnel_users, 1); + +out: + return t; + +error: + xfrm_state_put(t); + goto out; +} + +static int ipcomp6_tunnel_attach(struct xfrm_state *x) +{ + int err = 0; + struct xfrm_state *t = NULL; + u32 spi; + + spi = 
xfrm6_tunnel_spi_lookup((xfrm_address_t *)&x->props.saddr); + if (spi) + t = xfrm_state_lookup((xfrm_address_t *)&x->id.daddr, + spi, IPPROTO_IPV6, AF_INET6); + if (!t) { + t = ipcomp6_tunnel_create(x); + if (!t) { + err = -EINVAL; + goto out; + } + xfrm_state_insert(t); + xfrm_state_hold(t); + } + x->tunnel = t; + atomic_inc(&t->tunnel_users); + +out: + return err; +} + static void ipcomp6_free_data(struct ipcomp_data *ipcd) { if (ipcd->tfm) @@ -292,6 +352,12 @@ ipcd->tfm = crypto_alloc_tfm(x->calg->alg_name, 0); if (!ipcd->tfm) goto error; + + if (x->props.mode) { + err = ipcomp6_tunnel_attach(x); + if (err) + goto error; + } calg_desc = xfrm_calg_get_byname(x->calg->alg_name); BUG_ON(!calg_desc); Index: linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipv6_syms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipv6_syms.c,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.2.1 diff -u -r1.1.1.13 -r1.1.1.13.2.1 --- linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipv6_syms.c 10 Jun 2003 13:21:55 -0000 1.1.1.13 +++ linux25-XFRM6_TUNNEL-20030612/net/ipv6/ipv6_syms.c 12 Jun 2003 15:29:53 -0000 1.1.1.13.2.1 @@ -38,6 +38,11 @@ EXPORT_SYMBOL(ip6_find_1stfragopt); EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_clear_mutable_options); +EXPORT_SYMBOL(xfrm6_tunnel_register); +EXPORT_SYMBOL(xfrm6_tunnel_deregister); +EXPORT_SYMBOL(xfrm6_tunnel_check_size); +EXPORT_SYMBOL(xfrm6_tunnel_alloc_spi); +EXPORT_SYMBOL(xfrm6_tunnel_spi_lookup); EXPORT_SYMBOL(rt6_lookup); EXPORT_SYMBOL(fl6_sock_lookup); EXPORT_SYMBOL(ipv6_ext_hdr); Index: linux25-XFRM6_TUNNEL-20030612/net/ipv6/xfrm6_tunnel.c =================================================================== RCS file: linux25-XFRM6_TUNNEL-20030612/net/ipv6/xfrm6_tunnel.c diff -N linux25-XFRM6_TUNNEL-20030612/net/ipv6/xfrm6_tunnel.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ linux25-XFRM6_TUNNEL-20030612/net/ipv6/xfrm6_tunnel.c 13 Jun 2003 09:44:59 -0000 @@ -0,0 +1,380 
@@ +/* + * Copyright (C)2003 USAGI/WIDE Project + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Author Mitsuru KANDA + * + * Based on xfrm4_tunnel + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define XFRM6_TUNNEL_HSIZE 1024 +/* note: we assume index of xfrm_tunnel_table[] == spi */ +static xfrm_address_t *xfrm6_tunnel_table[XFRM6_TUNNEL_HSIZE]; + +static spinlock_t xfrm6_tunnel_lock = SPIN_LOCK_UNLOCKED; + +static unsigned xfrm6_addr_hash(xfrm_address_t *addr) +{ + unsigned h; + h = ntohl(addr->a6[0]^addr->a6[1]^addr->a6[2]^addr->a6[3]); + h = (h ^ (h>>16)) % XFRM6_TUNNEL_HSIZE; + return h; +} + +static void xfrm6_tunnel_htable_init(void) +{ + int i; + for (i=0; idst; + + mtu = dst_pmtu(dst) - sizeof(struct ipv6hdr); + if (mtu < IPV6_MIN_MTU) + mtu = IPV6_MIN_MTU; + + if (skb->len > mtu) { + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev); + ret = -EMSGSIZE; + } + + return ret; +} + +static int ip6ip6_output(struct sk_buff *skb) +{ + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph, *top_iph; + int err; + + if ((err = xfrm6_tunnel_check_size(skb)) != 0) + goto error_nolock; + + iph = skb->nh.ipv6h; + + top_iph = (struct ipv6hdr *)skb_push(skb, x->props.header_len); + top_iph->version = 
6; + top_iph->priority = iph->priority; + top_iph->flow_lbl[0] = iph->flow_lbl[0]; + top_iph->flow_lbl[1] = iph->flow_lbl[1]; + top_iph->flow_lbl[2] = iph->flow_lbl[2]; + top_iph->nexthdr = IPPROTO_IPV6; + top_iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + top_iph->hop_limit = iph->hop_limit; + ipv6_addr_copy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr); + ipv6_addr_copy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr); + skb->nh.raw = skb->data; + skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr); + + x->curlft.bytes += skb->len; + x->curlft.packets++; + + spin_unlock_bh(&x->lock); + + if ((skb->dst = dst_pop(dst)) == NULL) { + kfree_skb(skb); + err = -EHOSTUNREACH; + goto error_nolock; + } + + return NET_XMIT_BYPASS; + +error_nolock: + kfree_skb(skb); + return err; +} + +static int ip6ip6_xfrm_rcv(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) +{ + if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) + return -EINVAL; + + skb->mac.raw = skb->nh.raw; + skb->nh.raw = skb->data; + dst_release(skb->dst); + skb->dst = NULL; + skb->protocol = htons(ETH_P_IPV6); + skb->pkt_type = PACKET_HOST; + netif_rx(skb); + + return 0; +} + +static struct xfrm6_tunnel *ip6ip6_handler; +static DECLARE_MUTEX(xfrm6_tunnel_sem); + +int xfrm6_tunnel_register(struct xfrm6_tunnel *handler) +{ + int ret; + + down(&xfrm6_tunnel_sem); + ret = 0; + if (ip6ip6_handler != NULL) + ret = -EINVAL; + if (!ret) + ip6ip6_handler = handler; + up(&xfrm6_tunnel_sem); + + return ret; +} + +int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler) +{ + int ret; + + down(&xfrm6_tunnel_sem); + ret = 0; + if (ip6ip6_handler != handler) + ret = -EINVAL; + if (!ret) + ip6ip6_handler = NULL; + up(&xfrm6_tunnel_sem); + + synchronize_net(); + + return ret; +} + +static int ip6ip6_rcv(struct sk_buff **pskb, unsigned int *nhoffp) +{ + struct sk_buff *skb = *pskb; + struct xfrm6_tunnel *handler = ip6ip6_handler; + struct xfrm_state *x = NULL; + struct ipv6hdr *iph = 
skb->nh.ipv6h; + int err = 0; + u32 spi; + + /* device-like_ip6ip6_handler() */ + if (handler) { + err = handler->handler(pskb, nhoffp); + if (!err) + goto out; + } + + spi = xfrm6_tunnel_spi_lookup((xfrm_address_t *)&iph->saddr); + x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, + spi, + IPPROTO_IPV6, AF_INET6); + + if (!x) + goto drop; + + spin_lock(&x->lock); + + if (unlikely(x->km.state != XFRM_STATE_VALID)) + goto drop_unlock; + + err = ip6ip6_xfrm_rcv(x, NULL, skb); + if (err) + goto drop_unlock; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock(&x->lock); + xfrm_state_put(x); + +out: + return 0; + +drop_unlock: + spin_unlock(&x->lock); + xfrm_state_put(x); +drop: + kfree_skb(skb); + + return -1; +} + +static void ip6ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct xfrm6_tunnel *handler = ip6ip6_handler; + + /* call here first for device-like ip6ip6 err handling */ + if (handler) { + handler->err_handler(skb, opt, type, code, offset, info); + return; + } + + /* xfrm ip6ip6 native err handling */ + switch (type) { + case ICMPV6_DEST_UNREACH: + switch (code) { + case ICMPV6_NOROUTE: + case ICMPV6_ADM_PROHIBITED: + case ICMPV6_NOT_NEIGHBOUR: + case ICMPV6_ADDR_UNREACH: + case ICMPV6_PORT_UNREACH: + default: + printk(KERN_ERR "xfrm ip6ip6: Destination Unreach.\n"); + break; + } + break; + case ICMPV6_PKT_TOOBIG: + printk(KERN_ERR "xfrm ip6ip6: Packet Too Big.\n"); + break; + case ICMPV6_TIME_EXCEED: + switch (code) { + case ICMPV6_EXC_HOPLIMIT: + printk(KERN_ERR "xfrm ip6ip6: Too small Hoplimit.\n"); + break; + case ICMPV6_EXC_FRAGTIME: + default: + break; + } + break; + case ICMPV6_PARAMPROB: + switch (code) { + case ICMPV6_HDR_FIELD: break; + case ICMPV6_UNK_NEXTHDR: break; + case ICMPV6_UNK_OPTION: break; + } + break; + default: + break; + } + return; +} + +static int ip6ip6_init_state(struct xfrm_state *x, void *args) +{ + if (!x->props.mode) + return -EINVAL; + + 
x->props.header_len = sizeof(struct ipv6hdr); + + return 0; +} + +static void ip6ip6_destroy(struct xfrm_state *x) +{ + xfrm6_tunnel_free_spi((xfrm_address_t *)&x->props.saddr); +} + +static struct xfrm_type ip6ip6_type = { + .description = "IP6IP6", + .owner = THIS_MODULE, + .proto = IPPROTO_IPV6, + .init_state = ip6ip6_init_state, + .destructor = ip6ip6_destroy, + .input = ip6ip6_xfrm_rcv, + .output = ip6ip6_output, +}; + +static struct inet6_protocol ip6ip6_protocol = { + .handler = ip6ip6_rcv, + .err_handler = ip6ip6_err, + .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL, +}; + +static int __init ip6ip6_init(void) +{ + if (xfrm_register_type(&ip6ip6_type, AF_INET6) < 0) { + printk(KERN_INFO "ip6ip6 init: can't add xfrm type\n"); + return -EAGAIN; + } + if (inet6_add_protocol(&ip6ip6_protocol, IPPROTO_IPV6) < 0) { + printk(KERN_INFO "ip6ip6 init: can't add protocol\n"); + xfrm_unregister_type(&ip6ip6_type, AF_INET6); + return -EAGAIN; + } + xfrm6_tunnel_htable_init(); + return 0; +} + +static void __exit ip6ip6_fini(void) +{ + if (inet6_del_protocol(&ip6ip6_protocol, IPPROTO_IPV6) < 0) + printk(KERN_INFO "ip6ip6 close: can't remove protocol\n"); + if (xfrm_unregister_type(&ip6ip6_type, AF_INET6) < 0) + printk(KERN_INFO "ip6ip6 close: can't remove xfrm type\n"); +} + +module_init(ip6ip6_init); +module_exit(ip6ip6_fini); +MODULE_LICENSE("GPL"); Index: linux25-XFRM6_TUNNEL-20030612/net/xfrm/xfrm_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/xfrm/xfrm_output.c,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.24.1 diff -u -r1.1.1.1 -r1.1.1.1.24.1 --- linux25-XFRM6_TUNNEL-20030612/net/xfrm/xfrm_output.c 6 May 2003 12:43:55 -0000 1.1.1.1 +++ linux25-XFRM6_TUNNEL-20030612/net/xfrm/xfrm_output.c 12 Jun 2003 15:29:06 -0000 1.1.1.1.24.1 @@ -27,11 +27,11 @@ case AF_INET: err = xfrm4_tunnel_check_size(skb); break; - +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) case 
AF_INET6: err = xfrm6_tunnel_check_size(skb); break; - +#endif default: err = -EINVAL; } From yoshfuji@linux-ipv6.org Fri Jun 13 09:10:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 09:11:06 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DGAs2x022659 for ; Fri, 13 Jun 2003 09:10:57 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5DGBkBo021184; Sat, 14 Jun 2003 01:11:46 +0900 Date: Sat, 14 Jun 2003 01:11:46 +0900 (JST) Message-Id: <20030614.011146.116698702.yoshfuji@linux-ipv6.org> To: mk@linux-ipv6.org Cc: davem@redhat.com, jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org, usagi@linux-ipv6.org Subject: Re: [PATCH] xfrm ip6ip6 (revised) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <87smqerml5.wl@karaba.org> References: <87fzmv5ejc.wl@karaba.org> <20030601.013040.116362760.davem@redhat.com> <87smqerml5.wl@karaba.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3208 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <87smqerml5.wl@karaba.org> (at Sat, 14 Jun 2003 01:03:50 +0900), Mitsuru KANDA / 神田 充 says: > Could you check this?
Kanda-san, would you use the dynamic netdev allocation scheme by alloc_netdev(), please? :-) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Fri Jun 13 09:12:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 09:12:19 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DGCE2x023001 for ; Fri, 13 Jun 2003 09:12:14 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5DGD9Bo021213; Sat, 14 Jun 2003 01:13:09 +0900 Date: Sat, 14 Jun 2003 01:13:09 +0900 (JST) Message-Id: <20030614.011309.29519822.yoshfuji@linux-ipv6.org> To: mk@linux-ipv6.org Cc: davem@redhat.com, jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org, usagi@linux-ipv6.org Subject: Re: [PATCH] xfrm ip6ip6 (revised) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030614.011146.116698702.yoshfuji@linux-ipv6.org> References: <20030601.013040.116362760.davem@redhat.com> <87smqerml5.wl@karaba.org> <20030614.011146.116698702.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3209 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article
<20030614.011146.116698702.yoshfuji@linux-ipv6.org> (at Sat, 14 Jun 2003 01:11:46 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says: > Kanda-san, would you use the dynamic netdev allocation scheme by > alloc_netdev(), please? :-) Oops, this comment is absolutely wrong... Very sorry... -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From haveblue@us.ibm.com Fri Jun 13 09:22:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 09:22:59 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DGMf2x023678 for ; Fri, 13 Jun 2003 09:22:49 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5DGMXll240012; Fri, 13 Jun 2003 12:22:33 -0400 Received: from nighthawk.sr71.net (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5DGMVxF058210; Fri, 13 Jun 2003 10:22:31 -0600 Subject: RE: e1000 performance hack for ppc64 (Power4) From: Dave Hansen To: Herman Dierks Cc: "Feldman, Scott" , David Gibson , Linux Kernel Mailing List , Anton Blanchard , Nancy J Milliner , Ricardo C Gonzalez , Brian Twichell , netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Organization: Message-Id: <1055521263.3531.2055.camel@nighthawk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4 Date: 13 Jun 2003 09:21:03 -0700 Content-Transfer-Encoding: 7bit X-archive-position: 3210 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: haveblue@us.ibm.com Precedence: bulk X-list: netdev Too long to quote: http://marc.theaimsgroup.com/?t=105538879600001&r=1&w=2 Wouldn't you get most of the benefit from copying that stuff around in the driver if you allocated the skb->data aligned in the first place?
There's already code to align them on CPU cache boundaries: #define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \ ~(SMP_CACHE_BYTES - 1)) So, do something like this: #ifdef ARCH_ALIGN_SKB_BYTES #define SKB_ALIGN_BYTES ARCH_ALIGN_SKB_BYTES #else #define SKB_ALIGN_BYTES SMP_CACHE_BYTES #endif #define SKB_DATA_ALIGN(X) (((X) + (SKB_ALIGN_BYTES - 1)) & \ ~(SKB_ALIGN_BYTES - 1)) You could easily make this adaptive to not align on the arch size when the request is bigger than that, just like in the e1000 patch you posted. -- Dave Hansen haveblue@us.ibm.com From hdierks@us.ibm.com Fri Jun 13 10:03:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 10:03:44 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DH3T2x026752 for ; Fri, 13 Jun 2003 10:03:38 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.56.224.151]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5DH3Ntd188490; Fri, 13 Jun 2003 13:03:23 -0400 Received: from d01ml065.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by northrelay03.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5DH3KHH114094; Fri, 13 Jun 2003 13:03:21 -0400 Importance: Normal Sensitivity: Subject: RE: e1000 performance hack for ppc64 (Power4) To: haveblue@us.ibm.com Cc: "Feldman, Scott" , David Gibson , Linux Kernel Mailing List , Anton Blanchard , "Nancy J Milliner" , "Ricardo C Gonzalez" , "Brian Twichell" , netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: "Herman Dierks" Date: Fri, 13 Jun 2003 12:03:00 -0500 X-MIMETrack: Serialize by Router on D01ML065/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/13/2003 01:03:21 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3211 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender:
hdierks@us.ibm.com Precedence: bulk X-list: netdev I will let Anton respond to this. I think he may have tried this some time back in his early prototypes to fix this. I think the problem was not where the buffer started but where the packet ended up within the buffer. Due to varying sizes of TCP and IP headers the packet ended up at some non-cache aligned address. What we need for the DMA to work well is to have the final packet (with datalink headers) starting on a cache line as it's the final packet that must be DMA'd. In fact it may need to be aligned to a higher level than that (not sure). haveblue@us.ltcfwd.linux.ibm.com on 06/13/2003 11:21:03 AM To: Herman Dierks/Austin/IBM@IBMUS cc: "Feldman, Scott" , David Gibson , Linux Kernel Mailing List , Anton Blanchard , Nancy J Milliner/Austin/IBM@IBMUS, Ricardo C Gonzalez/Austin/IBM@ibmus, Brian Twichell/Austin/IBM@IBMUS, netdev@oss.sgi.com Subject: RE: e1000 performance hack for ppc64 (Power4) Too long to quote: http://marc.theaimsgroup.com/?t=105538879600001&r=1&w=2 Wouldn't you get most of the benefit from copying that stuff around in the driver if you allocated the skb->data aligned in the first place? There's already code to align them on CPU cache boundaries: #define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \ ~(SMP_CACHE_BYTES - 1)) So, do something like this: #ifdef ARCH_ALIGN_SKB_BYTES #define SKB_ALIGN_BYTES ARCH_ALIGN_SKB_BYTES #else #define SKB_ALIGN_BYTES SMP_CACHE_BYTES #endif #define SKB_DATA_ALIGN(X) (((X) + (SKB_ALIGN_BYTES - 1)) & \ ~(SKB_ALIGN_BYTES - 1)) You could easily make this adaptive to not align on the arch size when the request is bigger than that, just like in the e1000 patch you posted.
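The round-up arithmetic in the suggested macro can be checked in user space. The following is only a minimal sketch, not kernel code: it assumes SKB_ALIGN_BYTES was meant in both halves of the macro, and the values 128 and 32 for ARCH_ALIGN_SKB_BYTES and SMP_CACHE_BYTES are placeholders picked for illustration. The helper skb_data_align() is a hypothetical wrapper that does not appear in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values, chosen only for this illustration. */
#define SMP_CACHE_BYTES      32
#define ARCH_ALIGN_SKB_BYTES 128   /* hypothetical per-arch override */

#ifdef ARCH_ALIGN_SKB_BYTES
#define SKB_ALIGN_BYTES ARCH_ALIGN_SKB_BYTES
#else
#define SKB_ALIGN_BYTES SMP_CACHE_BYTES
#endif

/* Round X up to the next multiple of SKB_ALIGN_BYTES (a power of two). */
#define SKB_DATA_ALIGN(X) (((X) + (SKB_ALIGN_BYTES - 1)) & \
                           ~(SKB_ALIGN_BYTES - 1))

/* Hypothetical wrapper so the macro can be exercised as a function. */
static size_t skb_data_align(size_t len)
{
    /* The mask trick is only valid for power-of-two alignments. */
    assert((SKB_ALIGN_BYTES & (SKB_ALIGN_BYTES - 1)) == 0);
    return SKB_DATA_ALIGN(len);
}
```

With these placeholder values, skb_data_align(1500) gives 1536 and skb_data_align(129) gives 256. Note that this rounds a size, so on its own it says nothing about where the finished packet starts inside the buffer.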
-- Dave Hansen haveblue@us.ibm.com From Robert.Olsson@data.slu.se Fri Jun 13 10:16:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 10:17:01 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DHGr2x027537 for ; Fri, 13 Jun 2003 10:16:54 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id TAA05034; Fri, 13 Jun 2003 19:15:48 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16106.1732.795402.473613@robur.slu.se> Date: Fri, 13 Jun 2003 19:15:48 +0200 To: "David S. Miller" Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, ralph@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress In-Reply-To: <20030612.223818.74754074.davem@redhat.com> References: <16102.9418.43884.336925@robur.slu.se> <20030610.115759.26513736.davem@redhat.com> <16103.27039.204333.952703@robur.slu.se> <20030612.223818.74754074.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3212 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > From: Robert Olsson > Date: Wed, 11 Jun 2003 19:40:47 +0200 > > vma samples %-age symbol name > c023c038 107340 33.143 fn_hash_lookup > > Ok, let's optimize our datastructures for how we actually > use them :-) Also, fn_zone shrunk by 8 bytes. Some percent less on our XEON/UP here. Input rate 2*190 kpps clone_skb=1. dst hash code in. 
Without
Iface  MTU Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0  1500   0 2988282 9680382 9680382 7011722      11      0      0      0 BRU
eth1  1500   0      15       0       0       0 2988291      0      0      0 BRU
eth2  1500   0 2988336 9681671 9681671 7011667       3      0      0      0 BRU
eth3  1500   0       2       0       0       0 2988337      0      0      0 BRU
00002100 000005e8 005b2c4a 00000000 00000001 00000000 00000000 00000000 00000000 00000002 00000000 005b1c49 005b1c3c 00000007 00000000 00bd877c 00000002

With patch
Iface  MTU Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0  1500   0 2877562 9698244 9698244 7122444      11      0      0      0 BRU
eth1  1500   0      13       0       0       0 2819350      0      0      0 BRU
eth2  1500   0 2877512 9693477 9693477 7122492       4      0      0      0 BRU
eth3  1500   0       1       0       0       0 2877511      0      0      0 BRU
00001ec7 0000058e 0057cb3c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0057bb37 0057bb2c 00000008 00000000 00b0c4a6 00000000

Time for a beer now... Still a lot of progress this week. In practice we never see "dst cache overflow" again. Cheers. --ro From vnuorval@tcs.hut.fi Fri Jun 13 11:08:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 11:08:39 -0700 (PDT) Received: from mail.tcs.hut.fi (mail.tcs.hut.fi [130.233.215.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DI8V2x003165 for ; Fri, 13 Jun 2003 11:08:32 -0700 Received: from rhea.tcs.hut.fi (rhea.tcs.hut.fi [130.233.215.147]) by mail.tcs.hut.fi (Postfix) with ESMTP id B738380020F; Fri, 13 Jun 2003 21:08:30 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5DI8U5L004939; Fri, 13 Jun 2003 21:08:30 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5DI8PMg004935; Fri, 13 Jun 2003 21:08:26 +0300 Date: Fri, 13 Jun 2003 21:08:25 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Cc: mk@linux-ipv6.org, , , , , Subject: Re: [PATCH] xfrm ip6ip6 (revised) In-Reply-To:
<20030614.011309.29519822.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 3213 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Sat, 14 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote: > In article <20030614.011146.116698702.yoshfuji@linux-ipv6.org> (at Sat, 14 Jun 2003 01:11:46 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says: > > > Kanda-san, would you use the dynamic netdev allocation scheme by > > alloc_netdev(), please? :-) > > Oops, this comment is absolutely wrong... Very sorry... I can do the alloc_netdev() patch :) -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From davem@redhat.com Fri Jun 13 12:11:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 12:11:53 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DJBj2x008266 for ; Fri, 13 Jun 2003 12:11:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA02458; Fri, 13 Jun 2003 12:07:03 -0700 Date: Fri, 13 Jun 2003 12:07:02 -0700 (PDT) Message-Id: <20030613.120702.71089628.davem@redhat.com> To: mk@linux-ipv6.org Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: Re: [PATCH] xfrm ip6ip6 (revised) From: "David S. Miller" In-Reply-To: <87smqerml5.wl@karaba.org> References: <87fzmv5ejc.wl@karaba.org> <20030601.013040.116362760.davem@redhat.com> <87smqerml5.wl@karaba.org> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3214 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Mitsuru KANDA / 神田 充 Date: Sat, 14 Jun 2003 01:03:50 +0900 - to allocate unique spi values in xfrm6_tunnel.c by using just simple open addressing hash, Mitsuru, what happens if two tunnels use same address? Only first one will be found in hash, this is incorrect behavior as it will cause the wrong SPI to be used when packets are really destined for second user of that address. You need to add refcount to hash table entries, so that SPI can be shared by different xfrm6 tunnels with same address. From davem@redhat.com Fri Jun 13 12:24:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 12:24:35 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DJOM2x009058 for ; Fri, 13 Jun 2003 12:24:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA02538; Fri, 13 Jun 2003 12:19:59 -0700 Date: Fri, 13 Jun 2003 12:19:58 -0700 (PDT) Message-Id: <20030613.121958.78710614.davem@redhat.com> To: mk@linux-ipv6.org Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: Re: [PATCH] xfrm ip6ip6 (revised) From: "David S. Miller" In-Reply-To: <20030613.120702.71089628.davem@redhat.com> References: <20030601.013040.116362760.davem@redhat.com> <87smqerml5.wl@karaba.org> <20030613.120702.71089628.davem@redhat.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3215 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "David S. Miller" Date: Fri, 13 Jun 2003 12:07:02 -0700 (PDT) You need to add refcount to hash table entries, so that SPI can be shared by different xfrm6 tunnels with same address. Mitsuru, here is some example code showing what my idea looks like. I apologize for not making something like this for you earlier. struct v6spi_entry { struct v6spi_entry *next; struct in6_addr addr; u32 spi; atomic_t refcnt; }; u32 v6spi_alloc(struct in6_addr *addr) { int h = v6_hashfn(addr); struct v6spi_entry *ent; for (ent = hash_table[h]; ent; ent = ent->next) { if (!ipv6_addr_cmp(addr, &ent->addr)) { atomic_inc(&ent->refcnt); return ent->spi; } } ent = kmalloc(sizeof(*ent), GFP_ATOMIC); if (!ent) return 0; /* SPI 0 is reserved to signal failure */ ent->spi = alloc_unique_spi(); ipv6_addr_copy(&ent->addr, addr); atomic_set(&ent->refcnt, 1); ent->next = hash_table[h]; hash_table[h] = ent; return ent->spi; } From vnuorval@tcs.hut.fi Fri Jun 13 14:36:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 14:36:38 -0700 (PDT) Received: from mail.tcs.hut.fi (mail.tcs.hut.fi [130.233.215.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DLaM2x016514 for ; Fri, 13 Jun 2003 14:36:23 -0700 Received: from rhea.tcs.hut.fi (rhea.tcs.hut.fi [130.233.215.147]) by mail.tcs.hut.fi (Postfix) with ESMTP id 8B47B800212; Sat, 14 Jun 2003 00:07:26 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5DL7Q5L005312; Sat, 14 Jun 2003 00:07:26 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5DL7Psd005308; Sat, 14 Jun 2003 00:07:26 +0300 Date: Sat, 14 Jun 2003
00:07:25 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@redhat.com, Subject: [patch] IPV6: Refcount leaks in udpv6_connect() In-Reply-To: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3216 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev Hi! A dst refcount leak had unfortunately crept into my original CONFIG_IPV6_SUBTREES patch and your derived udpv6_connect() patch (changeset 1.1215.68.12). While fixing this, I also found and fixed some other apparent leaks. This patch fixes several bugs in udpv6_connect(): - dst refcount leak if ipv6_get_saddr() fails - several flowlabel refcount leaks The diff is done against ChangeSet 1.1307. Thanks, Ville diff -Nur --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet linux-2.5/net/ipv6/udp.c merge-2.5/net/ipv6/udp.c --- linux-2.5/net/ipv6/udp.c Fri Jun 13 16:08:00 2003 +++ merge-2.5/net/ipv6/udp.c Fri Jun 13 16:05:33 2003 @@ -299,9 +299,10 @@ if (addr_type == IPV6_ADDR_MAPPED) { struct sockaddr_in sin; - if (__ipv6_only_sock(sk)) - return -ENETUNREACH; - + if (__ipv6_only_sock(sk)) { + err = -ENETUNREACH; + goto out; + } sin.sin_family = AF_INET; sin.sin_addr.s_addr = daddr->s6_addr32[3]; sin.sin_port = usin->sin6_port; @@ -309,8 +310,8 @@ err = udp_connect(sk, (struct sockaddr*) &sin, sizeof(sin)); ipv4_connected: - if (err < 0) - return err; + if (err) + goto out; ipv6_addr_set(&np->daddr, 0, 0, htonl(0x0000ffff), inet->daddr); @@ -323,7 +324,7 @@ ipv6_addr_set(&np->rcv_saddr, 0, 0, htonl(0x0000ffff), inet->rcv_saddr); } - return 0; + goto out; } if (addr_type&IPV6_ADDR_LINKLOCAL) { @@ -331,8 +332,8 @@ usin->sin6_scope_id) { if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != usin->sin6_scope_id) { - fl6_sock_release(flowlabel); - return 
-EINVAL; + err = -EINVAL; + goto out; } sk->sk_bound_dev_if = usin->sin6_scope_id; if (!sk->sk_bound_dev_if && @@ -341,8 +342,10 @@ } /* Connect to link-local address requires an interface */ - if (!sk->sk_bound_dev_if) - return -EINVAL; + if (!sk->sk_bound_dev_if) { + err = -EINVAL; + goto out; + } } ipv6_addr_copy(&np->daddr, daddr); @@ -379,31 +382,33 @@ if ((err = dst->error) != 0) { dst_release(dst); - fl6_sock_release(flowlabel); - return err; + goto out; } /* get the source address used in the appropriate device */ err = ipv6_get_saddr(dst, daddr, &fl.fl6_src); - if (err == 0) { - if (ipv6_addr_any(&np->saddr)) - ipv6_addr_copy(&np->saddr, &fl.fl6_src); - - if (ipv6_addr_any(&np->rcv_saddr)) { - ipv6_addr_copy(&np->rcv_saddr, &fl.fl6_src); - inet->rcv_saddr = LOOPBACK4_IPV6; - } + if (err) { + dst_release(dst); + goto out; + } - ip6_dst_store(sk, dst, - !ipv6_addr_cmp(&fl.fl6_dst, &np->daddr) ? - &np->daddr : NULL); + if (ipv6_addr_any(&np->saddr)) + ipv6_addr_copy(&np->saddr, &fl.fl6_src); - sk->sk_state = TCP_ESTABLISHED; + if (ipv6_addr_any(&np->rcv_saddr)) { + ipv6_addr_copy(&np->rcv_saddr, &fl.fl6_src); + inet->rcv_saddr = LOOPBACK4_IPV6; } - fl6_sock_release(flowlabel); + ip6_dst_store(sk, dst, + !ipv6_addr_cmp(&fl.fl6_dst, &np->daddr) ? 
+ &np->daddr : NULL); + + sk->sk_state = TCP_ESTABLISHED; +out: + fl6_sock_release(flowlabel); return err; }

-- 
Ville Nuorvala
Research Assistant, Institute of Digital Communications,
Helsinki University of Technology
email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257

From anton@samba.org Fri Jun 13 15:39:11 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 15:39:24 -0700 (PDT)
Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DMdA2x018754 for ; Fri, 13 Jun 2003 15:39:11 -0700
Received: by lists.samba.org (Postfix, from userid 504) id CD7FE2C10D; Fri, 13 Jun 2003 22:39:09 +0000 (GMT)
Date: Sat, 14 Jun 2003 08:38:41 +1000
From: Anton Blanchard
To: Dave Hansen
Cc: Herman Dierks , "Feldman, Scott" , David Gibson , Linux Kernel Mailing List , Nancy J Milliner , Ricardo C Gonzalez , Brian Twichell , netdev@oss.sgi.com
Subject: Re: e1000 performance hack for ppc64 (Power4)
Message-ID: <20030613223841.GB32097@krispykreme>
References: <1055521263.3531.2055.camel@nighthawk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1055521263.3531.2055.camel@nighthawk>
User-Agent: Mutt/1.5.4i
X-archive-position: 3217
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: anton@samba.org
Precedence: bulk
X-list: netdev

> Wouldn't you get most of the benefit from copying that stuff around in
> the driver if you allocated the skb->data aligned in the first place?

Nice try, but my understanding is that on the transmit path we reserve
the maximum-sized TCP header, copy the data in, then form our TCP header
backwards from that point. Since the TCP header size changes with
various options, it's not an easy task.
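
To illustrate why the header size matters for alignment, here is a minimal
userspace sketch, not kernel code. It assumes a fixed headroom of 128 bytes
and a 128-byte cache line; both values and every sketch_ name are made up
for illustration. The header is written backwards from the end of the
headroom, so the packet start moves whenever the header length changes.

```c
#include <stddef.h>

/* Hypothetical values for illustration only; not taken from the kernel. */
#define SKETCH_MAX_HDR   128u  /* headroom reserved for the largest header */
#define SKETCH_CACHELINE 128u  /* assumed cache line size */

/* Offset of the first header byte when a header of hdr_len bytes is
 * written backwards from the end of the reserved headroom. */
size_t sketch_hdr_start(size_t hdr_len)
{
	return SKETCH_MAX_HDR - hdr_len;
}

/* Distance of the packet start from the previous cache line boundary,
 * assuming the buffer itself begins on a boundary. */
size_t sketch_misalign(size_t hdr_len)
{
	return sketch_hdr_start(hdr_len) % SKETCH_CACHELINE;
}
```

With a 54-byte header (14 Ethernet + 20 IP + 20 TCP) the packet starts at
offset 74; adding 12 bytes of TCP options moves it to offset 62, so no
single buffer alignment suits both cases.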
One thing I thought of doing was to cache the current TCP header size and align the next packet based on it, with an extra cacheline at the start for it to spill into if the TCP header grew. This is only worth it if most packets will have the same sized header. Networking guys: is this a valid assumption? Anton From davem@redhat.com Fri Jun 13 15:50:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 15:50:39 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DMoS2x019554 for ; Fri, 13 Jun 2003 15:50:28 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA03124; Fri, 13 Jun 2003 15:46:35 -0700 Date: Fri, 13 Jun 2003 15:46:34 -0700 (PDT) Message-Id: <20030613.154634.74748085.davem@redhat.com> To: anton@samba.org Cc: haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <20030613223841.GB32097@krispykreme> References: <1055521263.3531.2055.camel@nighthawk> <20030613223841.GB32097@krispykreme> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3218 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Anton Blanchard Date: Sat, 14 Jun 2003 08:38:41 +1000 This is only worth it if most packets will have the same sized header. Networking guys: is this a valid assumption? Not really... one retransmit and the TCP header size grows due to the SACK options. 
I find it truly bletcherous what you're trying to do here.

Why not instead find out if it's possible to have the e1000
fetch the entire cache line where the first byte of the packet
resides? Even ancient designs like SunHME do that.

From shemminger@osdl.org Fri Jun 13 16:09:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 16:10:05 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DN9s2x020584 for ; Fri, 13 Jun 2003 16:09:54 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5DN9gX15473; Fri, 13 Jun 2003 16:09:42 -0700
Date: Fri, 13 Jun 2003 16:09:42 -0700
From: Stephen Hemminger
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH] convert slip driver to alloc_netdev
Message-Id: <20030613160942.384ca2c3.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 3219
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

Slightly more complicated than earlier patches. Convert slip from having
an array of control block pointers, each containing a net_device, to an
array of net_device pointers. The slip private data is allocated with
alloc_netdev.

Also changed the exit loop to use a schedule_timeout instead of yield.
That should work better on 2.5, and we can sleep there.

This patch is against 2.5.70 with all the earlier net patches applied.
Tested with dedicated serial cable between 2.4 and SUT.
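
As a rough illustration of the layout change described above, the following
userspace sketch mimics the one-allocation pattern: the device structure and
the driver's private data live in a single block, and the private pointer
points just past the device structure. It is not the kernel's
alloc_netdev(); sketch_alloc_dev, struct sketch_dev, struct sketch_slip,
the MTU value, and the 32-byte rounding are all assumptions made for the
example.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NAMSIZ 16
#define SKETCH_ALIGN  32u	/* assumed rounding for the private area */

struct sketch_dev {		/* stand-in for struct net_device */
	char name[SKETCH_NAMSIZ];
	void *priv;		/* points into the same allocation */
	int mtu;
};

struct sketch_slip {		/* stand-in for struct slip */
	struct sketch_dev *dev;
	int mode;
};

/* Size of the device struct rounded up to SKETCH_ALIGN. */
#define SKETCH_DEV_SIZE \
	((sizeof(struct sketch_dev) + SKETCH_ALIGN - 1) & \
	 ~(size_t)(SKETCH_ALIGN - 1))

/* Allocate device and private data as one zeroed block, then let the
 * setup callback fill in the device defaults. */
struct sketch_dev *sketch_alloc_dev(size_t sizeof_priv, const char *name,
				    void (*setup)(struct sketch_dev *))
{
	struct sketch_dev *dev = calloc(1, SKETCH_DEV_SIZE + sizeof_priv);

	if (!dev)
		return NULL;
	if (sizeof_priv)
		dev->priv = (char *)dev + SKETCH_DEV_SIZE;
	snprintf(dev->name, sizeof(dev->name), "%s", name);
	if (setup)
		setup(dev);
	return dev;
}

/* Example setup callback; the MTU value is illustrative only. */
void sketch_sl_setup(struct sketch_dev *dev)
{
	dev->mtu = 296;
}
```

Because both parts share one allocation, a single free() releases the device
and its private data together, which is why the array only needs to hold
device pointers.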
diff -Nru a/drivers/net/slip.c b/drivers/net/slip.c --- a/drivers/net/slip.c Fri Jun 13 16:03:27 2003 +++ b/drivers/net/slip.c Fri Jun 13 16:03:27 2003 @@ -83,12 +83,7 @@ #define SLIP_VERSION "0.8.4-NET3.019-NEWTTY" - -typedef struct slip_ctrl { - struct slip ctrl; /* SLIP things */ - struct net_device dev; /* the device */ -} slip_ctrl_t; -static slip_ctrl_t **slip_ctrls; +static struct net_device **slip_devs; int slip_maxdev = SL_NRUNIT; /* Can be overridden with insmod! */ MODULE_PARM(slip_maxdev, "i"); @@ -624,32 +619,45 @@ */ dev->mtu = sl->mtu; - dev->hard_start_xmit = sl_xmit; + dev->type = ARPHRD_SLIP + sl->mode; #ifdef SL_CHECK_TRANSMIT dev->tx_timeout = sl_tx_timeout; dev->watchdog_timeo = 20*HZ; #endif + return 0; +} + + +static void sl_uninit(struct net_device *dev) +{ + struct slip *sl = (struct slip*)(dev->priv); + + sl_free_bufs(sl); +} + +static void sl_setup(struct net_device *dev) +{ + dev->init = sl_init; + dev->uninit = sl_uninit; dev->open = sl_open; + dev->destructor = (void (*)(struct net_device *))kfree; dev->stop = sl_close; dev->get_stats = sl_get_stats; dev->change_mtu = sl_change_mtu; + dev->hard_start_xmit = sl_xmit; #ifdef CONFIG_SLIP_SMART dev->do_ioctl = sl_ioctl; #endif dev->hard_header_len = 0; dev->addr_len = 0; - dev->type = ARPHRD_SLIP + sl->mode; dev->tx_queue_len = 10; SET_MODULE_OWNER(dev); /* New-style flags. */ dev->flags = IFF_NOARP|IFF_POINTOPOINT|IFF_MULTICAST; - - return 0; } - /****************************************** Routines looking at TTY side. 
******************************************/ @@ -702,52 +710,57 @@ static void sl_sync(void) { int i; + struct net_device *dev; + struct slip *sl; for (i = 0; i < slip_maxdev; i++) { - slip_ctrl_t *slp = slip_ctrls[i]; - if (slp == NULL) + if ((dev = slip_devs[i]) == NULL) break; - if (slp->ctrl.tty || slp->ctrl.leased) + + sl = dev->priv; + if (sl->tty || sl->leased) continue; - if (slp->dev.flags&IFF_UP) - dev_close(&slp->dev); + if (dev->flags&IFF_UP) + dev_close(dev); } } + /* Find a free SLIP channel, and link in this `tty' line. */ static struct slip * sl_alloc(dev_t line) { - struct slip *sl; - slip_ctrl_t *slp = NULL; int i; int sel = -1; int score = -1; + struct net_device *dev = NULL; + struct slip *sl; - if (slip_ctrls == NULL) + if (slip_devs == NULL) return NULL; /* Master array missing ! */ for (i = 0; i < slip_maxdev; i++) { - slp = slip_ctrls[i]; - if (slp == NULL) + dev = slip_devs[i]; + if (dev == NULL) break; - if (slp->ctrl.leased) { - if (slp->ctrl.line != line) + sl = dev->priv; + if (sl->leased) { + if (sl->line != line) continue; - if (slp->ctrl.tty) + if (sl->tty) return NULL; /* Clear ESCAPE & ERROR flags */ - slp->ctrl.flags &= (1 << SLF_INUSE); - return &slp->ctrl; + sl->flags &= (1 << SLF_INUSE); + return sl; } - if (slp->ctrl.tty) + if (sl->tty) continue; - if (current->pid == slp->ctrl.pid) { - if (slp->ctrl.line == line && score < 3) { + if (current->pid == sl->pid) { + if (sl->line == line && score < 3) { sel = i; score = 3; continue; @@ -758,7 +771,7 @@ } continue; } - if (slp->ctrl.line == line && score < 1) { + if (sl->line == line && score < 1) { sel = i; score = 1; continue; @@ -771,10 +784,11 @@ if (sel >= 0) { i = sel; - slp = slip_ctrls[i]; + dev = slip_devs[i]; if (score > 1) { - slp->ctrl.flags &= (1 << SLF_INUSE); - return &slp->ctrl; + sl = dev->priv; + sl->flags &= (1 << SLF_INUSE); + return sl; } } @@ -782,26 +796,32 @@ if (i >= slip_maxdev) return NULL; - if (slp) { - if (test_bit(SLF_INUSE, &slp->ctrl.flags)) { - 
unregister_netdevice(&slp->dev); - sl_free_bufs(&slp->ctrl); + if (dev) { + sl = dev->priv; + if (test_bit(SLF_INUSE, &sl->flags)) { + unregister_netdevice(dev); + dev = NULL; + slip_devs[i] = NULL; } - } else if ((slp = (slip_ctrl_t *)kmalloc(sizeof(slip_ctrl_t),GFP_KERNEL)) == NULL) - return NULL; + } + + if (!dev) { + char name[IFNAMSIZ]; + sprintf(name, "sl%d", i); - memset(slp, 0, sizeof(slip_ctrl_t)); + dev = alloc_netdev(sizeof(*sl), name, sl_setup); + if (!dev) + return NULL; + dev->base_addr = i; + } + + sl = dev->priv; - sl = &slp->ctrl; /* Initialize channel control data */ sl->magic = SLIP_MAGIC; - sl->dev = &slp->dev; + sl->dev = dev; spin_lock_init(&sl->lock); sl->mode = SL_MODE_DEFAULT; - sprintf(slp->dev.name, "sl%d", i); - slp->dev.base_addr = i; - slp->dev.priv = (void*)sl; - slp->dev.init = sl_init; #ifdef CONFIG_SLIP_SMART init_timer(&sl->keepalive_timer); /* initialize timer_list struct */ sl->keepalive_timer.data=(unsigned long)sl; @@ -810,8 +830,9 @@ sl->outfill_timer.data=(unsigned long)sl; sl->outfill_timer.function=sl_outfill; #endif - slip_ctrls[i] = slp; - return &slp->ctrl; + slip_devs[i] = dev; + + return sl; } /* @@ -865,12 +886,10 @@ if ((err = sl_alloc_bufs(sl, SL_MTU)) != 0) goto err_free_chan; - if (register_netdevice(sl->dev)) { - sl_free_bufs(sl); - goto err_free_chan; - } - set_bit(SLF_INUSE, &sl->flags); + + if ((err = register_netdevice(sl->dev))) + goto err_free_bufs; } #ifdef CONFIG_SLIP_SMART @@ -888,6 +907,9 @@ rtnl_unlock(); return sl->dev->base_addr; +err_free_bufs: + sl_free_bufs(sl); + err_free_chan: sl->tty = NULL; tty->disc_data = NULL; @@ -1335,14 +1357,14 @@ printk(KERN_INFO "SLIP linefill/keepalive option.\n"); #endif - slip_ctrls = kmalloc(sizeof(void*)*slip_maxdev, GFP_KERNEL); - if (!slip_ctrls) { - printk(KERN_ERR "SLIP: Can't allocate slip_ctrls[] array! Uaargh! 
(-> No SLIP available)\n"); + slip_devs = kmalloc(sizeof(struct net_device *)*slip_maxdev, GFP_KERNEL); + if (!slip_devs) { + printk(KERN_ERR "SLIP: Can't allocate slip devices array! Uaargh! (-> No SLIP available)\n"); return -ENOMEM; } /* Clear the pointer array, we allocate devices when we need them */ - memset(slip_ctrls, 0, sizeof(void*)*slip_maxdev); /* Pointers */ + memset(slip_devs, 0, sizeof(struct net_device *)*slip_maxdev); /* Fill in our line protocol discipline, and register it */ if ((status = tty_register_ldisc(N_SLIP, &sl_ldisc)) != 0) { @@ -1354,51 +1376,59 @@ static void __exit slip_exit(void) { int i; + struct net_device *dev; + struct slip *sl; + unsigned long timeout = jiffies + HZ; + int busy = 0; - if (slip_ctrls != NULL) { - unsigned long timeout = jiffies + HZ; - int busy = 0; + if (slip_devs == NULL) + return; - /* First of all: check for active disciplines and hangup them. - */ - do { - if (busy) - yield(); - - busy = 0; - local_bh_disable(); - for (i = 0; i < slip_maxdev; i++) { - struct slip_ctrl *slc = slip_ctrls[i]; - if (!slc) - continue; - spin_lock(&slc->ctrl.lock); - if (slc->ctrl.tty) { - busy++; - tty_hangup(slc->ctrl.tty); - } - spin_unlock(&slc->ctrl.lock); - } - local_bh_enable(); - } while (busy && time_before(jiffies, timeout)); + /* First of all: check for active disciplines and hangup them. + */ + do { + if (busy) { + current->state = TASK_INTERRUPTIBLE; + schedule_timeout(HZ / 10); + current->state = TASK_RUNNING; + } + busy = 0; for (i = 0; i < slip_maxdev; i++) { - struct slip_ctrl *slc = slip_ctrls[i]; - if (slc) { - unregister_netdev(&slc->dev); - if (slc->ctrl.tty) { - printk(KERN_ERR "%s: tty discipline is still running\n", slc->dev.name); - /* Intentionally leak the control block. 
*/ - } else { - sl_free_bufs(&slc->ctrl); - kfree(slc); - } - slip_ctrls[i] = NULL; + dev = slip_devs[i]; + if (!dev) + continue; + sl = dev->priv; + spin_lock_bh(&sl->lock); + if (sl->tty) { + busy++; + tty_hangup(sl->tty); } + spin_unlock_bh(&sl->lock); } + } while (busy && time_before(jiffies, timeout)); + + + for (i = 0; i < slip_maxdev; i++) { + dev = slip_devs[i]; + if (!dev) + continue; + slip_devs[i] = NULL; + + sl = dev->priv; + if (sl->tty) { + printk(KERN_ERR "%s: tty discipline still running\n", + dev->name); + /* Intentionally leak the control block. */ + dev->destructor = NULL; + } - kfree(slip_ctrls); - slip_ctrls = NULL; + unregister_netdev(dev); } + + kfree(slip_devs); + slip_devs = NULL; + if ((i = tty_register_ldisc(N_SLIP, NULL))) { printk(KERN_ERR "SLIP: can't unregister line discipline (err = %d)\n", i); From shemminger@osdl.org Fri Jun 13 16:41:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 16:41:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DNfX2x022175 for ; Fri, 13 Jun 2003 16:41:33 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5DNfJX22392; Fri, 13 Jun 2003 16:41:19 -0700 Date: Fri, 13 Jun 2003 16:41:19 -0700 From: Stephen Hemminger To: "David S. 
Miller" , Greg KH
Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: [PATCH] network hotplug via class_device/kobject
Message-Id: <20030613164119.15209934.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 3220
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

This patch changes network devices to run hotplug out of the
kobject/class_device infrastructure rather than calling it from the
network core. The code gets simpler and there is only one place for
Greg to fix when he changes the API ;-)

All hotplug now happens off the chain:
    rtnl_unlock -> netdev_run_todo -> netdev_{un}register_sysfs

The state flag "deadbeaf" was converted to a state enumeration to handle
the necessary bookkeeping, which also adds some defense against drivers
that have unexpected semantics. Paranoid about some driver doing
something like:
    rtnl_lock();
    register_netdevice();
    unregister_netdevice();
    rtnl_unlock()
    BOOM

This patch causes an external script API change. Because network devices
now go through the standard path, the action passed to the script is no
longer "register" or "unregister" but is now "add" or "remove" like
other devices. This is a good thing.

When testing (at least on RHAT) just change the /etc/hotplug/net.agent
case statement:
    case $ACTION in
    add|register)
        # Don't do anything if the network is stopped
        if [ ! -f /var/lock/subsys/network ]; then
            exit 0
        fi

Dave, this patch is against your temporary bk tree with earlier
net-sysfs cleanups.
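
The state handling described above can be sketched in userspace C as
follows. This is a simplified model, not the patch itself: the sketch_
names are hypothetical, locking is omitted, and invalid transitions return
-1 instead of warning or crashing. Registration and unregistration only
queue the device; the queued work is completed later by a separate run
step, standing in for what happens after rtnl_unlock().

```c
#include <stddef.h>

enum sketch_reg_state {
	SKETCH_UNINITIALIZED = 0,
	SKETCH_REGISTERING,	/* register called, todo pending */
	SKETCH_REGISTERED,	/* register todo completed */
	SKETCH_UNREGISTERING,	/* unregister called, todo pending */
	SKETCH_UNREGISTERED	/* unregister todo completed */
};

struct sketch_netdev {
	enum sketch_reg_state reg_state;
	struct sketch_netdev *todo_next;
};

/* FIFO of devices whose state change still has to be finished. */
static struct sketch_netdev *sketch_todo_head;
static struct sketch_netdev **sketch_todo_tail = &sketch_todo_head;

void sketch_set_todo(struct sketch_netdev *dev)
{
	dev->todo_next = NULL;
	*sketch_todo_tail = dev;
	sketch_todo_tail = &dev->todo_next;
}

int sketch_register(struct sketch_netdev *dev)
{
	if (dev->reg_state != SKETCH_UNINITIALIZED)
		return -1;
	dev->reg_state = SKETCH_REGISTERING;
	sketch_set_todo(dev);
	return 0;
}

int sketch_unregister(struct sketch_netdev *dev)
{
	/* Refuses the register-then-unregister-under-one-lock case. */
	if (dev->reg_state != SKETCH_REGISTERED)
		return -1;
	dev->reg_state = SKETCH_UNREGISTERING;
	sketch_set_todo(dev);
	return 0;
}

/* Finish all queued transitions; returns how many were processed. */
int sketch_run_todo(void)
{
	struct sketch_netdev *dev = sketch_todo_head;
	int n = 0;

	sketch_todo_head = NULL;
	sketch_todo_tail = &sketch_todo_head;

	while (dev) {
		struct sketch_netdev *next = dev->todo_next;

		dev->todo_next = NULL;
		if (dev->reg_state == SKETCH_REGISTERING)
			dev->reg_state = SKETCH_REGISTERED;
		else if (dev->reg_state == SKETCH_UNREGISTERING)
			dev->reg_state = SKETCH_UNREGISTERED;
		n++;
		dev = next;
	}
	return n;
}
```

In this model a device that is still in the "registering" state cannot be
unregistered until the run step has completed its registration, which is
the ordering the enumeration is meant to enforce.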
diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h Fri Jun 13 16:29:18 2003 +++ b/include/linux/netdevice.h Fri Jun 13 16:29:18 2003 @@ -355,8 +355,16 @@ spinlock_t queue_lock; /* Number of references to this device */ atomic_t refcnt; - /* The flag marking that device is unregistered, but held by an user */ - int deadbeaf; + /* delayed register/unregister */ + struct list_head todo_list; + + /* register/unregister state machine */ + enum { NETREG_UNINITIALIZED=0, + NETREG_REGISTERING, /* called register_netdevice */ + NETREG_REGISTERED, /* completed register todo */ + NETREG_UNREGISTERING, /* called unregister_netdevice */ + NETREG_UNREGISTERED, /* completed unregister todo */ + } reg_state; /* Net device features */ int features; diff -Nru a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c Fri Jun 13 16:29:18 2003 +++ b/net/core/dev.c Fri Jun 13 16:29:18 2003 @@ -168,14 +168,6 @@ static struct timer_list samp_timer = TIMER_INITIALIZER(sample_queue, 0, 0); #endif -#ifdef CONFIG_HOTPLUG -static void net_run_sbin_hotplug(struct net_device *dev, int is_register); -static void net_run_hotplug_todo(void); -#else -#define net_run_sbin_hotplug(dev, is_register) do { } while (0) -#define net_run_hotplug_todo() do { } while (0) -#endif - /* * Our notifier list */ @@ -2537,6 +2529,17 @@ static int dev_boot_phase = 1; +/* Delayed registration/unregisteration */ +static spinlock_t net_todo_list_lock = SPIN_LOCK_UNLOCKED; +static struct list_head net_todo_list = LIST_HEAD_INIT(net_todo_list); + +static inline void net_set_todo(struct net_device *dev) +{ + spin_lock(&net_todo_list_lock); + list_add_tail(&dev->todo_list, &net_todo_list); + spin_unlock(&net_todo_list_lock); +} + /** * register_netdevice - register a network device * @dev: device to register @@ -2563,6 +2566,9 @@ BUG_ON(dev_boot_phase); ASSERT_RTNL(); + /* When net_device's are persistent, this will be fatal. 
*/ + WARN_ON(dev->reg_state != NETREG_UNINITIALIZED); + spin_lock_init(&dev->queue_lock); spin_lock_init(&dev->xmit_lock); dev->xmit_lock_owner = -1; @@ -2592,9 +2598,6 @@ goto out_err; } - if ((ret = netdev_register_sysfs(dev))) - goto out_err; - /* Fix illegal SG+CSUM combinations. */ if ((dev->features & NETIF_F_SG) && !(dev->features & (NETIF_F_IP_CSUM | @@ -2625,13 +2628,14 @@ write_lock_bh(&dev_base_lock); *dp = dev; dev_hold(dev); - dev->deadbeaf = 0; + dev->reg_state = NETREG_REGISTERING; write_unlock_bh(&dev_base_lock); /* Notify protocols, that a new device appeared. */ notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev); - net_run_sbin_hotplug(dev, 1); + /* Finish registration after unlock */ + net_set_todo(dev); ret = 0; out: @@ -2654,7 +2658,7 @@ BUG_TRAP(!dev->ip6_ptr); BUG_TRAP(!dev->dn_ptr); - if (!dev->deadbeaf) { + if (dev->reg_state != NETREG_UNREGISTERED) { printk(KERN_ERR "Freeing alive device %p, %s\n", dev, dev->name); return 0; @@ -2731,41 +2735,60 @@ * rtnl_unlock(); * * We are invoked by rtnl_unlock() after it drops the semaphore. - * This allows us to deal with two problems: - * 1) We can invoke hotplug without deadlocking with linkwatch via - * keventd. + * This allows us to deal with problems: + * 1) We can create/delete sysfs objects which invoke hotplug + * without deadlocking with linkwatch via keventd. * 2) Since we run with the RTNL semaphore not held, we can sleep * safely in order to wait for the netdev refcnt to drop to zero. 
*/ -static spinlock_t unregister_todo_lock = SPIN_LOCK_UNLOCKED; -static struct net_device *unregister_todo; - +static DECLARE_MUTEX(net_todo_run_mutex); void netdev_run_todo(void) { - struct net_device *dev; - - net_run_hotplug_todo(); - - spin_lock(&unregister_todo_lock); - dev = unregister_todo; - unregister_todo = NULL; - spin_unlock(&unregister_todo_lock); - - while (dev) { - struct net_device *next = dev->next; - - dev->next = NULL; + struct list_head list = LIST_HEAD_INIT(list); - netdev_unregister_sysfs(dev); + /* Safe outside mutex since we only care about entries that + * this cpu put into queue while under RTNL. + */ + if (list_empty(&net_todo_list)) + return; - netdev_wait_allrefs(dev); + /* Need to guard against multiple cpu's getting out of order. */ + down(&net_todo_run_mutex); - BUG_ON(atomic_read(&dev->refcnt)); + /* Snapshot list, allow later requests */ + spin_lock(&net_todo_list_lock); + list_splice_init(&net_todo_list, &list); + spin_unlock(&net_todo_list_lock); + + while (!list_empty(&list)) { + struct net_device *dev + = list_entry(list.next, struct net_device, todo_list); + list_del(&dev->todo_list); + + switch(dev->reg_state) { + case NETREG_REGISTERING: + netdev_register_sysfs(dev); + dev->reg_state = NETREG_REGISTERED; + break; - netdev_finish_unregister(dev); + case NETREG_UNREGISTERING: + netdev_unregister_sysfs(dev); + dev->reg_state = NETREG_UNREGISTERED; + + netdev_wait_allrefs(dev); + BUG_ON(atomic_read(&dev->refcnt)); + + netdev_finish_unregister(dev); + break; - dev = next; + default: + printk(KERN_ERR "network todo '%s' but state %d\n", + dev->name, dev->reg_state); + break; + } } + + up(&net_todo_run_mutex); } /* Synchronize with packet receive processing. */ @@ -2795,13 +2818,19 @@ BUG_ON(dev_boot_phase); ASSERT_RTNL(); + /* Some devices call without registering for initialization unwind. 
*/ + if (dev->reg_state == NETREG_UNINITIALIZED) { + printk(KERN_DEBUG "unregister_netdevice: device %s/%p never " + "was registered\n", dev->name, dev); + return -ENODEV; + } + + BUG_ON(dev->reg_state != NETREG_REGISTERED); + /* If device is running, close it first. */ if (dev->flags & IFF_UP) dev_close(dev); - BUG_TRAP(!dev->deadbeaf); - dev->deadbeaf = 1; - /* And unlink it from device chain. */ for (dp = &dev_base; (d = *dp) != NULL; dp = &d->next) { if (d == dev) { @@ -2812,11 +2841,13 @@ } } if (!d) { - printk(KERN_DEBUG "unregister_netdevice: device %s/%p never " - "was registered\n", dev->name, dev); + printk(KERN_ERR "unregister net_device: '%s' not found\n", + dev->name); return -ENODEV; } + dev->reg_state = NETREG_UNREGISTERING; + synchronize_net(); #ifdef CONFIG_NET_FASTROUTE @@ -2826,7 +2857,6 @@ /* Shutdown queueing discipline. */ dev_shutdown(dev); - net_run_sbin_hotplug(dev, 0); /* Notify protocols, that we are about to destroy this device. They should clean all the things. @@ -2846,10 +2876,8 @@ free_divert_blk(dev); - spin_lock(&unregister_todo_lock); - dev->next = unregister_todo; - unregister_todo = dev; - spin_unlock(&unregister_todo_lock); + /* Finish processing unregister after unlock */ + net_set_todo(dev); dev_put(dev); return 0; @@ -2955,11 +2983,11 @@ * dev_alloc_name can now advance to next suitable * name that is checked next. 
*/ - dev->deadbeaf = 1; dp = &dev->next; } else { dp = &dev->next; dev->ifindex = dev_new_index(); + dev->reg_state = NETREG_REGISTERED; if (dev->iflink == -1) dev->iflink = dev->ifindex; if (!dev->rebuild_header) @@ -2974,7 +3002,7 @@ */ dp = &dev_base; while ((dev = *dp) != NULL) { - if (dev->deadbeaf) { + if (dev->reg_state != NETREG_REGISTERED) { write_lock_bh(&dev_base_lock); *dp = dev->next; write_unlock_bh(&dev_base_lock); @@ -3001,96 +3029,3 @@ } subsys_initcall(net_dev_init); - -#ifdef CONFIG_HOTPLUG - -struct net_hotplug_todo { - struct list_head list; - char ifname[IFNAMSIZ]; - int is_register; -}; -static spinlock_t net_hotplug_list_lock = SPIN_LOCK_UNLOCKED; -static DECLARE_MUTEX(net_hotplug_run); -static struct list_head net_hotplug_list = LIST_HEAD_INIT(net_hotplug_list); - -static inline void net_run_hotplug_one(struct net_hotplug_todo *ent) -{ - char *argv[3], *envp[5], ifname[12 + IFNAMSIZ], action_str[32]; - int i; - - sprintf(ifname, "INTERFACE=%s", ent->ifname); - sprintf(action_str, "ACTION=%s", - (ent->is_register ? "register" : "unregister")); - - i = 0; - argv[i++] = hotplug_path; - argv[i++] = "net"; - argv[i] = 0; - - i = 0; - /* minimal command environment */ - envp [i++] = "HOME=/"; - envp [i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin"; - envp [i++] = ifname; - envp [i++] = action_str; - envp [i] = 0; - - call_usermodehelper(argv [0], argv, envp, 0); -} - -/* Run all queued hotplug requests. - * Requests are run in FIFO order. - */ -static void net_run_hotplug_todo(void) -{ - struct list_head list = LIST_HEAD_INIT(list); - - /* This is racy but okay since any other requests will get - * processed when the other guy does rtnl_unlock. - */ - if (list_empty(&net_hotplug_list)) - return; - - /* Need to guard against multiple cpu's getting out of order. 
*/ - down(&net_hotplug_run); - - /* Snapshot list, allow later requests */ - spin_lock(&net_hotplug_list_lock); - list_splice_init(&net_hotplug_list, &list); - spin_unlock(&net_hotplug_list_lock); - - while (!list_empty(&list)) { - struct net_hotplug_todo *ent; - - ent = list_entry(list.next, struct net_hotplug_todo, list); - list_del(&ent->list); - net_run_hotplug_one(ent); - kfree(ent); - } - - up(&net_hotplug_run); -} - -/* Notify userspace when a netdevice event occurs, - * by running '/sbin/hotplug net' with certain - * environment variables set. - */ - -static void net_run_sbin_hotplug(struct net_device *dev, int is_register) -{ - struct net_hotplug_todo *ent = kmalloc(sizeof(*ent), GFP_KERNEL); - - ASSERT_RTNL(); - - if (!ent) - return; - - INIT_LIST_HEAD(&ent->list); - memcpy(ent->ifname, dev->name, IFNAMSIZ); - ent->is_register = is_register; - - spin_lock(&net_hotplug_list_lock); - list_add(&ent->list, &net_hotplug_list); - spin_unlock(&net_hotplug_list_lock); -} -#endif diff -Nru a/net/core/net-sysfs.c b/net/core/net-sysfs.c --- a/net/core/net-sysfs.c Fri Jun 13 16:29:18 2003 +++ b/net/core/net-sysfs.c Fri Jun 13 16:29:18 2003 @@ -15,6 +15,11 @@ #define to_class_dev(obj) container_of(obj,struct class_device,kobj) #define to_net_dev(class) container_of(class, struct net_device, class_dev) +static inline int dev_isalive(const struct net_device *dev) +{ + return dev->reg_state == NETREG_REGISTERED; +} + /* use same locking rules as GIF* ioctl's */ static ssize_t netdev_show(const struct class_device *cd, char *buf, ssize_t (*format)(const struct net_device *, char *)) @@ -23,7 +28,7 @@ ssize_t ret = -EINVAL; read_lock(&dev_base_lock); - if (!net->deadbeaf) + if (dev_isalive(net)) ret = (*format)(net, buf); read_unlock(&dev_base_lock); @@ -60,7 +65,7 @@ goto err; rtnl_lock(); - if (!net->deadbeaf) { + if (dev_isalive(net)) { if ((ret = (*set)(net, new)) == 0) ret = len; } @@ -97,17 +102,17 @@ static ssize_t show_address(struct class_device *dev, char *buf) { 
struct net_device *net = to_net_dev(dev); - if (net->deadbeaf) - return -EINVAL; - return format_addr(buf, net->dev_addr, net->addr_len); + if (dev_isalive(net)) + return format_addr(buf, net->dev_addr, net->addr_len); + return -EINVAL; } static ssize_t show_broadcast(struct class_device *dev, char *buf) { struct net_device *net = to_net_dev(dev); - if (net->deadbeaf) - return -EINVAL; - return format_addr(buf, net->broadcast, net->addr_len); + if (dev_isalive(net)) + return format_addr(buf, net->broadcast, net->addr_len); + return -EINVAL; } static CLASS_DEVICE_ATTR(address, S_IRUGO, show_address, NULL); @@ -152,16 +157,12 @@ static ssize_t store_tx_queue_len(struct class_device *dev, const char *buf, size_t len) { - return netdev_store(dev, buf,len, change_tx_queue_len); + return netdev_store(dev, buf, len, change_tx_queue_len); } static CLASS_DEVICE_ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len, store_tx_queue_len); -static struct class net_class = { - .name = "net", -}; - static struct class_device_attribute *net_class_attributes[] = { &class_device_attr_ifindex, @@ -263,7 +264,7 @@ ssize_t ret = -EINVAL; read_lock(&dev_base_lock); - if (!dev->deadbeaf && entry->show && dev->get_stats && + if (dev_isalive(dev) && entry->show && dev->get_stats && (stats = (*dev->get_stats)(dev))) ret = entry->show(stats, buf); read_unlock(&dev_base_lock); @@ -277,6 +278,35 @@ static struct kobj_type netstat_ktype = { .sysfs_ops = &netstat_sysfs_ops, .default_attrs = default_attrs, +}; + +#ifdef CONFIG_HOTPLUG +static int netdev_hotplug(struct class_device *cd, char **envp, + int num_envp, char *buf, int size) +{ + struct net_device *dev = to_net_dev(cd); + int i = 0; + int n; + + /* pass interface in env to hotplug. 
*/ + envp[i++] = buf; + n = snprintf(buf, size, "INTERFACE=%s", dev->name) + 1; + buf += n; + size -= n; + + if ((size <= 0) || (i >= num_envp)) + return -ENOMEM; + + envp[i] = 0; + return 0; +} +#endif + +static struct class net_class = { + .name = "net", +#ifdef CONFIG_HOTPLUG + .hotplug = netdev_hotplug, +#endif }; /* Create sysfs entries for network device. */ From scott.feldman@intel.com Fri Jun 13 16:52:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 16:52:32 -0700 (PDT) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DNqK2x022768 for ; Fri, 13 Jun 2003 16:52:23 -0700 Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by caduceus.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5DNiK819690 for ; Fri, 13 Jun 2003 23:44:20 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by petasus.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5DNjZS15947 for ; Fri, 13 Jun 2003 23:45:35 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061316521816470 ; Fri, 13 Jun 2003 16:52:18 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 13 Jun 2003 16:52:19 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e1000 performance hack for ppc64 (Power4) Date: Fri, 13 Jun 2003 16:52:18 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e1000 performance hack for ppc64 (Power4) Thread-Index: AcMyAnKU0XzTrYR3SGmq2WX57ZZcjAABAyWw From: "Feldman, Scott" To: "Anton Blanchard" , "David S. 
Miller"
Cc: , , , , , , ,
X-OriginalArrivalTime: 13 Jun 2003 23:52:19.0006 (UTC) FILETIME=[D32425E0:01C33206]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5DNqK2x022768
X-archive-position: 3221
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: scott.feldman@intel.com
Precedence: bulk
X-list: netdev

> > Why not instead find out if it's possible to have the e1000
> > fetch the entire cache line where the first byte of the
> > packet resides? Even ancient designs like SunHME do that.
>
> Rusty and I were wondering why the e1000 didn't do that exact thing.
>
> Scott: is it possible to enable such a thing?

I thought the answer was no, so I double checked with a couple of
hardware guys, and the answer is still no.

-scott

From davem@redhat.com Fri Jun 13 16:56:43 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 16:56:51 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5DNuh2x023121 for ; Fri, 13 Jun 2003 16:56:43 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA03410; Fri, 13 Jun 2003 16:52:51 -0700
Date: Fri, 13 Jun 2003 16:52:50 -0700 (PDT)
Message-Id: <20030613.165250.41635765.davem@redhat.com>
To: scott.feldman@intel.com
Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com
Subject: Re: e1000 performance hack for ppc64 (Power4)
From: "David S. Miller"
In-Reply-To:
References:
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3222 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Feldman, Scott" Date: Fri, 13 Jun 2003 16:52:18 -0700 > > Why not instead find out if it's possible to have the e1000 > > fetch the entire cache line where the first byte of the > > packet resides? Even ancient designes like SunHME do that. > > Rusty and I were wondering why the e1000 didnt do that exact thing. > > Scott: is it possible to enable such a thing? I thought the answer was no, so I double checked with a couple of hardware guys, and the answer is still no. Sigh... So Anton, when the PCI controller gets a set of sub-cacheline word reads from the device, it reads the value from memory once for every one of those words? ROFL, if so... I can't believe they wouldn't put caches on the PCI controller for this, at least a one-behind that snoops the bus :( From anton@samba.org Fri Jun 13 17:06:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 17:07:09 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E06m2x023816 for ; Fri, 13 Jun 2003 17:06:48 -0700 Received: by lists.samba.org (Postfix, from userid 504) id B3D552C10D; Fri, 13 Jun 2003 23:20:57 +0000 (GMT) Date: Sat, 14 Jun 2003 09:18:36 +1000 From: Anton Blanchard To: "David S. 
Miller" Cc: haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) Message-ID: <20030613231836.GD32097@krispykreme> References: <1055521263.3531.2055.camel@nighthawk> <20030613223841.GB32097@krispykreme> <20030613.154634.74748085.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030613.154634.74748085.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 3223 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: anton@samba.org Precedence: bulk X-list: netdev > Not really... one retransmit and the TCP header size grows > due to the SACK options. OK scratch that idea. > I find it truly bletcherous what you're trying to do here. I think so too, but its hard to ignore ~100Mbit/sec in performance. > Why not instead find out if it's possible to have the e1000 > fetch the entire cache line where the first byte of the packet > resides? Even ancient designes like SunHME do that. Rusty and I were wondering why the e1000 didnt do that exact thing. Scott: is it possible to enable such a thing? Anton From anton@samba.org Fri Jun 13 17:07:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 17:07:25 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E07J2x023907 for ; Fri, 13 Jun 2003 17:07:20 -0700 Received: by lists.samba.org (Postfix, from userid 504) id 8A3052C10D; Sat, 14 Jun 2003 00:07:19 +0000 (GMT) Date: Sat, 14 Jun 2003 10:03:42 +1000 From: Anton Blanchard To: "Feldman, Scott" Cc: "David S. 
Miller" , haveblue@us.ibm.com, hdierks@us.ibm.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) Message-ID: <20030614000342.GE32097@krispykreme> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3224 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: anton@samba.org Precedence: bulk X-list: netdev > I thought the answer was no, so I double checked with a couple of > hardware guys, and the answer is still no. Hi Scott, Thats a pity, the e100 docs on sourceforge show it can do what we want, it would be nice if e1000 had this feature too :) 4.2.2 Read Align The Read Align feature is aimed to enhance performance in cache line oriented systems. Starting a PCI transaction in these systems on a non-cache line aligned address may result in low performance. To solve this performance problem, the controller can be configured to terminate Transmit DMA cycles on a cache line boundary, and start the next transaction on a cache line aligned address. This feature is enabled when the Read Align Enable bit is set in device Configure command (Section 6.4.2.3, "Configure (010b)"). If this bit is set, the device operates as follows: * When the device is close to running out of resources on the Transmit * DMA (in other words, the Transmit FIFO is almost full), it attempts to * terminate the read transaction on the nearest cache line boundary when * possible. * When the arbitration counters feature is enabled (maximum Transmit DMA * byte count value is set in configuration space), the device switches * to other pending DMAs on cache line boundary only. 
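As a rough illustration of the address arithmetic behind the Read Align description quoted above: the sketch below is not e100 or e1000 driver code, and the helper name and its parameters are hypothetical. It only computes how many bytes a transfer would move before reaching the next cache line boundary, so that the following transaction could start on a cache line aligned address.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from any driver or datasheet.
 * Returns how many bytes a DMA read starting at addr moves before it
 * reaches the next cache line boundary, capped at len. Returns 0 when
 * addr is already aligned. line_size is assumed to be a power of two.
 */
static size_t unaligned_head_len(uintptr_t addr, size_t len, size_t line_size)
{
	size_t misalign = (size_t)(addr & (line_size - 1));
	size_t head = misalign ? line_size - misalign : 0;

	return head < len ? head : len;
}
```

Assuming a 128-byte cache line, a 1500-byte frame starting at address 0x1002 would get a 126-byte first transaction, and every later transaction would then begin on a line boundary.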
From yoshfuji@linux-ipv6.org Fri Jun 13 18:25:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 18:25:54 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E1Pi2x028304 for ; Fri, 13 Jun 2003 18:25:45 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5E1QkBo002022; Sat, 14 Jun 2003 10:26:46 +0900 Date: Sat, 14 Jun 2003 10:26:45 +0900 (JST) Message-Id: <20030614.102645.87315007.yoshfuji@linux-ipv6.org> To: vnuorval@tcs.hut.fi, davem@redhat.com Cc: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: [patch] IPV6: Refcount leaks in udpv6_connect() From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3225 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Sat, 14 Jun 2003 00:07:25 +0300 (EEST)), Ville Nuorvala says: > A dst refcount leak had unfortunately crept into my original > CONFIG_IPV6_SUBTREES patch and your derived udpv6_connect() patch > (changeset 1.1215.68.12). patch looks fine to me. 
-- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Fri Jun 13 18:38:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 18:38:44 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E1cY2x029131 for ; Fri, 13 Jun 2003 18:38:34 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA03622; Fri, 13 Jun 2003 18:34:40 -0700 Date: Fri, 13 Jun 2003 18:34:40 -0700 (PDT) Message-Id: <20030613.183440.41634090.davem@redhat.com> To: anton@samba.org Cc: scott.feldman@intel.com, haveblue@us.ibm.com, hdierks@us.ibm.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <20030614005534.GF32097@krispykreme> References: <20030613.165250.41635765.davem@redhat.com> <20030614005534.GF32097@krispykreme> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3226 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Anton Blanchard Date: Sat, 14 Jun 2003 10:55:34 +1000 What I think is happening is that we aren't tripping the prefetch logic. We should take a latency hit for only the first cacheline at which point the host bridge decides to start prefetching for us. If not, then we take the latency hit on each transaction. It sounds like what happens is that the sub-cacheline word reads don't trigger the prefetch, but the first PCI read multiple transaction does.
From anton@samba.org Fri Jun 13 18:51:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 18:51:59 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E1pc2x029892 for ; Fri, 13 Jun 2003 18:51:39 -0700 Received: by lists.samba.org (Postfix, from userid 504) id BD3EA2C135; Sat, 14 Jun 2003 00:58:38 +0000 (GMT) Date: Sat, 14 Jun 2003 10:55:34 +1000 From: Anton Blanchard To: "David S. Miller" Cc: scott.feldman@intel.com, haveblue@us.ibm.com, hdierks@us.ibm.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) Message-ID: <20030614005534.GF32097@krispykreme> References: <20030613.165250.41635765.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030613.165250.41635765.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 3227 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: anton@samba.org Precedence: bulk X-list: netdev > So Anton, when the PCI controller gets a set of sub-cacheline word > reads from the device, it reads the value from memory once for every > one of those words? ROFL, if so... I can't believe they wouldn't > put caches on the PCI controller for this, at least a one-behind that > snoops the bus :( There is a cache in the host bridge and the PCI-PCI bridge. I don't think we go back to memory for sub-cacheline reads. What I think is happening is that we aren't tripping the prefetch logic. We should take a latency hit for only the first cacheline at which point the host bridge decides to start prefetching for us. If not, then we take the latency hit on each transaction.
Anton From yoshfuji@linux-ipv6.org Fri Jun 13 19:15:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 19:15:26 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E2FG2x031099 for ; Fri, 13 Jun 2003 19:15:17 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5E2GJBo002846; Sat, 14 Jun 2003 11:16:19 +0900 Date: Sat, 14 Jun 2003 11:16:19 +0900 (JST) Message-Id: <20030614.111619.37286104.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org, miyazawa@linux-ipv6.org Subject: [PATCH] [XFRM] xfrm_alloc_spi() always selected minspi From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3228 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. net/xfrm/xfrm_state.c:xfrm_alloc_spi() always selected minspi because of typo. Bug was originally introduced in ChangeSet 1.889.1.182 (net/xfrm/xfrm_state.c@1.11); it is 4 months old... Here's the fix. Thanks in advance. 
Index: linux-2.5/net/xfrm/xfrm_state.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/xfrm/xfrm_state.c,v
retrieving revision 1.9
diff -u -p -r1.9 xfrm_state.c
--- linux-2.5/net/xfrm/xfrm_state.c 7 Jun 2003 00:22:34 -0000 1.9
+++ linux-2.5/net/xfrm/xfrm_state.c 14 Jun 2003 00:54:18 -0000
@@ -513,7 +513,7 @@ xfrm_alloc_spi(struct xfrm_state *x, u32
 		maxspi = ntohl(maxspi);
 		for (h=0; h<maxspi-minspi+1; h++) {
 			spi = minspi + net_random()%(maxspi-minspi+1);
-			x0 = xfrm_state_lookup(&x->id.daddr, minspi, x->id.proto, x->props.family);
+			x0 = xfrm_state_lookup(&x->id.daddr, htonl(spi), x->id.proto, x->props.family);
 			if (x0 == NULL)
 				break;
 			xfrm_state_put(x0);

-- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From niv@us.ibm.com Fri Jun 13 22:17:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 22:17:27 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E5H92x004914 for ; Fri, 13 Jun 2003 22:17:15 -0700 Received: from westrelay05.boulder.ibm.com (westrelay05.boulder.ibm.com [9.17.193.33]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5E5H22R250314; Sat, 14 Jun 2003 01:17:02 -0400 Received: from us.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay05.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5E5H0pr144624; Fri, 13 Jun 2003 23:17:00 -0600 Message-ID: <3EEAAFA6.9080609@us.ibm.com> Date: Fri, 13 Jun 2003 22:16:22 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S.
Miller" CC: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) References: <1055521263.3531.2055.camel@nighthawk> <20030613223841.GB32097@krispykreme> <20030613.154634.74748085.davem@redhat.com> In-Reply-To: <20030613.154634.74748085.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3229 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Anton Blanchard > Date: Sat, 14 Jun 2003 08:38:41 +1000 > > This is only worth it if most packets will have the same sized header. > Networking guys: is this a valid assumption? > > Not really... one retransmit and the TCP header size grows > due to the SACK options. Yep, but it really doesn't have too many options (sic pun ;)).. i.e. The max the options can add are 40 bytes, speaking strictly TCP, not IP. This really should fit into one extra cacheline for most architectures, at most, right? [The TCP options have to end and the data start on a 32 bit boundary. For established connections, we're principally talking SACK options and v. likely timestamp. (Ignoring those egregious benchmark guys who turn everything useful off ;)). SYNs wont have data in any case. So its going to grow by (SACK = 8*n + 2)+ (TS = 10) bytes, with n = number of sack options, with a max of n = 3 if timestamps are enabled. Adding that to the standard length of 20 bytes, the total len of a TCP header is thus very likely one of: 20 + [ 0 | 20 |32 | 36] bytes = 20 | 40 | 52 | 56 bytes. If cachelines were 64 bytes, we wouldnt be wasting a whole lot of space if we aligned data start or some other scheme as was suggested. 
Even given the larger cachelines, it might be worth it, or is this totally not an option (cough,sic ;))? > I find it truly bletcherous what you're trying to do here. yep thanks, Nivedita From davem@redhat.com Fri Jun 13 22:40:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 22:40:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E5eU2x006015 for ; Fri, 13 Jun 2003 22:40:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA03912; Fri, 13 Jun 2003 22:36:34 -0700 Date: Fri, 13 Jun 2003 22:36:34 -0700 (PDT) Message-Id: <20030613.223634.74746570.davem@redhat.com> To: niv@us.ibm.com Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <3EEAAFA6.9080609@us.ibm.com> References: <20030613223841.GB32097@krispykreme> <20030613.154634.74748085.davem@redhat.com> <3EEAAFA6.9080609@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3230 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Nivedita Singhvi Date: Fri, 13 Jun 2003 22:16:22 -0700 Yep, but it really doesn't have too many options (sic pun ;)).. i.e. The max the options can add are 40 bytes, speaking strictly TCP, not IP. This really should fit into one extra cacheline for most architectures, at most, right? 
It's what the bottom of the header is aligned to, but we build the packet top to bottom not the other way around. From ltd@cisco.com Fri Jun 13 22:41:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 22:41:18 -0700 (PDT) Received: from sj-core-3.cisco.com (sj-core-3.cisco.com [171.68.223.137]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E5fE2x006218 for ; Fri, 13 Jun 2003 22:41:14 -0700 Received: from cisco.com (ringer.cisco.com [64.104.199.11]) by sj-core-3.cisco.com (8.12.6/8.12.6) with ESMTP id h5E5f1I2007006; Fri, 13 Jun 2003 22:41:01 -0700 (PDT) Received: from ltd-t22.cisco.com (syd-vpn-client-254-12.cisco.com [10.66.254.12]) by cisco.com (8.8.8/2.6/Cisco List Logging/8.8.8) with ESMTP id PAA10267; Sat, 14 Jun 2003 15:42:02 +1000 (EST) Message-Id: <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> X-Sender: ltd@mira-sjcm-3.cisco.com X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Sat, 14 Jun 2003 11:52:53 +1000 To: Anton Blanchard From: Lincoln Dale Subject: Re: e1000 performance hack for ppc64 (Power4) Cc: "David S. Miller" , haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com In-Reply-To: <20030613231836.GD32097@krispykreme> References: <20030613.154634.74748085.davem@redhat.com> <1055521263.3531.2055.camel@nighthawk> <20030613223841.GB32097@krispykreme> <20030613.154634.74748085.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 3231 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ltd@cisco.com Precedence: bulk X-list: netdev At 09:18 AM 14/06/2003 +1000, Anton Blanchard wrote: > > Not really... one retransmit and the TCP header size grows > > due to the SACK options. > >OK scratch that idea. 
why not have a performance option that is a tradeoff between optimum payload size versus efficiency. unless i misunderstand the problem, you can certainly pad the TCP options with NOPs ... > > I find it truly bletcherous what you're trying to do here. > >I think so too, but its hard to ignore ~100Mbit/sec in performance. another option is for the write() path is for instantant-send TCP sockets to delay the copy_from_user() until the IP+TCP header size is known. i wouldn't expect the net folks to like that, however .. cheers, lincoln. From davem@redhat.com Fri Jun 13 22:45:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 22:45:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E5jM2x006764 for ; Fri, 13 Jun 2003 22:45:22 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA03951; Fri, 13 Jun 2003 22:41:22 -0700 Date: Fri, 13 Jun 2003 22:41:22 -0700 (PDT) Message-Id: <20030613.224122.104034261.davem@redhat.com> To: ltd@cisco.com Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> References: <20030613.154634.74748085.davem@redhat.com> <20030613231836.GD32097@krispykreme> <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3232 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Lincoln Dale Date: Sat, 14 Jun 2003 11:52:53 +1000 unless i misunderstand the problem, you can certainly pad the TCP options with NOPs ... You may not mangle packet if it is not your's alone. And every TCP packet is shared with TCP retransmit queue and therefore would need to be copied before being mangled. From ltd@cisco.com Fri Jun 13 22:58:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 22:59:07 -0700 (PDT) Received: from sj-core-2.cisco.com (sj-core-2.cisco.com [171.71.177.254]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E5wv2x007456 for ; Fri, 13 Jun 2003 22:58:58 -0700 Received: from cisco.com (ringer.cisco.com [64.104.199.11]) by sj-core-2.cisco.com (8.12.9/8.12.6) with ESMTP id h5E5wi3p005562; Fri, 13 Jun 2003 22:58:45 -0700 (PDT) Received: from ltd-t22.cisco.com (syd-vpn-client-254-12.cisco.com [10.66.254.12]) by cisco.com (8.8.8/2.6/Cisco List Logging/8.8.8) with ESMTP id PAA10781; Sat, 14 Jun 2003 15:59:46 +1000 (EST) Message-Id: <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> X-Sender: ltd@mira-sjcm-3.cisco.com X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Sat, 14 Jun 2003 15:52:35 +1000 To: "David S. 
Miller" From: Lincoln Dale Subject: Re: e1000 performance hack for ppc64 (Power4) Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com In-Reply-To: <20030613.224122.104034261.davem@redhat.com> References: <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> <20030613.154634.74748085.davem@redhat.com> <20030613231836.GD32097@krispykreme> <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 3233 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ltd@cisco.com Precedence: bulk X-list: netdev At 10:41 PM 13/06/2003 -0700, David S. Miller wrote: > From: Lincoln Dale > Date: Sat, 14 Jun 2003 11:52:53 +1000 > > unless i misunderstand the problem, you can certainly pad the TCP > options with NOPs ... > >You may not mangle packet if it is not your's alone. > >And every TCP packet is shared with TCP retransmit >queue and therefore would need to be copied before >being mangled. ok, so lets take this a step further. can we have the TCP retransmit side take a performance hit if it needs to realign buffers? once again, for a "high performance app" requiring gigabit-type speeds, its probably fair to say that this is mostly in the realm of applications on a LAN rather than across a WAN or internet. on a switched LAN, i'd expect TCP retransmissions to be far fewer ... cheers, lincoln. 
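For reference, the NOP padding idea raised in this thread amounts to something like the following. This is a minimal sketch rather than kernel code: the helper name and its buffer-based interface are hypothetical, and it ignores the objection made in the thread that a packet shared with the retransmit queue would have to be copied before being modified. The only facts it relies on are that the NOP option kind is 1, that the TCP header must be a multiple of 4 bytes, and that options are limited to 40 bytes.

```c
#include <stddef.h>
#include <string.h>

#define TCPOPT_NOP      1	/* no-operation option kind (RFC 793) */
#define TCP_MAX_OPT_LEN 40	/* 60-byte maximum header minus 20-byte base header */

/*
 * Hypothetical helper: fill opts[cur_len .. target_len) with NOP options so
 * that every segment carries the same option length and therefore the same
 * header size. Returns the number of NOP bytes written, or -1 if target_len
 * exceeds the option space, is not a multiple of 4, or is smaller than the
 * option bytes already present.
 */
static int tcp_pad_options(unsigned char *opts, size_t cur_len, size_t target_len)
{
	if (target_len > TCP_MAX_OPT_LEN || (target_len & 3) || cur_len > target_len)
		return -1;

	memset(opts + cur_len, TCPOPT_NOP, target_len - cur_len);
	return (int)(target_len - cur_len);
}
```

Padding a 12-byte option block out to the full 40 bytes would add 28 NOP bytes and fix the TCP header at 60 bytes, at the cost of 28 bytes of payload per segment.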
From davem@redhat.com Fri Jun 13 23:12:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 23:12:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E6Cn2x008216 for ; Fri, 13 Jun 2003 23:12:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA04025; Fri, 13 Jun 2003 23:08:50 -0700 Date: Fri, 13 Jun 2003 23:08:50 -0700 (PDT) Message-Id: <20030613.230850.85410095.davem@redhat.com> To: ltd@cisco.com Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> References: <5.1.0.14.2.20030614114755.036abbb0@mira-sjcm-3.cisco.com> <20030613.224122.104034261.davem@redhat.com> <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3234 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Lincoln Dale Date: Sat, 14 Jun 2003 15:52:35 +1000 can we have the TCP retransmit side take a performance hit if it needs to realign buffers? You don't understand, the person who mangles the packet must make the copy, not the person not doing the packet modifications. 
for a "high performance app" requiring gigabit-type speeds, ...we probably won't be using ppc64 and e1000 cards, yes, I agree :-) Anton, go to the local computer store and pick up some tg3 cards or a bunch of Taiwan specials :-) From davem@redhat.com Fri Jun 13 23:18:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 23:18:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E6IG2x008659 for ; Fri, 13 Jun 2003 23:18:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA04046; Fri, 13 Jun 2003 23:14:19 -0700 Date: Fri, 13 Jun 2003 23:14:18 -0700 (PDT) Message-Id: <20030613.231418.39160686.davem@redhat.com> To: ltd@cisco.com Cc: anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <20030613.230850.85410095.davem@redhat.com> References: <20030613.224122.104034261.davem@redhat.com> <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> <20030613.230850.85410095.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3235 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Folks, can we remove whatever member of this CC: list creates bounces that say: Your message to Linux_news awaits moderator approval Ok? 
I can't guess which one it is because these all look like normal people's email addresses (except possibly the haveblue@us.ibm.com thing, maybe that's the one) From wli@holomorphy.com Fri Jun 13 23:28:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 13 Jun 2003 23:28:28 -0700 (PDT) Received: from holomorphy (mail@holomorphy.com [66.224.33.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E6SF2x009142 for ; Fri, 13 Jun 2003 23:28:15 -0700 Received: from wli by holomorphy with local (Exim 3.36 #1 (Debian)) id 19R4W6-0001gs-00; Fri, 13 Jun 2003 23:27:58 -0700 Date: Fri, 13 Jun 2003 23:27:55 -0700 From: William Lee Irwin III To: "David S. Miller" Cc: ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) Message-ID: <20030614062755.GG26348@holomorphy.com> Mail-Followup-To: William Lee Irwin III , "David S. Miller" , ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com References: <20030613.224122.104034261.davem@redhat.com> <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> <20030613.230850.85410095.davem@redhat.com> <20030613.231418.39160686.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030613.231418.39160686.davem@redhat.com> Organization: The Domain of Holomorphy User-Agent: Mutt/1.5.4i X-archive-position: 3236 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wli@holomorphy.com Precedence: bulk X-list: netdev On Fri, Jun 13, 2003 at 11:14:18PM -0700, David S. 
Miller wrote: > Folks, can we remove whatever member of this CC: list creates > bounces that say: > Your message to Linux_news awaits moderator approval > Ok? I can't guess which one it is because these all look > like normal people's email addresses (except possibly the > haveblue@us.ibm.com thing, maybe that's the one) That's legitimate, but I'm still trying to convince the BKL brigade that he should have chosen something more eponymous e.g. dhansen. =) -- wli From ak@suse.de Sat Jun 14 02:16:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 02:16:51 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E9Ga2x015377 for ; Sat, 14 Jun 2003 02:16:37 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 51222145E2 for ; Sat, 14 Jun 2003 11:16:31 +0200 (MEST) Date: Sat, 14 Jun 2003 11:16:31 +0200 From: Andi Kleen To: netdev@oss.sgi.com Subject: [PATCH] Make xfrm subsystem optional Message-ID: <20030614091631.GA16993@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-archive-position: 3237 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev This patches only compiles the xfrm subsystem in when any of the options using it are selected. This shrinks the text segment on an amd64 kernel by ~32k, data by ~6k, bss by ~33k, overall ~72K memory saved. 
-Andi Index: linux/include/net/dst.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/dst.h,v retrieving revision 1.11 diff -u -r1.11 dst.h --- linux/include/net/dst.h 17 Apr 2003 00:35:02 -0000 1.11 +++ linux/include/net/dst.h 14 Jun 2003 07:58:37 -0000 @@ -247,8 +247,16 @@ extern void dst_init(void); struct flowi; +#ifndef CONFIG_XFRM +static inline int xfrm_lookup(struct dst_entry **dst_p, struct flowi *fl, + struct sock *sk, int flags) +{ + return 0; +} +#else extern int xfrm_lookup(struct dst_entry **dst_p, struct flowi *fl, struct sock *sk, int flags); +#endif #endif #endif /* _NET_DST_H */ Index: linux/include/net/xfrm.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/xfrm.h,v retrieving revision 1.34 diff -u -r1.34 xfrm.h --- linux/include/net/xfrm.h 9 Jun 2003 17:26:52 -0000 1.34 +++ linux/include/net/xfrm.h 14 Jun 2003 07:58:39 -0000 @@ -584,6 +584,8 @@ return !0; } +#ifdef CONFIG_XFRM + extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family); static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) @@ -649,6 +651,26 @@ } } +#else + +static inline void xfrm_sk_free_policy(struct sock *sk) {} +static inline int xfrm_sk_clone_policy(struct sock *sk) { return 0; } +static inline int xfrm6_route_forward(struct sk_buff *skb) { return 1; } +static inline int xfrm4_route_forward(struct sk_buff *skb) { return 1; } +static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +{ + return 1; +} +static inline int xfrm4_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +{ + return 1; +} +static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) +{ + return 1; +} +#endif + static __inline__ xfrm_address_t *xfrm_flowi_daddr(struct flowi *fl, unsigned short family) { @@ -777,13 
+799,31 @@ extern int xfrm_check_selectors(struct xfrm_state **x, int n, struct flowi *fl); extern int xfrm_check_output(struct xfrm_state *x, struct sk_buff *skb, unsigned short family); extern int xfrm4_rcv(struct sk_buff *skb); -extern int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type); extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler); extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler); extern int xfrm4_tunnel_check_size(struct sk_buff *skb); extern int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp); extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir); +#ifdef CONFIG_XFRM +extern int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); +extern int xfrm_dst_lookup(struct xfrm_dst **dst, struct flowi *fl, unsigned short family); +#else +static inline int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen) +{ + return -ENOPROTOOPT; +} +static inline int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type) +{ + /* should not happen */ + kfree_skb(skb); + return 0; +} +static inline int xfrm_dst_lookup(struct xfrm_dst **dst, struct flowi *fl, unsigned short family) +{ + return -EINVAL; +} +#endif void xfrm_policy_init(void); void xfrm4_policy_init(void); @@ -805,7 +845,6 @@ extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol); extern struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl); extern int xfrm_flush_bundles(struct xfrm_state *x); -extern int xfrm_dst_lookup(struct xfrm_dst **dst, struct flowi *fl, unsigned short family); extern wait_queue_head_t km_waitq; extern void km_warn_expired(struct xfrm_state *x); Index: linux/net/Kconfig =================================================================== RCS file: /home/cvs/linux-2.5/net/Kconfig,v retrieving revision 1.14 diff -u -r1.14 Kconfig --- linux/net/Kconfig 16 May 
2003 04:12:42 -0000 1.14 +++ linux/net/Kconfig 14 Jun 2003 07:58:42 -0000 @@ -143,6 +143,7 @@ config NET_KEY tristate "PF_KEY sockets" + select XFRM ---help--- PF_KEYv2 socket family, compatible to KAME ones. They are required if you are going to use IPsec tools ported Index: linux/net/netsyms.c =================================================================== RCS file: /home/cvs/linux-2.5/net/netsyms.c,v retrieving revision 1.70 diff -u -r1.70 netsyms.c --- linux/net/netsyms.c 9 Jun 2003 17:26:52 -0000 1.70 +++ linux/net/netsyms.c 14 Jun 2003 07:58:43 -0000 @@ -56,7 +56,6 @@ #include #include #include -#include #if defined(CONFIG_INET_AH) || defined(CONFIG_INET_AH_MODULE) || defined(CONFIG_INET6_AH) || defined(CONFIG_INET6_AH_MODULE) #include #endif @@ -292,76 +291,6 @@ /* needed for ip_gre -cw */ EXPORT_SYMBOL(ip_statistics); -EXPORT_SYMBOL(xfrm_user_policy); -EXPORT_SYMBOL(km_waitq); -EXPORT_SYMBOL(km_new_mapping); -EXPORT_SYMBOL(xfrm_cfg_sem); -EXPORT_SYMBOL(xfrm_policy_alloc); -EXPORT_SYMBOL(__xfrm_policy_destroy); -EXPORT_SYMBOL(xfrm_lookup); -EXPORT_SYMBOL(__xfrm_policy_check); -EXPORT_SYMBOL(__xfrm_route_forward); -EXPORT_SYMBOL(xfrm_state_alloc); -EXPORT_SYMBOL(__xfrm_state_destroy); -EXPORT_SYMBOL(xfrm_state_find); -EXPORT_SYMBOL(xfrm_state_insert); -EXPORT_SYMBOL(xfrm_state_check_expire); -EXPORT_SYMBOL(xfrm_state_check_space); -EXPORT_SYMBOL(xfrm_state_lookup); -EXPORT_SYMBOL(xfrm_state_register_afinfo); -EXPORT_SYMBOL(xfrm_state_unregister_afinfo); -EXPORT_SYMBOL(xfrm_state_get_afinfo); -EXPORT_SYMBOL(xfrm_state_put_afinfo); -EXPORT_SYMBOL(xfrm_state_delete_tunnel); -EXPORT_SYMBOL(xfrm_replay_check); -EXPORT_SYMBOL(xfrm_replay_advance); -EXPORT_SYMBOL(xfrm_check_selectors); -EXPORT_SYMBOL(xfrm_check_output); -EXPORT_SYMBOL(__secpath_destroy); -EXPORT_SYMBOL(xfrm_get_acqseq); -EXPORT_SYMBOL(xfrm_parse_spi); -EXPORT_SYMBOL(xfrm4_rcv); -EXPORT_SYMBOL(xfrm4_tunnel_register); -EXPORT_SYMBOL(xfrm4_tunnel_deregister); 
-EXPORT_SYMBOL(xfrm4_tunnel_check_size); -EXPORT_SYMBOL(xfrm_register_type); -EXPORT_SYMBOL(xfrm_unregister_type); -EXPORT_SYMBOL(xfrm_get_type); -EXPORT_SYMBOL(inet_peer_idlock); -EXPORT_SYMBOL(xfrm_register_km); -EXPORT_SYMBOL(xfrm_unregister_km); -EXPORT_SYMBOL(xfrm_state_delete); -EXPORT_SYMBOL(xfrm_state_walk); -EXPORT_SYMBOL(xfrm_find_acq_byseq); -EXPORT_SYMBOL(xfrm_find_acq); -EXPORT_SYMBOL(xfrm_alloc_spi); -EXPORT_SYMBOL(xfrm_state_flush); -EXPORT_SYMBOL(xfrm_policy_kill); -EXPORT_SYMBOL(xfrm_policy_bysel); -EXPORT_SYMBOL(xfrm_policy_insert); -EXPORT_SYMBOL(xfrm_policy_walk); -EXPORT_SYMBOL(xfrm_policy_flush); -EXPORT_SYMBOL(xfrm_policy_byid); -EXPORT_SYMBOL(xfrm_policy_list); -EXPORT_SYMBOL(xfrm_dst_lookup); -EXPORT_SYMBOL(xfrm_policy_register_afinfo); -EXPORT_SYMBOL(xfrm_policy_unregister_afinfo); -EXPORT_SYMBOL(xfrm_policy_get_afinfo); -EXPORT_SYMBOL(xfrm_policy_put_afinfo); - -EXPORT_SYMBOL_GPL(xfrm_probe_algs); -EXPORT_SYMBOL_GPL(xfrm_count_auth_supported); -EXPORT_SYMBOL_GPL(xfrm_count_enc_supported); -EXPORT_SYMBOL_GPL(xfrm_aalg_get_byidx); -EXPORT_SYMBOL_GPL(xfrm_ealg_get_byidx); -EXPORT_SYMBOL_GPL(xfrm_calg_get_byidx); -EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid); -EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid); -EXPORT_SYMBOL_GPL(xfrm_calg_get_byid); -EXPORT_SYMBOL_GPL(xfrm_aalg_get_byname); -EXPORT_SYMBOL_GPL(xfrm_ealg_get_byname); -EXPORT_SYMBOL_GPL(xfrm_calg_get_byname); -EXPORT_SYMBOL_GPL(skb_icv_walk); #if defined(CONFIG_INET_ESP) || defined(CONFIG_INET_ESP_MODULE) || defined(CONFIG_INET6_ESP) || defined(CONFIG_INET6_ESP_MODULE) EXPORT_SYMBOL_GPL(skb_cow_data); EXPORT_SYMBOL_GPL(pskb_put); Index: linux/net/core/skbuff.c =================================================================== RCS file: /home/cvs/linux-2.5/net/core/skbuff.c,v retrieving revision 1.26 diff -u -r1.26 skbuff.c --- linux/net/core/skbuff.c 26 May 2003 05:13:42 -0000 1.26 +++ linux/net/core/skbuff.c 14 Jun 2003 07:58:45 -0000 @@ -225,7 +225,7 @@ } dst_release(skb->dst); -#ifdef 
CONFIG_INET +#ifdef CONFIG_XFRM secpath_put(skb->sp); #endif if(skb->destructor) { Index: linux/net/ipv4/Kconfig =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/Kconfig,v retrieving revision 1.8 diff -u -r1.8 Kconfig --- linux/net/ipv4/Kconfig 4 Jun 2003 05:50:13 -0000 1.8 +++ linux/net/ipv4/Kconfig 14 Jun 2003 07:58:46 -0000 @@ -187,6 +187,7 @@ config NET_IPIP tristate "IP: tunneling" depends on INET + select XFRM ---help--- Tunneling means encapsulating data of one protocol type within another protocol and sending it over a channel that understands the @@ -205,6 +206,7 @@ config NET_IPGRE tristate "IP: GRE tunnels over IP" depends on INET + select XFRM help Tunneling means encapsulating data of one protocol type within another protocol and sending it over a channel that understands the @@ -343,6 +345,7 @@ config INET_AH tristate "IP: AH transformation" + select XFRM select CRYPTO select CRYPTO_HMAC select CRYPTO_MD5 @@ -354,6 +357,7 @@ config INET_ESP tristate "IP: ESP transformation" + select XFRM select CRYPTO select CRYPTO_HMAC select CRYPTO_MD5 @@ -366,6 +370,7 @@ config INET_IPCOMP tristate "IP: IPComp transformation" + select XFRM select CRYPTO select CRYPTO_DEFLATE ---help--- Index: linux/net/ipv4/Makefile =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/Makefile,v retrieving revision 1.19 diff -u -r1.19 Makefile --- linux/net/ipv4/Makefile 4 Jun 2003 05:18:50 -0000 1.19 +++ linux/net/ipv4/Makefile 14 Jun 2003 07:58:46 -0000 @@ -22,4 +22,4 @@ obj-$(CONFIG_IP_PNP) += ipconfig.o obj-$(CONFIG_NETFILTER) += netfilter/ -obj-y += xfrm4_policy.o xfrm4_state.o xfrm4_input.o xfrm4_tunnel.o +obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o xfrm4_tunnel.o Index: linux/net/ipv4/route.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/route.c,v retrieving revision 1.52 diff -u 
-r1.52 route.c --- linux/net/ipv4/route.c 6 Jun 2003 15:53:37 -0000 1.52 +++ linux/net/ipv4/route.c 14 Jun 2003 07:58:50 -0000 @@ -2727,8 +2727,10 @@ create_proc_read_entry("net/rt_acct", 0, 0, ip_rt_acct_read, NULL); #endif #endif +#ifdef CONFIG_XFRM xfrm_init(); xfrm4_init(); +#endif out: return rc; out_enomem: Index: linux/net/ipv4/udp.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/udp.c,v retrieving revision 1.34 diff -u -r1.34 udp.c --- linux/net/ipv4/udp.c 7 Jun 2003 00:22:34 -0000 1.34 +++ linux/net/ipv4/udp.c 14 Jun 2003 07:58:52 -0000 @@ -945,6 +945,9 @@ */ static int udp_encap_rcv(struct sock * sk, struct sk_buff *skb) { +#ifndef CONFIG_XFRM + return 1; +#else struct udp_opt *up = udp_sk(sk); struct udphdr *uh = skb->h.uh; struct iphdr *iph; @@ -1004,10 +1007,12 @@ return -1; default: - printk(KERN_INFO "udp_encap_rcv(): Unhandled UDP encap type: %u\n", - encap_type); + if (net_ratelimit()) + printk(KERN_INFO "udp_encap_rcv(): Unhandled UDP encap type: %u\n", + encap_type); return 1; } +#endif } /* returns: Index: linux/net/ipv6/Kconfig =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/Kconfig,v retrieving revision 1.7 diff -u -r1.7 Kconfig --- linux/net/ipv6/Kconfig 9 Jun 2003 17:26:52 -0000 1.7 +++ linux/net/ipv6/Kconfig 14 Jun 2003 07:58:53 -0000 @@ -4,6 +4,7 @@ config IPV6_PRIVACY bool "IPv6: Privacy Extensions (RFC 3041) support" depends on IPV6 + select XFRM select CRYPTO select CRYPTO_MD5 ---help--- @@ -22,6 +23,7 @@ config INET6_AH tristate "IPv6: AH transformation" depends on IPV6 + select XFRM select CRYPTO select CRYPTO_HMAC select CRYPTO_MD5 @@ -34,6 +36,7 @@ config INET6_ESP tristate "IPv6: ESP transformation" depends on IPV6 + select XFRM select CRYPTO select CRYPTO_HMAC select CRYPTO_MD5 @@ -47,6 +50,7 @@ config INET6_IPCOMP tristate "IPv6: IPComp transformation" depends on IPV6 + select XFRM select CRYPTO select 
CRYPTO_DEFLATE ---help--- @@ -57,6 +61,7 @@ config IPV6_TUNNEL tristate "IPv6: IPv6-in-IPv6 tunnel" + select XFRM depends on IPV6 ---help--- Support for IPv6-in-IPv6 tunnels described in RFC 2473. Index: linux/net/ipv6/Makefile =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/Makefile,v retrieving revision 1.15 diff -u -r1.15 Makefile --- linux/net/ipv6/Makefile 9 Jun 2003 17:26:52 -0000 1.15 +++ linux/net/ipv6/Makefile 14 Jun 2003 07:58:53 -0000 @@ -8,8 +8,9 @@ route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o raw.o \ protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \ exthdrs.o sysctl_net_ipv6.o datagram.o proc.o \ - ip6_flowlabel.o ipv6_syms.o \ - xfrm6_policy.o xfrm6_state.o xfrm6_input.o + ip6_flowlabel.o ipv6_syms.o + +obj-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o xfrm6_input.o obj-$(CONFIG_INET6_AH) += ah6.o obj-$(CONFIG_INET6_ESP) += esp6.o Index: linux/net/ipv6/ipv6_syms.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ipv6_syms.c,v retrieving revision 1.16 diff -u -r1.16 ipv6_syms.c --- linux/net/ipv6/ipv6_syms.c 9 Jun 2003 17:26:52 -0000 1.16 +++ linux/net/ipv6/ipv6_syms.c 14 Jun 2003 07:58:53 -0000 @@ -36,8 +36,10 @@ EXPORT_SYMBOL(in6addr_loopback); EXPORT_SYMBOL(in6_dev_finish_destroy); EXPORT_SYMBOL(ip6_find_1stfragopt); +#ifdef CONFIG_XFRM EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_clear_mutable_options); +#endif EXPORT_SYMBOL(rt6_lookup); EXPORT_SYMBOL(fl6_sock_lookup); EXPORT_SYMBOL(ipv6_ext_hdr); Index: linux/net/ipv6/route.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/route.c,v retrieving revision 1.40 diff -u -r1.40 route.c --- linux/net/ipv6/route.c 9 Jun 2003 17:26:52 -0000 1.40 +++ linux/net/ipv6/route.c 14 Jun 2003 07:58:56 -0000 @@ -1915,7 +1915,9 @@ if (p) p->proc_fops = &rt6_stats_seq_fops; #endif +#ifdef CONFIG_XFRM xfrm6_init(); +#endif } #ifdef MODULE 
Index: linux/net/xfrm/Kconfig =================================================================== RCS file: /home/cvs/linux-2.5/net/xfrm/Kconfig,v retrieving revision 1.2 diff -u -r1.2 Kconfig --- linux/net/xfrm/Kconfig 25 Mar 2003 18:55:43 -0000 1.2 +++ linux/net/xfrm/Kconfig 14 Jun 2003 07:58:57 -0000 @@ -1,9 +1,13 @@ # # XFRM configuration # +config XFRM + bool + depends on NET + config XFRM_USER tristate "IPsec user configuration interface" - depends on INET + depends on INET && XFRM ---help--- Support for IPsec user configuration interface used by native Linux tools. Index: linux/net/xfrm/Makefile =================================================================== RCS file: /home/cvs/linux-2.5/net/xfrm/Makefile,v retrieving revision 1.3 diff -u -r1.3 Makefile --- linux/net/xfrm/Makefile 4 May 2003 14:10:04 -0000 1.3 +++ linux/net/xfrm/Makefile 14 Jun 2003 07:58:57 -0000 @@ -2,6 +2,7 @@ # Makefile for the XFRM subsystem. # -obj-y := xfrm_policy.o xfrm_state.o xfrm_input.o xfrm_algo.o xfrm_output.o +obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_input.o xfrm_algo.o xfrm_output.o \ + xfrm_export.o obj-$(CONFIG_XFRM_USER) += xfrm_user.o --- /dev/null 2003-03-04 04:26:28.000000000 +0100 +++ linux-2.5/net/xfrm/xfrm_export.c 2003-06-14 10:36:49.000000000 +0200 @@ -0,0 +1,74 @@ + +#include + +EXPORT_SYMBOL(xfrm_user_policy); +EXPORT_SYMBOL(km_waitq); +EXPORT_SYMBOL(km_new_mapping); +EXPORT_SYMBOL(xfrm_cfg_sem); +EXPORT_SYMBOL(xfrm_policy_alloc); +EXPORT_SYMBOL(__xfrm_policy_destroy); +EXPORT_SYMBOL(xfrm_lookup); +EXPORT_SYMBOL(__xfrm_policy_check); +EXPORT_SYMBOL(__xfrm_route_forward); +EXPORT_SYMBOL(xfrm_state_alloc); +EXPORT_SYMBOL(__xfrm_state_destroy); +EXPORT_SYMBOL(xfrm_state_find); +EXPORT_SYMBOL(xfrm_state_insert); +EXPORT_SYMBOL(xfrm_state_check_expire); +EXPORT_SYMBOL(xfrm_state_check_space); +EXPORT_SYMBOL(xfrm_state_lookup); +EXPORT_SYMBOL(xfrm_state_register_afinfo); +EXPORT_SYMBOL(xfrm_state_unregister_afinfo); 
+EXPORT_SYMBOL(xfrm_state_get_afinfo); +EXPORT_SYMBOL(xfrm_state_put_afinfo); +EXPORT_SYMBOL(xfrm_state_delete_tunnel); +EXPORT_SYMBOL(xfrm_replay_check); +EXPORT_SYMBOL(xfrm_replay_advance); +EXPORT_SYMBOL(xfrm_check_selectors); +EXPORT_SYMBOL(xfrm_check_output); +EXPORT_SYMBOL(__secpath_destroy); +EXPORT_SYMBOL(xfrm_get_acqseq); +EXPORT_SYMBOL(xfrm_parse_spi); +EXPORT_SYMBOL(xfrm4_rcv); +EXPORT_SYMBOL(xfrm4_tunnel_register); +EXPORT_SYMBOL(xfrm4_tunnel_deregister); +EXPORT_SYMBOL(xfrm4_tunnel_check_size); +EXPORT_SYMBOL(xfrm_register_type); +EXPORT_SYMBOL(xfrm_unregister_type); +EXPORT_SYMBOL(xfrm_get_type); +EXPORT_SYMBOL(inet_peer_idlock); +EXPORT_SYMBOL(xfrm_register_km); +EXPORT_SYMBOL(xfrm_unregister_km); +EXPORT_SYMBOL(xfrm_state_delete); +EXPORT_SYMBOL(xfrm_state_walk); +EXPORT_SYMBOL(xfrm_find_acq_byseq); +EXPORT_SYMBOL(xfrm_find_acq); +EXPORT_SYMBOL(xfrm_alloc_spi); +EXPORT_SYMBOL(xfrm_state_flush); +EXPORT_SYMBOL(xfrm_policy_kill); +EXPORT_SYMBOL(xfrm_policy_bysel); +EXPORT_SYMBOL(xfrm_policy_insert); +EXPORT_SYMBOL(xfrm_policy_walk); +EXPORT_SYMBOL(xfrm_policy_flush); +EXPORT_SYMBOL(xfrm_policy_byid); +EXPORT_SYMBOL(xfrm_policy_list); +EXPORT_SYMBOL(xfrm_dst_lookup); +EXPORT_SYMBOL(xfrm_policy_register_afinfo); +EXPORT_SYMBOL(xfrm_policy_unregister_afinfo); +EXPORT_SYMBOL(xfrm_policy_get_afinfo); +EXPORT_SYMBOL(xfrm_policy_put_afinfo); + +EXPORT_SYMBOL_GPL(xfrm_probe_algs); +EXPORT_SYMBOL_GPL(xfrm_count_auth_supported); +EXPORT_SYMBOL_GPL(xfrm_count_enc_supported); +EXPORT_SYMBOL_GPL(xfrm_aalg_get_byidx); +EXPORT_SYMBOL_GPL(xfrm_ealg_get_byidx); +EXPORT_SYMBOL_GPL(xfrm_calg_get_byidx); +EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid); +EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid); +EXPORT_SYMBOL_GPL(xfrm_calg_get_byid); +EXPORT_SYMBOL_GPL(xfrm_aalg_get_byname); +EXPORT_SYMBOL_GPL(xfrm_ealg_get_byname); +EXPORT_SYMBOL_GPL(xfrm_calg_get_byname); + +EXPORT_SYMBOL_GPL(skb_icv_walk); From davem@redhat.com Sat Jun 14 02:30:56 2003 Received: with ECARTIS (v1.0.0; list 
netdev); Sat, 14 Jun 2003 02:31:02 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E9Ut2x017619 for ; Sat, 14 Jun 2003 02:30:56 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA04355; Sat, 14 Jun 2003 02:27:03 -0700
Date: Sat, 14 Jun 2003 02:27:02 -0700 (PDT)
Message-Id: <20030614.022702.41637600.davem@redhat.com>
To: ak@suse.de
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
From: "David S. Miller"
In-Reply-To: <20030614091631.GA16993@wotan.suse.de>
References: <20030614091631.GA16993@wotan.suse.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3238
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Andi Kleen
   Date: Sat, 14 Jun 2003 11:16:31 +0200

   This patch compiles the xfrm subsystem in only when any of the options
   using it are selected. This shrinks the text segment on an amd64
   kernel by ~32k, data by ~6k, bss by ~33k, overall ~72K memory saved.

I'm not going to apply this, sorry Andi.

I want the freedom to use the XFRM layer for generic things
at some point.

How about working on making the xfrm layer more lean instead?
:)

From ak@suse.de Sat Jun 14 02:36:37 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 02:36:46 -0700 (PDT)
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E9aa2x018108 for ; Sat, 14 Jun 2003 02:36:37 -0700
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id AD6201470B; Sat, 14 Jun 2003 11:36:30 +0200 (MEST)
Date: Sat, 14 Jun 2003 11:36:30 +0200
From: Andi Kleen
To: "David S. Miller"
Cc: ak@suse.de, netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
Message-ID: <20030614093630.GB16993@wotan.suse.de>
References: <20030614091631.GA16993@wotan.suse.de> <20030614.022702.41637600.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030614.022702.41637600.davem@redhat.com>
X-archive-position: 3239
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev

On Sat, Jun 14, 2003 at 02:27:02AM -0700, David S. Miller wrote:
>    From: Andi Kleen
>    Date: Sat, 14 Jun 2003 11:16:31 +0200
>
>    This patch compiles the xfrm subsystem in only when any of the options
>    using it are selected. This shrinks the text segment on an amd64
>    kernel by ~32k, data by ~6k, bss by ~33k, overall ~72K memory saved.
>
> I'm not going to apply this, sorry Andi.
>
> I want the freedom to use the XFRM layer for generic things
> at some point.

But in 2.7 surely, right? When that happens you can easily make
CONFIG_XFRM the default. This would give 2.6 users a useful option.

Also when you do use it generically you will hopefully
discard some old code (like the rt cache?) which may make
up for the additional bloat. But until that happens having
both even when not needed doesn't make too much sense.

>
> How about working on making the xfrm layer more lean instead?
:)

My last proposal for this (using hlists in the hash tables) was
rejected, so I don't see much chance to do this.

-Andi

From davem@redhat.com Sat Jun 14 02:42:37 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 02:42:41 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5E9gZ2x020029 for ; Sat, 14 Jun 2003 02:42:36 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA04420; Sat, 14 Jun 2003 02:38:43 -0700
Date: Sat, 14 Jun 2003 02:38:43 -0700 (PDT)
Message-Id: <20030614.023843.78709528.davem@redhat.com>
To: ak@suse.de
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
From: "David S. Miller"
In-Reply-To: <20030614093630.GB16993@wotan.suse.de>
References: <20030614091631.GA16993@wotan.suse.de> <20030614.022702.41637600.davem@redhat.com> <20030614093630.GB16993@wotan.suse.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3240
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Andi Kleen
   Date: Sat, 14 Jun 2003 11:36:30 +0200

   Also when you do use it generically you will hopefully
   discard some old code (like the rt cache?) which may make
   up for the additional bloat. But until that happens having
   both even when not needed doesn't make too much sense.

The rtcache will likely be retained as a flow cache lookup
miss handler even once we use the flowcache for all lookups.

Actually, that entire area is in flux, I still do not know the
fate of the rtcache even without the flow cache :)

   > How about working on making the xfrm layer more lean instead?
:)

   My last proposal for this (using hlists in the hash tables) was
   rejected, so I don't see much chance to do this.

Because hlists cannot retain the behavior we need, specifically
because we need the ability to add to the tail.

If it's some in-kernel-image table, why not dynamically allocate the
table in question?

From ak@suse.de Sat Jun 14 03:18:58 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 03:19:08 -0700 (PDT)
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EAIv2x020678 for ; Sat, 14 Jun 2003 03:18:58 -0700
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id D17F81431F; Sat, 14 Jun 2003 12:18:51 +0200 (MEST)
Date: Sat, 14 Jun 2003 12:18:51 +0200
From: Andi Kleen
To: "David S. Miller"
Cc: ak@suse.de, netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
Message-ID: <20030614101851.GA24170@wotan.suse.de>
References: <20030614091631.GA16993@wotan.suse.de> <20030614.022702.41637600.davem@redhat.com> <20030614093630.GB16993@wotan.suse.de> <20030614.023843.78709528.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030614.023843.78709528.davem@redhat.com>
X-archive-position: 3241
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev

On Sat, Jun 14, 2003 at 02:38:43AM -0700, David S. Miller wrote:
>    From: Andi Kleen
>    Date: Sat, 14 Jun 2003 11:36:30 +0200
>
>    Also when you do use it generically you will hopefully
>    discard some old code (like the rt cache?) which may make
>    up for the additional bloat. But until that happens having
>    both even when not needed doesn't make too much sense.
>
> The rtcache will likely be retained as a flow cache lookup
> miss handler even once we use the flowcache for all lookups.
>
> Actually, that entire area is in flux, I still do not know the
> fate of the rtcache even without the flow cache :)

In that case you could really apply the patch. It doesn't close
any future options for you, just makes life a bit better for some users
today.

>
>    > How about working on making the xfrm layer more lean instead? :)
>
>    My last proposal for this (using hlists in the hash tables) was
>    rejected, so I don't see much chance to do this.
>
> Because hlists cannot retain the behavior we need, specifically
> because we need the ability to add to the tail.
>
> If it's some in-kernel-image table, why not dynamically allocate the
> table in question?

Allocating it at first lookup would be racy (would need a nasty
spinlock at least). It may be possible at first policy setup, but
it's not guaranteed you can still get two 32K contiguous areas. You
could fall back to vmalloc I guess.

Allocating it at bootup would be equivalent to the current BSS allocation.

The advantage of dynamic allocation is that it would work for
vendor kernels also.

-Andi

From davem@redhat.com Sat Jun 14 04:30:35 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 04:31:00 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EBUT2x022637 for ; Sat, 14 Jun 2003 04:30:34 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id EAA04755; Sat, 14 Jun 2003 04:26:36 -0700
Date: Sat, 14 Jun 2003 04:26:36 -0700 (PDT)
Message-Id: <20030614.042636.74749587.davem@redhat.com>
To: ak@suse.de
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
From: "David S. Miller"
In-Reply-To: <20030614101851.GA24170@wotan.suse.de>
References: <20030614093630.GB16993@wotan.suse.de> <20030614.023843.78709528.davem@redhat.com> <20030614101851.GA24170@wotan.suse.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3242
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Andi Kleen
   Date: Sat, 14 Jun 2003 12:18:51 +0200

   Allocating it at first lookup would be racy (would need a nasty
   spinlock at least). It may be possible at first policy setup, but
   it's not guaranteed you can still get two 32K contiguous areas. You
   could fall back to vmalloc I guess.

Andi, you're getting ridiculous. Add an xfrm_whatever_init() call
and allocate the table there.

Oh, I see, we do that already... end of discussion I guess.

From greg@kroah.com Sat Jun 14 10:09:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 10:10:07 -0700 (PDT)
Received: from granite.he.net (granite.he.net [216.218.226.66]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EH9u2x004621 for ; Sat, 14 Jun 2003 10:09:56 -0700
Received: from [192.168.0.105] (12-203-16-54.client.attbi.com [12.203.16.54]) by granite.he.net (8.8.6p2003-03-31/8.8.2) with ESMTP id KAA00750; Sat, 14 Jun 2003 10:09:03 -0700
Delivered-To: netdev@oss.sgi.com
Received: from greg by echidna.kroah.org with local (masqmail 0.2.19) id 19REWH-0tC-00; Sat, 14 Jun 2003 10:08:49 -0700
Date: Sat, 14 Jun 2003 10:08:49 -0700
From: Greg KH
To: "David S.
Miller"
Cc: ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com
Subject: Re: e1000 performance hack for ppc64 (Power4)
Message-ID: <20030614170848.GA3324@kroah.com>
References: <20030613.224122.104034261.davem@redhat.com> <5.1.0.14.2.20030614154954.026b4768@mira-sjcm-3.cisco.com> <20030613.230850.85410095.davem@redhat.com> <20030613.231418.39160686.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030613.231418.39160686.davem@redhat.com>
User-Agent: Mutt/1.4.1i
X-archive-position: 3243
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: greg@kroah.com
Precedence: bulk
X-list: netdev

On Fri, Jun 13, 2003 at 11:14:18PM -0700, David S. Miller wrote:
>
> Folks, can we remove whatever member of this CC: list creates
> bounces that say:
>
> Your message to Linux_news awaits moderator approval

It's someone subscribed to linux-kernel@vger.kernel.org that causes
this. I've complained to the admin of that mail-news gateway that is
barfing on too many CC: members in the past to not do this, but it
doesn't seem like they are listening...

greg k-h

From ak@suse.de Sat Jun 14 11:32:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 11:32:47 -0700 (PDT)
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EIWc2x007416 for ; Sat, 14 Jun 2003 11:32:39 -0700
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 02B1A14B25; Sat, 14 Jun 2003 20:32:33 +0200 (MEST)
Date: Sat, 14 Jun 2003 20:32:32 +0200
From: Andi Kleen
To: "David S.
Miller"
Cc: ak@suse.de, netdev@oss.sgi.com
Subject: Re: [PATCH] Make xfrm subsystem optional
Message-ID: <20030614183232.GB23546@wotan.suse.de>
References: <20030614093630.GB16993@wotan.suse.de> <20030614.023843.78709528.davem@redhat.com> <20030614101851.GA24170@wotan.suse.de> <20030614.042636.74749587.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030614.042636.74749587.davem@redhat.com>
X-archive-position: 3244
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev

On Sat, Jun 14, 2003 at 04:26:36AM -0700, David S. Miller wrote:
>    From: Andi Kleen
>    Date: Sat, 14 Jun 2003 12:18:51 +0200
>
>    Allocating it at first lookup would be racy (would need a nasty
>    spinlock at least). It may be possible at first policy setup, but
>    it's not guaranteed you can still get two 32K contiguous areas. You
>    could fall back to vmalloc I guess.
>
> Andi, you're getting ridiculous. Add an xfrm_whatever_init() call
> and allocate the table there.

Did you actually read what I wrote?

Allocating on init is useless from the bloat perspective because it's 100%
equivalent to a BSS allocation.
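
Illustration (not part of the original message): the trade-off argued over in this thread can be sketched in user space. The table pointer below stays NULL until the first insert (the analogue of "first policy setup"), so an unused feature costs one pointer instead of a full static array. Lookups never allocate; they simply miss while the table is absent. A C11 compare-and-swap publishes the table, which is one way to avoid the lookup-time race without a lock. Every name here (TABLE_SIZE, table_insert(), table_lookup(), table_is_allocated()) is hypothetical, and calloc() stands in for the kernel allocator, so the concern about finding 32K of contiguous memory is not modelled. Bucket updates are assumed to be serialized by the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 1024  /* hypothetical bucket count */

struct entry {
    unsigned key;
    int value;
    struct entry *next;
};

/* NULL until the first insert; zero-initialized as a static object. */
static _Atomic(struct entry **) table;

/* Return the bucket array, allocating and publishing it on first use. */
static struct entry **table_get_or_alloc(void)
{
    struct entry **t = atomic_load(&table);
    if (t)
        return t;

    struct entry **fresh = calloc(TABLE_SIZE, sizeof(*fresh));
    if (!fresh)
        return NULL;

    struct entry **expected = NULL;
    if (atomic_compare_exchange_strong(&table, &expected, fresh))
        return fresh;

    /* Another thread published its table first: discard ours. */
    free(fresh);
    return expected;
}

/* Setup path: the only place that may allocate the table. */
int table_insert(unsigned key, int value)
{
    struct entry **t = table_get_or_alloc();
    if (!t)
        return -1;

    struct entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;

    e->key = key;
    e->value = value;
    e->next = t[key % TABLE_SIZE];
    t[key % TABLE_SIZE] = e;
    return 0;
}

/* Fast path: returns 1 and fills *value on a hit, 0 on a miss. */
int table_lookup(unsigned key, int *value)
{
    struct entry **t = atomic_load(&table);
    if (!t)
        return 0;  /* nothing was ever set up */

    for (struct entry *e = t[key % TABLE_SIZE]; e; e = e->next) {
        if (e->key == key) {
            *value = e->value;
            return 1;
        }
    }
    return 0;
}

int table_is_allocated(void)
{
    return atomic_load(&table) != NULL;
}
```

Allocating the same array from an init function instead would make it unconditional again, which is the equivalence to a BSS allocation pointed out above.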
-Andi

From jsd@monmouth.com Sat Jun 14 11:49:56 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 11:50:00 -0700 (PDT)
Received: from tadenker.com (tadenker.com [65.103.215.217]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EInt2x007796 for ; Sat, 14 Jun 2003 11:49:56 -0700
Received: (qmail 4538 invoked from network); 14 Jun 2003 18:49:46 -0000
Received: from unknown (HELO av8n.net) (10.200.2.1) by jeeves.office.tad.private with SMTP; 14 Jun 2003 18:49:46 -0000
Received: (qmail 31492 invoked from network); 14 Jun 2003 18:49:45 -0000
Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 14 Jun 2003 18:49:45 -0000
Message-ID: <3EEB6E49.8040406@monmouth.com>
Date: Sat, 14 Jun 2003 14:49:45 -0400
From: "John S. Denker"
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: decorum
References: <20030614093630.GB16993@wotan.suse.de> <20030614.023843.78709528.davem@redhat.com> <20030614101851.GA24170@wotan.suse.de> <20030614.042636.74749587.davem@redhat.com> <20030614183232.GB23546@wotan.suse.de>
In-Reply-To: <20030614183232.GB23546@wotan.suse.de>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3245
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jsd@monmouth.com
Precedence: bulk
X-list: netdev

Could we please have less ad-hominem invective
and less profanity on this list?

If you have a good point to make, making it
politely costs you nothing.

You know who you are.
From ralph@istop.com Sat Jun 14 15:02:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 15:02:47 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5EM2d2x014278 for ; Sat, 14 Jun 2003 15:02:40 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 85E1E369BF; Sat, 14 Jun 2003 18:02:38 -0400 (EDT) Date: Sat, 14 Jun 2003 18:02:53 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: "John S. Denker" Cc: "netdev@oss.sgi.com" Subject: Re: decorum In-Reply-To: <3EEB6E49.8040406@monmouth.com> Message-ID: References: <20030614093630.GB16993@wotan.suse.de> <20030614.023843.78709528.davem@redhat.com> <20030614101851.GA24170@wotan.suse.de> <20030614.042636.74749587.davem@redhat.com> <20030614183232.GB23546@wotan.suse.de> <3EEB6E49.8040406@monmouth.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3246 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev You sound like the NANOG bitch. If your ears can't handle hearing "fuck" now and then, get off the net and back to your bible study group. Ralph Doncaster, president 6042147 Canada Inc. o/a IStop.com On Sat, 14 Jun 2003, John S. Denker wrote: > Could we please have less ad-hominem invective > and less profanity on this list? > > If you have a good point to make, making it > politely costs you nothing. > > You know who you are. 
> > > From davem@redhat.com Sat Jun 14 20:06:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 20:06:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F36D2x028207 for ; Sat, 14 Jun 2003 20:06:14 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA05773; Sat, 14 Jun 2003 20:01:52 -0700 Date: Sat, 14 Jun 2003 20:01:51 -0700 (PDT) Message-Id: <20030614.200151.41637995.davem@redhat.com> To: greg@kroah.com Cc: ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, hdierks@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: <20030614170848.GA3324@kroah.com> References: <20030613.230850.85410095.davem@redhat.com> <20030613.231418.39160686.davem@redhat.com> <20030614170848.GA3324@kroah.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3247 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Greg KH Date: Sat, 14 Jun 2003 10:08:49 -0700 It's someone subscribed to linux-kernel@vger.kernel.org that causes this. Thanks, I've nuked linux_news@nextphere.com, if they resubscribe without contacting me, I'll block future subscription attempts from them. 
From davem@redhat.com Sat Jun 14 20:07:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 20:07:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F3722x028312 for ; Sat, 14 Jun 2003 20:07:02 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA05782; Sat, 14 Jun 2003 20:03:03 -0700 Date: Sat, 14 Jun 2003 20:03:03 -0700 (PDT) Message-Id: <20030614.200303.71094694.davem@redhat.com> To: ak@suse.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Make xfrm subsystem optional From: "David S. Miller" In-Reply-To: <20030614183232.GB23546@wotan.suse.de> References: <20030614101851.GA24170@wotan.suse.de> <20030614.042636.74749587.davem@redhat.com> <20030614183232.GB23546@wotan.suse.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3248 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Sat, 14 Jun 2003 20:32:32 +0200 On Sat, Jun 14, 2003 at 04:26:36AM -0700, David S. Miller wrote: > From: Andi Kleen > Date: Sat, 14 Jun 2003 12:18:51 +0200 > > Allocating it at first lookup would be racy (would need a nasty > spinlock at least). It may be possible at first policy setup, but > it's not guaranteed you can still get two 32K continuous areas. You > could fall back to vmalloc I guess. > > Andi, you're getting ridiculous. Add a xfrm_whatever_init() call > and allocate the table there. Did you actually read what I wrote? Allocating on init is useless from the bloat perspective because it's 100% equivalent to a BSS allocation.
If dynamic, you could allocate a "tiny" hash table or whatever on bootup and grow it as usage increases, much like we grow the FIB hashes dynamically. From davem@redhat.com Sat Jun 14 20:08:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 20:08:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F38K2x028689 for ; Sat, 14 Jun 2003 20:08:20 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA05794; Sat, 14 Jun 2003 20:04:16 -0700 Date: Sat, 14 Jun 2003 20:04:15 -0700 (PDT) Message-Id: <20030614.200415.104036933.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: jsd@monmouth.com, netdev@oss.sgi.com Subject: Re: decorum From: "David S. Miller" In-Reply-To: References: <20030614183232.GB23546@wotan.suse.de> <3EEB6E49.8040406@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3249 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Sat, 14 Jun 2003 18:02:53 -0400 (EDT) You sound like the NANOG bitch. If your ears can't handle hearing "fuck" now and then, get off the net and back to your bible study group. I totally agree, this isn't a family list. 
From davem@redhat.com Sat Jun 14 23:40:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 14 Jun 2003 23:40:53 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F6ek2x004353 for ; Sat, 14 Jun 2003 23:40:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA10606; Sat, 14 Jun 2003 23:36:29 -0700 Date: Sat, 14 Jun 2003 23:36:28 -0700 (PDT) Message-Id: <20030614.233628.28792212.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16105.43543.826589.672148@robur.slu.se> References: <16101.4136.328760.955758@robur.slu.se> <20030612.232114.71088346.davem@redhat.com> <16105.43543.826589.672148@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3250 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Fri, 13 Jun 2003 12:40:23 +0200 David S. Miller writes: > From: Robert Olsson > Date: Tue, 10 Jun 2003 00:54:32 +0200 > > I'm about to propose some stats even for hash spinning.... > > Do you mind if I apply this? It looks fine. No please do. There is an updated rtstat already. I am, but I have to do this by hand. 
It seems your email client has caught the disease that turns all tabs into spaces :( From davem@redhat.com Sun Jun 15 00:32:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 00:32:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F7W12x005677 for ; Sun, 15 Jun 2003 00:32:01 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA11471; Sun, 15 Jun 2003 00:26:57 -0700 Date: Sun, 15 Jun 2003 00:26:56 -0700 (PDT) Message-Id: <20030615.002656.35017399.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [patch] IPV6: Refcount leaks in udpv6_connect() From: "David S. Miller" In-Reply-To: References: <20030604.093944.84705841.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3251 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ville Nuorvala Date: Sat, 14 Jun 2003 00:07:25 +0300 (EEST) The diff is done against ChangeSet 1.1307. Your patch does not apply without rejects. Please regenerate. 
From davem@redhat.com Sun Jun 15 00:33:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 00:33:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F7Xl2x005982 for ; Sun, 15 Jun 2003 00:33:47 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA11489; Sun, 15 Jun 2003 00:29:46 -0700 Date: Sun, 15 Jun 2003 00:29:46 -0700 (PDT) Message-Id: <20030615.002946.15265501.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, miyazawa@linux-ipv6.org Subject: Re: [PATCH] [XFRM] xfrm_alloc_spi() always selected minspi From: "David S. Miller" In-Reply-To: <20030614.111619.37286104.yoshfuji@linux-ipv6.org> References: <20030614.111619.37286104.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3252 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / $B5HF#1QL@(B Date: Sat, 14 Jun 2003 11:16:19 +0900 (JST) net/xfrm/xfrm_state.c:xfrm_alloc_spi() always selected minspi because of typo. ... Here's the fix. Patch applied, thanks. 
From davem@redhat.com Sun Jun 15 00:46:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 00:46:37 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F7kT2x006334 for ; Sun, 15 Jun 2003 00:46:30 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA11542; Sun, 15 Jun 2003 00:42:26 -0700 Date: Sun, 15 Jun 2003 00:42:26 -0700 (PDT) Message-Id: <20030615.004226.68040346.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] convert slip driver to alloc_netdev From: "David S. Miller" In-Reply-To: <20030613160942.384ca2c3.shemminger@osdl.org> References: <20030613160942.384ca2c3.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3253 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 13 Jun 2003 16:09:42 -0700 This patch is against 2.5.70 with all the earlier net patches applied. Tested with dedicated serial cable between 2.4 and SUT. Applied, thanks Stephen. 
From davem@redhat.com Sun Jun 15 00:54:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 00:55:02 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F7su2x006676 for ; Sun, 15 Jun 2003 00:54:56 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA11554; Sun, 15 Jun 2003 00:50:55 -0700 Date: Sun, 15 Jun 2003 00:50:55 -0700 (PDT) Message-Id: <20030615.005055.55726223.davem@redhat.com> To: shemminger@osdl.org Cc: greg@kroah.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] network hotplug via class_device/kobject From: "David S. Miller" In-Reply-To: <20030613164119.15209934.shemminger@osdl.org> References: <20030613164119.15209934.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3254 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 13 Jun 2003 16:41:19 -0700 This patch changes network devices to run hotplug out of the kobject/class_device infrastructure rather than calling it from the network core. The code gets simpler and there is only one place for Greg to fix when he changes the API ;-) I'll apply this patch, looks fine. Paranoid about some driver doing something like: rtnl_lock(); register_netdevice(); unregister_netdevice(); rtnl_unlock() BOOM These sorts of turds exist at least in two places: 1) drivers/net/wan/comx.c 2) drivers/net/wan/hdlc_fr.c But it is pretty clear that these two drivers have been tried by nobody in recent years. They both call into {un,}register_netdevice without the RTNL semaphore held. 
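
A minimal user-space sketch of the locking rule mentioned above, under stated assumptions: every name ending in `_model` is hypothetical and is not a kernel symbol, and the lock is a plain flag rather than the real RTNL semaphore. It only illustrates the calling convention: the low-level registration step expects the caller to already hold the lock, while a wrapper takes it on the caller's behalf (in the kernel, register_netdev() plays that role around register_netdevice()).

```c
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel symbols. */
static bool rtnl_held_model;	/* true while the model lock is held */
static int registered_model;	/* devices currently registered */

void rtnl_lock_model(void)
{
	rtnl_held_model = true;
}

void rtnl_unlock_model(void)
{
	rtnl_held_model = false;
}

/* Low-level step: refuses to run unless the caller holds the lock. */
int register_netdevice_model(void)
{
	if (!rtnl_held_model)
		return -1;
	registered_model++;
	return 0;
}

/* Same rule on the way out; also fails if nothing is registered. */
int unregister_netdevice_model(void)
{
	if (!rtnl_held_model || registered_model == 0)
		return -1;
	registered_model--;
	return 0;
}

/* Convenience wrapper: takes and drops the lock around the low-level step. */
int register_netdev_model(void)
{
	int err;

	rtnl_lock_model();
	err = register_netdevice_model();
	rtnl_unlock_model();
	return err;
}
```

In this model a driver that calls register_netdevice_model() directly without taking the lock first gets an error back; the real kernel offers no such safety net, which is why unlocked callers are a problem.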
From ak@suse.de Sun Jun 15 01:08:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 01:08:38 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5F88U2x007151 for ; Sun, 15 Jun 2003 01:08:31 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 068961455A; Sun, 15 Jun 2003 10:08:25 +0200 (MEST) Date: Sun, 15 Jun 2003 10:08:24 +0200 From: Andi Kleen To: "David S. Miller" Cc: ak@suse.de, netdev@oss.sgi.com Subject: Re: [PATCH] Make xfrm subsystem optional Message-ID: <20030615080824.GA11398@wotan.suse.de> References: <20030614101851.GA24170@wotan.suse.de> <20030614.042636.74749587.davem@redhat.com> <20030614183232.GB23546@wotan.suse.de> <20030614.200303.71094694.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030614.200303.71094694.davem@redhat.com> X-archive-position: 3255 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Sat, Jun 14, 2003 at 08:03:03PM -0700, David S. Miller wrote: > From: Andi Kleen > Date: Sat, 14 Jun 2003 20:32:32 +0200 > > On Sat, Jun 14, 2003 at 04:26:36AM -0700, David S. Miller wrote: > > From: Andi Kleen > > Date: Sat, 14 Jun 2003 12:18:51 +0200 > > > > Allocating it at first lookup would be racy (would need a nasty > > spinlock at least). It may be possible at first policy setup, but > > it's not guaranteed you can still get two 32K continuous areas. You > > could fall back to vmalloc I guess. > > > > Andi, you're getting ridiculous. Add a xfrm_whatever_init() call > > and allocate the table there. > > Did you actually read what I wrote? Allocating on init is useless > from the bloat perspective because it's 100% equivalent to a BSS > allocation.
> > If dynamic, you could allocate a "tiny" hash table or whatever > on bootup and grow it as usage increases, much like we grow the > FIB hashes dynamically. I suspect dynamic growing at runtime would result in interesting SMP locking issues (suspect - i haven't studied the xfrm locking in detail yet) Also it has the same problem - 32K direct mapping allocation may not work anymore during runtime. I can take a look later if something else can be improved, but I'm not very optimistic (the design of this thing is just not lightweight neither in code nor data structures) Especially since you already vetoed the two best looking options. It is just another inetpeer cache, a sleeping gigant waiting mostly unused for its great days ;) -Andi From romieu@fr.zoreil.com Sun Jun 15 03:52:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 03:52:25 -0700 (PDT) Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FAqF2x012331 for ; Sun, 15 Jun 2003 03:52:17 -0700 Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.12.8/8.12.1) with ESMTP id h5FAjLvY012508; Sun, 15 Jun 2003 12:45:21 +0200 Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.12.8/8.12.1) id h5FAjJBk012507; Sun, 15 Jun 2003 12:45:19 +0200 Date: Sun, 15 Jun 2003 12:45:19 +0200 From: Francois Romieu To: "David S. 
Miller" Cc: khc@pm.waw.pl, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] network hotplug via class_device/kobject Message-ID: <20030615124519.A11939@electric-eye.fr.zoreil.com> References: <20030613164119.15209934.shemminger@osdl.org> <20030615.005055.55726223.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030615.005055.55726223.davem@redhat.com>; from davem@redhat.com on Sun, Jun 15, 2003 at 12:50:55AM -0700 X-archive-position: 3256 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: romieu@fr.zoreil.com Precedence: bulk X-list: netdev David S. Miller : [rtnl_lock() and register_netdevice()] > Paranoid about some driver doing something like: > rtnl_lock(); register_netdevice(); unregister_netdevice(); rtnl_unlock() BOOM > > These sorts of turds exist at least in two places: > > 1) drivers/net/wan/comx.c > 2) drivers/net/wan/hdlc_fr.c > > But it is pretty clear that these two drivers have been > tried by nobody in recent years. They both call into It's pretty clear but it's false :o) > {un,}register_netdevice without the RTNL semaphore held. 
There is a maintainer for 2): GENERIC HDLC DRIVER, N2 AND C101 DRIVERS P: Krzysztof Halasa M: khc@pm.waw.pl W: http://hq.pm.waw.pl/hdlc/ S: Maintained -- Ueimor From kazunori@miyazawa.org Sun Jun 15 06:04:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 06:04:38 -0700 (PDT) Received: from miyazawa.org (usen-221x116x13x66.ap-US01.usen.ad.jp [221.116.13.66]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FD4O2x018789 for ; Sun, 15 Jun 2003 06:04:25 -0700 Received: from monza.miyazawa.org ([::ffff:192.168.0.3]) (IDENT: miyazawa, AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Sun, 15 Jun 2003 22:01:27 +0900 Date: Sun, 15 Jun 2003 22:06:32 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH][IPV6] fix ipv6 header handling of AH input. Message-Id: <20030615220632.0701a5cd.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3257 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev Hello, This patch fixes ipv6 header handling of ah input and moves the routine to clear mutable options. It reduces unnecessary header clearing when a packet has only ESP. 
This patch for linux-2.5.70 + CS1.1307 Best regards, --Kazunori Miyazawa (Yokogawa Electric Corporation) Index: linux25/include/net/xfrm.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/xfrm.h,v retrieving revision 1.1.1.25 retrieving revision 1.1.1.25.4.1 diff -u -r1.1.1.25 -r1.1.1.25.4.1 --- linux25/include/net/xfrm.h 10 Jun 2003 13:21:39 -0000 1.1.1.25 +++ linux25/include/net/xfrm.h 13 Jun 2003 14:43:37 -0000 1.1.1.25.4.1 @@ -782,7 +782,6 @@ extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler); extern int xfrm4_tunnel_check_size(struct sk_buff *skb); extern int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp); -extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); void xfrm_policy_init(void); Index: linux25/net/ipv6/ah6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ah6.c,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.16.2 diff -u -r1.1.1.13 -r1.1.1.13.16.2 --- linux25/net/ipv6/ah6.c 26 May 2003 08:04:11 -0000 1.1.1.13 +++ linux25/net/ipv6/ah6.c 15 Jun 2003 12:18:24 -0000 1.1.1.13.16.2 @@ -36,6 +36,114 @@ #include #include +static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) +{ + u8 *opt = (u8 *)opthdr; + int len = ipv6_optlen(opthdr); + int off = 0; + int optlen = 0; + + off += 2; + len -= 2; + + while (len > 0) { + + switch (opt[off]) { + + case IPV6_TLV_PAD0: + optlen = 1; + break; + default: + if (len < 2) + goto bad; + optlen = opt[off+1]+2; + if (len < optlen) + goto bad; + if (opt[off] & 0x20) + memset(&opt[off+2], 0, opt[off+1]); + break; + } + + off += optlen; + len -= optlen; + } + if (len == 0) + return 1; + +bad: + return 0; +} + +static int ipv6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) +{ + u16 offset = 
sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + unsigned int packet_len = skb->tail - skb->nh.raw; + u8 nexthdr = skb->nh.ipv6h->nexthdr; + u8 nextnexthdr = 0; + + *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + + while (offset + 1 <= packet_len) { + + switch (nexthdr) { + + case NEXTHDR_HOP: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun hopopts\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_ROUTING: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_DEST: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_AUTH: + if (dir == XFRM_POLICY_OUT) { + memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, + (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); + } + if (exthdr->nexthdr == NEXTHDR_DEST) { + offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + nextnexthdr = exthdr->nexthdr; + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + } + return nexthdr; + default : + return nexthdr; + } + } + + return nexthdr; +} + int ah6_output(struct sk_buff *skb) { int err; @@ -80,7 +188,7 @@ memcpy(iph, skb->data, hdr_len); skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); memcpy(skb->nh.ipv6h, iph, hdr_len); - nexthdr = xfrm6_clear_mutable_options(skb, 
&nh_offset, XFRM_POLICY_OUT); + nexthdr = ipv6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_OUT); if (nexthdr == 0) goto error; @@ -138,20 +246,46 @@ int ah6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) { - int ah_hlen; - struct ipv6hdr *iph; + /* + * Before process AH + * [IPv6][Ext1][Ext2][AH][Dest][Payload] + * |<-------------->| hdr_len + * |<------------------------>| cleared_hlen + * + * To erase AH: + * Keeping copy of cleared headers. After AH processing, + * Moving the pointer of skb->nh.raw by using skb_pull as long as AH + * header length. Then copy back the copy as long as hdr_len + * If destination header following AH exists, copy it into after [Ext2]. + * + * |<>|[IPv6][Ext1][Ext2][Dest][Payload] + * There is offset of AH before IPv6 header after the process. + */ + + struct ipv6hdr *iph = skb->nh.ipv6h; struct ipv6_auth_hdr *ah; struct ah_data *ahp; unsigned char *tmp_hdr = NULL; - int hdr_len = skb->h.raw - skb->nh.raw; + u16 hdr_len = skb->data - skb->nh.raw; + u16 ah_hlen; + u16 cleared_hlen = hdr_len; + u16 nh_offset = 0; u8 nexthdr = 0; + u8 *prevhdr; if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr))) goto out; ah = (struct ipv6_auth_hdr*)skb->data; ahp = x->data; - ah_hlen = (ah->hdrlen + 2) << 2; + nexthdr = ah->nexthdr; + ah_hlen = (ah->hdrlen + 2) << 2; + cleared_hlen += ah_hlen; + + if (nexthdr == NEXTHDR_DEST) { + struct ipv6_opt_hdr *dsthdr = (struct ipv6_opt_hdr*)(skb->data + ah_hlen); + cleared_hlen += ipv6_optlen(dsthdr); + } if (ah_hlen != XFRM_ALIGN8(sizeof(struct ipv6_auth_hdr) + ahp->icv_full_len) && ah_hlen != XFRM_ALIGN8(sizeof(struct ipv6_auth_hdr) + ahp->icv_trunc_len)) @@ -166,12 +300,16 @@ pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) goto out; - tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + tmp_hdr = kmalloc(cleared_hlen, GFP_ATOMIC); if (!tmp_hdr) goto out; - memcpy(tmp_hdr, skb->nh.raw, hdr_len); - ah = (struct ipv6_auth_hdr*)skb->data; - iph = skb->nh.ipv6h; + memcpy(tmp_hdr, 
skb->nh.raw, cleared_hlen); + ipv6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); + iph->priority = 0; + iph->flow_lbl[0] = 0; + iph->flow_lbl[1] = 0; + iph->flow_lbl[2] = 0; + iph->hop_limit = 0; { u8 auth_data[ahp->icv_trunc_len]; @@ -187,9 +325,15 @@ } } - nexthdr = ((struct ipv6hdr*)tmp_hdr)->nexthdr = ah->nexthdr; - skb->nh.raw = skb_pull(skb, (ah->hdrlen+2)<<2); + skb->nh.raw = skb_pull(skb, ah_hlen); memcpy(skb->nh.raw, tmp_hdr, hdr_len); + if (nexthdr == NEXTHDR_DEST) { + memcpy(skb->nh.raw + hdr_len, + tmp_hdr + hdr_len + ah_hlen, + cleared_hlen - hdr_len - ah_hlen); + } + prevhdr = (u8*)(skb->nh.raw + nh_offset); + *prevhdr = nexthdr; skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); skb_pull(skb, hdr_len); skb->h.raw = skb->data; Index: linux25/net/ipv6/ipv6_syms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipv6_syms.c,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.4.1 diff -u -r1.1.1.13 -r1.1.1.13.4.1 --- linux25/net/ipv6/ipv6_syms.c 10 Jun 2003 13:21:55 -0000 1.1.1.13 +++ linux25/net/ipv6/ipv6_syms.c 13 Jun 2003 14:43:37 -0000 1.1.1.13.4.1 @@ -37,7 +37,6 @@ EXPORT_SYMBOL(in6_dev_finish_destroy); EXPORT_SYMBOL(ip6_find_1stfragopt); EXPORT_SYMBOL(xfrm6_rcv); -EXPORT_SYMBOL(xfrm6_clear_mutable_options); EXPORT_SYMBOL(rt6_lookup); EXPORT_SYMBOL(fl6_sock_lookup); EXPORT_SYMBOL(ipv6_ext_hdr); Index: linux25/net/ipv6/xfrm6_input.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/xfrm6_input.c,v retrieving revision 1.1.1.7 retrieving revision 1.1.1.7.22.1 diff -u -r1.1.1.7 -r1.1.1.7.22.1 --- linux25/net/ipv6/xfrm6_input.c 6 May 2003 12:43:55 -0000 1.1.1.7 +++ linux25/net/ipv6/xfrm6_input.c 13 Jun 2003 14:43:37 -0000 1.1.1.7.22.1 @@ -15,114 +15,6 @@ static kmem_cache_t *secpath_cachep; -static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) -{ - u8 *opt = 
(u8 *)opthdr; - int len = ipv6_optlen(opthdr); - int off = 0; - int optlen = 0; - - off += 2; - len -= 2; - - while (len > 0) { - - switch (opt[off]) { - - case IPV6_TLV_PAD0: - optlen = 1; - break; - default: - if (len < 2) - goto bad; - optlen = opt[off+1]+2; - if (len < optlen) - goto bad; - if (opt[off] & 0x20) - memset(&opt[off+2], 0, opt[off+1]); - break; - } - - off += optlen; - len -= optlen; - } - if (len == 0) - return 1; - -bad: - return 0; -} - -int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) -{ - u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - unsigned int packet_len = skb->tail - skb->nh.raw; - u8 nexthdr = skb->nh.ipv6h->nexthdr; - u8 nextnexthdr = 0; - - *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; - - while (offset + 1 <= packet_len) { - - switch (nexthdr) { - - case NEXTHDR_HOP: - *nh_offset = offset; - offset += ipv6_optlen(exthdr); - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun hopopts\n"); - return 0; - } - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_ROUTING: - *nh_offset = offset; - offset += ipv6_optlen(exthdr); - ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_DEST: - *nh_offset = offset; - offset += ipv6_optlen(exthdr); - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun destopt\n"); - return 0; - } - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_AUTH: - if (dir == XFRM_POLICY_OUT) { - memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, - (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); - } - if (exthdr->nexthdr == NEXTHDR_DEST) { - offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 
2) << 2; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - nextnexthdr = exthdr->nexthdr; - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun destopt\n"); - return 0; - } - } - return nexthdr; - default : - return nexthdr; - } - } - - return nexthdr; -} - int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp) { struct sk_buff *skb = *pskb; @@ -132,26 +24,12 @@ struct xfrm_state *x; int xfrm_nr = 0; int decaps = 0; - struct ipv6hdr *hdr = skb->nh.ipv6h; - unsigned char *tmp_hdr = NULL; - int hdr_len = 0; - u16 nh_offset = 0; int nexthdr = 0; + u8 *prevhdr = NULL; - nh_offset = ((unsigned char*)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; - hdr_len = sizeof(struct ipv6hdr); - - tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); - if (!tmp_hdr) - goto drop; - memcpy(tmp_hdr, skb->nh.raw, hdr_len); - - nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); - hdr->priority = 0; - hdr->flow_lbl[0] = 0; - hdr->flow_lbl[1] = 0; - hdr->flow_lbl[2] = 0; - hdr->hop_limit = 0; + ip6_find_1stfragopt(skb, &prevhdr); + nexthdr = *prevhdr; + *nhoffp = prevhdr - skb->nh.raw; if ((err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0) goto drop; @@ -204,12 +82,6 @@ goto drop; } while (!err); - if (!decaps) { - memcpy(skb->nh.raw, tmp_hdr, hdr_len); - skb->nh.raw[nh_offset] = nexthdr; - skb->nh.ipv6h->payload_len = htons(hdr_len + skb->len - sizeof(struct ipv6hdr)); - } - /* Allocate new secpath or COW existing one. */ if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) { kmem_cache_t *pool = skb->sp ? 
skb->sp->pool : secpath_cachep; @@ -243,7 +115,6 @@ netif_rx(skb); return -1; } else { - *nhoffp = nh_offset; return 1; } @@ -251,7 +122,6 @@ spin_unlock(&x->lock); xfrm_state_put(x); drop: - if (tmp_hdr) kfree(tmp_hdr); while (--xfrm_nr >= 0) xfrm_state_put(xfrm_vec[xfrm_nr].xvec); kfree_skb(skb); From davem@redhat.com Sun Jun 15 06:19:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 06:19:55 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FDJj2x019227 for ; Sun, 15 Jun 2003 06:19:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id GAA29416; Sun, 15 Jun 2003 06:15:00 -0700 Date: Sun, 15 Jun 2003 06:15:00 -0700 (PDT) Message-Id: <20030615.061500.78718230.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] fix ipv6 header handling of AH input. From: "David S. Miller" In-Reply-To: <20030615220632.0701a5cd.kazunori@miyazawa.org> References: <20030615220632.0701a5cd.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3258 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kazunori Miyazawa Date: Sun, 15 Jun 2003 22:06:32 +0900 This patch fixes ipv6 header handling of ah input and moves the routine to clear mutable options. It reduces unnecessary header clearing when a packet has only ESP. This patch for linux-2.5.70 + CS1.1307 Thank you very much, I have applied your patch. 
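For readers following the patch above: the removed zero_out_mutable_opts() helper walks the TLV-encoded options of a hop-by-hop or destination options header and zeroes the data of every option whose type has the 0x20 ("may change en route") bit set, so that AH can compute its ICV over a predictable header. The walk can be sketched in userspace C roughly as below. This is only an illustration under stated assumptions, not the kernel code: the name zero_mutable_opts, the buflen parameter and the two TLV_* macros are made up for this sketch, and the extra bounds check against buflen is not in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD0        0x00  /* Pad1: a single byte, no length field */
#define TLV_MUTABLE_BIT 0x20  /* option data may change en route */

/*
 * Hypothetical userspace sketch. hdr points at an options extension
 * header: hdr[0] is the next-header value, hdr[1] is the header length
 * in 8-byte units, not counting the first 8 bytes. Options start at
 * offset 2. Returns 1 if the option area parsed cleanly, 0 otherwise.
 */
static int zero_mutable_opts(uint8_t *hdr, size_t buflen)
{
    size_t len, off = 2;

    assert(hdr != NULL);
    if (buflen < 2)
        return 0;

    len = ((size_t)hdr[1] + 1) * 8;   /* same arithmetic as ipv6_optlen() */
    if (len > buflen)
        return 0;

    while (off < len) {
        size_t optlen;

        if (hdr[off] == TLV_PAD0) {
            optlen = 1;
        } else {
            if (len - off < 2)        /* no room for type + length */
                return 0;
            optlen = (size_t)hdr[off + 1] + 2;
            if (len - off < optlen)   /* option overruns the header */
                return 0;
            if (hdr[off] & TLV_MUTABLE_BIT)
                memset(&hdr[off + 2], 0, hdr[off + 1]);
        }
        off += optlen;
    }
    return 1;
}
```

As in the quoted helper, a return value of 0 means the option area was malformed and the caller should drop the packet rather than authenticate it.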
From hdierks@us.ibm.com Sun Jun 15 07:32:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 07:33:13 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FEWp2x020150 for ; Sun, 15 Jun 2003 07:32:59 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.56.224.151]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5FEWjpS130992; Sun, 15 Jun 2003 10:32:45 -0400 Received: from d01ml065.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by northrelay03.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5FEWh6e126318; Sun, 15 Jun 2003 10:32:43 -0400 Importance: Normal Sensitivity: Subject: Re: e1000 performance hack for ppc64 (Power4) To: Anton Blanchard Cc: "Feldman, Scott" , "David S. Miller" , haveblue@us.ibm.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, "Nancy J Milliner" , "Ricardo C Gonzalez" , "Brian Twichell" , netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: "Herman Dierks" Date: Sun, 15 Jun 2003 09:32:34 -0500 X-MIMETrack: Serialize by Router on D01ML065/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/15/2003 10:32:44 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3259 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hdierks@us.ibm.com Precedence: bulk X-list: netdev Anton, I think the option described below is intended to cause the adapter to "get off on a cache line boundary" so when it restarts the DMA it will be aligned. This is for cases when the adapter has to get off, for example due to FIFO full, etc. Some adapters would get off on any boundary and that then causes perf issues when the DMA is restarted. This is a good option, but I don't think it addresses what we need here as the host needs to ensure a DMA starts on a cache line.
Different adapter anyway, but I am just pointing out that even if e1000 had this it would not be the solution.

Anton Blanchard on 06/13/2003 07:03:42 PM
To: "Feldman, Scott"
cc: "David S. Miller" , haveblue@us.ltcfwd.linux.ibm.com, Herman Dierks/Austin/IBM@IBMUS, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, Nancy J Milliner/Austin/IBM@IBMUS, Ricardo C Gonzalez/Austin/IBM@ibmus, Brian Twichell/Austin/IBM@IBMUS, netdev@oss.sgi.com
Subject: Re: e1000 performance hack for ppc64 (Power4)

> I thought the answer was no, so I double checked with a couple of
> hardware guys, and the answer is still no.

Hi Scott,

That's a pity, the e100 docs on sourceforge show it can do what we want, it would be nice if e1000 had this feature too :)

4.2.2 Read Align

The Read Align feature is aimed to enhance performance in cache line oriented systems. Starting a PCI transaction in these systems on a non-cache line aligned address may result in low performance. To solve this performance problem, the controller can be configured to terminate Transmit DMA cycles on a cache line boundary, and start the next transaction on a cache line aligned address. This feature is enabled when the Read Align Enable bit is set in device Configure command (Section 6.4.2.3, "Configure (010b)"). If this bit is set, the device operates as follows:

* When the device is close to running out of resources on the Transmit DMA (in other words, the Transmit FIFO is almost full), it attempts to terminate the read transaction on the nearest cache line boundary when possible.
* When the arbitration counters feature is enabled (maximum Transmit DMA byte count value is set in configuration space), the device switches to other pending DMAs on cache line boundary only.
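To make the alignment point in this exchange concrete, here is a small userspace sketch of the arithmetic involved: whether a buffer starts on a cache-line boundary, how far it is from the next boundary, and how many cache lines a frame touches. It is an illustration only. The helper names are invented for this note, the line size is a parameter (128 bytes is assumed in the usage note, not taken from the thread), and nothing here is e1000 or e100 driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All helpers assume 'line' is a power of two (e.g. 32, 64, 128). */

/* Nonzero if addr sits exactly on a cache-line boundary. */
static int is_line_aligned(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (addr & (line - 1)) == 0;
}

/* Bytes from addr up to the next boundary; 0 if already aligned. */
static size_t bytes_to_next_line(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (line - (addr & (line - 1))) & (line - 1);
}

/* Number of distinct cache lines covered by [addr, addr + len). */
static size_t lines_touched(uintptr_t addr, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / line - addr / line + 1);
}
```

With an assumed 128-byte line, a 1514-byte frame starting at an aligned address such as 0x1000 covers 12 lines, while the same frame starting at 0x107E covers 13, and its first and last lines are only partially used. That partial first line is the case the thread is arguing about.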
From hdierks@us.ibm.com Sun Jun 15 07:40:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 07:41:07 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FEew2x020533 for ; Sun, 15 Jun 2003 07:40:58 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.56.224.151]) by e3.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5FEepE2167424; Sun, 15 Jun 2003 10:40:51 -0400 Received: from d01ml065.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by northrelay03.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5FEen6e101972; Sun, 15 Jun 2003 10:40:49 -0400 Importance: Normal Sensitivity: Subject: Re: e1000 performance hack for ppc64 (Power4) To: "David S. Miller" Cc: ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, "Nancy J Milliner" , "Ricardo C Gonzalez" , "Brian Twichell" , netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: "Herman Dierks" Date: Sun, 15 Jun 2003 09:40:41 -0500 X-MIMETrack: Serialize by Router on D01ML065/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/15/2003 10:40:49 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3260 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hdierks@us.ibm.com Precedence: bulk X-list: netdev Look folks, we run 40 to 48 GigE adapters in a p690 32 way on AIX and they basically all run at full speed so let me see you try that on most of these other boxes you are talking about. Same adapter, same hardware logic. I have also seen what many of these other boxes you talk about do when data or structures are not aligned on 64 bit boundaries. The PPC HW does not have those 64bit alignment issues. So, each machine has some warts. Have yet to see a perfect one.
If you want a lot of PCI adapters on a box, it takes a number of bridge chips and other IO links to do that. Memory controllers like to deal with cache lines. For larger packets, like jumbo frames or large send (TSO), the few added DMAs are not an issue, as the packets are so large that the DMAs soon get aligned. With TSO being the default, the small packet case becomes less important anyway. It's more an issue on 2.4 where TSO is not provided. We also want this to run well if someone does not want to use TSO. It's only the MTU 1500 case with non-TSO that we are discussing here so copying a few bytes is really not a big deal as the data is already in cache from copying into kernel. If it lets the adapter run at speed, that's what customers want and what we need. Granted, if the HW could deal with this we would not have to, but that's not the case today so I want to spend a few CPU cycles to get best performance. Again, if this is not done on other platforms, I don't understand why you care. If we have to do this for PPC port, fine. I have not seen any of you suggest a better solution that works and will not be a worse hack to TCP or other code. Anton tried various other ideas before we fell back to doing it the same way we did this in AIX. This code is very localized and is only used by platforms that need it. Thus I don't see the big issue here. Herman "David S. Miller" on 06/14/2003 01:08:50 AM To: ltd@cisco.com cc: anton@samba.org, haveblue@us.ltcfwd.linux.ibm.com, Herman Dierks/Austin/IBM@IBMUS, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, Nancy J Milliner/Austin/IBM@IBMUS, Ricardo C Gonzalez/Austin/IBM@ibmus, Brian Twichell/Austin/IBM@IBMUS, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: Lincoln Dale Date: Sat, 14 Jun 2003 15:52:35 +1000 can we have the TCP retransmit side take a performance hit if it needs to realign buffers?
You don't understand, the person who mangles the packet must make the copy, not the person not doing the packet modifications. for a "high performance app" requiring gigabit-type speeds, ...we probably won't be using ppc64 and e1000 cards, yes, I agree :-) Anton, go to the local computer store and pick up some tg3 cards or a bunch of Taiwan specials :-) From davem@redhat.com Sun Jun 15 07:48:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 07:48:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FEmI2x020860 for ; Sun, 15 Jun 2003 07:48:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA29590; Sun, 15 Jun 2003 07:44:06 -0700 Date: Sun, 15 Jun 2003 07:44:05 -0700 (PDT) Message-Id: <20030615.074405.39168044.davem@redhat.com> To: hdierks@us.ibm.com Cc: ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, milliner@us.ibm.com, ricardoz@us.ibm.com, twichell@us.ibm.com, netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3261 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Herman Dierks" Date: Sun, 15 Jun 2003 09:40:41 -0500 With TSO being the default, the small packet case becomes less important anyway. This is a very narrow and unrealistic view of the situation. Every third packet your system will process for any connection will be an ACK, a small packet. 
Most database and web transactions happen using small packets for the transaction request. Look, if you're gonna sit here and just rant justifying this bogus behavior of your hardware, it is likely to go in one ear and out the other. Nobody wants to hear excuses. :) The fact is, this system handles sub-cacheline reads inefficiently even if a sequence of transactions is consecutive and to the same cache line and no coherency transactions occur to that cache line. That is dumb, and there is no arguing around this. You would be sensible to realize this, and accept it whilst others try to help you find a solution for your problem. From hch@lst.de Sun Jun 15 11:00:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 11:00:24 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FI0D2x022758 for ; Sun, 15 Jun 2003 11:00:15 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h5FI0BDC004937 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Sun, 15 Jun 2003 20:00:11 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h5FI0B3G004935 for netdev@oss.sgi.com; Sun, 15 Jun 2003 20:00:11 +0200 Date: Sun, 15 Jun 2003 20:00:11 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] switch iph5526_probe to initcalls Message-ID: <20030615180011.GA4922@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 3262 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev It's a pci driver so it has no business in Space.c p.s.
it doesn't compile with or without the patch due to scsi issues.. --- 1.21/drivers/net/Space.c Mon Jun 9 18:52:43 2003 +++ edited/drivers/net/Space.c Sat Jun 14 21:32:48 2003 @@ -398,24 +398,6 @@ return -ENODEV; } -#ifdef CONFIG_NET_FC -static int fcif_probe(struct net_device *dev) -{ - if (dev->base_addr == -1) - return 1; - - if (1 -#ifdef CONFIG_IPHASE5526 - && iph5526_probe(dev) -#endif - && 1 ) { - return 1; /* -ENODEV or -EAGAIN would be more accurate. */ - } - return 0; -} -#endif /* CONFIG_NET_FC */ - - #ifdef CONFIG_ETHERTAP static struct net_device tap0_dev = { .name = "tap0", @@ -588,22 +570,6 @@ #define NEXT_DEV (&tr0_dev) #endif - -#ifdef CONFIG_NET_FC -static struct net_device fc1_dev = { - .name = "fc1", - .next = NEXT_DEV, - .init = fcif_probe -}; -static struct net_device fc0_dev = { - .name = "fc0", - .next = &fc1_dev, - .init = fcif_probe -}; -#undef NEXT_DEV -#define NEXT_DEV (&fc0_dev) -#endif - #ifdef CONFIG_SBNI static struct net_device sbni7_dev = { --- 1.25/drivers/net/fc/iph5526.c Tue Jun 10 00:49:03 2003 +++ edited/drivers/net/fc/iph5526.c Sat Jun 14 21:32:48 2003 @@ -239,19 +239,7 @@ static int __init iph5526_probe_pci(struct net_device *dev) { -#ifdef MODULE struct fc_info *fi = (struct fc_info *)dev->priv; -#else - struct fc_info *fi = fc[count]; - static int count; - int err; - - if (!fi) - return -ENODEV; - - fc_setup(dev); - count++; -#endif fi->dev = dev; dev->base_addr = fi->base_addr; dev->irq = fi->irq; @@ -4479,8 +4467,6 @@ return buf; } -#ifdef MODULE - #define NAMELEN 8 /* # of chars for storing dev->name */ static struct net_device *dev_fc[MAX_FC_CARDS]; @@ -4491,7 +4477,7 @@ static int scsi_registered; -int init_module(void) +static int __init iph5526_init(void) { int i = 0; @@ -4530,7 +4516,7 @@ return 0; } -void cleanup_module(void) +static void __exit iph5526_exit(void) { int i = 0; while(fc[i] != NULL) { @@ -4549,7 +4535,9 @@ if (scsi_registered == TRUE) scsi_unregister_host(&driver_template); } -#endif /* MODULE */ + 
+module_init(iph5526_init); +module_exit(iph5526_exit); void clean_up_memory(struct fc_info *fi) { From davem@redhat.com Sun Jun 15 11:02:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 11:02:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FI2I2x022924 for ; Sun, 15 Jun 2003 11:02:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA30208; Sun, 15 Jun 2003 10:58:11 -0700 Date: Sun, 15 Jun 2003 10:58:11 -0700 (PDT) Message-Id: <20030615.105811.55734399.davem@redhat.com> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch iph5526_probe to initcalls From: "David S. Miller" In-Reply-To: <20030615180011.GA4922@lst.de> References: <20030615180011.GA4922@lst.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3263 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Christoph Hellwig Date: Sun, 15 Jun 2003 20:00:11 +0200 It's a pci driver so it has no business in Space.c p.s. it doesn't compile with or without the patch due to scsi issues.. Can we put this patch on hold until that's resolved? Thanks. 
From hch@lst.de Sun Jun 15 11:05:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 11:05:24 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FI5H2x023427 for ; Sun, 15 Jun 2003 11:05:18 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h5FI5EDC005021 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 15 Jun 2003 20:05:14 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h5FI5EFn005019; Sun, 15 Jun 2003 20:05:14 +0200 Date: Sun, 15 Jun 2003 20:05:14 +0200 From: Christoph Hellwig To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch iph5526_probe to initcalls Message-ID: <20030615180514.GA5006@lst.de> References: <20030615180011.GA4922@lst.de> <20030615.105811.55734399.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030615.105811.55734399.davem@redhat.com> User-Agent: Mutt/1.3.28i X-Spam-Score: -5 () EMAIL_ATTRIBUTION,IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 3264 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev On Sun, Jun 15, 2003 at 10:58:11AM -0700, David S. Miller wrote: > From: Christoph Hellwig > Date: Sun, 15 Jun 2003 20:00:11 +0200 > > It's a pci driver so it has no business in Space.c > > p.s. it doesn't compile with or without the patch due to scsi issues.. > > Can we put this patch on hold until that's resolved? Well, we could. It's just that I doubt anyone will fix it anytime soon, the hardware doesn't seem to be widely used and it's hard to touch the driver without a barfbag nearby. 
(and you also applied the alloc_fcdev changes for it :)) From davem@redhat.com Sun Jun 15 11:08:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 11:08:08 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FI842x023735 for ; Sun, 15 Jun 2003 11:08:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id LAA30250; Sun, 15 Jun 2003 11:03:58 -0700 Date: Sun, 15 Jun 2003 11:03:58 -0700 (PDT) Message-Id: <20030615.110358.10308169.davem@redhat.com> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch iph5526_probe to initcalls From: "David S. Miller" In-Reply-To: <20030615180514.GA5006@lst.de> References: <20030615180011.GA4922@lst.de> <20030615.105811.55734399.davem@redhat.com> <20030615180514.GA5006@lst.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3265 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Christoph Hellwig Date: Sun, 15 Jun 2003 20:05:14 +0200 On Sun, Jun 15, 2003 at 10:58:11AM -0700, David S. Miller wrote: > p.s. it doesn't compile with or without the patch due to scsi issues.. > > Can we put this patch on hold until that's resolved? Well, we could. It's just that I doubt anyone will fix it anytime soon, the hardware doesn't seem to be widely used and it's hard to touch the driver without a barfbag nearby. Ugh, ok. At least it's impossible for you to break the tree with this change. 
:) From hch@lst.de Sun Jun 15 11:10:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 11:10:36 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FIAW2x024098 for ; Sun, 15 Jun 2003 11:10:32 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h5FIAUDC005142 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 15 Jun 2003 20:10:30 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h5FIAUam005140; Sun, 15 Jun 2003 20:10:30 +0200 Date: Sun, 15 Jun 2003 20:10:30 +0200 From: Christoph Hellwig To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch iph5526_probe to initcalls Message-ID: <20030615181030.GA5127@lst.de> References: <20030615180011.GA4922@lst.de> <20030615.105811.55734399.davem@redhat.com> <20030615180514.GA5006@lst.de> <20030615.110358.10308169.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030615.110358.10308169.davem@redhat.com> User-Agent: Mutt/1.3.28i X-Spam-Score: -4.5 () EMAIL_ATTRIBUTION,IN_REP_TO,REFERENCES,REPLY_WITH_QUOTES,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 3266 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev On Sun, Jun 15, 2003 at 11:03:58AM -0700, David S. Miller wrote: > At least it's impossible for you to break the tree > with this change. :) Take care, I could have smuggled something evil into Space.c.. 
From khc@pm.waw.pl Sun Jun 15 13:33:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 Jun 2003 13:33:24 -0700 (PDT) Received: from hq.pm.waw.pl (hq.pm.waw.pl [195.116.170.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5FKX72x026175 for ; Sun, 15 Jun 2003 13:33:07 -0700 Received: by hq.pm.waw.pl (Postfix, from userid 10) id 1187C3192; Sun, 15 Jun 2003 21:54:56 +0200 (CEST) Received: by defiant.pm.waw.pl (Postfix, from userid 500) id 9D15D3C7A0; Sun, 15 Jun 2003 21:32:38 +0200 (CEST) To: "David S. Miller" Cc: shemminger@osdl.org, greg@kroah.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] network hotplug via class_device/kobject References: <20030613164119.15209934.shemminger@osdl.org> <20030615.005055.55726223.davem@redhat.com> From: Krzysztof Halasa Date: 15 Jun 2003 21:32:38 +0200 In-Reply-To: <20030615.005055.55726223.davem@redhat.com> Message-ID: Lines: 20 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3267 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: khc@pm.waw.pl Precedence: bulk X-list: netdev "David S. Miller" writes: > Paranoid about some driver doing something like: > rtnl_lock(); register_netdevice(); unregister_netdevice(); > rtnl_unlock() BOOM > > These sorts of turds exist at least in two places: > > 1) drivers/net/wan/comx.c > 2) drivers/net/wan/hdlc_fr.c > > But it is pretty clear that these two drivers have been > tried by nobody in recent years. They both call into > {un,}register_netdevice without the RTNL semaphore held. Not sure about 1), but 2) calls (un)register_netdevice() with rtnl_lock, from ioctl. 
-- Krzysztof Halasa Network Administrator From vnuorval@tcs.hut.fi Mon Jun 16 02:23:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 02:23:11 -0700 (PDT) Received: from mail.tcs.hut.fi (mail.tcs.hut.fi [130.233.215.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5G9Mx2x022748 for ; Mon, 16 Jun 2003 02:23:00 -0700 Received: from rhea.tcs.hut.fi (rhea.tcs.hut.fi [130.233.215.147]) by mail.tcs.hut.fi (Postfix) with ESMTP id 247448000A3; Mon, 16 Jun 2003 12:22:59 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5G9Mx5L015593; Mon, 16 Jun 2003 12:22:59 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h5G9MvN3015589; Mon, 16 Jun 2003 12:22:57 +0300 Date: Mon, 16 Jun 2003 12:22:57 +0300 (EEST) From: Ville Nuorvala To: "David S. Miller" Cc: yoshfuji@linux-ipv6.org, Subject: Re: [patch] IPV6: Refcount leaks in udpv6_connect() In-Reply-To: <20030615.002656.35017399.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-377318441-1702587990-1055755024=:15528" Content-ID: X-archive-position: 3268 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---377318441-1702587990-1055755024=:15528 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-ID: On Sun, 15 Jun 2003, David S. Miller wrote: > Your patch does not apply without rejects. > Please regenerate. Hmm, odd. Perhaps the inlining of the patch mangled it somehow. Anyhow, here's a bk treediff against CS 1.1318 as an attachment. 
Hope this applies :) Thanks, Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 ---377318441-1702587990-1055755024=:15528 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="udpv6_connect.patch" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: ATTACHMENT; FILENAME="udpv6_connect.patch" ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUuT0xEL25ldC9pcHY2L3Vk cC5jIGxpbnV4LTIuNS9uZXQvaXB2Ni91ZHAuYw0KLS0tIGxpbnV4LTIuNS5P TEQvbmV0L2lwdjYvdWRwLmMJTW9uIEp1biAxNiAxMjowMjowOCAyMDAzDQor KysgbGludXgtMi41L25ldC9pcHY2L3VkcC5jCVN1biBKdW4gMTUgMjM6MTA6 MDMgMjAwMw0KQEAgLTI5OSw5ICsyOTksMTAgQEANCiAJaWYgKGFkZHJfdHlw ZSA9PSBJUFY2X0FERFJfTUFQUEVEKSB7DQogCQlzdHJ1Y3Qgc29ja2FkZHJf aW4gc2luOw0KIA0KLQkJaWYgKF9faXB2Nl9vbmx5X3NvY2soc2spKQ0KLQkJ CXJldHVybiAtRU5FVFVOUkVBQ0g7DQotDQorCQlpZiAoX19pcHY2X29ubHlf c29jayhzaykpIHsNCisJCQllcnIgPSAtRU5FVFVOUkVBQ0g7DQorCQkJZ290 byBvdXQ7DQorCQl9DQogCQlzaW4uc2luX2ZhbWlseSA9IEFGX0lORVQ7DQog CQlzaW4uc2luX2FkZHIuc19hZGRyID0gZGFkZHItPnM2X2FkZHIzMlszXTsN CiAJCXNpbi5zaW5fcG9ydCA9IHVzaW4tPnNpbjZfcG9ydDsNCkBAIC0zMDks OCArMzEwLDggQEANCiAJCWVyciA9IHVkcF9jb25uZWN0KHNrLCAoc3RydWN0 IHNvY2thZGRyKikgJnNpbiwgc2l6ZW9mKHNpbikpOw0KIA0KIGlwdjRfY29u bmVjdGVkOg0KLQkJaWYgKGVyciA8IDApDQotCQkJcmV0dXJuIGVycjsNCisJ CWlmIChlcnIpDQorCQkJZ290byBvdXQ7DQogCQkNCiAJCWlwdjZfYWRkcl9z ZXQoJm5wLT5kYWRkciwgMCwgMCwgaHRvbmwoMHgwMDAwZmZmZiksIGluZXQt PmRhZGRyKTsNCiANCkBAIC0zMjMsNyArMzI0LDcgQEANCiAJCQlpcHY2X2Fk ZHJfc2V0KCZucC0+cmN2X3NhZGRyLCAwLCAwLCBodG9ubCgweDAwMDBmZmZm KSwNCiAJCQkJICAgICAgaW5ldC0+cmN2X3NhZGRyKTsNCiAJCX0NCi0JCXJl dHVybiAwOw0KKwkJZ290byBvdXQ7DQogCX0NCiANCiAJaWYgKGFkZHJfdHlw ZSZJUFY2X0FERFJfTElOS0xPQ0FMKSB7DQpAQCAtMzMxLDggKzMzMiw4IEBA DQogCQkgICAgdXNpbi0+c2luNl9zY29wZV9pZCkgew0KIAkJCWlmIChzay0+ c2tfYm91bmRfZGV2X2lmICYmDQogCQkJICAgIHNrLT5za19ib3VuZF9kZXZf 
aWYgIT0gdXNpbi0+c2luNl9zY29wZV9pZCkgew0KLQkJCQlmbDZfc29ja19y ZWxlYXNlKGZsb3dsYWJlbCk7DQotCQkJCXJldHVybiAtRUlOVkFMOw0KKwkJ CQllcnIgPSAtRUlOVkFMOw0KKwkJCQlnb3RvIG91dDsNCiAJCQl9DQogCQkJ c2stPnNrX2JvdW5kX2Rldl9pZiA9IHVzaW4tPnNpbjZfc2NvcGVfaWQ7DQog CQkJaWYgKCFzay0+c2tfYm91bmRfZGV2X2lmICYmDQpAQCAtMzQxLDggKzM0 MiwxMCBAQA0KIAkJfQ0KIA0KIAkJLyogQ29ubmVjdCB0byBsaW5rLWxvY2Fs IGFkZHJlc3MgcmVxdWlyZXMgYW4gaW50ZXJmYWNlICovDQotCQlpZiAoIXNr LT5za19ib3VuZF9kZXZfaWYpDQotCQkJcmV0dXJuIC1FSU5WQUw7DQorCQlp ZiAoIXNrLT5za19ib3VuZF9kZXZfaWYpIHsNCisJCQllcnIgPSAtRUlOVkFM Ow0KKwkJCWdvdG8gb3V0Ow0KKwkJfQ0KIAl9DQogDQogCWlwdjZfYWRkcl9j b3B5KCZucC0+ZGFkZHIsIGRhZGRyKTsNCkBAIC0zNzksMzEgKzM4MiwzMyBA QA0KIA0KIAlpZiAoKGVyciA9IGRzdC0+ZXJyb3IpICE9IDApIHsNCiAJCWRz dF9yZWxlYXNlKGRzdCk7DQotCQlmbDZfc29ja19yZWxlYXNlKGZsb3dsYWJl bCk7DQotCQlyZXR1cm4gZXJyOw0KKwkJZ290byBvdXQ7DQogCX0NCiANCiAJ LyogZ2V0IHRoZSBzb3VyY2UgYWRkcmVzcyB1c2VkIGluIHRoZSBhcHByb3By aWF0ZSBkZXZpY2UgKi8NCiANCiAJZXJyID0gaXB2Nl9nZXRfc2FkZHIoZHN0 LCBkYWRkciwgJmZsLmZsNl9zcmMpOw0KIA0KLQlpZiAoZXJyID09IDApIHsN Ci0JCWlmIChpcHY2X2FkZHJfYW55KCZucC0+c2FkZHIpKQ0KLQkJCWlwdjZf YWRkcl9jb3B5KCZucC0+c2FkZHIsICZmbC5mbDZfc3JjKTsNCi0NCi0JCWlm IChpcHY2X2FkZHJfYW55KCZucC0+cmN2X3NhZGRyKSkgew0KLQkJCWlwdjZf YWRkcl9jb3B5KCZucC0+cmN2X3NhZGRyLCAmZmwuZmw2X3NyYyk7DQotCQkJ aW5ldC0+cmN2X3NhZGRyID0gTE9PUEJBQ0s0X0lQVjY7DQotCQl9DQorCWlm IChlcnIpIHsNCisJCWRzdF9yZWxlYXNlKGRzdCk7DQorCQlnb3RvIG91dDsN CisJfQ0KIA0KLQkJaXA2X2RzdF9zdG9yZShzaywgZHN0LA0KLQkJCSAgICAg ICFpcHY2X2FkZHJfY21wKCZmbC5mbDZfZHN0LCAmbnAtPmRhZGRyKSA/DQot CQkJICAgICAgJm5wLT5kYWRkciA6IE5VTEwpOw0KKwlpZiAoaXB2Nl9hZGRy X2FueSgmbnAtPnNhZGRyKSkNCisJCWlwdjZfYWRkcl9jb3B5KCZucC0+c2Fk ZHIsICZmbC5mbDZfc3JjKTsNCiANCi0JCXNrLT5za19zdGF0ZSA9IFRDUF9F U1RBQkxJU0hFRDsNCisJaWYgKGlwdjZfYWRkcl9hbnkoJm5wLT5yY3Zfc2Fk ZHIpKSB7DQorCQlpcHY2X2FkZHJfY29weSgmbnAtPnJjdl9zYWRkciwgJmZs LmZsNl9zcmMpOw0KKwkJaW5ldC0+cmN2X3NhZGRyID0gTE9PUEJBQ0s0X0lQ VjY7DQogCX0NCi0JZmw2X3NvY2tfcmVsZWFzZShmbG93bGFiZWwpOw0KIA0K 
KwlpcDZfZHN0X3N0b3JlKHNrLCBkc3QsDQorCQkgICAgICAhaXB2Nl9hZGRy X2NtcCgmZmwuZmw2X2RzdCwgJm5wLT5kYWRkcikgPw0KKwkJICAgICAgJm5w LT5kYWRkciA6IE5VTEwpOw0KKw0KKwlzay0+c2tfc3RhdGUgPSBUQ1BfRVNU QUJMSVNIRUQ7DQorb3V0Og0KKwlmbDZfc29ja19yZWxlYXNlKGZsb3dsYWJl bCk7DQogCXJldHVybiBlcnI7DQogfQ0KIA0K ---377318441-1702587990-1055755024=:15528-- From davem@redhat.com Mon Jun 16 05:07:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 05:07:35 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GC7P2x029383 for ; Mon, 16 Jun 2003 05:07:25 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA31798; Mon, 16 Jun 2003 05:02:08 -0700 Date: Mon, 16 Jun 2003 05:02:07 -0700 (PDT) Message-Id: <20030616.050207.104061458.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [patch] IPV6: Refcount leaks in udpv6_connect() From: "David S. Miller" In-Reply-To: References: <20030615.002656.35017399.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3269 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ville Nuorvala Date: Mon, 16 Jun 2003 12:22:57 +0300 (EEST) Hmm, odd. Perhaps the inlining of the patch mangled it somehow. Anyhow, here's a bk treediff against CS 1.1318 as an attachment. Thanks, I've applied your fix. 
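The attachment in Ville's message is base64-encoded, so the shape of the fix is not visible inline. Going by the subject line (refcount leaks in udpv6_connect()), it appears to follow the usual kernel idiom of replacing early returns with a jump to a single exit label that drops the reference taken at the top of the function. The sketch below shows that idiom in plain userspace C. Everything in it (struct label, label_get, label_put, connect_sketch and its parameters) is hypothetical and exists only to illustrate the pattern; it is not the applied patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a flow label. */
struct label {
    int refcnt;
};

static struct label *label_get(struct label *l)
{
    l->refcnt++;
    return l;
}

static void label_put(struct label *l)
{
    l->refcnt--;
}

/*
 * Every failure path sets err and jumps to 'out', so the reference
 * taken on entry is dropped exactly once however the function exits.
 * A bare 'return -EINVAL;' in the middle would leak it.
 */
static int connect_sketch(struct label *shared, int v6only, int bound_if)
{
    struct label *l;
    int err = 0;

    assert(shared != NULL);
    l = label_get(shared);

    if (v6only) {
        err = -ENETUNREACH;
        goto out;
    }
    if (!bound_if) {
        err = -EINVAL;
        goto out;
    }

    /* ... the real work of the success path would go here ... */

out:
    label_put(l);
    return err;
}
```

The design point is that cleanup lives in one place: adding a new error check later only needs "err = ...; goto out;" and cannot forget the release.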
From niv@us.ibm.com Mon Jun 16 09:19:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 09:19:22 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GGJC2x016771 for ; Mon, 16 Jun 2003 09:19:12 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GGIWll226908; Mon, 16 Jun 2003 12:18:33 -0400 Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GGIKFD019974; Mon, 16 Jun 2003 10:18:20 -0600 Message-ID: <3EEDEDA1.7010401@us.ibm.com> Date: Mon, 16 Jun 2003 09:17:37 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Herman Dierks CC: "David S. Miller" , ltd@cisco.com, anton@samba.org, haveblue@us.ibm.com, scott.feldman@intel.com, dwg@au1.ibm.com, linux-kernel@vger.kernel.org, Nancy J Milliner , Ricardo C Gonzalez , Brian Twichell , netdev@oss.sgi.com Subject: Re: e1000 performance hack for ppc64 (Power4) References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3270 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Herman Dierks wrote: > Look folks, we run 40 to 48 GigE adapters in a p690 32 way on AIX and > they basically all run at full speed so let me see you try that on most of > these other boxes you are talking about. Same adapter, same hardware > logic. FWIW, I think that's pretty cool. Not easy to do. :) > For larger packets, like jumbo frames or large send (TSO), the few added > DMAs are not an issue, as the packets are so large that the DMAs soon get > aligned.
With TSO being the default, the small packet case > becomes less important anyway. Its more an issue on 2.4 where TSO is not > provided. We also want this to run well if someone does not want to use > TSO. Slightly off-topic, but TSO being enabled and TSO being used are two different things, right? Ditto jumbo frames..How often is this the actual env in real world situations? I'm concerned that this is more typical of development testing/performance testing environments. > Its only the MTU 1500 case with non-TSO that we are discussing here so Which still is the pretty important case, I think.. > copying a few bytes is really not a big deal as the data is already in > cache from copying into kernel. If it lets the adapter run at speed, thats > what customers want and what we need. Yep. > Granted, if the HW could deal with this we would not have to, but thats not > the case today so I want to spend a few CPU cycles to get best performance. > Again, if this is not done on other platforms, I don't understand why you > care. Still would be nice to put in the best solution possible, i.e. address it for the broadest set of affected pieces and minimizing the impact.. > If we have to do this for PPC port, fine. I have not seen any of you Hope it doesn't have to come to that..It would be nice to see it in the mainline kernel. Regardless of platform, distro, etc.. these users are still people who are taking the time, the effort to adopt Linux and sometimes in environments and situations that are pretty critical. 
Change and innovation are difficult activities to engage in in some places, and anything we can do to make this a no-brainer solution for them, and to make their decisions shine, that's gotta be worth going the extra mile for :)

thanks,
Nivedita

From scott.feldman@intel.com Mon Jun 16 11:21:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 11:21:39 -0700 (PDT) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GILS2x006123 for ; Mon, 16 Jun 2003 11:21:28 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by caduceus.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5GIDHm27632 for ; Mon, 16 Jun 2003 18:13:20 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by talaria.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5GILoe21604 for ; Mon, 16 Jun 2003 18:21:51 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061611211625035 ; Mon, 16 Jun 2003 11:21:16 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 16 Jun 2003 11:21:16 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e1000 performance hack for ppc64 (Power4) Date: Mon, 16 Jun 2003 11:21:16 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e1000 performance hack for ppc64 (Power4) Thread-Index: AcMzSuDMkewgo8NbQ/eWd2USmzVnawA6Bt3g From: "Feldman, Scott" To: "Herman Dierks" , "David S.
Miller" Cc: , , , , , "Nancy J Milliner" , "Ricardo C Gonzalez" , "Brian Twichell" , X-OriginalArrivalTime: 16 Jun 2003 18:21:16.0611 (UTC) FILETIME=[13787130:01C33434] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5GILS2x006123 X-archive-position: 3271 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev

Herman wrote:

> It's only the MTU 1500 case with non-TSO that we are
> discussing here so copying a few bytes is really not a big
> deal as the data is already in cache from copying into
> kernel. If it lets the adapter run at speed, that's what
> customers want and what we need. Granted, if the HW could
> deal with this we would not have to, but that's not the case
> today so I want to spend a few CPU cycles to get best
> performance. Again, if this is not done on other platforms, I
> don't understand why you care.

I care because adding the arch-specific hack creates a maintenance issue for me.

-scott

From haveblue@us.ibm.com Mon Jun 16 11:32:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 11:32:27 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GIWC2x006527 for ; Mon, 16 Jun 2003 11:32:22 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GIW6pS226098; Mon, 16 Jun 2003 14:32:06 -0400 Received: from DYN318212.beaverton.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GIW1HO025788; Mon, 16 Jun 2003 14:32:02 -0400 Subject: RE: e1000 performance hack for ppc64 (Power4) From: Dave Hansen To: "Feldman, Scott" Cc: Herman Dierks , "David S.
Miller" , ltd@cisco.com, Anton Blanchard , dwg@au1.ibm.com, Linux Kernel Mailing List , Nancy J Milliner , Ricardo C Gonzalez , Brian Twichell , netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Organization: Message-Id: <1055788229.1609.4.camel@nighthawk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4 Date: 16 Jun 2003 11:30:29 -0700 Content-Transfer-Encoding: 7bit X-archive-position: 3272 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: haveblue@us.ibm.com Precedence: bulk X-list: netdev

On Mon, 2003-06-16 at 11:21, Feldman, Scott wrote:
> Herman wrote:
> > It's only the MTU 1500 case with non-TSO that we are
> > discussing here so copying a few bytes is really not a big
> > deal as the data is already in cache from copying into
> > kernel. If it lets the adapter run at speed, that's what
> > customers want and what we need. Granted, if the HW could
> > deal with this we would not have to, but that's not the case
> > today so I want to spend a few CPU cycles to get best
> > performance. Again, if this is not done on other platforms, I
> > don't understand why you care.
>
> I care because adding the arch-specific hack creates a maintenance issue
> for me.

Scott, would you be pleased if something were implemented outside the driver, in generic net code? Something that all the drivers could use, even if nothing but e1000 used it for now.
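
A rough user-space sketch of what such a generic helper might do, assuming the hack under discussion amounts to copying a misaligned transmit buffer into an aligned one before DMA. Nothing here comes from a posted patch: realign_for_dma, ASSUMED_DMA_ALIGN and the 128-byte figure are hypothetical, and a real version would work on an skb in kernel context rather than on a plain buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment for illustration; the thread does not state the real value. */
#define ASSUMED_DMA_ALIGN 128u

/*
 * Hypothetical helper: return a buffer holding the same len bytes as data
 * that starts on an ASSUMED_DMA_ALIGN boundary.
 *
 * - If data is already aligned, it is returned unchanged and *copied is 0.
 * - Otherwise a new aligned buffer is allocated, the bytes are copied into
 *   it, and *copied is set to 1 (the caller must free() that buffer).
 * - Returns NULL if the allocation fails.
 */
static void *realign_for_dma(const void *data, size_t len, int *copied)
{
    size_t padded;
    void *buf;

    *copied = 0;
    if ((uintptr_t)data % ASSUMED_DMA_ALIGN == 0)
        return (void *)data;

    /* aligned_alloc() expects a size that is a multiple of the alignment. */
    padded = (len + ASSUMED_DMA_ALIGN - 1) / ASSUMED_DMA_ALIGN * ASSUMED_DMA_ALIGN;
    if (padded == 0)
        padded = ASSUMED_DMA_ALIGN;

    buf = aligned_alloc(ASSUMED_DMA_ALIGN, padded);
    if (buf == NULL)
        return NULL;

    memcpy(buf, data, len);
    *copied = 1;
    return buf;
}
```

On a platform that needs no fix-up, the same entry point could simply return its argument, so callers would not need any arch-specific conditionals.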
-- Dave Hansen haveblue@us.ibm.com From scott.feldman@intel.com Mon Jun 16 11:56:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 11:56:11 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GIu62x006999 for ; Mon, 16 Jun 2003 11:56:07 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5GIpRR04857 for ; Mon, 16 Jun 2003 18:51:27 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by talaria.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5GIucM23130 for ; Mon, 16 Jun 2003 18:56:39 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061611560304854 ; Mon, 16 Jun 2003 11:56:03 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 16 Jun 2003 11:56:03 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e1000 performance hack for ppc64 (Power4) Date: Mon, 16 Jun 2003 11:56:03 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e1000 performance hack for ppc64 (Power4) Thread-Index: AcM0NZhCXHVlgXspR9CtCHUR7FqVhgAAfjpQ From: "Feldman, Scott" To: "Dave Hansen" Cc: "Herman Dierks" , "David S. 
Miller" , , "Anton Blanchard" , , "Linux Kernel Mailing List" , "Nancy J Milliner" , "Ricardo C Gonzalez" , "Brian Twichell" , X-OriginalArrivalTime: 16 Jun 2003 18:56:03.0535 (UTC) FILETIME=[EF5FC5F0:01C33438] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5GIu62x006999 X-archive-position: 3273 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > Scott, would you be pleased if something was implemented out > of the driver, in generic net code? Something that all the > drivers could use, even if nothing but e1000 used it for now. I suppose the driver could unconditionally call something like skb_realign_for_broken_hw, which is a nop on non-broken archs, but would it make more sense to not have the driver mess with the skb at all? -scott From greearb@candelatech.com Mon Jun 16 12:10:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 12:10:12 -0700 (PDT) Received: from grok.yi.org (IDENT:JohM8h0NmuhXzQ3hi5LlgvvNlovcIQqb@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GJA52x007549 for ; Mon, 16 Jun 2003 12:10:06 -0700 Received: from candelatech.com (IDENT:XSfiKNcOmnOjm/+A6mo9skvEnUMdPrK+@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5GJA1c29851 for ; Mon, 16 Jun 2003 12:10:02 -0700 Message-ID: <3EEE1609.1020809@candelatech.com> Date: Mon, 16 Jun 2003 12:10:01 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "'netdev@oss.sgi.com'" Subject: Routing issue in a strange configuration. 
Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3274 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev

I have a machine with eth1 at IP 10.3.1.4 and eth2 at 10.3.2.4. I am using source-based routing, and have the eth1 and eth2 ports connected to another machine which is acting as a router (the other machine has the 10.3.1.1 and 10.3.2.1 IP addresses).

I run ping with the -I option to bind it to eth1, but instead of sending the ARP and/or ICMP request to the gateway, it ARPs for the IP on the eth2 network.

The machines are running RedHat 9, and the problem exists in the default 2.4.20-8 kernel. I have not tried other kernels yet, so if you think this is a RedHat-only issue, I can try the stock kernel.

Here is the output from the machine that is attempting to send the traffic:

[root@localhost root]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:03:47:2B:39:CA
          inet addr:192.168.1.22  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1362 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1345 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:80208 (78.3 Kb)  TX bytes:1705206 (1.6 Mb)
          Interrupt:11 Base address:0xdf00 Memory:feafe000-feafe038

eth1      Link encap:Ethernet  HWaddr 00:03:47:2B:39:CB
          inet addr:10.3.1.4  Bcast:10.3.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:377 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:400
          RX bytes:596 (596.0 b)  TX bytes:16338 (15.9 Kb)
          Interrupt:11 Base address:0xde80 Memory:feafd000-feafd038

eth2      Link encap:Ethernet  HWaddr 00:50:C2:11:32:64
          inet addr:10.3.2.4  Bcast:10.3.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:3 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:11 Base address:0xbc00

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1418 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1938073 (1.8 Mb)  TX bytes:1938073 (1.8 Mb)

[root@localhost root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.3.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
10.3.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth2
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.1.5     0.0.0.0         UG    0      0        0 eth0

[root@localhost root]# ip ru
0:      from all lookup local
32758:  from 10.3.2.4 lookup 2
32759:  from 10.3.1.4 lookup 1
32766:  from all lookup main
32767:  from all lookup 253

[root@localhost root]# ip route show table 1
10.3.1.0/24 via 10.3.1.4 dev eth1
default via 10.3.1.1 dev eth1

[root@localhost root]# ip route show table 2
10.3.2.0/24 via 10.3.2.4 dev eth2
default via 10.3.2.1 dev eth2

[root@localhost root]# ping -I eth1 10.3.1.1
PING 10.3.1.1 (10.3.1.1) from 10.3.1.4 eth1: 56(84) bytes of data.
64 bytes from 10.3.1.1: icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from 10.3.1.1: icmp_seq=2 ttl=64 time=0.087 ms

# The other interface on the router machine (same machine as I just pinged above)
[root@localhost root]# ping -I eth1 10.3.2.1
PING 10.3.2.1 (10.3.2.1) from 10.3.1.4 eth1: 56(84) bytes of data.
From 10.3.1.4 icmp_seq=1 Destination Host Unreachable
From 10.3.1.4 icmp_seq=3 Destination Host Unreachable

# It is NOT using the default gateway for this traffic, but is instead
# just trying to ARP.
[root@localhost root]# tcpdump -n -i eth1
tcpdump: listening on eth1
11:56:19.788336 arp who-has 10.3.2.1 tell 10.3.1.4
11:56:20.788134 arp who-has 10.3.2.1 tell 10.3.1.4
11:56:21.788149 arp who-has 10.3.2.1 tell 10.3.1.4
11:56:22.788379 arp who-has 10.3.2.1 tell 10.3.1.4

--
Ben Greear
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

From andre@skjellin.no Mon Jun 16 12:51:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 12:51:17 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GJp12x008262 for ; Mon, 16 Jun 2003 12:51:02 -0700 Received: (qmail 27371 invoked by uid 1006); 16 Jun 2003 19:53:32 -0000 Received: from andre@skjellin.no by ns1 by uid 1003 with qmail-scanner-1.16 (sophie: 2.14/3.69. spamassassin: 2.55. Clear:. Processed in 0.024541 secs); 16 Jun 2003 19:53:32 -0000 Received: from slask.tomt.net (HELO slurv.ws.pasop.tomt.net) (andre@skjellin.no@217.8.136.222) by mail.skjellin.no with SMTP; 16 Jun 2003 19:53:32 -0000 Subject: IPv6 bugs introduced in 2.4.21 From: Andre Tomt To: netdev@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 Organization: Message-Id: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4- Date: 16 Jun 2003 21:50:48 +0200 Content-Transfer-Encoding: 8bit X-archive-position: 3275 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@skjellin.no Precedence: bulk X-list: netdev

[PS, I'm not subscribed to netdev, CC would be nice - is there any way to get subscribed?]

Hi

I mailed you guys a little while ago about the "unable to use SOMENETWORK::0000 as a nexthop gateway" bug in 2.4.21-pre/rc. It is still present in 2.4.21, rendering the "first" /128 of an arbitrary prefixlen unusable - :0000.
This is especially bad with /127 tunnels, rendering :0000 and :0001 unusable.

But! There is one more oddity in 2.4.21, not present in 2.4.20. The lower part of a /127 network is somehow, strangely, routed to lo - observe.

* ed-gw1 configuration
  running 2.4.21-s2 (internal vendor tree with megaraid, md and some netdrv updates, but pristine behaves the same)

* relevant routing info
  2001:730:3::1:2a/127 via :: dev aorta proto kernel metric 256 mtu 1480 advmss 1420

2a is Aorta's end of the /127, 2b is our end.

from a host behind some routers:

root@kvass ~ # traceroute6 2001:730:3::1:2a
traceroute to 2001:730:3::1:2a (2001:730:3::1:2a) from 2001:730:f:3::2, 30 hops max, 16 byte packets
 1  slask-fe-1.ws.pasop.tomt.net (2001:730:f:3::1)  2.46 ms  0.543 ms  0.455 ms
 2  casablanca-fe-1.co.pasop.tomt.net (2001:730:f:7::1)  0.941 ms  0.622 ms  0.567 ms
 3  ed-gw1-si0-0.sand-osl.skjellin.net (2001:730:f:ffff::)  6.56 ms  5.177 ms  5.087 ms
    ^-- hmmm. this was supposed to end at Aorta, not here.

root@kvass ~ # traceroute6 2001:730:3::1:2b
traceroute to 2001:730:3::1:2b (2001:730:3::1:2b) from 2001:730:f:3::2, 30 hops max, 16 byte packets
 1  slask-fe-1.ws.pasop.tomt.net (2001:730:f:3::1)  0.797 ms  0.533 ms  0.455 ms
 2  casablanca-fe-1.co.pasop.tomt.net (2001:730:f:7::1)  1.077 ms  0.623 ms  0.558 ms
 3  skjellin-gw1.no.ipv6.aorta.net (2001:730:3::1:2b)  5.068 ms  18.271 ms  5.182 ms
    ^-- alright!

from ed-gw1:

root@ed-gw1 ~ # traceroute6 2001:730:3::1:2a
traceroute to 2001:730:3::1:2a (2001:730:3::1:2a) from ::1, 30 hops max, 16 byte packets
 1  ip6-localhost (::1)  0.108 ms  0.099 ms  0.033 ms
    ^-- ehem.. this isn't Aorta, is it?

root@ed-gw1 ~ # traceroute6 2001:730:3::1:2b
traceroute to 2001:730:3::1:2b (2001:730:3::1:2b) from ::1, 30 hops max, 16 byte packets
 1  skjellin-gw1.no.ipv6.aorta.net (2001:730:3::1:2b)  0.101 ms  0.089 ms  0.103 ms
    ^-- alright!
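
One possible reading of the traces above, which nothing in this message confirms, is that the kernel treats the lowest address of a prefix as the Subnet-Router anycast address (RFC 3513, section 2.6.1: the prefix followed by all-zero bits) and therefore as local. The sketch below only shows that arithmetic; the function names are made up for illustration and are not kernel APIs.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Compute the Subnet-Router anycast address for addr/prefix_len by
 * clearing every bit after the first prefix_len bits.
 */
static void subnet_router_anycast(const struct in6_addr *addr,
                                  unsigned prefix_len,
                                  struct in6_addr *out)
{
    unsigned i;

    *out = *addr;
    for (i = 0; i < 16; i++) {
        unsigned first_bit = i * 8;   /* index of this byte's top bit */

        if (first_bit >= prefix_len) {
            /* Byte lies entirely in the host part. */
            out->s6_addr[i] = 0;
        } else if (prefix_len - first_bit < 8) {
            /* Prefix ends inside this byte: keep only the leading bits. */
            unsigned keep = prefix_len - first_bit;
            out->s6_addr[i] &= (unsigned char)(0xFF << (8 - keep));
        }
    }
}

/*
 * Returns 1 if the textual address equals the Subnet-Router anycast
 * address of its own prefix, 0 if not, -1 if the text does not parse.
 */
static int is_subnet_router_anycast(const char *text, unsigned prefix_len)
{
    struct in6_addr addr, anycast;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    subnet_router_anycast(&addr, prefix_len, &anycast);
    return memcmp(&addr, &anycast, sizeof(addr)) == 0;
}
```

With a /127, is_subnet_router_anycast("2001:730:3::1:2a", 127) yields 1 and the same call for 2001:730:3::1:2b yields 0, which matches the pattern of which end misbehaves in the traces.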
-- Mvh, André Tomt andre@skjellin.no From janiceg@us.ibm.com Mon Jun 16 13:33:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 13:33:25 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GKWx2x009316 for ; Mon, 16 Jun 2003 13:33:07 -0700 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.195.10]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GKWf2R178270; Mon, 16 Jun 2003 16:32:42 -0400 Received: from austin.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay01.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GKWe4G137898; Mon, 16 Jun 2003 14:32:40 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5GKWcGg034718; Mon, 16 Jun 2003 15:32:38 -0500 Received: from us.ibm.com (dyn95394201.austin.ibm.com [9.53.94.201]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id PAA20674; Mon, 16 Jun 2003 15:32:37 -0500 Message-ID: <3EEE28DE.6040808@us.ibm.com> Date: Mon, 16 Jun 2003 15:30:22 -0500 From: Janice M Girouard Organization: IBM Linux Technology Center - Network Device Drivers User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com CC: stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, Jeff Garzik , davem@redhat.com Subject: patch for common networking error messages Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3276 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: janiceg@us.ibm.com Precedence: bulk X-list: netdev Below is a patch that demonstrates standard messages for ethernet device drivers. 
I would like your feedback on the concept of standard network messages, and any suggestions for messages to include.

The intent of the standard message change is to:

1) Ensure key events are communicated to user space in a predictable way, enabling automated diagnostic systems or error log analysis,
2) Reduce the number of puzzling messages that are logged -- in this case, by replacing them with standard messages, and/or
3) Identify the device (or driver name) that is responsible for the error.

The patch includes changes for two drivers, the e1000 and tg3, to provide a concrete example of the concept. Below is a snapshot of an error log, with the new messages:

Jun 4 14:54:06 dyn95394175 kernel: e1000: Intel(R) PRO/1000 Network Driver - version 5.0.43-k3
Jun 4 14:54:06 dyn95394175 kernel: Copyright (c) 1999-2003 Intel Corporation.
Jun 4 14:54:06 dyn95394175 kernel: eth2: Intel(R) PRO/1000 Network Connection
Jun 4 14:54:06 dyn95394175 kernel: eth2: scatter/gather I/O enabled
Jun 4 14:54:06 dyn95394175 kernel: eth2: all IP checksums on transmit enabled
Jun 4 14:54:06 dyn95394175 kernel: eth3: Intel(R) PRO/1000 Network Connection
Jun 4 14:54:06 dyn95394175 kernel: eth3: scatter/gather I/O enabled
Jun 4 14:54:06 dyn95394175 kernel: eth3: all IP checksums on transmit enabled
...
Jun 4 14:54:06 dyn95394175 kernel: tg3: Broadcom Tigon3 ethernet driver - version 1.5
...
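
To illustrate how predictable message text could help automated log analysis, here is a small user-space sketch. The EMSG_NET_LINK_UP format string is copied from the list that follows in this message; format_link_up and parse_link_up are hypothetical helpers and not part of the patch.

```c
#include <stdio.h>

/* Format string as proposed in this message. */
#define EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"

/* Hypothetical producer side: render the standard link-up message. */
static int format_link_up(char *buf, size_t len, const char *ifname,
                          int mbps, int full_duplex)
{
    return snprintf(buf, len, EMSG_NET_LINK_UP, ifname, mbps,
                    full_duplex ? "full" : "half");
}

/*
 * Hypothetical consumer side: recover the fields from a log line.
 * ifname must hold at least 16 bytes and duplex at least 8 bytes.
 * Returns 1 if the line is a link-up message, 0 otherwise.
 */
static int parse_link_up(const char *line, char *ifname, int *mbps,
                         char *duplex)
{
    int fields = sscanf(line,
                        "%15[^:]: state change: link up, %d Mbps, %7[a-z]-duplex",
                        ifname, mbps, duplex);
    return fields == 3;
}
```

Because the text is fixed, the consumer needs only one scanf pattern per event type instead of one per driver.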
Below is the text for the most basic standard messages:

EMSG_NET_LINK_FAIL    "%s: transient problem: link error detected - MII status %x\n"
EMSG_NET_LINK_UP      "%s: state change: link up, %d Mbps, %s-duplex\n"
EMSG_NET_HUNG         "%s: software failure: ethernet controller hung\n"
EMSG_NET_RX_ERR       "%s: transient problem: packet receive error, rx_errors = %ld\n"
EMSG_NET_TX_ERR       "%s: transient problem: packet transmit error, tx_errors = %ld\n"
EMSG_NET_START_QUEUE  "%s: performance event: (re)starting netdev queue\n"
EMSG_NET_STOP_QUEUE   "%s: performance event: stopping netdev queue\n"
EMSG_NET_SGATHER      "%s: scatter/gather I/O enabled\n"
EMSG_NET_NO_SGATHER   "%s: performance event: scatter/gather I/O disabled\n"
EMSG_NET_HW_CSUMS     "%s: all IP checksums on transmit enabled\n"
EMSG_NET_CSUMS        "%s: TCP/UDP over IPv6 checksums on transmit enabled\n"
EMSG_NET_NO_CSUMS     "%s: performance event: IP checksums on transmit disabled\n"

Janice Girouard
janiceg@us.ibm.com

===================================================
diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_hw.c linux-2.5.69.newMsgs/drivers/net/e1000/e1000_hw.c --- linux-2.5.69.orig/drivers/net/e1000/e1000_hw.c 2003-06-04 13:24:46.000000000 -0500 +++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_hw.c 2003-06-04 13:14:58.000000000 -0500 @@ -31,6 +31,7 @@ */ #include "e1000_hw.h" +#include static int32_t e1000_set_phy_type(struct e1000_hw *hw); static void e1000_phy_init_script(struct e1000_hw *hw); @@ -468,7 +469,7 @@ * be initialized based on a value in the EEPROM.
*/ if(e1000_read_eeprom(hw, EEPROM_INIT_CONTROL2_REG, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } @@ -666,7 +667,11 @@ hw->autoneg_failed = 1; ret_val = e1000_check_for_link(hw); if(ret_val < 0) { + uint16_t mii_status_reg; DEBUGOUT("Error while checking for link\n"); + e1000_read_phy_reg( hw, PHY_STATUS, &mii_status_reg); + DEBUGOUT1(EMSG_NET_LINK_FAIL, hw->back->adapter->netdev->name, + mii_status_reg); return ret_val; } hw->autoneg_failed = 0; @@ -730,7 +735,7 @@ msec_delay(15); if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PAGE_SELECT, 0x0000) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -746,29 +751,29 @@ /* Disable SmartSpeed */ if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CONFIG, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data &= ~IGP01E1000_PSCFR_SMART_SPEED; if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PORT_CONFIG, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } /* Set auto Master/Slave resolution process */ if(e1000_read_phy_reg(hw, PHY_1000T_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data &= ~CR_1000T_MS_ENABLE; if(e1000_write_phy_reg(hw, PHY_1000T_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -779,14 +784,14 @@ hw->mdix = 1; if(e1000_write_phy_reg(hw, 
IGP01E1000_PHY_PORT_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } else { /* Enable CRS on TX. This must be set for half-duplex operation. */ if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= M88E1000_PSCR_ASSERT_CRS_ON_TX; @@ -826,7 +831,7 @@ if(hw->disable_polarity_correction == 1) phy_data |= M88E1000_PSCR_POLARITY_REVERSAL; if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -834,7 +839,7 @@ * to 25MHz clock. */ if(e1000_read_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= M88E1000_EPSCR_TX_CLK_25; @@ -847,7 +852,7 @@ M88E1000_EPSCR_SLAVE_DOWNSHIFT_1X); if(e1000_write_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } @@ -855,7 +860,7 @@ /* SW Reset the PHY so all changes take effect */ ret_val = e1000_phy_reset(hw); if(ret_val < 0) { - DEBUGOUT("Error Resetting the PHY\n"); + DEBUGOUT1(EMSG_DEV_SW_RESET, hw->back->adapter->netdev->name); return ret_val; } } @@ -899,12 +904,12 @@ * the Auto Neg Restart bit in the PHY control register. 
*/ if(e1000_read_phy_reg(hw, PHY_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG); if(e1000_write_phy_reg(hw, PHY_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -933,11 +938,11 @@ */ for(i = 0; i < 10; i++) { if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(phy_data & MII_SR_LINK_STATUS) { @@ -988,13 +993,13 @@ /* Read the MII Auto-Neg Advertisement Register (Address 4). */ if(e1000_read_phy_reg(hw, PHY_AUTONEG_ADV, &mii_autoneg_adv_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } /* Read the MII 1000Base-T Control Register (Address 9). */ if(e1000_read_phy_reg(hw, PHY_1000T_CTRL, &mii_1000t_ctrl_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1103,14 +1108,14 @@ } if(e1000_write_phy_reg(hw, PHY_AUTONEG_ADV, mii_autoneg_adv_reg) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } DEBUGOUT1("Auto-Neg Advertising %x\n", mii_autoneg_adv_reg); if(e1000_write_phy_reg(hw, PHY_1000T_CTRL, mii_1000t_ctrl_reg) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } return 0; @@ -1150,7 +1155,7 @@ /* Read the MII Control Register. 
*/ if(e1000_read_phy_reg(hw, PHY_CTRL, &mii_ctrl_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1199,7 +1204,7 @@ if (hw->phy_type == e1000_phy_m88) { if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1208,7 +1213,7 @@ */ phy_data &= ~M88E1000_PSCR_AUTO_X_MODE; if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } DEBUGOUT1("M88E1000 PSCR: %x \n", phy_data); @@ -1220,7 +1225,7 @@ * forced whenever speed or duplex are forced. */ if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1228,14 +1233,14 @@ phy_data &= ~IGP01E1000_PSCR_FORCE_MDI_MDIX; if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } /* Write back the modified PHY MII control register. */ if(e1000_write_phy_reg(hw, PHY_CTRL, mii_ctrl_reg) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } udelay(1); @@ -1258,11 +1263,11 @@ * to be set. 
*/ if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(mii_status_reg & MII_SR_LINK_STATUS) break; @@ -1285,11 +1290,11 @@ * to be set. */ if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } @@ -1301,12 +1306,12 @@ * defaults back to a 2.5MHz clock when the PHY is reset. */ if(e1000_read_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= M88E1000_EPSCR_TX_CLK_25; if(e1000_write_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1314,12 +1319,12 @@ * TX. This must be set for both full and half duplex operation. 
*/ if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= M88E1000_PSCR_ASSERT_CRS_ON_TX; if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } } @@ -1379,7 +1384,7 @@ */ if (hw->phy_type == e1000_phy_igp) { if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(phy_data & IGP01E1000_PSSR_FULL_DUPLEX) ctrl |= E1000_CTRL_FD; @@ -1398,7 +1403,7 @@ ctrl |= E1000_CTRL_SPD_100; } else { if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(phy_data & M88E1000_PSSR_DPLX) ctrl |= E1000_CTRL_FD; @@ -1533,11 +1538,11 @@ * some "sticky" (latched) bits. */ if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error \n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) { - DEBUGOUT("PHY Read Error \n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1549,11 +1554,11 @@ * negotiated. */ if(e1000_read_phy_reg(hw, PHY_AUTONEG_ADV, &mii_nway_adv_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_LP_ABILITY, &mii_nway_lp_ability_reg) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1735,11 +1740,11 @@ * Read the register twice since the link bit is sticky. 
*/ if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } @@ -1798,7 +1803,7 @@ */ if(hw->tbi_compatibility_en) { if(e1000_read_phy_reg(hw, PHY_LP_ABILITY, &lp_capability) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(lp_capability & (NWAY_LPAR_10T_HD_CAPS | @@ -1941,11 +1946,11 @@ * Complete bit to be set. */ if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(phy_data & MII_SR_AUTONEG_COMPLETE) { @@ -2286,7 +2291,7 @@ if((hw->mac_type == e1000_82541) || (hw->mac_type == e1000_82547)) { if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PAGE_SELECT, 0x0000) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return; } @@ -2315,12 +2320,12 @@ DEBUGFUNC("e1000_phy_reset"); if(e1000_read_phy_reg(hw, PHY_CTRL, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } phy_data |= MII_CR_RESET; if(e1000_write_phy_reg(hw, PHY_CTRL, phy_data) < 0) { - DEBUGOUT("PHY Write Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } udelay(1); @@ -2346,13 +2351,13 @@ /* Read the PHY ID Registers to identify which PHY is onboard. 
*/ if(e1000_read_phy_reg(hw, PHY_ID1, &phy_id_high) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } hw->phy_id = (uint32_t) (phy_id_high << 16); udelay(20); if(e1000_read_phy_reg(hw, PHY_ID2, &phy_id_low) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } hw->phy_id |= (uint32_t) (phy_id_low & PHY_REVISION_MASK); @@ -2406,7 +2411,7 @@ ret_val = 0; } while(0); - if(ret_val < 0) DEBUGOUT("PHY Write Error\n"); + if(ret_val < 0) DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name); return ret_val; } @@ -2566,11 +2571,11 @@ } if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } if((phy_data & MII_SR_LINK_STATUS) != MII_SR_LINK_STATUS) { @@ -3121,7 +3126,7 @@ for(i = 0; i < (EEPROM_CHECKSUM_REG + 1); i++) { if(e1000_read_eeprom(hw, i, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } checksum += eeprom_data; @@ -3153,14 +3158,14 @@ for(i = 0; i < EEPROM_CHECKSUM_REG; i++) { if(e1000_read_eeprom(hw, i, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } checksum += eeprom_data; } checksum = (uint16_t) EEPROM_SUM - checksum; if(e1000_write_eeprom(hw, EEPROM_CHECKSUM_REG, 1, &checksum) < 0) { - DEBUGOUT("EEPROM Write Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_WRITE, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } return 0; @@ -3381,7 +3386,7 @@ /* Get word 0 from EEPROM */ if(e1000_read_eeprom(hw, 
offset, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } /* Save word 0 in upper half of part_num */ @@ -3389,7 +3394,7 @@ /* Get word 1 from EEPROM */ if(e1000_read_eeprom(hw, ++offset, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } /* Save word 1 in lower half of part_num */ @@ -3415,7 +3420,7 @@ for(i = 0; i < NODE_ADDRESS_SIZE; i += 2) { offset = i >> 1; if(e1000_read_eeprom(hw, offset, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } hw->perm_mac_addr[i] = (uint8_t) (eeprom_data & 0x00FF); @@ -3715,7 +3720,7 @@ hw->ledctl_mode2 = hw->ledctl_default; if(e1000_read_eeprom(hw, EEPROM_ID_LED_SETTINGS, 1, &eeprom_data) < 0) { - DEBUGOUT("EEPROM Read Error\n"); + DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name); return -E1000_ERR_EEPROM; } if((eeprom_data== ID_LED_RESERVED_0000) || @@ -3807,7 +3812,7 @@ E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode1); break; default: - DEBUGOUT("Invalid device ID\n"); + DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name); return -E1000_ERR_CONFIG; } return 0; @@ -3849,7 +3854,7 @@ E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_default); break; default: - DEBUGOUT("Invalid device ID\n"); + DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name); return -E1000_ERR_CONFIG; } return 0; @@ -3902,7 +3907,7 @@ E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode2); break; default: - DEBUGOUT("Invalid device ID\n"); + DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name); return -E1000_ERR_CONFIG; } return 0; @@ -3955,7 +3960,7 @@ E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode1); break; default: - DEBUGOUT("Invalid device ID\n"); + DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name); return 
-E1000_ERR_CONFIG; } return 0; @@ -4468,14 +4473,14 @@ if(hw->phy_type == e1000_phy_igp) { if(e1000_read_phy_reg(hw, IGP01E1000_PHY_LINK_HEALTH, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } hw->speed_downgraded = (phy_data & IGP01E1000_PLHR_SS_DOWNGRADE) ? 1 : 0; } else if(hw->phy_type == e1000_phy_m88) { if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_STATUS, &phy_data) < 0) { - DEBUGOUT("PHY Read Error\n"); + DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name); return -E1000_ERR_PHY; } hw->speed_downgraded = (phy_data & M88E1000_PSSR_DOWNSHIFT) >> diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_main.c linux-2.5.69.newMsgs/drivers/net/e1000/e1000_main.c --- linux-2.5.69.orig/drivers/net/e1000/e1000_main.c 2003-06-04 13:24:46.000000000 -0500 +++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_main.c 2003-06-04 13:14:58.000000000 -0500 @@ -27,6 +27,7 @@ *******************************************************************************/ #include "e1000.h" +#include /* Change Log * @@ -228,10 +229,11 @@ e1000_init_module(void) { int ret; - printk(KERN_INFO "%s - version %s\n", - e1000_driver_string, e1000_driver_version); - printk(KERN_INFO "%s\n", e1000_copyright); + printk(KERN_INFO EMSG_BASICS, e1000_driver_name , + e1000_driver_string , e1000_driver_version); + + printk(KERN_INFO " %s\n", e1000_copyright); ret = pci_module_init(&e1000_driver); if(ret >= 0) @@ -508,6 +510,22 @@ netif_stop_queue(netdev); printk(KERN_INFO "%s: %s\n", netdev->name, adapter->id_string); + + if (netdev->features & NETIF_F_SG) + printk(KERN_INFO EMSG_NET_SGATHER, netdev->name); + else + printk(KERN_INFO EMSG_NET_NO_SGATHER, netdev->name); + + if (netdev->features & NETIF_F_HW_CSUM) { + printk(KERN_INFO EMSG_NET_HW_CSUMS, netdev->name); + } else { + if (netdev->features & NETIF_F_IP_CSUM) + printk(KERN_INFO EMSG_NET_CSUMS, netdev->name); + else + printk(KERN_INFO EMSG_NET_NO_CSUMS, 
netdev->name); + } + + e1000_check_options(adapter); /* Initial Wake on LAN setting @@ -597,9 +615,11 @@ hw->subsystem_vendor_id = pdev->subsystem_vendor; hw->subsystem_id = pdev->subsystem_device; - pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id); + if (pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id)) + printk(KERN_ERR EMSG_PCI_READ, netdev->name); - pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word); + if (pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word)) + printk(KERN_ERR EMSG_PCI_READ, netdev->name); adapter->rx_buffer_len = E1000_RXBUFFER_2048; hw->max_frame_size = netdev->mtu + @@ -1334,6 +1354,7 @@ adapter->tx_fifo_head = 0; atomic_set(&adapter->tx_fifo_stall, 0); + printk(KERN_INFO EMSG_NET_START_QUEUE, netdev->name ); netif_wake_queue(netdev); } else { mod_timer(&adapter->tx_fifo_stall_timer, jiffies + 1); @@ -1361,13 +1382,10 @@ e1000_get_speed_and_duplex(&adapter->hw, &adapter->link_speed, &adapter->link_duplex); - - printk(KERN_INFO - "e1000: %s NIC Link is Up %d Mbps %s\n", + printk(KERN_INFO EMSG_NET_LINK_UP, netdev->name, adapter->link_speed, adapter->link_duplex == FULL_DUPLEX ? 
- "Full Duplex" : "Half Duplex"); - + "full" : "half"); netif_carrier_on(netdev); netif_wake_queue(netdev); mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); @@ -1375,13 +1393,14 @@ } } else { if(netif_carrier_ok(netdev)) { + uint16_t mii_status_reg; adapter->link_speed = 0; adapter->link_duplex = 0; - printk(KERN_INFO - "e1000: %s NIC Link is Down\n", - netdev->name); + e1000_read_phy_reg(&adapter->hw, PHY_STATUS, + &mii_status_reg); + printk(KERN_INFO EMSG_NET_LINK_FAIL, + netdev->name, mii_status_reg); netif_carrier_off(netdev); - netif_stop_queue(netdev); mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); } @@ -1420,9 +1439,21 @@ i = txdr->next_to_clean; if(txdr->buffer_info[i].dma && time_after(jiffies, txdr->buffer_info[i].time_stamp + HZ) && - !(E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF)) + !(E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF)) { + printk(KERN_INFO EMSG_NET_HUNG, netdev->name ); netif_stop_queue(netdev); + } + /* + * Need to add some code here to see if an individual tx has timed out. + * Right now we only look for hangs when the entire tx buffer fills up + * and there is nowhere to put an in-bound transmit packet. 
Then we + * could log the following error: + * + * netdev_err(netdev, EMSG_NET_TX_ERR); + * + */ + /* Reset the timer */ mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ); } @@ -1697,12 +1728,14 @@ } if(E1000_DESC_UNUSED(&adapter->tx_ring) < DESC_NEEDED) { + printk(KERN_INFO EMSG_NET_STOP_QUEUE, netdev->name ); netif_stop_queue(netdev); return 1; } if(adapter->hw.mac_type == e1000_82547) { if(e1000_82547_fifo_workaround(adapter, skb)) { + printk(KERN_INFO EMSG_NET_STOP_QUEUE, netdev->name ); netif_stop_queue(netdev); mod_timer(&adapter->tx_fifo_stall_timer, jiffies); return 1; @@ -1828,6 +1861,8 @@ struct e1000_hw *hw = &adapter->hw; unsigned long flags; uint16_t phy_tmp; + unsigned long rx_errors; + unsigned long tx_errors; #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF @@ -1920,10 +1955,14 @@ /* Rx Errors */ + rx_errors = adapter->net_stats.rx_errors; adapter->net_stats.rx_errors = adapter->stats.rxerrc + adapter->stats.crcerrs + adapter->stats.algnerrc + adapter->stats.rlec + adapter->stats.rnbc + adapter->stats.mpc + adapter->stats.cexterr; + if (rx_errors != adapter->net_stats.rx_errors) + printk(KERN_INFO EMSG_NET_RX_ERR, adapter->ifname, + adapter->net_stats.rx_errors); adapter->net_stats.rx_dropped = adapter->stats.rnbc; adapter->net_stats.rx_length_errors = adapter->stats.rlec; adapter->net_stats.rx_crc_errors = adapter->stats.crcerrs; @@ -1933,8 +1972,12 @@ /* Tx Errors */ + tx_errors = adapter->net_stats.tx_errors; adapter->net_stats.tx_errors = adapter->stats.ecol + adapter->stats.latecol; + if ( tx_errors != adapter->net_stats.tx_errors) + printk(KERN_INFO EMSG_NET_TX_ERR, adapter->ifname, + adapter->net_stats.tx_errors); adapter->net_stats.tx_aborted_errors = adapter->stats.ecol; adapter->net_stats.tx_window_errors = adapter->stats.latecol; adapter->net_stats.tx_carrier_errors = adapter->stats.tncrs; @@ -2105,8 +2148,10 @@ tx_ring->next_to_clean = i; - if(cleaned && netif_queue_stopped(netdev) && netif_carrier_ok(netdev)) + if(cleaned && 
netif_queue_stopped(netdev) && netif_carrier_ok(netdev)) { + printk(KERN_INFO EMSG_NET_START_QUEUE, netdev->name ); netif_wake_queue(netdev); + } return cleaned; } @@ -2516,7 +2561,8 @@ { struct e1000_adapter *adapter = hw->back; - pci_read_config_word(adapter->pdev, reg, value); + if (pci_read_config_word(adapter->pdev, reg, value)) + printk(KERN_ERR EMSG_PCI_READ, adapter->netdev->name); } void @@ -2524,7 +2570,8 @@ { struct e1000_adapter *adapter = hw->back; - pci_write_config_word(adapter->pdev, reg, *value); + if (pci_write_config_word(adapter->pdev, reg, *value)) + printk(KERN_ERR EMSG_PCI_WRITE, adapter->netdev->name); } uint32_t diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_param.c linux-2.5.69.newMsgs/drivers/net/e1000/e1000_param.c --- linux-2.5.69.orig/drivers/net/e1000/e1000_param.c 2003-06-04 13:24:46.000000000 -0500 +++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_param.c 2003-06-04 13:14:58.000000000 -0500 @@ -27,6 +27,7 @@ *******************************************************************************/ #include "e1000.h" +#include /* This is the only thing that needs to be changed to adjust the * maximum number of ports that the driver can manage. 
@@ -244,7 +245,8 @@ }; static int __devinit -e1000_validate_option(int *value, struct e1000_option *opt) +e1000_validate_option(struct e1000_adapter *adapter, int *value, + struct e1000_option *opt) { if(*value == OPTION_UNSET) { *value = opt->def; @@ -255,16 +257,19 @@ case enable_option: switch (*value) { case OPTION_ENABLED: - printk(KERN_INFO "%s Enabled\n", opt->name); + printk(KERN_INFO EMSG_DEV_CFG_ENABLED, + adapter->netdev->name, opt->name); return 0; case OPTION_DISABLED: - printk(KERN_INFO "%s Disabled\n", opt->name); + printk(KERN_INFO EMSG_DEV_CFG_DISABLED, + adapter->netdev->name, opt->name); return 0; } break; case range_option: if(*value >= opt->arg.r.min && *value <= opt->arg.r.max) { - printk(KERN_INFO "%s set to %i\n", opt->name, *value); + printk(KERN_INFO EMSG_DEV_CFG_ISET, + adapter->netdev->name, opt->name, *value); return 0; } break; @@ -330,7 +335,7 @@ MAX_TXD : MAX_82544_TXD; tx_ring->count = TxDescriptors[bd]; - e1000_validate_option(&tx_ring->count, &opt); + e1000_validate_option(adapter, &tx_ring->count, &opt); E1000_ROUNDUP(tx_ring->count, REQ_TX_DESCRIPTOR_MULTIPLE); } { /* Receive Descriptor Count */ @@ -346,7 +351,7 @@ opt.arg.r.max = mac_type < e1000_82544 ? 
MAX_RXD : MAX_82544_RXD; rx_ring->count = RxDescriptors[bd]; - e1000_validate_option(&rx_ring->count, &opt); + e1000_validate_option(adapter, &rx_ring->count, &opt); E1000_ROUNDUP(rx_ring->count, REQ_RX_DESCRIPTOR_MULTIPLE); } { /* Checksum Offload Enable/Disable */ @@ -358,7 +363,7 @@ }; int rx_csum = XsumRX[bd]; - e1000_validate_option(&rx_csum, &opt); + e1000_validate_option(adapter, &rx_csum, &opt); adapter->rx_csum = rx_csum; } { /* Flow Control */ @@ -380,7 +385,7 @@ }; int fc = FlowControl[bd]; - e1000_validate_option(&fc, &opt); + e1000_validate_option(adapter, &fc, &opt); adapter->hw.fc = adapter->hw.original_fc = fc; } { /* Transmit Interrupt Delay */ @@ -394,7 +399,7 @@ }; adapter->tx_int_delay = TxIntDelay[bd]; - e1000_validate_option(&adapter->tx_int_delay, &opt); + e1000_validate_option(adapter, &adapter->tx_int_delay, &opt); } { /* Transmit Absolute Interrupt Delay */ struct e1000_option opt = { @@ -407,7 +412,7 @@ }; adapter->tx_abs_int_delay = TxAbsIntDelay[bd]; - e1000_validate_option(&adapter->tx_abs_int_delay, &opt); + e1000_validate_option(adapter, &adapter->tx_abs_int_delay, &opt); } { /* Receive Interrupt Delay */ struct e1000_option opt = { @@ -420,7 +425,7 @@ }; adapter->rx_int_delay = RxIntDelay[bd]; - e1000_validate_option(&adapter->rx_int_delay, &opt); + e1000_validate_option(adapter, &adapter->rx_int_delay, &opt); } { /* Receive Absolute Interrupt Delay */ struct e1000_option opt = { @@ -433,7 +438,7 @@ }; adapter->rx_abs_int_delay = RxAbsIntDelay[bd]; - e1000_validate_option(&adapter->rx_abs_int_delay, &opt); + e1000_validate_option(adapter, &adapter->rx_abs_int_delay, &opt); } { /* Interrupt Throttling Rate */ struct e1000_option opt = { @@ -452,7 +457,7 @@ /* Dynamic mode */ adapter->itr = 1; } else { - e1000_validate_option(&adapter->itr, &opt); + e1000_validate_option(adapter, &adapter->itr, &opt); } } @@ -525,7 +530,7 @@ }; speed = Speed[bd]; - e1000_validate_option(&speed, &opt); + e1000_validate_option(adapter, &speed, &opt); } 
{ /* Duplex */ struct e1000_opt_list dplx_list[] = {{ 0, "" }, @@ -542,7 +547,7 @@ }; dplx = Duplex[bd]; - e1000_validate_option(&dplx, &opt); + e1000_validate_option(adapter, &dplx, &opt); } if(AutoNeg[bd] != OPTION_UNSET && (speed != 0 || dplx != 0)) { @@ -595,7 +600,7 @@ }; int an = AutoNeg[bd]; - e1000_validate_option(&an, &opt); + e1000_validate_option(adapter, &an, &opt); adapter->hw.autoneg_advertised = an; } diff -Naur linux-2.5.69.orig/drivers/net/tg3.c linux-2.5.69.newMsgs/drivers/net/tg3.c --- linux-2.5.69.orig/drivers/net/tg3.c 2003-06-04 13:24:35.000000000 -0500 +++ linux-2.5.69.newMsgs/drivers/net/tg3.c 2003-06-04 13:15:45.000000000 -0500 @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -347,6 +348,7 @@ udelay(40); } + if (ret) printk(KERN_ERR EMSG_DEV_PHY_READ, tp->dev->name); return ret; } @@ -393,6 +395,7 @@ udelay(40); } + if (ret) printk(KERN_ERR EMSG_DEV_PHY_WRITE, tp->dev->name); return ret; } @@ -634,9 +637,11 @@ static void tg3_link_report(struct tg3 *tp) { if (!netif_carrier_ok(tp->dev)) { - printk(KERN_INFO PFX "%s: Link is down.\n", tp->dev->name); + u32 mii_regval; + tg3_readphy(tp, MII_TG3_PHY_STAT, &mii_regval); + printk(KERN_INFO EMSG_NET_LINK_FAIL, tp->dev->name, mii_regval); } else { - printk(KERN_INFO PFX "%s: Link is up at %d Mbps, %s duplex.\n", + printk(KERN_INFO EMSG_NET_LINK_UP, tp->dev->name, (tp->link_config.active_speed == SPEED_1000 ? 1000 : @@ -1780,8 +1785,10 @@ tp->tx_cons = sw_idx; if (netif_queue_stopped(tp->dev) && - (TX_BUFFS_AVAIL(tp) > TG3_TX_WAKEUP_THRESH)) + (TX_BUFFS_AVAIL(tp) > TG3_TX_WAKEUP_THRESH)) { + printk(KERN_INFO EMSG_NET_START_QUEUE, tp->dev->name); netif_wake_queue(tp->dev); + } } /* Returns size of skb allocated or < 0 on error. 
@@ -2580,8 +2587,10 @@ } tp->tx_prod = entry; - if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) + if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) { + printk(KERN_INFO EMSG_NET_STOP_QUEUE, dev->name); netif_stop_queue(dev); + } out_unlock: spin_unlock_irqrestore(&tp->tx_lock, flags); @@ -2727,8 +2736,10 @@ } tp->tx_prod = entry; - if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) + if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) { + printk(KERN_INFO EMSG_NET_STOP_QUEUE, dev->name); netif_stop_queue(dev); + } spin_unlock_irqrestore(&tp->tx_lock, flags); @@ -3436,10 +3447,7 @@ } if (i >= 10000) { - printk(KERN_ERR PFX "tg3_reset_cpu timed out for %s, " - "and %s CPU\n", - tp->dev->name, - (offset == RX_CPU_BASE ? "RX" : "TX")); + printk(KERN_ERR EMSG_DEV_SW_RESET, tp->dev->name); return -ENODEV; } return 0; @@ -4462,7 +4470,7 @@ tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0); err = tg3_reset_hw(tp); - + if (err) printk( KERN_ERR EMSG_DEV_SW_RESET, tp->dev->name); out: return err; } @@ -4963,11 +4971,15 @@ stats->rx_errors = old_stats->rx_errors + get_stat64(&hw_stats->rx_errors); + if (stats->rx_errors != old_stats->rx_errors) + printk(KERN_INFO EMSG_NET_RX_ERR, dev->name, stats->rx_errors); stats->tx_errors = old_stats->tx_errors + get_stat64(&hw_stats->tx_errors) + get_stat64(&hw_stats->tx_mac_errors) + get_stat64(&hw_stats->tx_carrier_sense_errors) + get_stat64(&hw_stats->tx_discards); + if (stats->tx_errors != old_stats->tx_errors) + printk(KERN_INFO EMSG_NET_TX_ERR, dev->name, stats->tx_errors); stats->multicast = old_stats->multicast + get_stat64(&hw_stats->rx_mcast_packets); @@ -5661,8 +5673,10 @@ int i; if (offset > EEPROM_ADDR_ADDR_MASK || - (offset % 4) != 0) + (offset % 4) != 0) { + printk(KERN_ERR EMSG_DEV_EEPROM_READ, tp->dev->name); return -EINVAL; + } tmp = tr32(GRC_EEPROM_ADDR) & ~(EEPROM_ADDR_ADDR_MASK | EEPROM_ADDR_DEVID_MASK | @@ -5681,8 +5695,10 @@ break; udelay(100); } - if (!(tmp & EEPROM_ADDR_COMPLETE)) + if (!(tmp & EEPROM_ADDR_COMPLETE)) { + printk(KERN_ERR 
EMSG_NET_HUNG, tp->dev->name); return -EBUSY; + } *val = tr32(GRC_EEPROM_DATA); return 0; @@ -5875,8 +5891,10 @@ */ if (tp->phy_id == PHY_ID_INVALID) { if (!eeprom_signature_found || - !KNOWN_PHY_ID(eeprom_phy_id & PHY_ID_MASK)) + !KNOWN_PHY_ID(eeprom_phy_id & PHY_ID_MASK)) { + printk(KERN_ERR EMSG_PCI_BAD_ID, tp->dev->name); return -ENODEV; + } tp->phy_id = eeprom_phy_id; } } @@ -5953,6 +5971,7 @@ ~(ADVERTISED_1000baseT_Half | ADVERTISED_1000baseT_Full); + if (err) printk(KERN_ERR EMSG_DEV_PHY_READ, tp->dev->name); return err; } @@ -6046,9 +6065,11 @@ * workaround but turns MWI off all the times so never uses * it. This seems to suggest that the workaround is insufficient. */ - pci_read_config_word(tp->pdev, PCI_COMMAND, &pci_cmd); + if (pci_read_config_word(tp->pdev, PCI_COMMAND, &pci_cmd)) + printk(KERN_ERR EMSG_PCI_READ, tp->dev->name); pci_cmd &= ~PCI_COMMAND_INVALIDATE; - pci_write_config_word(tp->pdev, PCI_COMMAND, pci_cmd); + if (pci_write_config_word(tp->pdev, PCI_COMMAND, pci_cmd)) + printk(KERN_ERR EMSG_PCI_WRITE, tp->dev->name); /* It is absolutely critical that TG3PCI_MISC_HOST_CTRL * has the register indirect write enable bit set before @@ -6056,8 +6077,9 @@ * critical that the PCI-X hw workaround situation is decided * before that as well. 
*/ - pci_read_config_dword(tp->pdev, TG3PCI_MISC_HOST_CTRL, - &misc_ctrl_reg); + if (pci_read_config_dword(tp->pdev, TG3PCI_MISC_HOST_CTRL, + &misc_ctrl_reg)) + printk(KERN_ERR EMSG_PCI_READ, tp->dev->name); tp->pci_chip_rev_id = (misc_ctrl_reg >> MISC_HOST_CTRL_CHIPREV_SHIFT); @@ -6870,6 +6892,20 @@ } else tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS; + if (dev->features & NETIF_F_SG) + printk(KERN_INFO EMSG_NET_SGATHER, dev->name); + else + printk(KERN_INFO EMSG_NET_NO_SGATHER, dev->name); + + if (dev->features & NETIF_F_HW_CSUM) { + printk(KERN_INFO EMSG_NET_HW_CSUMS, dev->name); + } else { + if (dev->features & NETIF_F_IP_CSUM) + printk(KERN_INFO EMSG_NET_CSUMS, dev->name); + else + printk(KERN_INFO EMSG_NET_NO_CSUMS, dev->name); + } + #if TG3_DO_TSO != 0 if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5700 || (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701 && @@ -7027,6 +7063,8 @@ static int __init tg3_init(void) { + printk(KERN_INFO EMSG_BASICS, DRV_MODULE_NAME, + "Broadcom Tigon3 ethernet driver" , DRV_MODULE_VERSION); return pci_module_init(&tg3_driver); } diff -Naur linux-2.5.69.orig/drivers/net/tg3.h linux-2.5.69.newMsgs/drivers/net/tg3.h --- linux-2.5.69.orig/drivers/net/tg3.h 2003-06-04 13:24:35.000000000 -0500 +++ linux-2.5.69.newMsgs/drivers/net/tg3.h 2003-06-04 13:15:45.000000000 -0500 @@ -1319,6 +1319,8 @@ /* Tigon3 specific PHY MII registers. */ #define TG3_BMCR_SPEED1000 0x0040 +#define MII_TG3_PHY_STAT 0x01 /* Status Register*/ + #define MII_TG3_CTRL 0x09 /* 1000-baseT control register */ #define MII_TG3_CTRL_ADV_1000_HALF 0x0100 #define MII_TG3_CTRL_ADV_1000_FULL 0x0200 diff -Naur linux-2.5.69.orig/include/linux/stdmsgs.h linux-2.5.69.newMsgs/include/linux/stdmsgs.h --- linux-2.5.69.orig/include/linux/stdmsgs.h 1969-12-31 18:00:00.000000000 -0600 +++ linux-2.5.69.newMsgs/include/linux/stdmsgs.h 2003-06-10 10:44:12.000000000 -0500 @@ -0,0 +1,57 @@ +#ifndef _STDMSGS_ +#define _STDMSGS_ + +/* + * Some common error messages for logging. 
+ * + * Note: the "%s:" text preceding each message + * is used to describe the device name for + * messages unique to a specific piece of h/w, + * or the device driver name otherwise. + */ +/********************************************************* + * common system errors/msgs + */ +#define EMSG_BASICS "%s: %s - version %s\n" +#define EMSG_NOMEM + + +/********************************************************* + * device errors/msgs + */ +#define EMSG_DEV_EEPROM_READ "%s: hardware failure: EEPROM read error\n" +#define EMSG_DEV_EEPROM_WRITE "%s: hardware failure: EEPROM write error\n" +#define EMSG_DEV_PHY_READ "%s: hardware failure: read error on physical interface\n" +#define EMSG_DEV_PHY_WRITE "%s: hardware failure: write error on physical interface\n" +#define EMSG_DEV_SW_RESET "%s: software failure: unable to reset device \n" +#define EMSG_DEV_CFG_ENABLED "%s: configuration note: %s enabled\n" +#define EMSG_DEV_CFG_DISABLED "%s: configuration note: %s disabled\n" +#define EMSG_DEV_CFG_ISET "%s: configuration note: %s set to %i\n" + + + +/********************************************************* + * network errors/msgs + */ +#define EMSG_NET_LINK_FAIL "%s: transient problem: link error detected - MII status %x\n" +#define EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n" +#define EMSG_NET_HUNG "%s: software failure: ethernet controller hung\n" +#define EMSG_NET_RX_ERR "%s: transient problem: packet receive error, rx_errors = %ld\n" +#define EMSG_NET_TX_ERR "%s: transient problem: packet transmit error, tx_errors = %ld\n" +#define EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n" +#define EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n" +#define EMSG_NET_SGATHER "%s: scatter/gather I/O enabled\n" +#define EMSG_NET_NO_SGATHER "%s: performance event: scatter/gather I/O disabled\n" +#define EMSG_NET_HW_CSUMS "%s: all IP checksums on transmit enabled\n" +#define EMSG_NET_CSUMS "%s: TCP/UDP over IPv4
checksums on transmit enabled\n" +#define EMSG_NET_NO_CSUMS "%s: performance event: IP checksums on transmit disabled\n" + +/********************************************************* + * PCI device errors/msgs + */ +#define EMSG_PCI_BAD_ID "%s: invalid device ID in PCI config header\n" +#define EMSG_PCI_READ "%s: hardware failure: PCI read error \n" +#define EMSG_PCI_WRITE "%s: hardware failure: PCI write error \n" + +#endif /* _STDMSGS_ */ From davem@redhat.com Mon Jun 16 13:43:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 13:43:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GKh12x010713 for ; Mon, 16 Jun 2003 13:43:02 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA00968; Mon, 16 Jun 2003 13:38:42 -0700 Date: Mon, 16 Jun 2003 13:38:41 -0700 (PDT) Message-Id: <20030616.133841.35533284.davem@redhat.com> To: janiceg@us.ibm.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: <3EEE28DE.6040808@us.ibm.com> References: <3EEE28DE.6040808@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3277 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice M Girouard Date: Mon, 16 Jun 2003 15:30:22 -0500 EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n" Should indicate flow control state too. 
EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n" EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n" Oh _ABSOLUTELY NOT_, you're not printing a message for normal events like this. Especially those that are going to occur on highly loaded systems. From ak@suse.de Mon Jun 16 13:53:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 13:54:01 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GKro2x011079 for ; Mon, 16 Jun 2003 13:53:51 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id A035014D75; Mon, 16 Jun 2003 22:53:44 +0200 (MEST) Date: Mon, 16 Jun 2003 22:53:42 +0200 From: Andi Kleen To: "David S. Miller" Cc: janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com Subject: Re: patch for common networking error messages Message-ID: <20030616205342.GH30400@wotan.suse.de> References: <3EEE28DE.6040808@us.ibm.com> <20030616.133841.35533284.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030616.133841.35533284.davem@redhat.com> X-archive-position: 3278 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Mon, Jun 16, 2003 at 01:38:41PM -0700, David S. Miller wrote: > From: Janice M Girouard > Date: Mon, 16 Jun 2003 15:30:22 -0500 > > EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n" > > Should indicate flow control state too. It would be actually useful to wrap these in real functions. Why? It will make supporting netconsole easier which has to be careful to never recurse in the network driver. 
-Andi From davem@redhat.com Mon Jun 16 13:55:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 13:55:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GKtk2x011391 for ; Mon, 16 Jun 2003 13:55:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA01016; Mon, 16 Jun 2003 13:51:25 -0700 Date: Mon, 16 Jun 2003 13:51:24 -0700 (PDT) Message-Id: <20030616.135124.71580008.davem@redhat.com> To: ak@suse.de Cc: janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: <20030616205342.GH30400@wotan.suse.de> References: <3EEE28DE.6040808@us.ibm.com> <20030616.133841.35533284.davem@redhat.com> <20030616205342.GH30400@wotan.suse.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3279 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Mon, 16 Jun 2003 22:53:42 +0200 Why? It will make supporting netconsole easier which has to be careful to never recurse in the network driver. 
printk can check this From cfriesen@nortelnetworks.com Mon Jun 16 14:00:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 14:00:23 -0700 (PDT) Received: from zcars04e.nortelnetworks.com (zcars04e.nortelnetworks.com [47.129.242.56]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GL0D2x011723 for ; Mon, 16 Jun 2003 14:00:14 -0700 Received: from zcard307.ca.nortel.com (americasm03.nt.com [47.129.242.67]) by zcars04e.nortelnetworks.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id h5GL05q29066; Mon, 16 Jun 2003 17:00:06 -0400 (EDT) Received: from zcard0k6.ca.nortel.com ([47.129.242.158]) by zcard307.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id M7BYGKB5; Mon, 16 Jun 2003 17:00:06 -0400 Received: from pcard0ks.ca.nortel.com ([47.129.117.131]) by zcard0k6.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id LV8RAWDV; Mon, 16 Jun 2003 17:00:05 -0400 Received: from nortelnetworks.com (localhost.localdomain [127.0.0.1]) by pcard0ks.ca.nortel.com (Postfix) with ESMTP id 250F12E12F; Mon, 16 Jun 2003 17:00:05 -0400 (EDT) Message-ID: <3EEE2FD4.6060207@nortelnetworks.com> Date: Mon, 16 Jun 2003 17:00:04 -0400 X-Sybari-Space: 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: is it expected behaviour to receive one's own broadcast messages? Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3280 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortelnetworks.com Precedence: bulk X-list: netdev I have an app that is sending out broadcast messages to the local network using 255.255.255.255. It is registered for INADDR_ANY. 
The thing that seems strange is that it receives a copy of every packet that it sends out. Is this expected? The kernel is 2.4.18.

Thanks,

Chris

--
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

From janiceg@us.ibm.com Mon Jun 16 14:02:06 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 14:02:10 -0700 (PDT)
Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GL252x012177 for ; Mon, 16 Jun 2003 14:02:06 -0700
Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e32.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GL1vr9270480; Mon, 16 Jun 2003 17:01:57 -0400
Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GL1TcH110962; Mon, 16 Jun 2003 15:01:29 -0600
Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5GL1RGg027722; Mon, 16 Jun 2003 16:01:27 -0500
Received: from us.ibm.com (dyn95394201.austin.ibm.com [9.53.94.201]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id QAA25094; Mon, 16 Jun 2003 16:01:26 -0500
Message-ID: <3EEE2F9F.60706@us.ibm.com>
Date: Mon, 16 Jun 2003 15:59:11 -0500
From: Janice M Girouard
Organization: IBM Linux Technology Center - Network Device Drivers
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S. Miller"
CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, Daniel Stekloff, Janice Girouard, Larry Kessler, kenistonj@us.ibm.com, jgarzik@pobox.com
Subject: Re: patch for common networking error messages
References: <3EEE28DE.6040808@us.ibm.com> <20030616.133841.35533284.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3281
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: janiceg@us.ibm.com
Precedence: bulk
X-list: netdev

I agree that it's not desirable to introduce a bunch of messages that we aren't already logging. I didn't show the netif_msg prefix because I was trying to focus the patch on the common messages, but you would normally precede a message with:

    if netif_msg_link()
            printk("some text to indicate the link is up/down")

The netif_msg_link test would normally filter out what messages should be logged. Or, just leave out the message call.

I added one or two messages to the tg3 and e1000 drivers to demonstrate where you might use these common messages... just to show that various drivers could use the text. Actually using the specific message would be completely up to the developer.

Janice

David S. Miller wrote:

>   From: Janice M Girouard
>   Date: Mon, 16 Jun 2003 15:30:22 -0500
>
>   EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"
>
>Should indicate flow control state too.
>
>   EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n"
>   EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n"
>
>Oh _ABSOLUTELY NOT_, you're not printing a message
>for normal events like this. Especially those that are
>going to occur on highly loaded systems.
> > > From shemminger@osdl.org Mon Jun 16 14:12:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 14:12:10 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GLC02x012976 for ; Mon, 16 Jun 2003 14:12:01 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5GLBnX11113; Mon, 16 Jun 2003 14:11:49 -0700 Date: Mon, 16 Jun 2003 14:11:48 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] alloc_netdev for shaper Message-Id: <20030616141148.72df9830.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3282 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This converts shaper to allocating an array of pointers to net_device's rather than an array of net_devices. This is necessary because in future, network_device's may live past unregister. Tested with shapecfg to run e1000 at 64kbps. This is against 2.5.71+ bk latest. --- linux-2.5/drivers/net/shaper.c 2003-06-12 13:32:15.000000000 -0700 +++ linux-2.5-sysfs/drivers/net/shaper.c 2003-06-16 13:58:57.000000000 -0700 @@ -630,7 +630,7 @@ static void shaper_init_priv(struct net_ * Add a shaper device to the system */ -static int __init shaper_probe(struct net_device *dev) +static void __init shaper_setup(struct net_device *dev) { /* * Set up the shaper. 
@@ -642,6 +642,7 @@ static int __init shaper_probe(struct ne dev->open = shaper_open; dev->stop = shaper_close; + dev->destructor = (void (*)(struct net_device *))kfree; dev->hard_start_xmit = shaper_start_xmit; dev->get_stats = shaper_get_stats; dev->set_multicast_list = NULL; @@ -669,12 +670,6 @@ static int __init shaper_probe(struct ne dev->addr_len = 0; dev->tx_queue_len = 10; dev->flags = 0; - - /* - * Shaper is ok - */ - - return 0; } static int shapers = 1; @@ -695,35 +690,38 @@ __setup("shapers=", set_num_shapers); #endif /* MODULE */ -static struct net_device *devs; +static struct net_device **devs; static unsigned int shapers_registered = 0; static int __init shaper_init(void) { - int i, err; + int i; size_t alloc_size; - struct shaper *sp; + struct net_device *dev; + char name[IFNAMSIZ]; if (shapers < 1) return -ENODEV; - alloc_size = (sizeof(*devs) * shapers) + - (sizeof(struct shaper) * shapers); + alloc_size = sizeof(*dev) * shapers; devs = kmalloc(alloc_size, GFP_KERNEL); if (!devs) return -ENOMEM; memset(devs, 0, alloc_size); - sp = (struct shaper *) &devs[shapers]; for (i = 0; i < shapers; i++) { - err = dev_alloc_name(&devs[i], "shaper%d"); - if (err < 0) + + snprintf(name, IFNAMSIZ, "shaper%d", i); + dev = alloc_netdev(sizeof(struct shaper), name, + shaper_setup); + if (!dev) break; - devs[i].init = shaper_probe; - devs[i].priv = &sp[i]; - if (register_netdev(&devs[i])) + + if (register_netdev(dev)) break; + + devs[i] = dev; shapers_registered++; } @@ -740,7 +738,8 @@ static void __exit shaper_exit (void) int i; for (i = 0; i < shapers_registered; i++) - unregister_netdev(&devs[i]); + if (devs[i]) + unregister_netdev(devs[i]); kfree(devs); devs = NULL; From davem@redhat.com Mon Jun 16 14:28:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 14:28:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GLSW2x013388 for ; Mon, 16 Jun 2003 14:28:34 
-0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA01153; Mon, 16 Jun 2003 14:24:08 -0700 Date: Mon, 16 Jun 2003 14:24:07 -0700 (PDT) Message-Id: <20030616.142407.123998330.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] alloc_netdev for shaper From: "David S. Miller" In-Reply-To: <20030616141148.72df9830.shemminger@osdl.org> References: <20030616141148.72df9830.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3283 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 16 Jun 2003 14:11:48 -0700 This converts shaper to allocating an array of pointers to net_device's rather than an array of net_devices. This is necessary because in future, network_device's may live past unregister Tested with shapecfg to run e1000 at 64kbps. Applied, thanks. 
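
As a rough userspace illustration of the layout the shaper patch above moves to: each device object is allocated on its own and the table holds only pointers, so an object's lifetime is not tied to the table's. Everything named here (struct fake_dev, struct dev_table, dev_table_init, dev_table_exit) is a hypothetical stand-in and not a kernel API; calloc() and free() merely stand in for alloc_netdev() and the kfree destructor.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device; not the kernel type. */
struct fake_dev {
    char name[16];
};

/* Table of pointers: every device is a separate allocation. */
struct dev_table {
    struct fake_dev **devs;
    unsigned int registered;   /* number of slots actually filled */
};

/* Set up to `want` devices, stopping at the first failure. */
int dev_table_init(struct dev_table *t, unsigned int want)
{
    unsigned int i;

    assert(t != NULL);
    t->devs = NULL;
    t->registered = 0;
    if (want < 1)
        return -1;

    /* The array is sized by the pointer type, not by the device type. */
    t->devs = calloc(want, sizeof(*t->devs));
    if (!t->devs)
        return -1;

    for (i = 0; i < want; i++) {
        struct fake_dev *dev = calloc(1, sizeof(*dev));

        if (!dev)
            break;
        snprintf(dev->name, sizeof(dev->name), "shaper%u", i);
        t->devs[i] = dev;
        t->registered++;
    }

    if (t->registered == 0) {
        free(t->devs);
        t->devs = NULL;
        return -1;
    }
    return 0;
}

/* Release every device that was set up, then the table itself. */
void dev_table_exit(struct dev_table *t)
{
    unsigned int i;

    assert(t != NULL);
    for (i = 0; i < t->registered; i++)
        free(t->devs[i]);   /* stands in for unregister plus destructor */
    free(t->devs);
    t->devs = NULL;
    t->registered = 0;
}
```

A caller would run dev_table_init(&t, n) once at start-up and dev_table_exit(&t) at shutdown. Because the table stores pointers, freeing the table does not by itself end the life of any device, which is the property the patch description asks for.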
From niv@us.ibm.com Mon Jun 16 15:14:34 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:14:51 -0700 (PDT)
Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMER2x014123 for ; Mon, 16 Jun 2003 15:14:34 -0700
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GME82R220618; Mon, 16 Jun 2003 18:14:09 -0400
Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GMDuFD136394; Mon, 16 Jun 2003 16:13:57 -0600
Message-ID: <3EEE40F1.4030107@us.ibm.com>
Date: Mon, 16 Jun 2003 15:13:05 -0700
From: Nivedita Singhvi
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Janice M Girouard
CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, Jeff Garzik, davem@redhat.com
Subject: Re: patch for common networking error messages
References: <3EEE28DE.6040808@us.ibm.com>
In-Reply-To: <3EEE28DE.6040808@us.ibm.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3284
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: niv@us.ibm.com
Precedence: bulk
X-list: netdev

Janice M Girouard wrote:

> Below is a patch that demonstrates standard messages for ethernet device
> drivers. I would like your feedback on the concept of standard network
> messages, and any suggestions for messages to include.

Very useful! I'd like to see a short note on this in Documentation/networking. Or perhaps if there is already a RAS best practices kind of doc or something, add to that? (Sorry, haven't checked.) But it would be handy for people who wanted to contribute patches for other drivers.

Essentially, things like some guidelines on classifying some of those messages, when creating new messages. E.g., when is something a state change and when is it a performance event? I notice some slight ambiguity in your defs (sorry, very minor nitpick :)).

I'd certainly like to see messages from the driver when the card enters/leaves promiscuous mode, as an example of things we'd like to add...

thanks,
Nivedita

> The intent of the standard message change is to:
> 1) Ensure key events are communicated to user space in a predictable
> way, enabling automated diagnostic systems or error log analysis,
> 2) Reduce the number of puzzling messages that are logged -- in this
> case, by replacing them with standard messages, and/or
> 3) Identify the device (or driver name) that is responsible for the error.

From davem@redhat.com Mon Jun 16 15:17:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:17:34 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMHS2x014469 for ; Mon, 16 Jun 2003 15:17:29 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01283; Mon, 16 Jun 2003 15:13:09 -0700
Date: Mon, 16 Jun 2003 15:13:08 -0700 (PDT)
Message-Id: <20030616.151308.55864910.davem@redhat.com>
To: niv@us.ibm.com
Cc: janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com
Subject: Re: patch for common networking error messages
From: "David S. Miller"
In-Reply-To: <3EEE40F1.4030107@us.ibm.com>
References: <3EEE28DE.6040808@us.ibm.com> <3EEE40F1.4030107@us.ibm.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3285 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Nivedita Singhvi Date: Mon, 16 Jun 2003 15:13:05 -0700 I'd certainly like to see messages from the driver when the card enters/leaves promiscuous mode, egrep "promiscuous mode" net/core/dev.c | grep printk From girouard@us.ibm.com Mon Jun 16 15:29:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:29:38 -0700 (PDT) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMTS2x014907 for ; Mon, 16 Jun 2003 15:29:35 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e2.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GMTL9X058734; Mon, 16 Jun 2003 18:29:21 -0400 Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GMTJxt028858; Mon, 16 Jun 2003 18:29:20 -0400 Subject: Re: patch for common networking error messages To: "David S. 
Miller" Cc: Daniel Stekloff , janiceg@us.ibm.com, jgarzik@pobox.com, kenistonj@us.ibm.com, Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com, X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Janice Girouard Date: Mon, 16 Jun 2003 17:29:15 -0500 X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD, THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/16/2003 18:29:19 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3286 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: girouard@us.ibm.com Precedence: bulk X-list: netdev From: "David S. Miller" Date : 06/16/2003 05:13 PM egrep "promiscuous mode" net/core/dev.c | grep printk I noticed when I performed the grep, the printk shows: printk(KERN_INFO "device %s %s promiscuous mode\n" For the sake of consistency and automatic error log analysis, it might be nice to standardize on a message closer to: printk(KERN_INFO "%s: %s promiscuous mode\n" It's somewhat common, but not universal to start the error message with the device name followed by a colon. 
Janice From davem@redhat.com Mon Jun 16 15:32:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:32:15 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMWA2x015244 for ; Mon, 16 Jun 2003 15:32:10 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01370; Mon, 16 Jun 2003 15:27:45 -0700 Date: Mon, 16 Jun 2003 15:27:45 -0700 (PDT) Message-Id: <20030616.152745.124055059.davem@redhat.com> To: girouard@us.ibm.com Cc: stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, kenistonj@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3287 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice Girouard Date: Mon, 16 Jun 2003 17:29:15 -0500 For the sake of consistency and automatic error log analysis, it might be And all the scripts checking for the existing messages in log files? Screw them, right? This whole idea is starting to leave a very bad taste in my mouth... 
From davem@redhat.com Mon Jun 16 15:34:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:34:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMYt2x015556 for ; Mon, 16 Jun 2003 15:34:55 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01407; Mon, 16 Jun 2003 15:30:37 -0700 Date: Mon, 16 Jun 2003 15:30:37 -0700 (PDT) Message-Id: <20030616.153037.90795159.davem@redhat.com> To: akpm@digeo.com Cc: netdev@oss.sgi.com Subject: Re: 2.4.21 oops From: "David S. Miller" In-Reply-To: <20030616153029.0c8f2a20.akpm@digeo.com> References: <20030616153029.0c8f2a20.akpm@digeo.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3288 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andrew Morton Date: Mon, 16 Jun 2003 15:30:29 -0700 This kills 2.5.71++ as well. I forwarded this to David Stevens, he last played in this area so he might be able to fix it fast. He obviously didn't add the bug, since it exists in 2.4.x which doesn't have his IGMPv3 mods... From sim@netnation.com Mon Jun 16 15:37:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:37:31 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMbF2x015879 for ; Mon, 16 Jun 2003 15:37:16 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19S2bC-0001mR-IO; Mon, 16 Jun 2003 15:37:14 -0700 Date: Mon, 16 Jun 2003 15:37:14 -0700 From: Simon Kirby To: "David S. 
Miller"
Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance tests
Message-ID: <20030616223714.GB18484@netnation.com>
References: <20030610075732.GD23009@netnation.com> <20030612.232002.41633789.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030612.232002.41633789.davem@redhat.com>
User-Agent: Mutt/1.5.4i
X-archive-position: 3289
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sim@netnation.com
Precedence: bulk
X-list: netdev

On Thu, Jun 12, 2003 at 11:20:02PM -0700, David S. Miller wrote:

>    In any case, setting gc_min_interval to 0 definitely helped, but I
>    suspect Dave's patches will make a bigger difference. Next up is
>    2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>
> Did you get stuck in some mud? :-) It's been two days.
>
> I even posted new patches for you to test, get on it :)))

Ok, I dug myself out. :)

I have oprofile working, and I wrote a simple application to measure received pps on the receiving box with gettimeofday() accuracy. To reduce noise I am profiling for one-minute periods. The sender is capable of sending about 315,000 pps via an e1000.

So, which kernels shall I try? When I set the thing up I was using 2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch above and with Alexey's.
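
A minimal sketch of how an average rate like the ones reported in this message could be computed from two gettimeofday() samples. The measurement application itself is not shown in this thread, so every name here (sample_now, elapsed_seconds, average_pps, format_report) is hypothetical, and the packet counts are assumed to be supplied by the caller from whatever counter it reads.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Take a wall-clock sample; returns 0 on success, like gettimeofday(). */
int sample_now(struct timeval *tv)
{
    return gettimeofday(tv, NULL);
}

/* Seconds elapsed between two samples, with microsecond resolution. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Average packets per second between two packet-counter readings. */
double average_pps(unsigned long long pkts_start, unsigned long long pkts_end,
                   const struct timeval *start, const struct timeval *end)
{
    double secs = elapsed_seconds(start, end);

    assert(secs > 0.0);   /* the end sample must be later than the start */
    return (double)(pkts_end - pkts_start) / secs;
}

/* Format one report line in the same shape as the results in this message. */
int format_report(char *buf, size_t len, double secs, double pps)
{
    return snprintf(buf, len,
                    "%.4f seconds passed, avg forwarding rate: %.3f pps",
                    secs, pps);
}
```

A measurement loop would call sample_now() and read the packet counter, sleep for about a minute, sample both again, and pass the two pairs to average_pps(). Dividing by the measured interval rather than by the nominal 60 seconds is what makes the reported interval lengths slightly different from run to run.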
Stock 2.4.21-rc7 (CONFIG_IP_MULTIPLE_TABLES=y): 60.0047 seconds passed, avg forwarding rate: 122169.980 pps 60.0052 seconds passed, avg forwarding rate: 123650.166 pps 60.0045 seconds passed, avg forwarding rate: 122352.499 pps 60.0059 seconds passed, avg forwarding rate: 121830.346 pps 60.0046 seconds passed, avg forwarding rate: 121714.614 pps 60.0057 seconds passed, avg forwarding rate: 121927.324 pps 60.0061 seconds passed, avg forwarding rate: 121995.740 pps 60.0064 seconds passed, avg forwarding rate: 122168.417 pps 60.0030 seconds passed, avg forwarding rate: 123245.149 pps 60.0062 seconds passed, avg forwarding rate: 122613.361 pps (CPU type is still Opteron, oprofile is wrong. :)) Cpu type: Hammer Cpu speed was (MHz estimation) : 1393.98 Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000 vma samples % symbol name c0278b00 4537 12.6063 fn_hash_lookup c027a1d0 3293 9.14976 fib_lookup c024aa70 2989 8.30508 rt_intern_hash c024c7d0 2195 6.09892 ip_route_input c024c000 2020 5.61267 ip_route_input_slow c024ed20 1244 3.45652 ip_rcv c0247650 1234 3.42873 eth_header c0276490 1226 3.4065 fib_validate_source c024a710 1173 3.25924 rt_garbage_collect c0132ee0 930 2.58405 kmalloc c0242f60 924 2.56738 neigh_lookup c02502a0 915 2.54237 ip_forward c0132da0 875 2.43123 kmem_cache_alloc c02444e0 735 2.04223 neigh_resolve_output c0249f70 734 2.03946 rt_hash_code c0133070 717 1.99222 kmem_cache_free c024bcd0 666 1.85051 rt_set_nexthop c023bfe0 632 1.75604 __kfree_skb c02778e0 620 1.7227 fib_semantic_match c023bce0 611 1.69769 alloc_skb c0247f30 529 1.46985 pfifo_fast_dequeue c0247830 524 1.45596 eth_type_trans c02404b0 521 1.44762 netif_receive_skb c0132cb0 520 1.44485 free_block c0247a80 487 1.35315 qdisc_restart c01330f0 466 1.2948 kfree c02426e0 456 1.26702 dst_alloc c023fd60 454 1.26146 dev_queue_xmit c0132bd0 401 1.1142 kmem_cache_alloc_batch c0272740 370 1.02806 
inet_select_addr c0252e80 353 0.980828 ip_finish_output c0244350 325 0.903029 neigh_hh_init c0242840 321 0.891914 dst_destroy c010d5c0 315 0.875243 do_gettimeofday c0247ec0 276 0.76688 pfifo_fast_enqueue c024baa0 245 0.680745 ipv4_dst_destroy c023bf70 216 0.600167 kfree_skbmem c0120160 210 0.583495 cpu_raise_softirq c026f1e0 156 0.433454 arp_bind_neighbour c0277990 136 0.377883 __fib_res_prefsrc c027a020 126 0.350097 fib_rules_policy c0279d60 96 0.266741 fib_rule_put c0240370 68 0.188941 net_tx_action c026ec10 37 0.102806 arp_hash c023bf00 34 0.0944707 skb_release_data c0240770 14 0.0388997 net_rx_action size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 118130 56 123028 0 0 0 0 0 0 0 0 123028 123024 0 0 109456 58 122426 0 0 0 0 0 0 0 0 122426 122422 0 0 113318 52 124832 0 0 0 0 0 0 0 0 124832 124828 0 0 104356 53 122131 0 0 0 0 0 0 0 0 122131 122127 0 0 98333 56 125064 0 0 0 0 0 0 0 0 125064 125060 0 0 125925 57 125899 0 0 0 0 0 0 0 0 125899 125896 0 0 117516 44 122676 0 0 0 0 0 0 0 0 122676 122672 0 0 121088 48 124472 0 0 0 0 0 0 0 0 124472 124468 0 0 113049 43 123041 0 0 0 0 0 0 0 0 123041 123037 0 0 104339 43 122377 0 0 0 0 0 0 0 0 122377 122373 0 0 98324 46 125074 0 0 0 0 0 0 0 0 125074 125070 0 0 126818 40 126816 0 0 0 0 0 0 0 0 126816 126813 0 0 131018 54 124958 0 0 0 0 0 0 0 0 124958 124954 0 0 122303 51 122369 0 0 0 0 0 0 0 0 122369 122365 0 0 113661 46 122438 0 0 0 0 0 0 0 0 122438 122434 0 0 104350 49 121771 0 0 0 0 0 0 0 0 121771 121767 0 0 98330 58 125062 0 0 0 0 0 0 0 0 125062 125058 0 0 102841 36 124675 0 0 0 0 0 0 0 0 124675 124671 0 0 131031 49 126507 0 0 0 0 0 0 0 0 126507 126504 0 0 Stock 2.5.70-bk14 (CONFIG_IP_MULTIPLE_TABLES=y): (There is some noise in forward pps rate because the route cache keeps ballooning and collapsing all over the place.) 
60.0042 seconds passed, avg forwarding rate: 102595.362 pps 60.0039 seconds passed, avg forwarding rate: 102690.418 pps 60.0043 seconds passed, avg forwarding rate: 102254.257 pps 60.0036 seconds passed, avg forwarding rate: 102708.344 pps 60.0052 seconds passed, avg forwarding rate: 102647.544 pps 60.0036 seconds passed, avg forwarding rate: 102697.595 pps 60.0042 seconds passed, avg forwarding rate: 102326.652 pps 60.0043 seconds passed, avg forwarding rate: 102615.183 pps 60.0081 seconds passed, avg forwarding rate: 101990.399 pps 60.0036 seconds passed, avg forwarding rate: 102220.386 pps Cpu type: Athlon Cpu speed was (MHz estimation) : 1394.27 Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000 vma samples % symbol name c0290480 8213 15.6283 rt_garbage_collect c011ee80 4979 9.47443 local_bh_enable c02bf140 3616 6.8808 fn_hash_lookup c02906d0 1977 3.76199 rt_intern_hash c0291b10 1949 3.70871 ip_route_input_slow c028d1d0 1647 3.13404 nf_iterate c028d4c0 1575 2.99703 nf_hook_slow c0220c10 1513 2.87905 tg3_start_xmit c02c0a00 1470 2.79723 fib_lookup c02923e0 1169 2.22446 ip_route_input c0220010 895 1.70308 tg3_rx c0134e20 877 1.66882 kmem_cache_free c0134d60 863 1.64218 kmem_cache_alloc c028e270 855 1.62696 pfifo_fast_dequeue c0134bf0 855 1.62696 free_block c02bcf30 837 1.59271 fib_validate_source c0294450 833 1.5851 ip_rcv_finish c0286b20 782 1.48805 netif_receive_skb c028db70 765 1.4557 eth_header c0293ee0 750 1.42716 ip_rcv c0290190 727 1.38339 rt_may_expire c028fda0 689 1.31108 rt_hash_code c0295210 679 1.29205 ip_forward c0134a30 648 1.23306 cache_alloc_refill c0134da0 636 1.21023 kmalloc c0298870 591 1.1246 ip_finish_output2 c0134e60 520 0.989496 kfree c01ad3d0 504 0.95905 memcpy c028a860 498 0.947633 neigh_resolve_output c0282c90 477 0.907672 alloc_skb c02c08e0 474 0.901964 fib_rules_policy c02be300 474 0.901964 fib_semantic_match c0289870 471 0.896255 
neigh_lookup c028dce0 466 0.886741 eth_type_trans c0289160 445 0.84678 dst_alloc c0295450 442 0.841072 ip_forward_finish c02865e0 407 0.774471 dev_queue_xmit c02b7950 383 0.728802 inet_select_addr c0289290 344 0.65459 dst_destroy c0296790 325 0.618435 ip_finish_output c02b4140 320 0.608921 arp_hash c021fc50 317 0.603212 tg3_tx c02d0090 313 0.595601 ipv4_sabotage_out c028e1f0 306 0.58228 pfifo_fast_enqueue c021fec0 296 0.563252 tg3_recycle_rx c028a6e0 294 0.559446 neigh_hh_init size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 5402 29 82379 0 0 0 0 0 0 0 0 76979 76401 575 575 131075 29 125867 0 0 0 0 0 0 0 0 123076 122879 194 194 117660 26 118554 0 0 0 0 0 0 0 0 110363 109467 896 896 86134 22 100138 0 0 0 0 0 0 0 0 91947 91353 591 591 48682 28 94224 0 0 0 0 0 0 0 0 86033 85427 603 603 3419 28 86216 0 0 0 0 0 0 0 0 82916 82390 523 523 131075 25 127871 0 0 0 0 0 0 0 0 122980 122879 98 98 116937 33 117383 0 0 0 0 0 0 0 0 109192 108744 448 448 83306 17 98079 0 0 0 0 0 0 0 0 89888 89248 637 637 43585 17 91951 0 0 0 0 0 0 0 0 83760 83158 599 599 715 23 88681 0 0 0 0 0 0 0 0 88081 87487 591 591 104407 37 134703 0 0 0 0 0 0 0 0 127112 127111 0 0 67650 14 94902 0 0 0 0 0 0 0 0 86711 86122 587 587 23861 25 87875 0 0 0 0 0 0 0 0 79684 79090 591 591 670 26 108386 0 0 0 0 0 0 0 0 107786 107211 572 572 131075 38 130550 0 0 0 0 0 0 0 0 122959 122879 77 77 99588 14 108546 0 0 0 0 0 0 0 0 100355 99586 768 768 61804 27 93912 0 0 0 0 0 0 0 0 85721 85095 624 624 14826 27 84721 0 0 0 0 0 0 0 0 76530 75901 626 626 2.4.71 (CONFIG_IP_MULTIPLE_TABLES=y) w/correction to make flow_cache_init compile w/CONFIG_SMP=n (register_cpu_notifier): 60.0060 seconds passed, avg forwarding rate: 103857.780 pps 60.0036 seconds passed, avg forwarding rate: 104893.408 pps 60.0061 seconds passed, avg forwarding rate: 104623.946 pps 60.0040 seconds passed, avg forwarding rate: 104457.440 pps 60.0057 seconds passed, avg forwarding rate: 104505.375 pps 60.0042 seconds 
passed, avg forwarding rate: 103663.532 pps 60.0043 seconds passed, avg forwarding rate: 104240.425 pps 60.0042 seconds passed, avg forwarding rate: 104422.699 pps 60.0034 seconds passed, avg forwarding rate: 104252.729 pps 60.0058 seconds passed, avg forwarding rate: 104138.597 pps Cpu type: Athlon Cpu speed was (MHz estimation) : 1394.27 Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000 vma samples % symbol name c0292270 8941 16.2664 rt_garbage_collect c011f080 5295 9.63323 local_bh_enable c02c0fb0 3943 7.17353 fn_hash_lookup c02924c0 2209 4.01885 rt_intern_hash c0293900 2115 3.84783 ip_route_input_slow c028ef90 1725 3.1383 nf_iterate c028f280 1673 3.0437 nf_hook_slow c02c2880 1536 2.79445 fib_lookup c02941d0 1381 2.51246 ip_route_input c0222330 1307 2.37783 tg3_start_xmit c0296240 1000 1.81931 ip_rcv_finish c0134ff0 961 1.74835 free_block c0221710 918 1.67012 tg3_rx c0290030 909 1.65375 pfifo_fast_dequeue c0135230 861 1.56642 kmem_cache_free c0295cd0 844 1.53549 ip_rcv c02bed90 835 1.51912 fib_validate_source c0135170 822 1.49547 kmem_cache_alloc c028f930 818 1.48819 eth_header c0291f80 741 1.34811 rt_may_expire c02886a0 726 1.32082 netif_receive_skb c0291b80 708 1.28807 rt_hash_code c0134e20 684 1.24441 cache_alloc_refill c01351b0 644 1.17163 __kmalloc c028b610 595 1.08249 neigh_lookup c028c600 542 0.986064 neigh_resolve_output c0284620 538 0.978787 alloc_skb c029a680 534 0.97151 ip_finish_output2 c0135270 510 0.927846 kfree c01adc80 484 0.880544 memcpy c028af00 468 0.851435 dst_alloc c02c0160 438 0.796856 fib_semantic_match c028faa0 434 0.789579 eth_type_trans c0297020 419 0.762289 ip_forward c0288160 418 0.76047 dev_queue_xmit c02c2760 398 0.724084 fib_rules_policy c0297260 394 0.716807 ip_forward_finish c02b9790 391 0.711349 inet_select_addr c0221e90 374 0.680421 tg3_set_txd c028ffb0 348 0.633119 pfifo_fast_enqueue c028fcc0 339 0.616745 qdisc_restart 
c028b030 326 0.593094 dst_destroy
c02d1fb0 301 0.547611 ipv4_sabotage_out
c02985a0 299 0.543973 ip_finish_output
c0128a00 293 0.533057 call_rcu
c0221350 284 0.516683 tg3_tx

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
26127 24 92112 0 0 0 0 0 0 0 0 83920 83318 599 599
722 32 106131 0 0 0 0 0 0 0 0 105531 104945 583 583
131075 23 130793 0 0 0 0 0 0 0 0 123201 122879 319 319
131074 27 131449 0 0 0 0 0 0 0 0 123257 122878 376 376
120198 23 120965 0 0 0 0 0 0 0 0 112773 112005 768 768
93359 26 104746 0 0 0 0 0 0 0 0 96554 96040 511 511
59620 17 97943 0 0 0 0 0 0 0 0 89751 89140 608 608
18593 20 90640 0 0 0 0 0 0 0 0 82448 81851 593 593
70 23 113117 0 0 0 0 0 0 0 0 113117 112479 636 636
131075 22 131306 0 0 0 0 0 0 0 0 123114 122879 232 232
110767 18 111534 0 0 0 0 0 0 0 0 103342 102574 768 768
79859 20 100776 0 0 0 0 0 0 0 0 92584 91971 610 610
43481 24 95328 0 0 0 0 0 0 0 0 87136 86500 632 632
704 34 88794 0 0 0 0 0 0 0 0 88194 87591 601 601
131075 36 130692 0 0 0 0 0 0 0 0 123100 122879 218 218
110844 25 111611 0 0 0 0 0 0 0 0 103419 102651 768 768
80008 18 100862 0 0 0 0 0 0 0 0 92670 92043 624 624
43390 24 95064 0 0 0 0 0 0 0 0 86872 86260 608 608
720 31 88869 0 0 0 0 0 0 0 0 88269 87682 585 585

2.4.71 (CONFIG_IP_MULTIPLE_TABLES=n)

60.0039 seconds passed, avg forwarding rate: 108482.881 pps
60.0036 seconds passed, avg forwarding rate: 107850.012 pps
60.0043 seconds passed, avg forwarding rate: 108330.941 pps
60.0063 seconds passed, avg forwarding rate: 108424.657 pps
60.0071 seconds passed, avg forwarding rate: 108575.916 pps
60.0040 seconds passed, avg forwarding rate: 107774.861 pps
60.0053 seconds passed, avg forwarding rate: 107765.720 pps
60.0065 seconds passed, avg forwarding rate: 108021.888 pps
60.0039 seconds passed, avg forwarding rate: 107364.055 pps
60.0061 seconds passed, avg forwarding rate: 107593.173 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0292260 7149 13.2856 rt_garbage_collect
c011f080 6875 12.7764 local_bh_enable
c02c0df0 3440 6.39286 fn_hash_lookup
c02924b0 2180 4.05129 rt_intern_hash
c02938c0 2158 4.01041 ip_route_input_slow
c028ef90 1769 3.28749 nf_iterate
c0222330 1644 3.05519 tg3_start_xmit
c028f280 1601 2.97528 nf_hook_slow
c02940f0 1209 2.24679 ip_route_input
c02bec10 1187 2.20591 fib_validate_source
c0135230 987 1.83423 kmem_cache_free
c0134ff0 950 1.76547 free_block
c0135170 924 1.71715 kmem_cache_alloc
c0295c00 918 1.706 ip_rcv
c0221710 908 1.68742 tg3_rx
c0290020 907 1.68556 pfifo_fast_dequeue
c0296170 897 1.66698 ip_rcv_finish
c028f920 808 1.50158 eth_header
c0291b70 804 1.49415 rt_hash_code
c02886a0 740 1.37521 netif_receive_skb
c0134e20 739 1.37335 cache_alloc_refill
c0291f70 736 1.36778 rt_may_expire
c0296f50 703 1.30645 ip_forward
c028b610 676 1.25627 neigh_lookup
c01351b0 669 1.24326 __kmalloc
c02bffa0 637 1.18379 fib_semantic_match
c029a5b0 635 1.18008 ip_finish_output2
c0135270 614 1.14105 kfree
c0284620 544 1.01096 alloc_skb
c028c600 515 0.957071 neigh_resolve_output
c01adc80 498 0.925479 memcpy
c028fa90 464 0.862293 eth_type_trans
c0288160 457 0.849285 dev_queue_xmit
c0297190 425 0.789816 ip_forward_finish
c028af00 420 0.780524 dst_alloc
c02b9680 411 0.763799 inet_select_addr
c028ffa0 350 0.650437 pfifo_fast_enqueue
c028b030 350 0.650437 dst_destroy
c02984d0 330 0.613269 ip_finish_output
c0221350 326 0.605835 tg3_tx
c0128a00 324 0.602119 call_rcu
c028fcb0 323 0.60026 qdisc_restart
c028c480 323 0.60026 neigh_hh_init
c02215c0 291 0.540792 tg3_recycle_rx
c02d0e80 258 0.479465 ipv4_sabotage_in
c02d0ec0 257 0.477606 ipv4_sabotage_out

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
103545 28 107316 0 0 0 0 0 0 0 0 99124 98495 626 626
76151 24 104300 0 0 0 0 0 0 0 0 96108 95485 620 620
10200 26 83194 0 0 0 0 0 0 0 0 75002 74396 603 603
702 38 122078 0 0 0 0 0 0 0 0 121478 120872 603 603
131075 39 130952 0 0 0 0 0 0 0 0 123360 122879 478 478
126869 28 126996 0 0 0 0 0 0 0 0 118804 118676 128 128
85330 22 107046 0 0 0 0 0 0 0 0 98854 98259 591 591
50739 22 97102 0 0 0 0 0 0 0 0 88910 88288 620 620
10501 17 91459 0 0 0 0 0 0 0 0 83267 82641 623 623
689 34 121790 0 0 0 0 0 0 0 0 121190 120571 616 616
131075 33 130963 0 0 0 0 0 0 0 0 123371 122879 489 489
110335 44 126916 0 0 0 0 0 0 0 0 118724 118659 64 64
81862 16 103228 0 0 0 0 0 0 0 0 95036 94406 628 628
50717 24 100536 0 0 0 0 0 0 0 0 92344 91734 607 607
12301 33 93251 0 0 0 0 0 0 0 0 85059 84463 593 593
684 33 119995 0 0 0 0 0 0 0 0 119395 118771 621 621
131074 26 146318 0 0 0 0 0 0 0 0 138726 138610 113 113
114248 25 115015 0 0 0 0 0 0 0 0 106823 106055 768 768
88917 18 106318 0 0 0 0 0 0 0 0 98126 97548 575 575
57970 24 100752 0 0 0 0 0 0 0 0 92560 91932 625 625

2.5.71-davem-rtcache-jun9:

60.006 seconds passed, avg forwarding rate: 160182.941 pps
60.0077 seconds passed, avg forwarding rate: 159805.476 pps
60.007 seconds passed, avg forwarding rate: 160274.907 pps
60.0084 seconds passed, avg forwarding rate: 160212.101 pps
60.0045 seconds passed, avg forwarding rate: 159345.161 pps
60.0076 seconds passed, avg forwarding rate: 159552.768 pps
60.0046 seconds passed, avg forwarding rate: 159416.702 pps
60.0035 seconds passed, avg forwarding rate: 160435.829 pps
60.0043 seconds passed, avg forwarding rate: 160015.150 pps
60.0072 seconds passed, avg forwarding rate: 159309.661 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c02c0fa0 4875 8.75586 fn_hash_lookup
c02939f0 3839 6.89513 ip_route_input_slow
c028f040 2507 4.50276 nf_iterate
c028f330 2450 4.40038 nf_hook_slow
c0222330 2377 4.26927 tg3_start_xmit
c02bedc0 1722 3.09284 fib_validate_source
c0134ff0 1478 2.6546 free_block
c0292560 1444 2.59353 rt_intern_hash
c0221710 1409 2.53067 tg3_rx
c0135230 1406 2.52528 kmem_cache_free
c0296320 1399 2.51271 ip_rcv_finish
c0294260 1281 2.30077 ip_route_input
c02886a0 1235 2.21815 netif_receive_skb
c028f9d0 1194 2.14451 eth_header
c028af00 1153 2.07087 dst_alloc
c0291c20 1120 2.0116 rt_hash_code
c0295db0 1111 1.99544 ip_rcv
c0135170 1109 1.99185 kmem_cache_alloc
c0134e20 1109 1.99185 cache_alloc_refill
c02900d0 1016 1.82481 pfifo_fast_dequeue
c028b6c0 1014 1.82122 neigh_lookup
c01351b0 1003 1.80146 __kmalloc
c02c0150 894 1.60569 fib_semantic_match
c01adc80 838 1.50511 memcpy
c029a760 837 1.50331 ip_finish_output2
c028c6b0 801 1.43866 neigh_resolve_output
c0288160 792 1.42249 dev_queue_xmit
c0135270 789 1.4171 kfree
c0284620 740 1.32909 alloc_skb
c011f080 661 1.1872 local_bh_enable
c028fb40 656 1.17822 eth_type_trans
c02b9830 648 1.16386 inet_select_addr
c0297340 601 1.07944 ip_forward_finish
c02929d0 591 1.06148 __rt_hash_shrink
c0290050 540 0.96988 pfifo_fast_enqueue
c028b0e0 532 0.955511 dst_destroy
c0298680 511 0.917794 ip_finish_output
c0128a00 501 0.899833 call_rcu
c0297100 479 0.860319 ip_forward
c02d1070 478 0.858523 ipv4_sabotage_out
c0221350 451 0.810029 tg3_tx
c028c530 431 0.774108 neigh_hh_init
c02215c0 426 0.765127 tg3_recycle_rx
c028fd60 394 0.707653 qdisc_restart
c02d1030 375 0.673528 ipv4_sabotage_in
c0292310 359 0.64479 rt_garbage_collect

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
21502 9 160507 0 0 0 0 0 0 0 0 160507 160507 0 0
23701 15 160485 0 0 0 0 0 0 0 0 160485 160484 1 0
21866 6 160498 0 0 0 0 0 0 0 0 160498 160498 0 0
23551 12 160464 0 0 0 0 0 0 0 0 160464 160464 0 0
23266 13 160203 0 0 0 0 0 0 0 0 160203 160203 0 0
22095 9 160591 0 0 0 0 0 0 0 0 160591 160591 0 0
23962 15 160461 0 0 0 0 0 0 0 0 160461 160460 0 0
22066 13 158691 0 0 0 0 0 0 0 0 158691 158691 0 0
22951 10 160166 0 0 0 0 0 0 0 0 160166 160166 0 0
21861 18 159134 0 0 0 0 0 0 0 0 159134 159134 0 0
21097 6 159126 0 0 0 0 0 0 0 0 159126 159125 1 0
22943 6 161350 0 0 0 0 0 0 0 0 161350 161350 0 0
21692 4 160124 0 0 0 0 0 0 0 0 160124 160124 0 0
23524 16 161184 0 0 0 0 0 0 0 0 161184 161184 0 0
20471 15 160833 0 0 0 0 0 0 0 0 160833 160833 0 0
23160 5 161643 0 0 0 0 0 0 0 0 161643 161642 0 0
21981 10 160518 0 0 0 0 0 0 0 0 160518 160518 0 0
20640 11 160145 0 0 0 0 0 0 0 0 160145 160145 0 0
21536 14 160194 0 0 0 0 0 0 0 0 160194 160194 0 0
21110 10 161550 0 0 0 0 0 0 0 0 161550 161550 0 0

...What next? :)

Simon-

From shemminger@osdl.org Mon Jun 16 15:43:21 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:43:31 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMhL2x016219
	for ; Mon, 16 Jun 2003 15:43:21 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60])
	by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5GMgqX31743;
	Mon, 16 Jun 2003 15:42:52 -0700
Date: Mon, 16 Jun 2003 15:42:51 -0700
From: Stephen Hemminger
To: Chad Tindel, Jay Vosburgh, "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH 2.5.71] Fix module owner for bonding driver
Message-Id: <20030616154251.28c3e3ee.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
	/`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 3290
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

The bonding driver does explicit MOD_INC/DEC even though it does
a SET_MODULE_OWNER. So it changes module use counts on open/close on 2.5
when it shouldn't.
It also has a /proc entry which does not affect the module use counts
when it should.

Suggestion: the /proc interface could/should be converted to seq_file.
Volunteers? Janitor project?

diff -Nru a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c	Mon Jun 16 15:24:37 2003
+++ b/drivers/net/bonding/bond_main.c	Mon Jun 16 15:24:37 2003
@@ -952,8 +952,6 @@
 		add_timer(alb_timer);
 	}

-	MOD_INC_USE_COUNT;
-
 	if (miimon > 0) { /* link check interval, in milliseconds. */
 		init_timer(timer);
 		timer->expires = jiffies + (miimon * HZ / 1000);
@@ -1027,7 +1025,6 @@
 		bond_alb_deinitialize(bond);
 	}

-	MOD_DEC_USE_COUNT;
 	return 0;
 }

@@ -3694,6 +3691,8 @@
 		kfree(bond);
 		return -ENOMEM;
 	}
+	bond->bond_proc_dir->owner = THIS_MODULE;
+
 	bond->bond_proc_info_file =
 		create_proc_info_entry("info", 0,
 				       bond->bond_proc_dir, bond_get_info);
@@ -3705,6 +3704,7 @@
 		kfree(bond);
 		return -ENOMEM;
 	}
+	bond->bond_proc_info_file->owner = THIS_MODULE;
 #endif /* CONFIG_PROC_FS */

 	if (first_pass == 1) {

From niv@us.ibm.com Mon Jun 16 15:46:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:46:19 -0700 (PDT)
Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMkE2x016532
	for ; Mon, 16 Jun 2003 15:46:15 -0700
Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32])
	by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GMk62R244562;
	Mon, 16 Jun 2003 18:46:06 -0400
Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82])
	by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GMk2cH105424;
	Mon, 16 Jun 2003 16:46:04 -0600
Message-ID: <3EEE4880.3080505@us.ibm.com>
Date: Mon, 16 Jun 2003 15:45:20 -0700
From: Nivedita Singhvi
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S.
 Miller"
CC: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com,
	jgarzik@pobox.com, kenistonj@us.ibm.com, lkessler@us.ibm.com,
	linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: patch for common networking error messages
References: <20030616.152745.124055059.davem@redhat.com>
In-Reply-To: <20030616.152745.124055059.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3291
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: niv@us.ibm.com
Precedence: bulk
X-list: netdev

David S. Miller wrote:

>    From: Janice Girouard
>    Date: Mon, 16 Jun 2003 17:29:15 -0500
>
>    For the sake of consistency and automatic error log analysis, it might be
>
> And all the scripts checking for the existing messages
> in log files?  Screw them, right?

Are you saying we never get to change any current log messages
ever again on account of the scripts that are monitoring for those
precise words? Hope not :)

I'd agree a lot of thought (and agreement :)) has to go into this
before changing minor nits and stuff, and not causing too much
disruption.. Evolution, as opposed to revolution ;). I would hope
that most wouldn't need changing..

thanks,
Nivedita

From davem@redhat.com Mon Jun 16 15:48:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:48:34 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMmS2x016839
	for ; Mon, 16 Jun 2003 15:48:28 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01482;
	Mon, 16 Jun 2003 15:44:01 -0700
Date: Mon, 16 Jun 2003 15:44:01 -0700 (PDT)
Message-Id: <20030616.154401.132900800.davem@redhat.com>
To: sim@netnation.com
Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net,
	fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance tests
From: "David S. Miller"
In-Reply-To: <20030616223714.GB18484@netnation.com>
References: <20030610075732.GD23009@netnation.com>
	<20030612.232002.41633789.davem@redhat.com>
	<20030616223714.GB18484@netnation.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3292
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Simon Kirby
   Date: Mon, 16 Jun 2003 15:37:14 -0700

   So, which kernels shall I try?  When I set the thing up I was using
   2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch
   above and with Alexey's.

Thanks for your profiles.

I pushed all of our current work to Linus's tree.
But for your convenience here are the routing diffs
against plain 2.5.71

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	ChangeSet	1.1318.1.15 -> 1.1318.1.16
#	net/ipv4/route.c	1.63 -> 1.64
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16	kuznet@ms2.inr.ac.ru	1.1318.1.16
# [IPV4]: More sane rtcache behavior.
# 1) More reasonable ip_rt_gc_min_interval default
# 2) Trim less valuable entries in hash chain during
#    rt_intern_hash when such chains grow too long.
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c	Mon Jun 16 15:45:20 2003
+++ b/net/ipv4/route.c	Mon Jun 16 15:45:20 2003
@@ -111,7 +111,7 @@
 int ip_rt_max_size;
 int ip_rt_gc_timeout = RT_GC_TIMEOUT;
 int ip_rt_gc_interval = 60 * HZ;
-int ip_rt_gc_min_interval = 5 * HZ;
+int ip_rt_gc_min_interval = HZ / 2;
 int ip_rt_redirect_number = 9;
 int ip_rt_redirect_load = HZ / 50;
 int ip_rt_redirect_silence = ((HZ / 50) << (9 + 1));
@@ -456,6 +456,25 @@
 out:	return ret;
 }

+/* Bits of score are:
+ * 31: very valuable
+ * 30: not quite useless
+ * 29..0: usage counter
+ */
+static inline u32 rt_score(struct rtable *rt)
+{
+	u32 score = rt->u.dst.__use;
+
+	if (rt_valuable(rt))
+		score |= (1<<31);
+
+	if (!rt->fl.iif ||
+	    !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
+		score |= (1<<30);
+
+	return score;
+}
+
 /* This runs via a timer and thus is always in BH context. */
 static void rt_check_expire(unsigned long dummy)
 {
@@ -721,6 +740,9 @@
 {
 	struct rtable *rth, **rthp;
 	unsigned long now = jiffies;
+	struct rtable *cand = NULL, **candp = NULL;
+	u32 min_score = ~(u32)0;
+	int chain_length = 0;
 	int attempts = !in_softirq();

 restart:
@@ -755,7 +777,33 @@
 			return 0;
 		}

+		if (!atomic_read(&rth->u.dst.__refcnt)) {
+			u32 score = rt_score(rth);
+
+			if (score <= min_score) {
+				cand = rth;
+				candp = rthp;
+				min_score = score;
+			}
+		}
+
+		chain_length++;
+
 		rthp = &rth->u.rt_next;
+	}
+
+	if (cand) {
+		/* ip_rt_gc_elasticity used to be average length of chain
+		 * length, when exceeded gc becomes really aggressive.
+		 *
+		 * The second limit is less certain. At the moment it allows
+		 * only 2 entries per bucket. We will see.
+		 */
+		if (chain_length > ip_rt_gc_elasticity ||
+		    (chain_length > 1 && !(min_score & (1<<31)))) {
+			*candp = cand->u.rt_next;
+			rt_free(cand);
+		}
 	}

 	/* Try to bind route to arp only if it is output

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	ChangeSet	1.1320.1.1 -> 1.1320.1.2
#	net/ipv4/route.c	1.64 -> 1.65
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16	robert.olsson@data.slu.se	1.1320.1.2
# [IPV4]: In rt_intern_hash, reinit all state vars on branch to "restart".
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c	Mon Jun 16 15:46:05 2003
+++ b/net/ipv4/route.c	Mon Jun 16 15:46:05 2003
@@ -739,13 +739,19 @@
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable *rth, **rthp;
-	unsigned long now = jiffies;
-	struct rtable *cand = NULL, **candp = NULL;
-	u32 min_score = ~(u32)0;
-	int chain_length = 0;
+	unsigned long now;
+	struct rtable *cand, **candp;
+	u32 min_score;
+	int chain_length;
 	int attempts = !in_softirq();

 restart:
+	chain_length = 0;
+	min_score = ~(u32)0;
+	cand = NULL;
+	candp = NULL;
+	now = jiffies;
+
 	rthp = &rt_hash_table[hash].chain;

 	spin_lock_bh(&rt_hash_table[hash].lock);

From jgarzik@pobox.com Mon Jun 16 15:48:30 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:48:36 -0700 (PDT)
Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMmS2x016838
	for ; Mon, 16 Jun 2003 15:48:29 -0700
Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com)
	by www.linux.org.uk with esmtp (Exim 4.14) id 19S2m2-0007pR-79;
	Mon, 16 Jun 2003 23:48:26 +0100
Message-ID: <3EEE492E.9080308@pobox.com>
Date: Mon, 16 Jun 2003 18:48:14 -0400
From: Jeff Garzik
Organization: none
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk
X-Accept-Language: en
MIME-Version: 1.0
To: Janice M Girouard
CC: "David S.
 Miller", linux-kernel@vger.kernel.org, netdev@oss.sgi.com,
	Daniel Stekloff, Janice Girouard, Larry Kessler, kenistonj@us.ibm.com
Subject: Re: patch for common networking error messages
References: <3EEE28DE.6040808@us.ibm.com>
	<20030616.133841.35533284.davem@redhat.com>
	<3EEE2F9F.60706@us.ibm.com>
In-Reply-To: <3EEE2F9F.60706@us.ibm.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3293
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

Janice M Girouard wrote:
> I agree that it's not desirable to introduce a bunch of messages that we
> aren't already logging. I didn't show the netif_msg prefix because I
> was trying to focus the patch on the common messages, but you would
> normally precede a message with:
>
> if netif_msg_link()
>     printk("some text to indicate the link is up/down")
>
> The netif_msg_link test would normally filter out what messages should
> be logged.

There are several issues at play here.

1) In general, I think you're approaching the logging from the wrong
angle. Start with netif_msg_xxx/NETIF_MSG_xxx first, and figure out the
logging API for those cases. These cover the majority of common cases,
and most are not specific to hardware at all. Starting at the driver
level and trying to move driver-specific messages into the upper layers
is the wrong direction, I feel.

2) If we are going to do major surgery on messages, make them more
computer-parseable at the same time. Human readable, since it must
above-all-else be kernel hacker readable, ... but computer parseable.

Here is an example. DISCLAIMER: No doubt there is a better format, it
is merely for illustration.

	"%s: performance event: scatter/gather I/O disabled\n"
becomes
	"dev=%s evt=perf sgio=disabled\n"

Basically a key-value format. Resist the urge to use numeric response codes.
For stuff like this, I think both Linus and the typical human brain
prefer English words to numeric response codes. This suggested output
is not unlike some arch's show-processor-state sysrq output.

3) _Somebody_ needs to do some "ground pounding", and figure out what
info sysadmins and users want to see. Event logging in general, so far,
seems to me more like a management checklist item than a real user
need... but I am quite willing to be proved wrong. Until we get
feedback along these lines, I tend to resist changes like this in
general. My initial read of your attached patch was that it was a long
of source churn, and I couldn't fathom what any user would gain from it
all.

	Jeff

There are a whole bunch of netif_msg_xxx and corresponding NETIF_MSG_xxx
bits. I don't see much need to change that I think getting the logging
API right for those would be an important first step.

	Jeff

P.S. It is important to note the bits are laid out in increasing
verbosity.

From girouard@us.ibm.com Mon Jun 16 15:50:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:50:54 -0700 (PDT)
Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMoo2x017463
	for ; Mon, 16 Jun 2003 15:50:51 -0700
Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206])
	by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GMohpS228942;
	Mon, 16 Jun 2003 18:50:44 -0400
Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GMofxt029882;
	Mon, 16 Jun 2003 18:50:42 -0400
Subject: Re: patch for common networking error messages
To: "David S.
 Miller"
Cc: Daniel Stekloff, janiceg@us.ibm.com, jgarzik@pobox.com,
	kenistonj@us.ibm.com, Larry Kessler, linux-kernel@vger.kernel.org,
	netdev@oss.sgi.com, niv@us.ibm.com
X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001
Message-ID:
From: Janice Girouard
Date: Mon, 16 Jun 2003 17:50:08 -0500
X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD,
	THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/16/2003 18:50:41
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
X-archive-position: 3294
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: girouard@us.ibm.com
Precedence: bulk
X-list: netdev

From: David S. Miller
Date: 06/16/2003 05:27 PM

And all the scripts checking for the existing messages
in log files?  Screw them, right?

That's a good point. One possible suggestion would be to submit more than
one stdmsgs.h files. One a legacy file, and one that is more consistent
from message to message.. shooting for a gradual migration.

Ultimately, I think standard messages would greatly support/simplify
scripts, especially between the myriad of ethernet drivers. Each one
reports the data slightly differently, so your error log analysis needs to
recognize 100 or so ways of being told that the link just went down.

Janice

"David S. Miller"
06/16/2003 05:27 PM

cc:      Daniel Stekloff/Beaverton/IBM@IBMUS, janiceg@us.ltcfwd.linux.ibm.com,
         jgarzik@pobox.com, kenistonj@us.ibm.com,
         Larry Kessler/Beaverton/IBM@IBMUS, linux-kernel@vger.kernel.org,
         netdev@oss.sgi.com, niv@us.ltcfwd.linux.ibm.com
Subject: Re: patch for common networking error messages

   From: Janice Girouard
   Date: Mon, 16 Jun 2003 17:29:15 -0500

   For the sake of consistency and automatic error log analysis, it might be

And all the scripts checking for the existing messages
in log files?  Screw them, right?

This whole idea is starting to leave a very bad taste in my mouth...

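The key=value log format floated earlier in this thread ("dev=%s evt=perf sgio=disabled\n") is simple to consume from a log-analysis tool. What follows is only a minimal userspace C sketch under stated assumptions: kv_get() is a hypothetical helper written for illustration (it is not a kernel or libc API), and the device name "eth0" in the usage note is an assumed stand-in for the %s in the proposed format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any kernel or library API):
 * look up `key` in a line of space-separated key=value pairs, such as
 * "dev=eth0 evt=perf sgio=disabled\n", and copy its value into `out`.
 * Returns 1 if the key was found and the value fit in `out`, 0 otherwise.
 */
static int kv_get(const char *line, const char *key, char *out, size_t outlen)
{
    size_t klen;
    const char *p = line;

    assert(line && key && out && outlen > 0);
    klen = strlen(key);

    while (*p) {
        const char *end;
        size_t toklen;

        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\n')
            p++;
        if (!*p)
            break;

        /* Find the end of the current token. */
        end = p;
        while (*end && *end != ' ' && *end != '\n')
            end++;
        toklen = (size_t)(end - p);

        /* The token matches only if it starts with exactly "key=". */
        if (toklen > klen && p[klen] == '=' && strncmp(p, key, klen) == 0) {
            size_t vlen = toklen - klen - 1;

            if (vlen >= outlen)
                return 0;   /* value does not fit in the caller's buffer */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end;
    }
    return 0;
}
```

With a 16-byte buffer, kv_get("dev=eth0 evt=perf sgio=disabled\n", "sgio", buf, sizeof(buf)) returns 1 and leaves "disabled" in buf, while a key that is absent from the line returns 0. Matching on the whole "key=" prefix keeps a lookup for "dev" from accidentally matching a token such as "device=...".
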
From niv@us.ibm.com Mon Jun 16 15:51:27 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:51:35 -0700 (PDT)
Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMpQ2x017668
	for ; Mon, 16 Jun 2003 15:51:27 -0700
Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32])
	by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5GMpIll233230;
	Mon, 16 Jun 2003 18:51:18 -0400
Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82])
	by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5GMpGcH109964;
	Mon, 16 Jun 2003 16:51:17 -0600
Message-ID: <3EEE49BA.6070401@us.ibm.com>
Date: Mon, 16 Jun 2003 15:50:34 -0700
From: Nivedita Singhvi
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S. Miller"
CC: janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com,
	stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com,
	kenistonj@us.ibm.com, jgarzik@pobox.com
Subject: Re: patch for common networking error messages
References: <3EEE28DE.6040808@us.ibm.com> <3EEE40F1.4030107@us.ibm.com>
	<20030616.151308.55864910.davem@redhat.com>
In-Reply-To: <20030616.151308.55864910.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3295
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: niv@us.ibm.com
Precedence: bulk
X-list: netdev

David S. Miller wrote:

>    From: Nivedita Singhvi
>    Date: Mon, 16 Jun 2003 15:13:05 -0700
>
>    I'd certainly like to see messages from the driver when the
>    card enters/leaves promiscuous mode,
>
> egrep "promiscuous mode" net/core/dev.c | grep printk

Yeah, but dev_mc_upload() doesn't return any status ;).
(For those of us who distrust hw (Sorry Scott! :))).
But it was just my example, I assure you. I'm not holding
up a flag in the wind on this particular nit.

I do see positives in the feature as a whole though.
It would be a shame to get grounded for minor things..

thanks,
Nivedita

From jgarzik@pobox.com Mon Jun 16 15:52:25 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:52:31 -0700 (PDT)
Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMqN2x017959
	for ; Mon, 16 Jun 2003 15:52:24 -0700
Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com)
	by www.linux.org.uk with esmtp (Exim 4.14) id 19S2pq-0007rf-65;
	Mon, 16 Jun 2003 23:52:22 +0100
Message-ID: <3EEE4A1A.6010904@pobox.com>
Date: Mon, 16 Jun 2003 18:52:10 -0400
From: Jeff Garzik
Organization: none
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk
X-Accept-Language: en
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
CC: Janice M Girouard, "David S. Miller", netdev@oss.sgi.com,
	Daniel Stekloff, Janice Girouard, Larry Kessler, kenistonj@us.ibm.com
Subject: Re: patch for common networking error messages
References: <3EEE28DE.6040808@us.ibm.com>
	<20030616.133841.35533284.davem@redhat.com>
	<3EEE2F9F.60706@us.ibm.com> <3EEE492E.9080308@pobox.com>
In-Reply-To: <3EEE492E.9080308@pobox.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3296
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

Jeff Garzik wrote:
> 3) _Somebody_ needs to do some "ground pounding", and figure out what
> info sysadmins and users want to see. Event logging in general, so far,
> seems to me more like a management checklist item than a real user
> need... but I am quite willing to be proved wrong.
> Until we get feedback along these lines, I tend to resist changes like
> this in general. My initial read of your attached patch was that it was a long
> of source churn, and I couldn't fathom what any user would gain from it

make that "a lot of"

> There are a whole bunch of netif_msg_xxx and corresponding NETIF_MSG_xxx
> bits. I don't see much need to change that I think getting the logging
> API right for those would be an important first step.
>
> Jeff

arg :) I should fire my editor.

	Jeff

From davem@redhat.com Mon Jun 16 15:57:14 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 15:57:22 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMvE2x018381
	for ; Mon, 16 Jun 2003 15:57:14 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01541;
	Mon, 16 Jun 2003 15:52:52 -0700
Date: Mon, 16 Jun 2003 15:52:51 -0700 (PDT)
Message-Id: <20030616.155251.25131382.davem@redhat.com>
To: niv@us.ibm.com
Cc: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com,
	jgarzik@pobox.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org,
	netdev@oss.sgi.com
Subject: Re: patch for common networking error messages
From: "David S. Miller"
In-Reply-To: <3EEE4880.3080505@us.ibm.com>
References: <20030616.152745.124055059.davem@redhat.com>
	<3EEE4880.3080505@us.ibm.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3297
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Nivedita Singhvi
   Date: Mon, 16 Jun 2003 15:45:20 -0700

[ I removed this kenistonj@us.ibm.com from the CC:, it bounces... ]

   I'd agree a lot of thought (and agreement :)) has to go into this
   before changing minor nits and stuff, and not causing too much
   disruption.. Evolution, as opposed to revolution ;). I would hope
   that most wouldn't need changing..

There would be absolutely ZERO disruption if you guys would use your
brains and implement what you're actually trying to achieve, a system
event logging mechanism.

We have a message queueing mechanism using sockets, called netlink,
and you can make whatever actions in the kernel you think should be
monitored go and stuff messages into this system event netlink socket.

Then, you don't have to standardize a bunch of absolutely silly
strings (I mean, the concept is so incredibly stupid), you get events
that are in a precisely defined format going over this netlink socket.
Then whoever in userspace reads out the messages can interpret them
however the fuck it wants to.

It is then trivial to parse the messages and filter them.
Furthermore, you could even transmit such messages over a network
connection to a remote logging server as-is.

And hey, look, for network links going up and down we have the hooks
already. Funny that...

From davem@redhat.com Mon Jun 16 15:59:59 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:00:03 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GMxw2x018700
	for ; Mon, 16 Jun 2003 15:59:58 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01554;
	Mon, 16 Jun 2003 15:55:34 -0700
Date: Mon, 16 Jun 2003 15:55:33 -0700 (PDT)
Message-Id: <20030616.155533.63022973.davem@redhat.com>
To: girouard@us.ibm.com
Cc: stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com,
	kenistonj@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org,
	netdev@oss.sgi.com, niv@us.ibm.com
Subject: Re: patch for common networking error messages
From: "David S.
 Miller"
In-Reply-To:
References:
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3298
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Janice Girouard
   Date: Mon, 16 Jun 2003 17:50:08 -0500

   One possible suggestion would be to submit more than one stdmsgs.h
   files. One a legacy file, and one that is more consistent from
   message to message.. shooting for a gradual migration.

Let me know when you're back on planet earth ok?

Standardizing strings is an absolutely FRUITLESS exercise. If you
want events, standardize events and push them over a queueing based
communications channel to userspace, namely using netlink sockets.

From davem@redhat.com Mon Jun 16 16:01:40 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:01:44 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GN1e2x019069
	for ; Mon, 16 Jun 2003 16:01:40 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA01573;
	Mon, 16 Jun 2003 15:57:18 -0700
Date: Mon, 16 Jun 2003 15:57:17 -0700 (PDT)
Message-Id: <20030616.155717.58468888.davem@redhat.com>
To: niv@us.ibm.com
Cc: janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com,
	stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com,
	kenistonj@us.ibm.com, jgarzik@pobox.com
Subject: Re: patch for common networking error messages
From: "David S. Miller"
In-Reply-To: <3EEE49BA.6070401@us.ibm.com>
References: <3EEE40F1.4030107@us.ibm.com>
	<20030616.151308.55864910.davem@redhat.com>
	<3EEE49BA.6070401@us.ibm.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3299
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Nivedita Singhvi
   Date: Mon, 16 Jun 2003 15:50:34 -0700

   I do see positives in the feature as a whole though.

Would you design a network protocol this way? By passing
strings like "open connection please", "sure no problem"
back and forth between server and client?

Of course not.

So why are we even remotely considering the standardization
of _STRINGS_ for event reporting?

From sim@netnation.com Mon Jun 16 16:09:24 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:09:32 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GN9O2x019476
	for ; Mon, 16 Jun 2003 16:09:24 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20)
	id 19S36I-0002E5-SD; Mon, 16 Jun 2003 16:09:22 -0700
Date: Mon, 16 Jun 2003 16:09:22 -0700
From: Simon Kirby
To: "David S.
Miller" Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030616230922.GC18484@netnation.com> References: <20030610075732.GD23009@netnation.com> <20030612.232002.41633789.davem@redhat.com> <20030616223714.GB18484@netnation.com> <20030616.154401.132900800.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030616.154401.132900800.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 3300 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote: > I pushed all of our current work to Linus's tree. > But for your convenience here are the routing diffs > against plain 2.5.71 Trying to apply against 2.5.71: patching file net/ipv4/route.c Hunk #2 succeeded at 454 (offset -2 lines). Hunk #3 succeeded at 738 (offset -2 lines). Hunk #4 succeeded at 775 (offset -2 lines). patching file net/ipv4/route.c Hunk #1 FAILED at 739. 1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej Trying to apply against 2.5.71-bk2: patching file net/ipv4/route.c patching file net/ipv4/route.c Hunk #1 FAILED at 739. 1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej Missing something between? Code from bk2: static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) { struct rtable *rth, **rthp; unsigned long now = jiffies; int attempts = !in_softirq(); Patch: static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) { struct rtable *rth, **rthp; - unsigned long now = jiffies; - struct rtable *cand = NULL, **candp = NULL; ... 
Simon- From davem@redhat.com Mon Jun 16 16:13:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:13:35 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GNDU2x019807 for ; Mon, 16 Jun 2003 16:13:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA01628; Mon, 16 Jun 2003 16:08:57 -0700 Date: Mon, 16 Jun 2003 16:08:56 -0700 (PDT) Message-Id: <20030616.160856.35828947.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030616230922.GC18484@netnation.com> References: <20030616223714.GB18484@netnation.com> <20030616.154401.132900800.davem@redhat.com> <20030616230922.GC18484@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3301 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 16 Jun 2003 16:09:22 -0700 On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote: > I pushed all of our current work to Linus's tree. > But for your convenience here are the routing diffs > against plain 2.5.71 Trying to apply against 2.5.71: patching file net/ipv4/route.c Hunk #2 succeeded at 454 (offset -2 lines). Hunk #3 succeeded at 738 (offset -2 lines). Hunk #4 succeeded at 775 (offset -2 lines). patching file net/ipv4/route.c Hunk #1 FAILED at 739. 
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej

Trying to apply against 2.5.71-bk2:

patching file net/ipv4/route.c
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej

Missing something between?

Code from bk2:

static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
        struct rtable *rth, **rthp;
        unsigned long now = jiffies;
        int attempts = !in_softirq();

Patch:

It depends upon the first patch that I enclosed. What I gave you was
a 2-part patch, the first one did:

@@ -721,6 +740,9 @@
 {
        struct rtable *rth, **rthp;
        unsigned long now = jiffies;
+       struct rtable *cand = NULL, **candp = NULL;
+       u32 min_score = ~(u32)0;
+       int chain_length = 0;
        int attempts = !in_softirq();
 restart:

The second one did:

@@ -739,13 +739,19 @@
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
        struct rtable *rth, **rthp;
-       unsigned long now = jiffies;
-       struct rtable *cand = NULL, **candp = NULL;
-       u32 min_score = ~(u32)0;
-       int chain_length = 0;
+       unsigned long now;
+       struct rtable *cand, **candp;
+       u32 min_score;
+       int chain_length;
        int attempts = !in_softirq();
 ...

I have no idea why it doesn't apply. Nothing else has happened in
these bits of code for a while.
From davem@redhat.com Mon Jun 16 16:15:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:15:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GNFR2x020114 for ; Mon, 16 Jun 2003 16:15:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA01649; Mon, 16 Jun 2003 16:10:51 -0700 Date: Mon, 16 Jun 2003 16:10:50 -0700 (PDT) Message-Id: <20030616.161050.85423902.davem@redhat.com> To: shemminger@osdl.org Cc: ctindel@users.sourceforge.net, fubar@us.ibm.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.71] Fix module owner for bonding driver From: "David S. Miller" In-Reply-To: <20030616154251.28c3e3ee.shemminger@osdl.org> References: <20030616154251.28c3e3ee.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3302 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 16 Jun 2003 15:42:51 -0700 The bonding driver does explicit MOD_INC/DEC even though it does a SET_MODULE_OWNER. So it doesn't changes module use counts on open/close on 2.5 when it shouldn't. It also has a /proc entry which does not affect the module use counts when it should. Applied, thanks Stephen. Suggestion: the /proc interface could/should be converted to seq_file. Volunteers? Janitor project? 
There is a bonding maintainer, maybe that'd be the best bet :-) From sim@netnation.com Mon Jun 16 16:27:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:27:54 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GNRo2x020625 for ; Mon, 16 Jun 2003 16:27:51 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19S3OA-0002QA-8S; Mon, 16 Jun 2003 16:27:50 -0700 Date: Mon, 16 Jun 2003 16:27:50 -0700 From: Simon Kirby To: "David S. Miller" Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030616232750.GD18484@netnation.com> References: <20030616223714.GB18484@netnation.com> <20030616.154401.132900800.davem@redhat.com> <20030616230922.GC18484@netnation.com> <20030616.160856.35828947.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030616.160856.35828947.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 3303 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote: > It depends upon the first patch that I enclosed. Never mind. :) Such patches don't work very well with patch --dry. Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. 
] From sim@netnation.com Mon Jun 16 16:49:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 16:49:50 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5GNnc2x020998 for ; Mon, 16 Jun 2003 16:49:38 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19S3jF-0002op-Aw; Mon, 16 Jun 2003 16:49:37 -0700 Date: Mon, 16 Jun 2003 16:49:37 -0700 From: Simon Kirby To: "David S. Miller" Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030616234937.GE18484@netnation.com> References: <20030616223714.GB18484@netnation.com> <20030616.154401.132900800.davem@redhat.com> <20030616230922.GC18484@netnation.com> <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030616232750.GD18484@netnation.com> User-Agent: Mutt/1.5.4i X-archive-position: 3304 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 16, 2003 at 04:27:50PM -0700, Simon Kirby wrote: > On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote: > > > It depends upon the first patch that I enclosed. > > Never mind. :) Such patches don't work very well with patch --dry. Okay, here goes 2.5.71 + this patch: 60.0049 seconds passed, avg forwarding rate: 160190.859 pps 60.0085 seconds passed, avg forwarding rate: 157118.708 pps 60.0046 seconds passed, avg forwarding rate: 157211.097 pps 60.0073 seconds passed, avg forwarding rate: 157557.710 pps ...Looks like a tad worse than with your patch, but not by much. Forwarding rate is still pretty crappy for an Opteron. 
Will fiddle a bit more tonight to see what I can do.

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000

vma      samples  %         symbol name
c02c0ea0 5113     9.07075   fn_hash_lookup
c0293970 3264     5.79052   ip_route_input_slow
c028ef90 2734     4.85027   nf_iterate
c028f280 2525     4.47949   nf_hook_slow
c02924b0 2127     3.77342   rt_intern_hash
c0222330 2125     3.76987   tg3_start_xmit
c02becc0 1755     3.11347   fib_validate_source
c0290020 1684     2.98751   pfifo_fast_dequeue
c0296220 1531     2.71608   ip_rcv_finish
c0135230 1449     2.57061   kmem_cache_free
c0134ff0 1431     2.53867   free_block
c0221710 1369     2.42868   tg3_rx
c0295cb0 1350     2.39498   ip_rcv
c0135170 1304     2.31337   kmem_cache_alloc
c02941a0 1258     2.23176   ip_route_input
c028f920 1255     2.22644   eth_header
c0134e20 1148     2.03662   cache_alloc_refill
c0291b70 1104     1.95856   rt_hash_code
c02886a0 1082     1.91953   netif_receive_skb
c01351b0 983      1.7439    __kmalloc
c028b610 923      1.63745   neigh_lookup
c02c0050 914      1.62149   fib_semantic_match
c029a660 857      1.52037   ip_finish_output2
c028c600 829      1.47069   neigh_resolve_output
c01adc80 766      1.35893   memcpy
c0135270 743      1.31812   kfree
c0297000 741      1.31458   ip_forward
c0284620 686      1.217     alloc_skb
c02b9730 666      1.18152   inet_select_addr
c028fa90 663      1.1762    eth_type_trans
c0128a00 649      1.15136   call_rcu
c0297240 623      1.10524   ip_forward_finish
c028af00 620      1.09991   dst_alloc
c0288160 597      1.05911   dev_queue_xmit
c028ffa0 570      1.01121   pfifo_fast_enqueue
c028b030 486      0.862191  dst_destroy
c0292260 485      0.860417  rt_garbage_collect
c028fcb0 472      0.837355  qdisc_restart
c0221350 467      0.828484  tg3_tx
c028c480 463      0.821388  neigh_hh_init
c02215c0 455      0.807196  tg3_recycle_rx
c02d0f70 447      0.793003  ipv4_sabotage_out
c0298580 443      0.785907  ip_finish_output
c011f080 430      0.762844  local_bh_enable
c010fc40 358      0.635112  do_gettimeofday
c0284860 345      0.612049  __kfree_skb

 size  IN: hit     tot  mc  no_rt  bcast  madst  masrc  OUT: hit  tot  mc  GC: tot  ignored  goal_miss  ovrf
22910       10  158190   0      0      0      0      0         0    0   0   158190   158188          1     0
20590       10  158330   0      0      0      0      0         0    0   0   158330   158328          1     0
20515       14  158306   0      0      0      0      0         0    0   0   158306   158304          1     0
21000        4  158964   0      0      0      0      0         0    0   0   158964   158962          1     0
21631        8  159300   0      0      0      0      0         0    0   0   159300   159298          0     0
20329       13  160059   0      0      0      0      0         0    0   0   160059   160057          1     0
22995        7  157441   0      0      0      0      0         0    0   0   157441   157439          1     0
22418        9  156831   0      0      0      0      0         0    0   0   156831   156829          1     0
22417       11  157321   0      0      0      0      0         0    0   0   157321   157319          1     0
21339        6  157898   0      0      0      0      0         0    0   0   157898   157896          0     0
22562       10  157734   0      0      0      0      0         0    0   0   157734   157732          1     0
20488       12  159496   0      0      0      0      0         0    0   0   159496   159493          1     0
22527       10  157674   0      0      0      0      0         0    0   0   157674   157672          1     0
21992        7  156729   0      0      0      0      0         0    0   0   156729   156727          0     0
21372       10  157106   0      0      0      0      0         0    0   0   157106   157104          1     0
22950       10  156402   0      0      0      0      0         0    0   0   156402   156400          2     0
20471       11  157057   0      0      0      0      0         0    0   0   157057   157055          1     0
20864       13  159082   0      0      0      0      0         0    0   0   159082   159080          0     0
22416       10  157658   0      0      0      0      0         0    0   0   157658   157656          1     0
22659        8  157348   0      0      0      0      0         0    0   0   157348   157346          1     0

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer.
] From niv@us.ibm.com Mon Jun 16 17:08:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 17:08:10 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H07w2x021519 for ; Mon, 16 Jun 2003 17:08:00 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e33.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5H07nJk204924; Mon, 16 Jun 2003 20:07:49 -0400 Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5H07lFD180254; Mon, 16 Jun 2003 18:07:48 -0600 Message-ID: <3EEE5BA8.8000601@us.ibm.com> Date: Mon, 16 Jun 2003 17:07:04 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: patch for common networking error messages References: <20030616.152745.124055059.davem@redhat.com> <3EEE4880.3080505@us.ibm.com> <20030616.155251.25131382.davem@redhat.com> In-Reply-To: <20030616.155251.25131382.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3305 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev David S. Miller wrote: > There would be absolutely ZERO disruption if you guys would use you > brains and implement what you're actually trying to achieve, a system > event logging mechanism. 
> We have a message queueing mechanism using sockets, called netlink,
> and you can make whatever actions in the kernel you think should be
> monitored go and stuff messages into this system event netlink socket.

I should clarify here that I was speaking strictly for my lonesome
sorry self :), and have no knowledge of what the state of the various
RAS projects currently is, or of the approaches they are trying.. For
all I know, they may be currently trying precisely that..

Janice's patch is the first I've seen in this area (Luckily, most of
the time they keep me in a cave :) :)), and I do appreciate
*something* being done in this area. It seemed a good start and
really, I don't care how it's implemented, I'll leave that to the
folks who have spent longer than the 8 mins I currently have on it..

> Then, you don't have to standardize a bunch of absolutely silly
> strings (I mean, the concept is so incredibly stupid), you get events
> that are in a precisely defined format going over this netlink socket.

Well, right now, that's all we have, right? Silly strings? But that's
not really my position, which is more like: Whatever! Whatever!
Somebody! Make it so! :) :).

> Then whoever in userspace reads out the messages can interpret them
> however the fuck it wants to. It is then trivial to parse the
> messages and filter them. Furthermore, you could even transmit such
> messages over a network connection to a remote logging server as-is.
>
> And hey, look, for network links going up and down we have the hooks
> already. Funny that...

OK, that is a good idea..
:) thanks, Nivedita From girouard@us.ibm.com Mon Jun 16 17:44:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 17:44:52 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H0iZ2x022758 for ; Mon, 16 Jun 2003 17:44:36 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e3.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5H0iSE2090688; Mon, 16 Jun 2003 20:44:28 -0400 Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5H0iRHO026450; Mon, 16 Jun 2003 20:44:27 -0400 Subject: Re: patch for common networking error messages To: "David S. Miller" Cc: Daniel Stekloff , janiceg@us.ibm.com, jgarzik@pobox.com, Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Janice Girouard Date: Mon, 16 Jun 2003 19:44:22 -0500 X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD, THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/16/2003 20:44:27 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3306 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: girouard@us.ibm.com Precedence: bulk X-list: netdev From: David S. Miller" Date: Mon, 16 Jun 2003 17:50:08 -0500 If you want events, standardize events and push them over a queueing based communications channel to userspace, namely using netlink sockets. It sounds like you are proposing a new family for the netlink subsystem. I've included a list of the families I see in 2.5.70 the end of this note. I'm trying to understand which family handles events such as link state changes or device initialization failure? It sounds like you're proposing somethink equivalent to NETLINK_TCP_DIAG. 
Is that right?

There seems to be some overlap between netlink and netdev notifier
events. Netdev notifier reports many key events. One event I don't see
reported is any indication that device initialization failed.

#define NETDEV_UP          0x0001
#define NETDEV_DOWN        0x0002
#define NETDEV_REBOOT      0x0003
#define NETDEV_CHANGE      0x0004
#define NETDEV_REGISTER    0x0005
#define NETDEV_UNREGISTER  0x0006
#define NETDEV_CHANGEMTU   0x0007
#define NETDEV_CHANGEADDR  0x0008
#define NETDEV_GOING_DOWN  0x0009
#define NETDEV_CHANGENAME  0x000A

One issue with netdev, is that it doesn't seem to allow for the
flexibility of the information passed that netlink has. For example,
when you issue a NETDEV_CHANGE, it's not clear from the event what the
specific change was.

==============================
NETLINK 2.5.70 netlink.h

#define NETLINK_ROUTE     0   /* Routing/device hook */
#define NETLINK_SKIP      1   /* Reserved for ENskip */
#define NETLINK_USERSOCK  2   /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL  3   /* Firewalling hook */
#define NETLINK_TCPDIAG   4   /* TCP socket monitoring */
#define NETLINK_NFLOG     5   /* netfilter/iptables ULOG */
#define NETLINK_XFRM      6   /* ipsec */
#define NETLINK_ARPD      8
#define NETLINK_ROUTE6    11  /* af_inet6 route comm channel */
#define NETLINK_IP6_FW    13
#define NETLINK_DNRTMSG   14  /* DECnet routing messages */
#define NETLINK_TAPBASE   16  /* 16 to 31 are ethertap */

Descriptions for many of these events can be found at
http://www.europe.redhat.com/documentation/man-pages/man7/netlink.7.php3

NETLINK_ROUTE
       Receives routing updates and may be used to modify the IPv4
       routing table (see

NETLINK_FIREWALL
       Receives packets sent by the IPv4 firewall code.

NETLINK_ARPD
       For managing the arp table in user space.

NETLINK_ROUTE6
       Receives and sends IPv6 routing table updates.

NETLINK_IP6_FW
       to receive packets that failed the IPv6 firewall checks
       (currently not implemented).

NETLINK_TAPBASE...NETLINK_TAPBASE+15
       are the instances of the ethertap device.
Ethertap is a pseudo network tunnel device that allows an ethernet driver to be simulated from user space. NETLINK_SKIP Reserved for ENskip. NETLINK_USERSOCK is reserved for future user space protocols. From becker@scyld.com Mon Jun 16 18:03:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 18:03:16 -0700 (PDT) Received: from NewBlue.Scyld.com ([64.237.107.19]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H1342x023215 for ; Mon, 16 Jun 2003 18:03:05 -0700 Received: from beohost.scyld.com (h-66-134-106-243.MCLNVA23.covad.net [66.134.106.243]) by NewBlue.Scyld.com (8.10.2/8.10.2) with ESMTP id h5GMrXU00366; Mon, 16 Jun 2003 18:53:33 -0400 Received: from localhost (becker@localhost) by beohost.scyld.com (8.11.6/8.11.6) with ESMTP id h5GN2hj09416; Mon, 16 Jun 2003 19:02:43 -0400 Date: Mon, 16 Jun 2003 19:02:43 -0400 (EDT) From: Donald Becker To: Nivedita Singhvi cc: Janice M Girouard , Linux Kernel Mailing List , , Jeff Garzik , Subject: Re: patch for common networking error messages In-Reply-To: <3EEE40F1.4030107@us.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3307 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: becker@scyld.com Precedence: bulk X-list: netdev On Mon, 16 Jun 2003, Nivedita Singhvi wrote: > Janice M Girouard wrote: > > > Below is a patch that demonstrates standard messages for ethernet device > > drivers. I would like your feedback on the concept of standard network > > messages, and any suggestions for messages to include. There are many things that can go wrong, and most of them are very dependent on the NIC's architecture. A few minutes of thought would lead to the understanding that a enforced-common list of error/problem messages is equivalent to a set of error counters. We have had that since the initial Linux network hardware layer design. 
(Note: BSD of that era had only a single "error" counter.)

> Essentially, things like some guidelines on classifying some
> of those messages, when creating new messages. eg when is
> something a state change and when is it a performance event?

The NETIF_MSG_* enable bits classify the status message types.

> I'd certainly like to see messages from the driver when the
> card enters/leaves promiscuous mode, as an example of things
> we'd like to add...

Most Ethernet drivers should already do this, right around the message
    /* Unconditionally log net taps. */

> > 2) Reduce the number of puzzling messages that are logged -- in this
> > case, by replacing them with standard messages, and/or
> > 3) Identify the device (or driver name) that is responsible for the error.

The (physical) interface name, dev->name, should prefix every error
message. This is a recommendation, not enforced, so not every driver
does it.

[[[ Comment: Status/error messages are not an area to demonstrate
creativity, but all too frequently it becomes one.
]]] -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 From davem@redhat.com Mon Jun 16 18:24:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 18:24:20 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H1OA2x024456 for ; Mon, 16 Jun 2003 18:24:10 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA01901; Mon, 16 Jun 2003 18:19:47 -0700 Date: Mon, 16 Jun 2003 18:19:46 -0700 (PDT) Message-Id: <20030616.181946.22044667.davem@redhat.com> To: girouard@us.ibm.com Cc: stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3308 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice Girouard Date: Mon, 16 Jun 2003 19:44:22 -0500 It sounds like you are proposing a new family for the netlink subsystem. Exactly, you have to create this. 
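
To make the "precisely defined format" argument from this thread concrete, here is a minimal sketch in C of what one fixed-format event record could look like, compared with a free-form log string. Everything in it is an assumption for illustration only: the names net_event, NET_EVENT_*, net_event_encode and net_event_decode are hypothetical and do not exist in the kernel or in any netlink header, and no such netlink family is defined in this thread. The record is also serialized big-endian here so that it reads the same on any logging host, which is a choice made for this sketch; netlink messages themselves are normally in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types for illustration; not real kernel constants. */
enum net_event_type {
    NET_EVENT_LINK_UP   = 1,
    NET_EVENT_LINK_DOWN = 2,
    NET_EVENT_INIT_FAIL = 3  /* device initialization failed */
};

/* Hypothetical fixed-format event record (assumed layout, 8 bytes on the wire). */
struct net_event {
    uint16_t type;     /* one of enum net_event_type */
    uint16_t error;    /* errno-style code, 0 if none */
    uint32_t ifindex;  /* interface the event refers to */
};

#define NET_EVENT_WIRE_SIZE 8u

/* Serialize ev into buf (big-endian). Returns bytes written, or 0 if buf is too small. */
size_t net_event_encode(const struct net_event *ev, uint8_t *buf, size_t len)
{
    if (len < NET_EVENT_WIRE_SIZE)
        return 0;
    buf[0] = (uint8_t)(ev->type >> 8);
    buf[1] = (uint8_t)(ev->type & 0xff);
    buf[2] = (uint8_t)(ev->error >> 8);
    buf[3] = (uint8_t)(ev->error & 0xff);
    buf[4] = (uint8_t)(ev->ifindex >> 24);
    buf[5] = (uint8_t)((ev->ifindex >> 16) & 0xff);
    buf[6] = (uint8_t)((ev->ifindex >> 8) & 0xff);
    buf[7] = (uint8_t)(ev->ifindex & 0xff);
    return NET_EVENT_WIRE_SIZE;
}

/* Parse buf into ev. Returns 0 on success, -1 if truncated or of unknown type. */
int net_event_decode(const uint8_t *buf, size_t len, struct net_event *ev)
{
    uint16_t type;

    if (len < NET_EVENT_WIRE_SIZE)
        return -1;
    type = (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
    if (type < NET_EVENT_LINK_UP || type > NET_EVENT_INIT_FAIL)
        return -1;
    ev->type = type;
    ev->error = (uint16_t)(((unsigned)buf[2] << 8) | buf[3]);
    ev->ifindex = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                  ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
    return 0;
}
```

A reader in userspace would call net_event_decode() on each received record and switch on the type field, so filtering is an integer comparison rather than string matching, and the same bytes could be forwarded unchanged to a remote logging server.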
From girouard@us.ibm.com Mon Jun 16 19:13:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 19:13:08 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H2D32x026512 for ; Mon, 16 Jun 2003 19:13:04 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5H2CupS180740; Mon, 16 Jun 2003 22:12:56 -0400 Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5H2Ctxt018972; Mon, 16 Jun 2003 22:12:55 -0400 Subject: Re: patch for common networking error messages To: "David S. Miller" Cc: Daniel Stekloff , janiceg@us.ibm.com, jgarzik@pobox.com, Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Janice Girouard Date: Mon, 16 Jun 2003 21:12:50 -0500 X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD, THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/16/2003 22:12:56 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3309 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: girouard@us.ibm.com Precedence: bulk X-list: netdev From: Janice Girouard Date: Mon, 16 Jun 2003 19:44:22 -0500 It sounds like you are proposing a new family for the netlink subsystem. From: "David S. Miller" Date:06/16/2003 08:19 PM Exactly, you have to create this. Okay. That solves the issue of events generated in a plethora of formats for the same event. Any suggestions on what should be included in this new family? I can present a patch to suggest a starting point. However, it would be great to hear from everyone that has any initial thoughts. 
One question that comes to mind, since there is some overlap with netdev notifier events, should we include those events in the new family? I can envision a couple of approaches: 1) keep the two interfaces (netdev notifier and netlink), with separate end users in mind and duplicate the events to each interface. Possibly thinking about migrating to just one interface over time. Applications would then just receive one set of events. 2) keep the two interfaces, with no duplication of messages, clarifying the uses for the two interfaces. An application would then register, and obtain events from the two separate mechanisms. p.s. thanks for all the input so far. From Valdis.Kletnieks@vt.edu Mon Jun 16 21:34:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 16 Jun 2003 21:34:53 -0700 (PDT) Received: from turing-police.cc.vt.edu (h80ad2689.async.vt.edu [128.173.38.137]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H4Yc2x028469 for ; Mon, 16 Jun 2003 21:34:39 -0700 Received: from turing-police.cc.vt.edu (localhost [127.0.0.1]) by turing-police.cc.vt.edu (8.12.10.Beta0/8.12.10.Beta0) with ESMTP id h5H4YZPZ003025; Tue, 17 Jun 2003 00:34:35 -0400 Message-Id: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4+dev To: Janice Girouard Cc: "David S. Miller" , Daniel Stekloff , janiceg@us.ibm.com, jgarzik@pobox.com, Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages In-Reply-To: Your message of "Mon, 16 Jun 2003 21:12:50 CDT." 
From: Valdis.Kletnieks@vt.edu References: Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_218344882P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Tue, 17 Jun 2003 00:34:34 -0400 X-archive-position: 3310 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Valdis.Kletnieks@vt.edu Precedence: bulk X-list: netdev --==_Exmh_218344882P Content-Type: text/plain; charset=us-ascii On Mon, 16 Jun 2003 21:12:50 CDT, Janice Girouard said: > Okay. That solves the issue of events generated in a plethora of formats > for the same event. Any suggestions on what should be included in this new > family? I can present a patch to suggest a starting point. However, it > would be great to hear from everyone that has any initial thoughts. Well, at the risk of torquing off any SCO supporters, I'd suggest a quick peek over at the general design of the AIX errpt/trace systems - in both cases, data comes out of the kernel in a formatted binary stream, and then a 'template' file is used to drive the parsing of the formatted data. Quite slick overall, and nicely extensible - you add a new kernel subsystem that has more trace points, you just tack its templates onto the end of the format file and you're good to go.... 
(And SCO can't even claim that's their code - it's pretty obvious the parentage of the AIX errpt/trace logging is the OS/VS1 and MVS SMF logging features :) --==_Exmh_218344882P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQE+7ppacC3lWbTT17ARAnRPAJ4tmv1gOrC3xPdA/9RRB6aIc+ox4wCg3Y3M ALPo18K54qzckAkE1Xa7B4g= =00MR -----END PGP SIGNATURE----- --==_Exmh_218344882P-- From ak@suse.de Tue Jun 17 00:10:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 00:10:39 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H7AK2x001495 for ; Tue, 17 Jun 2003 00:10:22 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 7E4E014E57; Tue, 17 Jun 2003 09:09:58 +0200 (MEST) Date: Tue, 17 Jun 2003 09:09:57 +0200 From: Andi Kleen To: Andrew Morton Cc: "David S. Miller" , ak@suse.de, janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com Subject: Re: patch for common networking error messages Message-ID: <20030617070957.GB2752@wotan.suse.de> References: <3EEE28DE.6040808@us.ibm.com> <20030616.133841.35533284.davem@redhat.com> <20030616205342.GH30400@wotan.suse.de> <20030616.135124.71580008.davem@redhat.com> <20030616152707.58da808c.akpm@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030616152707.58da808c.akpm@digeo.com> X-archive-position: 3311 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > Actually it already does, to cover the case where an interrupt handler calls > printk while process-context code is performing a printk. 
I don't think it'll work. Both printk and release_console_sem take the logbuf_lock, which will deadlock if the same CPU already holds it. It would need to use the usual linux recursive lock hack. -Andi From aamir.shafi@niit.edu.pk Tue Jun 17 01:08:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 01:08:52 -0700 (PDT) Received: from intranet.niit.edu.pk ([202.125.153.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H88a2x005078 for ; Tue, 17 Jun 2003 01:08:38 -0700 Received: from intranet.niit.edu.pk (localhost [127.0.0.1]) by intranet.niit.edu.pk (8.12.5/8.12.5) with ESMTP id h5H75g4E009423 for ; Tue, 17 Jun 2003 13:05:42 +0600 Received: (from apache@localhost) by intranet.niit.edu.pk (8.12.5/8.12.5/Submit) id h5H75f5m009421; Tue, 17 Jun 2003 13:05:41 +0600 X-Authentication-Warning: intranet.niit.edu.pk: apache set sender to aamir.shafi@niit.edu.pk using -f Received: from 202.125.153.149 (SquirrelMail authenticated user aamir) by intranet with HTTP; Tue, 17 Jun 2003 13:05:41 +0600 (PKST) Message-ID: <1645.202.125.153.149.1055833541.squirrel@intranet> Date: Tue, 17 Jun 2003 13:05:41 +0600 (PKST) Subject: Compiling linux kernel with TCP/IP debug option ON From: "Aamir Shafi" To: netdev@oss.sgi.com Reply-To: aamir.shafi@niit.edu.pk User-Agent: SquirrelMail/1.4.0-1 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 X-Priority: 3 Importance: Normal X-archive-position: 3312 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aamir.shafi@niit.edu.pk Precedence: bulk X-list: netdev Hi all! I am a new user of this mailing list. I would like to know about some useful resources for getting a quick start with kernel compilation. I also want a Linux kernel that has the debugging option turned on for the TCP/IP stack. Any help would be highly appreciated. Secondly, why would anyone want to do that? I have been told to do it without being told what it is for.
So please let me know why anyone would want to do this. Thanks anyway, Best Regards -- Aamir Shafi Computer Software Engr. National University of Sciences and Technology Pakistan From andre@tomt.net Tue Jun 17 02:21:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 02:21:39 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5H9LO2x009227 for ; Tue, 17 Jun 2003 02:21:25 -0700 Received: (qmail 26396 invoked by uid 1006); 17 Jun 2003 09:24:01 -0000 Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.16 (sophie: 2.14/3.69. spamassassin: 2.55. Clear:. Processed in 0.05076 secs); 17 Jun 2003 09:24:01 -0000 Received: from slask.tomt.net (HELO slurv.ws.pasop.tomt.net) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 17 Jun 2003 09:24:01 -0000 Subject: Re: 2.4.21 oops second time From: Andre Tomt To: Robert Grzelak Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <00d801c3343a$9452aea0$083a0ed4@deltav> References: <00d801c3343a$9452aea0$083a0ed4@deltav> Content-Type: multipart/mixed; boundary="=-n2p0AZfGfpGJ532fVGiv" Organization: Message-Id: <1055841673.7481.13.camel@slurv.ws.pasop.tomt.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4- Date: 17 Jun 2003 11:21:13 +0200 X-archive-position: 3313 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev --=-n2p0AZfGfpGJ532fVGiv Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit (Added CC to netdev) On Mon, 2003-06-16 at 21:07, Robert Grzelak wrote: > Welcome!
> My colege Andrzej Sosnowski in post "2.4.21 oops" > has write about error in kernel 2.4.21 in time of using this script > "#!/bin/sh > for IP in `/usr/bin/seq 3 500`; do > ip addr add 3ffe:80ee:c1d::$IP/48 dev eth0 > ip addr add 3ffe:80ee:c1d::a:$IP/48 dev eth0 > done" I adapted this script a bit, and got it reproduced. It _seems_ (unverified) to not trigger as fast over ssh, as on local console, for some reason. You can wonder about the legitimacy of adding almost 1000 addresses to one interface, however, it should not crash, rather return something more useful when it happens and the kernel can not handle it. Captured the OOPS over serial console, and decoded it, attached. -- André Tomt andre@tomt.net --=-n2p0AZfGfpGJ532fVGiv Content-Disposition: attachment; filename=oops.decode Content-Type: text/plain; name=oops.decode; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit ksymoops 2.4.8 on i686 2.4.21-s2. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.21-s2/ (default) -m /boot/System.map-2.4.21-s2 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. kernel BUG at sched.c:564! 
invalid operand: 0000 CPU: 0 EIP: 0010:[] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010286 eax: 00000018 ebx: c15e5760 ecx: dd6bc000 edx: df57bf7c esi: dd6bc000 edi: dd6bdb98 ebp: dd6bdb44 esp: dd6bdb1c ds: 0018 es: 0018 ss: 0018 Process ip (pid: 6645, stackpage=dd6bd000) Stack: c0261e8a 00000202 dd6bc000 00000000 dd6bc000 00000000 0000d424 c15e5760 7fffffff dd6bdb98 dd6bdb80 c0113535 0000d400 c15e1d60 c15e1c00 c15a0fe0 04000001 dd6bdbac 0000000a c0108e05 0000000a c15e1c00 dd6bdbac c15e5760 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 0f 0b 34 02 82 1e 26 c0 e9 0b fd ff ff 0f 0b 2d 02 82 1e 26 >>EIP; c011387f <===== >>ebx; c15e5760 <_end+12cd418/20562d18> >>ecx; dd6bc000 <_end+1d3a3cb8/20562d18> >>edx; df57bf7c <_end+1f263c34/20562d18> >>esi; dd6bc000 <_end+1d3a3cb8/20562d18> >>edi; dd6bdb98 <_end+1d3a5850/20562d18> >>ebp; dd6bdb44 <_end+1d3a57fc/20562d18> >>esp; dd6bdb1c <_end+1d3a57d4/20562d18> Trace; c0113535 Trace; c0108e05 Trace; c01eeff2 Trace; c01ef0a0 Trace; c01fa933 Trace; c01ef1ff Trace; c0244163 Trace; c02444e6 Trace; c0243938 Trace; c02446f8 Trace; c02323ec Trace; c010610b <__down_failed_trylock+7/c> Trace; c023347e Trace; c010610b <__down_failed_trylock+7/c> Trace; c02339d7 Trace; c01fa211 Trace; c01fa040 Trace; c01f9e4b Trace; c01fc33a Trace; c01fbc47 Trace; c01fc0aa Trace; c01ec725 Trace; c01edc7b Trace; c0124f89 Trace; c0125147 Trace; c0112848 Trace; c0125936 <__vma_link+56/c0> Trace; c01269f5 Trace; c01ee152 Trace; c012582f Trace; c010738f Code; c011387f 00000000 <_EIP>: Code; c011387f <===== 0: 0f 0b ud2a <===== Code; c0113881 2: 34 02 xor $0x2,%al Code; c0113883 4: 82 (bad) Code; c0113884 5: 1e push %ds Code; c0113885 6: 26 es Code; c0113886 7: c0 e9 0b shr $0xb,%cl Code; c0113889 a: fd std Code; c011388a b: ff (bad) Code; c011388b c: ff 0f decl (%edi) Code; c011388d e: 0b 2d 02 82 1e 26 or 0x261e8202,%ebp <0>Kernel panic: Aiee, killing interrupt 
handler! 1 warning issued. Results may not be reliable. --=-n2p0AZfGfpGJ532fVGiv-- From niki.waibel@newlogic.com Tue Jun 17 03:25:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 03:26:03 -0700 (PDT) Received: from mail.newlogic.at (mail.newlogic.at [194.208.88.201]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HAPs2x012222 for ; Tue, 17 Jun 2003 03:25:55 -0700 Received: from enterprise2.newlogic.at (e2-1.newlogic.at [172.27.1.1]) by mail.newlogic.at (8.12.9/8.12.9) with ESMTP id h5HAPcYu026024; Tue, 17 Jun 2003 12:25:38 +0200 (MEST) Received: from blade100-2 (blade100-2.newlogic.at [172.27.120.2]) by enterprise2.newlogic.at (8.12.9/8.12.9) with ESMTP id h5HAPYxr016179; Tue, 17 Jun 2003 12:25:34 +0200 (MEST) Message-Id: <200306171025.h5HAPYxr016179@enterprise2.newlogic.at> X-Mailer: XFMail 1.5.4 on Solaris X-Priority: 3 (Normal) Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 Date: Tue, 17 Jun 2003 12:25:34 +0200 (MEST) Reply-To: Niki Waibel Organization: NewLogic Technologies AG From: Niki Waibel To: axp-list@redhat.com Subject: linux-2.5.71 on pc164/alpha Cc: kraxel@bytesex.org, vojtech@suse.cz, jbglaw@lug-owl.de, netdev@oss.sgi.com X-archive-position: 3314 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niki.waibel@newlogic.com Precedence: bulk X-list: netdev * matrox framebuffer device works now. * sb16 alsa audio compiles but i have not yet verified if it is working. * ERROR (warning) in arch/alpha/kernel/srmcons.c: === CC arch/alpha/kernel/srmcons.o arch/alpha/kernel/srmcons.c:269: warning: `srmcons_ops' defined but not used make[1]: *** [arch/alpha/kernel/srmcons.o] Error 1 make: *** [arch/alpha/kernel] Error 2 === after removing srmcons_ops and all references to the functions inside that structure, compiling was okay...
* still troubles with bttv: === LD vmlinux drivers/built-in.o(.text+0x81294): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x812a0): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x812c0): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x812cc): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x812e4): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x812f8): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x8130c): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x8131c): more undefined references to `writel' follow drivers/built-in.o(.text+0x81354): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x81368): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x81388): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x81394): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x813ac): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x813b8): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x813cc): In function `bttv_apply_geo': : undefined reference to `writel' drivers/built-in.o(.text+0x813d8): more undefined references to `writel' follow drivers/built-in.o(.text+0x81494): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x814a0): In function `bttv_apply_geo': : undefined reference to `readl' drivers/built-in.o(.text+0x81608): In function `bttv_set_dma': : undefined reference to `readl' drivers/built-in.o(.text+0x81618): In function `bttv_set_dma': : undefined reference to `readl' 
drivers/built-in.o(.text+0x81634): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x81640): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x81688): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x81690): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x816a0): In function `bttv_set_dma': : undefined reference to `readl' drivers/built-in.o(.text+0x816a8): In function `bttv_set_dma': : undefined reference to `readl' drivers/built-in.o(.text+0x816bc): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x816cc): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x816fc): In function `bttv_set_dma': : undefined reference to `readl' drivers/built-in.o(.text+0x81704): In function `bttv_set_dma': : undefined reference to `readl' drivers/built-in.o(.text+0x8171c): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x8172c): In function `bttv_set_dma': : undefined reference to `writel' drivers/built-in.o(.text+0x81b74): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81b7c): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81b9c): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81ba8): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81bc8): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81bcc): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81bec): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81bf8): In function `bttv_buffer_activate': : undefined reference to `writel' 
drivers/built-in.o(.text+0x81c9c): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81ca4): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81cc4): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81cd0): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81d7c): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81d8c): In function `bttv_buffer_activate': : undefined reference to `readl' drivers/built-in.o(.text+0x81dac): In function `bttv_buffer_activate': : undefined reference to `writel' drivers/built-in.o(.text+0x81db8): In function `bttv_buffer_activate': : undefined reference to `writel' === i disabled the bttv driver... * another strange thing in net/core/flow.c: === LD vmlinux net/built-in.o(.init.text+0x4a4): In function `flow_cache_init': : undefined reference to `register_cpu_notifier' net/built-in.o(.init.text+0x4ac): In function `flow_cache_init': : undefined reference to `register_cpu_notifier' === i simply removed the line which called register_cpu_notifier... then the kernel built. * there is still the caps lock problem! if caps lock is pressed the console freezes ... or the keyboard input... i don't know. machine is still okay (network access is possible). this is not the case if X is running. i am not 100% sure, but afair num lock and scroll lock cause the same problem. niki (using gcc-3.3/binutils-2.14) -- niki w.
waibel - system administrator @ newlogic technologies ag From jmorris@intercode.com.au Tue Jun 17 04:29:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 04:29:29 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:njImSqimdi4CyrEz4Bt01MWgCV5x24Y5@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HBTG2x017645 for ; Tue, 17 Jun 2003 04:29:18 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5HBSkr32133; Tue, 17 Jun 2003 21:28:50 +1000 Date: Tue, 17 Jun 2003 21:28:45 +1000 (EST) From: James Morris To: Julian Blake Kongslie cc: linux-kernel@vger.kernel.org, , "David S. Miller" , Subject: Re: IPSEC problems with GRE. In-Reply-To: <1055746871.2305.7.camel@festa.omgwallhack.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3315 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On 16 Jun 2003, Julian Blake Kongslie wrote: > Hi there. > > I've been playing around with IPSec, and I came across a problem with > encrypting data sent directly by the kernel. > > Specifically, attempts to encrypt a GRE or IPIP tunnel with ipsec in > transport mode result in one of: > 1) No data sent. > 2) Data sent, ignored by peer. > 3) Kernel panic, with no SysRq. > > Numbers 1 and 2 might be configuration problems on my part, but I have > other ipsec setups running fine, and can't see anything different for > these. Number 3 is a big problem. I've not been able to reproduce the panic, but there is a potential issue with path mtu which could explain (1) and (2): the transport mode SAs between the gateways are not aware of the gre tunnel. You need to lower the mtu on the gre tunnel at each end to take the ipsec overhead into account. 
This will cause the gateways to generate appropriate icmp pmtu messages. This is handled automatically for tunnel mode ipsec configurations. - James -- James Morris From chas@locutus.cmf.nrl.navy.mil Tue Jun 17 05:42:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 05:42:14 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HCg12x020746 for ; Tue, 17 Jun 2003 05:42:02 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5HCfqsG026542; Tue, 17 Jun 2003 08:41:52 -0400 (EDT) Received: (from chas@localhost) by locutus.cmf.nrl.navy.mil (8.12.7/8.12.7/Submit) id h5HCdpsF021438; Tue, 17 Jun 2003 08:39:52 -0400 Date: Tue, 17 Jun 2003 08:39:52 -0400 From: chas williams Message-Id: <200306171239.h5HCdpsF021438@locutus.cmf.nrl.navy.mil> To: davem@redhat.com Subject: [PATCH][ATM][1/3] assorted changes for atm Cc: netdev@oss.sgi.com X-Spam-Score: () hits=0.4 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3316 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev [atm]: split atm_ioctl into vcc_ioctl and atm_dev_ioctl # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1398 -> 1.1399 # net/atm/pvc.c 1.8 -> 1.9 # net/atm/svc.c 1.10 -> 1.11 # net/atm/common.h 1.4 -> 1.5 # net/atm/resources.h 1.3 -> 1.4 # net/atm/resources.c 1.9 -> 1.10 # net/atm/common.c 1.27 -> 1.28 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/16 chas@relax.cmf.nrl.navy.mil 1.1399 # split atm_ioctl into vcc_ioctl and atm_dev_ioctl # -------------------------------------------- # diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:12:22 2003 +++ b/net/atm/common.c Tue Jun 17 08:12:22 2003 @@ -563,129 +563,51 @@ } -static void copy_aal_stats(struct k_atm_aal_stats *from, - struct atm_aal_stats *to) +int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { -#define __HANDLE_ITEM(i) to->i = atomic_read(&from->i) - __AAL_STAT_ITEMS -#undef __HANDLE_ITEM -} - - -static void subtract_aal_stats(struct k_atm_aal_stats *from, - struct atm_aal_stats *to) -{ -#define __HANDLE_ITEM(i) atomic_sub(to->i,&from->i) - __AAL_STAT_ITEMS -#undef __HANDLE_ITEM -} - - -static int fetch_stats(struct atm_dev *dev,struct atm_dev_stats *arg,int zero) -{ - struct atm_dev_stats tmp; - int error = 0; - - copy_aal_stats(&dev->stats.aal0,&tmp.aal0); - copy_aal_stats(&dev->stats.aal34,&tmp.aal34); - copy_aal_stats(&dev->stats.aal5,&tmp.aal5); - if (arg) error = copy_to_user(arg,&tmp,sizeof(tmp)); - if (zero && !error) { - subtract_aal_stats(&dev->stats.aal0,&tmp.aal0); - subtract_aal_stats(&dev->stats.aal34,&tmp.aal34); - subtract_aal_stats(&dev->stats.aal5,&tmp.aal5); - } - return error ? 
-EFAULT : 0; -} - - -int atm_ioctl(struct socket *sock,unsigned int cmd,unsigned long arg) -{ - struct atm_dev *dev; - struct list_head *p; struct atm_vcc *vcc; - int *tmp_buf, *tmp_p; - void *buf; - int error,len,size,number, ret_val; + int error; - ret_val = 0; vcc = ATM_SD(sock); switch (cmd) { case SIOCOUTQ: if (sock->state != SS_CONNECTED || - !test_bit(ATM_VF_READY,&vcc->flags)) { - ret_val = -EINVAL; + !test_bit(ATM_VF_READY, &vcc->flags)) { + error = -EINVAL; goto done; } - ret_val = put_user(vcc->sk->sk_sndbuf - - atomic_read(&vcc->sk->sk_wmem_alloc), - (int *) arg) ? -EFAULT : 0; + error = put_user(vcc->sk->sk_sndbuf - + atomic_read(&vcc->sk->sk_wmem_alloc), + (int *) arg) ? -EFAULT : 0; goto done; case SIOCINQ: { struct sk_buff *skb; if (sock->state != SS_CONNECTED) { - ret_val = -EINVAL; + error = -EINVAL; goto done; } skb = skb_peek(&vcc->sk->sk_receive_queue); - ret_val = put_user(skb ? skb->len : 0,(int *) arg) - ? -EFAULT : 0; - goto done; - } - case ATM_GETNAMES: - if (get_user(buf, - &((struct atm_iobuf *) arg)->buffer)) { - ret_val = -EFAULT; - goto done; - } - if (get_user(len, - &((struct atm_iobuf *) arg)->length)) { - ret_val = -EFAULT; + error = put_user(skb ? skb->len : 0, + (int *) arg) ? -EFAULT : 0; goto done; } - size = 0; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) - size += sizeof(int); - if (size > len) { - spin_unlock(&atm_dev_lock); - ret_val = -E2BIG; - goto done; - } - tmp_buf = kmalloc(size, GFP_ATOMIC); - if (!tmp_buf) { - spin_unlock(&atm_dev_lock); - ret_val = -ENOMEM; - goto done; - } - tmp_p = tmp_buf; - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - *tmp_p++ = dev->number; - } - spin_unlock(&atm_dev_lock); - ret_val = ((copy_to_user(buf, tmp_buf, size)) || - put_user(size, &((struct atm_iobuf *) arg)->length) - ) ? 
-EFAULT : 0; - kfree(tmp_buf); - goto done; case SIOCGSTAMP: /* borrowed from IP */ if (!vcc->sk->sk_stamp.tv_sec) { - ret_val = -ENOENT; + error = -ENOENT; goto done; } - ret_val = copy_to_user((void *)arg, &vcc->sk->sk_stamp, - sizeof(struct timeval)) ? -EFAULT : 0; + error = copy_to_user((void *)arg, &vcc->sk->sk_stamp, + sizeof(struct timeval)) ? -EFAULT : 0; goto done; case ATM_SETSC: printk(KERN_WARNING "ATM_SETSC is obsolete\n"); - ret_val = 0; + error = 0; goto done; case ATMSIGD_CTRL: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } /* @@ -696,28 +618,28 @@ * have the same privledges that /proc/kcore needs */ if (!capable(CAP_SYS_RAWIO)) { - ret_val = -EPERM; + error = -EPERM; goto done; } error = sigd_attach(vcc); - if (!error) sock->state = SS_CONNECTED; - ret_val = error; + if (!error) + sock->state = SS_CONNECTED; goto done; #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) case SIOCMKCLIP: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_clip_ops()) { - ret_val = atm_clip_ops->clip_create(arg); + error = atm_clip_ops->clip_create(arg); module_put(atm_clip_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMARPD_CTRL: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } #if defined(CONFIG_ATM_CLIP_MODULE) @@ -728,48 +650,47 @@ error = atm_clip_ops->atm_init_atmarp(vcc); if (!error) sock->state = SS_CONNECTED; - ret_val = error; } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMARP_MKIP: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_clip_ops()) { - ret_val = atm_clip_ops->clip_mkip(vcc, arg); + error = atm_clip_ops->clip_mkip(vcc, arg); module_put(atm_clip_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMARP_SETENTRY: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if 
(try_atm_clip_ops()) { - ret_val = atm_clip_ops->clip_setentry(vcc, arg); + error = atm_clip_ops->clip_setentry(vcc, arg); module_put(atm_clip_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMARP_ENCAP: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_clip_ops()) { - ret_val = atm_clip_ops->clip_encap(vcc, arg); + error = atm_clip_ops->clip_encap(vcc, arg); module_put(atm_clip_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; #endif #if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE) case ATMLEC_CTRL: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } #if defined(CONFIG_ATM_LANE_MODULE) @@ -781,37 +702,36 @@ module_put(atm_lane_ops->owner); if (error >= 0) sock->state = SS_CONNECTED; - ret_val = error; } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMLEC_MCAST: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_lane_ops()) { - ret_val = atm_lane_ops->mcast_attach(vcc, (int) arg); + error = atm_lane_ops->mcast_attach(vcc, (int) arg); module_put(atm_lane_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMLEC_DATA: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_lane_ops()) { - ret_val = atm_lane_ops->vcc_attach(vcc, (void *) arg); + error = atm_lane_ops->vcc_attach(vcc, (void *) arg); module_put(atm_lane_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; #endif #if defined(CONFIG_ATM_MPOA) || defined(CONFIG_ATM_MPOA_MODULE) case ATMMPC_CTRL: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } #if defined(CONFIG_ATM_MPOA_MODULE) @@ -823,63 +743,62 @@ module_put(atm_mpoa_ops->owner); if (error >= 0) sock->state = SS_CONNECTED; - ret_val = error; } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; case ATMMPC_DATA: if (!capable(CAP_NET_ADMIN)) { - 
ret_val = -EPERM; + error = -EPERM; goto done; } if (try_atm_mpoa_ops()) { - ret_val = atm_mpoa_ops->vcc_attach(vcc, arg); + error = atm_mpoa_ops->vcc_attach(vcc, arg); module_put(atm_mpoa_ops->owner); } else - ret_val = -ENOSYS; + error = -ENOSYS; goto done; #endif #if defined(CONFIG_ATM_TCP) || defined(CONFIG_ATM_TCP_MODULE) case SIOCSIFATMTCP: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (!atm_tcp_ops.attach) { - ret_val = -ENOPKG; + error = -ENOPKG; goto done; } - fops_get (&atm_tcp_ops); - error = atm_tcp_ops.attach(vcc,(int) arg); - if (error >= 0) sock->state = SS_CONNECTED; - else fops_put (&atm_tcp_ops); - ret_val = error; + fops_get(&atm_tcp_ops); + error = atm_tcp_ops.attach(vcc, (int) arg); + if (error >= 0) + sock->state = SS_CONNECTED; + else + fops_put (&atm_tcp_ops); goto done; case ATMTCP_CREATE: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (!atm_tcp_ops.create_persistent) { - ret_val = -ENOPKG; + error = -ENOPKG; goto done; } error = atm_tcp_ops.create_persistent((int) arg); - if (error < 0) fops_put (&atm_tcp_ops); - ret_val = error; + if (error < 0) + fops_put (&atm_tcp_ops); goto done; case ATMTCP_REMOVE: if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; + error = -EPERM; goto done; } if (!atm_tcp_ops.remove_persistent) { - ret_val = -ENOPKG; + error = -ENOPKG; goto done; } error = atm_tcp_ops.remove_persistent((int) arg); - fops_put (&atm_tcp_ops); - ret_val = error; + fops_put(&atm_tcp_ops); goto done; #endif default: @@ -887,182 +806,23 @@ } #if defined(CONFIG_PPPOATM) || defined(CONFIG_PPPOATM_MODULE) if (pppoatm_ioctl_hook) { - ret_val = pppoatm_ioctl_hook(vcc, cmd, arg); - if (ret_val != -ENOIOCTLCMD) + error = pppoatm_ioctl_hook(vcc, cmd, arg); + if (error != -ENOIOCTLCMD) goto done; } #endif #if defined(CONFIG_ATM_BR2684) || defined(CONFIG_ATM_BR2684_MODULE) if (br2684_ioctl_hook) { - ret_val = br2684_ioctl_hook(vcc, cmd, arg); - if (ret_val != 
-ENOIOCTLCMD) + error = br2684_ioctl_hook(vcc, cmd, arg); + if (error != -ENOIOCTLCMD) goto done; } #endif - if (get_user(buf,&((struct atmif_sioc *) arg)->arg)) { - ret_val = -EFAULT; - goto done; - } - if (get_user(len,&((struct atmif_sioc *) arg)->length)) { - ret_val = -EFAULT; - goto done; - } - if (get_user(number,&((struct atmif_sioc *) arg)->number)) { - ret_val = -EFAULT; - goto done; - } - if (!(dev = atm_dev_lookup(number))) { - ret_val = -ENODEV; - goto done; - } - - size = 0; - switch (cmd) { - case ATM_GETTYPE: - size = strlen(dev->type)+1; - if (copy_to_user(buf,dev->type,size)) { - ret_val = -EFAULT; - goto done_release; - } - break; - case ATM_GETESI: - size = ESI_LEN; - if (copy_to_user(buf,dev->esi,size)) { - ret_val = -EFAULT; - goto done_release; - } - break; - case ATM_SETESI: - { - int i; - - for (i = 0; i < ESI_LEN; i++) - if (dev->esi[i]) { - ret_val = -EEXIST; - goto done_release; - } - } - /* fall through */ - case ATM_SETESIF: - { - unsigned char esi[ESI_LEN]; - - if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; - goto done_release; - } - if (copy_from_user(esi,buf,ESI_LEN)) { - ret_val = -EFAULT; - goto done_release; - } - memcpy(dev->esi,esi,ESI_LEN); - ret_val = ESI_LEN; - goto done_release; - } - case ATM_GETSTATZ: - if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; - goto done_release; - } - /* fall through */ - case ATM_GETSTAT: - size = sizeof(struct atm_dev_stats); - error = fetch_stats(dev,buf,cmd == ATM_GETSTATZ); - if (error) { - ret_val = error; - goto done_release; - } - break; - case ATM_GETCIRANGE: - size = sizeof(struct atm_cirange); - if (copy_to_user(buf,&dev->ci_range,size)) { - ret_val = -EFAULT; - goto done_release; - } - break; - case ATM_GETLINKRATE: - size = sizeof(int); - if (copy_to_user(buf,&dev->link_rate,size)) { - ret_val = -EFAULT; - goto done_release; - } - break; - case ATM_RSTADDR: - if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; - goto done_release; - } - atm_reset_addr(dev); - break; - case 
ATM_ADDADDR: - case ATM_DELADDR: - if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; - goto done_release; - } - { - struct sockaddr_atmsvc addr; - if (copy_from_user(&addr,buf,sizeof(addr))) { - ret_val = -EFAULT; - goto done_release; - } - if (cmd == ATM_ADDADDR) - ret_val = atm_add_addr(dev,&addr); - else - ret_val = atm_del_addr(dev,&addr); - goto done_release; - } - case ATM_GETADDR: - size = atm_get_addr(dev,buf,len); - if (size < 0) - ret_val = size; - else - /* may return 0, but later on size == 0 means "don't - write the length" */ - ret_val = put_user(size, - &((struct atmif_sioc *) arg)->length) ? -EFAULT : 0; - goto done_release; - case ATM_SETLOOP: - if (__ATM_LM_XTRMT((int) (long) buf) && - __ATM_LM_XTLOC((int) (long) buf) > - __ATM_LM_XTRMT((int) (long) buf)) { - ret_val = -EINVAL; - goto done_release; - } - /* fall through */ - case ATM_SETCIRANGE: - case SONET_GETSTATZ: - case SONET_SETDIAG: - case SONET_CLRDIAG: - case SONET_SETFRAMING: - if (!capable(CAP_NET_ADMIN)) { - ret_val = -EPERM; - goto done_release; - } - /* fall through */ - default: - if (!dev->ops->ioctl) { - ret_val = -EINVAL; - goto done_release; - } - size = dev->ops->ioctl(dev,cmd,buf); - if (size < 0) { - ret_val = (size == -ENOIOCTLCMD ? -EINVAL : size); - goto done_release; - } - } - - if (size) - ret_val = put_user(size,&((struct atmif_sioc *) arg)->length) ? 
- -EFAULT : 0; - else - ret_val = 0; -done_release: - atm_dev_release(dev); + error = atm_dev_ioctl(cmd, arg); done: - return ret_val; + return error; } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:12:22 2003 +++ b/net/atm/common.h Tue Jun 17 08:12:22 2003 @@ -18,7 +18,7 @@ int atm_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len); unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); -int atm_ioctl(struct socket *sock,unsigned int cmd,unsigned long arg); +int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); int atm_setsockopt(struct socket *sock,int level,int optname,char *optval, int optlen); int atm_getsockopt(struct socket *sock,int level,int optname,char *optval, diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:12:22 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:12:22 2003 @@ -82,7 +82,7 @@ .accept = sock_no_accept, .getname = pvc_getname, .poll = atm_poll, - .ioctl = atm_ioctl, + .ioctl = vcc_ioctl, .listen = sock_no_listen, .shutdown = pvc_shutdown, .setsockopt = atm_setsockopt, diff -Nru a/net/atm/resources.c b/net/atm/resources.c --- a/net/atm/resources.c Tue Jun 17 08:12:22 2003 +++ b/net/atm/resources.c Tue Jun 17 08:12:22 2003 @@ -12,6 +12,7 @@ #include #include #include +#include #include /* for barrier */ #include #include @@ -19,6 +20,7 @@ #include "common.h" #include "resources.h" +#include "addr.h" #ifndef NULL @@ -170,6 +172,240 @@ dev->ops->dev_close(dev); atm_dev_deregister(dev); } + + +static void copy_aal_stats(struct k_atm_aal_stats *from, + struct atm_aal_stats *to) +{ +#define __HANDLE_ITEM(i) to->i = atomic_read(&from->i) + __AAL_STAT_ITEMS +#undef __HANDLE_ITEM +} + + +static void subtract_aal_stats(struct k_atm_aal_stats *from, + struct atm_aal_stats *to) +{ +#define __HANDLE_ITEM(i) atomic_sub(to->i, &from->i) + __AAL_STAT_ITEMS +#undef __HANDLE_ITEM +} + + +static int fetch_stats(struct atm_dev *dev, 
struct atm_dev_stats *arg, int zero) +{ + struct atm_dev_stats tmp; + int error = 0; + + copy_aal_stats(&dev->stats.aal0, &tmp.aal0); + copy_aal_stats(&dev->stats.aal34, &tmp.aal34); + copy_aal_stats(&dev->stats.aal5, &tmp.aal5); + if (arg) + error = copy_to_user(arg, &tmp, sizeof(tmp)); + if (zero && !error) { + subtract_aal_stats(&dev->stats.aal0, &tmp.aal0); + subtract_aal_stats(&dev->stats.aal34, &tmp.aal34); + subtract_aal_stats(&dev->stats.aal5, &tmp.aal5); + } + return error ? -EFAULT : 0; +} + + +int atm_dev_ioctl(unsigned int cmd, unsigned long arg) +{ + void *buf; + int error, len, number, size = 0; + struct atm_dev *dev; + struct list_head *p; + int *tmp_buf, *tmp_p; + + switch (cmd) { + case ATM_GETNAMES: + if (get_user(buf, &((struct atm_iobuf *) arg)->buffer)) + return -EFAULT; + if (get_user(len, &((struct atm_iobuf *) arg)->length)) + return -EFAULT; + spin_lock(&atm_dev_lock); + list_for_each(p, &atm_devs) + size += sizeof(int); + if (size > len) { + spin_unlock(&atm_dev_lock); + return -E2BIG; + } + tmp_buf = kmalloc(size, GFP_ATOMIC); + if (!tmp_buf) { + spin_unlock(&atm_dev_lock); + return -ENOMEM; + } + tmp_p = tmp_buf; + list_for_each(p, &atm_devs) { + dev = list_entry(p, struct atm_dev, dev_list); + *tmp_p++ = dev->number; + } + spin_unlock(&atm_dev_lock); + error = ((copy_to_user(buf, tmp_buf, size)) || + put_user(size, &((struct atm_iobuf *) arg)->length)) + ? 
-EFAULT : 0; + kfree(tmp_buf); + return error; + default: + break; + } + + if (get_user(buf, &((struct atmif_sioc *) arg)->arg)) + return -EFAULT; + if (get_user(len, &((struct atmif_sioc *) arg)->length)) + return -EFAULT; + if (get_user(number, &((struct atmif_sioc *) arg)->number)) + return -EFAULT; + + if (!(dev = atm_dev_lookup(number))) + return -ENODEV; + + switch (cmd) { + case ATM_GETTYPE: + size = strlen(dev->type) + 1; + if (copy_to_user(buf, dev->type, size)) { + error = -EFAULT; + goto done; + } + break; + case ATM_GETESI: + size = ESI_LEN; + if (copy_to_user(buf, dev->esi, size)) { + error = -EFAULT; + goto done; + } + break; + case ATM_SETESI: + { + int i; + + for (i = 0; i < ESI_LEN; i++) + if (dev->esi[i]) { + error = -EEXIST; + goto done; + } + } + /* fall through */ + case ATM_SETESIF: + { + unsigned char esi[ESI_LEN]; + + if (!capable(CAP_NET_ADMIN)) { + error = -EPERM; + goto done; + } + if (copy_from_user(esi, buf, ESI_LEN)) { + error = -EFAULT; + goto done; + } + memcpy(dev->esi, esi, ESI_LEN); + error = ESI_LEN; + goto done; + } + case ATM_GETSTATZ: + if (!capable(CAP_NET_ADMIN)) { + error = -EPERM; + goto done; + } + /* fall through */ + case ATM_GETSTAT: + size = sizeof(struct atm_dev_stats); + error = fetch_stats(dev, buf, cmd == ATM_GETSTATZ); + if (error) + goto done; + break; + case ATM_GETCIRANGE: + size = sizeof(struct atm_cirange); + if (copy_to_user(buf, &dev->ci_range, size)) { + error = -EFAULT; + goto done; + } + break; + case ATM_GETLINKRATE: + size = sizeof(int); + if (copy_to_user(buf, &dev->link_rate, size)) { + error = -EFAULT; + goto done; + } + break; + case ATM_RSTADDR: + if (!capable(CAP_NET_ADMIN)) { + error = -EPERM; + goto done; + } + atm_reset_addr(dev); + break; + case ATM_ADDADDR: + case ATM_DELADDR: + if (!capable(CAP_NET_ADMIN)) { + error = -EPERM; + goto done; + } + { + struct sockaddr_atmsvc addr; + + if (copy_from_user(&addr, buf, sizeof(addr))) { + error = -EFAULT; + goto done; + } + if (cmd == ATM_ADDADDR) 
+ error = atm_add_addr(dev, &addr); + else + error = atm_del_addr(dev, &addr); + goto done; + } + case ATM_GETADDR: + error = atm_get_addr(dev, buf, len); + if (error < 0) + goto done; + size = error; + /* may return 0, but later on size == 0 means "don't + write the length" */ + error = put_user(size, &((struct atmif_sioc *) arg)->length) + ? -EFAULT : 0; + goto done; + case ATM_SETLOOP: + if (__ATM_LM_XTRMT((int) (long) buf) && + __ATM_LM_XTLOC((int) (long) buf) > + __ATM_LM_XTRMT((int) (long) buf)) { + error = -EINVAL; + goto done; + } + /* fall through */ + case ATM_SETCIRANGE: + case SONET_GETSTATZ: + case SONET_SETDIAG: + case SONET_CLRDIAG: + case SONET_SETFRAMING: + if (!capable(CAP_NET_ADMIN)) { + error = -EPERM; + goto done; + } + /* fall through */ + default: + if (!dev->ops->ioctl) { + error = -EINVAL; + goto done; + } + size = dev->ops->ioctl(dev, cmd, buf); + if (size < 0) { + error = (size == -ENOIOCTLCMD ? -EINVAL : size); + goto done; + } + } + + if (size) + error = put_user(size, &((struct atmif_sioc *) arg)->length) + ? 
-EFAULT : 0; + else + error = 0; +done: + atm_dev_release(dev); + return error; +} + struct sock *alloc_atm_vcc_sk(int family) { diff -Nru a/net/atm/resources.h b/net/atm/resources.h --- a/net/atm/resources.h Tue Jun 17 08:12:22 2003 +++ b/net/atm/resources.h Tue Jun 17 08:12:22 2003 @@ -16,6 +16,7 @@ struct sock *alloc_atm_vcc_sk(int family); void free_atm_vcc_sk(struct sock *sk); +int atm_dev_ioctl(unsigned int cmd, unsigned long arg); #ifdef CONFIG_PROC_FS diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:12:22 2003 +++ b/net/atm/svc.c Tue Jun 17 08:12:22 2003 @@ -402,7 +402,7 @@ .accept = svc_accept, .getname = svc_getname, .poll = atm_poll, - .ioctl = atm_ioctl, + .ioctl = vcc_ioctl, .listen = svc_listen, .shutdown = svc_shutdown, .setsockopt = svc_setsockopt, fg From chas@locutus.cmf.nrl.navy.mil Tue Jun 17 05:42:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 05:42:46 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HCgY2x020785 for ; Tue, 17 Jun 2003 05:42:34 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5HCgMsG026557; Tue, 17 Jun 2003 08:42:22 -0400 (EDT) Received: (from chas@localhost) by locutus.cmf.nrl.navy.mil (8.12.7/8.12.7/Submit) id h5HCeMbB021442; Tue, 17 Jun 2003 08:40:22 -0400 Date: Tue, 17 Jun 2003 08:40:22 -0400 From: chas williams Message-Id: <200306171240.h5HCeMbB021442@locutus.cmf.nrl.navy.mil> To: davem@redhat.com Subject: [PATCH][ATM][3/3] assorted changes for atm Cc: netdev@oss.sgi.com X-Spam-Score: () hits=0.4 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . 
com / mimedefang) X-archive-position: 3317 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev [atm]: keep vcc's on global list instead of per device # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1403 -> 1.1404 # drivers/atm/he.c 1.12 -> 1.13 # net/atm/atm_misc.c 1.4 -> 1.5 # drivers/atm/eni.c 1.14 -> 1.15 # net/atm/proc.c 1.17 -> 1.18 # net/atm/pvc.c 1.13 -> 1.14 # drivers/atm/idt77252.c 1.14 -> 1.15 # net/atm/lec.c 1.26 -> 1.27 # net/atm/svc.c 1.15 -> 1.16 # drivers/atm/atmtcp.c 1.7 -> 1.8 # net/atm/common.h 1.9 -> 1.10 # net/atm/signaling.c 1.11 -> 1.12 # net/atm/resources.h 1.4 -> 1.5 # net/atm/mpc.c 1.17 -> 1.18 # include/linux/atmdev.h 1.15 -> 1.16 # net/atm/resources.c 1.10 -> 1.11 # net/atm/clip.c 1.14 -> 1.15 # drivers/atm/fore200e.c 1.15 -> 1.16 # net/atm/common.c 1.32 -> 1.33 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/17 chas@relax.cmf.nrl.navy.mil 1.1404 # vcc_sklist conversion # -------------------------------------------- # diff -Nru a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c --- a/drivers/atm/atmtcp.c Tue Jun 17 08:26:40 2003 +++ b/drivers/atm/atmtcp.c Tue Jun 17 08:26:40 2003 @@ -153,9 +153,9 @@ static int atmtcp_v_ioctl(struct atm_dev *dev,unsigned int cmd,void *arg) { - unsigned long flags; struct atm_cirange ci; struct atm_vcc *vcc; + struct sock *s; if (cmd != ATM_SETCIRANGE) return -ENOIOCTLCMD; if (copy_from_user(&ci,(void *) arg,sizeof(ci))) return -EFAULT; @@ -163,14 +163,18 @@ if (ci.vci_bits == ATM_CI_MAX) ci.vci_bits = MAX_VCI_BITS; if (ci.vpi_bits > MAX_VPI_BITS || ci.vpi_bits < 0 || ci.vci_bits > MAX_VCI_BITS || ci.vci_bits < 0) return -EINVAL; - 
spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) + read_lock(&vcc_sklist_lock); + for (s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; if ((vcc->vpi >> ci.vpi_bits) || (vcc->vci >> ci.vci_bits)) { - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EBUSY; } - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); dev->ci_range = ci; return 0; } @@ -233,9 +237,9 @@ static void atmtcp_c_close(struct atm_vcc *vcc) { - unsigned long flags; struct atm_dev *atmtcp_dev; struct atmtcp_dev_data *dev_data; + struct sock *s; struct atm_vcc *walk; atmtcp_dev = (struct atm_dev *) vcc->dev_data; @@ -246,19 +250,23 @@ kfree(dev_data); shutdown_atm_dev(atmtcp_dev); vcc->dev_data = NULL; - spin_lock_irqsave(&atmtcp_dev->lock, flags); - for (walk = atmtcp_dev->vccs; walk; walk = walk->next) + read_lock(&vcc_sklist_lock); + for (s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != atmtcp_dev) + continue; wake_up(&walk->sleep); - spin_unlock_irqrestore(&atmtcp_dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); } static int atmtcp_c_send(struct atm_vcc *vcc,struct sk_buff *skb) { - unsigned long flags; struct atm_dev *dev; struct atmtcp_hdr *hdr; - struct atm_vcc *out_vcc; + struct sock *s; + struct atm_vcc *out_vcc = NULL; struct sk_buff *new_skb; int result = 0; @@ -270,13 +278,17 @@ (struct atmtcp_control *) skb->data); goto done; } - spin_lock_irqsave(&dev->lock, flags); - for (out_vcc = dev->vccs; out_vcc; out_vcc = out_vcc->next) + read_lock(&vcc_sklist_lock); + for (s = vcc_sklist; s; s = s->sk_next) { + out_vcc = atm_sk(s); + if (out_vcc->dev != dev) + continue; if (out_vcc->vpi == ntohs(hdr->vpi) && out_vcc->vci == ntohs(hdr->vci) && out_vcc->qos.rxtp.traffic_class != ATM_NONE) break; - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); if (!out_vcc) { atomic_inc(&vcc->stats->tx_err); 
goto done; @@ -366,7 +378,7 @@ if (itf != -1) dev = atm_dev_lookup(itf); if (dev) { if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } if (PRIV(dev)->vcc) return -EBUSY; @@ -378,7 +390,8 @@ if (error) return error; } PRIV(dev)->vcc = vcc; - bind_vcc(vcc,&atmtcp_control_dev); + vcc->dev = &atmtcp_control_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); vcc->dev_data = dev; @@ -402,7 +415,7 @@ dev = atm_dev_lookup(itf); if (!dev) return -ENODEV; if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } dev_data = PRIV(dev); @@ -410,7 +423,7 @@ dev_data->persist = 0; if (PRIV(dev)->vcc) return 0; kfree(dev_data); - atm_dev_release(dev); + atm_dev_put(dev); shutdown_atm_dev(dev); return 0; } diff -Nru a/drivers/atm/eni.c b/drivers/atm/eni.c --- a/drivers/atm/eni.c Tue Jun 17 08:26:40 2003 +++ b/drivers/atm/eni.c Tue Jun 17 08:26:40 2003 @@ -1887,10 +1887,10 @@ static int get_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; + struct sock *s; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) *vpi = 0; if (*vci == ATM_VCI_ANY) { for (*vci = ATM_NOT_RSV_VCI; *vci < NR_VCI; (*vci)++) { @@ -1898,40 +1898,47 @@ ENI_DEV(vcc->dev)->rx_map[*vci]) continue; if (vcc->qos.txtp.traffic_class != ATM_NONE) { - for (walk = vcc->dev->vccs; walk; - walk = walk->next) + for (s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) break; - if (walk) continue; + } + if (s) continue; } break; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return *vci == NR_VCI ? 
-EADDRINUSE : 0; } if (*vci == ATM_VCI_UNSPEC) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (vcc->qos.rxtp.traffic_class != ATM_NONE && ENI_DEV(vcc->dev)->rx_map[*vci]) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } if (vcc->qos.txtp.traffic_class == ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } - for (walk = vcc->dev->vccs; walk; walk = walk->next) + for (s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); return 0; } @@ -2139,7 +2146,7 @@ static int eni_proc_read(struct atm_dev *dev,loff_t *pos,char *page) { - unsigned long flags; + struct sock *s; static const char *signal[] = { "LOST","unknown","okay" }; struct eni_dev *eni_dev = ENI_DEV(dev); struct atm_vcc *vcc; @@ -2212,11 +2219,15 @@ return sprintf(page,"%10sbacklog %u packets\n","", skb_queue_len(&tx->backlog)); } - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) { - struct eni_vcc *eni_vcc = ENI_VCC(vcc); + read_lock(&vcc_sklist_lock); + for (s = vcc_sklist; s; s = s->sk_next) { + struct eni_vcc *eni_vcc; int length; + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; + eni_vcc = ENI_VCC(vcc); if (--left) continue; length = sprintf(page,"vcc %4d: ",vcc->vci); if (eni_vcc->rx) { @@ -2231,10 +2242,10 @@ length += sprintf(page+length,"tx[%d], txing %d bytes", eni_vcc->tx->index,eni_vcc->txing); page[length] = '\n'; - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return length+1; } - spin_unlock_irqrestore(&dev->lock, flags); + 
read_unlock(&vcc_sklist_lock); for (i = 0; i < eni_dev->free_len; i++) { struct eni_free *fe = eni_dev->free_list+i; unsigned long offset; diff -Nru a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c --- a/drivers/atm/fore200e.c Tue Jun 17 08:26:40 2003 +++ b/drivers/atm/fore200e.c Tue Jun 17 08:26:40 2003 @@ -1069,18 +1069,22 @@ static struct atm_vcc* fore200e_find_vcc(struct fore200e* fore200e, struct rpd* rpd) { - unsigned long flags; + struct sock *s; struct atm_vcc* vcc; - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { - - if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) - break; + read_lock(&vcc_sklist_lock); + for(s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (vcc->dev != fore200e->atm_dev) + continue; + if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) { + read_unlock(&vcc_sklist_lock); + return vcc; + } } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); - - return vcc; + read_unlock(&vcc_sklist_lock); + + return NULL; } @@ -1350,20 +1354,23 @@ static int fore200e_walk_vccs(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; struct atm_vcc* walk; + struct sock *s; /* find a free VPI */ - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { - for (*vpi = 0, walk = vcc->dev->vccs; walk; walk = walk->next) { + for (*vpi = 0, s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + s = vcc_sklist; } } } @@ -1371,16 +1378,19 @@ /* find a free VCI */ if (*vci == ATM_VCI_ANY) { - for (*vci = ATM_NOT_RSV_VCI, walk = vcc->dev->vccs; walk; walk = walk->next) { + for (*vci = ATM_NOT_RSV_VCI, s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vpi = *vpi) && (walk->vci == *vci)) { *vci = 
walk->vci + 1; - walk = vcc->dev->vccs; + s = vcc_sklist; } } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } @@ -2642,7 +2652,7 @@ static int fore200e_proc_read(struct atm_dev *dev,loff_t* pos,char* page) { - unsigned long flags; + struct sock *s; struct fore200e* fore200e = FORE200E_DEV(dev); int len, left = *pos; @@ -2889,8 +2899,12 @@ len = sprintf(page,"\n" " VCCs:\n address\tVPI.VCI:AAL\t(min/max tx PDU size) (min/max rx PDU size)\n"); - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { + read_lock(&vcc_sklist_lock); + for (s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + + if (vcc->dev != fore200e->atm_dev) + continue; fore200e_vcc = FORE200E_VCC(vcc); @@ -2904,7 +2918,7 @@ fore200e_vcc->rx_max_pdu ); } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); + read_unlock(&vcc_sklist_lock); return len; } diff -Nru a/drivers/atm/he.c b/drivers/atm/he.c --- a/drivers/atm/he.c Tue Jun 17 08:26:40 2003 +++ b/drivers/atm/he.c Tue Jun 17 08:26:40 2003 @@ -79,7 +79,6 @@ #include #define USE_TASKLET -#define USE_HE_FIND_VCC #undef USE_SCATTERGATHER #undef USE_CHECKSUM_HW /* still confused about this */ #define USE_RBPS @@ -328,25 +327,24 @@ he_writel_rcm(dev, val, 0x00000 | (cid << 3) | 7) static __inline__ struct atm_vcc* -he_find_vcc(struct he_dev *he_dev, unsigned cid) +__find_vcc(struct he_dev *he_dev, unsigned cid) { - unsigned long flags; struct atm_vcc *vcc; + struct sock *s; short vpi; int vci; vpi = cid >> he_dev->vcibits; vci = cid & ((1 << he_dev->vcibits) - 1); - spin_lock_irqsave(&he_dev->atm_dev->lock, flags); - for (vcc = he_dev->atm_dev->vccs; vcc; vcc = vcc->next) - if (vcc->vci == vci && vcc->vpi == vpi - && vcc->qos.rxtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + for (s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (vcc->dev == he_dev->atm_dev && + vcc->vci == vci && 
vcc->vpi == vpi && + vcc->qos.rxtp.traffic_class != ATM_NONE) { return vcc; - } - - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + } + } return NULL; } @@ -1566,17 +1564,6 @@ reg |= RX_ENABLE; he_writel(he_dev, reg, RC_CONFIG); -#ifndef USE_HE_FIND_VCC - he_dev->he_vcc_table = kmalloc(sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits)), GFP_KERNEL); - if (he_dev->he_vcc_table == NULL) { - hprintk("failed to alloc he_vcc_table\n"); - return -ENOMEM; - } - memset(he_dev->he_vcc_table, 0, sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits))); -#endif - for (i = 0; i < HE_NUM_CS_STPER; ++i) { he_dev->cs_stper[i].inuse = 0; he_dev->cs_stper[i].pcr = -1; @@ -1712,11 +1699,6 @@ he_dev->tpd_base, he_dev->tpd_base_phys); #endif -#ifndef USE_HE_FIND_VCC - if (he_dev->he_vcc_table) - kfree(he_dev->he_vcc_table); -#endif - if (he_dev->pci_dev) { pci_read_config_word(he_dev->pci_dev, PCI_COMMAND, &command); command &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); @@ -1798,6 +1780,7 @@ int pdus_assembled = 0; int updated = 0; + read_lock(&vcc_sklist_lock); while (he_dev->rbrq_head != rbrq_tail) { ++updated; @@ -1823,13 +1806,10 @@ buf_len = RBRQ_BUFLEN(he_dev->rbrq_head) * 4; cid = RBRQ_CID(he_dev->rbrq_head); -#ifdef USE_HE_FIND_VCC if (cid != lastcid) - vcc = he_find_vcc(he_dev, cid); + vcc = __find_vcc(he_dev, cid); lastcid = cid; -#else - vcc = HE_LOOKUP_VCC(he_dev, cid); -#endif + if (vcc == NULL) { hprintk("vcc == NULL (cid 0x%x)\n", cid); if (!RBRQ_HBUF_ERR(he_dev->rbrq_head)) @@ -1966,6 +1946,7 @@ RBRQ_MASK(++he_dev->rbrq_head)); } + read_unlock(&vcc_sklist_lock); if (updated) { if (updated > he_dev->rbrq_peak) @@ -2565,10 +2546,6 @@ #endif spin_unlock_irqrestore(&he_dev->global_lock, flags); - -#ifndef USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = vcc; -#endif } open_failed: @@ -2634,9 +2611,6 @@ if (timeout == 0) hprintk("close rx timeout cid 0x%x\n", cid); -#ifndef USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = 
NULL; -#endif HPRINTK("close rx cid 0x%x complete\n", cid); } diff -Nru a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c --- a/drivers/atm/idt77252.c Tue Jun 17 08:26:40 2003 +++ b/drivers/atm/idt77252.c Tue Jun 17 08:26:40 2003 @@ -2403,37 +2403,43 @@ static int idt77252_find_vcc(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; + struct sock *s; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { *vpi = 0; - walk = vcc->dev->vccs; - while (walk) { + s = vcc_sklist; + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + s = vcc_sklist; continue; } - walk = walk->next; + s = s->sk_next; } } if (*vci == ATM_VCI_ANY) { *vci = ATM_NOT_RSV_VCI; - walk = vcc->dev->vccs; - while (walk) { + s = vcc_sklist; + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vci)++; - walk = vcc->dev->vccs; + s = vcc_sklist; continue; } - walk = walk->next; + s = s->sk_next; } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Tue Jun 17 08:26:40 2003 +++ b/include/linux/atmdev.h Tue Jun 17 08:26:40 2003 @@ -293,7 +293,6 @@ struct k_atm_aal_stats *stats; /* pointer to AAL stats group */ wait_queue_head_t sleep; /* if socket is busy */ struct sock *sk; /* socket backpointer */ - struct atm_vcc *prev,*next; /* SVC part --- may move later ------------------------------------- */ short itf; /* interface number */ struct sockaddr_atmsvc local; @@ -320,8 +319,6 @@ /* (NULL) */ const char *type; /* device type name */ int number; /* device index */ - struct atm_vcc *vccs; /* VCC table (or NULL) */ - struct atm_vcc *last; /* last VCC (or undefined) */ void *dev_data; /* per-device data */ void 
*phy_data; /* private PHY date */ unsigned long flags; /* device flags (ATM_DF_*) */ @@ -390,6 +387,9 @@ unsigned long atm_options; /* ATM layer options */ }; +extern struct sock *vcc_sklist; +extern rwlock_t vcc_sklist_lock; + #define ATM_SKB(skb) (((struct atm_skb_data *) (skb)->cb)) struct atm_dev *atm_dev_register(const char *type,const struct atmdev_ops *ops, @@ -397,7 +397,8 @@ struct atm_dev *atm_dev_lookup(int number); void atm_dev_deregister(struct atm_dev *dev); void shutdown_atm_dev(struct atm_dev *dev); -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev); +void vcc_insert_socket(struct sock *sk); +void vcc_remove_socket(struct sock *sk); /* @@ -436,7 +437,7 @@ } -static inline void atm_dev_release(struct atm_dev *dev) +static inline void atm_dev_put(struct atm_dev *dev) { atomic_dec(&dev->refcnt); diff -Nru a/net/atm/atm_misc.c b/net/atm/atm_misc.c --- a/net/atm/atm_misc.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/atm_misc.c Tue Jun 17 08:26:40 2003 @@ -47,15 +47,20 @@ static int check_ci(struct atm_vcc *vcc,short vpi,int vci) { + struct sock *s; struct atm_vcc *walk; - for (walk = vcc->dev->vccs; walk; walk = walk->next) + for (s = vcc_sklist; s; s = s->sk_next) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vpi == vpi && walk->vci == vci && ((walk->qos.txtp.traffic_class != ATM_NONE && vcc->qos.txtp.traffic_class != ATM_NONE) || (walk->qos.rxtp.traffic_class != ATM_NONE && vcc->qos.rxtp.traffic_class != ATM_NONE))) return -EADDRINUSE; + } /* allow VCCs with same VPI/VCI iff they don't collide on TX/RX (but we may refuse such sharing for other reasons, e.g. 
if protocol requires to have both channels) */ @@ -65,17 +70,16 @@ int atm_find_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; static short p = 0; /* poor man's per-device cache */ static int c = 0; short old_p; int old_c; int err; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi != ATM_VPI_ANY && *vci != ATM_VCI_ANY) { err = check_ci(vcc,*vpi,*vci); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return err; } /* last scan may have left values out of bounds for current device */ @@ -90,7 +94,7 @@ if (!check_ci(vcc,p,c)) { *vpi = p; *vci = c; - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (*vci == ATM_VCI_ANY) { @@ -105,7 +109,7 @@ } } while (old_p != p || old_c != c); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/clip.c Tue Jun 17 08:26:40 2003 @@ -737,7 +737,8 @@ set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); /* allow replies and avoid getting closed if signaling dies */ - bind_vcc(vcc,&atmarpd_dev); + vcc->dev = &atmarpd_dev; + vcc_insert_socket(vcc->sk); vcc->push = NULL; vcc->pop = NULL; /* crash */ vcc->push_oam = NULL; /* crash */ diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/common.c Tue Jun 17 08:26:40 2003 @@ -157,6 +157,38 @@ #endif +struct sock *vcc_sklist; +rwlock_t vcc_sklist_lock = RW_LOCK_UNLOCKED; + +void __vcc_insert_socket(struct sock *sk) +{ + sk->sk_next = vcc_sklist; + if (sk->sk_next) + vcc_sklist->sk_pprev = &sk->sk_next; + vcc_sklist = sk; + sk->sk_pprev = &vcc_sklist; +} + +void vcc_insert_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + __vcc_insert_socket(sk); + write_unlock_irq(&vcc_sklist_lock); +} + +void vcc_remove_socket(struct 
sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + if (sk->sk_pprev) { + if (sk->sk_next) + sk->sk_next->sk_pprev = sk->sk_pprev; + *sk->sk_pprev = sk->sk_next; + sk->sk_pprev = NULL; + } + write_unlock_irq(&vcc_sklist_lock); +} + + static struct sk_buff *alloc_tx(struct atm_vcc *vcc,unsigned int size) { struct sk_buff *skb; @@ -175,16 +207,45 @@ } -int atm_create(struct socket *sock,int protocol,int family) +EXPORT_SYMBOL(vcc_sklist); +EXPORT_SYMBOL(vcc_sklist_lock); +EXPORT_SYMBOL(vcc_insert_socket); +EXPORT_SYMBOL(vcc_remove_socket); + +static void vcc_sock_destruct(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + if (atomic_read(&vcc->sk->sk_rmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: rmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_rmem_alloc)); + + if (atomic_read(&vcc->sk->sk_wmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: wmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_wmem_alloc)); + + kfree(sk->sk_protinfo); +} + +int vcc_create(struct socket *sock, int protocol, int family) { struct sock *sk; struct atm_vcc *vcc; sock->sk = NULL; - if (sock->type == SOCK_STREAM) return -EINVAL; - if (!(sk = alloc_atm_vcc_sk(family))) return -ENOMEM; - vcc = atm_sk(sk); - memset(&vcc->flags,0,sizeof(vcc->flags)); + if (sock->type == SOCK_STREAM) + return -EINVAL; + sk = sk_alloc(family, GFP_KERNEL, 1, NULL); + if (!sk) + return -ENOMEM; + sock_init_data(NULL, sk); + + vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); + if (!vcc) { + sk_free(sk); + return -ENOMEM; + } + + memset(vcc, 0, sizeof(*vcc)); + vcc->sk = sk; vcc->dev = NULL; vcc->callback = NULL; memset(&vcc->local,0,sizeof(struct sockaddr_atmsvc)); @@ -199,42 +260,49 @@ vcc->atm_options = vcc->aal_options = 0; init_waitqueue_head(&vcc->sleep); sk->sk_sleep = &vcc->sleep; + sk->sk_destruct = vcc_sock_destruct; sock->sk = sk; return 0; } -void atm_release_vcc_sk(struct sock *sk,int free_sk) +static void vcc_destroy_socket(struct sock *sk) { struct atm_vcc *vcc = 
atm_sk(sk); struct sk_buff *skb; - clear_bit(ATM_VF_READY,&vcc->flags); + clear_bit(ATM_VF_READY, &vcc->flags); if (vcc->dev) { - if (vcc->dev->ops->close) vcc->dev->ops->close(vcc); - if (vcc->push) vcc->push(vcc,NULL); /* atmarpd has no push */ + if (vcc->dev->ops->close) + vcc->dev->ops->close(vcc); + if (vcc->push) + vcc->push(vcc, NULL); /* atmarpd has no push */ + + vcc_remove_socket(sk); /* no more receive */ + while ((skb = skb_dequeue(&vcc->sk->sk_receive_queue))) { atm_return(vcc,skb->truesize); kfree_skb(skb); } module_put(vcc->dev->ops->owner); - atm_dev_release(vcc->dev); - if (atomic_read(&vcc->sk->sk_rmem_alloc)) - printk(KERN_WARNING "atm_release_vcc: strange ... " - "rmem_alloc == %d after closing\n", - atomic_read(&vcc->sk->sk_rmem_alloc)); - bind_vcc(vcc,NULL); + atm_dev_put(vcc->dev); } - - if (free_sk) free_atm_vcc_sk(sk); } -int atm_release(struct socket *sock) +int vcc_release(struct socket *sock) { - if (sock->sk) - atm_release_vcc_sk(sock->sk,1); + struct sock *sk = sock->sk; + + if (sk) { + sock_orphan(sk); + lock_sock(sk); + vcc_destroy_socket(sock->sk); + release_sock(sk); + sock_put(sk); + } + return 0; } @@ -289,7 +357,8 @@ if (vci > 0 && vci < ATM_NOT_RSV_VCI && !capable(CAP_NET_BIND_SERVICE)) return -EPERM; error = 0; - bind_vcc(vcc,dev); + vcc->dev = dev; + vcc_insert_socket(vcc->sk); switch (vcc->qos.aal) { case ATM_AAL0: error = atm_init_aal0(vcc); @@ -313,7 +382,7 @@ if (!error) error = adjust_tp(&vcc->qos.txtp,vcc->qos.aal); if (!error) error = adjust_tp(&vcc->qos.rxtp,vcc->qos.aal); if (error) { - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } DPRINTK("VCC %d.%d, AAL %d\n",vpi,vci,vcc->qos.aal); @@ -327,7 +396,7 @@ error = dev->ops->open(vcc,vpi,vci); if (error) { module_put(dev->ops->owner); - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } } @@ -371,7 +440,7 @@ dev = atm_dev_lookup(itf); error = __vcc_connect(vcc, dev, vpi, vci); if (error) { - atm_dev_release(dev); + atm_dev_put(dev); 
return error; } } else { @@ -385,7 +454,7 @@ spin_unlock(&atm_dev_lock); if (!__vcc_connect(vcc, dev, vpi, vci)) break; - atm_dev_release(dev); + atm_dev_put(dev); dev = NULL; spin_lock(&atm_dev_lock); } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:26:40 2003 +++ b/net/atm/common.h Tue Jun 17 08:26:40 2003 @@ -10,8 +10,8 @@ #include /* for poll_table */ -int atm_create(struct socket *sock,int protocol,int family); -int atm_release(struct socket *sock); +int vcc_create(struct socket *sock, int protocol, int family); +int vcc_release(struct socket *sock); int vcc_connect(struct socket *sock, int itf, short vpi, int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); @@ -24,7 +24,6 @@ int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen); -void atm_release_vcc_sk(struct sock *sk,int free_sk); void atm_shutdown_dev(struct atm_dev *dev); int atmpvc_init(void); diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/lec.c Tue Jun 17 08:26:40 2003 @@ -48,7 +48,7 @@ #include "lec.h" #include "lec_arpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" #if 0 #define DPRINTK printk @@ -810,7 +810,8 @@ lec_arp_init(priv); priv->itfnum = i; /* LANE2 addition */ priv->lecd = vcc; - bind_vcc(vcc, &lecatm_dev); + vcc->dev = &lecatm_dev; + vcc_insert_socket(vcc->sk); vcc->proto_data = dev_lec[i]; set_bit(ATM_VF_META,&vcc->flags); diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/mpc.c Tue Jun 17 08:26:40 2003 @@ -28,7 +28,7 @@ #include "lec.h" #include "mpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" /* * mpc.c: Implementation of MPOA client kernel part @@ -789,7 +789,8 @@ } mpc->mpoad_vcc = vcc; - bind_vcc(vcc, &mpc_dev); + vcc->dev = &mpc_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); 
set_bit(ATM_VF_READY,&vcc->flags); diff -Nru a/net/atm/proc.c b/net/atm/proc.c --- a/net/atm/proc.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/proc.c Tue Jun 17 08:26:40 2003 @@ -334,9 +334,7 @@ static int atm_pvc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct sock *s; struct atm_vcc *vcc; int left, clip_info = 0; @@ -349,25 +347,20 @@ if (try_atm_clip_ops()) clip_info = 1; #endif - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMPVC && - vcc->dev && !left--) { - pvc_info(vcc,buf,clip_info); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); + read_lock(&vcc_sklist_lock); + for(s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMPVC && vcc->dev && !left--) { + pvc_info(vcc,buf,clip_info); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) - if (clip_info) - module_put(atm_clip_ops->owner); + if (clip_info) + module_put(atm_clip_ops->owner); #endif - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) if (clip_info) module_put(atm_clip_ops->owner); @@ -378,10 +371,8 @@ static int atm_vc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; struct atm_vcc *vcc; + struct sock *s; int left; if (!pos) @@ -389,20 +380,16 @@ "Address"," Itf VPI VCI Fam Flags Reply Send buffer" " Recv buffer\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (!left--) { - 
vc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + for(s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (!left--) { + vc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } @@ -410,29 +397,23 @@ static int atm_svc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct sock *s; struct atm_vcc *vcc; int left; if (!pos) return sprintf(buf,"Itf VPI VCI State Remote\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMSVC && !left--) { - svc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + for(s = vcc_sklist; s; s = s->sk_next) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMSVC && !left--) { + svc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:26:40 2003 @@ -17,10 +17,6 @@ #include "resources.h" /* devs and vccs */ #include "common.h" /* common for PVCs and SVCs */ -#ifndef NULL -#define NULL 0 -#endif - static int pvc_shutdown(struct socket *sock,int how) { @@ -109,7 +105,7 @@ static struct proto_ops pvc_proto_ops = { .family = PF_ATMPVC, - .release = atm_release, + .release = vcc_release, .bind = pvc_bind, .connect = pvc_connect, .socketpair = sock_no_socketpair, @@ -131,7 +127,7 @@ static int 
pvc_create(struct socket *sock,int protocol) { sock->ops = &pvc_proto_ops; - return atm_create(sock,protocol,PF_ATMPVC); + return vcc_create(sock, protocol, PF_ATMPVC); } diff -Nru a/net/atm/resources.c b/net/atm/resources.c --- a/net/atm/resources.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/resources.c Tue Jun 17 08:26:40 2003 @@ -23,11 +23,6 @@ #include "addr.h" -#ifndef NULL -#define NULL 0 -#endif - - LIST_HEAD(atm_devs); spinlock_t atm_dev_lock = SPIN_LOCK_UNLOCKED; @@ -91,7 +86,7 @@ spin_lock(&atm_dev_lock); if (number != -1) { if ((inuse = __atm_dev_lookup(number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); spin_unlock(&atm_dev_lock); __free_atm_dev(dev); return NULL; @@ -100,7 +95,7 @@ } else { dev->number = 0; while ((inuse = __atm_dev_lookup(dev->number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); dev->number++; } } @@ -402,78 +397,12 @@ else error = 0; done: - atm_dev_release(dev); + atm_dev_put(dev); return error; } -struct sock *alloc_atm_vcc_sk(int family) -{ - struct sock *sk; - struct atm_vcc *vcc; - - sk = sk_alloc(family, GFP_KERNEL, 1, NULL); - if (!sk) - return NULL; - vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); - if (!vcc) { - sk_free(sk); - return NULL; - } - sock_init_data(NULL, sk); - memset(vcc, 0, sizeof(*vcc)); - vcc->sk = sk; - - return sk; -} - - -static void unlink_vcc(struct atm_vcc *vcc) -{ - unsigned long flags; - if (vcc->dev) { - spin_lock_irqsave(&vcc->dev->lock, flags); - if (vcc->prev) - vcc->prev->next = vcc->next; - else - vcc->dev->vccs = vcc->next; - - if (vcc->next) - vcc->next->prev = vcc->prev; - else - vcc->dev->last = vcc->prev; - spin_unlock_irqrestore(&vcc->dev->lock, flags); - } -} - - -void free_atm_vcc_sk(struct sock *sk) -{ - unlink_vcc(atm_sk(sk)); - sk_free(sk); -} - -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev) -{ - unsigned long flags; - - unlink_vcc(vcc); - vcc->dev = dev; - if (dev) { - spin_lock_irqsave(&dev->lock, flags); - vcc->next = NULL; - vcc->prev = dev->last; - if 
(dev->vccs) - dev->last->next = vcc; - else - dev->vccs = vcc; - dev->last = vcc; - spin_unlock_irqrestore(&dev->lock, flags); - } -} - EXPORT_SYMBOL(atm_dev_register); EXPORT_SYMBOL(atm_dev_deregister); EXPORT_SYMBOL(atm_dev_lookup); EXPORT_SYMBOL(shutdown_atm_dev); -EXPORT_SYMBOL(bind_vcc); diff -Nru a/net/atm/resources.h b/net/atm/resources.h --- a/net/atm/resources.h Tue Jun 17 08:26:40 2003 +++ b/net/atm/resources.h Tue Jun 17 08:26:40 2003 @@ -14,8 +14,6 @@ extern spinlock_t atm_dev_lock; -struct sock *alloc_atm_vcc_sk(int family); -void free_atm_vcc_sk(struct sock *sk); int atm_dev_ioctl(unsigned int cmd, unsigned long arg); diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/signaling.c Tue Jun 17 08:26:40 2003 @@ -200,26 +200,21 @@ } -static void purge_vccs(struct atm_vcc *vcc) +static void purge_vcc(struct atm_vcc *vcc) { - while (vcc) { - if (vcc->sk->sk_family == PF_ATMSVC && - !test_bit(ATM_VF_META,&vcc->flags)) { - set_bit(ATM_VF_RELEASED,&vcc->flags); - vcc->reply = -EUNATCH; - vcc->sk->sk_err = EUNATCH; - wake_up(&vcc->sleep); - } - vcc = vcc->next; + if (vcc->sk->sk_family == PF_ATMSVC && + !test_bit(ATM_VF_META,&vcc->flags)) { + set_bit(ATM_VF_RELEASED,&vcc->flags); + vcc->reply = -EUNATCH; + vcc->sk->sk_err = EUNATCH; + wake_up(&vcc->sleep); } } static void sigd_close(struct atm_vcc *vcc) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct sock *s; DPRINTK("sigd_close\n"); sigd = NULL; @@ -227,14 +222,14 @@ printk(KERN_ERR "sigd_close: closing with requests pending\n"); skb_queue_purge(&vcc->sk->sk_receive_queue); - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - purge_vccs(dev->vccs); - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + for(s = vcc_sklist; s; s = s->sk_next) { + struct atm_vcc *vcc = atm_sk(s); + + if (vcc->dev) 
+ purge_vcc(vcc); } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); } @@ -257,7 +252,8 @@ if (sigd) return -EADDRINUSE; DPRINTK("sigd_attach\n"); sigd = vcc; - bind_vcc(vcc,&sigd_dev); + vcc->dev = &sigd_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); wake_up(&sigd_sleep); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:26:40 2003 +++ b/net/atm/svc.c Tue Jun 17 08:26:40 2003 @@ -88,18 +88,21 @@ static int svc_release(struct socket *sock) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; - if (!sock->sk) return 0; - vcc = ATM_SD(sock); - DPRINTK("svc_release %p\n",vcc); - clear_bit(ATM_VF_READY,&vcc->flags); - atm_release_vcc_sk(sock->sk,0); - svc_disconnect(vcc); - /* VCC pointer is used as a reference, so we must not free it - (thereby subjecting it to re-use) before all pending connections - are closed */ - free_atm_vcc_sk(sock->sk); + if (sk) { + vcc = ATM_SD(sock); + DPRINTK("svc_release %p\n", vcc); + clear_bit(ATM_VF_READY, &vcc->flags); + /* VCC pointer is used as a reference, so we must not free it + (thereby subjecting it to re-use) before all pending connections + are closed */ + sock_hold(sk); + vcc_release(sock); + svc_disconnect(vcc); + sock_put(sk); + } return 0; } @@ -542,7 +545,7 @@ int error; sock->ops = &svc_proto_ops; - error = atm_create(sock,protocol,AF_ATMSVC); + error = vcc_create(sock, protocol, AF_ATMSVC); if (error) return error; ATM_SD(sock)->callback = svc_callback; ATM_SD(sock)->local.sas_family = AF_ATMSVC; From chas@locutus.cmf.nrl.navy.mil Tue Jun 17 06:00:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 06:00:55 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HD0d2x022004 for ; Tue, 17 Jun 2003 06:00:40 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil 
(8.12.7/8.12.7) with ESMTP id h5HCg5sG026548; Tue, 17 Jun 2003 08:42:05 -0400 (EDT) Received: (from chas@localhost) by locutus.cmf.nrl.navy.mil (8.12.7/8.12.7/Submit) id h5HCe5S1021440; Tue, 17 Jun 2003 08:40:05 -0400 Date: Tue, 17 Jun 2003 08:40:05 -0400 From: chas williams Message-Id: <200306171240.h5HCe5S1021440@locutus.cmf.nrl.navy.mil> To: davem@redhat.com Subject: [PATCH][ATM][2/3] assorted changes for atm Cc: netdev@oss.sgi.com X-Spam-Score: () hits=0.4 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3318 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev [atm]: rewrite recvmsg # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1399 -> 1.1400 # net/atm/pvc.c 1.9 -> 1.10 # net/atm/lec.c 1.25 -> 1.26 # net/atm/svc.c 1.11 -> 1.12 # net/atm/common.h 1.5 -> 1.6 # net/atm/mpoa_caches.c 1.1 -> 1.2 # net/atm/signaling.c 1.9 -> 1.10 # include/linux/atmdev.h 1.14 -> 1.15 # net/atm/clip.c 1.13 -> 1.14 # net/atm/common.c 1.28 -> 1.29 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/16 chas@relax.cmf.nrl.navy.mil 1.1400 # rewrite recvmsg and rename atm_async_release_vcc # -------------------------------------------- # diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Tue Jun 17 08:12:46 2003 +++ b/include/linux/atmdev.h Tue Jun 17 08:12:46 2003 @@ -452,7 +452,7 @@ int atm_find_ci(struct atm_vcc *vcc,short *vpi,int *vci); int atm_pcr_goal(struct atm_trafprm *tp); -void atm_async_release_vcc(struct atm_vcc *vcc,int reply); +void vcc_release_async(struct atm_vcc *vcc, int reply); #endif /* __KERNEL__ */ 
diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Tue Jun 17 08:12:46 2003 +++ b/net/atm/clip.c Tue Jun 17 08:12:46 2003 @@ -140,8 +140,8 @@ DPRINTK("releasing vcc %p->%p of " "entry %p\n",clip_vcc,clip_vcc->vcc, entry); - atm_async_release_vcc(clip_vcc->vcc, - -ETIMEDOUT); + vcc_release_async(clip_vcc->vcc, + -ETIMEDOUT); } if (entry->vccs || time_before(jiffies, entry->expires)) { diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:12:46 2003 +++ b/net/atm/common.c Tue Jun 17 08:12:46 2003 @@ -239,15 +239,16 @@ } -void atm_async_release_vcc(struct atm_vcc *vcc,int reply) +void vcc_release_async(struct atm_vcc *vcc, int reply) { - set_bit(ATM_VF_CLOSE,&vcc->flags); + set_bit(ATM_VF_CLOSE, &vcc->flags); vcc->reply = reply; + vcc->sk->sk_err = -reply; wake_up(&vcc->sleep); } -EXPORT_SYMBOL(atm_async_release_vcc); +EXPORT_SYMBOL(vcc_release_async); static int adjust_tp(struct atm_trafprm *tp,unsigned char aal) @@ -414,62 +415,46 @@ } -int atm_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, - int total_len, int flags) +int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, + int size, int flags) { - DECLARE_WAITQUEUE(wait,current); + struct sock *sk = sock->sk; struct atm_vcc *vcc; struct sk_buff *skb; - int eff_len,error; - void *buff; - int size; - - if (sock->state != SS_CONNECTED) return -ENOTCONN; - if (flags & ~MSG_DONTWAIT) return -EOPNOTSUPP; - if (m->msg_iovlen != 1) return -ENOSYS; /* fix this later @@@ */ - buff = m->msg_iov->iov_base; - size = m->msg_iov->iov_len; + int copied, error = -EINVAL; + + if (sock->state != SS_CONNECTED) + return -ENOTCONN; + if (flags & ~MSG_DONTWAIT) /* only handle MSG_DONTWAIT */ + return -EOPNOTSUPP; vcc = ATM_SD(sock); - add_wait_queue(&vcc->sleep,&wait); - set_current_state(TASK_INTERRUPTIBLE); - error = 1; /* <= 0 is error */ - while (!(skb = skb_dequeue(&vcc->sk->sk_receive_queue))) { - if (test_bit(ATM_VF_RELEASED,&vcc->flags) || - 
test_bit(ATM_VF_CLOSE,&vcc->flags)) { - error = vcc->reply; - break; - } - if (!test_bit(ATM_VF_READY,&vcc->flags)) { - error = 0; - break; - } - if (flags & MSG_DONTWAIT) { - error = -EAGAIN; - break; - } - schedule(); - set_current_state(TASK_INTERRUPTIBLE); - if (signal_pending(current)) { - error = -ERESTARTSYS; - break; - } - } - set_current_state(TASK_RUNNING); - remove_wait_queue(&vcc->sleep,&wait); - if (error <= 0) return error; - sock_recv_timestamp(m, vcc->sk, skb); - eff_len = skb->len > size ? size : skb->len; - if (skb->len > size) /* Not fit ? Report it... */ - m->msg_flags |= MSG_TRUNC; - if (vcc->dev->ops->feedback) - vcc->dev->ops->feedback(vcc,skb,(unsigned long) skb->data, - (unsigned long) buff,eff_len); - DPRINTK("RcvM %d -= %d\n", atomic_read(&vcc->sk->sk_rmem_alloc), - skb->truesize); - atm_return(vcc,skb->truesize); - error = copy_to_user(buff,skb->data,eff_len) ? -EFAULT : 0; - kfree_skb(skb); - return error ? error : eff_len; + if (test_bit(ATM_VF_RELEASED,&vcc->flags) || + test_bit(ATM_VF_CLOSE,&vcc->flags)) + return vcc->reply; + if (!test_bit(ATM_VF_READY, &vcc->flags)) + return 0; + + skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error); + if (!skb) + return error; + + copied = skb->len; + if (copied > size) { + copied = size; + msg->msg_flags |= MSG_TRUNC; + } + + error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied); + if (error) + return error; + sock_recv_timestamp(msg, sk, skb); + if (vcc->dev->ops->feedback) + vcc->dev->ops->feedback(vcc, skb, (unsigned long) skb->data, + (unsigned long) msg->msg_iov->iov_base, copied); + DPRINTK("RcvM %d -= %d\n", atomic_read(&vcc->sk->rmem_alloc), skb->truesize); + atm_return(vcc, skb->truesize); + skb_free_datagram(sk, skb); + return copied; } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:12:45 2003 +++ b/net/atm/common.h Tue Jun 17 08:12:45 2003 @@ -13,8 +13,8 @@ int atm_create(struct socket *sock,int protocol,int family); int 
atm_release(struct socket *sock); int atm_connect(struct socket *sock,int itf,short vpi,int vci); -int atm_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, - int total_len, int flags); +int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, + int size, int flags); int atm_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len); unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Tue Jun 17 08:12:45 2003 +++ b/net/atm/lec.c Tue Jun 17 08:12:45 2003 @@ -1079,7 +1079,7 @@ clear_bit(ATM_VF_READY,&entry->vcc->flags); entry->vcc->push(entry->vcc, NULL); #endif - atm_async_release_vcc(entry->vcc, -EPIPE); + vcc_release_async(entry->vcc, -EPIPE); entry->vcc = NULL; } if (entry->recv_vcc) { @@ -1089,7 +1089,7 @@ clear_bit(ATM_VF_READY,&entry->recv_vcc->flags); entry->recv_vcc->push(entry->recv_vcc, NULL); #endif - atm_async_release_vcc(entry->recv_vcc, -EPIPE); + vcc_release_async(entry->recv_vcc, -EPIPE); entry->recv_vcc = NULL; } } diff -Nru a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c --- a/net/atm/mpoa_caches.c Tue Jun 17 08:12:45 2003 +++ b/net/atm/mpoa_caches.c Tue Jun 17 08:12:46 2003 @@ -212,7 +212,7 @@ client->eg_ops->put(eg_entry); return; } - atm_async_release_vcc(vcc, -EPIPE); + vcc_release_async(vcc, -EPIPE); } return; @@ -447,7 +447,7 @@ client->in_ops->put(in_entry); return; } - atm_async_release_vcc(vcc, -EPIPE); + vcc_release_async(vcc, -EPIPE); } return; diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:12:45 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:12:45 2003 @@ -88,7 +88,7 @@ .setsockopt = atm_setsockopt, .getsockopt = atm_getsockopt, .sendmsg = atm_sendmsg, - .recvmsg = atm_recvmsg, + .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Tue Jun 17 08:12:46 2003 +++ 
b/net/atm/signaling.c Tue Jun 17 08:12:46 2003 @@ -124,6 +124,7 @@ clear_bit(ATM_VF_REGIS,&vcc->flags); clear_bit(ATM_VF_READY,&vcc->flags); vcc->reply = msg->reply; + vcc->sk->sk_err = -msg->reply; break; case as_indicate: vcc = *(struct atm_vcc **) &msg->listen_vcc; @@ -145,6 +146,7 @@ set_bit(ATM_VF_RELEASED,&vcc->flags); clear_bit(ATM_VF_READY,&vcc->flags); vcc->reply = msg->reply; + vcc->sk->sk_err = -msg->reply; break; case as_modify: modify_qos(vcc,msg); @@ -202,6 +204,7 @@ !test_bit(ATM_VF_META,&vcc->flags)) { set_bit(ATM_VF_RELEASED,&vcc->flags); vcc->reply = -EUNATCH; + vcc->sk->sk_err = EUNATCH; wake_up(&vcc->sleep); } vcc = vcc->next; diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:12:45 2003 +++ b/net/atm/svc.c Tue Jun 17 08:12:45 2003 @@ -408,7 +408,7 @@ .setsockopt = svc_setsockopt, .getsockopt = svc_getsockopt, .sendmsg = atm_sendmsg, - .recvmsg = atm_recvmsg, + .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; [atm]: remove SOCKOPS_WRAPPED; use prepare_to_wait()/finish_wait() # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1400 -> 1.1401 # net/atm/pvc.c 1.10 -> 1.11 # net/atm/svc.c 1.12 -> 1.13 # net/atm/common.h 1.6 -> 1.7 # net/atm/signaling.c 1.10 -> 1.11 # net/atm/common.c 1.29 -> 1.30 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/17 chas@relax.cmf.nrl.navy.mil 1.1401 # remove SOCKOPS_WRAPPED; use prepare_to_wait() # -------------------------------------------- # diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:13:05 2003 +++ b/net/atm/common.c Tue Jun 17 08:13:05 2003 @@ -458,32 +458,53 @@ } -int atm_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, +int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len) { - DECLARE_WAITQUEUE(wait,current); + struct sock *sk = sock->sk; + DEFINE_WAIT(wait); struct atm_vcc *vcc; struct sk_buff *skb; int eff,error; const void *buff; int size; - if (sock->state != SS_CONNECTED) return -ENOTCONN; - if (m->msg_name) return -EISCONN; - if (m->msg_iovlen != 1) return -ENOSYS; /* fix this later @@@ */ + lock_sock(sk); + if (sock->state != SS_CONNECTED) { + error = -ENOTCONN; + goto out; + } + if (m->msg_name) { + error = -EISCONN; + goto out; + } + if (m->msg_iovlen != 1) { + error = -ENOSYS; /* fix this later @@@ */ + goto out; + } buff = m->msg_iov->iov_base; size = m->msg_iov->iov_len; vcc = ATM_SD(sock); - if (test_bit(ATM_VF_RELEASED,&vcc->flags) || - test_bit(ATM_VF_CLOSE,&vcc->flags)) - return vcc->reply; - if (!test_bit(ATM_VF_READY,&vcc->flags)) return -EPIPE; - if (!size) return 0; - if (size < 0 || size > vcc->qos.txtp.max_sdu) return -EMSGSIZE; + if (test_bit(ATM_VF_RELEASED, &vcc->flags) || + test_bit(ATM_VF_CLOSE, &vcc->flags)) { + error = vcc->reply; + goto out; + } + if (!test_bit(ATM_VF_READY, &vcc->flags)) { + error = -EPIPE; + goto out; + } + if (!size) { + error = 0; + goto out; + } + if (size < 0 || size > vcc->qos.txtp.max_sdu) { + 
error = -EMSGSIZE; + goto out; + } /* verify_area is done by net/socket.c */ eff = (size+3) & ~3; /* align to word boundary */ - add_wait_queue(&vcc->sleep,&wait); - set_current_state(TASK_INTERRUPTIBLE); + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); error = 0; while (!(skb = alloc_tx(vcc,eff))) { if (m->msg_flags & MSG_DONTWAIT) { @@ -491,7 +512,6 @@ break; } schedule(); - set_current_state(TASK_INTERRUPTIBLE); if (signal_pending(current)) { error = -ERESTARTSYS; break; @@ -505,19 +525,24 @@ error = -EPIPE; break; } + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); } - set_current_state(TASK_RUNNING); - remove_wait_queue(&vcc->sleep,&wait); - if (error) return error; + finish_wait(&vcc->sleep, &wait); + if (error) + goto out; skb->dev = NULL; /* for paths shared with net_device interfaces */ ATM_SKB(skb)->atm_options = vcc->atm_options; if (copy_from_user(skb_put(skb,size),buff,size)) { kfree_skb(skb); - return -EFAULT; + error = -EFAULT; + goto out; } if (eff != size) memset(skb->data+size,0,eff-size); error = vcc->dev->ops->send(vcc,skb); - return error ? error : size; + error = error ? 
error : size; +out: + release_sock(sk); + return error; } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:13:05 2003 +++ b/net/atm/common.h Tue Jun 17 08:13:05 2003 @@ -15,7 +15,7 @@ int atm_connect(struct socket *sock,int itf,short vpi,int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); -int atm_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, +int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len); unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:13:05 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:13:05 2003 @@ -31,20 +31,29 @@ static int pvc_bind(struct socket *sock,struct sockaddr *sockaddr, int sockaddr_len) { + struct sock *sk = sock->sk; struct sockaddr_atmpvc *addr; struct atm_vcc *vcc; + int error; if (sockaddr_len != sizeof(struct sockaddr_atmpvc)) return -EINVAL; addr = (struct sockaddr_atmpvc *) sockaddr; if (addr->sap_family != AF_ATMPVC) return -EAFNOSUPPORT; + lock_sock(sk); vcc = ATM_SD(sock); - if (!test_bit(ATM_VF_HASQOS,&vcc->flags)) return -EBADFD; + if (!test_bit(ATM_VF_HASQOS, &vcc->flags)) { + error = -EBADFD; + goto out; + } if (test_bit(ATM_VF_PARTIAL,&vcc->flags)) { if (vcc->vpi != ATM_VPI_UNSPEC) addr->sap_addr.vpi = vcc->vpi; if (vcc->vci != ATM_VCI_UNSPEC) addr->sap_addr.vci = vcc->vci; } - return atm_connect(sock,addr->sap_addr.itf,addr->sap_addr.vpi, - addr->sap_addr.vci); + error = atm_connect(sock, addr->sap_addr.itf, addr->sap_addr.vpi, + addr->sap_addr.vci); +out: + release_sock(sk); + return error; } @@ -54,6 +63,31 @@ return pvc_bind(sock,sockaddr,sockaddr_len); } +static int pvc_setsockopt(struct socket *sock, int level, int optname, + char *optval, int optlen) +{ + struct sock *sk = sock->sk; + int error; + + 
lock_sock(sk); + error = atm_setsockopt(sock, level, optname, optval, optlen); + release_sock(sk); + return error; +} + + +static int pvc_getsockopt(struct socket *sock, int level, int optname, + char *optval, int *optlen) +{ + struct sock *sk = sock->sk; + int error; + + lock_sock(sk); + error = atm_getsockopt(sock, level, optname, optval, optlen); + release_sock(sk); + return error; +} + static int pvc_getname(struct socket *sock,struct sockaddr *sockaddr, int *sockaddr_len,int peer) @@ -72,7 +106,7 @@ } -static struct proto_ops SOCKOPS_WRAPPED(pvc_proto_ops) = { +static struct proto_ops pvc_proto_ops = { .family = PF_ATMPVC, .release = atm_release, @@ -85,17 +119,13 @@ .ioctl = vcc_ioctl, .listen = sock_no_listen, .shutdown = pvc_shutdown, - .setsockopt = atm_setsockopt, - .getsockopt = atm_getsockopt, - .sendmsg = atm_sendmsg, + .setsockopt = pvc_setsockopt, + .getsockopt = pvc_getsockopt, + .sendmsg = vcc_sendmsg, .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; - - -#include -SOCKOPS_WRAP(pvc_proto, PF_ATMPVC); static int pvc_create(struct socket *sock,int protocol) diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Tue Jun 17 08:13:05 2003 +++ b/net/atm/signaling.c Tue Jun 17 08:13:05 2003 @@ -129,10 +129,11 @@ case as_indicate: vcc = *(struct atm_vcc **) &msg->listen_vcc; DPRINTK("as_indicate!!!\n"); + lock_sock(vcc->sk); if (vcc->sk->sk_ack_backlog == vcc->sk->sk_max_ack_backlog) { sigd_enq(0,as_reject,vcc,NULL,NULL); - return 0; + goto as_indicate_complete; } vcc->sk->sk_ack_backlog++; skb_queue_tail(&vcc->sk->sk_receive_queue, skb); @@ -141,6 +142,8 @@ &vcc->sleep); vcc->callback(vcc); } +as_indicate_complete: + release_sock(vcc->sk); return 0; case as_close: set_bit(ATM_VF_RELEASED,&vcc->flags); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:13:05 2003 +++ b/net/atm/svc.c Tue Jun 17 08:13:05 2003 @@ -59,18 +59,18 @@ static void svc_disconnect(struct atm_vcc *vcc) { - 
DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); struct sk_buff *skb; DPRINTK("svc_disconnect %p\n",vcc); if (test_bit(ATM_VF_REGIS,&vcc->flags)) { - add_wait_queue(&vcc->sleep,&wait); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_close,NULL,NULL,NULL); while (!test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); schedule(); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); } - remove_wait_queue(&vcc->sleep,&wait); + finish_wait(&vcc->sleep, &wait); } /* beware - socket is still in use by atmsigd until the last as_indicate has been answered */ @@ -107,80 +107,138 @@ static int svc_bind(struct socket *sock,struct sockaddr *sockaddr, int sockaddr_len) { - DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); + struct sock *sk = sock->sk; struct sockaddr_atmsvc *addr; struct atm_vcc *vcc; + int error; - if (sockaddr_len != sizeof(struct sockaddr_atmsvc)) return -EINVAL; - if (sock->state == SS_CONNECTED) return -EISCONN; - if (sock->state != SS_UNCONNECTED) return -EINVAL; + if (sockaddr_len != sizeof(struct sockaddr_atmsvc)) + return -EINVAL; + lock_sock(sk); + if (sock->state == SS_CONNECTED) { + error = -EISCONN; + goto out; + } + if (sock->state != SS_UNCONNECTED) { + error = -EINVAL; + goto out; + } vcc = ATM_SD(sock); - if (test_bit(ATM_VF_SESSION,&vcc->flags)) return -EINVAL; + if (test_bit(ATM_VF_SESSION, &vcc->flags)) { + error = -EINVAL; + goto out; + } addr = (struct sockaddr_atmsvc *) sockaddr; - if (addr->sas_family != AF_ATMSVC) return -EAFNOSUPPORT; + if (addr->sas_family != AF_ATMSVC) { + error = -EAFNOSUPPORT; + goto out; + } clear_bit(ATM_VF_BOUND,&vcc->flags); /* failing rebind will kill old binding */ /* @@@ check memory (de)allocation on rebind */ - if (!test_bit(ATM_VF_HASQOS,&vcc->flags)) return -EBADFD; + if (!test_bit(ATM_VF_HASQOS,&vcc->flags)) { + error = -EBADFD; + goto out; + } vcc->local = *addr; vcc->reply = WAITING; - add_wait_queue(&vcc->sleep,&wait); 
+ prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_bind,NULL,NULL,&vcc->local); while (vcc->reply == WAITING && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); schedule(); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); } - remove_wait_queue(&vcc->sleep,&wait); + finish_wait(&vcc->sleep, &wait); clear_bit(ATM_VF_REGIS,&vcc->flags); /* doesn't count */ - if (!sigd) return -EUNATCH; - if (!vcc->reply) set_bit(ATM_VF_BOUND,&vcc->flags); - return vcc->reply; + if (!sigd) { + error = -EUNATCH; + goto out; + } + if (!vcc->reply) + set_bit(ATM_VF_BOUND,&vcc->flags); + error = vcc->reply; +out: + release_sock(sk); + return error; } static int svc_connect(struct socket *sock,struct sockaddr *sockaddr, int sockaddr_len,int flags) { - DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); + struct sock *sk = sock->sk; struct sockaddr_atmsvc *addr; struct atm_vcc *vcc = ATM_SD(sock); int error; DPRINTK("svc_connect %p\n",vcc); - if (sockaddr_len != sizeof(struct sockaddr_atmsvc)) return -EINVAL; - if (sock->state == SS_CONNECTED) return -EISCONN; - if (sock->state == SS_CONNECTING) { - if (vcc->reply == WAITING) return -EALREADY; - sock->state = SS_UNCONNECTED; - if (vcc->reply) return vcc->reply; + lock_sock(sk); + if (sockaddr_len != sizeof(struct sockaddr_atmsvc)) { + error = -EINVAL; + goto out; } - else { - int error; - if (sock->state != SS_UNCONNECTED) return -EINVAL; - if (test_bit(ATM_VF_SESSION,&vcc->flags)) return -EINVAL; + switch (sock->state) { + default: + error = -EINVAL; + goto out; + case SS_CONNECTED: + error = -EISCONN; + goto out; + case SS_CONNECTING: + if (vcc->reply == WAITING) { + error = -EALREADY; + goto out; + } + sock->state = SS_UNCONNECTED; + if (vcc->reply) { + error = vcc->reply; + goto out; + } + break; + case SS_UNCONNECTED: + if (test_bit(ATM_VF_SESSION, &vcc->flags)) { + error = -EINVAL; + goto out; + } addr = (struct sockaddr_atmsvc *) sockaddr; - if (addr->sas_family != AF_ATMSVC) return 
-EAFNOSUPPORT; - if (!test_bit(ATM_VF_HASQOS,&vcc->flags)) return -EBADFD; + if (addr->sas_family != AF_ATMSVC) { + error = -EAFNOSUPPORT; + goto out; + } + if (!test_bit(ATM_VF_HASQOS, &vcc->flags)) { + error = -EBADFD; + goto out; + } if (vcc->qos.txtp.traffic_class == ATM_ANYCLASS || - vcc->qos.rxtp.traffic_class == ATM_ANYCLASS) - return -EINVAL; + vcc->qos.rxtp.traffic_class == ATM_ANYCLASS) { + error = -EINVAL; + goto out; + } if (!vcc->qos.txtp.traffic_class && - !vcc->qos.rxtp.traffic_class) return -EINVAL; + !vcc->qos.rxtp.traffic_class) { + error = -EINVAL; + goto out; + } vcc->remote = *addr; vcc->reply = WAITING; - add_wait_queue(&vcc->sleep,&wait); + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); sigd_enq(vcc,as_connect,NULL,NULL,&vcc->remote); if (flags & O_NONBLOCK) { - remove_wait_queue(&vcc->sleep,&wait); + finish_wait(&vcc->sleep, &wait); sock->state = SS_CONNECTING; - return -EINPROGRESS; + error = -EINPROGRESS; + goto out; } error = 0; while (vcc->reply == WAITING && sigd) { - set_current_state(TASK_INTERRUPTIBLE); schedule(); - if (!signal_pending(current)) continue; + if (!signal_pending(current)) { + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + continue; + } DPRINTK("*ABORT*\n"); /* * This is tricky: @@ -196,13 +254,13 @@ */ sigd_enq(vcc,as_close,NULL,NULL,NULL); while (vcc->reply == WAITING && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); schedule(); } if (!vcc->reply) while (!test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); + prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); schedule(); } clear_bit(ATM_VF_REGIS,&vcc->flags); @@ -212,10 +270,17 @@ error = -EINTR; break; } - remove_wait_queue(&vcc->sleep,&wait); - if (error) return error; - if (!sigd) return -EUNATCH; - if (vcc->reply) return vcc->reply; + finish_wait(&vcc->sleep, &wait); + if (error) + goto out; + if (!sigd) { + error = -EUNATCH; + goto 
out; + } + if (vcc->reply) { + error = vcc->reply; + goto out; + } } /* * Not supported yet @@ -231,53 +296,70 @@ if (!(error = atm_connect(sock,vcc->itf,vcc->vpi,vcc->vci))) sock->state = SS_CONNECTED; else (void) svc_disconnect(vcc); +out: + release_sock(sk); return error; } static int svc_listen(struct socket *sock,int backlog) { - DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); + struct sock *sk = sock->sk; struct atm_vcc *vcc = ATM_SD(sock); + int error; DPRINTK("svc_listen %p\n",vcc); + lock_sock(sk); /* let server handle listen on unbound sockets */ - if (test_bit(ATM_VF_SESSION,&vcc->flags)) return -EINVAL; + if (test_bit(ATM_VF_SESSION,&vcc->flags)) { + error = -EINVAL; + goto out; + } vcc->reply = WAITING; - add_wait_queue(&vcc->sleep,&wait); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_listen,NULL,NULL,&vcc->local); while (vcc->reply == WAITING && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); schedule(); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + } + finish_wait(&vcc->sleep, &wait); + if (!sigd) { + error = -EUNATCH; + goto out; } - remove_wait_queue(&vcc->sleep,&wait); - if (!sigd) return -EUNATCH; set_bit(ATM_VF_LISTEN,&vcc->flags); vcc->sk->sk_max_ack_backlog = backlog > 0 ? 
backlog : ATM_BACKLOG_DEFAULT; - return vcc->reply; + error = vcc->reply; +out: + release_sock(sk); + return error; } static int svc_accept(struct socket *sock,struct socket *newsock,int flags) { + struct sock *sk = sock->sk; struct sk_buff *skb; struct atmsvc_msg *msg; struct atm_vcc *old_vcc = ATM_SD(sock); struct atm_vcc *new_vcc; int error; + lock_sock(sk); + error = svc_create(newsock,0); if (error) - return error; + goto out; new_vcc = ATM_SD(newsock); DPRINTK("svc_accept %p -> %p\n",old_vcc,new_vcc); while (1) { - DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); - add_wait_queue(&old_vcc->sleep,&wait); + prepare_to_wait(&old_vcc->sleep, &wait, TASK_INTERRUPTIBLE); while (!(skb = skb_dequeue(&old_vcc->sk->sk_receive_queue)) && sigd) { if (test_bit(ATM_VF_RELEASED,&old_vcc->flags)) break; @@ -289,16 +371,22 @@ error = -EAGAIN; break; } - set_current_state(TASK_INTERRUPTIBLE); + release_sock(sk); schedule(); + lock_sock(sk); if (signal_pending(current)) { error = -ERESTARTSYS; break; } + prepare_to_wait(&old_vcc->sleep, &wait, TASK_INTERRUPTIBLE); + } + finish_wait(&old_vcc->sleep, &wait); + if (error) + goto out; + if (!skb) { + error = -EUNATCH; + goto out; } - remove_wait_queue(&old_vcc->sleep,&wait); - if (error) return error; - if (!skb) return -EUNATCH; msg = (struct atmsvc_msg *) skb->data; new_vcc->qos = msg->qos; set_bit(ATM_VF_HASQOS,&new_vcc->flags); @@ -312,23 +400,34 @@ if (error) { sigd_enq2(NULL,as_reject,old_vcc,NULL,NULL, &old_vcc->qos,error); - return error == -EAGAIN ? -EBUSY : error; + error = error == -EAGAIN ? 
-EBUSY : error; + goto out; } /* wait should be short, so we ignore the non-blocking flag */ new_vcc->reply = WAITING; - add_wait_queue(&new_vcc->sleep,&wait); + prepare_to_wait(&new_vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(new_vcc,as_accept,old_vcc,NULL,NULL); while (new_vcc->reply == WAITING && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); + release_sock(sk); schedule(); + lock_sock(sk); + prepare_to_wait(&new_vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + } + finish_wait(&new_vcc->sleep, &wait); + if (!sigd) { + error = -EUNATCH; + goto out; } - remove_wait_queue(&new_vcc->sleep,&wait); - if (!sigd) return -EUNATCH; if (!new_vcc->reply) break; - if (new_vcc->reply != -ERESTARTSYS) return new_vcc->reply; + if (new_vcc->reply != -ERESTARTSYS) { + error = new_vcc->reply; + goto out; + } } newsock->state = SS_CONNECTED; - return 0; +out: + release_sock(sk); + return error; } @@ -347,17 +446,17 @@ int svc_change_qos(struct atm_vcc *vcc,struct atm_qos *qos) { - DECLARE_WAITQUEUE(wait,current); + DEFINE_WAIT(wait); vcc->reply = WAITING; - add_wait_queue(&vcc->sleep,&wait); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq2(vcc,as_modify,NULL,NULL,&vcc->local,qos,0); while (vcc->reply == WAITING && !test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { - set_current_state(TASK_UNINTERRUPTIBLE); schedule(); + prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); } - remove_wait_queue(&vcc->sleep,&wait); + finish_wait(&vcc->sleep, &wait); if (!sigd) return -EUNATCH; return vcc->reply; } @@ -366,33 +465,57 @@ static int svc_setsockopt(struct socket *sock,int level,int optname, char *optval,int optlen) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; + int error = 0; if (!__SO_LEVEL_MATCH(optname, level) || optname != SO_ATMSAP || - optlen != sizeof(struct atm_sap)) - return atm_setsockopt(sock,level,optname,optval,optlen); + optlen != sizeof(struct atm_sap)) { + error = atm_setsockopt(sock, level, optname, optval, optlen); + goto out; + 
} vcc = ATM_SD(sock); - if (copy_from_user(&vcc->sap,optval,optlen)) return -EFAULT; - set_bit(ATM_VF_HASSAP,&vcc->flags); - return 0; + if (copy_from_user(&vcc->sap, optval, optlen)) { + error = -EFAULT; + goto out; + } + set_bit(ATM_VF_HASSAP, &vcc->flags); +out: + release_sock(sk); + return error; } static int svc_getsockopt(struct socket *sock,int level,int optname, char *optval,int *optlen) { - int len; + struct sock *sk = sock->sk; + int error = 0, len; - if (!__SO_LEVEL_MATCH(optname, level) || optname != SO_ATMSAP) - return atm_getsockopt(sock,level,optname,optval,optlen); - if (get_user(len,optlen)) return -EFAULT; - if (len != sizeof(struct atm_sap)) return -EINVAL; - return copy_to_user(optval,&ATM_SD(sock)->sap,sizeof(struct atm_sap)) ? - -EFAULT : 0; + lock_sock(sk); + if (!__SO_LEVEL_MATCH(optname, level) || optname != SO_ATMSAP) { + error = atm_getsockopt(sock, level, optname, optval, optlen); + goto out; + } + if (get_user(len, optlen)) { + error = -EFAULT; + goto out; + } + if (len != sizeof(struct atm_sap)) { + error = -EINVAL; + goto out; + } + if (copy_to_user(optval, &ATM_SD(sock)->sap, sizeof(struct atm_sap))) { + error = -EFAULT; + goto out; + } +out: + release_sock(sk); + return error; } -static struct proto_ops SOCKOPS_WRAPPED(svc_proto_ops) = { +static struct proto_ops svc_proto_ops = { .family = PF_ATMSVC, .release = svc_release, @@ -407,15 +530,12 @@ .shutdown = svc_shutdown, .setsockopt = svc_setsockopt, .getsockopt = svc_getsockopt, - .sendmsg = atm_sendmsg, + .sendmsg = vcc_sendmsg, .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; - -#include -SOCKOPS_WRAP(svc_proto, PF_ATMSVC); static int svc_create(struct socket *sock,int protocol) { [atm]: getsockopt/setsockopt cleanup # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1401 -> 1.1402 # net/atm/pvc.c 1.11 -> 1.12 # net/atm/svc.c 1.13 -> 1.14 # net/atm/common.h 1.7 -> 1.8 # net/atm/common.c 1.30 -> 1.31 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/17 chas@relax.cmf.nrl.navy.mil 1.1402 # setsockopt/getsockopt cleanup # -------------------------------------------- # diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:20:17 2003 +++ b/net/atm/common.c Tue Jun 17 08:20:17 2003 @@ -890,14 +890,16 @@ return check_tp(&qos->rxtp); } - -static int atm_do_setsockopt(struct socket *sock,int level,int optname, - void *optval,int optlen) +int vcc_setsockopt(struct socket *sock, int level, int optname, + char *optval, int optlen) { struct atm_vcc *vcc; unsigned long value; int error; + if (__SO_LEVEL_MATCH(optname, level) && optlen != __SO_SIZE(optname)) + return -EINVAL; + vcc = ATM_SD(sock); switch (optname) { case SO_ATMQOS: @@ -931,10 +933,16 @@ } -static int atm_do_getsockopt(struct socket *sock,int level,int optname, - void *optval,int optlen) +int vcc_getsockopt(struct socket *sock, int level, int optname, + char *optval, int *optlen) { struct atm_vcc *vcc; + int len; + + if (get_user(len, optlen)) + return -EFAULT; + if (__SO_LEVEL_MATCH(optname, level) && len != __SO_SIZE(optname)) + return -EINVAL; vcc = ATM_SD(sock); switch (optname) { @@ -965,28 +973,7 @@ break; } if (!vcc->dev || !vcc->dev->ops->getsockopt) return -EINVAL; - return vcc->dev->ops->getsockopt(vcc,level,optname,optval,optlen); -} - - -int atm_setsockopt(struct socket *sock,int level,int optname,char *optval, - int optlen) -{ - if (__SO_LEVEL_MATCH(optname, level) && optlen != __SO_SIZE(optname)) - return -EINVAL; - return atm_do_setsockopt(sock,level,optname,optval,optlen); -} - - -int atm_getsockopt(struct socket *sock,int level,int optname, - char *optval,int *optlen) -{ - int len; - - if (get_user(len,optlen)) return 
-EFAULT; - if (__SO_LEVEL_MATCH(optname, level) && len != __SO_SIZE(optname)) - return -EINVAL; - return atm_do_getsockopt(sock,level,optname,optval,len); + return vcc->dev->ops->getsockopt(vcc, level, optname, optval, len); } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:20:17 2003 +++ b/net/atm/common.h Tue Jun 17 08:20:17 2003 @@ -19,10 +19,10 @@ int total_len); unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); -int atm_setsockopt(struct socket *sock,int level,int optname,char *optval, - int optlen); -int atm_getsockopt(struct socket *sock,int level,int optname,char *optval, - int *optlen); +int vcc_setsockopt(struct socket *sock, int level, int optname, char *optval, + int optlen); +int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, + int *optlen); int atm_connect_vcc(struct atm_vcc *vcc,int itf,short vpi,int vci); void atm_release_vcc_sk(struct sock *sk,int free_sk); diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:20:16 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:20:16 2003 @@ -70,7 +70,7 @@ int error; lock_sock(sk); - error = atm_setsockopt(sock, level, optname, optval, optlen); + error = vcc_setsockopt(sock, level, optname, optval, optlen); release_sock(sk); return error; } @@ -83,7 +83,7 @@ int error; lock_sock(sk); - error = atm_getsockopt(sock, level, optname, optval, optlen); + error = vcc_getsockopt(sock, level, optname, optval, optlen); release_sock(sk); return error; } diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:20:17 2003 +++ b/net/atm/svc.c Tue Jun 17 08:20:17 2003 @@ -471,7 +471,7 @@ if (!__SO_LEVEL_MATCH(optname, level) || optname != SO_ATMSAP || optlen != sizeof(struct atm_sap)) { - error = atm_setsockopt(sock, level, optname, optval, optlen); + error = vcc_setsockopt(sock, level, optname, optval, optlen); goto out; } vcc = 
ATM_SD(sock); @@ -494,7 +494,7 @@ lock_sock(sk); if (!__SO_LEVEL_MATCH(optname, level) || optname != SO_ATMSAP) { - error = atm_getsockopt(sock, level, optname, optval, optlen); + error = vcc_getsockopt(sock, level, optname, optval, optlen); goto out; } if (get_user(len, optlen)) { [atm]: connect cleanup # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1402 -> 1.1403 # net/atm/pvc.c 1.12 -> 1.13 # net/atm/svc.c 1.14 -> 1.15 # net/atm/common.h 1.8 -> 1.9 # net/atm/common.c 1.31 -> 1.32 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/17 chas@relax.cmf.nrl.navy.mil 1.1403 # connect cleanup # -------------------------------------------- # diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Tue Jun 17 08:20:36 2003 +++ b/net/atm/common.c Tue Jun 17 08:20:36 2003 @@ -277,8 +277,8 @@ } -static int atm_do_connect_dev(struct atm_vcc *vcc,struct atm_dev *dev,int vpi, - int vci) +static int __vcc_connect(struct atm_vcc *vcc, struct atm_dev *dev, int vpi, + int vci) { int error; @@ -335,29 +335,26 @@ } -static int atm_do_connect(struct atm_vcc *vcc,int itf,int vpi,int vci) +int vcc_connect(struct socket *sock, int itf, short vpi, int vci) { struct atm_dev *dev; - int return_val; - - dev = atm_dev_lookup(itf); - if (!dev) - return_val = -ENODEV; - else { - return_val = atm_do_connect_dev(vcc,dev,vpi,vci); - if (return_val) atm_dev_release(dev); - } - - return return_val; -} + struct atm_vcc *vcc = ATM_SD(sock); + int error; + DPRINTK("vcc_connect (vpi %d, vci %d)\n",vpi,vci); + if (sock->state == SS_CONNECTED) + return -EISCONN; + if (sock->state != SS_UNCONNECTED) + return -EINVAL; + if (!(vpi || vci)) + return -EINVAL; -int atm_connect_vcc(struct atm_vcc *vcc,int itf,short vpi,int vci) -{ if (vpi != ATM_VPI_UNSPEC && vci 
!= ATM_VCI_UNSPEC) clear_bit(ATM_VF_PARTIAL,&vcc->flags); - else if (test_bit(ATM_VF_PARTIAL,&vcc->flags)) return -EINVAL; - DPRINTK("atm_connect (TX: cl %d,bw %d-%d,sdu %d; " + else + if (test_bit(ATM_VF_PARTIAL,&vcc->flags)) + return -EINVAL; + DPRINTK("vcc_connect (TX: cl %d,bw %d-%d,sdu %d; " "RX: cl %d,bw %d-%d,sdu %d,AAL %s%d)\n", vcc->qos.txtp.traffic_class,vcc->qos.txtp.min_pcr, vcc->qos.txtp.max_pcr,vcc->qos.txtp.max_sdu, @@ -365,50 +362,39 @@ vcc->qos.rxtp.max_pcr,vcc->qos.rxtp.max_sdu, vcc->qos.aal == ATM_AAL5 ? "" : vcc->qos.aal == ATM_AAL0 ? "" : " ??? code ",vcc->qos.aal == ATM_AAL0 ? 0 : vcc->qos.aal); - if (!test_bit(ATM_VF_HASQOS,&vcc->flags)) return -EBADFD; + if (!test_bit(ATM_VF_HASQOS, &vcc->flags)) + return -EBADFD; if (vcc->qos.txtp.traffic_class == ATM_ANYCLASS || vcc->qos.rxtp.traffic_class == ATM_ANYCLASS) return -EINVAL; if (itf != ATM_ITF_ANY) { - int error; - - error = atm_do_connect(vcc,itf,vpi,vci); - if (error) return error; - } - else { - struct atm_dev *dev = NULL; + dev = atm_dev_lookup(itf); + error = __vcc_connect(vcc, dev, vpi, vci); + if (error) { + atm_dev_release(dev); + return error; + } + } else { struct list_head *p, *next; + dev = NULL; spin_lock(&atm_dev_lock); list_for_each_safe(p, next, &atm_devs) { dev = list_entry(p, struct atm_dev, dev_list); atm_dev_hold(dev); spin_unlock(&atm_dev_lock); - if (!atm_do_connect_dev(vcc,dev,vpi,vci)) + if (!__vcc_connect(vcc, dev, vpi, vci)) break; atm_dev_release(dev); dev = NULL; spin_lock(&atm_dev_lock); } spin_unlock(&atm_dev_lock); - if (!dev) return -ENODEV; + if (!dev) + return -ENODEV; } if (vpi == ATM_VPI_UNSPEC || vci == ATM_VCI_UNSPEC) set_bit(ATM_VF_PARTIAL,&vcc->flags); - return 0; -} - - -int atm_connect(struct socket *sock,int itf,short vpi,int vci) -{ - int error; - - DPRINTK("atm_connect (vpi %d, vci %d)\n",vpi,vci); - if (sock->state == SS_CONNECTED) return -EISCONN; - if (sock->state != SS_UNCONNECTED) return -EINVAL; - if (!(vpi || vci)) return -EINVAL; - error = 
atm_connect_vcc(ATM_SD(sock),itf,vpi,vci); - if (error) return error; if (test_bit(ATM_VF_READY,&ATM_SD(sock)->flags)) sock->state = SS_CONNECTED; return 0; diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Tue Jun 17 08:20:36 2003 +++ b/net/atm/common.h Tue Jun 17 08:20:36 2003 @@ -12,7 +12,7 @@ int atm_create(struct socket *sock,int protocol,int family); int atm_release(struct socket *sock); -int atm_connect(struct socket *sock,int itf,short vpi,int vci); +int vcc_connect(struct socket *sock, int itf, short vpi, int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, @@ -24,7 +24,6 @@ int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen); -int atm_connect_vcc(struct atm_vcc *vcc,int itf,short vpi,int vci); void atm_release_vcc_sk(struct sock *sk,int free_sk); void atm_shutdown_dev(struct atm_dev *dev); diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Tue Jun 17 08:20:36 2003 +++ b/net/atm/pvc.c Tue Jun 17 08:20:36 2003 @@ -49,7 +49,7 @@ if (vcc->vpi != ATM_VPI_UNSPEC) addr->sap_addr.vpi = vcc->vpi; if (vcc->vci != ATM_VCI_UNSPEC) addr->sap_addr.vci = vcc->vci; } - error = atm_connect(sock, addr->sap_addr.itf, addr->sap_addr.vpi, + error = vcc_connect(sock, addr->sap_addr.itf, addr->sap_addr.vpi, addr->sap_addr.vci); out: release_sock(sk); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Tue Jun 17 08:20:36 2003 +++ b/net/atm/svc.c Tue Jun 17 08:20:36 2003 @@ -293,7 +293,7 @@ /* * #endif */ - if (!(error = atm_connect(sock,vcc->itf,vcc->vpi,vcc->vci))) + if (!(error = vcc_connect(sock, vcc->itf, vcc->vpi, vcc->vci))) sock->state = SS_CONNECTED; else (void) svc_disconnect(vcc); out: @@ -393,8 +393,8 @@ new_vcc->remote = msg->svc; new_vcc->local = msg->local; new_vcc->sap = msg->sap; - error = atm_connect(newsock,msg->pvc.sap_addr.itf, - 
msg->pvc.sap_addr.vpi,msg->pvc.sap_addr.vci); + error = vcc_connect(newsock, msg->pvc.sap_addr.itf, + msg->pvc.sap_addr.vpi, msg->pvc.sap_addr.vci); dev_kfree_skb(skb); old_vcc->sk->sk_ack_backlog--; if (error) { From sakalra@hss.hns.com Tue Jun 17 06:05:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 06:05:57 -0700 (PDT) Received: from hindon.hss.co.in (hindon.hss.co.in [202.54.26.202]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HD5f2x022767 for ; Tue, 17 Jun 2003 06:05:44 -0700 Received: from hindon.hss.co.in (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with ESMTP id h5HD2wE16848 for ; Tue, 17 Jun 2003 18:32:58 +0530 (IST) Received: from ultra.hss.co.in (ultra [192.168.100.5]) by hindon.hss.co.in (8.10.0/8.10.0) with ESMTP id h5HD2v916844 for ; Tue, 17 Jun 2003 18:32:57 +0530 (IST) Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by ultra.hss.co.in (8.10.0/8.10.0) with ESMTP id h5HD6Tg01360 for ; Tue, 17 Jun 2003 18:36:29 +0530 (IST) Sensitivity: Subject: Help for GCOV in MVL 2.4.18 Prof. version To: netdev@oss.sgi.com From: sakalra@hss.hns.com Date: Tue, 17 Jun 2003 18:31:27 +0530 Message-ID: X-MIMETrack: Serialize by Router on Sandesh/HSS(Release 6.0|September 26, 2002) at 17/06/2003 06:30:27 PM MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3319 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sakalra@hss.hns.com Precedence: bulk X-list: netdev Hi, I am trying to do some profiling and coverage analysis of my code on MontaVista 2.4.18 (linux2.4.18_mvl30) Professional Edition. The GCOV patch for 2.4.18 (I suppose it's for the RedHat kernel) fails. I tried applying it manually, but then it caused problems while profiling my driver. When I tried running gcov on file.c at path /proc/module/opt/hardhat/mydriver/file1.c, GCOV abruptly ends saying "Aborted". 
A file1.da is created at this location once I have loaded gcov-proc.o. I want to know whether there is a separate patch available for MVL or whether it's the same patch. If someone has tried this, kindly help in this regard. TIA, -Sandeep Kalra From babydr@baby-dragons.com Tue Jun 17 07:35:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 07:36:02 -0700 (PDT) Received: from filesrv1.baby-dragons.com (filesrv1.system-techniques.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HEZs2x026713 for ; Tue, 17 Jun 2003 07:35:55 -0700 Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h5HEZA9R006828; Tue, 17 Jun 2003 10:35:26 -0400 Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h5HEYj85006819; Tue, 17 Jun 2003 10:35:00 -0400 X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs Date: Tue, 17 Jun 2003 10:34:45 -0400 (EDT) From: "Mr. James W. Laferriere" To: "David S. Miller" cc: Linux Kernel Maillist , netdev@oss.sgi.com Subject: Re: patch for common networking error messages In-Reply-To: <20030616.181946.22044667.davem@redhat.com> Message-ID: References: <20030616.181946.22044667.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3320 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: babydr@baby-dragons.com Precedence: bulk X-list: netdev Hello Dave, I know your time is constrained. BUT DAMN IT, there were other questions asked in the (partially quoted) email you are replying to. Some of those answers would be nice to know. RSVP, JimL On Mon, 16 Jun 2003, David S. Miller wrote: > From: Janice Girouard > Date: Mon, 16 Jun 2003 19:44:22 -0500 > It sounds like you are proposing a new family for the netlink > subsystem. 
> Exactly, you have to create this. -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O. Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ From yoshfuji@linux-ipv6.org Tue Jun 17 08:06:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 08:06:33 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HF6N2x028235 for ; Tue, 17 Jun 2003 08:06:25 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5HF7SBo002978; Wed, 18 Jun 2003 00:07:28 +0900 Date: Wed, 18 Jun 2003 00:07:28 +0900 (JST) Message-Id: <20030618.000728.77526626.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: vnuorval@tcs.hut.fi, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: kill 2 warnings in net/ipv6/ip6_tunnel.c From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3321 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This patch kills 2 warnings in net/ipv6/ip6_tunnel.c. Thanks. 
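The two warnings are presumably of the "comparison is always false due to limited range of data type" kind. That reading assumes hop_limit is an 8-bit unsigned field, which the message itself does not state. Under that assumption, a minimal sketch (struct demo_parm, stored_hop_limit() and demo_self_check() are hypothetical names used only for illustration) shows why a test against 255 on the stored value can never succeed:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tunnel parameter block. The 8-bit
 * unsigned hop_limit is an assumption made for this illustration. */
struct demo_parm {
	uint8_t hop_limit;
};

/* Assign v the way a plain store into the field would, then read it back. */
unsigned int stored_hop_limit(unsigned int v)
{
	struct demo_parm p;

	p.hop_limit = (uint8_t)v;	/* values above 255 wrap modulo 256 */
	return p.hop_limit;		/* always in the range 0..255 */
}

/* Shows why a "> 255" test on the stored field is dead code. */
void demo_self_check(void)
{
	unsigned int v;

	for (v = 0; v < 1024; v++)
		assert(stored_hop_limit(v) <= 255);
}
```

If the assumption holds, dropping the range checks does not change behaviour, because the field can never hold a value above 255 by the time it is tested.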
Index: linux25-LINUS/net/ipv6/ip6_tunnel.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_tunnel.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 ip6_tunnel.c --- linux25-LINUS/net/ipv6/ip6_tunnel.c 10 Jun 2003 13:21:55 -0000 1.1.1.1 +++ linux25-LINUS/net/ipv6/ip6_tunnel.c 17 Jun 2003 14:47:00 -0000 @@ -230,8 +230,6 @@ dev->init = ip6ip6_tnl_dev_init; memcpy(&t->parms, p, sizeof (*p)); t->parms.name[IFNAMSIZ - 1] = '\0'; - if (t->parms.hop_limit > 255) - t->parms.hop_limit = -1; strcpy(dev->name, t->parms.name); if (!dev->name[0]) { int i = 0; @@ -952,7 +950,7 @@ ipv6_addr_copy(&t->parms.laddr, &p->laddr); ipv6_addr_copy(&t->parms.raddr, &p->raddr); t->parms.flags = p->flags; - t->parms.hop_limit = (p->hop_limit <= 255 ? p->hop_limit : -1); + t->parms.hop_limit = p->hop_limit; t->parms.encap_limit = p->encap_limit; t->parms.flowinfo = p->flowinfo; ip6ip6_tnl_link_config(t); -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From kuznet@ms2.inr.ac.ru Tue Jun 17 08:07:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 08:07:07 -0700 (PDT) Received: from dub.inr.ac.ru (dub.inr.ac.ru [193.233.7.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HF6u2x028292 for ; Tue, 17 Jun 2003 08:07:01 -0700 Received: (from kuznet@localhost) by dub.inr.ac.ru (8.6.13/ANK) id TAA25376; Tue, 17 Jun 2003 19:05:59 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200306171505.TAA25376@dub.inr.ac.ru> Subject: Re: IPSEC problems with GRE. 
To: jmorris@intercode.com.au (James Morris) Date: Tue, 17 Jun 2003 19:05:59 +0400 (MSD) Cc: jblake@omgwallhack.org, netdev@oss.sgi.com, davem@redhat.com In-Reply-To: from "James Morris" at Июн 17, 2003 09:28:45 X-Mailer: ELM [version 2.5 PL6] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3322 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kuznet@ms2.inr.ac.ru Precedence: bulk X-list: netdev Hello! > > 3) Kernel panic, with no SysRq. Please send the backtrace anyway. And explain what "with no SysRq" means. :-) Alexey From davem@redhat.com Tue Jun 17 09:03:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:04:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HG3w2x029524 for ; Tue, 17 Jun 2003 09:03:59 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA03267; Tue, 17 Jun 2003 08:59:21 -0700 Date: Tue, 17 Jun 2003 08:59:21 -0700 (PDT) Message-Id: <20030617.085921.28790392.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030616234937.GE18484@netnation.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3323 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 16 Jun 2003 16:49:37 -0700 60.0073 seconds passed, avg forwarding rate: 157557.710 pps ...Looks like a tad worse than with your patch, but not by much. Forwarding rate is still pretty crappy for an Opteron. Will fiddle a bit more tonight to see what I can do. To be honest, this isn't half-bad for pure DoS load. This reminds me, maybe a good test would be PPS for "well behaved flows" in the presence of DoS load. You'd probably need 4 systems to carry out such a test accurately. Because, really, who cares how fast we can forward the DoS traffic as long as legitimate users still see good metrics. From shemminger@osdl.org Tue Jun 17 09:09:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:09:49 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HG9b2x029896 for ; Tue, 17 Jun 2003 09:09:40 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5HG8xX10138; Tue, 17 Jun 2003 09:09:00 -0700 Date: Tue, 17 Jun 2003 09:08:59 -0700 From: Stephen Hemminger To: Valdis.Kletnieks@vt.edu Cc: girouard@us.ibm.com, davem@redhat.com, stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages Message-Id: <20030617090859.0ffa0ca8.shemminger@osdl.org> In-Reply-To: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> Organization: Open Source 
Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3324 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Binary interfaces will never cut it. Read the hotplug thread to see how Linus said, he will never add a binary event daemon interface. That said, there is an opportunity to provide an ASCII interface (similar to /sbin/hotplug) that decodes the data from the rtnetlink interface into a standardized format. Then it would be easy to write things like perl monitoring scripts that do things like: perl phone-me-if-network-dies.pl < /proc/net/events Don't flame me about the choice of name. /proc/net/events is not the right name to use for such an interface, since adding more to /proc is probably not desired. From davem@redhat.com Tue Jun 17 09:14:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:14:06 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HGE22x031269 for ; Tue, 17 Jun 2003 09:14:02 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA03317; Tue, 17 Jun 2003 09:09:31 -0700 Date: Tue, 17 Jun 2003 09:09:30 -0700 (PDT) Message-Id: <20030617.090930.102574393.davem@redhat.com> To: shemminger@osdl.org Cc: Valdis.Kletnieks@vt.edu, girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. 
Miller" In-Reply-To: <20030617090859.0ffa0ca8.shemminger@osdl.org> References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> <20030617090859.0ffa0ca8.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3325 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Tue, 17 Jun 2003 09:08:59 -0700 Read the hotplug thread to see how Linus said, he will never add a binary event daemon interface. Funny, rtnetlink is exactly this and it is in the tree :-) Every networking configuration event is transmitted over rtnetlink sockets to all listeners, in a fixed binary format. What Linus doesn't want is this for configuration events, ie. things like "device appears" not "my ethernet burped" From toml@us.ibm.com Tue Jun 17 09:28:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:28:13 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HGS22x031744 for ; Tue, 17 Jun 2003 09:28:09 -0700 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HGR7td173758; Tue, 17 Jun 2003 12:27:07 -0400 Received: from d01ml072.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HGR4mu116042; Tue, 17 Jun 2003 12:27:05 -0400 Subject: Re: IPSec: Policy dst bundles exhausting storage To: "David S. 
Miller" Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Tue, 17 Jun 2003 11:26:55 -0500 X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/17/2003 12:27:06 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3326 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev I have a better suggestion for fix: 1) Delete the "x->u.rt.fl = *fl;" line completely. 2) Fix the test in __xfrm6_find_bundle() to do a proper prefix-mask based address comparison. rt6->rt6i_{dst,src} are masked addresses, so direct comparison is wrong. Can someone code this up? It doesn't look like anyone worked on this yet, so now that I'm back I'm starting to look at it. In prep for coding this up I deleted the "x->u.rt.fl = *fl;" line and then I noticed that (at least in my configuration) that the rt6->rt6i_src address and prefix length in the xfrm_dst structure are always zero. It's hard to tell if this is what the actual value should be or if it is just not getting initialized. Doing a ping from Machine 1 to Machine 3 and based on my configuration I would expect to see a source address/prefix length of fec0::/64. The rt6->rt6i_dst address and prefix length are correct with fec0:0:0:2::/64. Any ideas? My IPSec configuration looks like this: Machine 1 Machine 2 Machine 3 fec0:0:0:1::10 fec0::1 ----------- fec0::2 fec0:0:0:2::10 ---- fec0:0:0:2::11 Machine 1: spdadd fec0:0:0:0::/64 fec0:0:0:2::/64 -P out ipsec esp/tunnel/fec0::1-fec0::2/require; spdadd fec0:0:0:2::/64 fec0:0:0:0::/64 -P in ipsec esp/tunnel/fec0::2-fec0::1/require; add fec0::1 fec0::2 ... (spi, algorithms and keys) add fec0::2 fec0::1 ... 
(spi, algorithms and keys)

Machine 2:
spdadd fec0:0:0:0::/64 fec0:0:0:2::/64 -P in ipsec esp/tunnel/fec0::1-fec0::2/require;
spdadd fec0:0:0:2::/64 fec0:0:0:0::/64 -P out ipsec esp/tunnel/fec0::2-fec0::1/require;
add fec0::1 fec0::2 ... (spi, algorithms and keys)
add fec0::2 fec0::1 ... (spi, algorithms and keys)

A netstat -rn --inet6 on Machine 1 yields:

Kernel IPv6 routing table
Destination                    Next Hop   Flags  Metric  Ref    Use    Iface
::1/128                        ::         U      0       0      0      lo
fe80::/128                     ::         U      0       0      0      lo
fe80::201:3ff:fe33:5355/128    ::         U      0       0      0      lo
fe80::202:55ff:fe7c:79b6/128   ::         U      0       0      0      lo
fe80::202:55ff:fee4:5bb6/128   ::         U      0       0      0      lo
fe80::/64                      ::         UA     256     0      0      eth0
fe80::/64                      ::         UA     256     0      0      eth1
fe80::/64                      ::         UA     256     0      0      eth2
fec0::/128                     ::         U      0       0      0      lo
fec0::1/128                    ::         U      0       38     0      lo
fec0::2/128                    fec0::2    UAC    0       2055   2049   eth1
fec0::/64                      ::         UA     256     1      0      eth1
fec0:0:0:1::/128               ::         U      0       0      0      lo
fec0:0:0:1::10/128             ::         U      0       0      0      lo
fec0:0:0:1::/64                ::         UA     256     0      0      eth2
fec0:0:0:2::/64                fec0::2    UG     1       15     0      eth1
ff00::/8                       ::         UA     256     0      0      eth0
ff00::/8                       ::         UA     256     0      0      eth1
ff00::/8                       ::         UA     256     0      0      eth2

Thanks,
Tom

From davem@redhat.com Tue Jun 17 09:41:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:41:50 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HGfe2x032129 for ; Tue, 17 Jun 2003 09:41:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA03451; Tue, 17 Jun 2003 09:36:24 -0700 Date: Tue, 17 Jun 2003 09:36:24 -0700 (PDT) Message-Id: <20030617.093624.35663874.davem@redhat.com> To: toml@us.ibm.com Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3327 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Tom Lendacky" Date: Tue, 17 Jun 2003 11:26:55 -0500 In prep for coding this up I deleted the "x->u.rt.fl = *fl;" line and then I noticed that (at least in my configuration) that the rt6->rt6i_src address and prefix length in the xfrm_dst structure are always zero. Well, of course. There is nothing initializing this. You have to replace the x->u.rt.fl = *fl line with assignments further down to rt6i_src and friends. Something like: x->u.rt6.rt6i_src = rt0->rt6i_src; etc. etc. I don't understand where you expected these assignments to be made. This is where the objects get constructed, so if it isn't being set here, it is being set nowhere :-) From Robert.Olsson@data.slu.se Tue Jun 17 09:51:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:51:31 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HGp62x032473 for ; Tue, 17 Jun 2003 09:51:07 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id SAA28980; Tue, 17 Jun 2003 18:50:04 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16111.18107.699689.704597@robur.slu.se> Date: Tue, 17 Jun 2003 18:50:03 +0200 To: "David S. 
Miller" Cc: sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests In-Reply-To: <20030617.085921.28790392.davem@redhat.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3328 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > 60.0073 seconds passed, avg forwarding rate: 157557.710 pps > To be honest, this isn't half-bad for pure DoS load. No, that's pretty good, and the profiles look as expected. It would be interesting to get the single-flow performance as a comparison. Also, I think Simon used only /32 routes... I took a "real" Internet routing table and made a script so it can be used for experiments. I can make it available. Cheers. --ro From davem@redhat.com Tue Jun 17 09:55:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 09:55:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HGtA2x000318 for ; Tue, 17 Jun 2003 09:55:10 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA03538; Tue, 17 Jun 2003 09:50:29 -0700 Date: Tue, 17 Jun 2003 09:50:28 -0700 (PDT) Message-Id: <20030617.095028.35014188.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S.
Miller" In-Reply-To: <16111.18107.699689.704597@robur.slu.se> References: <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3329 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Tue, 17 Jun 2003 18:50:03 +0200 Also think Simon used only /32 routes... I took "real" Internet-routing and made a script so it can be used for experiments. I can make it available. Please do, I'd like to play with such a list locally. From Andrew.Morton@digeo.com Tue Jun 17 10:03:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:03:23 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HH3D2x000704 for ; Tue, 17 Jun 2003 10:03:13 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 17 Jun 2003 00:19:51 -0700 Date: Tue, 17 Jun 2003 00:20:27 -0700 From: Andrew Morton To: Andi Kleen Cc: davem@redhat.com, ak@suse.de, janiceg@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, stekloff@us.ibm.com, girouard@us.ibm.com, lkessler@us.ibm.com, kenistonj@us.ibm.com, jgarzik@pobox.com Subject: Re: patch for common networking error messages Message-Id: <20030617002027.00c96c7a.akpm@digeo.com> In-Reply-To: <20030617070957.GB2752@wotan.suse.de> References: <3EEE28DE.6040808@us.ibm.com> <20030616.133841.35533284.davem@redhat.com> <20030616205342.GH30400@wotan.suse.de> <20030616.135124.71580008.davem@redhat.com> <20030616152707.58da808c.akpm@digeo.com> <20030617070957.GB2752@wotan.suse.de> X-Mailer: Sylpheed 
version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 17 Jun 2003 07:19:51.0246 (UTC) FILETIME=[D7900EE0:01C334A0] X-archive-position: 3330 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev Andi Kleen wrote: > > > Actually it already does, to cover the case where an interrupt handler calls > > printk while process-context code is performing a printk. > > I don't think it'll work. Both printk and release_console_sem take the logbuf_lock, > which will deadlock if the same CPU already holds it. Look more closely. logbuf_lock is only held to protect the logbuf contents and its indices. And to pin down the current console_sem holder to reliably ensure that he'll print the text which the nested printer just placed in the buffer. We do not call the console drivers while holding logbuf_lock. console_sem is held across the console driver call. From Robert.Olsson@data.slu.se Tue Jun 17 10:04:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:05:03 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HH4a2x001012 for ; Tue, 17 Jun 2003 10:04:37 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id TAA29233; Tue, 17 Jun 2003 19:03:38 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16111.18921.939537.978325@robur.slu.se> Date: Tue, 17 Jun 2003 19:03:37 +0200 To: "David S. 
Miller" Cc: Robert Olsson , sim@netnation.com, xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <16105.43543.826589.672148@robur.slu.se> References: <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> <16101.4136.328760.955758@robur.slu.se> <20030612.232114.71088346.davem@redhat.com> <16105.43543.826589.672148@robur.slu.se> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3331 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev > David S. Miller writes: > Next, we should put similar metrics into fib_hash.c A starting point... Kernel hack enclosed and companion app from: ftp://robur.slu.se/pub/Linux/net-development/fibstat Just some hash metrics yet. Output below is from our DoS tests: lookup_total == hash lookup/sec zone_search == zones search/sec chain_search == chain search/sec lookup_total zone_search chain_search 0 0 0 0 0 0 0 0 0 475084 4513198 2454249 861704 8186188 4450394 867935 8245366 4480320 863319 8201514 4458924 864056 8208532 4463344 863788 8205986 4461238 861772 8186834 4449507 --- include/net/ip_fib.h.030617 2003-06-17 15:03:57.000000000 +0200 +++ include/net/ip_fib.h 2003-06-17 16:07:00.000000000 +0200 @@ -135,6 +135,21 @@ unsigned char tb_data[0]; }; +struct fib_stat +{ + unsigned int lookup_total; + unsigned int zone_search; + unsigned int chain_search; +}; + +extern struct fib_stat *fib_stat; +#define FIB_STAT_INC(field) \ + (per_cpu_ptr(fib_stat, smp_processor_id())->field++) + + +extern int __init fib_stat_init(void); + + #ifndef CONFIG_IP_MULTIPLE_TABLES extern struct fib_table *ip_fib_local_table; --- net/ipv4/fib_hash.c.030617 2003-06-15 23:02:21.000000000 +0200 +++ net/ipv4/fib_hash.c 2003-06-17 16:01:45.000000000 +0200 @@ -13,6 +13,11 @@ * modify it under the 
terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. + * + * + * Fixes: + * Robert Olsson : Added statistics + * */ #include @@ -107,6 +112,10 @@ struct fn_zone *fn_zone_list; }; + +struct fib_stat *fib_stat; + + static __inline__ fn_hash_idx_t fn_hash(fn_key_t key, struct fn_zone *fz) { u32 h = ntohl(key.datum)>>(32 - fz->fz_order); @@ -307,12 +316,19 @@ struct fn_zone *fz; struct fn_hash *t = (struct fn_hash*)tb->tb_data; + FIB_STAT_INC(lookup_total); + read_lock(&fib_hash_lock); for (fz = t->fn_zone_list; fz; fz = fz->fz_next) { struct fib_node *f; fn_key_t k = fz_key(flp->fl4_dst, fz); + FIB_STAT_INC(zone_search); + for (f = fz_chain(k, fz); f; f = f->fn_next) { + + FIB_STAT_INC(chain_search); + if (!fn_key_eq(k, f->fn_key)) { if (fn_key_leq(k, f->fn_key)) break; @@ -1108,6 +1124,54 @@ .release = ip_seq_release, }; +static int fib_stat_get_info(char *buffer, char **start, off_t offset, int length) +{ + int i; + int len = 0; + + for (i = 0; i < NR_CPUS; i++) { + if (!cpu_possible(i)) + continue; + len += sprintf(buffer+len, "%08x %08x %08x \n", + per_cpu_ptr(fib_stat, i)->lookup_total, + per_cpu_ptr(fib_stat, i)->zone_search, + per_cpu_ptr(fib_stat, i)->chain_search + + ); + } + len -= offset; + + if (len > length) + len = length; + if (len < 0) + len = 0; + + *start = buffer + offset; + return len; +} + +int __init fib_stat_init(void) +{ + int i, rc = 0; + + fib_stat = kmalloc_percpu(sizeof (struct fib_stat), + GFP_KERNEL); + if (!fib_stat) { + rc = -ENOMEM; + goto out; + } + + for (i = 0; i < NR_CPUS; i++) { + if (cpu_possible(i)) { + memset(per_cpu_ptr(fib_stat, i), 0, + sizeof (struct fib_stat)); + } + } + + out: + return rc; +} + int __init fib_proc_init(void) { struct proc_dir_entry *p; @@ -1116,13 +1180,27 @@ p = create_proc_entry("route", S_IRUGO, proc_net); if (p) p->proc_fops = &fib_seq_fops; - else + else { + rc = -ENOMEM; + goto out; + } + + + 
+ p = proc_net_create ("fib_stat", 0, fib_stat_get_info); + + if (!p) { rc = -ENOMEM; + remove_proc_entry("route", proc_net); + } + + out: return rc; } void __init fib_proc_exit(void) { remove_proc_entry("route", proc_net); + remove_proc_entry("fib_stat", proc_net); } #endif /* CONFIG_PROC_FS */ --- net/ipv4/route.c.030617 2003-06-16 16:56:34.000000000 +0200 +++ net/ipv4/route.c 2003-06-17 16:02:41.000000000 +0200 @@ -2754,7 +2754,8 @@ rt_cache_stat = kmalloc_percpu(sizeof (struct rt_cache_stat), GFP_KERNEL); if (!rt_cache_stat) - goto out_enomem1; + goto out_enomem0; + for (i = 0; i < NR_CPUS; i++) { if (cpu_possible(i)) { memset(per_cpu_ptr(rt_cache_stat, i), 0, @@ -2765,6 +2766,9 @@ devinet_init(); ip_fib_init(); + if(fib_stat_init()) + goto out_enomem1; + init_timer(&rt_flush_timer); rt_flush_timer.function = rt_run_flush; init_timer(&rt_periodic_timer); @@ -2785,7 +2789,7 @@ #ifdef CONFIG_PROC_FS if (rt_cache_proc_init()) - goto out_enomem; + goto out_enomem2; proc_net_create ("rt_cache_stat", 0, rt_cache_stat_get_info); #ifdef CONFIG_NET_CLS_ROUTE create_proc_read_entry("net/rt_acct", 0, 0, ip_rt_acct_read, NULL); @@ -2795,9 +2799,12 @@ xfrm4_init(); out: return rc; -out_enomem: - kfree_percpu(rt_cache_stat); + +out_enomem2: + kfree_percpu(fib_stat); out_enomem1: + kfree_percpu(rt_cache_stat); +out_enomem0: rc = -ENOMEM; goto out; } Cheers. --ro From Andrew.Morton@digeo.com Tue Jun 17 10:05:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:05:21 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HH5E2x001047 for ; Tue, 17 Jun 2003 10:05:14 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 16 Jun 2003 15:29:56 -0700 Date: Mon, 16 Jun 2003 15:30:29 -0700 From: Andrew Morton To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: Fw: 2.4.21 oops Message-Id: <20030616153029.0c8f2a20.akpm@digeo.com> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 16 Jun 2003 22:29:56.0047 (UTC) FILETIME=[D025B5F0:01C33456] X-archive-position: 3332 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev This kills 2.5.71++ as well. Program received signal SIGTRAP, Trace/breakpoint trap. skb_over_panic (skb=0xc0400330, sz=-1068485248, here=0xc034540e) at net/core/skbuff.c:87 87 BUG(); (gdb) bt #0 skb_over_panic (skb=0xc0400330, sz=-1068485248, here=0xc034540e) at net/core/skbuff.c:87 #1 0xc0345417 in add_grhead (skb=0xce2e7b68, pmc=0xcc7e05f0, type=44, ppgr=0x2c) at include/linux/skbuff.h:840 #2 0xc03454fe in add_grec (skb=0xce2e7b68, pmc=0xcc7e05f0, type=4, gdeleted=0, sdeleted=0) at net/ipv6/mcast.c:1325 #3 0xc0345b9b in mld_send_cr (idev=0xcecc9290) at net/ipv6/mcast.c:1513 #4 0xc03468e1 in mld_ifc_timer_expire (data=3469513360) at net/ipv6/mcast.c:1904 #5 0xc012c0c7 in run_timer_softirq (h=0xc0504a08) at kernel/timer.c:428 #6 0xc012815a in do_softirq () at kernel/softirq.c:96 #7 0xc0119783 in smp_apic_timer_interrupt (regs= {ebx = -1072657500, ecx = 0, edx = -1068474368, esi = -1068474368, edi = -1072672768, ebp = -1068466236, eax = 0, xds = -1072693125, xes = -1068498821, orig_eax = -17, eip = -1072657456, xcs = 96, eflags = 582, esp = -1068466220, xss = -1072657306}) at arch/i386/kernel/apic.c:1061 #8 0xc010b8c2 in apic_timer_interrupt () #9 0xc0108c66 in cpu_idle () at arch/i386/kernel/process.c:146 #10 0xc010506d in rest_init () at init/main.c:375 #11 0xc0508862 in start_kernel () at init/main.c:467 Begin forwarded message: Date: Mon, 16 Jun 2003 13:40:11 +0200 (CEST) From: Andrzej Sosnowski To: 
linux-kernel@vger.kernel.org Subject: 2.4.21 oops Hi. The kernel oopses while executing the following script: #!/bin/sh for IP in `/usr/bin/seq 3 500`; do ip addr add 3ffe:80ee:c1d::$IP/48 dev eth0 ip addr add 3ffe:80ee:c1d::a:$IP/48 dev eth0 done Result: kernel BUG sched.c 564! (sorry for incomplete oops message) Tested on: debian 2.4.21 debian/redhat 2.4.21-grsec 1.9.10 redhat 2.4.21-uv2-grsec 1.9.10 The same script works fine with 2.4.20. -- ____________________________________________________________________ andrzej sosnowski * raptor@raptor.pl * http://raptor.pl * 0xB71774A2 From Robert.Olsson@data.slu.se Tue Jun 17 10:30:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:30:56 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HHUS2x002213 for ; Tue, 17 Jun 2003 10:30:29 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id TAA29669; Tue, 17 Jun 2003 19:29:25 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16111.20469.376899.55240@robur.slu.se> Date: Tue, 17 Jun 2003 19:29:25 +0200 To: "David S.
Miller" Cc: Robert.Olsson@data.slu.se, sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests In-Reply-To: <20030617.095028.35014188.davem@redhat.com> References: <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617.095028.35014188.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3333 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > Internet-routing and made a script so it can be used for > experiments. I can make it available. > > Please do, I'd like to play with such a list locally. ftp://robur.slu.se/pub/Linux/net-development/inet_routes/ Just configure the script and run... And Simon can you do a run with this routing table too? And even fibstat output could be interesting to compare. Cheers. --ro From davem@redhat.com Tue Jun 17 10:36:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:36:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HHaF2x002560 for ; Tue, 17 Jun 2003 10:36:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA03812; Tue, 17 Jun 2003 10:31:46 -0700 Date: Tue, 17 Jun 2003 10:31:45 -0700 (PDT) Message-Id: <20030617.103145.26534124.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: netdev@oss.sgi.com Subject: Re: [PATCH][ATM][3/3] assorted changes for atm From: "David S. Miller" In-Reply-To: <200306171240.h5HCeMbB021442@locutus.cmf.nrl.navy.mil> References: <200306171240.h5HCeMbB021442@locutus.cmf.nrl.navy.mil> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3334 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 17 Jun 2003 08:40:22 -0400 [atm]: keep vcc's on global list instead of per device I've applied all 3 of your patches. Thanks. From chas@locutus.cmf.nrl.navy.mil Tue Jun 17 10:54:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:54:39 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HHsa2x005790 for ; Tue, 17 Jun 2003 10:54:37 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5HHsWsG000656; Tue, 17 Jun 2003 13:54:32 -0400 (EDT) Message-Id: <200306171754.h5HHsWsG000656@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: netdev@oss.sgi.com Subject: Re: [PATCH][ATM] use rtnl_{lock,unlock} during device operations (take 2) In-reply-to: Your message of "Fri, 06 Jun 2003 08:55:58 PDT." <20030606.085558.56056656.davem@redhat.com> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 17 Jun 2003 13:52:32 -0400 From: chas williams X-Spam-Score: (*) hits=1.7 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3335 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev In message <20030606.085558.56056656.davem@redhat.com>,"David S. 
Miller" writes : > From: Werner Almesberger > Date: Fri, 6 Jun 2003 12:54:16 -0300 > > (If you want to keep Chas busy, the communication between > the kernel and its demons may be a much more interesting > topic ;-) > >Tell me it at least uses netlink ;( so i was doing a bit of thinking about this netlink conversion for signalling (and lane and clip and br2684). would i create a new family for each like NETLINK_SIGNALLING, NETLINK_LANE, or create a single new family NETLINK_ATM and multiplex nlmsg_type, or use the existing NETLINK_USERSOCK? From davem@redhat.com Tue Jun 17 10:58:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 10:58:05 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HHw12x006102 for ; Tue, 17 Jun 2003 10:58:02 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA03912; Tue, 17 Jun 2003 10:53:33 -0700 Date: Tue, 17 Jun 2003 10:53:33 -0700 (PDT) Message-Id: <20030617.105333.13769070.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: netdev@oss.sgi.com Subject: Re: [PATCH][ATM] use rtnl_{lock,unlock} during device operations (take 2) From: "David S. Miller" In-Reply-To: <200306171754.h5HHsWsG000656@ginger.cmf.nrl.navy.mil> References: <20030606.085558.56056656.davem@redhat.com> <200306171754.h5HHsWsG000656@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3336 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 17 Jun 2003 13:52:32 -0400 In message <20030606.085558.56056656.davem@redhat.com>,"David S. 
Miller" writes : >Tell me it at least uses netlink ;( so i was doing a bit of thinking about this netlink conversion for signalling (and lane and clip and br2684). would i create a new family for each like NETLINK_SIGNALLING, NETLINK_LANE, or create a single new family NETLINK_ATM and multiplex nlmsg_type, or use the existing NETLINK_USERSOCK? Don't use NETLINK_USERSOCK, it's for users :-) Create NETLINK_ATM, and then multiplex like rtnetlink does on the message type. From davem@redhat.com Tue Jun 17 11:20:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 11:20:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HIKL2x012995 for ; Tue, 17 Jun 2003 11:20:22 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id LAA04007; Tue, 17 Jun 2003 11:14:52 -0700 Date: Tue, 17 Jun 2003 11:14:52 -0700 (PDT) Message-Id: <20030617.111452.57442870.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: vnuorval@tcs.hut.fi, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: kill 2 warnings in net/ipv6/ip6_tunnel.c From: "David S. Miller" In-Reply-To: <20030618.000728.77526626.yoshfuji@linux-ipv6.org> References: <20030618.000728.77526626.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3337 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Wed, 18 Jun 2003 00:07:28 +0900 (JST) This patch kills 2 warnings in net/ipv6/ip6_tunnel.c. Applied, thank you.
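On the NETLINK_ATM suggestion earlier in this thread (one family, multiplexed on the message type the way rtnetlink does it), a minimal userspace sketch of the dispatch idea might look like the following. Everything named here is an assumption made for illustration: the ATM_MSG_* types, struct atm_msg and atm_dispatch() do not come from the thread or from any kernel header, and the base value of 16 is only chosen to stay clear of the low-numbered netlink control messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types carried inside one NETLINK_ATM family. */
enum atm_msg_type {
	ATM_MSG_BASE = 16,	/* assumption: keep clear of low control types */
	ATM_MSG_SIGNALLING = ATM_MSG_BASE,
	ATM_MSG_LANE,
	ATM_MSG_CLIP,
	ATM_MSG_BR2684,
	ATM_MSG_MAX
};

/* Hypothetical stand-in for a netlink message; only 'type' drives dispatch. */
struct atm_msg {
	uint16_t type;
	const void *payload;
	size_t len;
};

typedef int (*atm_handler_t)(const struct atm_msg *msg);

/* Placeholder handlers: each returns a distinct value so dispatch is visible. */
static int handle_signalling(const struct atm_msg *msg) { (void)msg; return 1; }
static int handle_lane(const struct atm_msg *msg)       { (void)msg; return 2; }
static int handle_clip(const struct atm_msg *msg)       { (void)msg; return 3; }
static int handle_br2684(const struct atm_msg *msg)     { (void)msg; return 4; }

/* One table indexed by (type - base) instead of one netlink family per user. */
static const atm_handler_t atm_handlers[ATM_MSG_MAX - ATM_MSG_BASE] = {
	[ATM_MSG_SIGNALLING - ATM_MSG_BASE] = handle_signalling,
	[ATM_MSG_LANE - ATM_MSG_BASE]       = handle_lane,
	[ATM_MSG_CLIP - ATM_MSG_BASE]       = handle_clip,
	[ATM_MSG_BR2684 - ATM_MSG_BASE]     = handle_br2684,
};

/* Returns the handler's result, or -1 when the type is out of range. */
int atm_dispatch(const struct atm_msg *msg)
{
	atm_handler_t handler;

	if (msg->type < ATM_MSG_BASE || msg->type >= ATM_MSG_MAX)
		return -1;

	handler = atm_handlers[msg->type - ATM_MSG_BASE];
	return handler ? handler(msg) : -1;
}
```

The point of the single table is that signalling, LANE, CLIP and br2684 share one socket family and one receive path, and adding a new daemon protocol means adding a type and a table slot rather than a new family.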
From shemminger@osdl.org Tue Jun 17 11:35:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 11:35:53 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HIZe2x014567 for ; Tue, 17 Jun 2003 11:35:40 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5HIZBX13747; Tue, 17 Jun 2003 11:35:11 -0700 Date: Tue, 17 Jun 2003 11:35:10 -0700 From: Stephen Hemminger To: Chad Tindel , Jay Vosburgh , "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.72] use alloc_netdev in bonding driver Message-Id: <20030617113510.5ae6a5a9.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3338 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This patch replaces the allocation of an array of bonding device structures with allocating net_devices through alloc_netdev. The net_device, the bonding private data, and the statistics are created with one allocation via alloc_netdev. The driver was keeping track of bond devices through both an array (dev_bonds) and a linked list (these_bonds). It was much simpler just to have one list_head and put the entries on as they are created. It should be easy to get rid of the max_bonds parameter and dynamically add devices as needed in the future. The previous code for bond_event was one of those places that did the right thing in the hardest way possible.
Added locking to the /proc interface because if device's are ever dynamically added that will be needed, and there is a small race during initialization today. Tested on 8-way with e1000 by enslave, moving data, unregistering the ether driver, and removing the bond driver. diff -Nru a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c --- a/drivers/net/bonding/bond_main.c Tue Jun 17 11:05:31 2003 +++ b/drivers/net/bonding/bond_main.c Tue Jun 17 11:05:31 2003 @@ -497,9 +497,7 @@ { NULL, -1}, }; -static int first_pass = 1; -static struct bonding *these_bonds = NULL; -static struct net_device *dev_bonds = NULL; +static LIST_HEAD(bond_dev_list); MODULE_PARM(max_bonds, "i"); MODULE_PARM_DESC(max_bonds, "Max number of bonded devices"); @@ -3408,7 +3406,7 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length) { - bonding_t *bond = these_bonds; + bonding_t *bond; int len = 0; off_t begin = 0; u16 link; @@ -3416,7 +3414,8 @@ len += sprintf(buf + len, "%s\n", version); - while (bond != NULL) { + read_lock(&dev_base_lock); + list_for_each_entry(bond, &bond_dev_list, bond_list) { /* * This function locks the mutex, so we can't lock it until * afterwards @@ -3526,93 +3525,48 @@ len = 0; } - - bond = bond->next_bond; } + read_unlock(&dev_base_lock); + return len; } static int bond_event(struct notifier_block *this, unsigned long event, void *ptr) { - struct bonding *this_bond = (struct bonding *)these_bonds; - struct bonding *last_bond; struct net_device *event_dev = (struct net_device *)ptr; + struct net_device *master = event_dev->master; - /* while there are bonds configured */ - while (this_bond != NULL) { - if (this_bond == event_dev->priv ) { - switch (event) { - case NETDEV_UNREGISTER: - /* - * remove this bond from a linked list of - * bonds - */ - if (this_bond == these_bonds) { - these_bonds = this_bond->next_bond; - } else { - for (last_bond = these_bonds; - last_bond != NULL; - last_bond = last_bond->next_bond) { - if 
(last_bond->next_bond == - this_bond) { - last_bond->next_bond = - this_bond->next_bond; - } - } - } - return NOTIFY_DONE; + if (event == NETDEV_UNREGISTER && master != NULL) + bond_release(master, event_dev); - default: - return NOTIFY_DONE; - } - } else if (this_bond->device == event_dev->master) { - switch (event) { - case NETDEV_UNREGISTER: - bond_release(this_bond->device, event_dev); - break; - } - return NOTIFY_DONE; - } - this_bond = this_bond->next_bond; - } return NOTIFY_DONE; } static struct notifier_block bond_netdev_notifier = { - notifier_call: bond_event, + .notifier_call = bond_event, }; static int __init bond_init(struct net_device *dev) { - bonding_t *bond, *this_bond, *last_bond; + bonding_t *bond; int count; #ifdef BONDING_DEBUG printk (KERN_INFO "Begin bond_init for %s\n", dev->name); #endif - bond = kmalloc(sizeof(struct bonding), GFP_KERNEL); - if (bond == NULL) { - return -ENOMEM; - } - memset(bond, 0, sizeof(struct bonding)); + bond = dev->priv; /* initialize rwlocks */ rwlock_init(&bond->lock); rwlock_init(&bond->ptrlock); - bond->stats = kmalloc(sizeof(struct net_device_stats), GFP_KERNEL); - if (bond->stats == NULL) { - kfree(bond); - return -ENOMEM; - } - memset(bond->stats, 0, sizeof(struct net_device_stats)); - + /* space is reserved for stats in alloc_netdev call. */ + bond->stats = (struct net_device_stats *)(bond + 1); bond->next = bond->prev = (slave_t *)bond; bond->current_slave = NULL; bond->current_arp_slave = NULL; bond->device = dev; - dev->priv = bond; /* Initialize the device structure. */ switch (bond_mode) { @@ -3637,8 +3591,6 @@ break; default: printk(KERN_ERR "Unknown bonding mode %d\n", bond_mode); - kfree(bond->stats); - kfree(bond); return -EINVAL; } @@ -3648,13 +3600,6 @@ dev->set_multicast_list = set_multicast_list; dev->do_ioctl = bond_ioctl; - /* - * Fill in the fields of the device structure with ethernet-generic - * values. 
- */ - - ether_setup(dev); - dev->tx_queue_len = 0; dev->flags |= IFF_MASTER|IFF_MULTICAST; #ifdef CONFIG_NET_FASTROUTE @@ -3687,8 +3632,6 @@ if (bond->bond_proc_dir == NULL) { printk(KERN_ERR "%s: Cannot init /proc/net/%s/\n", dev->name, dev->name); - kfree(bond->stats); - kfree(bond); return -ENOMEM; } bond->bond_proc_dir->owner = THIS_MODULE; @@ -3700,27 +3643,13 @@ printk(KERN_ERR "%s: Cannot init /proc/net/%s/info\n", dev->name, dev->name); remove_proc_entry(dev->name, proc_net); - kfree(bond->stats); - kfree(bond); return -ENOMEM; } bond->bond_proc_info_file->owner = THIS_MODULE; #endif /* CONFIG_PROC_FS */ - if (first_pass == 1) { - these_bonds = bond; - register_netdevice_notifier(&bond_netdev_notifier); - first_pass = 0; - } else { - last_bond = these_bonds; - this_bond = these_bonds->next_bond; - while (this_bond != NULL) { - last_bond = this_bond; - this_bond = this_bond->next_bond; - } - last_bond->next_bond = bond; - } + list_add_tail(&bond->bond_list, &bond_dev_list); return 0; } @@ -3753,15 +3682,11 @@ return -1; } - static int __init bonding_init(void) { int no; int err; - /* Find a name for this unit */ - static struct net_device *dev_bond = NULL; - printk(KERN_INFO "%s", version); /* @@ -3812,12 +3737,6 @@ max_bonds, 1, INT_MAX, BOND_DEFAULT_MAX_BONDS); max_bonds = BOND_DEFAULT_MAX_BONDS; } - dev_bond = dev_bonds = kmalloc(max_bonds*sizeof(struct net_device), - GFP_KERNEL); - if (dev_bond == NULL) { - return -ENOMEM; - } - memset(dev_bonds, 0, max_bonds*sizeof(struct net_device)); if (miimon < 0) { printk(KERN_WARNING @@ -4005,48 +3924,50 @@ primary = NULL; } + register_netdevice_notifier(&bond_netdev_notifier); for (no = 0; no < max_bonds; no++) { - dev_bond->init = bond_init; - - err = dev_alloc_name(dev_bond,"bond%d"); - if (err < 0) { - kfree(dev_bonds); + struct net_device *dev; + char name[IFNAMSIZ]; + + snprintf(name, IFNAMSIZ, "bond%d", no); + + dev = alloc_netdev(sizeof(bonding_t) + + sizeof(struct net_device_stats), + name, ether_setup); 
+ if (!dev) + return -ENOMEM; + + dev->init = bond_init; + SET_MODULE_OWNER(dev); + + if ( (err = register_netdev(dev)) ) { +#ifdef BONDING_DEBUG + printk(KERN_INFO "%s: register_netdev failed %d\n", + dev->name, err); +#endif + kfree(dev); return err; - } - SET_MODULE_OWNER(dev_bond); - if (register_netdev(dev_bond) != 0) { - kfree(dev_bonds); - return -EIO; } - dev_bond++; } return 0; } static void __exit bonding_exit(void) { - struct net_device *dev_bond = dev_bonds; - struct bonding *bond; - int no; + struct bonding *bond, *nxt; unregister_netdevice_notifier(&bond_netdev_notifier); - for (no = 0; no < max_bonds; no++) { - + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { + struct net_device *dev = bond->device; #ifdef CONFIG_PROC_FS - bond = (struct bonding *) dev_bond->priv; remove_proc_entry("info", bond->bond_proc_dir); - remove_proc_entry(dev_bond->name, proc_net); + remove_proc_entry(dev->name, proc_net); #endif - unregister_netdev(dev_bond); - kfree(bond->stats); - kfree(dev_bond->priv); - - dev_bond->priv = NULL; - dev_bond++; + unregister_netdev(dev); + kfree(dev); } - kfree(dev_bonds); } module_init(bonding_init); diff -Nru a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h --- a/drivers/net/bonding/bonding.h Tue Jun 17 11:05:31 2003 +++ b/drivers/net/bonding/bonding.h Tue Jun 17 11:05:31 2003 @@ -104,7 +104,7 @@ struct proc_dir_entry *bond_proc_dir; struct proc_dir_entry *bond_proc_info_file; #endif /* CONFIG_PROC_FS */ - struct bonding *next_bond; + struct list_head bond_list; struct net_device *device; struct dev_mc_list *mc_list; unsigned short flags; From toml@us.ibm.com Tue Jun 17 11:39:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 11:39:33 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HIdR2x014881 for ; Tue, 17 Jun 2003 11:39:28 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com 
[9.56.224.151]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HIcji8147728; Tue, 17 Jun 2003 14:38:46 -0400 Received: from d01ml072.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by northrelay03.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HIcZiw159866; Tue, 17 Jun 2003 14:38:36 -0400 Subject: Re: IPSec: Policy dst bundles exhausting storage To: "David S. Miller" Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Tue, 17 Jun 2003 13:38:15 -0500 X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/17/2003 02:38:36 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3339 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev Well, of course. There is nothing initializing this. You have to replace the x->u.rt.fl = *fl line with assignments further down to rt6i_src and friends. Something like: x->u.rt6.rt6i_src = rt0->rt6i_src; etc. etc. I don't understand where you expected these assignments to be made. This is where the objects get constructed, so if it isn't being set here, it is being set nowhere :-) Ok, my explanation could have been better. In __xfrm6_bundle_create, rt0->rt6i_src address and prefix length are zero (as well as rt->rt6i_src) and so in __xfrm6_find_bundle the values in the xfrm_dst structure were then zero. 
So doing a tunnel mode ping from fec0::1 to fec0:0:0:2::11 in my configuration, the following values exist in __xfrm6_bundle_create: rt0->rt6i_src.addr = 0000:0000:0000:0000:0000:0000:0000:0000 rt0->rt6i_src.plen = 0 rt0->rt6i_dst.addr = fec0:0000:0000:0002:0000:0000:0000:0000 rt0->rt6i_dst.plen = 64 rt->rt6i_src.addr = 0000:0000:0000:0000:0000:0000:0000:0000 rt->rt6i_src.plen = 0 rt->rt6i_dst.addr = fec0:0000:0000:0000:0000:0000:0000:0002 rt->rt6i_dst.plen = 128 Sorry for the confusion. Thanks, Tom From jgarzik@pobox.com Tue Jun 17 11:46:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 11:46:55 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HIkm2x015237 for ; Tue, 17 Jun 2003 11:46:49 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19SLTh-0007Ko-PE; Tue, 17 Jun 2003 19:46:46 +0100 Message-ID: <3EEF620A.40608@pobox.com> Date: Tue, 17 Jun 2003 14:46:34 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "David S. 
Miller" CC: shemminger@osdl.org, Valdis.Kletnieks@vt.edu, girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> <20030617090859.0ffa0ca8.shemminger@osdl.org> <20030617.090930.102574393.davem@redhat.com> In-Reply-To: <20030617.090930.102574393.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3340 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Stephen Hemminger > Date: Tue, 17 Jun 2003 09:08:59 -0700 > > Read the hotplug thread to see how Linus > said, he will never add a binary event daemon interface. > > Funny, rtnetlink is exactly this and it is in the tree :-) ...and it's been in the tree for quite a while too. It's a shame people aren't taking advantage of its obvious utility... Jeff From davem@redhat.com Tue Jun 17 11:48:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 11:48:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HImn2x015547 for ; Tue, 17 Jun 2003 11:48:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id LAA04184; Tue, 17 Jun 2003 11:43:47 -0700 Date: Tue, 17 Jun 2003 11:43:46 -0700 (PDT) Message-Id: <20030617.114346.94556992.davem@redhat.com> To: toml@us.ibm.com Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3341 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Tom Lendacky" Date: Tue, 17 Jun 2003 13:38:15 -0500 In __xfrm6_bundle_create, rt0->rt6i_src address and prefix length are zero (as well as rt->rt6i_src) That's perfectly fine, a 0-length prefix will cause a match on all addresses. From babydr@baby-dragons.com Tue Jun 17 12:08:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 12:08:21 -0700 (PDT) Received: from filesrv1.baby-dragons.com (filesrv1.baby-dragons.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HJ8E2x016013 for ; Tue, 17 Jun 2003 12:08:15 -0700 Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h5HJ7E9R023753; Tue, 17 Jun 2003 15:07:29 -0400 Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h5HJ6hKi023747; Tue, 17 Jun 2003 15:07:04 -0400 X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs Date: Tue, 17 Jun 2003 15:06:43 -0400 (EDT) From: "Mr. James W.
Laferriere" To: Robert Olsson cc: netdev@oss.sgi.com, Linux networking maillist Subject: Re: Route cache performance tests In-Reply-To: <16111.20469.376899.55240@robur.slu.se> Message-ID: References: <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617.095028.35014188.davem@redhat.com> <16111.20469.376899.55240@robur.slu.se> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3342 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: babydr@baby-dragons.com Precedence: bulk X-list: netdev Hello Robert , First thank you for these tools . Now for the questions . Is 'fibstat' only for 2.5.x kernels ? The reason for that ? is there isn't a /proc/net/fib_stat . Under 2.4.21 , which has no mention of fib_stat anywhere in the sources . The next ? is packet-generator.c (appears) to require kernel include files . Is there an updated user level net/ includes ? Tia , JimL On Tue, 17 Jun 2003, Robert Olsson wrote: > David S. Miller writes: > > Internet-routing and made a script so it can be used for > > experiments. I can make it available. > > Please do, I'd like to play with such a list locally. > ftp://robur.slu.se/pub/Linux/net-development/inet_routes/ > Just configure the script and run... > And Simon can you do a run with this routing table too? And even fibstat > output could be interesting to compare. -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O. 
Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ From janiceg@us.ibm.com Tue Jun 17 12:08:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 12:08:58 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HJ8s2x016067 for ; Tue, 17 Jun 2003 12:08:54 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HJ8mll241078 for ; Tue, 17 Jun 2003 15:08:48 -0400 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HJ8kCN104260 for ; Tue, 17 Jun 2003 13:08:47 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5HJ8jIB122236; Tue, 17 Jun 2003 14:08:45 -0500 Received: from us.ibm.com (girouard-pc-udp10731670uds.austin.ibm.com [9.53.94.201]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id OAA16690; Tue, 17 Jun 2003 14:08:44 -0500 Message-ID: <3EEF66AA.3000509@us.ibm.com> Date: Tue, 17 Jun 2003 14:06:18 -0500 From: Janice M Girouard Organization: IBM Linux Technology Center - Network Device Drivers User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Garzik CC: "David S. 
Miller" , shemminger@osdl.org, Valdis.Kletnieks@vt.edu, Janice Girouard , Daniel Stekloff , Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> <20030617090859.0ffa0ca8.shemminger@osdl.org> <20030617.090930.102574393.davem@redhat.com> <3EEF620A.40608@pobox.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3343 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: janiceg@us.ibm.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 17 Jun 2003 01:46 ...and it's been in the tree for quite a while too. It's a shame people aren't taking advantage of its obvious utility... I'd like to hear what others believe belongs in this new netlink family. The two events that come to mind for this family (or netdev notified if that's more appropriate) are: 1) device initialization failures, 2) events that drive load balancing software. Right now if we need to throttle the card, we don't send events up to indicate we have reached capacity. Possibly the first might belong in the netdev notified family, and the second in the netlink family, since you might want to present more than a two-state (success/failure.. or just failure in this case) result.
- From janiceg@us.ibm.com Tue Jun 17 12:49:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 12:49:32 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HJnL2x017325 for ; Tue, 17 Jun 2003 12:49:27 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HJnF9f195658 for ; Tue, 17 Jun 2003 15:49:15 -0400 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HJnEnG147054 for ; Tue, 17 Jun 2003 13:49:15 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5HJnDGg099228; Tue, 17 Jun 2003 14:49:13 -0500 Received: from us.ibm.com (girouard-pc-udp10731670uds.austin.ibm.com [9.53.94.201]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id OAA35738; Tue, 17 Jun 2003 14:49:13 -0500 Message-ID: <3EEF7030.6030303@us.ibm.com> Date: Tue, 17 Jun 2003 14:46:56 -0500 From: Janice M Girouard Organization: IBM Linux Technology Center - Network Device Drivers User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Garzik CC: "David S. 
Miller" , shemminger@osdl.org, Valdis.Kletnieks@vt.edu, Janice Girouard , Daniel Stekloff , Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> <20030617090859.0ffa0ca8.shemminger@osdl.org> <20030617.090930.102574393.davem@redhat.com> <3EEF620A.40608@pobox.com> <3EEF66AA.3000509@us.ibm.com> <3EEF6A9D.6050303@pobox.com> Content-Type: multipart/related; boundary="------------010202080205050803050900" X-archive-position: 3344 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: janiceg@us.ibm.com Precedence: bulk X-list: netdev --------------010202080205050803050900 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Jeff Garzik wrote: Do you want to individually send 4000 - 16000 (or more) TX stop / start events per second to userspace? :) At some point Heisenberg defeats low latency :) How about looking at a 1000-byte packet transmit example. A gigabit adapter would send 125,000 packets per second. I'm thinking that most of the time, you will have enough available buffers in the adapter that you don't start to see the adapter buffers completely fill up. Are you saying that 3.2% - 12.8% of the time in this case you're disabling the tcp/ip stack because the transmit buffers on your card are completely full? Perhaps with zero copy enabled, but the tcp/ip cpu load alone will throttle your ability to fill the adapter buffers up. What does your own experience indicate for gigabit adapter cards? I could see the buffers backing up for 10/100 cards. So that case favors your point. I'm still thinking that it's a sign someone should be buying a 2nd card and ramping up their network capability. But I can see your point.
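The packet-rate arithmetic in the message above can be checked with a small sketch. It is plain userspace C, not kernel code; the function names are hypothetical, and it assumes a 1 Gbit/s link carrying 1000-byte packets while ignoring Ethernet framing overhead (preamble, inter-frame gap), so real rates would be somewhat lower.

```c
/* Packets per second on a link of link_bps bits/s when every packet is
 * pkt_bytes bytes long. Framing overhead is ignored (assumption). */
static double packets_per_second(double link_bps, double pkt_bytes)
{
    return link_bps / (pkt_bytes * 8.0);
}

/* Ratio, in percent, of TX stop/start events per second to packets
 * per second. */
static double event_share_percent(double events_per_sec, double pps)
{
    return 100.0 * events_per_sec / pps;
}
```

With link_bps = 1e9 and pkt_bytes = 1000 this gives 125,000 packets per second, and 4000 to 16000 events per second correspond to 3.2% to 12.8% of that packet rate, which is where the percentages quoted in the message come from.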
--------------010202080205050803050900-- From davem@redhat.com Tue Jun 17 12:55:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 12:55:23 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HJtI2x017686 for ; Tue, 17 Jun 2003 12:55:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA04518; Tue, 17 Jun 2003 12:50:41 -0700 Date: Tue, 17 Jun 2003 12:50:40 -0700 (PDT) Message-Id: <20030617.125040.58438649.davem@redhat.com> To: janiceg@us.ibm.com Cc: jgarzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, girouard@us.ibm.com, stekloff@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: <3EEF7030.6030303@us.ibm.com> References: <3EEF66AA.3000509@us.ibm.com> <3EEF6A9D.6050303@pobox.com> <3EEF7030.6030303@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3345 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice M Girouard Date: Tue, 17 Jun 2003 14:46:56 -0500 I could see the buffers backing up for 10/100 cards. So that case favors your point. I'm still thinking that it's a sign someone should be buying a 2nd card and ramping up their network capability. But I can see your point. And when we have 1GHZ memory busses and 10GHz cpus tomorrow, what does this say for 1gbit and 10gbit cards? You want to define a machine as having too much "work" or not, yet you only want to consider one metric to do so. Such schemes are fundamentally flawed. 
From toml@us.ibm.com Tue Jun 17 12:57:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 12:57:46 -0700 (PDT) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HJvd2x017999 for ; Tue, 17 Jun 2003 12:57:39 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HJupr9263226; Tue, 17 Jun 2003 15:56:51 -0400 Received: from tomlt2.austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HJuonE121896; Tue, 17 Jun 2003 13:56:50 -0600 Subject: Re: IPSec: Policy dst bundles exhausting storage From: Tom Lendacky To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, toml@us.ibm.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 17 Jun 2003 14:57:04 -0500 Message-Id: <1055879830.16368.7.camel@tomlt2.tomloffice.austin.ibm.com> Mime-Version: 1.0 X-archive-position: 3346 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev That's perfectly fine, a 0-length prefix will cause a match on all addresses. Ok, I just wanted to verify that. Here's a patch for your review. I call ipv6_addr_prefix on both of the rt6i addresses just in case they aren't stored in prefix form at any point now or in the future.
Thanks, Tom diff -ur linux-2.5.71-orig/net/ipv6/xfrm6_policy.c linux-2.5.71-new/net/ipv6/xfrm6_policy.c --- linux-2.5.71-orig/net/ipv6/xfrm6_policy.c 2003-06-14 14:18:02.000000000 -0500 +++ linux-2.5.71-new/net/ipv6/xfrm6_policy.c 2003-06-17 14:44:52.000000000 -0500 @@ -60,8 +60,23 @@ read_lock_bh(&policy->lock); for (dst = policy->bundles; dst; dst = dst->next) { struct xfrm_dst *xdst = (struct xfrm_dst*)dst; - if (!ipv6_addr_cmp(&xdst->u.rt6.rt6i_dst.addr, &fl->fl6_dst) && - !ipv6_addr_cmp(&xdst->u.rt6.rt6i_src.addr, &fl->fl6_src) && + struct in6_addr rt_dst_prefix, fl_dst_prefix, + rt_src_prefix, fl_src_prefix; + + ipv6_addr_prefix(&rt_dst_prefix, + &xdst->u.rt6.rt6i_dst.addr, + xdst->u.rt6.rt6i_dst.plen); + ipv6_addr_prefix(&fl_dst_prefix, + &fl->fl6_dst, + xdst->u.rt6.rt6i_dst.plen); + ipv6_addr_prefix(&rt_src_prefix, + &xdst->u.rt6.rt6i_src.addr, + xdst->u.rt6.rt6i_src.plen); + ipv6_addr_prefix(&fl_src_prefix, + &fl->fl6_src, + xdst->u.rt6.rt6i_src.plen); + if (!ipv6_addr_cmp(&rt_dst_prefix, &fl_dst_prefix) && + !ipv6_addr_cmp(&rt_src_prefix, &fl_src_prefix) && __xfrm6_bundle_ok(xdst, fl)) { dst_clone(dst); break; @@ -133,7 +148,6 @@ dst_prev->child = &rt->u.dst; for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; - x->u.rt.fl = *fl; dst_prev->dev = rt->u.dst.dev; if (rt->u.dst.dev) @@ -157,6 +171,8 @@ x->u.rt6.rt6i_node = rt0->rt6i_node; x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; memcpy(&x->u.rt6.rt6i_gateway, &rt0->rt6i_gateway, sizeof(x->u.rt6.rt6i_gateway)); + x->u.rt6.rt6i_dst = rt0->rt6i_dst; + x->u.rt6.rt6i_src = rt0->rt6i_src; header_len -= x->u.dst.xfrm->props.header_len; trailer_len -= x->u.dst.xfrm->props.trailer_len; } From davem@redhat.com Tue Jun 17 13:00:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:00:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id 
h5HK0u2x018351 for ; Tue, 17 Jun 2003 13:00:56 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA04559; Tue, 17 Jun 2003 12:56:30 -0700 Date: Tue, 17 Jun 2003 12:56:29 -0700 (PDT) Message-Id: <20030617.125629.85394621.davem@redhat.com> To: toml@us.ibm.com Cc: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S. Miller" In-Reply-To: <1055879830.16368.7.camel@tomlt2.tomloffice.austin.ibm.com> References: <1055879830.16368.7.camel@tomlt2.tomloffice.austin.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3347 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Tom Lendacky Date: 17 Jun 2003 14:57:04 -0500 That's perfectly fine, a 0-length prefix will cause a match on all addresses. Ok, I just wanted to verify that. Here's a patch for your review. Looks like it would work. I call ipv6_addr_prefix on both of the rt6i addresses just in case they aren't stored in prefix form at any point now or in the future. I think this is a bit overkill, can you redo this patch without this? If we un-prefix'ify ipv6 addresses in the routing entries, we're going to have to go over the whole tree and audit this kind of stuff anyways. Thanks.
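The point that a 0-length prefix matches every address can be illustrated with a small userspace sketch. This is not the kernel's ipv6_addr_prefix or ipv6_addr_cmp; struct addr128, addr_prefix and prefix_match are hypothetical names used only for illustration, and plen is assumed to be in the range 0 to 128.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an IPv6 address: 16 raw bytes, network order. */
struct addr128 {
    uint8_t b[16];
};

/* Copy the first plen bits of src into dst and zero all remaining bits.
 * Assumes 0 <= plen <= 128. */
static void addr_prefix(struct addr128 *dst, const struct addr128 *src, int plen)
{
    int full = plen / 8;   /* whole bytes covered by the prefix */
    int rem = plen % 8;    /* leftover bits in the following byte */

    memset(dst, 0, sizeof(*dst));
    memcpy(dst->b, src->b, (size_t)full);
    if (rem)
        dst->b[full] = src->b[full] & (uint8_t)(0xff00 >> rem);
}

/* Return 1 when addr lies inside prefix/plen, 0 otherwise. */
static int prefix_match(const struct addr128 *prefix, int plen,
                        const struct addr128 *addr)
{
    struct addr128 p, a;

    /* Mask both sides to plen bits, then compare the masked copies. */
    addr_prefix(&p, prefix, plen);
    addr_prefix(&a, addr, plen);
    return memcmp(&p, &a, sizeof(p)) == 0;
}
```

With plen 0 both masked copies are all zero, so any address matches. With the destination prefix from this thread, fec0:0:0:2::/64, the address fec0:0:0:2::11 matches and fec0::1 does not.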
From jgarzik@pobox.com Tue Jun 17 13:06:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:07:02 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HK6u2x018722 for ; Tue, 17 Jun 2003 13:06:57 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19SM36-0007xE-Uw; Tue, 17 Jun 2003 20:23:21 +0100 Message-ID: <3EEF6A9D.6050303@pobox.com> Date: Tue, 17 Jun 2003 15:23:09 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: Janice M Girouard CC: "David S. Miller" , shemminger@osdl.org, Valdis.Kletnieks@vt.edu, Janice Girouard , Daniel Stekloff , Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages References: <200306170434.h5H4YZPZ003025@turing-police.cc.vt.edu> <20030617090859.0ffa0ca8.shemminger@osdl.org> <20030617.090930.102574393.davem@redhat.com> <3EEF620A.40608@pobox.com> <3EEF66AA.3000509@us.ibm.com> In-Reply-To: <3EEF66AA.3000509@us.ibm.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3348 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Janice M Girouard wrote: > 2) events that drive load balancing software. Right now if we need to > throttle the card, we don't send events up to indicate we have reached > capacity. Question related to this item specifically :) Do you want to individually send 4000 - 16000 (or more) TX stop / start events per second to userspace? 
:) At some point Heisenberg defeats low latency :) If not (and I hope not), perhaps also look into the net stack statistics already kept (or add more sampling stats if necessary), and instead trigger events based on sampling those statistics. Jeff From sim@netnation.com Tue Jun 17 13:07:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:07:54 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HK7Q2x018771 for ; Tue, 17 Jun 2003 13:07:26 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19SMjh-0007rS-4S; Tue, 17 Jun 2003 13:07:21 -0700 Date: Tue, 17 Jun 2003 13:07:21 -0700 From: Simon Kirby To: Robert Olsson Cc: "David S. Miller" , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030617200721.GA25773@netnation.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16111.18107.699689.704597@robur.slu.se> User-Agent: Mutt/1.5.4i X-archive-position: 3349 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Tue, Jun 17, 2003 at 06:50:03PM +0200, Robert Olsson wrote: > David S. Miller writes: > > > 60.0073 seconds passed, avg forwarding rate: 157557.710 pps > > > To be honest, this isn't half-bad for pure DoS load. > > No, that's pretty good and the profiles look as expected. It would be interesting > to get the single-flow performance as a comparison.
I changed Juno to send from a single IP, but it only spat out about 330000 pps, which the dual Tigon3 Opteron box forwarded completely. In order to do a single flow forwarding test, I need to be able to create more input traffic somehow. Seeing as you wrote pktgen.c, maybe you could help in this department. :) > Also think Simon used only /32 routes... I took "real" Internet-routing > and made a script so it can be used for experiments. I can make it available. Yes, I found that area less interesting since Dave M. fixed the hash buckets. But yes, the prefix scanning will slow it down some. Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going on here. 60.0042 seconds passed, avg forwarding rate: 50759.683 pps 60.0039 seconds passed, avg forwarding rate: 50311.258 pps 60.0046 seconds passed, avg forwarding rate: 50420.562 pps 60.0036 seconds passed, avg forwarding rate: 50399.389 pps 60.0038 seconds passed, avg forwarding rate: 50431.732 pps 60.0041 seconds passed, avg forwarding rate: 50403.777 pps 60.0036 seconds passed, avg forwarding rate: 50210.604 pps 60.0033 seconds passed, avg forwarding rate: 50279.220 pps 60.0036 seconds passed, avg forwarding rate: 50549.291 pps 60.0046 seconds passed, avg forwarding rate: 50437.615 pps Cpu type: Athlon Cpu speed was (MHz estimation) : 1394.26 Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000 vma samples % symbol name c02bf730 16019 33.2014 fn_hash_lookup c0292b70 3882 8.04593 ip_route_input_slow c0221710 2335 4.83958 tg3_rx c02bd550 2004 4.15354 fib_validate_source c0290d70 1955 4.05198 rt_hash_code c0294e50 1670 3.46128 ip_rcv c02933a0 1404 2.90997 ip_route_input c01351b0 1349 2.79597 __kmalloc c02885c0 1314 2.72343 netif_receive_skb c02b8040 1168 2.42083 inet_select_addr c0135270 1123 2.32756 kfree c0284620 987 2.04568 alloc_skb c028ec90 900 1.86536 eth_type_trans c0135170 860 1.78246 
kmem_cache_alloc c02be8e0 844 1.7493 fib_semantic_match c0135230 812 1.68297 kmem_cache_free c0222330 652 1.35135 tg3_start_xmit c02916b0 648 1.34306 rt_intern_hash c02215c0 542 1.12336 tg3_recycle_rx c010fc40 459 0.951335 do_gettimeofday c028f220 422 0.874648 pfifo_fast_dequeue c02be9b0 419 0.86843 __fib_res_prefsrc c028eb20 417 0.864285 eth_header c0295df0 386 0.800033 ip_forward c028c520 363 0.752363 neigh_resolve_output c0284860 345 0.715056 __kfree_skb c0284840 311 0.644586 kfree_skbmem c02847a0 295 0.611424 skb_release_data c028b530 285 0.590698 neigh_lookup c01adc80 276 0.572044 memcpy c0134ff0 269 0.557536 free_block c0291460 240 0.49743 rt_garbage_collect c0134e20 236 0.489139 cache_alloc_refill c02972d0 216 0.447687 ip_finish_output c0114030 215 0.445614 get_offset_tsc c0128a00 197 0.408307 call_rcu c028ae20 193 0.400017 dst_alloc c0288080 187 0.387581 dev_queue_xmit c028af50 184 0.381363 dst_destroy c028f1a0 175 0.362709 pfifo_fast_enqueue c011f080 170 0.352346 local_bh_enable c0221350 168 0.348201 tg3_tx c0221e90 160 0.33162 tg3_set_txd c0297570 152 0.315039 ip_output c028c3a0 149 0.308821 neigh_hh_init c028eeb0 141 0.29224 qdisc_restart size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0 18296 18 213694 0 0 0 0 163018 0 0 0 50676 50674 1 0 17616 11 214529 0 0 0 0 163993 0 0 0 50536 50534 0 0 17841 12 213816 0 0 0 0 163157 0 0 0 50659 50657 1 0 18272 7 214093 0 0 0 0 163583 0 0 0 50510 50508 1 0 18216 9 214843 0 0 0 0 164214 0 0 0 50629 50627 0 0 18318 16 214976 0 0 0 0 164299 0 0 0 50677 50675 0 0 18099 9 213447 0 0 0 0 162995 0 0 0 50452 50450 1 0 17610 14 216438 0 0 0 0 165408 0 0 0 51030 51028 1 0 17643 14 214638 0 0 0 0 163987 0 0 0 50651 50649 0 0 17516 7 213185 0 0 0 0 163016 0 0 0 50169 50167 1 0 18355 10 213894 0 0 0 0 163564 0 0 0 50330 50328 1 0 17723 11 214477 0 0 0 0 163705 0 0 0 50772 50770 0 0 17915 6 214342 0 0 0 0 163625 0 0 0 50717 50715 0 0 
18166 19 213965 0 0 0 0 163521 0 0 0 50444 50442 0 0 17943 19 213417 0 0 0 0 162955 0 0 0 50462 50460 2 0 17515 5 214423 0 0 0 0 163718 0 0 0 50705 50703 0 0 18231 10 213434 0 0 0 0 162919 0 0 0 50515 50513 1 0 17523 8 213856 0 0 0 0 163385 0 0 0 50471 50469 0 0 18217 16 214940 0 0 0 0 164165 0 0 0 50775 50773 0 0 ...recompiling with fibstats... Erm. I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes. What's it supposed to be against? [sroot@debinst:/d/linux-2.5]# patch -p0 --dry < ../fib_stats2.pat patching file include/net/ip_fib.h Hunk #1 succeeded at 139 (offset 4 lines). patching file net/ipv4/fib_hash.c Hunk #3 succeeded at 305 (offset -11 lines). Hunk #4 succeeded at 1110 with fuzz 1 (offset -14 lines). Hunk #5 succeeded at 1166 (offset -14 lines). patching file net/ipv4/route.c Hunk #1 FAILED at 2754. Hunk #2 succeeded at 2760 (offset -6 lines). Hunk #3 succeeded at 2783 (offset -6 lines). Hunk #4 FAILED at 2793. 2 out of 4 hunks FAILED -- saving rejects to file net/ipv4/route.c.rej In any event, here is the profile of the single flow case with the full routing table (probably identical to the empty routing table case). 
The sender is not pushing enough for NAPI to kick in, so there is a lot of tg3 overhead that would go away with more traffic:

60.0041 seconds passed, avg forwarding rate: 329808.310 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0222330 4470 8.51445 tg3_start_xmit
c0221710 3760 7.16204 tg3_rx
c0294e50 3142 5.98488 ip_rcv
c02885c0 2428 4.62485 netif_receive_skb
c0295df0 2065 3.93341 ip_forward
c028f220 2058 3.92007 pfifo_fast_dequeue
c02933a0 2033 3.87245 ip_route_input
c01351b0 1987 3.78483 __kmalloc
c0290d70 1904 3.62674 rt_hash_code
c02972d0 1752 3.33721 ip_finish_output
c01adc80 1649 3.14101 memcpy
c0134ff0 1626 3.0972 free_block
c0284620 1511 2.87815 alloc_skb
c0135270 1489 2.83624 kfree
c0288080 1461 2.78291 dev_queue_xmit
c028ec90 1351 2.57338 eth_type_trans
c0135170 1319 2.51243 kmem_cache_alloc
c028f1a0 1243 2.36766 pfifo_fast_enqueue
c0134e20 1172 2.23242 cache_alloc_refill
c0297570 1145 2.18099 ip_output
c0135230 1133 2.15814 kmem_cache_free
c0221350 1085 2.06671 tg3_tx
c0221a50 991 1.88766 tg3_poll
c02215c0 893 1.70098 tg3_recycle_rx
c0221e90 832 1.58479 tg3_set_txd
c0221b60 812 1.5467 tg3_interrupt
c028eeb0 755 1.43812 qdisc_restart
c010fc40 672 1.28002 do_gettimeofday
c011f080 578 1.10097 local_bh_enable
c0284840 492 0.937161 kfree_skbmem
c010a8b2 426 0.811444 restore_all
c02847a0 375 0.714299 skb_release_data
c010c6a0 327 0.622869 handle_IRQ_event
c010c910 288 0.548582 do_IRQ
c0284860 284 0.540963 __kfree_skb
c0114030 283 0.539058 get_offset_tsc
c021f4b0 270 0.514296 tg3_enable_ints
c0115e10 234 0.445723 end_level_ioapic_irq
c011f4a0 220 0.419056 cpu_raise_softirq
c028eb20 199 0.379055 eth_header
c0288500 188 0.358102 net_tx_action
c028c520 187 0.356197 neigh_resolve_output
c0134b40 179 0.340959 cache_init_objs
c0288900 171 0.32572 net_rx_action
c0132290 165
0.314292 buffered_rmqueue c01321b0 122 0.232385 free_hot_cold_page If I start two threads on the sender (Xeon w/HT), I'm able to push 420000 pps, which only partially starts to use NAPI on the Opteron box. Going to try 2.4 again for a comparison (note: 2.5 seems to have an opposite PCI scan order from 2.4 for the dual Tigon3s). Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. ] From Robert.Olsson@data.slu.se Tue Jun 17 13:16:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:16:33 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKGP2x019854 for ; Tue, 17 Jun 2003 13:16:26 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id WAA32385; Tue, 17 Jun 2003 22:12:15 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16111.30238.978982.656557@robur.slu.se> Date: Tue, 17 Jun 2003 22:12:14 +0200 To: "Mr. James W. Laferriere" Cc: Robert Olsson , netdev@oss.sgi.com, Linux networking maillist Subject: Re: Route cache performance tests In-Reply-To: References: <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617.095028.35014188.davem@redhat.com> <16111.20469.376899.55240@robur.slu.se> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3350 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Mr. James W. Laferriere writes: > > Is 'fibstat' only for 2.5.x kernels ? Yes. > The reason for that ? is there isn't a /proc/net/fib_stat . > Under 2.4.21 , which has no mention of fib_stat anywhere in the > sources . The kernel part creates /proc/net/fib_stat. 
It should be pretty straightforward for 2.4.X too. If people find it useful it can be backported. Try. Look at route.c and rt_cache_stat if you run into problems.

> The next ? is packet-generator.c (appears) to require kernel
> include files . Is there an updated user level net/ includes ?

It's a kernel module and should be compiled with the kernel.

Cheers.
--ro

From gandalf@wlug.westbo.se Tue Jun 17 13:17:19 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:17:24 -0700 (PDT)
Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKHH2x019969 for ; Tue, 17 Jun 2003 13:17:18 -0700
Received: by tux.rsn.bth.se (Postfix, from userid 501) id 0C84B36FDD; Tue, 17 Jun 2003 22:17:15 +0200 (CEST)
Subject: Re: Route cache performance tests
From: Martin Josefsson
To: Simon Kirby
Cc: Robert Olsson , "David S. Miller" , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
In-Reply-To: <20030617200721.GA25773@netnation.com>
References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1055881034.3199.43.camel@tux.rsn.bth.se>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4
Date: 17 Jun 2003 22:17:14 +0200
X-archive-position: 3351
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: gandalf@wlug.westbo.se
Precedence: bulk
X-list: netdev

On Tue, 2003-06-17 at 22:07, Simon Kirby wrote:

> > Also think Simon used only /32 routes... I took "real" Internet-routing
> > and made a script so it can be used for experiments. I can make it available.
> > Yes, I found that area less interesting since Dave M. fixed the hash > buckets. But yes, the prefix scanning will slow it down some. > > Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going > on here. > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf > 17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0 Did you have rp_filter enabled? Looks like it. -- /Martin From janiceg@us.ibm.com Tue Jun 17 13:26:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:26:57 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKQj2x020533 for ; Tue, 17 Jun 2003 13:26:51 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.17.195.12]) by e31.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HKQdeu250776 for ; Tue, 17 Jun 2003 16:26:39 -0400 Received: from austin.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay03.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HKQcOs062462 for ; Tue, 17 Jun 2003 14:26:39 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5HKQbGg094816; Tue, 17 Jun 2003 15:26:37 -0500 Received: from us.ibm.com (girouard-pc-udp10731670uds.austin.ibm.com [9.53.94.201]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id PAA18552; Tue, 17 Jun 2003 15:26:36 -0500 Message-ID: <3EEF78F4.2070604@us.ibm.com> Date: Tue, 17 Jun 2003 15:24:20 -0500 From: Janice M Girouard Organization: IBM Linux Technology Center - Network Device Drivers User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller"
CC: jgarzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, girouard@us.ibm.com, stekloff@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com
Subject: Re: patch for common networking error messages
References: <3EEF66AA.3000509@us.ibm.com> <3EEF6A9D.6050303@pobox.com> <3EEF7030.6030303@us.ibm.com> <20030617.125040.58438649.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3352
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: janiceg@us.ibm.com
Precedence: bulk
X-list: netdev

From David S. Miller:

   And when we have 1GHz memory busses and 10GHz cpus tomorrow, what does this say for 1gbit and 10gbit cards? Such schemes are fundamentally flawed.

Bottom line.. I was asking for input, and I received it. It's valid to say... look at the statistics. I really like the concept of driving events through netlink, but querying statistics works.

p.s. It's been my experience that the memory system is the main bottleneck when trying to support a heavy network load. When the 10 Gigabit card emerges, and it's here today, the memory system will be pressed to support it, especially if you're not using zerocopy and you're thinking of using more than one card. Perhaps if RDMA capabilities are added to Linux, then things might be different. So.. when do you think RDMA will show up on Linux?
From davem@redhat.com Tue Jun 17 13:31:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:31:59 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKVt2x021262 for ; Tue, 17 Jun 2003 13:31:55 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04712; Tue, 17 Jun 2003 13:27:21 -0700
Date: Tue, 17 Jun 2003 13:27:20 -0700 (PDT)
Message-Id: <20030617.132720.22019247.davem@redhat.com>
To: janiceg@us.ibm.com
Cc: jgarzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, girouard@us.ibm.com, stekloff@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com
Subject: Re: patch for common networking error messages
From: "David S. Miller"
In-Reply-To: <3EEF78F4.2070604@us.ibm.com>
References: <3EEF7030.6030303@us.ibm.com> <20030617.125040.58438649.davem@redhat.com> <3EEF78F4.2070604@us.ibm.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3353
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Janice M Girouard
   Date: Tue, 17 Jun 2003 15:24:20 -0500

   Perhaps if RDMA capabilities are added to Linux, then things might be different. So.. when do you think RDMA will show up on Linux?

RDMA is total junk.

On RX, clever RX buffer management is what we need. On TX zerocopy + TSO is more than sufficient and we have that today.
From sim@netnation.com Tue Jun 17 13:37:05 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:37:32 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKb42x021607 for ; Tue, 17 Jun 2003 13:37:05 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20) id 19SNCR-0008Lq-F4; Tue, 17 Jun 2003 13:37:03 -0700
Date: Tue, 17 Jun 2003 13:37:03 -0700
From: Simon Kirby
To: Martin Josefsson
Cc: Robert Olsson , "David S. Miller" , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance tests
Message-ID: <20030617203703.GB25773@netnation.com>
References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> <1055881034.3199.43.camel@tux.rsn.bth.se>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1055881034.3199.43.camel@tux.rsn.bth.se>
User-Agent: Mutt/1.5.4i
X-archive-position: 3354
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sim@netnation.com
Precedence: bulk
X-list: netdev

On Tue, Jun 17, 2003 at 10:17:14PM +0200, Martin Josefsson wrote:

>
> > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
> > 17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0
>
> Did you have rp_filter enabled? Looks like it.

Yes, good spotting. Forwarding rate more than doubles when I turn rp_filter off (Debian turns it on by default).
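
What gives rp_filter away in the quoted row is the masrc column: packets that fail the reverse-path check are accounted as martian source. As a minimal sketch, assuming the column order is exactly the one in the quoted header line, the row can be parsed and the share of rejected packets computed as below. The struct and function names are hypothetical and are not part of the kernel or of any rtstat tool.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the route cache statistics, in the column order of the
 * quoted header (assumed):
 * size | IN: hit tot mc no_rt bcast madst masrc | OUT: hit tot mc |
 * GC: tot ignored goal_miss ovrf */
struct rtstat_row {
    unsigned long size;
    unsigned long in_hit, in_tot, in_mc, in_no_rt, in_bcast, in_madst, in_masrc;
    unsigned long out_hit, out_tot, out_mc;
    unsigned long gc_tot, gc_ignored, gc_goal_miss, gc_ovrf;
};

/* Parse one whitespace-separated row. Returns 0 when all 15 columns
 * were read, -1 otherwise. */
int rtstat_parse(const char *line, struct rtstat_row *r)
{
    assert(line != NULL && r != NULL);
    int n = sscanf(line,
                   "%lu %lu %lu %lu %lu "
                   "%lu %lu %lu %lu %lu "
                   "%lu %lu %lu %lu %lu",
                   &r->size,
                   &r->in_hit, &r->in_tot, &r->in_mc, &r->in_no_rt,
                   &r->in_bcast, &r->in_madst, &r->in_masrc,
                   &r->out_hit, &r->out_tot, &r->out_mc,
                   &r->gc_tot, &r->gc_ignored, &r->gc_goal_miss, &r->gc_ovrf);
    return n == 15 ? 0 : -1;
}

/* Input lookups in this row that were not rejected as martian source. */
unsigned long rtstat_non_martian(const struct rtstat_row *r)
{
    return r->in_tot - r->in_masrc;
}

/* Fraction of input lookups rejected as martian source (0.0 if none seen). */
double rtstat_martian_share(const struct rtstat_row *r)
{
    return r->in_tot ? (double)r->in_masrc / (double)r->in_tot : 0.0;
}
```

For the quoted row, 163822 of the 214343 lookups counted under tot were rejected, which leaves 50521, the same number that appears in the GC tot column of that row.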
60.0049 seconds passed, avg forwarding rate: 108222.462 pps 60.0041 seconds passed, avg forwarding rate: 108868.822 pps 60.0042 seconds passed, avg forwarding rate: 108767.194 pps 60.0040 seconds passed, avg forwarding rate: 108872.188 pps 60.0045 seconds passed, avg forwarding rate: 108856.575 pps 60.0041 seconds passed, avg forwarding rate: 108743.443 pps Cpu type: Athlon Cpu speed was (MHz estimation) : 1394.26 Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000 vma samples % symbol name c02bf730 15382 33.0213 fn_hash_lookup c0292b70 2127 4.56614 ip_route_input_slow c02916b0 1380 2.96252 rt_intern_hash c0222330 1340 2.87665 tg3_start_xmit c02bd550 1336 2.86806 fib_validate_source c0221710 1219 2.61689 tg3_rx c02be8e0 1154 2.47735 fib_semantic_match c0294e50 1068 2.29273 ip_rcv c028f220 983 2.11026 pfifo_fast_dequeue c0135230 981 2.10596 kmem_cache_free c02b8040 906 1.94496 inet_select_addr c0290d70 901 1.93422 rt_hash_code c0134ff0 877 1.8827 free_block c028eb20 873 1.87411 eth_header c0135170 805 1.72814 kmem_cache_alloc c02885c0 798 1.71311 netif_receive_skb c02933a0 788 1.69164 ip_route_input c028c520 778 1.67017 neigh_resolve_output c0295df0 744 1.59718 ip_forward c0134e20 734 1.57572 cache_alloc_refill c01351b0 727 1.56069 __kmalloc c028b530 686 1.47267 neigh_lookup c01adc80 535 1.14851 memcpy c0135270 510 1.09484 kfree c0284620 506 1.08626 alloc_skb c0291460 498 1.06908 rt_garbage_collect c028ec90 480 1.03044 eth_type_trans c028ae20 439 0.942424 dst_alloc c0128a00 437 0.938131 call_rcu c02972d0 433 0.929544 ip_finish_output c0297570 407 0.873728 ip_output c011f080 380 0.815766 local_bh_enable c0288080 376 0.807179 dev_queue_xmit c028eeb0 369 0.792151 qdisc_restart c0221e90 361 0.774977 tg3_set_txd c028af50 360 0.772831 dst_destroy c028f1a0 352 0.755657 pfifo_fast_enqueue c028c3a0 317 0.68052 neigh_hh_init c02215c0 315 0.676227 tg3_recycle_rx c0221350 
308 0.6612 tg3_tx c02be9b0 294 0.631145 __fib_res_prefsrc c010fc40 200 0.42935 do_gettimeofday c02b48c0 195 0.418617 arp_hash c0292860 195 0.418617 rt_set_nexthop c02b4df0 178 0.382122 arp_bind_neighbour c0294500 159 0.341334 dst_free size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf 20262 5 109659 0 0 0 0 0 0 0 0 109659 109657 1 0 19229 7 109493 0 0 0 0 0 0 0 0 109493 109491 0 0 20320 4 109576 0 0 0 0 0 0 0 0 109576 109574 1 0 19280 9 109439 0 0 0 0 0 0 0 0 109439 109437 1 0 20325 10 109314 0 0 0 0 0 0 0 0 109314 109312 1 0 18983 6 109530 0 0 0 0 0 0 0 0 109530 109528 1 0 20313 5 109867 0 0 0 0 0 0 0 0 109867 109865 0 0 19127 4 109256 0 0 0 0 0 0 0 0 109256 109254 1 0 18897 4 109508 0 0 0 0 0 0 0 0 109508 109506 1 0 20338 11 109717 0 0 0 0 0 0 0 0 109717 109715 0 0 19054 7 109209 0 0 0 0 0 0 0 0 109209 109207 1 0 20397 11 109273 0 0 0 0 0 0 0 0 109273 109271 1 0 Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. 
] From toml@us.ibm.com Tue Jun 17 13:40:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:40:41 -0700 (PDT) Received: from e6.ny.us.ibm.com (e6.ny.us.ibm.com [32.97.182.106]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKeZ2x021924 for ; Tue, 17 Jun 2003 13:40:36 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e6.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HKdrxr116434; Tue, 17 Jun 2003 16:39:53 -0400 Received: from tomlt2.austin.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HKdq3e055120; Tue, 17 Jun 2003 16:39:52 -0400 Subject: Re: IPSec: Policy dst bundles exhausting storage From: Tom Lendacky To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, toml@us.ibm.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 17 Jun 2003 15:40:12 -0500 Message-Id: <1055882412.16482.2.camel@tomlt2.tomloffice.austin.ibm.com> Mime-Version: 1.0 X-archive-position: 3355 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev I think this is a bit overkill, can you redo this patch without this? No problem... here's the new patch. 
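
The patch in this message stops comparing the flow addresses for exact equality and instead masks each one to the bundle's prefix length (via ipv6_addr_prefix()) before calling ipv6_addr_cmp(). A minimal userspace sketch of that mask-then-compare idea follows. It is not the kernel code: struct addr128 and both function names are hypothetical stand-ins, and the stored network address is assumed to be already masked to its prefix length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit IPv6 address as 16 bytes in network order (hypothetical
 * stand-in for the kernel's struct in6_addr). */
struct addr128 {
    uint8_t b[16];
};

/* Write the first plen bits of addr into pfx and zero the rest.
 * pfx and addr may point to the same object. */
void addr128_prefix(struct addr128 *pfx, const struct addr128 *addr, int plen)
{
    assert(plen >= 0 && plen <= 128);
    struct addr128 src = *addr;   /* local copy so aliasing is safe */
    int full = plen / 8;          /* whole bytes covered by the prefix */
    int rem = plen % 8;           /* leftover bits in the next byte */

    memset(pfx, 0, sizeof(*pfx));
    memcpy(pfx->b, src.b, (size_t)full);
    if (rem)
        pfx->b[full] = src.b[full] & (uint8_t)(0xff << (8 - rem));
}

/* Return 1 when addr falls inside net/plen, else 0.
 * net is assumed to be already masked to plen bits. */
int addr128_in_prefix(const struct addr128 *net, int plen,
                      const struct addr128 *addr)
{
    struct addr128 masked;

    addr128_prefix(&masked, addr, plen);
    return memcmp(net->b, masked.b, sizeof(masked.b)) == 0;
}
```

With a prefix length of 128 this degenerates to the exact comparison the old code performed, so host routes behave as before while shorter prefixes let one bundle serve every flow inside the prefix.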
Thanks, Tom diff -ur linux-2.5.71-orig/net/ipv6/xfrm6_policy.c linux-2.5.71-new/net/ipv6/xfrm6_policy.c --- linux-2.5.71-orig/net/ipv6/xfrm6_policy.c 2003-06-14 14:18:02.000000000 -0500 +++ linux-2.5.71-new/net/ipv6/xfrm6_policy.c 2003-06-17 15:34:48.000000000 -0500 @@ -60,8 +60,16 @@ read_lock_bh(&policy->lock); for (dst = policy->bundles; dst; dst = dst->next) { struct xfrm_dst *xdst = (struct xfrm_dst*)dst; - if (!ipv6_addr_cmp(&xdst->u.rt6.rt6i_dst.addr, &fl->fl6_dst) && - !ipv6_addr_cmp(&xdst->u.rt6.rt6i_src.addr, &fl->fl6_src) && + struct in6_addr fl_dst_prefix, fl_src_prefix; + + ipv6_addr_prefix(&fl_dst_prefix, + &fl->fl6_dst, + xdst->u.rt6.rt6i_dst.plen); + ipv6_addr_prefix(&fl_src_prefix, + &fl->fl6_src, + xdst->u.rt6.rt6i_src.plen); + if (!ipv6_addr_cmp(&xdst->u.rt6.rt6i_dst.addr, &fl_dst_prefix) && + !ipv6_addr_cmp(&xdst->u.rt6.rt6i_src.addr, &fl_src_prefix) && __xfrm6_bundle_ok(xdst, fl)) { dst_clone(dst); break; @@ -133,7 +141,6 @@ dst_prev->child = &rt->u.dst; for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; - x->u.rt.fl = *fl; dst_prev->dev = rt->u.dst.dev; if (rt->u.dst.dev) @@ -157,6 +164,8 @@ x->u.rt6.rt6i_node = rt0->rt6i_node; x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; memcpy(&x->u.rt6.rt6i_gateway, &rt0->rt6i_gateway, sizeof(x->u.rt6.rt6i_gateway)); + x->u.rt6.rt6i_dst = rt0->rt6i_dst; + x->u.rt6.rt6i_src = rt0->rt6i_src; header_len -= x->u.dst.xfrm->props.header_len; trailer_len -= x->u.dst.xfrm->props.trailer_len; } From girouard@us.ibm.com Tue Jun 17 13:41:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:41:06 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKf22x021971 for ; Tue, 17 Jun 2003 13:41:03 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HKertd175814; Tue, 17 Jun 
2003 16:40:53 -0400 Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HKeo3e057268; Tue, 17 Jun 2003 16:40:51 -0400 Subject: Re: patch for common networking error messages To: "David S. Miller" Cc: garzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, Janice Girouard , Daniel Stekloff , Larry Kessler , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Janice Girouard Date: Tue, 17 Jun 2003 15:40:48 -0500 X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD, THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/17/2003 16:40:51 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3356 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: girouard@us.ibm.com Precedence: bulk X-list: netdev From: David S. Miller" Date: 06/17/2003 03:27 PM On RX, clever RX buffer management is what we need. What RX buffer management are you proposing? I'm having a hard time understanding how you'll get rid of the copy without support from the card. 
From babydr@baby-dragons.com Tue Jun 17 13:41:05 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:41:22 -0700 (PDT)
Received: from filesrv1.baby-dragons.com (filesrv1.system-techniques.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKf02x021972 for ; Tue, 17 Jun 2003 13:41:00 -0700
Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h5HKeO9R024192; Tue, 17 Jun 2003 16:40:39 -0400
Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h5HKe3Kp024187; Tue, 17 Jun 2003 16:40:18 -0400
X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs
Date: Tue, 17 Jun 2003 16:40:03 -0400 (EDT)
From: "Mr. James W. Laferriere"
To: Linux networking maillist
cc: NetDev
Subject: BUG: Massive performance drop in conncetion time with 2.4.21 (62KB)
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3357
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: babydr@baby-dragons.com
Precedence: bulk
X-list: netdev

Hello All , Per DaveM (I hope these are the right lists) . Here goes . I made a 'me too'ism in another thread that may be related to what I am presenting here . After my .sig is what I hope to be enough pertinent information to get the bug stomped on . This is driving me crazy . Also if someone would like to point me at a URL:/Method(s)/... for acquiring further information that would make the job easier for those who really can code in the kernel , please do . The slowdown mentioned below happens with any network based connection . JimL

On Mon, 16 Jun 2003, Mr. James W. Laferriere wrote:
> Hello Martin & Stephan , Since moving to 2.4.21 release from
> 2.4.21-rc3 . I have noticed that all network connections are slow
> to start .
But have normal responsiveness after the connection
> has been negotiated . This affects all network activity . I do have two
> nics in my system . One eepro100 driven & one ns83820 driven .
> My MB is a Tyan thunder s2567 . When I was running 2.4.21-rc3 the
> starting of the connections seemed average but since there has
> been a Markedly slower response to connection establishment .
> Twyl , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

# (time strace nslookup eisner.decus.org) 2>&1 | tee where-is-it-spend-allitstime.log
execve("/usr/bin/nslookup", ["nslookup", "eisner.decus.org"], [/* 31 vars */]) = 0
brk(0) = 0x812bad4
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=72386, ...}) = 0
old_mmap(NULL, 72386, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40015000
close(3) = 0
open("/lib/libnsl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260:\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=353351, ...}) = 0
old_mmap(NULL, 84956, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40027000
mprotect(0x40039000, 11228, PROT_NONE) = 0
old_mmap(0x40039000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x11000) = 0x40039000
old_mmap(0x4003a000, 7132, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4003a000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0h\222\1"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=5029105, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4003c000
old_mmap(NULL, 1191168, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3,
0) = 0x4003d000 mprotect(0x40156000, 40192, PROT_NONE) = 0 old_mmap(0x40156000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x119000) = 0x40156000 old_mmap(0x4015c000, 15616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4015c000 close(3) = 0 munmap(0x40015000, 72386) = 0 rt_sigaction(SIGINT, {0x80f3b60, ~[], 0x4000000}, NULL, 8) = 0 rt_sigaction(SIGTERM, {0x80f3b60, ~[], 0x4000000}, NULL, 8) = 0 rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0 rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0 getpid() = 7258 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 close(3) = 0 socket(PF_INET6, SOCK_STREAM, 0) = 3 getsockname(3, {sin_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0 close(3) = 0 brk(0) = 0x812bad4 brk(0x812bb4c) = 0x812bb4c brk(0x812c000) = 0x812c000 brk(0x812f000) = 0x812f000 brk(0x8132000) = 0x8132000 open("/usr/share/locale/C/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/LC_MESSAGES/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/LC_MESSAGES/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/libisc.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/LC_MESSAGES/libisc.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/libisc.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/LC_MESSAGES/libisc.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/libdns.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/LC_MESSAGES/libdns.cat", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/C/libdns.cat", O_RDONLY) = -1 ENOENT (No such file 
or directory) open("/usr/share/locale/C/LC_MESSAGES/libdns.cat", O_RDONLY) = -1 ENOENT (No such file or directory) write(2, "Note: nslookup is deprecated an"..., 206Note: nslookup is deprecated and may be removed from future releases. Consider using the `dig' or `host' programs instead. Run nslookup with the `-sil[ent]' option to prevent this message from appearing. ) = 206 open("/etc/resolv.conf", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=181, ...}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000 read(3, "domain baby-dragons.com\nsearch b"..., 4096) = 181 brk(0x8133000) = 0x8133000 brk(0x8134000) = 0x8134000 read(3, "", 4096) = 0 close(3) = 0 munmap(0x40015000, 4096) = 0 rt_sigaction(SIGHUP, {0x80f3b70, ~[], 0x4000000}, NULL, 8) = 0 brk(0x8135000) = 0x8135000 brk(0x8146000) = 0x8146000 brk(0x8157000) = 0x8157000 brk(0x8168000) = 0x8168000 brk(0x8179000) = 0x8179000 brk(0x818a000) = 0x818a000 brk(0x819b000) = 0x819b000 gettimeofday({1055863921, 890694}, NULL) = 0 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(3, SOL_SOCKET, SO_BSDCOMPAT, [1], 4) = 0 setsockopt(3, SOL_SOCKET, 0x1d /* SO_??? 
*/, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}}, 16) = 0 recvmsg(3, 0xbffff590, 0) = -1 EAGAIN (Resource temporarily unavailable) gettimeofday({1055863921, 891731}, NULL) = 0 sendmsg(3, {msg_name(16)={sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("199.33.245.254")}}, msg_iov(1)=[{"\231O\1\0\0\1\0\0\0\0\0\0\6eisner\5decus\3org\0\0\1"..., 34}], msg_controllen=0, msg_flags=0}, 0) = 34 gettimeofday({1055863921, 892168}, NULL) = 0 select(4, [3], [], NULL, {0, 0}) = 0 (Timeout) gettimeofday({1055863921, 892448}, NULL) = 0 gettimeofday({1055863921, 892516}, NULL) = 0 select(4, [3], [], NULL, {0, 998178}) = 0 (Timeout) gettimeofday({1055863922, 887219}, NULL) = 0 gettimeofday({1055863922, 887407}, NULL) = 0 select(4, [3], [], NULL, {0, 3287}) = 0 (Timeout) gettimeofday({1055863922, 897256}, NULL) = 0 gettimeofday({1055863922, 897362}, NULL) = 0 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(4, SOL_SOCKET, SO_BSDCOMPAT, [1], 4) = 0 setsockopt(4, SOL_SOCKET, 0x1d /* SO_??? 
*/, [1], 4) = 0 setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(4, {sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}}, 16) = 0 recvmsg(4, 0xbffff490, 0) = -1 EAGAIN (Resource temporarily unavailable) gettimeofday({1055863922, 898193}, NULL) = 0 sendmsg(4, {msg_name(16)={sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.103.194.6")}}, msg_iov(1)=[{"\231O\1\0\0\1\0\0\0\0\0\0\6eisner\5decus\3org\0\0\1"..., 34}], msg_controllen=0, msg_flags=0}, 0) = 34 gettimeofday({1055863922, 898540}, NULL) = 0 select(5, [3 4], [], NULL, {0, 0}) = 0 (Timeout) gettimeofday({1055863922, 898803}, NULL) = 0 gettimeofday({1055863922, 898874}, NULL) = 0 select(5, [3 4], [], NULL, {0, 998488}) = 1 (in [4], left {0, 880000}) gettimeofday({1055863923, 22646}, NULL) = 0 recvmsg(4, {msg_name(16)={sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.103.194.6")}}, msg_iov(1)=[{"\231O\201\200\0\1\0\1\0\3\0\5\6eisner\5decus\3org\0\0\1"..., 65535}], msg_controllen=20, msg_control=0xbffff51c, , msg_flags=0}, 0) = 194 fstat64(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000 close(4) = 0 close(3) = 0 getpid() = 7258 kill(7258, SIGTERM) = 0 --- SIGTERM (Terminated) --- sigreturn() = ? (mask now [RTMIN]) gettimeofday({1055863923, 25301}, NULL) = 0 brk(0x8131000) = 0x8131000 write(1, "Server:\t\t192.103.194.6\nAddress:\t"..., 123Server: 192.103.194.6 Address: 192.103.194.6#53 Non-authoritative answer: Name: eisner.decus.org Address: 192.135.80.34 ) = 123 munmap(0x40015000, 4096) = 0 _exit(0) = ? real 0m1.151s user 0m0.020s sys 0m0.010s # sh scripts/ver_linux >> where-is-it-spend-allitstime.log If some fields are empty or look unusual you may have an old version. Compare to the current minimal requirements in Documentation/Changes. 
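
One way to read the strace output above is to subtract the gettimeofday() values that bracket each DNS query. The sketch below does that arithmetic; struct ts and ts_diff_usec() are hypothetical helpers, and attributing the gap to an unanswered first nameserver is an interpretation of the trace, not something the log states.

```c
#include <assert.h>

/* A (seconds, microseconds) pair as printed by strace for gettimeofday(). */
struct ts {
    long long sec;
    long long usec;
};

/* Microseconds elapsed between two timestamps; 'to' must not precede 'from'. */
long long ts_diff_usec(struct ts from, struct ts to)
{
    long long d = (to.sec - from.sec) * 1000000LL + (to.usec - from.usec);

    assert(d >= 0);
    return d;
}
```

Taking the value logged just before the sendmsg() to 199.33.245.254 ({1055863921, 891731}), the one just before the sendmsg() to 192.103.194.6 ({1055863922, 898193}), and the one just before the successful recvmsg() ({1055863923, 22646}), the first gap is 1006462 microseconds and the second is 124453 microseconds. Under that reading, roughly one second of the 1.151 s wall-clock time passed with no reply on the first socket before the second server was tried.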
Linux filesrv1 2.4.21 #1 SMP Sun Jun 15 19:30:01 EDT 2003 i686 unknown Gnu C 2.95.3 Gnu make 3.79.1 util-linux 2.11r mount 2.11r modutils 2.4.25 e2fsprogs 1.32 jfsutils 1.0.18 reiserfsprogs 3.x.1b pcmcia-cs 3.1.33 PPP 2.4.1 Linux C Library 2.2.5 Dynamic linker (ldd) 2.2.5 Procps 2.0.13 Net-tools 1.60 Kbd 1.06 Sh-utils 2.0 Modules Loaded # cat .config >> where-is-it-spend-allitstime.log # # Automatically generated by make menuconfig: don't edit # CONFIG_X86=y # CONFIG_SBUS is not set CONFIG_UID16=y # # Code maturity level options # CONFIG_EXPERIMENTAL=y # # Loadable module support # CONFIG_MODULES=y CONFIG_MODVERSIONS=y CONFIG_KMOD=y # # Processor type and features # # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set CONFIG_MPENTIUMIII=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MELAN is not set # CONFIG_MCRUSOE is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_X86_L1_CACHE_SHIFT=5 CONFIG_X86_HAS_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_F00F_WORKS_OK=y CONFIG_X86_MCE=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_MICROCODE=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_HIGHMEM=y CONFIG_HIGHIO=y # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y CONFIG_SMP=y # CONFIG_X86_NUMA is not set # CONFIG_X86_TSC_DISABLE is not set CONFIG_X86_TSC=y CONFIG_HAVE_DEC_LOCK=y # # General setup # CONFIG_NET=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y 
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_ISA=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
CONFIG_HOTPLUG=y

#
# PCMCIA/CardBus support
#
# CONFIG_PCMCIA is not set

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set
# CONFIG_HOTPLUG_PCI_COMPAQ is not set
# CONFIG_HOTPLUG_PCI_COMPAQ_NVRAM is not set
# CONFIG_HOTPLUG_PCI_IBM is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set
# CONFIG_PM is not set
# CONFIG_ACPI is not set
# CONFIG_APM is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play configuration
#
# CONFIG_PNP is not set
# CONFIG_ISAPNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_CISS_SCSI_TAPE is not set
CONFIG_BLK_DEV_DAC960=y
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_BLK_DEV_INITRD=y
CONFIG_BLK_STATS=y

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID5=y
# CONFIG_MD_MULTIPATH is not set
# CONFIG_BLK_DEV_LVM is not set

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK_DEV=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=y
CONFIG_NET_IPGRE=y
# CONFIG_NET_IPGRE_BROADCAST is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
CONFIG_INET_ECN=y
CONFIG_SYN_COOKIES=y

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=y
# CONFIG_IP_NF_AMANDA is not set
# CONFIG_IP_NF_TFTP is not set
CONFIG_IP_NF_IRC=y
CONFIG_IP_NF_QUEUE=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=y
CONFIG_IP_NF_MATCH_MAC=y
CONFIG_IP_NF_MATCH_PKTTYPE=y
CONFIG_IP_NF_MATCH_MARK=y
CONFIG_IP_NF_MATCH_MULTIPORT=y
CONFIG_IP_NF_MATCH_TOS=y
CONFIG_IP_NF_MATCH_ECN=y
CONFIG_IP_NF_MATCH_DSCP=y
CONFIG_IP_NF_MATCH_AH_ESP=y
CONFIG_IP_NF_MATCH_LENGTH=y
CONFIG_IP_NF_MATCH_TTL=y
CONFIG_IP_NF_MATCH_TCPMSS=y
CONFIG_IP_NF_MATCH_HELPER=y
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_MATCH_CONNTRACK=y
CONFIG_IP_NF_MATCH_UNCLEAN=y
CONFIG_IP_NF_MATCH_OWNER=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_MIRROR=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_TARGET_REDIRECT=y
CONFIG_IP_NF_NAT_LOCAL=y
CONFIG_IP_NF_NAT_SNMP_BASIC=y
CONFIG_IP_NF_NAT_IRC=y
CONFIG_IP_NF_NAT_FTP=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_TOS=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_DSCP=y
CONFIG_IP_NF_TARGET_MARK=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_IP_NF_TARGET_TCPMSS=y
CONFIG_IP_NF_ARPTABLES=y
CONFIG_IP_NF_ARPFILTER=y
CONFIG_IPV6=y

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_QUEUE=y
CONFIG_IP6_NF_IPTABLES=y
CONFIG_IP6_NF_MATCH_LIMIT=y
CONFIG_IP6_NF_MATCH_MAC=y
# CONFIG_IP6_NF_MATCH_RT is not set
# CONFIG_IP6_NF_MATCH_OPTS is not set
# CONFIG_IP6_NF_MATCH_FRAG is not set
# CONFIG_IP6_NF_MATCH_HL is not set
CONFIG_IP6_NF_MATCH_MULTIPORT=y
CONFIG_IP6_NF_MATCH_OWNER=y
CONFIG_IP6_NF_MATCH_MARK=y
# CONFIG_IP6_NF_MATCH_IPV6HEADER is not set
# CONFIG_IP6_NF_MATCH_AHESP is not set
CONFIG_IP6_NF_MATCH_LENGTH=y
CONFIG_IP6_NF_MATCH_EUI64=y
CONFIG_IP6_NF_FILTER=y
CONFIG_IP6_NF_TARGET_LOG=y
CONFIG_IP6_NF_MANGLE=y
CONFIG_IP6_NF_TARGET_MARK=y
# CONFIG_KHTTPD is not set
CONFIG_ATM=y
CONFIG_ATM_CLIP=y
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=y
CONFIG_ATM_MPOA=y
# CONFIG_ATM_BR2684 is not set CONFIG_VLAN_8021Q=y # CONFIG_IPX is not set # CONFIG_ATALK is not set # # Appletalk devices # # CONFIG_DEV_APPLETALK is not set # CONFIG_DECNET is not set # CONFIG_BRIDGE is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_LLC is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # CONFIG_NET_FASTROUTE is not set # CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # # Telephony Support # # CONFIG_PHONE is not set # CONFIG_PHONE_IXJ is not set # CONFIG_PHONE_IXJ_PCMCIA is not set # # ATA/IDE/MFM/RLL support # CONFIG_IDE=y # # IDE, ATA and ATAPI Block devices # CONFIG_BLK_DEV_IDE=y # CONFIG_BLK_DEV_HD_IDE is not set # CONFIG_BLK_DEV_HD is not set CONFIG_BLK_DEV_IDEDISK=y CONFIG_IDEDISK_MULTI_MODE=y # CONFIG_IDEDISK_STROKE is not set # CONFIG_BLK_DEV_IDECS is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set CONFIG_BLK_DEV_CMD640=y # CONFIG_BLK_DEV_CMD640_ENHANCED is not set # CONFIG_BLK_DEV_ISAPNP is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_BLK_DEV_GENERIC=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_PCI_WIP is not set # CONFIG_BLK_DEV_ADMA100 is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_WDC_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set # CONFIG_AMD74XX_OVERRIDE is not set # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_HPT34X_AUTODMA is not set # CONFIG_BLK_DEV_HPT366 is not 
set CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_PDC202XX_BURST is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set CONFIG_BLK_DEV_RZ1000=y # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_SVWKS=y # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_CHIPSETS is not set CONFIG_IDEDMA_AUTO=y # CONFIG_IDEDMA_IVB is not set # CONFIG_DMA_NONPCI is not set CONFIG_BLK_DEV_IDE_MODES=y # CONFIG_BLK_DEV_ATARAID is not set # CONFIG_BLK_DEV_ATARAID_PDC is not set # CONFIG_BLK_DEV_ATARAID_HPT is not set # CONFIG_BLK_DEV_ATARAID_SII is not set # # SCSI support # CONFIG_SCSI=y CONFIG_BLK_DEV_SD=y CONFIG_SD_EXTRA_DEVS=40 CONFIG_CHR_DEV_ST=y CONFIG_CHR_DEV_OSST=y CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_SR_EXTRA_DEVS=2 CONFIG_CHR_DEV_SG=y CONFIG_SCSI_DEBUG_QUEUES=y CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_7000FASST is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AHA152X is not set # CONFIG_SCSI_AHA1542 is not set # CONFIG_SCSI_AHA1740 is not set # CONFIG_SCSI_AACRAID is not set CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_CMDS_PER_DEVICE=32 CONFIG_AIC7XXX_RESET_DELAY_MS=15000 # CONFIG_AIC7XXX_PROBE_EISA_VL is not set # CONFIG_AIC7XXX_BUILD_FIRMWARE is not set # CONFIG_SCSI_AIC79XX is not set CONFIG_SCSI_DPT_I2O=y # CONFIG_SCSI_ADVANSYS is not set # CONFIG_SCSI_IN2000 is not set # CONFIG_SCSI_AM53C974 is not set # CONFIG_SCSI_MEGARAID is not set CONFIG_SCSI_BUSLOGIC=y # CONFIG_SCSI_OMIT_FLASHPOINT is not set # CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_DTC3280 is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_DMA is not set # CONFIG_SCSI_EATA_PIO is not set # 
CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_GENERIC_NCR5380 is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_NCR53C406A is not set # CONFIG_SCSI_NCR53C7xx is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set # CONFIG_SCSI_PAS16 is not set # CONFIG_SCSI_PCI2000 is not set # CONFIG_SCSI_PCI2220I is not set # CONFIG_SCSI_PSI240I is not set # CONFIG_SCSI_QLOGIC_FAS is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set # CONFIG_SCSI_SEAGATE is not set # CONFIG_SCSI_SIM710 is not set # CONFIG_SCSI_SYM53C416 is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_T128 is not set # CONFIG_SCSI_U14_34F is not set # CONFIG_SCSI_ULTRASTOR is not set # CONFIG_SCSI_NSP32 is not set # CONFIG_SCSI_DEBUG is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # CONFIG_FUSION_BOOT is not set # CONFIG_FUSION_ISENSE is not set # CONFIG_FUSION_CTL is not set # CONFIG_FUSION_LAN is not set # # IEEE 1394 (FireWire) support (EXPERIMENTAL) # # CONFIG_IEEE1394 is not set # # I2O device support # # CONFIG_I2O is not set # CONFIG_I2O_PCI is not set # CONFIG_I2O_BLOCK is not set # CONFIG_I2O_LAN is not set # CONFIG_I2O_SCSI is not set # CONFIG_I2O_PROC is not set # # Network device support # CONFIG_NETDEVICES=y # # ARCnet devices # # CONFIG_ARCNET is not set CONFIG_DUMMY=y # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set CONFIG_TUN=y # CONFIG_ETHERTAP is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y # CONFIG_SUNLANCE is not set # CONFIG_HAPPYMEAL is not set # CONFIG_SUNBMAC is not set # CONFIG_SUNQE is not set # CONFIG_SUNGEM is not set CONFIG_NET_VENDOR_3COM=y # CONFIG_EL1 is not set # CONFIG_EL2 is not set # CONFIG_ELPLUS is not set # 
CONFIG_EL16 is not set # CONFIG_EL3 is not set # CONFIG_3C515 is not set # CONFIG_ELMC is not set # CONFIG_ELMC_II is not set CONFIG_VORTEX=y # CONFIG_TYPHOON is not set # CONFIG_LANCE is not set # CONFIG_NET_VENDOR_SMC is not set # CONFIG_NET_VENDOR_RACAL is not set # CONFIG_AT1700 is not set # CONFIG_DEPCA is not set # CONFIG_HP100 is not set CONFIG_NET_ISA=y # CONFIG_E2100 is not set # CONFIG_EWRK3 is not set # CONFIG_EEXPRESS is not set # CONFIG_EEXPRESS_PRO is not set # CONFIG_HPLAN_PLUS is not set # CONFIG_HPLAN is not set # CONFIG_LP486E is not set # CONFIG_ETH16I is not set CONFIG_NE2000=y CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_AC3200 is not set # CONFIG_APRICOT is not set # CONFIG_CS89x0 is not set CONFIG_TULIP=y # CONFIG_TULIP_MWI is not set # CONFIG_TULIP_MMIO is not set CONFIG_DE4X5=y # CONFIG_DGRS is not set # CONFIG_DM9102 is not set CONFIG_EEPRO100=y # CONFIG_EEPRO100_PIO is not set # CONFIG_E100 is not set # CONFIG_LNE390 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set CONFIG_NE2K_PCI=y # CONFIG_NE3210 is not set # CONFIG_ES3210 is not set # CONFIG_8139CP is not set # CONFIG_8139TOO is not set # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set # CONFIG_8139_OLD_RX_RESET is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_SUNDANCE_MMIO is not set CONFIG_TLAN=y # CONFIG_TC35815 is not set # CONFIG_VIA_RHINE is not set # CONFIG_VIA_RHINE_MMIO is not set # CONFIG_WINBOND_840 is not set # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_MYRI_SBUS is not set CONFIG_NS83820=y # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not 
set # CONFIG_PLIP is not set CONFIG_PPP=y CONFIG_PPP_MULTILINK=y CONFIG_PPP_FILTER=y CONFIG_PPP_ASYNC=y CONFIG_PPP_SYNC_TTY=y CONFIG_PPP_DEFLATE=y CONFIG_PPP_BSDCOMP=y # CONFIG_PPPOE is not set CONFIG_PPPOATM=y # CONFIG_SLIP is not set # # Wireless LAN (non-hamradio) # CONFIG_NET_RADIO=y # CONFIG_STRIP is not set # CONFIG_WAVELAN is not set # CONFIG_ARLAN is not set # CONFIG_AIRONET4500 is not set # CONFIG_AIRONET4500_NONCS is not set # CONFIG_AIRONET4500_PROC is not set # CONFIG_AIRO is not set # CONFIG_HERMES is not set # CONFIG_PLX_HERMES is not set # CONFIG_PCI_HERMES is not set CONFIG_NET_WIRELESS=y # # Token Ring devices # # CONFIG_TR is not set CONFIG_NET_FC=y CONFIG_IPHASE5526=y # CONFIG_RCPCI is not set # CONFIG_SHAPER is not set # # Wan interfaces # # CONFIG_WAN is not set # # ATM drivers # # CONFIG_ATM_TCP is not set # CONFIG_ATM_LANAI is not set # CONFIG_ATM_ENI is not set # CONFIG_ATM_FIRESTREAM is not set # CONFIG_ATM_ZATM is not set CONFIG_ATM_NICSTAR=y CONFIG_ATM_NICSTAR_USE_SUNI=y # CONFIG_ATM_NICSTAR_USE_IDT77105 is not set # CONFIG_ATM_IDT77252 is not set # CONFIG_ATM_AMBASSADOR is not set # CONFIG_ATM_HORIZON is not set # CONFIG_ATM_IA is not set CONFIG_ATM_FORE200E_MAYBE=y CONFIG_ATM_FORE200E_PCA=y CONFIG_ATM_FORE200E_PCA_DEFAULT_FW=y CONFIG_ATM_FORE200E_TX_RETRY=16 CONFIG_ATM_FORE200E_DEBUG=1 CONFIG_ATM_FORE200E=y # # Amateur Radio support # # CONFIG_HAMRADIO is not set # # IrDA (infrared) support # # CONFIG_IRDA is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Old CD-ROM drivers (not SCSI, not IDE) # # CONFIG_CD_NO_IDESCSI is not set # # Input core support # CONFIG_INPUT=y # CONFIG_INPUT_KEYBDEV is not set # CONFIG_INPUT_MOUSEDEV is not set # CONFIG_INPUT_JOYDEV is not set CONFIG_INPUT_EVDEV=y # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_SERIAL=y CONFIG_SERIAL_CONSOLE=y CONFIG_SERIAL_EXTENDED=y CONFIG_SERIAL_MANY_PORTS=y CONFIG_SERIAL_SHARE_IRQ=y # CONFIG_SERIAL_DETECT_IRQ is not set # CONFIG_SERIAL_MULTIPORT is 
not set # CONFIG_HUB6 is not set # CONFIG_SERIAL_NONSTANDARD is not set CONFIG_UNIX98_PTYS=y CONFIG_UNIX98_PTY_COUNT=256 # # I2C support # # CONFIG_I2C is not set # # Mice # # CONFIG_BUSMOUSE is not set CONFIG_MOUSE=y CONFIG_PSMOUSE=y # CONFIG_82C710_MOUSE is not set # CONFIG_PC110_PAD is not set # CONFIG_MK712_MOUSE is not set # # Joysticks # # CONFIG_INPUT_GAMEPORT is not set # CONFIG_INPUT_NS558 is not set # CONFIG_INPUT_LIGHTNING is not set # CONFIG_INPUT_PCIGAME is not set # CONFIG_INPUT_CS461X is not set # CONFIG_INPUT_EMU10K1 is not set # CONFIG_INPUT_SERIO is not set # CONFIG_INPUT_SERPORT is not set # CONFIG_INPUT_ANALOG is not set # CONFIG_INPUT_A3D is not set # CONFIG_INPUT_ADI is not set # CONFIG_INPUT_COBRA is not set # CONFIG_INPUT_GF2K is not set # CONFIG_INPUT_GRIP is not set # CONFIG_INPUT_INTERACT is not set # CONFIG_INPUT_TMDC is not set # CONFIG_INPUT_SIDEWINDER is not set # CONFIG_INPUT_IFORCE_USB is not set # CONFIG_INPUT_IFORCE_232 is not set # CONFIG_INPUT_WARRIOR is not set # CONFIG_INPUT_MAGELLAN is not set # CONFIG_INPUT_SPACEORB is not set # CONFIG_INPUT_SPACEBALL is not set # CONFIG_INPUT_STINGER is not set # CONFIG_INPUT_DB9 is not set # CONFIG_INPUT_GAMECON is not set # CONFIG_INPUT_TURBOGRAFX is not set # CONFIG_QIC02_TAPE is not set # CONFIG_IPMI_HANDLER is not set # CONFIG_IPMI_PANIC_EVENT is not set # CONFIG_IPMI_DEVICE_INTERFACE is not set # CONFIG_IPMI_KCS is not set # CONFIG_IPMI_WATCHDOG is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_SCx200_GPIO is not set # CONFIG_AMD_RNG is not set CONFIG_INTEL_RNG=y # CONFIG_AMD_PM768 is not set # CONFIG_NVRAM is not set CONFIG_RTC=y # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # CONFIG_SONYPI is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set CONFIG_AGP=y CONFIG_AGP_INTEL=y CONFIG_AGP_I810=y # CONFIG_AGP_VIA is not set # CONFIG_AGP_AMD is not set # CONFIG_AGP_AMD_8151 is not set # CONFIG_AGP_SIS is not 
set # CONFIG_AGP_ALI is not set CONFIG_AGP_SWORKS=y CONFIG_DRM=y # CONFIG_DRM_OLD is not set CONFIG_DRM_NEW=y CONFIG_DRM_TDFX=y CONFIG_DRM_R128=y # CONFIG_DRM_RADEON is not set # CONFIG_DRM_I810 is not set # CONFIG_DRM_I810_XFREE_41 is not set # CONFIG_DRM_I830 is not set # CONFIG_DRM_MGA is not set # CONFIG_DRM_SIS is not set # CONFIG_MWAVE is not set # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # File systems # CONFIG_QUOTA=y # CONFIG_AUTOFS_FS is not set # CONFIG_AUTOFS4_FS is not set # CONFIG_REISERFS_FS is not set # CONFIG_REISERFS_CHECK is not set # CONFIG_REISERFS_PROC_INFO is not set # CONFIG_ADFS_FS is not set # CONFIG_ADFS_FS_RW is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BEFS_DEBUG is not set # CONFIG_BFS_FS is not set CONFIG_EXT3_FS=y CONFIG_JBD=y CONFIG_JBD_DEBUG=y CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y # CONFIG_UMSDOS_FS is not set CONFIG_VFAT_FS=y CONFIG_EFS_FS=y # CONFIG_JFFS_FS is not set # CONFIG_JFFS2_FS is not set CONFIG_CRAMFS=y CONFIG_TMPFS=y CONFIG_RAMFS=y CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y # CONFIG_JFS_FS is not set # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_MINIX_FS=y # CONFIG_VXFS_FS is not set CONFIG_NTFS_FS=y CONFIG_NTFS_RW=y # CONFIG_HPFS_FS is not set CONFIG_PROC_FS=y # CONFIG_DEVFS_FS is not set # CONFIG_DEVFS_MOUNT is not set # CONFIG_DEVFS_DEBUG is not set CONFIG_DEVPTS_FS=y # CONFIG_QNX4FS_FS is not set # CONFIG_QNX4FS_RW is not set CONFIG_ROMFS_FS=y CONFIG_EXT2_FS=y # CONFIG_SYSV_FS is not set CONFIG_UDF_FS=y CONFIG_UDF_RW=y CONFIG_UFS_FS=y CONFIG_UFS_FS_WRITE=y # # Network File Systems # # CONFIG_CODA_FS is not set # CONFIG_INTERMEZZO_FS is not set CONFIG_NFS_FS=y CONFIG_NFS_V3=y # CONFIG_ROOT_NFS is not set CONFIG_NFSD=y CONFIG_NFSD_V3=y CONFIG_NFSD_TCP=y CONFIG_SUNRPC=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_SMB_FS=y # CONFIG_SMB_NLS_DEFAULT is not set # CONFIG_NCP_FS is not set # CONFIG_NCPFS_PACKET_SIGNING is not 
set # CONFIG_NCPFS_IOCTL_LOCKING is not set # CONFIG_NCPFS_STRONG is not set # CONFIG_NCPFS_NFS_NS is not set # CONFIG_NCPFS_OS2_NS is not set # CONFIG_NCPFS_SMALLDOS is not set # CONFIG_NCPFS_NLS is not set # CONFIG_NCPFS_EXTRAS is not set CONFIG_ZISOFS_FS=y # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set # CONFIG_MAC_PARTITION is not set CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y # CONFIG_MINIX_SUBPARTITION is not set CONFIG_SOLARIS_X86_PARTITION=y # CONFIG_UNIXWARE_DISKLABEL is not set CONFIG_LDM_PARTITION=y CONFIG_LDM_DEBUG=y # CONFIG_SGI_PARTITION is not set CONFIG_ULTRIX_PARTITION=y CONFIG_SUN_PARTITION=y # CONFIG_EFI_PARTITION is not set CONFIG_SMB_NLS=y CONFIG_NLS=y # # Native Language Support # CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # 
CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Console drivers # CONFIG_VGA_CONSOLE=y CONFIG_VIDEO_SELECT=y # CONFIG_MDA_CONSOLE is not set # # Frame-buffer support # CONFIG_FB=y CONFIG_DUMMY_CONSOLE=y # CONFIG_FB_RIVA is not set # CONFIG_FB_CLGEN is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_PM3 is not set # CONFIG_FB_CYBER2000 is not set # CONFIG_FB_VESA is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_HGA is not set CONFIG_VIDEO_SELECT=y # CONFIG_FB_MATROX is not set # CONFIG_FB_ATY is not set # CONFIG_FB_RADEON is not set CONFIG_FB_ATY128=y # CONFIG_FB_INTEL is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_VIRTUAL is not set CONFIG_FBCON_ADVANCED=y # CONFIG_FBCON_MFB is not set # CONFIG_FBCON_CFB2 is not set # CONFIG_FBCON_CFB4 is not set CONFIG_FBCON_CFB8=y CONFIG_FBCON_CFB16=y CONFIG_FBCON_CFB24=y CONFIG_FBCON_CFB32=y # CONFIG_FBCON_AFB is not set # CONFIG_FBCON_ILBM is not set # CONFIG_FBCON_IPLAN2P2 is not set # CONFIG_FBCON_IPLAN2P4 is not set # CONFIG_FBCON_IPLAN2P8 is not set # CONFIG_FBCON_MAC is not set # CONFIG_FBCON_VGA_PLANES is not set CONFIG_FBCON_VGA=y # CONFIG_FBCON_HGA is not set # CONFIG_FBCON_FONTWIDTH8_ONLY is not set CONFIG_FBCON_FONTS=y CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # CONFIG_FONT_SUN8x16 is not set # CONFIG_FONT_SUN12x22 is not set # CONFIG_FONT_6x11 is not set # CONFIG_FONT_PEARL_8x8 is not set # CONFIG_FONT_ACORN_8x8 is not set # # Sound # # CONFIG_SOUND is not set # # USB support # CONFIG_USB=y CONFIG_USB_DEBUG=y CONFIG_USB_DEVICEFS=y CONFIG_USB_BANDWIDTH=y # CONFIG_USB_EHCI_HCD is not set CONFIG_USB_UHCI_ALT=y CONFIG_USB_OHCI=y # CONFIG_USB_AUDIO is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_MIDI is not set CONFIG_USB_STORAGE=y # 
CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set # CONFIG_USB_STORAGE_DPCM is not set # CONFIG_USB_STORAGE_HP8200e is not set # CONFIG_USB_STORAGE_SDDR09 is not set # CONFIG_USB_STORAGE_SDDR55 is not set # CONFIG_USB_STORAGE_JUMPSHOT is not set # CONFIG_USB_ACM is not set CONFIG_USB_PRINTER=y # CONFIG_USB_HID is not set # CONFIG_USB_HIDINPUT is not set # CONFIG_USB_HIDDEV is not set # CONFIG_USB_KBD is not set # CONFIG_USB_MOUSE is not set # CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_DC2XX is not set # CONFIG_USB_MDC800 is not set CONFIG_USB_SCANNER=y # CONFIG_USB_MICROTEK is not set CONFIG_USB_HPUSBSCSI=y # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_CATC is not set # CONFIG_USB_CDCETHER is not set # CONFIG_USB_USBNET is not set # CONFIG_USB_USS720 is not set # # USB Serial Converter support # # CONFIG_USB_SERIAL is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_TIGL is not set # CONFIG_USB_BRLVGER is not set # CONFIG_USB_LCD is not set # # Bluetooth support # CONFIG_BLUEZ=y CONFIG_BLUEZ_L2CAP=y # CONFIG_BLUEZ_SCO is not set # CONFIG_BLUEZ_RFCOMM is not set # CONFIG_BLUEZ_BNEP is not set # # Bluetooth device drivers # CONFIG_BLUEZ_HCIUSB=y # CONFIG_BLUEZ_USB_SCO is not set # CONFIG_BLUEZ_USB_ZERO_PACKET is not set # CONFIG_BLUEZ_HCIUART is not set # CONFIG_BLUEZ_HCIDTL1 is not set # CONFIG_BLUEZ_HCIBT3C is not set # CONFIG_BLUEZ_HCIBLUECARD is not set # CONFIG_BLUEZ_HCIBTUART is not set # CONFIG_BLUEZ_HCIVHCI is not set # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_HIGHMEM=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_IOVIRT=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_SPINLOCK=y CONFIG_FRAME_POINTER=y # # Library routines # CONFIG_ZLIB_INFLATE=y 
CONFIG_ZLIB_DEFLATE=y # lspci -v >> /home/babydr/where-is-it-spend-allitstime.log 00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23) Flags: fast devsel Memory at f0000000 (32-bit, prefetchable) [disabled] [size=32M] Memory at effff000 (32-bit, non-prefetchable) [disabled] [size=4K] 00:00.1 PCI bridge: ServerWorks CNB20LE Host Bridge (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, medium devsel, latency 64 Bus: primary=00, secondary=01, subordinate=01, sec-latency=64 I/O behind bridge: 0000b000-0000bfff Memory behind bridge: fe400000-fe4fffff Prefetchable memory behind bridge: f2000000-fa0fffff Capabilities: [80] AGP version 2.0 00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01) Flags: medium devsel 00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01) Flags: medium devsel 00:01.0 SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR) 53c1010 66MHz Ultra3 SCSI Adapter (rev 01) Subsystem: LSI Logic / Symbios Logic (formerly NCR): Unknown device 1000 Flags: bus master, medium devsel, latency 72, IRQ 29 I/O ports at c400 [size=256] Memory at fe9ff800 (64-bit, non-prefetchable) [size=1K] Memory at fe9f6000 (64-bit, non-prefetchable) [size=8K] Expansion ROM at fe9f0000 [disabled] [size=16K] Capabilities: [40] Power Management version 2 00:01.1 SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR) 53c1010 66MHz Ultra3 SCSI Adapter (rev 01) Subsystem: LSI Logic / Symbios Logic (formerly NCR): Unknown device 1000 Flags: bus master, medium devsel, latency 72, IRQ 28 I/O ports at c800 [size=256] Memory at fe9ffc00 (64-bit, non-prefetchable) [size=1K] Memory at fe9fc000 (64-bit, non-prefetchable) [size=8K] Expansion ROM at fe9f8000 [disabled] [size=16K] Capabilities: [40] Power Management version 2 00:02.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 09) Subsystem: Ensoniq Creative Sound Blaster AudioPCI64V, AudioPCI128 Flags: bus master, slow devsel, latency 64, IRQ 30 I/O ports at cf00 
[size=64] Capabilities: [dc] Power Management version 2 00:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08) Subsystem: Intel Corp. EtherExpress PRO/100+ Server Adapter (PILA8470B) Flags: bus master, medium devsel, latency 64, IRQ 23 Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K] I/O ports at ce80 [size=64] Memory at fe800000 (32-bit, non-prefetchable) [size=1M] Expansion ROM at fe700000 [disabled] [size=1M] Capabilities: [dc] Power Management version 2 00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 51) Subsystem: ServerWorks OSB4 South Bridge Flags: bus master, medium devsel, latency 0 00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP]) Flags: bus master, medium devsel, latency 64 I/O ports at ffa0 [size=16] 00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 04) (prog-if 10 [OHCI]) Subsystem: ServerWorks OSB4/CSB5 USB Controller Flags: bus master, medium devsel, latency 64, IRQ 10 Memory at fe9f5000 (32-bit, non-prefetchable) [size=4K] 01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 PF/PRO AGP 4x TMDS (prog-if 00 [VGA]) Subsystem: ATI Technologies Inc Rage Fury Pro/Xpert 2000 Pro Flags: bus master, stepping, 66Mhz, medium devsel, latency 64, IRQ 17 Memory at f4000000 (32-bit, prefetchable) [size=64M] I/O ports at b800 [size=256] Memory at fe4fc000 (32-bit, non-prefetchable) [size=16K] Expansion ROM at fe4c0000 [disabled] [size=128K] Capabilities: [50] AGP version 2.0 Capabilities: [5c] Power Management version 2 02:01.0 Ethernet controller: National Semiconductor Corporation DP83820 10/100/1000 Ethernet Controller Subsystem: National Semiconductor Corporation: Unknown device f022 Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 24 I/O ports at e800 [size=256] Memory at febff000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at febe0000 [disabled] [size=64K] Capabilities: [40] Power Management version 2 02:02.0 I2O: Distributed Processing 
Technology SmartRAID V Controller (rev 01) (prog-if 01) Subsystem: Distributed Processing Technology 3010S Fibre Channel Flags: bus master, 66Mhz, slow devsel, latency 64, IRQ 26 BIST result: 00 Memory at fc000000 (32-bit, prefetchable) [size=32M] Expansion ROM at [disabled] [size=32K] Capabilities: [80] Power Management version 2 02:02.1 PCI bridge: Distributed Processing Technology PCI Bridge (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, slow devsel, latency 64 Bus: primary=02, secondary=03, subordinate=03, sec-latency=64 I/O behind bridge: 0000d000-0000dfff Capabilities: [68] Power Management version 2 ### dmesg output saved right after first boot into 2.4.21 . JimL # cat dmesg.20030615195422 >> where-is-it-spend-allitstime.log Linux version 2.4.21 (root@(none)) (gcc version 2.95.3 20010315 (release)) #1 SMP Sun Jun 15 19:30:01 EDT 2003 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 0000000080000000 (usable) BIOS-e820: 00000000fec00000 - 00000000fec02000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved) 1152MB HIGHMEM available. 896MB LOWMEM available. found SMP MP-table at 000ff780 hm, page 000ff000 reserved twice. hm, page 00100000 reserved twice. hm, page 000f0000 reserved twice. hm, page 000f1000 reserved twice. On node 0 totalpages: 524288 zone(0): 4096 pages. zone(1): 225280 pages. zone(2): 294912 pages. Intel MultiProcessor Specification v1.4 Virtual Wire compatibility mode. OEM ID: AMI Product ID: CNB20HE APIC at: 0xFEE00000 Processor #0 Pentium(tm) Pro APIC version 17 Processor #1 Pentium(tm) Pro APIC version 17 I/O APIC #4 Version 17 at 0xFEC00000. I/O APIC #5 Version 17 at 0xFEC01000. 
Enabling APIC mode: Flat.^IUsing 2 I/O APICs Processors: 2 Kernel command line: BOOT_IMAGE=filesrv1 ro root=801 ramdisk=8192 Initializing CPU#0 Detected 849.158 MHz processor. Console: colour VGA+ 80x25 Calibrating delay loop... 1690.82 BogoMIPS Memory: 2067164k/2097152k available (3033k kernel code, 29600k reserved, 1093k data, 216k init, 1179648k highmem) Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode cache hash table entries: 131072 (order: 8, 1048576 bytes) Mount cache hash table entries: 512 (order: 0, 4096 bytes) Buffer-cache hash table entries: 131072 (order: 7, 524288 bytes) Page-cache hash table entries: 524288 (order: 9, 2097152 bytes) CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU serial number disabled. Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. POSIX conformance testing by UNIFIX mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au) mtrr: detected mtrr type: Intel CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K Intel machine check reporting enabled on CPU#0. CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 CPU0: Intel Pentium III (Coppermine) stepping 0a per-CPU timeslice cutoff: 731.33 usecs. enabled ExtINT on CPU#0 ESR value before enabling vector: 00000004 ESR value after enabling vector: 00000000 Booting processor 1/1 eip 2000 Initializing CPU#1 masked ExtINT on CPU#1 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Calibrating delay loop... 1697.38 BogoMIPS CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU serial number disabled. Intel machine check reporting enabled on CPU#1. 
CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 CPU1: Intel Pentium III (Coppermine) stepping 0a Total of 2 processors activated (3388.21 BogoMIPS). ENABLING IO-APIC IRQs Setting 4 in the phys_id_present_map ...changing IO-APIC physical APIC ID to 4 ... ok. Setting 5 in the phys_id_present_map ...changing IO-APIC physical APIC ID to 5 ... ok. init IO_APIC IRQs IO-APIC (apicid-pin) 4-0, 4-5, 4-9, 4-11, 4-14, 4-15, 5-0, 5-2, 5-3, 5-4, 5-5, 5-6, 5-9, 5-11, 5-15 not connected. ..TIMER: vector=0x31 pin1=2 pin2=0 ..MP-BIOS bug: 8254 timer not connected to IO-APIC ...trying to set up timer (IRQ0) through the 8259A ... ..... (found pin 0) ...works. number of MP IRQ sources: 18. number of IO-APIC #4 registers: 16. number of IO-APIC #5 registers: 16. testing the IO APIC....................... IO APIC #4...... .... register #00: 04000000 ....... : physical APIC id: 04 ....... : Delivery Type: 0 ....... : LTS : 0 .... register #01: 000F0011 ....... : max redirection entries: 000F ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 003 03 0 0 0 0 0 1 1 31 01 003 03 0 0 0 0 0 1 1 39 02 000 00 1 0 0 0 0 0 0 00 03 003 03 0 0 0 0 0 1 1 41 04 003 03 0 0 0 0 0 1 1 49 05 000 00 1 0 0 0 0 0 0 00 06 003 03 0 0 0 0 0 1 1 51 07 003 03 0 0 0 0 0 1 1 59 08 003 03 0 0 0 0 0 1 1 61 09 000 00 1 0 0 0 0 0 0 00 0a 003 03 1 1 0 1 0 1 1 69 0b 000 00 1 0 0 0 0 0 0 00 0c 003 03 0 0 0 0 0 1 1 71 0d 003 03 0 0 0 0 0 1 1 79 0e 000 00 1 0 0 0 0 0 0 00 0f 000 00 1 0 0 0 0 0 0 00 IO APIC #5...... .... register #00: 05000000 ....... : physical APIC id: 05 ....... : Delivery Type: 0 ....... : LTS : 0 .... register #01: 000F0011 ....... : max redirection entries: 000F ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 01000000 ....... : arbitration: 01 .... 
IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 003 03 1 1 0 1 0 1 1 81 02 000 00 1 0 0 0 0 0 0 00 03 000 00 1 0 0 0 0 0 0 00 04 000 00 1 0 0 0 0 0 0 00 05 000 00 1 0 0 0 0 0 0 00 06 000 00 1 0 0 0 0 0 0 00 07 003 03 1 1 0 1 0 1 1 89 08 003 03 1 1 0 1 0 1 1 91 09 000 00 1 0 0 0 0 0 0 00 0a 003 03 1 1 0 1 0 1 1 99 0b 000 00 1 0 0 0 0 0 0 00 0c 003 03 1 1 0 1 0 1 1 A1 0d 003 03 1 1 0 1 0 1 1 A9 0e 003 03 1 1 0 1 0 1 1 B1 0f 000 00 1 0 0 0 0 0 0 00 IRQ to pin mappings: IRQ0 -> 0:0 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ10 -> 0:10 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ17 -> 1:1 IRQ23 -> 1:7 IRQ24 -> 1:8 IRQ26 -> 1:10 IRQ28 -> 1:12 IRQ29 -> 1:13 IRQ30 -> 1:14 .................................... done. Using local APIC timer interrupts. calibrating APIC timer ... ..... CPU clock speed is 849.1599 MHz. ..... host bus clock speed is 99.9008 MHz. cpu: 0, clocks: 999008, slice: 333002 CPU0 cpu: 1, clocks: 999008, slice: 333002 CPU1 checking TSC synchronization across CPUs: passed. 
Waiting on wait_init_idle (map = 0x2) All processors have done init_idle PCI: PCI BIOS revision 2.10 entry at 0xfdbc1, last bus=3 PCI: Using configuration type 1 PCI: Probing PCI hardware PCI: Discovered primary peer bus 02 [IRQ] PCI: Using IRQ router ServerWorks [1166/0200] at 00:0f.0 PCI->APIC IRQ transform: (B0,I1,P0) -> 29 PCI->APIC IRQ transform: (B0,I1,P1) -> 28 PCI->APIC IRQ transform: (B0,I2,P0) -> 30 PCI->APIC IRQ transform: (B0,I7,P0) -> 23 PCI->APIC IRQ transform: (B0,I15,P0) -> 10 PCI->APIC IRQ transform: (B1,I0,P0) -> 17 PCI->APIC IRQ transform: (B2,I1,P0) -> 24 PCI->APIC IRQ transform: (B2,I2,P0) -> 26 Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Initializing RT netlink socket BlueZ Core ver 2.2 Copyright (C) 2000,2001 Qualcomm Inc Written 2000,2001 by Maxim Krasnyansky IA-32 Microcode Update Driver: v1.11 Starting kswapd allocated 32 pages and 32 bhs reserved for the highmem bounces VFS: Diskquotas version dquot_6.4.0 initialized Journalled Block Device driver loaded Installing knfsd (copyright (C) 1996 okir@monad.swb.de). 
NTFS driver v1.1.22 [Flags: R/W] EFS: 1.0a - http://aeschi.ch.eu.org/efs/ udf: registering filesystem aty128fb: Rage128 BIOS located at segment C00C0000 aty128fb: Rage128 Pro PF (AGP) [chip rev 0x1] 32M 128-bit SDR SGRAM (1:1) Console: switching to colour frame buffer device 80x30 fb0: ATY Rage128 frame buffer device on PCI mtrr: type mismatch for f4000000,2000000 old: uncachable new: write-combining aty128fb: Rage128 MTRR set to ON pty: 256 Unix98 ptys configured Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled ttyS00 at 0x03f8 (irq = 4) is a 16550A ttyS01 at 0x02f8 (irq = 3) is a 16550A Real Time Clock Driver v1.10e Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 fore200e: FORE Systems 200E-series driver - version 0.2d RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize loop: loaded (max 8 devices) ThunderLAN driver v1.15 TLAN: 0 devices installed, PCI: 0 EISA: 0 ns83820.c: National Semiconductor DP83820 10/100/1000 driver. eth0: ns83820.c: 0x22c: f022100b, subsystem: 100b:f022 eth0: detected 64 bit PCI data bus. eth0: enabling optical transceiver eth0: ns83820 v0.20: DP83820 v1.3: 00:40:f4:66:df:ed io=0xfebff000 irq=24 f=sg eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin and others eth1: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:E0:81:04:D2:78, IRQ 23. Board assembly 567812-052, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x04f4518b). 
PPP generic driver version 2.4.2 PPP Deflate Compression module registered PPP BSD Compression module registered Universal TUN/TAP device driver 1.5 (C)1999-2002 Maxim Krasnyansky Linux agpgart interface v0.99 (c) Jeff Hartmann agpgart: Maximum main memory to use for agp memory: 1920M agpgart: AGP aperture is 32M @ 0xf0000000 [drm] Initialized tdfx 1.0.0 20010216 on minor 0 [drm] AGP 0.99 on Serverworks HE @ 0xf0000000 32MB mtrr: type mismatch for f0000000,2000000 old: uncachable new: write-combining [drm] Initialized r128 2.2.0 20010917 on minor 1 Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx SvrWks OSB4: IDE controller at PCI slot 00:0f.1 PCI: Enabling device 00:0f.1 (0000 -> 0001) SvrWks OSB4: chipset revision 0 SvrWks OSB4: not 100%% native mode: will probe irqs later ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio SCSI subsystem driver Revision: 1.00 sym.0.1.1: setting PCI_COMMAND_PARITY... sym.0.1.0: setting PCI_COMMAND_PARITY... sym0: <1010-66> rev 0x1 on pci bus 0 device 1 function 0 irq 29 sym0: using 64 bit DMA addressing sym0: Symbios NVRAM, ID 7, Fast-80, SE, parity checking sym0: open drain IRQ line driver, using on-chip SRAM sym0: using LOAD/STORE-based firmware. sym0: handling phase mismatch from SCRIPTS. sym0: SCSI BUS has been reset. sym1: <1010-66> rev 0x1 on pci bus 0 device 1 function 1 irq 28 sym1: using 64 bit DMA addressing sym1: Symbios NVRAM, ID 7, Fast-80, SE, parity checking sym1: open drain IRQ line driver, using on-chip SRAM sym1: using LOAD/STORE-based firmware. sym1: handling phase mismatch from SCRIPTS. sym1: SCSI BUS has been reset. 
scsi0 : sym-2.1.19-pre3 scsi1 : sym-2.1.19-pre3 Vendor: SEAGATE Model: ST118273LW Rev: 6246 Type: Direct-Access ANSI SCSI revision: 02 sym0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50.0 ns, offset 15) Vendor: IBM Model: DDRS-34560D Rev: DC1B Type: Direct-Access ANSI SCSI revision: 02 sym0:2: FAST-20 WIDE SCSI 40.0 MB/s ST (50.0 ns, offset 15) Vendor: COMPAQ Model: ST34371W Rev: 0940 Type: Direct-Access ANSI SCSI revision: 02 sym0:0:0: tagged command queuing enabled, command queue depth 16. sym0:1:0: tagged command queuing enabled, command queue depth 16. sym0:2:0: tagged command queuing enabled, command queue depth 16. Vendor: COMPAQ Model: TSL-9000 Rev: 2.06 Type: Sequential-Access ANSI SCSI revision: 02 Vendor: COMPAQ Model: TSL-9000 Rev: 2.06 Type: Medium Changer ANSI SCSI revision: 02 Vendor: HP Model: CD-Writer+ 9200 Rev: 1.0e Type: CD-ROM ANSI SCSI revision: 04 Vendor: HP Model: CD-Writer+ 9200 Rev: 1.0c Type: CD-ROM ANSI SCSI revision: 04 Loading Adaptec I2O RAID: Version 2.4 Build 5 Detecting Adaptec I2O RAID controllers... Adaptec I2O RAID controller 0 at fa847000 size=100000 irq=26 dpti: If you have a lot of devices this could take a few minutes. dpti0: Reading the hardware resource table. TID 008 Vendor: ADAPTEC Device: AIC-7899 Rev: 00000001 TID 525 Vendor: ADAPTEC Device: RAID-5 Rev: 380E scsi2 : Vendor: Adaptec Model: 2110S FW:380E Vendor: ADAPTEC Model: RAID-5 Rev: 380E Type: Direct-Access ANSI SCSI revision: 02 st: Version 20020805, bufsize 32768, wrt 30720, max init. 
bufs 4, s/g segs 16 Attached scsi tape st0 at scsi1, channel 0, id 4, lun 0 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0 Attached scsi disk sdc at scsi0, channel 0, id 2, lun 0 Attached scsi disk sdd at scsi2, channel 0, id 0, lun 0 sym0:0: FAST-20 WIDE SCSI 40.0 MB/s ST (50.0 ns, offset 15) SCSI device sda: 35566480 512-byte hdwr sectors (18210 MB) Partition check: sda:<7>ldm_validate_partition_table(): Found an MS-DOS partition table, not a dynamic disk. sda1 sda2 SCSI device sdb: 8925000 512-byte hdwr sectors (4570 MB) sdb:<7>ldm_validate_partition_table(): Found an MS-DOS partition table, not a dynamic disk. sdb1 sdb2 SCSI device sdc: 8386000 512-byte hdwr sectors (4294 MB) sdc:<7>ldm_validate_partition_table(): Found an MS-DOS partition table, not a dynamic disk. sdc1 sdc2 SCSI device sdd: 177827840 512-byte hdwr sectors (91048 MB) sdd:<7>ldm_validate_partition_table(): Found an MS-DOS partition table, not a dynamic disk. 
sdd1 sdd2 Attached scsi CD-ROM sr0 at scsi1, channel 0, id 5, lun 0 Attached scsi CD-ROM sr1 at scsi1, channel 0, id 6, lun 0 sym1:5: FAST-10 SCSI 10.0 MB/s ST (100.0 ns, offset 15) sr0: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray Uniform CD-ROM driver Revision: 3.12 sym1:6: FAST-10 SCSI 10.0 MB/s ST (100.0 ns, offset 15) sr1: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray Attached scsi generic sg4 at scsi1, channel 0, id 4, lun 1, type 8 usb.c: registered new driver usbdevfs usb.c: registered new driver hub host/uhci.c: USB Universal Host Controller Interface driver v1.1 host/usb-ohci.c: USB OHCI at membase 0xfa948000, IRQ 10 host/usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller usb.c: new USB bus registered, assigned bus number 1 usb.c: kmalloc IF c283b76c, numif 1 usb.c: new device strings: Mfr=0, Product=2, SerialNumber=1 usb.c: USB device number 1 default language ID 0x0 Product: USB OHCI Root Hub SerialNumber: fa948000 hub.c: USB hub found hub.c: 4 ports detected hub.c: standalone hub hub.c: ganged power switching hub.c: global over-current protection hub.c: Port indicators are not supported hub.c: power on to power good time: 2ms hub.c: hub controller current requirement: 0mA hub.c: port removable status: RRRR hub.c: local power source is good hub.c: no over-current condition exists hub.c: enabling power on all ports usb.c: hub driver claimed interface c283b76c usb.c: kusbd: /sbin/hotplug add 1 usb.c: kusbd policy returned 0xfffffffe usb.c: registered new driver usbscanner scanner.c: 0.4.12:USB Scanner Driver usb.c: registered new driver usblp printer.c: v0.11: USB Printer Device Class driver hpusbscsi.c: [hpusbscsi_init:250] driver loaded, DebugLvel=0 usb.c: registered new driver hpusbscsi Initializing USB Mass Storage driver... usb.c: registered new driver usb-storage USB Mass Storage support registered. 
md: linear personality registered as nr 1 md: raid0 personality registered as nr 2 md: raid1 personality registered as nr 3 md: raid5 personality registered as nr 4 raid5: measuring checksumming speed 8regs : 1482.000 MB/sec 32regs : 979.200 MB/sec pIII_sse : 1690.000 MB/sec pII_mmx : 1873.200 MB/sec p5_mmx : 2003.600 MB/sec raid5: using function: pIII_sse (1690.000 MB/sec) md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. BlueZ HCI USB driver ver 2.4 Copyright (C) 2000,2001 Qualcomm Inc Written 2000,2001 by Maxim Krasnyansky usb.c: registered new driver hci_usb NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP, IGMP IP: routing cache hash table of 8192 buckets, 128Kbytes TCP: Hash tables configured (established 131072 bind 43690) IPv4 over IPv4 tunneling driver GRE over IPv4 tunneling driver ip_conntrack version 2.1 (8192 buckets, 65536 max) - 292 bytes per conntrack ip_tables: (C) 2000-2002 Netfilter core team arp_tables: (C) 2002 David S. Miller NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. IPv6 v0.8 for NET4.0 IPv6 over IPv4 tunneling driver ip6_tables: (C) 2000-2002 Netfilter core team registering ipv6 mark target BlueZ L2CAP ver 2.1 Copyright (C) 2000,2001 Qualcomm Inc Written 2000,2001 by Maxim Krasnyansky lec.c: Jun 15 2003 19:36:52 initialized 802.1Q VLAN Support v1.8 Ben Greear All bugs added by David S. Miller VFS: Mounted root (ext2 filesystem) readonly. Freeing unused kernel memory: 216k freed Adding Swap: 525288k swap-space (priority -1) Adding Swap: 524480k swap-space (priority -2) eth0: link now 1000F mbps, full duplex and up. 
eth1: no IPv6 routers present eth0: no IPv6 routers present mtrr: type mismatch for f4000000,2000000 old: uncachable new: write-combining mtrr: type mismatch for f4000000,2000000 old: uncachable new: write-combining - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From davem@redhat.com Tue Jun 17 13:41:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:41:55 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKfW2x022135 for ; Tue, 17 Jun 2003 13:41:32 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04796; Tue, 17 Jun 2003 13:36:35 -0700 Date: Tue, 17 Jun 2003 13:36:35 -0700 (PDT) Message-Id: <20030617.133635.84366118.davem@redhat.com> To: sim@netnation.com Cc: gandalf@wlug.westbo.se, Robert.Olsson@data.slu.se, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030617203703.GB25773@netnation.com> References: <20030617200721.GA25773@netnation.com> <1055881034.3199.43.camel@tux.rsn.bth.se> <20030617203703.GB25773@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3358 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Tue, 17 Jun 2003 13:37:03 -0700 Forwarding rate more than doubles when I turn rp_filter off (Debian turns it on by default). I have no idea why they do this, it's the stupidest thing you can possibly do by default. If we thought it was a good idea to turn this on by default we would have done so in the kernel. Does anyone have some cycles to spare to try and urge whoever is responsible for this in Debian to leave the kernel's default setting alone? Thanks. From davem@redhat.com Tue Jun 17 13:46:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:46:46 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKkc2x023181 for ; Tue, 17 Jun 2003 13:46:38 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04831; Tue, 17 Jun 2003 13:42:07 -0700 Date: Tue, 17 Jun 2003 13:42:06 -0700 (PDT) Message-Id: <20030617.134206.133917056.davem@redhat.com> To: girouard@us.ibm.com Cc: garzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, stekloff@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3359 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice Girouard Date: Tue, 17 Jun 2003 15:40:48 -0500 From: "David S. Miller" Date: 06/17/2003 03:27 PM On RX, clever RX buffer management is what we need. What RX buffer management are you proposing? I'm having a hard time understanding how you'll get rid of the copy without support from the card. Sigh... someone write this email down somewhere for the next time someone asks about this. The "one true way (tm)" works like this: 1) Chip has a "flow cache", LRU based, managed like routing caches in many production router implementations. Difference is that it merely does flow watching. Flow entries are keyed on saddr/daddr/sport/dport. Flow misses kill the oldest entry, and replace it with the new one. Entries are only created in response to full sized data packets. 2) The receive buffering is segmented into small (256 byte) and PAGE sized buffers. IP/TCP/whatever headers (determined using simple programmable header parser logic, so you can do things like RPC etc. headers for NFS) are put into the "small" buffers, data portions for matching flows get accumulated into the PAGE sized buffers. It is implied that the card's flow cache keeps track of the pointer into the page it is currently trying to fill for that flow. So the first time you see a flow, you add an entry and grab a page buffer and stick the data part into the page buffer and the TCP/IP/etc. headers into a "small" buffer. You defer a configurable amount of time waiting for more TCP data packets (a packet train) to accumulate more into the PAGE buffer for that flow.
Such receive buffers are presented to the stack as a linked list of packets, with some indicator that together their data parts are filling a page. Things like "sys_receivefile()" and NFS flip these things into the filesystem page cache. I'm surprised this isn't evident to more people... From krkumar@us.ibm.com Tue Jun 17 13:46:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:47:00 -0700 (PDT) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKkt2x023223 for ; Tue, 17 Jun 2003 13:46:56 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e2.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HKk49X060772; Tue, 17 Jun 2003 16:46:04 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HKk13e056718; Tue, 17 Jun 2003 16:46:02 -0400 Message-ID: <3EEF7E09.8080608@us.ibm.com> Date: Tue, 17 Jun 2003 13:46:01 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 References: <3ED80230.2030508@us.ibm.com> <20030531.110249.12960077.yoshfuji@linux-ipv6.org> <20030530.233257.21920899.davem@redhat.com> In-Reply-To: <20030530.233257.21920899.davem@redhat.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-archive-position: 3360 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi Dave and Yoshfuji, I have a question about the following, which seems to be the approach both of you prefer. 
I thought we need a new routing message type called RTM_GETPLIST which will return full prefix list. If you use RTA_RA6INFO, then should that trigger only when the prefix list has changed (add or delete)? Should I have both interfaces, one for returning entire list (RTM) and one for changes in prefix list (RTA)? Please let me know if my understanding is correct. Thanks, - KK David S. Miller wrote: > From: YOSHIFUJI Hideaki / 吉藤英明 > Date: Sat, 31 May 2003 11:02:49 +0900 (JST) > > Again, what I proposed was to store prefix information on fib with > some flags to represent advertised by routers and give user-space > the RA information using new rtattr (RTA_RA6INFO or something like that). > > This sounds very reasonable. > From sim@netnation.com Tue Jun 17 13:51:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:51:25 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKp22x023854 for ; Tue, 17 Jun 2003 13:51:02 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19SNPx-00007n-J5; Tue, 17 Jun 2003 13:51:01 -0700 Date: Tue, 17 Jun 2003 13:51:01 -0700 From: Simon Kirby To: "David S.
Miller" Cc: gandalf@wlug.westbo.se, Robert.Olsson@data.slu.se, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030617205101.GD25773@netnation.com> References: <20030617200721.GA25773@netnation.com> <1055881034.3199.43.camel@tux.rsn.bth.se> <20030617203703.GB25773@netnation.com> <20030617.133635.84366118.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030617.133635.84366118.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 3362 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote: > I have no idea why they do this, it's the stupidest thing > you can possibly do by default. > > If we thought it was a good idea to turn this on by default > we would have done so in the kernel. > > Does anyone have some cycles to spare to try and urge whoever is > responsible for this in Debian to leave the kernel's default setting > alone? Sure, I can do this. But why is this stupid? It uses more CPU, but stops IP spoofing by default. Specific firewall rules would have to be created otherwise. And the overhead only really shows when the routing table is large, right?
Simon- From Robert.Olsson@data.slu.se Tue Jun 17 13:50:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:51:13 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKom2x023848 for ; Tue, 17 Jun 2003 13:50:49 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id WAA00619; Tue, 17 Jun 2003 22:49:50 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16111.32494.113831.745019@robur.slu.se> Date: Tue, 17 Jun 2003 22:49:50 +0200 To: Simon Kirby Cc: Robert Olsson , "David S. Miller" , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests In-Reply-To: <20030617200721.GA25773@netnation.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3361 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Simon Kirby writes: > I changed Juno to send from a single IP, but it only spat out about > 330000 pps, which the dual Tigon3 Opteron box forwarded completely. > In order to do a single flow forwarding test, I need to be able to create > more input traffic somehow. Seeing as you wrote pktgen.c, maybe you > could help in this department. :) OK. See below. > > Also think Simon used only /32 routes... I took "real" Internet-routing > > and made a script so it can be used for experiments. I can make it available. > > Yes, I found that area less interesting since Dave M. fixed the hash > buckets. 
But yes, the prefix scanning will slow it down some. Well, I don't think it's so easy, as there are 33 zones with prefixes. If you have all the routes in one zone I'm not sure what happens; that's why I suggested the comparison. > Erm. I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's > join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes. > What's it supposed to be against? Sorry. Our production system and lab use a heavily patched 2.5.66. I'll make a patch for 2.5.71.... > If I start two threads on the sender (Xeon w/HT), I'm able to push 420000 > pps, which only partially starts to use NAPI on the Opteron box. Going > to try 2.4 again for a comparison (note: 2.5 seems to have an opposite > PCI scan order from 2.4 for the dual Tigon3s). Not bad. Replace net/core/pktgen.c in 2.5.X with the version from ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/ and edit pktgen.sh to suit your needs. And see what you get. I'm interested since you are using different processors as well as NICs. Also packet generation itself is interesting as it tests driver/HW xmit-path. Cheers. --ro From davem@redhat.com Tue Jun 17 13:54:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:54:08 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKs42x024472 for ; Tue, 17 Jun 2003 13:54:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04937; Tue, 17 Jun 2003 13:49:38 -0700 Date: Tue, 17 Jun 2003 13:49:38 -0700 (PDT) Message-Id: <20030617.134938.62369406.davem@redhat.com> To: toml@us.ibm.com Cc: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: IPSec: Policy dst bundles exhausting storage From: "David S.
Miller" In-Reply-To: <1055882412.16482.2.camel@tomlt2.tomloffice.austin.ibm.com> References: <1055882412.16482.2.camel@tomlt2.tomloffice.austin.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3363 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Tom Lendacky Date: 17 Jun 2003 15:40:12 -0500 I think this is a bit overkill, can you redo this patch without this? No problem... here's the new patch. Applied, thanks Tom. From davem@redhat.com Tue Jun 17 13:54:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:54:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKs22x024467 for ; Tue, 17 Jun 2003 13:54:03 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04925; Tue, 17 Jun 2003 13:49:24 -0700 Date: Tue, 17 Jun 2003 13:49:24 -0700 (PDT) Message-Id: <20030617.134924.119862290.davem@redhat.com> To: sim@netnation.com Cc: gandalf@wlug.westbo.se, Robert.Olsson@data.slu.se, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030617205101.GD25773@netnation.com> References: <20030617203703.GB25773@netnation.com> <20030617.133635.84366118.davem@redhat.com> <20030617205101.GD25773@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3364 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Tue, 17 Jun 2003 13:51:01 -0700 Specific firewall rules would have to be created otherwise. And the overhead only really shows when the routing table is large, right? rp filter breaks things... just like firewalls break things... so just like a user enables firewall rules by himself, he may enable rp filter by himself... From girouard@us.ibm.com Tue Jun 17 13:58:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:58:17 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKwB2x025099 for ; Tue, 17 Jun 2003 13:58:11 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HKw5pS046826; Tue, 17 Jun 2003 16:58:05 -0400 Received: from d01ml063.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HKw23e035376; Tue, 17 Jun 2003 16:58:03 -0400 Subject: Re: patch for common networking error messages To: "David S. 
Miller" Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 Message-ID: From: Janice Girouard Date: Tue, 17 Jun 2003 15:57:59 -0500 X-MIMETrack: Serialize by Router on D01ML063/01/M/IBM(Release 6.0.1 w/SPRs JHEG5JQ5CD, THTO5KLVS6, JHEG5HMLFK, JCHN5K5PG9|March 27, 2003) at 06/17/2003 16:58:03 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 3365 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: girouard@us.ibm.com Precedence: bulk X-list: netdev Did I understand: > 1) Chip has a "flow cache", LRU based, managed like routing caches You need the chip to support your technique. Are the vendors picking up on this? I still don't see how this gets rid of the copy_to_user space once you've gathered the buffers. How do you feed the user buffer addresses to the card? You must have something equivalent to the queue pair management supported in RDMA. What technique are you using? Is it proprietary? "David S. Miller" cc: garzik@pobox.com, shemminger@osdl.org, Valdis.Kletnieks@vt.edu, Daniel Stekloff/Beaverton/IBM@IBMUS, 06/17/2003 03:42 Larry Kessler/Beaverton/IBM@IBMUS, PM linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ltcfwd.linux.ibm.com Subject: Re: patch for common networking error messages From: Janice Girouard Date: Tue, 17 Jun 2003 15:40:48 -0500 From: David S. Miller" Date: 06/17/2003 03:27 PM On RX, clever RX buffer management is what we need. What RX buffer management are you proposing? I'm having a hard time understanding how you'll get rid of the copy without support from the card. Sigh... someone write store email down somewhere for the next time someone asks about this. The "one true way (tm)" works like this: 1) Chip has a "flow cache", LRU based, managed like routing caches in many production router implementations. Difference is that it merely does flow watching. 
Flow entries are keyed on saddr/daddr/sport/dport. Flow misses kill the oldest entry, and replace it with the new one. Entries are only created in response to full sized data packets. 2) The receive buffering is segmented into small (256 byte) and PAGE sized buffers. IP/TCP/whatever headers (determined using simple programmable header parser logic, so you can do things like RPC etc. headers for NFS) are put into the "small" buffers, data portions for matching flows get accumulated into the PAGE sized buffers. It is implied that the card's flow cache keeps track of the pointer into the page it is currently trying to fill for that flow. So the first time you see a flow, you add an entry and grab a page buffer and stick the data part into the page buffer and the TCP/IP/etc. headers into a "small" buffer. You defer a configurable amount of time waiting for more TCP data packets (a packet train) to accumulate more into the PAGE buffer for that flow. Such receive buffers are presented to the stack as a linked list of packets, with some indicator that together their data parts are filling a page. Things like "sys_receivefile()" and NFS flip these things into the filesystem page cache. I'm surprised this isn't evident to more people...
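The flow cache described in the quoted message can be sketched in software roughly as follows. This is only an illustration under stated assumptions, not code from any NIC firmware or from the kernel: the names flow_cache, flow_entry and flow_rx, the four-slot cache size, the 4096-byte page and the 1448-byte "full sized" threshold are all hypothetical. It shows the three rules from the description: entries are keyed on saddr/daddr/sport/dport, a miss replaces the oldest entry, and an entry is only created for a full sized data packet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_SLOTS   4     /* hypothetical cache size */
#define PAGE_BYTES   4096  /* assumed PAGE size */
#define FULL_PAYLOAD 1448  /* assumed size of a "full sized" data packet */

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct flow_entry {
    struct flow_key key;
    uint32_t page_off;   /* next free byte in this flow's page buffer */
    uint64_t last_used;  /* LRU stamp; 0 means the slot is empty */
};

struct flow_cache {
    struct flow_entry slot[FLOW_SLOTS];
    uint64_t clock;      /* bumped on every received packet */
};

static int key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/*
 * Returns the offset in the flow's page buffer where the payload should
 * be placed, or -1 if the packet takes the ordinary path (no entry and
 * the packet is not full sized). Assumes payload_len <= PAGE_BYTES.
 */
static int flow_rx(struct flow_cache *fc, const struct flow_key *k,
                   uint32_t payload_len)
{
    struct flow_entry *victim = &fc->slot[0];
    int i, off;

    fc->clock++;
    for (i = 0; i < FLOW_SLOTS; i++) {
        struct flow_entry *e = &fc->slot[i];

        if (e->last_used && key_eq(&e->key, k)) {
            /* Hit: if the page cannot hold this payload, start a new page. */
            if (e->page_off + payload_len > PAGE_BYTES)
                e->page_off = 0;
            off = (int)e->page_off;
            e->page_off += payload_len;
            e->last_used = fc->clock;
            return off;
        }
        /* Remember the oldest (or an empty) slot as eviction candidate. */
        if (e->last_used < victim->last_used)
            victim = e;
    }

    /* Miss: only full sized data packets create an entry. */
    if (payload_len < FULL_PAYLOAD)
        return -1;

    victim->key = *k;
    victim->page_off = payload_len;
    victim->last_used = fc->clock;
    return 0;
}
```

A zero-filled struct flow_cache is an empty cache, so a caller would memset() it once and then call flow_rx() per received packet. The deferral timer and the hand-off of filled pages to the stack are left out of this sketch.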
From fw@deneb.enyo.de Tue Jun 17 13:58:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:58:46 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKwe2x025155 for ; Tue, 17 Jun 2003 13:58:41 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 4.20) id 19SNXD-00029n-TH; Tue, 17 Jun 2003 22:58:31 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.20) id 19SNXD-0002Fo-O0; Tue, 17 Jun 2003 22:58:31 +0200 To: Simon Kirby Cc: ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> <20030609163010.GA11509@netnation.com> From: Florian Weimer Mail-Followup-To: Simon Kirby , ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Date: Tue, 17 Jun 2003 22:58:31 +0200 In-Reply-To: <20030609163010.GA11509@netnation.com> (Simon Kirby's message of "Mon, 9 Jun 2003 09:30:10 -0700") Message-ID: <87ptlc4e14.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 3366 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev Simon Kirby writes: > What Zebra quirks? Zebra doesn't send BGP keepalives while updating the kernel's view of the routing table. If a configuration change results in massive routing table updates (e.g. changed LOCAL_PREF), it's quite likely that your BGP peering sessions terminate because of a timeout. Other "quirks" are just things that don't work as they should (mostly Cisco incompatibilities, sometimes genuine bugs in route-map support etc.).
It's not dramatic in most cases, but like any complex technology, it takes some time to get used to. (Disclaimer: I'm not a Zebra user. 8-) > And I wouldn't exactly call it difficult to "squeeze" performance out of > a PC when the 7206 VXRs have a 200 MHz processor. You missed the NPE-G1 part. cisco 7204VXR (NPE-G1) processor (revision A) with 245760K/16384K bytes of memory. SB-1 CPU at 700Mhz, Implementation 1, Rev 0.2, 512KB L2 Cache Probably still slow by x86 standards, and with a rather small cache, but it's sufficient for a few kpps, I guess... From yoshfuji@linux-ipv6.org Tue Jun 17 13:59:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 13:59:08 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HKx22x025229 for ; Tue, 17 Jun 2003 13:59:03 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5HKxxBo009709; Wed, 18 Jun 2003 05:59:59 +0900 Date: Wed, 18 Jun 2003 05:59:59 +0900 (JST) Message-Id: <20030618.055959.55006678.yoshfuji@linux-ipv6.org> To: krkumar@us.ibm.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <3EEF7E09.8080608@us.ibm.com> References: <20030531.110249.12960077.yoshfuji@linux-ipv6.org> <20030530.233257.21920899.davem@redhat.com> <3EEF7E09.8080608@us.ibm.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: 
Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3367 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev

In article <3EEF7E09.8080608@us.ibm.com> (at Tue, 17 Jun 2003 13:46:01 -0700), Krishna Kumar says:

> I have a question about the following, which seems to be the
> approach both of you prefer. I thought we need a new routing
> message type called RTM_GETPLIST which will return full prefix
> list. If you use RTA_RA6INFO, then should that trigger only
> when the prefix list has changed (add or delete) ? Should I
> have both interfaces, one for returning entire list (RTM) and
> one for changes in prefix list (RTA) ?
>
> Please let me know if my understanding is correct.

Well, I think the problem is to set RTF_ADDRCONF flag to all prefix routes. I believe this should be for autoconf (RA) routes only as the comment says; dad_starts and multicast add routes with such flag, but this should be wrong. After we fix this, we can get prefix information filtering routes by RTF_ADDRCONF flag; of course, we can get the routes using RTM_GETROUTE.

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From sim@netnation.com Tue Jun 17 14:07:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 14:07:26 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HL712x026135 for ; Tue, 17 Jun 2003 14:07:02 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19SNfR-0000Oq-0R; Tue, 17 Jun 2003 14:07:01 -0700 Date: Tue, 17 Jun 2003 14:07:01 -0700 From: Simon Kirby To: Robert Olsson Cc: "David S.
Miller" , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030617210700.GE25773@netnation.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030617200721.GA25773@netnation.com> User-Agent: Mutt/1.5.4i X-archive-position: 3368 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev

On Tue, Jun 17, 2003 at 01:07:21PM -0700, Simon Kirby wrote:

> Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going
> on here.
>
> 60.0042 seconds passed, avg forwarding rate: 50759.683 pps

Ummmm, yeah, 2.5.71 is quite a bit slower than 2.4.21. I applied Alexey's 2.5.71 rtcache fixes to 2.4.21 (changing "fl" to "key" in the scoring function), and now I see:

60.0065 seconds passed, avg forwarding rate: 135379.152 pps

If I reboot and don't fill the routing table:

60.0104 seconds passed, avg forwarding rate: 259027.200 pps

This is with standard juno (pseudo-random sources).

This is with CONFIG_IP_MULTIPLE_TABLES still on, too. I'll turn that off and do some profiles.

The only weird thing I'm seeing while doing this is that the route cache table continues to grow slowly, and the pps slowly falls off over a few minutes. "ip route flush cache" restores performance again. I'll verify this is not happening in 2.5.

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer.
] From davem@redhat.com Tue Jun 17 14:19:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 14:19:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HLJC2x026519 for ; Tue, 17 Jun 2003 14:19:12 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA05071; Tue, 17 Jun 2003 14:14:43 -0700 Date: Tue, 17 Jun 2003 14:14:43 -0700 (PDT) Message-Id: <20030617.141443.24610277.davem@redhat.com> To: girouard@us.ibm.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3369 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Janice Girouard Date: Tue, 17 Jun 2003 15:57:59 -0500 Did I understand: > 1) Chip has a "flow cache", LRU based, managed like routing caches You need the chip to support your technique. No shit Sherlock. But it should be noted that the idea can be fully verified in software by adding the scheme into the RX processing of some existing ethernet driver. Are the vendors picking up on this? If they're going to ignore my ideas, that's not my problem. I still don't see how this gets rid of the copy_to_user space once you've gathered the buffers. How do you feed the user buffer addresses to the card? You flip the pages into userspace, ie. you replace the page the user currently has with the one the networking buffer is using. What technique are you using? Is it proprietary? ROFL! 
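[Editorial sketch, not part of the original message.] The page flip described above ("you replace the page the user currently has with the one the networking buffer is using") can be pictured with a toy user-space model in which each side holds a pointer to its page and the flip exchanges the pointers instead of copying bytes. The names (page_slot, flip_page) are hypothetical; a real kernel would change page-table mappings and needs page-aligned, page-sized data, none of which this model captures.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096   /* hypothetical page size for the model */

struct page_slot {
    unsigned char *page;  /* page currently backing this slot */
};

/*
 * Exchange the page behind the user's slot with the page the receive
 * buffer has filled. No payload bytes are copied; the receive side
 * gets the user's old page back for reuse.
 */
static void flip_page(struct page_slot *user, struct page_slot *rx)
{
    unsigned char *tmp;

    assert(user != NULL && rx != NULL);
    tmp = user->page;
    user->page = rx->page;
    rx->page = tmp;
}
```

The cost of the flip is independent of how much data the page holds, which is the whole appeal compared with a per-byte copy to user space.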
From krkumar@us.ibm.com Tue Jun 17 14:32:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 14:32:57 -0700 (PDT) Received: from e6.ny.us.ibm.com (e6.ny.us.ibm.com [32.97.182.106]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HLWk2x027338 for ; Tue, 17 Jun 2003 14:32:52 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e6.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5HLVwxr177902; Tue, 17 Jun 2003 17:31:58 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5HLVsFK024378; Tue, 17 Jun 2003 17:31:56 -0400 Message-ID: <3EEF88C8.30005@us.ibm.com> Date: Tue, 17 Jun 2003 14:31:52 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: yoshfuji@linux-ipv6.org CC: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 References: <20030531.110249.12960077.yoshfuji@linux-ipv6.org> <20030530.233257.21920899.davem@redhat.com> <3EEF7E09.8080608@us.ibm.com> <20030618.055959.55006678.yoshfuji@linux-ipv6.org> In-Reply-To: <20030618.055959.55006678.yoshfuji@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3370 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi Yoshifuji, > Well, I think the problem is to set RTF_ADDRCONF flag to all prefix routes. I don't think we should change that, RTF_ADDRCONF should be set for all RA routes not just the prefix route. But I agree with your other comments, that RTF_ADDRCONF must not be used when configuring routes from user space. 
The filtering should check both the flag and whether it is a prefix route entry. I guess I will work on sending messages both for prefix list changes and for getting the entire prefix list.

thanks,

- KK

YOSHIFUJI Hideaki wrote:
> In article <3EEF7E09.8080608@us.ibm.com> (at Tue, 17 Jun 2003 13:46:01 -0700), Krishna Kumar says:
>
>
>>I have a question about the following, which seems to be the
>>approach both of you prefer. I thought we need a new routing
>>message type called RTM_GETPLIST which will return full prefix
>>list. If you use RTA_RA6INFO, then should that trigger only
>>when the prefix list has changed (add or delete) ? Should I
>>have both interfaces, one for returning entire list (RTM) and
>>one for changes in prefix list (RTA) ?
>>
>>Please let me know if my understanding is correct.
>
>
> Well, I think the problem is to set RTF_ADDRCONF flag to all prefix routes.
> I believe this should be for autoconf (RA) routes only as the comment says;
> dad_starts and multicast add routes with such flag, but this should be wrong.
> After we fix this, we can get prefix information filtering routes by
> RTF_ADDRCONF flag; of course, we can get the routes using RTM_GETROUTE.
>

From andi@averellmail.firstfloor.org Tue Jun 17 15:04:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:04:47 -0700 (PDT) Received: from zero.aec.at (Halcyon.Jones@zero.aec.at [193.170.194.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HM4a2x000468 for ; Tue, 17 Jun 2003 15:04:38 -0700 Received: from fred.muc.de (Joel.Buxter@localhost.localdomain [127.0.0.1]) by zero.aec.at (8.11.6/8.11.2) with ESMTP id h5HM4Hm24990; Wed, 18 Jun 2003 00:04:18 +0200 Received: by fred.muc.de (Postfix on SuSE Linux 7.3 (i386), from userid 500) id 1DEB15BBAE; Wed, 18 Jun 2003 00:04:21 +0200 (CEST) Date: Wed, 18 Jun 2003 00:04:20 +0200 From: Andi Kleen To: netdev@oss.sgi.com Cc: mostrows@speakeasy.net, paulus@au.ibm.com Subject: [PATCH] Convert pppoe to new style protocol Message-ID: <20030617220420.GA1169@averell> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-archive-position: 3371 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@muc.de Precedence: bulk X-list: netdev

Convert pppoe to a new style protocol, otherwise it is unusable on SMP compiled kernels because of the printk in deliver*old_ones. It works fine here, but only tested on a UP box.

I checked with paulus and he confirmed that ppp_input is safe to be called with the new softirq semantics. The PPP path also appears to work with shared skbs (never modify skb data, not 100% audited however) but not with non linear skbs. The pppoe code itself is safe because it never uses timers or does anything complicated.

If someone knows a place in ppp_input that modifies the data area then please complain now so that a skb_copy can be added.
-Andi

diff -u linux-2.5.72-work/drivers/net/pppoe.c-o linux-2.5.72-work/drivers/net/pppoe.c
--- linux-2.5.72-work/drivers/net/pppoe.c-o 2003-06-14 23:42:51.000000000 +0200
+++ linux-2.5.72-work/drivers/net/pppoe.c 2003-06-17 23:15:11.000000000 +0200
@@ -348,6 +348,9 @@
 	struct pppox_opt *po = pppox_sk(sk);
 	struct pppox_opt *relay_po = NULL;
 
+	if (skb_is_nonlinear(skb) && skb_linearize(skb, GFP_ATOMIC))
+		goto abort_kfree;
+
 	if (sk->sk_state & PPPOX_BOUND) {
 		skb_pull(skb, sizeof(struct pppoe_hdr));
 		ppp_input(&po->chan, skb);
@@ -463,11 +466,13 @@
 struct packet_type pppoes_ptype = {
 	.type = __constant_htons(ETH_P_PPP_SES),
 	.func = pppoe_rcv,
+	.data = (void*) 1,
 };
 
 struct packet_type pppoed_ptype = {
 	.type = __constant_htons(ETH_P_PPP_DISC),
 	.func = pppoe_disc_rcv,
+	.data = (void*) 1,
 };
 
 /***********************************************************************

From ralph@istop.com Tue Jun 17 15:11:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:11:07 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMB22x002140 for ; Tue, 17 Jun 2003 15:11:03 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id BB9B536A8D; Tue, 17 Jun 2003 18:11:01 -0400 (EDT) Date: Tue, 17 Jun 2003 18:11:00 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance tests In-Reply-To: <20030617200721.GA25773@netnation.com> Message-ID: References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3372 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 17 Jun 2003, Simon Kirby wrote: > vma samples % symbol name > c02bf730 16019 33.2014 fn_hash_lookup > c0292b70 3882 8.04593 ip_route_input_slow > c0221710 2335 4.83958 tg3_rx > c02bd550 2004 4.15354 fib_validate_source > c0290d70 1955 4.05198 rt_hash_code > c0294e50 1670 3.46128 ip_rcv > c02933a0 1404 2.90997 ip_route_input If turning off rp_filter doubles your performance, then the profile numbers above are misleading. My (obviously incorrect) assumption would be that fib_validate_source is responsible for rp_filter, and turning it off would lead to only a 5% performance increase. Considering that, what kind of performance difference should removing the route hashing make (i.e. going with r-trees or something like that). In most of the profiles fn_hash_lookup has been at the top of the list. -Ralph From davem@redhat.com Tue Jun 17 15:12:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:12:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMCK2x002437 for ; Tue, 17 Jun 2003 15:12:20 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA05350; Tue, 17 Jun 2003 15:07:51 -0700 Date: Tue, 17 Jun 2003 15:07:51 -0700 (PDT) Message-Id: <20030617.150751.52901849.davem@redhat.com> To: ak@muc.de Cc: netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol From: "David S. Miller" In-Reply-To: <20030617220420.GA1169@averell> References: <20030617220420.GA1169@averell> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3373 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Wed, 18 Jun 2003 00:04:20 +0200 Convert pppoe to a new style protocol, otherwise it is unusable on SMP compiled kernels because of the printk in deliver*old_ones. It works fine here, but only tested on a UP box. Please don't add new skb_linearize() users, I'm trying to make that only local to net/core/dev.c Otherwise I'm fine with your patch. From davem@redhat.com Tue Jun 17 15:13:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:13:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMDJ2x002773 for ; Tue, 17 Jun 2003 15:13:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA05368; Tue, 17 Jun 2003 15:08:49 -0700 Date: Tue, 17 Jun 2003 15:08:49 -0700 (PDT) Message-Id: <20030617.150849.132446622.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: sim@netnation.com, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: References: <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3374 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 17 Jun 2003 18:11:00 -0400 (EDT) My (obviously incorrect) assumption would be that fib_validate_source is responsible for rp_filter, and turning it off would lead to only a 5% performance increase. fib_validate_source() with rp_filter enabled causes an extra fib_lookup() to occur for each packet. From jamagallon@able.es Tue Jun 17 15:28:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:28:07 -0700 (PDT) Received: from aneto.able.es (aneto.able.es [212.97.163.22]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMRx2x003231 for ; Tue, 17 Jun 2003 15:28:01 -0700 Received: from werewolf.able.es ([212.97.183.44]) by aneto.able.es (Netscape Messaging Server 4.15 aneto Mar 14 2002 21:29:48) with ESMTP id HGNFQ902.6KH; Wed, 18 Jun 2003 00:25:21 +0100 Date: Wed, 18 Jun 2003 00:27:50 +0200 From: "J.A. 
Magallon" To: Jeff Garzik Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCHES] 2.4.x net driver updates Message-ID: <20030617222750.GE13990@werewolf.able.es> References: <20030612194926.GA7653@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20030612194926.GA7653@gtf.org>; from jgarzik@pobox.com on Thu, Jun 12, 2003 at 21:49:26 +0200 X-Mailer: Balsa 2.0.11 Lines: 20 X-archive-position: 3375 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jamagallon@able.es Precedence: bulk X-list: netdev On 06.12, Jeff Garzik wrote: > > BK users may issue a > > bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4 > > Others may download the patch from > > ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-rc8-netdrvr2.patch.bz2 > Any info about the RX_POLLING (NAPI) option for e1000 ? What is that for ? -- J.A. Magallon \ Software is like sex: werewolf.able.es \ It's better when it's free Mandrake Linux release 9.2 (Cooker) for i586 Linux 2.4.21-jam1 (gcc 3.3 (Mandrake Linux 9.2 3.3-1mdk)) From davem@redhat.com Tue Jun 17 15:45:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:45:14 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMjA2x003750 for ; Tue, 17 Jun 2003 15:45:10 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA05435; Tue, 17 Jun 2003 15:40:38 -0700 Date: Tue, 17 Jun 2003 15:40:37 -0700 (PDT) Message-Id: <20030617.154037.78070671.davem@redhat.com> To: jamagallon@able.es Cc: jgarzik@pobox.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCHES] 2.4.x net driver updates From: "David S. 
Miller" In-Reply-To: <20030617222750.GE13990@werewolf.able.es> References: <20030612194926.GA7653@gtf.org> <20030617222750.GE13990@werewolf.able.es> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3376 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "J.A. Magallon" Date: Wed, 18 Jun 2003 00:27:50 +0200 Any info about the RX_POLLING (NAPI) option for e1000 ? What is that for ? Software based interrupt mitigation, see: Documentation/networking/NAPI_HOWTO.txt and more specifically: http://www.cyberus.ca/~hadi/usenix-paper.tgz From sim@netnation.com Tue Jun 17 15:50:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 15:51:10 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HMoe2x004095 for ; Tue, 17 Jun 2003 15:50:40 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19SPHg-0002Ed-7C; Tue, 17 Jun 2003 15:50:36 -0700 Date: Tue, 17 Jun 2003 15:50:36 -0700 From: Simon Kirby To: "David S. 
Miller" Cc: Robert Olsson , ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests Message-ID: <20030617225036.GG25773@netnation.com> References: <20030616.160856.35828947.davem@redhat.com> <20030616232750.GD18484@netnation.com> <20030616234937.GE18484@netnation.com> <20030617.085921.28790392.davem@redhat.com> <16111.18107.699689.704597@robur.slu.se> <20030617200721.GA25773@netnation.com> <20030617210700.GE25773@netnation.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030617210700.GE25773@netnation.com> User-Agent: Mutt/1.5.4i X-archive-position: 3377 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Tue, Jun 17, 2003 at 02:07:01PM -0700, Simon Kirby wrote: > 60.0104 seconds passed, avg forwarding rate: 259027.200 pps > > This is with standard juno (pseudo-random sources). > > This is with CONFIG_IP_MULTIPLE_TABLES still on, too. 
Here is with CONFIG_IP_MULTIPLE_TABLES=n and CONFIG_NETFILTER=n (rp_filter off and CONFIG_SMP=n in all tests):

60.0050 seconds passed, avg forwarding rate: 276893.102 pps
60.0046 seconds passed, avg forwarding rate: 257257.533 pps
60.0101 seconds passed, avg forwarding rate: 251852.843 pps
60.0106 seconds passed, avg forwarding rate: 248110.756 pps
60.0045 seconds passed, avg forwarding rate: 246280.066 pps

"rtstat -i 1" shows the pps rate decreasing because of the growing rtcache:

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
16688 18 294882 0 0 0 0 0 0 0 0 294882 294880 2 0
16834 24 302376 0 0 0 0 0 0 0 0 302376 302374 2 0
16970 12 294288 0 0 0 0 0 0 0 0 294288 294286 2 0
17037 21 294278 0 0 0 0 0 0 0 0 294278 294276 1 0
17133 20 293080 0 0 0 0 0 0 0 0 293080 293078 1 0
17195 22 293978 0 0 0 0 0 0 0 0 293978 293976 1 0
17293 16 292184 0 0 0 0 0 0 0 0 292184 292182 1 0
17370 19 293681 0 0 0 0 0 0 0 0 293681 293679 1 0
17450 21 293079 0 0 0 0 0 0 0 0 293079 293077 1 0
17542 12 293388 0 0 0 0 0 0 0 0 293388 293386 1 0
17604 16 293684 0 0 0 0 0 0 0 0 293684 293682 1 0
17676 27 294573 0 0 0 0 0 0 0 0 294573 294571 0 0
17762 18 291582 0 0 0 0 0 0 0 0 291582 291580 1 0
...
21615 17 257683 0 0 0 0 0 0 0 0 257683 257681 1 0
21641 23 257077 0 0 0 0 0 0 0 0 257077 257075 0 0
21672 23 257077 0 0 0 0 0 0 0 0 257077 257075 1 0

Profile:

vma samples % symbol name
c025ee10 9228 15.1465 fn_hash_lookup
c02379a0 4025 6.60648 ip_route_input_slow
c0236650 3321 5.45096 rt_intern_hash
c0235d60 2775 4.55478 rt_hash_code
c012d750 2622 4.30365 kmem_cache_alloc
c012d950 2338 3.83751 kmem_cache_free
c0239f50 2323 3.81288 ip_rcv
c0238060 2277 3.73738 ip_route_input
c0233ac0 2190 3.59458 eth_header
c0231b50 2119 3.47805 neigh_resolve_output
c025ca50 2074 3.40419 fib_validate_source
c023aed0 1987 3.26139 ip_forward
c0230a70 1926 3.16126 neigh_lookup
c012d830 1845 3.02831 kmalloc
c0234270 1523 2.49979 pfifo_fast_dequeue
c025dde0 1407 2.3094 fib_semantic_match
c02376c0 1354 2.2224 rt_set_nexthop
c022a730 1322 2.16988 __kfree_skb
c022a480 1248 2.04842 alloc_skb
c012da00 1229 2.01723 kfree
c0259830 1171 1.92204 inet_select_addr
c0233ca0 1169 1.91875 eth_type_trans
c02304a0 994 1.63151 dst_destroy
c0230380 976 1.60197 dst_alloc
c010c910 806 1.32294 do_gettimeofday
c02319d0 793 1.3016 neigh_hh_init
c0236380 739 1.21297 rt_garbage_collect
c0233ef0 732 1.20148 qdisc_restart
c0234200 731 1.19984 pfifo_fast_enqueue
c022e590 695 1.14075 netif_receive_skb
c022dff0 623 1.02257 dev_queue_xmit
c023d6e0 575 0.943783 ip_finish_output
c02562a0 293 0.480919 arp_hash

Full route table:

60.0054 seconds passed, avg forwarding rate: 141888.209 pps

vma samples % symbol name
c025ee10 21133 42.5588 fn_hash_lookup
c02379a0 2219 4.46874 ip_route_input_slow
c0236650 1600 3.22217 rt_intern_hash
c0235d60 1471 2.96238 rt_hash_code
c012d750 1282 2.58176 kmem_cache_alloc
c012d950 1279 2.57572 kmem_cache_free
c0259830 1253 2.52336 inet_select_addr
c0239f50 1231 2.47906 ip_rcv
c0233ac0 1214 2.44482 eth_header
c025dde0 1165 2.34614 fib_semantic_match
c0231b50 1133 2.2817 neigh_resolve_output
c0238060 1126 2.2676 ip_route_input
c025ca50 1120 2.25552 fib_validate_source
c0230a70 1041 2.09642 neigh_lookup
c023aed0 1032 2.0783 ip_forward
c012d830 1002 2.01788 kmalloc
c02376c0 762 1.53456 rt_set_nexthop
c0234270 733 1.47616 pfifo_fast_dequeue
c022a730 709 1.42782 __kfree_skb
c022a480 633 1.27477 alloc_skb
c012da00 629 1.26671 kfree
c0233ca0 623 1.25463 eth_type_trans
c02304a0 549 1.10561 dst_destroy
c0230380 519 1.04519 dst_alloc
c02319d0 447 0.900193 neigh_hh_init
c0236380 426 0.857902 rt_garbage_collect
c010c910 425 0.855889 do_gettimeofday
c022e590 395 0.795473 netif_receive_skb
c0234200 392 0.789431 pfifo_fast_enqueue
c0233ef0 381 0.767279 qdisc_restart
c022dff0 345 0.69478 dev_queue_xmit
c025de90 298 0.600129 __fib_res_prefsrc
c023d6e0 294 0.592073 ip_finish_output
c02567d0 145 0.292009 arp_bind_neighbour

Here's 2.5.72 (which seems to have all of the patches already in), empty routing table:

60.0085 seconds passed, avg forwarding rate: 166543.268 pps
60.0080 seconds passed, avg forwarding rate: 167055.912 pps
60.0051 seconds passed, avg forwarding rate: 166843.560 pps

vma samples % symbol name
c02c0020 5193 10.2685 fn_hash_lookup
c02930d0 3475 6.87139 ip_route_input_slow
c02222c0 2349 4.64486 tg3_start_xmit
c0291c10 2217 4.38385 rt_intern_hash
c02bde40 1910 3.77679 fib_validate_source
c0293900 1864 3.68583 ip_route_input
c02953d0 1646 3.25477 ip_rcv
c0288b40 1609 3.1816 netif_receive_skb
c0135210 1462 2.89093 kmem_cache_free
c0134fd0 1457 2.88104 free_block
c02216a0 1390 2.74856 tg3_rx
c0135150 1295 2.56071 kmem_cache_alloc
c02912d0 1275 2.52116 rt_hash_code
c028f040 1250 2.47172 eth_header
c0134e00 1250 2.47172 cache_alloc_refill
c028ca40 1223 2.41833 neigh_resolve_output
c028f740 1072 2.11975 pfifo_fast_dequeue
c0135190 1019 2.01495 __kmalloc
c028ba50 991 1.95958 neigh_lookup
c02bf1d0 843 1.66693 fib_semantic_match
c01adc10 816 1.61354 memcpy
c0284ba0 768 1.51863 alloc_skb
c0135250 724 1.43162 kfree
c028f1b0 702 1.38812 eth_type_trans
c02b8930 674 1.33275 inet_select_addr
c0288600 668 1.32089 dev_queue_xmit
c0297850 664 1.31298 ip_finish_output
c0221e20 654 1.29321 tg3_set_txd
c028b340 635 1.25564 dst_alloc
c01289e0 632 1.2497 call_rcu
c0296370 606 1.19829 ip_forward
c0297af0 567 1.12117 ip_output
c028c8c0 547 1.08163 neigh_hh_init
c028b470 547 1.08163 dst_destroy
c011f080 489 0.966938 local_bh_enable
c0221550 477 0.94321 tg3_recycle_rx

Erp, needed a new rtstat. And a wider console, apparently:

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
22233 18 329638 0 0 0 0 0 0 0 0 329638 329634 2 0 665742 0
20523 22 329074 0 0 0 0 0 0 0 0 329074 329070 2 0 665184 0
23510 26 331502 0 0 0 0 0 0 0 0 331502 331498 2 0 671610 0
22552 24 330464 0 0 0 0 0 0 0 0 330464 330460 4 0 669214 0
20359 8 329512 0 0 0 0 0 0 0 0 329512 329508 2 0 664428 0
19965 22 330090 0 0 0 0 0 0 0 0 330090 330086 2 0 663296 0
20081 20 332660 0 0 0 0 0 0 0 0 332660 332656 2 0 671912 0
21113 22 330458 0 0 0 0 0 0 0 0 330458 330454 2 0 666340 0
19864 14 329778 0 0 0 0 0 0 0 0 329778 329774 2 0 667324 0
20195 18 329702 0 0 0 0 0 0 0 0 329702 329698 2 0 670646 0

Route cache size does not increase on 2.5, so the problems in 2.4 are probably the result of me hacking in the 2.5 patch.
2.5.72, full routing table:

60.0057 seconds passed, avg forwarding rate: 101800.795 pps
60.0045 seconds passed, avg forwarding rate: 101612.797 pps
60.0046 seconds passed, avg forwarding rate: 102004.873 pps
60.0044 seconds passed, avg forwarding rate: 102042.629 pps
60.0055 seconds passed, avg forwarding rate: 102135.224 pps
60.0057 seconds passed, avg forwarding rate: 102158.546 pps
60.0044 seconds passed, avg forwarding rate: 102200.430 pps

vma samples % symbol name
c02c0020 14206 33.0911 fn_hash_lookup
c02930d0 2103 4.89867 ip_route_input_slow
c02222c0 1436 3.34498 tg3_start_xmit
c0291c10 1328 3.09341 rt_intern_hash
c02bde40 1315 3.06313 fib_validate_source
c0293900 1122 2.61356 ip_route_input
c02953d0 1028 2.3946 ip_rcv
c02bf1d0 1013 2.35966 fib_semantic_match
c0288b40 957 2.22921 netif_receive_skb
c0134fd0 840 1.95667 free_block
c0135210 823 1.91707 kmem_cache_free
c02b8930 811 1.88912 inet_select_addr
c02216a0 804 1.87282 tg3_rx
c028ca40 801 1.86583 neigh_resolve_output
c02912d0 786 1.83089 rt_hash_code
c028f040 709 1.65153 eth_header
c0135190 700 1.63056 __kmalloc
c0134e00 692 1.61193 cache_alloc_refill
c028f740 644 1.50012 pfifo_fast_dequeue
c028ba50 597 1.39064 neigh_lookup
c0135150 591 1.37666 kmem_cache_alloc
c0284ba0 539 1.25553 alloc_skb
c01adc10 491 1.14372 memcpy
c0135250 449 1.04589 kfree
c028b340 437 1.01794 dst_alloc
c028f1b0 433 1.00862 eth_type_trans
c0297af0 422 0.982996 ip_output
c01289e0 402 0.936408 call_rcu
c0297850 400 0.931749 ip_finish_output
c0296370 386 0.899138 ip_forward
c0221e20 384 0.894479 tg3_set_txd
c0288600 365 0.850221 dev_queue_xmit
c011f080 364 0.847892 local_bh_enable
c02919c0 356 0.829257 rt_garbage_collect
c028c8c0 317 0.738411 neigh_hh_init
c028b470 313 0.729094 dst_destroy
c02212e0 299 0.696483 tg3_tx

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
18755 12 202740 0 0 0 0 0 0 0 0 202740 202736 2 0 405030 0
19945 20 203884 0 0 0 0 0 0 0 0 203884 203880 2 0 406726 0
18449 8 204152 0 0 0 0 0 0 0 0 204152 204148 0 0 409590 0
19637 10 205302 0 0 0 0 0 0 0 0 205302 205298 2 0 413004 0
19213 10 204022 0 0 0 0 0 0 0 0 204022 204018 2 0 411092 0
20182 8 204280 0 0 0 0 0 0 0 0 204280 204276 2 0 412044 0
19311 14 203378 0 0 0 0 0 0 0 0 203378 203374 2 0 411052 0
18790 16 202480 0 0 0 0 0 0 0 0 202480 202476 2 0 409440 0
18835 24 204776 0 0 0 0 0 0 0 0 204776 204772 0 0 414416 0
19830 8 204792 0 0 0 0 0 0 0 0 204792 204788 2 0 415514 0

Simon-

From davem@redhat.com Tue Jun 17 16:12:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 16:12:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HNCB2x005168 for ; Tue, 17 Jun 2003 16:12:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA05599; Tue, 17 Jun 2003 16:07:21 -0700 Date: Tue, 17 Jun 2003 16:07:21 -0700 (PDT) Message-Id: <20030617.160721.50340210.davem@redhat.com> To: sim@netnation.com Cc: Robert.Olsson@data.slu.se, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance tests From: "David S. Miller" In-Reply-To: <20030617225036.GG25773@netnation.com> References: <20030617200721.GA25773@netnation.com> <20030617210700.GE25773@netnation.com> <20030617225036.GG25773@netnation.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3378 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Tue, 17 Jun 2003 15:50:36 -0700 so the problems in 2.4 are probably the result of me hacking in the 2.5 patch. I have them in my pending 2.4.x tree, try this: diff -Nru a/include/net/route.h b/include/net/route.h --- a/include/net/route.h Tue Jun 17 16:08:06 2003 +++ b/include/net/route.h Tue Jun 17 16:08:06 2003 @@ -114,6 +114,8 @@ unsigned int gc_ignored; unsigned int gc_goal_miss; unsigned int gc_dst_overflow; + unsigned int in_hlist_search; + unsigned int out_hlist_search; } ____cacheline_aligned_in_smp; extern struct ip_rt_acct *ip_rt_acct; diff -Nru a/net/ipv4/Config.in b/net/ipv4/Config.in --- a/net/ipv4/Config.in Tue Jun 17 16:08:06 2003 +++ b/net/ipv4/Config.in Tue Jun 17 16:08:06 2003 @@ -14,7 +14,6 @@ bool ' IP: equal cost multipath' CONFIG_IP_ROUTE_MULTIPATH bool ' IP: use TOS value as routing key' CONFIG_IP_ROUTE_TOS bool ' IP: verbose route monitoring' CONFIG_IP_ROUTE_VERBOSE - bool ' IP: large routing tables' CONFIG_IP_ROUTE_LARGE_TABLES fi bool ' IP: kernel level autoconfiguration' CONFIG_IP_PNP if [ "$CONFIG_IP_PNP" = "y" ]; then diff -Nru a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c --- a/net/ipv4/fib_hash.c Tue Jun 17 16:08:07 2003 +++ b/net/ipv4/fib_hash.c Tue Jun 17 16:08:07 2003 @@ -89,7 +89,7 @@ int fz_nent; /* Number of entries */ int fz_divisor; /* Hash divisor */ - u32 fz_hashmask; /* (1<fz_hashmask) int fz_order; /* Zone order */ @@ -149,9 +149,19 @@ static rwlock_t fib_hash_lock = RW_LOCK_UNLOCKED; -#define FZ_MAX_DIVISOR 1024 +#define FZ_MAX_DIVISOR ((PAGE_SIZE< FZ_MAX_DIVISOR) { + printk(KERN_CRIT "route.c: bad divisor %d!\n", old_divisor); + return; + } + new_divisor 
= (old_divisor << 1); + break; } + + new_hashmask = (new_divisor - 1); + #if RT_CACHE_DEBUG >= 2 printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor); #endif - ht = kmalloc(new_divisor*sizeof(struct fib_node*), GFP_KERNEL); + ht = fz_hash_alloc(new_divisor); if (ht) { memset(ht, 0, new_divisor*sizeof(struct fib_node*)); + write_lock_bh(&fib_hash_lock); old_ht = fz->fz_hash; fz->fz_hash = ht; @@ -210,10 +235,10 @@ fz->fz_divisor = new_divisor; fn_rebuild_zone(fz, old_ht, old_divisor); write_unlock_bh(&fib_hash_lock); - kfree(old_ht); + + fz_hash_free(old_ht, old_divisor); } } -#endif /* CONFIG_IP_ROUTE_LARGE_TABLES */ static void fn_free_node(struct fib_node * f) { @@ -233,12 +258,11 @@ memset(fz, 0, sizeof(struct fn_zone)); if (z) { fz->fz_divisor = 16; - fz->fz_hashmask = 0xF; } else { fz->fz_divisor = 1; - fz->fz_hashmask = 0; } - fz->fz_hash = kmalloc(fz->fz_divisor*sizeof(struct fib_node*), GFP_KERNEL); + fz->fz_hashmask = (fz->fz_divisor - 1); + fz->fz_hash = fz_hash_alloc(fz->fz_divisor); if (!fz->fz_hash) { kfree(fz); return NULL; @@ -467,12 +491,10 @@ if ((fi = fib_create_info(r, rta, n, &err)) == NULL) return err; -#ifdef CONFIG_IP_ROUTE_LARGE_TABLES - if (fz->fz_nent > (fz->fz_divisor<<2) && + if (fz->fz_nent > (fz->fz_divisor<<1) && fz->fz_divisor < FZ_MAX_DIVISOR && (z==32 || (1< fz->fz_divisor)) fn_rehash_zone(fz); -#endif fp = fz_chain_p(key, fz); diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c --- a/net/ipv4/route.c Tue Jun 17 16:08:07 2003 +++ b/net/ipv4/route.c Tue Jun 17 16:08:07 2003 @@ -108,7 +108,7 @@ int ip_rt_max_size; int ip_rt_gc_timeout = RT_GC_TIMEOUT; int ip_rt_gc_interval = 60 * HZ; -int ip_rt_gc_min_interval = 5 * HZ; +int ip_rt_gc_min_interval = HZ / 2; int ip_rt_redirect_number = 9; int ip_rt_redirect_load = HZ / 50; int ip_rt_redirect_silence = ((HZ / 50) << (9 + 1)); @@ -287,7 +287,7 @@ for (lcpu = 0; lcpu < smp_num_cpus; lcpu++) { i = cpu_logical_map(lcpu); - len += sprintf(buffer+len, "%08x %08x 
%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n", + len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n", dst_entries, rt_cache_stat[i].in_hit, rt_cache_stat[i].in_slow_tot, @@ -304,7 +304,9 @@ rt_cache_stat[i].gc_total, rt_cache_stat[i].gc_ignored, rt_cache_stat[i].gc_goal_miss, - rt_cache_stat[i].gc_dst_overflow + rt_cache_stat[i].gc_dst_overflow, + rt_cache_stat[i].in_hlist_search, + rt_cache_stat[i].out_hlist_search ); } @@ -344,16 +346,17 @@ rth->u.dst.expires; } -static __inline__ int rt_may_expire(struct rtable *rth, int tmo1, int tmo2) +static __inline__ int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2) { - int age; + unsigned long age; int ret = 0; if (atomic_read(&rth->u.dst.__refcnt)) goto out; ret = 1; - if (rth->u.dst.expires && (long)(rth->u.dst.expires - jiffies) <= 0) + if (rth->u.dst.expires && + time_after_eq(jiffies, rth->u.dst.expires)) goto out; age = jiffies - rth->u.dst.lastuse; @@ -365,6 +368,25 @@ out: return ret; } +/* Bits of score are: + * 31: very valuable + * 30: not quite useless + * 29..0: usage counter + */ +static inline u32 rt_score(struct rtable *rt) +{ + u32 score = rt->u.dst.__use; + + if (rt_valuable(rt)) + score |= (1<<31); + + if (!rt->key.iif || + !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL))) + score |= (1<<30); + + return score; +} + /* This runs via a timer and thus is always in BH context. 
*/ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy) { @@ -375,7 +397,7 @@ for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; t -= ip_rt_gc_timeout) { - unsigned tmo = ip_rt_gc_timeout; + unsigned long tmo = ip_rt_gc_timeout; i = (i + 1) & rt_hash_mask; rthp = &rt_hash_table[i].chain; @@ -384,7 +406,7 @@ while ((rth = *rthp) != NULL) { if (rth->u.dst.expires) { /* Entry is expired even if it is in use */ - if ((long)(now - rth->u.dst.expires) <= 0) { + if (time_before_eq(now, rth->u.dst.expires)) { tmo >>= 1; rthp = &rth->u.rt_next; continue; @@ -402,7 +424,7 @@ write_unlock(&rt_hash_table[i].lock); /* Fallback loop breaker. */ - if ((jiffies - now) > 0) + if (time_after(jiffies, now)) break; } rover = i; @@ -504,7 +526,7 @@ static int rt_garbage_collect(void) { - static unsigned expire = RT_GC_TIMEOUT; + static unsigned long expire = RT_GC_TIMEOUT; static unsigned long last_gc; static int rover; static int equilibrium; @@ -556,7 +578,7 @@ int i, k; for (i = rt_hash_mask, k = rover; i >= 0; i--) { - unsigned tmo = expire; + unsigned long tmo = expire; k = (k + 1) & rt_hash_mask; rthp = &rt_hash_table[k].chain; @@ -602,7 +624,7 @@ if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size) goto out; - } while (!in_softirq() && jiffies - now < 1); + } while (!in_softirq() && time_before_eq(jiffies, now)); if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size) goto out; @@ -626,10 +648,19 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp) { struct rtable *rth, **rthp; - unsigned long now = jiffies; + unsigned long now; + struct rtable *cand, **candp; + u32 min_score; + int chain_length; int attempts = !in_softirq(); restart: + chain_length = 0; + min_score = ~(u32)0; + cand = NULL; + candp = NULL; + now = jiffies; + rthp = &rt_hash_table[hash].chain; write_lock_bh(&rt_hash_table[hash].lock); @@ -650,9 +681,35 @@ return 0; } + if (!atomic_read(&rth->u.dst.__refcnt)) { + u32 score = rt_score(rth); + + if (score <= min_score) 
{ + cand = rth; + candp = rthp; + min_score = score; + } + } + + chain_length++; + rthp = &rth->u.rt_next; } + if (cand) { + /* ip_rt_gc_elasticity used to be average length of chain + * length, when exceeded gc becomes really aggressive. + * + * The second limit is less certain. At the moment it allows + * only 2 entries per bucket. We will see. + */ + if (chain_length > ip_rt_gc_elasticity || + (chain_length > 1 && !(min_score & (1<<31)))) { + *candp = cand->u.rt_next; + rt_free(cand); + } + } + /* Try to bind route to arp only if it is output route or unicast forwarding path. */ @@ -960,7 +1017,7 @@ /* No redirected packets during ip_rt_redirect_silence; * reset the algorithm. */ - if (jiffies - rt->u.dst.rate_last > ip_rt_redirect_silence) + if (time_after(jiffies, rt->u.dst.rate_last + ip_rt_redirect_silence)) rt->u.dst.rate_tokens = 0; /* Too many ignored redirects; do not send anything @@ -974,8 +1031,9 @@ /* Check for load limit; set rate_last to the latest sent * redirect. */ - if (jiffies - rt->u.dst.rate_last > - (ip_rt_redirect_load << rt->u.dst.rate_tokens)) { + if (time_after(jiffies, + (rt->u.dst.rate_last + + (ip_rt_redirect_load << rt->u.dst.rate_tokens)))) { icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway); rt->u.dst.rate_last = jiffies; ++rt->u.dst.rate_tokens; @@ -1672,6 +1730,7 @@ skb->dst = (struct dst_entry*)rth; return 0; } + rt_cache_stat[smp_processor_id()].in_hlist_search++; } read_unlock(&rt_hash_table[hash].lock); @@ -2032,6 +2091,7 @@ *rp = rth; return 0; } + rt_cache_stat[smp_processor_id()].out_hlist_search++; } read_unlock_bh(&rt_hash_table[hash].lock); From shemminger@osdl.org Tue Jun 17 16:31:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 16:31:37 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HNVR2x005852 for ; Tue, 17 Jun 2003 16:31:29 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by 
mail.osdl.org (8.11.6/8.11.6) with SMTP id h5HNVGX29621; Tue, 17 Jun 2003 16:31:17 -0700 Date: Tue, 17 Jun 2003 16:31:16 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.72] Red Creek VPN update Message-Id: <20030617163116.1f7b6d78.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3379 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Some cleanups to the Red Creek VPN driver. - use alloc_etherdev instead of init_etherdev - don't drive the module ref count negative in case of error. I don't have the hardware to test the real thing, but the driver builds and loads fine. diff -Nru a/drivers/net/rcpci45.c b/drivers/net/rcpci45.c --- a/drivers/net/rcpci45.c Tue Jun 17 15:25:17 2003 +++ b/drivers/net/rcpci45.c Tue Jun 17 15:25:17 2003 @@ -171,13 +171,14 @@ * will be assigned to the LAN API layer. 
*/ - dev = init_etherdev (NULL, sizeof (*pDpa)); + dev = alloc_etherdev(sizeof(*pDpa)); if (!dev) { printk (KERN_ERR - "(rcpci45 driver:) init_etherdev alloc failed\n"); + "(rcpci45 driver:) alloc_etherdev alloc failed\n"); error = -ENOMEM; goto err_out; } + SET_MODULE_OWNER(dev); SET_NETDEV_DEV(dev, &pdev->dev); @@ -257,6 +258,9 @@ dev->do_ioctl = &RCioctl; dev->set_config = &RCconfig; + if ((error = register_netdev(dev))) + goto err_out_free_region; + return 0; /* success */ err_out_free_region: @@ -265,7 +269,6 @@ pci_free_consistent (pdev, MSG_BUF_SIZE, pDpa->msgbuf, pDpa->msgbuf_dma); err_out_free_dev: - unregister_netdev (dev); kfree (dev); err_out: card_idx--; @@ -717,11 +720,9 @@ if (retry > REBOOT_REINIT_RETRY_LIMIT) { printk (KERN_WARNING "%s unable to reinitialize adapter after reboot\n", dev->name); - printk (KERN_WARNING "%s decrementing driver and closing interface\n", dev->name); + printk (KERN_WARNING "%s shutting down interface\n", dev->name); RCDisableI2OInterrupts (dev); dev->flags &= ~IFF_UP; - MOD_DEC_USE_COUNT; - /* FIXME: kill MOD_DEC_USE_COUNT, use dev_put */ } else { printk (KERN_INFO "%s: rescheduling timer...\n", dev->name); From shemminger@osdl.org Tue Jun 17 16:32:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 16:32:40 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5HNWZ2x006043 for ; Tue, 17 Jun 2003 16:32:36 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5HNWTX30412; Tue, 17 Jun 2003 16:32:29 -0700 Date: Tue, 17 Jun 2003 16:32:28 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.72] Eliminate bogus function in Red Creek VPN Message-Id: <20030617163228.34d3daa4.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: 
&@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3380 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev The following function is defined but never used. diff -Nru a/drivers/net/rcpci45.c b/drivers/net/rcpci45.c --- a/drivers/net/rcpci45.c Tue Jun 17 15:25:28 2003 +++ b/drivers/net/rcpci45.c Tue Jun 17 15:25:28 2003 @@ -537,17 +537,6 @@ (PFNCALLBACK) RCreset_callback); } -int -broadcast_packet (unsigned char *address) -{ - int i; - for (i = 0; i < 6; i++) - if (address[i] != 0xff) - return 0; - - return 1; -} - /* * RCrecv_callback() * From acme@conectiva.com.br Tue Jun 17 18:30:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 18:31:01 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5I1UJ2x014035 for ; Tue, 17 Jun 2003 18:30:20 -0700 Received: from [200.181.170.138] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19SRnF-0005yK-00; Tue, 17 Jun 2003 22:31:23 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 2E32B1966C; Wed, 18 Jun 2003 01:32:15 +0000 (UTC) Date: Tue, 17 Jun 2003 22:32:15 -0300 From: Arnaldo Carvalho de Melo To: "David S. Miller" Cc: Linux Networking Development Mailing List Subject: [RFC] taking advantage of sk_{add,del}_node Message-ID: <20030618013214.GC17694@conectiva.com.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. 
User-Agent: Mutt/1.5.4i X-archive-position: 3381 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev Hi, While doing the hlist work I noticed that it may be a good idea to have the sock_{hold,put} done automatically in most (all?) sk_{add,del}_node calls: since we're adding the sock to a list, we must hold a reference. Please take a look at this (untested, with some issues) patch and comment on it. With this, protocols that don't do any refcounting at all will suck less, and it becomes easier to fill the gap. My idea for the next step is a sk_search(list, callback_cmp_func), done as a macro with callback_cmp_func as an inline, that would do the sock_hold on finding the sock when callback_cmp_func returns true (think of a variation on the bsearch(3) idea...). With those thingies in place writing a protocol will be a piece of cake and we'll have fewer differences among network family implementations 8) Thanks, - Arnaldo ===== include/net/sock.h 1.45 vs edited ===== --- 1.45/include/net/sock.h Tue Jun 17 13:35:27 2003 +++ edited/include/net/sock.h Tue Jun 17 22:01:57 2003 @@ -289,7 +289,7 @@ node->pprev = NULL; } -static __inline__ int sk_del_node_init(struct sock *sk) +static __inline__ int __sk_del_node_init(struct sock *sk) { if (sk_hashed(sk)) { __hlist_del(&sk->sk_node); @@ -299,8 +299,23 @@ return 0; } +static __inline__ int sk_del_node_init(struct sock *sk) +{ + int rc = __sk_del_node_init(sk); + + if (rc) + __sock_put(sk); + return rc; +} + +static __inline__ void __sk_add_node(struct sock *sk, struct hlist_head *list) +{ + hlist_add_head(&sk->sk_node, list); +} + static __inline__ void sk_add_node(struct sock *sk, struct hlist_head *list) { + sock_hold(sk); hlist_add_head(&sk->sk_node, list); } ===== net/bluetooth/af_bluetooth.c 1.20 vs edited ===== --- 1.20/net/bluetooth/af_bluetooth.c Mon Jun 16 
12:11:36 2003 +++ edited/net/bluetooth/af_bluetooth.c Tue Jun 17 22:03:33 2003 @@ -143,15 +143,13 @@ { write_lock_bh(&l->lock); sk_add_node(sk, &l->head); - sock_hold(sk); write_unlock_bh(&l->lock); } void bt_sock_unlink(struct bt_sock_list *l, struct sock *sk) { write_lock_bh(&l->lock); - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); write_unlock_bh(&l->lock); } ===== net/decnet/af_decnet.c 1.29 vs edited ===== --- 1.29/net/decnet/af_decnet.c Mon Jun 16 12:11:36 2003 +++ edited/net/decnet/af_decnet.c Tue Jun 17 22:04:17 2003 @@ -276,7 +276,7 @@ return; write_lock_bh(&dn_hash_lock); - hlist_del(&sk->sk_node); + sk_del_node_init(sk); DN_SK(sk)->addrloc = 0; list = listen_hash(&DN_SK(sk)->addr); sk_add_node(sk, list); ===== net/econet/af_econet.c 1.21 vs edited ===== --- 1.21/net/econet/af_econet.c Mon Jun 16 12:11:36 2003 +++ edited/net/econet/af_econet.c Tue Jun 17 22:04:31 2003 @@ -96,8 +96,7 @@ static void econet_remove_socket(struct hlist_head *list, struct sock *sk) { write_lock_bh(&econet_lock); - if (sk_del_node_init(sk)) - sock_put(sk); + sk_del_node_init(sk); write_unlock_bh(&econet_lock); } @@ -105,7 +104,6 @@ { write_lock_bh(&econet_lock); sk_add_node(sk, list); - sock_hold(sk); write_unlock_bh(&econet_lock); } ===== net/ipv4/raw.c 1.34 vs edited ===== --- 1.34/net/ipv4/raw.c Mon Jun 16 12:11:36 2003 +++ edited/net/ipv4/raw.c Tue Jun 17 22:04:38 2003 @@ -91,17 +91,14 @@ write_lock_bh(&raw_v4_lock); sk_add_node(sk, head); sock_prot_inc_use(sk->sk_prot); - sock_hold(sk); write_unlock_bh(&raw_v4_lock); } static void raw_v4_unhash(struct sock *sk) { write_lock_bh(&raw_v4_lock); - if (sk_del_node_init(sk)) { + if (sk_del_node_init(sk)) sock_prot_dec_use(sk->sk_prot); - __sock_put(sk); - } write_unlock_bh(&raw_v4_lock); } ===== net/ipv4/tcp_ipv4.c 1.63 vs edited ===== --- 1.63/net/ipv4/tcp_ipv4.c Tue Jun 17 13:35:27 2003 +++ edited/net/ipv4/tcp_ipv4.c Tue Jun 17 22:05:38 2003 @@ -359,7 +359,7 @@ lock = 
&tcp_ehash[sk->sk_hashent].lock; write_lock(lock); } - sk_add_node(sk, list); + __sk_add_node(sk, list); sock_prot_inc_use(sk->sk_prot); write_unlock(lock); if (listen_possible && sk->sk_state == TCP_LISTEN) @@ -392,7 +392,7 @@ write_lock_bh(&head->lock); } - if (sk_del_node_init(sk)) + if (__sk_del_node_init(sk)) sock_prot_dec_use(sk->sk_prot); write_unlock_bh(lock); @@ -608,7 +608,7 @@ inet->sport = htons(lport); sk->sk_hashent = hash; BUG_TRAP(sk_unhashed(sk)); - sk_add_node(sk, &head->chain); + __sk_add_node(sk, &head->chain); sock_prot_inc_use(sk->sk_prot); write_unlock(&head->lock); ===== net/ipv4/udp.c 1.43 vs edited ===== --- 1.43/net/ipv4/udp.c Mon Jun 16 12:11:36 2003 +++ edited/net/ipv4/udp.c Tue Jun 17 22:04:45 2003 @@ -189,7 +189,6 @@ sk_add_node(sk, h); sock_prot_inc_use(sk->sk_prot); - sock_hold(sk); } write_unlock_bh(&udp_hash_lock); return 0; @@ -210,7 +209,6 @@ if (sk_del_node_init(sk)) { inet_sk(sk)->num = 0; sock_prot_dec_use(sk->sk_prot); - __sock_put(sk); } write_unlock_bh(&udp_hash_lock); } ===== net/ipv6/raw.c 1.31 vs edited ===== --- 1.31/net/ipv6/raw.c Mon Jun 16 12:11:36 2003 +++ edited/net/ipv6/raw.c Tue Jun 17 22:06:08 2003 @@ -64,17 +64,14 @@ write_lock_bh(&raw_v6_lock); sk_add_node(sk, list); sock_prot_inc_use(sk->sk_prot); - sock_hold(sk); write_unlock_bh(&raw_v6_lock); } static void raw_v6_unhash(struct sock *sk) { write_lock_bh(&raw_v6_lock); - if (sk_del_node_init(sk)) { + if (sk_del_node_init(sk)) sock_prot_dec_use(sk->sk_prot); - __sock_put(sk); - } write_unlock_bh(&raw_v6_lock); } ===== net/ipv6/udp.c 1.41 vs edited ===== --- 1.41/net/ipv6/udp.c Mon Jun 16 12:24:58 2003 +++ edited/net/ipv6/udp.c Tue Jun 17 22:06:13 2003 @@ -160,7 +160,6 @@ if (sk_unhashed(sk)) { sk_add_node(sk, &udp_hash[snum & (UDP_HTABLE_SIZE - 1)]); sock_prot_inc_use(sk->sk_prot); - sock_hold(sk); } write_unlock_bh(&udp_hash_lock); return 0; @@ -181,7 +180,6 @@ if (sk_del_node_init(sk)) { inet_sk(sk)->num = 0; sock_prot_dec_use(sk->sk_prot); - 
__sock_put(sk); } write_unlock_bh(&udp_hash_lock); } ===== net/ipx/af_ipx.c 1.38 vs edited ===== --- 1.38/net/ipx/af_ipx.c Tue Jun 17 14:11:58 2003 +++ edited/net/ipx/af_ipx.c Tue Jun 17 22:06:30 2003 @@ -142,7 +142,6 @@ spin_lock_bh(&intrfc->if_sklist_lock); sk_del_node_init(sk); spin_unlock_bh(&intrfc->if_sklist_lock); - sock_put(sk); ipxitf_put(intrfc); out: return; @@ -229,7 +228,6 @@ static void ipxitf_insert_socket(struct ipx_interface *intrfc, struct sock *sk) { ipxitf_hold(intrfc); - sock_hold(sk); spin_lock_bh(&intrfc->if_sklist_lock); ipx_sk(sk)->intrfc = intrfc; sk_add_node(sk, &intrfc->if_sklist); ===== net/key/af_key.c 1.42 vs edited ===== --- 1.42/net/key/af_key.c Mon Jun 16 12:11:36 2003 +++ edited/net/key/af_key.c Tue Jun 17 22:06:34 2003 @@ -115,15 +115,13 @@ { pfkey_table_grab(); sk_add_node(sk, &pfkey_table); - sock_hold(sk); pfkey_table_ungrab(); } static void pfkey_remove(struct sock *sk) { pfkey_table_grab(); - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); pfkey_table_ungrab(); } ===== net/llc/llc_sap.c 1.20 vs edited ===== --- 1.20/net/llc/llc_sap.c Mon Jun 16 12:11:36 2003 +++ edited/net/llc/llc_sap.c Tue Jun 17 22:06:40 2003 @@ -35,7 +35,6 @@ write_lock_bh(&sap->sk_list.lock); llc_sk(sk)->sap = sap; sk_add_node(sk, &sap->sk_list.list); - sock_hold(sk); write_unlock_bh(&sap->sk_list.lock); } @@ -50,8 +49,7 @@ void llc_sap_unassign_sock(struct llc_sap *sap, struct sock *sk) { write_lock_bh(&sap->sk_list.lock); - if (sk_del_node_init(sk)) - sock_put(sk); + sk_del_node_init(sk); write_unlock_bh(&sap->sk_list.lock); } ===== net/netlink/af_netlink.c 1.28 vs edited ===== --- 1.28/net/netlink/af_netlink.c Mon Jun 16 12:11:36 2003 +++ edited/net/netlink/af_netlink.c Tue Jun 17 22:06:45 2003 @@ -193,7 +193,6 @@ if (nlk_sk(sk)->pid == 0) { nlk_sk(sk)->pid = pid; sk_add_node(sk, &nl_table[sk->sk_protocol]); - sock_hold(sk); err = 0; } } @@ -204,8 +203,7 @@ static void netlink_remove(struct sock *sk) { netlink_table_grab(); - if 
(sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); netlink_table_ungrab(); } ===== net/packet/af_packet.c 1.30 vs edited ===== --- 1.30/net/packet/af_packet.c Mon Jun 16 12:11:36 2003 +++ edited/net/packet/af_packet.c Tue Jun 17 22:06:51 2003 @@ -758,8 +758,7 @@ return 0; write_lock_bh(&packet_sklist_lock); - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); write_unlock_bh(&packet_sklist_lock); /* @@ -984,7 +983,6 @@ write_lock_bh(&packet_sklist_lock); sk_add_node(sk, &packet_sklist); - sock_hold(sk); write_unlock_bh(&packet_sklist_lock); return(0); ===== net/unix/af_unix.c 1.48 vs edited ===== --- 1.48/net/unix/af_unix.c Mon Jun 16 16:46:51 2003 +++ edited/net/unix/af_unix.c Tue Jun 17 22:06:58 2003 @@ -211,15 +211,13 @@ static void __unix_remove_socket(struct sock *sk) { - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); } static void __unix_insert_socket(struct hlist_head *list, struct sock *sk) { BUG_TRAP(sk_unhashed(sk)); sk_add_node(sk, list); - sock_hold(sk); } static inline void unix_remove_socket(struct sock *sk) ===== net/wanrouter/af_wanpipe.c 1.27 vs edited ===== --- 1.27/net/wanrouter/af_wanpipe.c Mon Jun 16 12:11:36 2003 +++ edited/net/wanrouter/af_wanpipe.c Tue Jun 17 22:07:04 2003 @@ -982,8 +982,7 @@ set_bit(1,&wanpipe_tx_critical); write_lock(&wanpipe_sklist_lock); - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); write_unlock(&wanpipe_sklist_lock); clear_bit(1,&wanpipe_tx_critical); @@ -1143,8 +1142,7 @@ } write_lock(&wanpipe_sklist_lock); - if (sk_del_node_init(sk)) - __sock_put(sk); + sk_del_node_init(sk); write_unlock(&wanpipe_sklist_lock); @@ -1206,8 +1204,7 @@ * appropriate locks */ write_lock(&wanpipe_sklist_lock); - if (sk_del_node_init(init)) - __sock_put(sk); + sk_del_node_init(sk); write_unlock(&wanpipe_sklist_lock); sk->sk_socket = NULL; @@ -1536,7 +1533,6 @@ set_bit(1,&wanpipe_tx_critical); write_lock(&wanpipe_sklist_lock); sk_add_node(sk, &wanpipe_sklist); - 
sock_hold(sk); write_unlock(&wanpipe_sklist_lock); clear_bit(1,&wanpipe_tx_critical); @@ -2434,7 +2430,6 @@ set_bit(1,&wanpipe_tx_critical); write_lock(&wanpipe_sklist_lock); sk_add_node(newsk, &wanpipe_sklist); - sock_hold(sk); write_unlock(&wanpipe_sklist_lock); clear_bit(1,&wanpipe_tx_critical); ===== net/x25/af_x25.c 1.28 vs edited ===== --- 1.28/net/x25/af_x25.c Mon Jun 16 12:11:36 2003 +++ edited/net/x25/af_x25.c Tue Jun 17 22:07:26 2003 @@ -154,8 +154,7 @@ static void x25_remove_socket(struct sock *sk) { write_lock_bh(&x25_list_lock); - if (sk_del_node_init(sk)) - sock_put(sk); + sk_del_node_init(sk); write_unlock_bh(&x25_list_lock); } @@ -219,7 +218,6 @@ { write_lock_bh(&x25_list_lock); sk_add_node(sk, &x25_list); - sock_hold(sk); write_unlock_bh(&x25_list_lock); } From scott.feldman@intel.com Tue Jun 17 19:48:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 19:48:10 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5I2m32x015432 for ; Tue, 17 Jun 2003 19:48:04 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5I2hNl12616 for ; Wed, 18 Jun 2003 02:43:23 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by talaria.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5I2mZj14666 for ; Wed, 18 Jun 2003 02:48:35 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061719480212244 ; Tue, 17 Jun 2003 19:48:02 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 17 Jun 2003 19:48:03 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; 
charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: e100-3.0.0_dev8 "Minneapolis Moline" release Date: Tue, 17 Jun 2003 19:48:02 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e100-3.0.0_dev8 "Minneapolis Moline" release Thread-Index: AcM1RAkZqhs6qkm0T0SZTNGqRepA2Q== From: "Feldman, Scott" To: , X-OriginalArrivalTime: 18 Jun 2003 02:48:03.0120 (UTC) FILETIME=[09934300:01C33544] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5I2m32x015432 X-archive-position: 3382 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev [Someone once suggested e100 as the first nominee into the driver hall of shame. I'd like to revoke that nomination with this rewrite of e100]. DON'T USE THIS DRIVER ON A PRODUCTION SYSTEM! http://sf.net/projects/e1000, download e100-3.0.0_dev8 (tar file or kernel patches). Your help in testing would be greatly appreciated. There are many 8255x devices supported by e100, so hopefully we'll get good coverage from the community. Also, any feedback on the code correctness, maintainability, credits, etc. would help. We're really motivated in getting this driver as perfect as possible. It's about 2200 lines, down from about 9000 before, so hopefully that helps. :) It does NAPI, ethtool (probably the most ethtool-compliant driver out there), and MII ioctl. The only module parameter is "debug", used to set the initial message level. All other driver settings must be done via ethtool. DON'T USE THIS DRIVER ON A PRODUCTION SYSTEM! Thanks to Jeff and Arjan for great feedback and encouragement. 
-scott From pekkas@netcore.fi Tue Jun 17 22:52:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 22:52:30 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5I5q22x019322 for ; Tue, 17 Jun 2003 22:52:03 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5I5or704805; Wed, 18 Jun 2003 08:50:53 +0300 Date: Wed, 18 Jun 2003 08:50:53 +0300 (EEST) From: Pekka Savola To: Simon Kirby cc: "David S. Miller" , , , , , , , , Subject: Re: Route cache performance tests In-Reply-To: <20030617205101.GD25773@netnation.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3383 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Tue, 17 Jun 2003, Simon Kirby wrote: > On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote: > > > I have no idea why they do this, it's the stupidest thing > > you can possibly do by default. > > > > If we thought it was a good idea to turn this on by default > > we would have done so in the kernel. > > > > Does anyone have some cycles to spare to try and urge whoever is > > responsible for this in Debian to leave the kernel's default setting > > alone? > > Sure, I can do this. But why is this stupid? It uses more CPU, but > stops IP spoofing by default. Specific firewall rules would have to be > created otherwise. And the overhead only really shows when the routing > table is large, right? Personally I think rp_filter by default is the only good choice (security/operational-wise). It's typically not useful when you have a lot of routes, though, but as 99.9% of users _don't_, it still seems like a good default value. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. 
Martin: A Clash of Kings From jgarzik@pobox.com Tue Jun 17 23:58:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 17 Jun 2003 23:58:43 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5I6wX2x021947 for ; Tue, 17 Jun 2003 23:58:34 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19SP1x-0001Vx-0e; Tue, 17 Jun 2003 23:34:21 +0100 Message-ID: <3EEF9762.2040900@pobox.com> Date: Tue, 17 Jun 2003 18:34:10 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "J.A. Magallon" CC: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCHES] 2.4.x net driver updates References: <20030612194926.GA7653@gtf.org> <20030617222750.GE13990@werewolf.able.es> In-Reply-To: <20030617222750.GE13990@werewolf.able.es> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3384 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev J.A. Magallon wrote: > On 06.12, Jeff Garzik wrote: > >>BK users may issue a >> >> bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4 >> >>Others may download the patch from >> >>ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-rc8-netdrvr2.patch.bz2 >> > > > Any info about the RX_POLLING (NAPI) option for e1000 ? > What is that for ? NAPI enables a software polling mode, or software interrupt mitigation if you prefer to call it that. 
It kicks in at moderate to high packet rates, allows the net stack to more globally balance net traffic, and avoids problems associated with high packet load / DoS situations which would otherwise max out a cpu. But it's a new feature, so being conservative there is a staged rollout, with NAPI support in e100[0] being an option that can be turned off. Some drivers like tg3 simply always enable NAPI. Jeff From ak@suse.de Wed Jun 18 01:01:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 01:01:37 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5I81O2x023322 for ; Wed, 18 Jun 2003 01:01:25 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id E2B3614533; Wed, 18 Jun 2003 10:01:18 +0200 (MEST) Date: Wed, 18 Jun 2003 10:01:18 +0200 From: Andi Kleen To: "David S. Miller" Cc: ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol Message-ID: <20030618080118.GC23037@wotan.suse.de> References: <20030617220420.GA1169@averell> <20030617.150751.52901849.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030617.150751.52901849.davem@redhat.com> X-archive-position: 3385 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Tue, Jun 17, 2003 at 03:07:51PM -0700, David S. Miller wrote: > From: Andi Kleen > Date: Wed, 18 Jun 2003 00:04:20 +0200 > > Convert pppoe to a new style protocol, otherwise it is unusable > on SMP compiled kernels because of the printk in deliver*old_ones. > It works fine here, but only tested on a UP box. 
> > Please don't add new skb_linearize() users, I'm trying > to make that only local to net/core/dev.c Then offer a flag or something for protocols that also doesn't trigger printks ? > > Otherwise I'm fine with your patch. How else should it be done? I'm not going through the whole PPP layer now to audit it for non linear skb cleanliness... -Andi From werner@almesberger.net Wed Jun 18 05:22:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 05:22:29 -0700 (PDT) Received: from host.almesberger.net (almesberger.net [63.105.73.239] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ICMF2x008907 for ; Wed, 18 Jun 2003 05:22:18 -0700 Received: from almesberger.net (vpnwa-home [10.200.0.2]) by host.almesberger.net (8.11.6/8.9.3) with ESMTP id h5ICLoG07104; Wed, 18 Jun 2003 05:21:51 -0700 Received: (from werner@localhost) by almesberger.net (8.11.6/8.11.6) id h5ICLQA22189; Wed, 18 Jun 2003 09:21:26 -0300 Date: Wed, 18 Jun 2003 09:21:26 -0300 From: Werner Almesberger To: "David S. Miller" Cc: ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol Message-ID: <20030618092126.A28100@almesberger.net> References: <20030617220420.GA1169@averell> <20030617.150751.52901849.davem@redhat.com> <20030618080118.GC23037@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030618080118.GC23037@wotan.suse.de>; from ak@suse.de on Wed, Jun 18, 2003 at 10:01:18AM +0200 X-archive-position: 3386 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wa@almesberger.net Precedence: bulk X-list: netdev On Tue, Jun 17, 2003 at 03:07:51PM -0700, David S. Miller wrote: > Please don't add new skb_linearize() users, I'm trying > to make that only local to net/core/dev.c I know you find documentation unmanly, but maybe ... 
- Werner --- skbuff.h.orig Wed Jun 18 09:13:57 2003 +++ skbuff.h Wed Jun 18 09:15:05 2003 @@ -1126,6 +1126,9 @@ * * If there is no free memory -ENOMEM is returned, otherwise zero * is returned and the old skb data released. + * + * DO NOT USE THIS IN NEW CODE ! skb_linearize will be for internal + * use by net/core/dev.c only. */ int skb_linearize(struct sk_buff *skb, int gfp); -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ From andre@tomt.net Wed Jun 18 05:54:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 05:54:44 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ICsa2x022379 for ; Wed, 18 Jun 2003 05:54:37 -0700 Received: (qmail 2231 invoked by uid 1006); 18 Jun 2003 12:57:20 -0000 Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.16 (sophie: 2.14/3.69. spamassassin: 2.55. Clear:. 
Processed in 0.024225 secs); 18 Jun 2003 12:57:20 -0000 Received: from slask.tomt.net (HELO slurv.ws.pasop.tomt.net) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 18 Jun 2003 12:57:20 -0000 Subject: Re: IPv6 bugs introduced in 2.4.21 From: Andre Tomt To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org In-Reply-To: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> References: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> Content-Type: text/plain; charset=ISO-8859-1 Organization: Message-Id: <1055940864.7481.163.camel@slurv.ws.pasop.tomt.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4- Date: 18 Jun 2003 14:54:24 +0200 Content-Transfer-Encoding: 8bit X-archive-position: 3387 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev On Mon, 2003-06-16 at 21:50, Andre Tomt wrote: > The lower part of a /127 network is somehow, strangely routed to lo - > observe.. Replying to myself, with a finding. I found a temporary workaround for this problem. Adding the address with a /128 prefixlen, and then adding a static route entry for the other end - like this:

  ip -6 addr add 2001:730:3::1:2b/128 dev aorta
  ip -6 route add 2001:730:3::1:2a/128 dev aorta

.."fixes" the issue. There is also one more impact of this bug: traffic that travels directly between the two peers gets discarded, so BGP and such break this way. Added CC to usagi-users, and I'm now subscribed to netdev (and usagi-users), no need for CC'ing me directly.
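For readers following the /127 case, here is a minimal userspace sketch of why the lower address of such a link is special. It assumes the subnet-router anycast rule of the IPv6 addressing architecture (the prefix followed by all-zero bits), and it is not the kernel's code; `subnet_router_anycast` and `is_subnet_router_anycast` are hypothetical helper names invented for the illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper, not a kernel function: clear every bit after the
 * first prefixlen bits. Under the assumed rule the result is the
 * subnet-router anycast address of the prefix. */
static struct in6_addr subnet_router_anycast(const struct in6_addr *addr,
                                             int prefixlen)
{
    struct in6_addr out = *addr;
    int bit;

    assert(prefixlen >= 0 && prefixlen <= 128);
    for (bit = prefixlen; bit < 128; bit++)
        out.s6_addr[bit / 8] &= (unsigned char)~(0x80 >> (bit % 8));
    return out;
}

/* Returns 1 if text_addr is the subnet-router anycast address of
 * text_addr/prefixlen, 0 if it is not, and -1 on bad input.
 * Assumption of this sketch: a /128 has no host bits, so the rule is
 * treated as not applying and 0 is returned. */
static int is_subnet_router_anycast(const char *text_addr, int prefixlen)
{
    struct in6_addr addr, anycast;

    if (prefixlen < 0 || prefixlen > 128)
        return -1;
    if (inet_pton(AF_INET6, text_addr, &addr) != 1)
        return -1;
    if (prefixlen == 128)
        return 0;
    anycast = subnet_router_anycast(&addr, prefixlen);
    return memcmp(&addr, &anycast, sizeof(addr)) == 0;
}
```

With a /127, 2001:730:3::1:2a has its single host bit clear and so matches the rule, while 2001:730:3::1:2b does not; with /128 the sketch treats neither address as special, which is consistent with the workaround above.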
:-) -- Mvh, André Tomt From yoshfuji@linux-ipv6.org Wed Jun 18 06:41:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 06:41:59 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IDfn2x026036 for ; Wed, 18 Jun 2003 06:41:51 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5IDgvBo014466; Wed, 18 Jun 2003 22:42:57 +0900 Date: Wed, 18 Jun 2003 22:42:57 +0900 (JST) Message-Id: <20030618.224257.130940019.yoshfuji@linux-ipv6.org> To: andre@skjellin.no Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: IPv6 bugs introduced in 2.4.21 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> References: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3388 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> (at 16 Jun 2003 21:50:48 +0200), Andre Tomt says: > I mailed you guys a little while ago on the "unable to use > SOMENETWORK::0000 as a nexthop gateway" bug in 2.4.21-pre/rc a while > ago. It is still present in 2.4.21, rendering the "first" /128 of a > arbitrary prefixlen unusable - :0000. 
This is especially bad with /127 > tunnels, rendering :0000 and :0001 unusable). But! There is one more : This is NOT the bug but by the spec. prefix:: is an anycast address, not a unicast; you cannot use it like an unicast address. Well... Do you really need to assign global address on the point-to-point device? If yes, you should not use /127 prefix; please use /64 instead. Thank you. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From ak@muc.de Wed Jun 18 07:00:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 07:00:25 -0700 (PDT) Received: from colin2.muc.de (qmailr@colin2.muc.de [193.149.48.15]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IE0H2x028110 for ; Wed, 18 Jun 2003 07:00:18 -0700 Received: (qmail 15932 invoked by uid 3709); 18 Jun 2003 12:51:59 -0000 Date: 18 Jun 2003 14:51:59 +0200 Date: Wed, 18 Jun 2003 14:51:59 +0200 From: Andi Kleen To: Werner Almesberger Cc: "David S. Miller" , ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol Message-ID: <20030618125159.GA15238@colin2.muc.de> References: <20030617220420.GA1169@averell> <20030617.150751.52901849.davem@redhat.com> <20030618080118.GC23037@wotan.suse.de> <20030618092126.A28100@almesberger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030618092126.A28100@almesberger.net> User-Agent: Mutt/1.4.1i X-archive-position: 3389 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@colin2.muc.de Precedence: bulk X-list: netdev On Wed, Jun 18, 2003 at 09:21:26AM -0300, Werner Almesberger wrote: > On Tue, Jun 17, 2003 at 03:07:51PM -0700, David S. Miller wrote: > > Please don't add new skb_linearize() users, I'm trying > > to make that only local to net/core/dev.c > > I know you find documentation unmanly, but maybe ... 
But what is the replacement? For moving over whole subsystems to non linear skbs it's a bit late in the release... And the "legacy protocol" setting cannot be used anymore because it triggers a printk for every packet on a SMP kernel. -Andi From andre@tomt.net Wed Jun 18 07:11:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 07:11:13 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IEB62x028617 for ; Wed, 18 Jun 2003 07:11:07 -0700 Received: (qmail 17730 invoked by uid 1006); 18 Jun 2003 14:13:50 -0000 Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.16 (sophie: 2.14/3.69. spamassassin: 2.55. Clear:. Processed in 0.025593 secs); 18 Jun 2003 14:13:50 -0000 Received: from slask.tomt.net (HELO slurv.ws.pasop.tomt.net) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 18 Jun 2003 14:13:50 -0000 Subject: Re: IPv6 bugs introduced in 2.4.21 From: Andre Tomt To: YOSHIFUJI Hideaki / =?UTF-8?Q?=E5=90=89=E8=97=A4=E8=8B=B1?= =?UTF-8?Q?=E6=98=8E?= Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org In-Reply-To: <20030618.224257.130940019.yoshfuji@linux-ipv6.org> References: <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> <20030618.224257.130940019.yoshfuji@linux-ipv6.org> Content-Type: text/plain; charset=UTF-8 Organization: Message-Id: <1055945454.7480.184.camel@slurv.ws.pasop.tomt.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4- Date: 18 Jun 2003 16:10:54 +0200 Content-Transfer-Encoding: 8bit X-archive-position: 3390 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev On Wed, 2003-06-18 at 15:42, YOSHIFUJI Hideaki / 吉藤英明 wrote: > In article <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> (at 16 Jun 2003 21:50:48 +0200), Andre Tomt says: > > > I mailed you guys a little while ago on the "unable to use > >
SOMENETWORK::0000 as a nexthop gateway" bug in 2.4.21-pre/rc a while > > ago. It is still present in 2.4.21, rendering the "first" /128 of a > > arbitrary prefixlen unusable - :0000. This is especially bad with /127 > > tunnels, rendering :0000 and :0001 unusable). But! There is one more > : > > This is NOT the bug but by the spec. > prefix:: is an anycast address, not a unicast; > you cannot use it like an unicast address. Ok, that probably is correct. It works in 2.4.20, that does not mean it's correct behavior though ;-) > Well... > > Do you really need to assign global address on the point-to-point device? Yes. > If yes, you should not use /127 prefix; please use /64 instead. No one in their right mind assigns /64's for a linknetwork with two peers. It's a pointopoint-link. All people I know use either /128 pointopoint or pointomultipoint semantics (BSD, KAME), or /127's as Linux refuses to use the traditional pointopoint or peer parameter in ifconfig and iproute for ipv6. The /127 matches both 2a and 2b, why does it end up at localhost? From babydr@baby-dragons.com Wed Jun 18 07:38:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 07:38:19 -0700 (PDT) Received: from filesrv1.baby-dragons.com (filesrv1.baby-dragons.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IEc82x029153 for ; Wed, 18 Jun 2003 07:38:11 -0700 Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h5IEam9R027518; Wed, 18 Jun 2003 10:37:08 -0400 Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h5IEaChN027513; Wed, 18 Jun 2003 10:36:27 -0400 X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs Date: Wed, 18 Jun 2003 10:36:12 -0400 (EDT) From: "Mr. James W. 
Laferriere" To: Chris Friesen cc: NetDev , Linux networking maillist , Stephan von Krawczynski Subject: Re: BUG: Massive performance drop in routing throughput with 2.4.21 In-Reply-To: <3EF07435.8020407@nortelnetworks.com> Message-ID: References: <20030616141806.6a92f839.skraw@ithnet.com> <20030616145135.0ef5c436.skraw@ithnet.com> <20030616151035.735fcaf2.martin.zwickel@technotrend.de> <1055880260.19796.7.camel@rth.ninka.net> <20030618151034.0a84b2e2.skraw@ithnet.com> <3EF07435.8020407@nortelnetworks.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3391 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: babydr@baby-dragons.com Precedence: bulk X-list: netdev Hello Chris , Moved to netdev & linux-net for appropriate coverage . I have been following this thread on all lists as it directly effects my 'host' system . I for one have not seen ANY request for further information from Dave or anybody concering this problem . If you will look at my orignal post (sent to kernel first then to netdev & linux-net) you'll find there is quite a bit of material . Whether it is necessary or not is yet to be seen . To date only yourself , Stephan & Dave have made comments on this post in any manner . None to my knowledge have requested further info's . Twyl , JimL On Wed, 18 Jun 2003, Chris Friesen wrote: > Stephan von Krawczynski wrote: > > On 17 Jun 2003 13:04:20 -0700 > > "David S. Miller" wrote: > >>You can start by reporting the bug and all your debugging > >>informtion to the correct list. > >> > >>Networking developers DO NOT sit on linux-kernel, it's too high > >>volume for them. So use the correct list to report such > >>problems. >...snip... > > Maybe I should have made it a bit clearer in my original post to this thread: > > the thing is a show-stopper. > Did you see David's post? He specifically said that the network > developers are *not* on linux-kernel. 
You would do better to send this > to the linux network developers list, at netdev@oss.sgi.com. > You still haven't given the information that he asked for. > Chris -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O. Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ From cfriesen@nortelnetworks.com Wed Jun 18 08:04:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 08:04:45 -0700 (PDT) Received: from zcars0m9.nortelnetworks.com (zcars0m9.nortelnetworks.com [47.129.242.157]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IF4Z2x030147 for ; Wed, 18 Jun 2003 08:04:37 -0700 Received: from zcard309.ca.nortel.com (zcard309.ca.nortel.com [47.129.242.69]) by zcars0m9.nortelnetworks.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id h5IEunn19427; Wed, 18 Jun 2003 10:56:49 -0400 (EDT) Received: from zcard0k6.ca.nortel.com ([47.129.242.158]) by zcard309.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id ND94BG2V; Wed, 18 Jun 2003 10:56:50 -0400 Received: from pcard0ks.ca.nortel.com ([47.129.117.131]) by zcard0k6.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id LV8RA625; Wed, 18 Jun 2003 10:56:49 -0400 Received: from nortelnetworks.com (localhost.localdomain [127.0.0.1]) by pcard0ks.ca.nortel.com (Postfix) with ESMTP id C4D5F2E133; Wed, 18 Jun 2003 10:56:48 -0400 (EDT) Message-ID: <3EF07DB0.3090003@nortelnetworks.com> Date: Wed, 18 Jun 2003 10:56:48 -0400 X-Sybari-Space: 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: "Mr. James W. 
Laferriere" Cc: NetDev , Linux networking maillist , Stephan von Krawczynski Subject: Re: BUG: Massive performance drop in routing throughput with 2.4.21 References: <20030616141806.6a92f839.skraw@ithnet.com> <20030616145135.0ef5c436.skraw@ithnet.com> <20030616151035.735fcaf2.martin.zwickel@technotrend.de> <1055880260.19796.7.camel@rth.ninka.net> <20030618151034.0a84b2e2.skraw@ithnet.com> <3EF07435.8020407@nortelnetworks.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3392 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortelnetworks.com Precedence: bulk X-list: netdev Mr. James W. Laferriere wrote: > If you will look at my orignal post (sent to > kernel first then to netdev & linux-net) Ah. I must have missed the netdev post--I think I'm having subscription issues. My apologies. Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com From jeroen@unfix.org Wed Jun 18 08:20:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 08:20:23 -0700 (PDT) Received: from purgatory.unfix.org (postfix@cust.92.136.adsl.cistron.nl [195.64.92.136]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IFKE2x031136 for ; Wed, 18 Jun 2003 08:20:15 -0700 Received: from limbo (limbo.unfix.org [10.100.13.33]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by purgatory.unfix.org (Postfix) with ESMTP id 820EF7FDD; Wed, 18 Jun 2003 17:20:10 +0200 (CEST) From: "Jeroen Massar" To: , "'YOSHIFUJI Hideaki / ????'" Cc: , Subject: RE: (usagi-users 02434) Re: IPv6 bugs introduced in 2.4.21 Date: Wed, 18 Jun 2003 17:20:10 +0200 Organization: Unfix Message-ID: <003401c335ad$1c0cc880$210d640a@unfix.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-2022-jp" 
Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.3416 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <1055945454.7480.184.camel@slurv.ws.pasop.tomt.net> Importance: Normal X-archive-position: 3393 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jeroen@unfix.org Precedence: bulk X-list: netdev Andre Tomt [mailto:andre@tomt.net] wrote: > On Wed, 2003-06-18 at 15:42, YOSHIFUJI Hideaki / 吉藤英明 wrote: > > In article > <1055793048.24660.160.camel@slurv.ws.pasop.tomt.net> (at 16 > Jun 2003 21:50:48 +0200), Andre Tomt says: > > > > > I mailed you guys a little while ago on the "unable to use > > > SOMENETWORK::0000 as a nexthop gateway" bug in > 2.4.21-pre/rc a while > > > ago. It is still present in 2.4.21, rendering the "first" > /128 of a > > > arbitrary prefixlen unusable - :0000. This is especially > bad with /127 > > > tunnels, rendering :0000 and :0001 unusable). But! There > is one more > > : > > > > This is NOT the bug but by the spec. > > prefix:: is an anycast address, not a unicast; > > you cannot use it like an unicast address. This kind of explains it, though I don't really like the way it was forced upon us without any big notification, then again I didn't read the changelog so it could be there ;) Is there a toggle for turning this behaviour off ? Note well that many people use :: and ::1 and ::2 etc as a unicast address. This will force them to stop using those of course unless one simply removes those routes to the lo device, like I did :) > > If yes, you should not use /127 prefix; please use /64 instead. > > No one in their right mind assigns /64's for a linknetwork with two > peers. It's a pointopoint-link.
All people I know use either /128 > pointopoint or pointomultipoint semantics (BSD, KAME), or /127's as > Linux refuses to use the traditional pointopoint or peer parameter in > ifconfig and iproute for ipv6. For SixXS we only use /127's on the IPng POP because of the age of the POP. The other POP's all use /64's for 'transitnetworks', the point to point tunnels. Those are a lot of users. The endpoints currently on the IPng POP will not be migrated to use /64's all of a sudden though. > The /127 matches both 2a and 2b, why does it end up at localhost? Routing; remove the route which goes over lo. Greets, Jeroen From chas@locutus.cmf.nrl.navy.mil Wed Jun 18 08:26:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 08:26:30 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IFQH2x031475 for ; Wed, 18 Jun 2003 08:26:17 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5IFQ4sG013407; Wed, 18 Jun 2003 11:26:04 -0400 (EDT) Message-Id: <200306181526.h5IFQ4sG013407@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: netdev@oss.sgi.com Subject: Re: [PATCH][ATM][3/3] assorted changes for atm In-reply-to: Your message of "Tue, 17 Jun 2003 10:31:45 PDT." <20030617.103145.26534124.davem@redhat.com> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Wed, 18 Jun 2003 11:24:03 -0400 From: chas williams X-Spam-Score: () hits=-0.3 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3394 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev it has been brought to my attention that my vcc sklist patch didn't take advantage of the new hlist changes.
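As a reading aid for the patch below, here is a minimal userspace sketch of the lookup shape it moves the drivers to: walk one global list, skip entries that belong to another device, and return the match or NULL. The `toy_dev`, `toy_vcc` and `toy_find_vcc` names are hypothetical stand-ins rather than kernel types, a plain singly linked list takes the place of the hlist, and the `vcc_sklist_lock` read lock is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct atm_dev and struct atm_vcc. */
struct toy_dev {
    int number;
};

struct toy_vcc {
    struct toy_dev *dev;   /* device this connection belongs to */
    short vpi;
    int vci;
    struct toy_vcc *next;  /* link in the single global list */
};

/* Walk the global list, ignoring connections on other devices.
 * Returning from inside the loop means the result never depends on
 * what the loop cursor holds once the walk has finished. */
static struct toy_vcc *toy_find_vcc(struct toy_vcc *head,
                                    const struct toy_dev *dev,
                                    short vpi, int vci)
{
    struct toy_vcc *walk;

    assert(dev != NULL);
    for (walk = head; walk; walk = walk->next) {
        if (walk->dev != dev)
            continue;               /* belongs to another device */
        if (walk->vpi == vpi && walk->vci == vci)
            return walk;            /* found on the right device */
    }
    return NULL;                    /* no match anywhere in the list */
}
```

This mirrors the fore200e_find_vcc hunk, which returns the vcc from inside the loop and NULL after it. Hunks that instead test a cursor variable after the loop are worth checking against what the iteration macro leaves in that variable when the walk completes without a match.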
here is an updated version relative to 2.5.72. [atm]: move vcc's to global sk-based linked list # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1332 -> 1.1333 # drivers/atm/he.c 1.14 -> 1.15 # net/atm/atm_misc.c 1.6 -> 1.7 # drivers/atm/eni.c 1.16 -> 1.17 # net/atm/proc.c 1.19 -> 1.20 # net/atm/pvc.c 1.15 -> 1.16 # drivers/atm/idt77252.c 1.16 -> 1.17 # net/atm/lec.c 1.28 -> 1.29 # drivers/atm/atmtcp.c 1.9 -> 1.10 # net/atm/svc.c 1.17 -> 1.18 # net/atm/common.h 1.11 -> 1.12 # net/atm/signaling.c 1.13 -> 1.14 # net/atm/resources.h 1.6 -> 1.7 # net/atm/mpc.c 1.19 -> 1.20 # include/linux/atmdev.h 1.17 -> 1.18 # net/atm/resources.c 1.12 -> 1.13 # net/atm/clip.c 1.16 -> 1.17 # drivers/atm/fore200e.c 1.17 -> 1.18 # net/atm/common.c 1.34 -> 1.35 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/18 chas@relax.cmf.nrl.navy.mil 1.1333 # move vcc's to global sk-based linked list # -------------------------------------------- # diff -Nru a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c --- a/drivers/atm/atmtcp.c Wed Jun 18 11:19:12 2003 +++ b/drivers/atm/atmtcp.c Wed Jun 18 11:19:12 2003 @@ -153,9 +153,10 @@ static int atmtcp_v_ioctl(struct atm_dev *dev,unsigned int cmd,void *arg) { - unsigned long flags; struct atm_cirange ci; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; if (cmd != ATM_SETCIRANGE) return -ENOIOCTLCMD; if (copy_from_user(&ci,(void *) arg,sizeof(ci))) return -EFAULT; @@ -163,14 +164,18 @@ if (ci.vci_bits == ATM_CI_MAX) ci.vci_bits = MAX_VCI_BITS; if (ci.vpi_bits > MAX_VPI_BITS || ci.vpi_bits < 0 || ci.vci_bits > MAX_VCI_BITS || ci.vci_bits < 0) return -EINVAL; - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, 
&vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; if ((vcc->vpi >> ci.vpi_bits) || (vcc->vci >> ci.vci_bits)) { - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EBUSY; } - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); dev->ci_range = ci; return 0; } @@ -233,9 +238,10 @@ static void atmtcp_c_close(struct atm_vcc *vcc) { - unsigned long flags; struct atm_dev *atmtcp_dev; struct atmtcp_dev_data *dev_data; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; atmtcp_dev = (struct atm_dev *) vcc->dev_data; @@ -246,19 +252,24 @@ kfree(dev_data); shutdown_atm_dev(atmtcp_dev); vcc->dev_data = NULL; - spin_lock_irqsave(&atmtcp_dev->lock, flags); - for (walk = atmtcp_dev->vccs; walk; walk = walk->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != atmtcp_dev) + continue; wake_up(&walk->sleep); - spin_unlock_irqrestore(&atmtcp_dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); } static int atmtcp_c_send(struct atm_vcc *vcc,struct sk_buff *skb) { - unsigned long flags; struct atm_dev *dev; struct atmtcp_hdr *hdr; - struct atm_vcc *out_vcc; + struct sock *s; + struct hlist_node *node; + struct atm_vcc *out_vcc = NULL; struct sk_buff *new_skb; int result = 0; @@ -270,13 +281,17 @@ (struct atmtcp_control *) skb->data); goto done; } - spin_lock_irqsave(&dev->lock, flags); - for (out_vcc = dev->vccs; out_vcc; out_vcc = out_vcc->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + out_vcc = atm_sk(s); + if (out_vcc->dev != dev) + continue; if (out_vcc->vpi == ntohs(hdr->vpi) && out_vcc->vci == ntohs(hdr->vci) && out_vcc->qos.rxtp.traffic_class != ATM_NONE) break; - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); if (!out_vcc) { atomic_inc(&vcc->stats->tx_err); goto done; @@ -366,7 +381,7 @@ if (itf != -1) dev = atm_dev_lookup(itf); if (dev) { if (dev->ops != 
&atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } if (PRIV(dev)->vcc) return -EBUSY; @@ -378,7 +393,8 @@ if (error) return error; } PRIV(dev)->vcc = vcc; - bind_vcc(vcc,&atmtcp_control_dev); + vcc->dev = &atmtcp_control_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); vcc->dev_data = dev; @@ -402,7 +418,7 @@ dev = atm_dev_lookup(itf); if (!dev) return -ENODEV; if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } dev_data = PRIV(dev); @@ -410,7 +426,7 @@ dev_data->persist = 0; if (PRIV(dev)->vcc) return 0; kfree(dev_data); - atm_dev_release(dev); + atm_dev_put(dev); shutdown_atm_dev(dev); return 0; } diff -Nru a/drivers/atm/eni.c b/drivers/atm/eni.c --- a/drivers/atm/eni.c Wed Jun 18 11:19:12 2003 +++ b/drivers/atm/eni.c Wed Jun 18 11:19:12 2003 @@ -1887,10 +1887,11 @@ static int get_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) *vpi = 0; if (*vci == ATM_VCI_ANY) { for (*vci = ATM_NOT_RSV_VCI; *vci < NR_VCI; (*vci)++) { @@ -1898,40 +1899,48 @@ ENI_DEV(vcc->dev)->rx_map[*vci]) continue; if (vcc->qos.txtp.traffic_class != ATM_NONE) { - for (walk = vcc->dev->vccs; walk; - walk = walk->next) + s = NULL; + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) break; - if (walk) continue; + } + if (s) continue; } break; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return *vci == NR_VCI ? 
-EADDRINUSE : 0; } if (*vci == ATM_VCI_UNSPEC) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (vcc->qos.rxtp.traffic_class != ATM_NONE && ENI_DEV(vcc->dev)->rx_map[*vci]) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } if (vcc->qos.txtp.traffic_class == ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); return 0; } @@ -2139,7 +2148,8 @@ static int eni_proc_read(struct atm_dev *dev,loff_t *pos,char *page) { - unsigned long flags; + struct hlist_node *node; + struct sock *s; static const char *signal[] = { "LOST","unknown","okay" }; struct eni_dev *eni_dev = ENI_DEV(dev); struct atm_vcc *vcc; @@ -2212,11 +2222,15 @@ return sprintf(page,"%10sbacklog %u packets\n","", skb_queue_len(&tx->backlog)); } - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) { - struct eni_vcc *eni_vcc = ENI_VCC(vcc); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct eni_vcc *eni_vcc; int length; + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; + eni_vcc = ENI_VCC(vcc); if (--left) continue; length = sprintf(page,"vcc %4d: ",vcc->vci); if (eni_vcc->rx) { @@ -2231,10 +2245,10 @@ length += sprintf(page+length,"tx[%d], txing %d bytes", eni_vcc->tx->index,eni_vcc->txing); page[length] = '\n'; - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return length+1; } - 
spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); for (i = 0; i < eni_dev->free_len; i++) { struct eni_free *fe = eni_dev->free_list+i; unsigned long offset; diff -Nru a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c --- a/drivers/atm/fore200e.c Wed Jun 18 11:19:12 2003 +++ b/drivers/atm/fore200e.c Wed Jun 18 11:19:12 2003 @@ -1069,18 +1069,23 @@ static struct atm_vcc* fore200e_find_vcc(struct fore200e* fore200e, struct rpd* rpd) { - unsigned long flags; + struct sock *s; struct atm_vcc* vcc; + struct hlist_node *node; - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { - - if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) - break; + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev != fore200e->atm_dev) + continue; + if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) { + read_unlock(&vcc_sklist_lock); + return vcc; + } } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); - - return vcc; + read_unlock(&vcc_sklist_lock); + + return NULL; } @@ -1350,20 +1355,26 @@ static int fore200e_walk_vccs(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; struct atm_vcc* walk; + struct sock *s; + struct hlist_node *node; /* find a free VPI */ - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { - for (*vpi = 0, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vpi = 0; +restart_vpi_search: + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + goto restart_vpi_search; } } } @@ -1371,16 +1382,21 @@ /* find a free VCI */ if (*vci == ATM_VCI_ANY) { - for (*vci = ATM_NOT_RSV_VCI, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vci = ATM_NOT_RSV_VCI; +restart_vci_search: + sk_for_each(s, 
node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vpi = *vpi) && (walk->vci == *vci)) { *vci = walk->vci + 1; - walk = vcc->dev->vccs; + goto restart_vci_search; } } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } @@ -2642,7 +2658,8 @@ static int fore200e_proc_read(struct atm_dev *dev,loff_t* pos,char* page) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct fore200e* fore200e = FORE200E_DEV(dev); int len, left = *pos; @@ -2889,8 +2906,12 @@ len = sprintf(page,"\n" " VCCs:\n address\tVPI.VCI:AAL\t(min/max tx PDU size) (min/max rx PDU size)\n"); - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + + if (vcc->dev != fore200e->atm_dev) + continue; fore200e_vcc = FORE200E_VCC(vcc); @@ -2904,7 +2925,7 @@ fore200e_vcc->rx_max_pdu ); } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); + read_unlock(&vcc_sklist_lock); return len; } diff -Nru a/drivers/atm/he.c b/drivers/atm/he.c --- a/drivers/atm/he.c Wed Jun 18 11:19:12 2003 +++ b/drivers/atm/he.c Wed Jun 18 11:19:12 2003 @@ -79,7 +79,6 @@ #include #define USE_TASKLET -#define USE_HE_FIND_VCC #undef USE_SCATTERGATHER #undef USE_CHECKSUM_HW /* still confused about this */ #define USE_RBPS @@ -328,25 +327,25 @@ he_writel_rcm(dev, val, 0x00000 | (cid << 3) | 7) static __inline__ struct atm_vcc* -he_find_vcc(struct he_dev *he_dev, unsigned cid) +__find_vcc(struct he_dev *he_dev, unsigned cid) { - unsigned long flags; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; short vpi; int vci; vpi = cid >> he_dev->vcibits; vci = cid & ((1 << he_dev->vcibits) - 1); - spin_lock_irqsave(&he_dev->atm_dev->lock, flags); - for (vcc = he_dev->atm_dev->vccs; vcc; vcc = vcc->next) - if (vcc->vci == vci && vcc->vpi == vpi - && 
vcc->qos.rxtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev == he_dev->atm_dev && + vcc->vci == vci && vcc->vpi == vpi && + vcc->qos.rxtp.traffic_class != ATM_NONE) { return vcc; - } - - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + } + } return NULL; } @@ -1566,17 +1565,6 @@ reg |= RX_ENABLE; he_writel(he_dev, reg, RC_CONFIG); -#ifndef USE_HE_FIND_VCC - he_dev->he_vcc_table = kmalloc(sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits)), GFP_KERNEL); - if (he_dev->he_vcc_table == NULL) { - hprintk("failed to alloc he_vcc_table\n"); - return -ENOMEM; - } - memset(he_dev->he_vcc_table, 0, sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits))); -#endif - for (i = 0; i < HE_NUM_CS_STPER; ++i) { he_dev->cs_stper[i].inuse = 0; he_dev->cs_stper[i].pcr = -1; @@ -1712,11 +1700,6 @@ he_dev->tpd_base, he_dev->tpd_base_phys); #endif -#ifndef USE_HE_FIND_VCC - if (he_dev->he_vcc_table) - kfree(he_dev->he_vcc_table); -#endif - if (he_dev->pci_dev) { pci_read_config_word(he_dev->pci_dev, PCI_COMMAND, &command); command &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); @@ -1798,6 +1781,7 @@ int pdus_assembled = 0; int updated = 0; + read_lock(&vcc_sklist_lock); while (he_dev->rbrq_head != rbrq_tail) { ++updated; @@ -1823,13 +1807,10 @@ buf_len = RBRQ_BUFLEN(he_dev->rbrq_head) * 4; cid = RBRQ_CID(he_dev->rbrq_head); -#ifdef USE_HE_FIND_VCC if (cid != lastcid) - vcc = he_find_vcc(he_dev, cid); + vcc = __find_vcc(he_dev, cid); lastcid = cid; -#else - vcc = HE_LOOKUP_VCC(he_dev, cid); -#endif + if (vcc == NULL) { hprintk("vcc == NULL (cid 0x%x)\n", cid); if (!RBRQ_HBUF_ERR(he_dev->rbrq_head)) @@ -1966,6 +1947,7 @@ RBRQ_MASK(++he_dev->rbrq_head)); } + read_unlock(&vcc_sklist_lock); if (updated) { if (updated > he_dev->rbrq_peak) @@ -2565,10 +2547,6 @@ #endif spin_unlock_irqrestore(&he_dev->global_lock, flags); - -#ifndef 
USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = vcc; -#endif } open_failed: @@ -2634,9 +2612,6 @@ if (timeout == 0) hprintk("close rx timeout cid 0x%x\n", cid); -#ifndef USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = NULL; -#endif HPRINTK("close rx cid 0x%x complete\n", cid); } diff -Nru a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c --- a/drivers/atm/idt77252.c Wed Jun 18 11:19:12 2003 +++ b/drivers/atm/idt77252.c Wed Jun 18 11:19:12 2003 @@ -2403,37 +2403,47 @@ static int idt77252_find_vcc(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; + struct sock *s; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { *vpi = 0; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) { + s = sk_next(s); + continue; + } if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } if (*vci == ATM_VCI_ANY) { *vci = ATM_NOT_RSV_VCI; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) { + s = sk_next(s); + continue; + } if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vci)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Wed Jun 18 11:19:12 2003 +++ b/include/linux/atmdev.h Wed Jun 18 11:19:12 2003 @@ -293,7 +293,6 @@ struct k_atm_aal_stats *stats; /* pointer to AAL stats group */ wait_queue_head_t sleep; /* if socket is busy */ struct sock *sk; /* socket backpointer */ - struct atm_vcc *prev,*next; /* SVC part --- may move later ------------------------------------- */ short itf; /* interface number */ struct sockaddr_atmsvc 
local; @@ -320,8 +319,6 @@ /* (NULL) */ const char *type; /* device type name */ int number; /* device index */ - struct atm_vcc *vccs; /* VCC table (or NULL) */ - struct atm_vcc *last; /* last VCC (or undefined) */ void *dev_data; /* per-device data */ void *phy_data; /* private PHY date */ unsigned long flags; /* device flags (ATM_DF_*) */ @@ -390,6 +387,9 @@ unsigned long atm_options; /* ATM layer options */ }; +extern struct hlist_head vcc_sklist; +extern rwlock_t vcc_sklist_lock; + #define ATM_SKB(skb) (((struct atm_skb_data *) (skb)->cb)) struct atm_dev *atm_dev_register(const char *type,const struct atmdev_ops *ops, @@ -397,7 +397,8 @@ struct atm_dev *atm_dev_lookup(int number); void atm_dev_deregister(struct atm_dev *dev); void shutdown_atm_dev(struct atm_dev *dev); -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev); +void vcc_insert_socket(struct sock *sk); +void vcc_remove_socket(struct sock *sk); /* @@ -436,7 +437,7 @@ } -static inline void atm_dev_release(struct atm_dev *dev) +static inline void atm_dev_put(struct atm_dev *dev) { atomic_dec(&dev->refcnt); diff -Nru a/net/atm/atm_misc.c b/net/atm/atm_misc.c --- a/net/atm/atm_misc.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/atm_misc.c Wed Jun 18 11:19:12 2003 @@ -47,15 +47,21 @@ static int check_ci(struct atm_vcc *vcc,short vpi,int vci) { + struct hlist_node *node; + struct sock *s; struct atm_vcc *walk; - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vpi == vpi && walk->vci == vci && ((walk->qos.txtp.traffic_class != ATM_NONE && vcc->qos.txtp.traffic_class != ATM_NONE) || (walk->qos.rxtp.traffic_class != ATM_NONE && vcc->qos.rxtp.traffic_class != ATM_NONE))) return -EADDRINUSE; + } /* allow VCCs with same VPI/VCI iff they don't collide on TX/RX (but we may refuse such sharing for other reasons, e.g. 
if protocol requires to have both channels) */ @@ -65,17 +71,16 @@ int atm_find_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; static short p = 0; /* poor man's per-device cache */ static int c = 0; short old_p; int old_c; int err; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi != ATM_VPI_ANY && *vci != ATM_VCI_ANY) { err = check_ci(vcc,*vpi,*vci); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return err; } /* last scan may have left values out of bounds for current device */ @@ -90,7 +95,7 @@ if (!check_ci(vcc,p,c)) { *vpi = p; *vci = c; - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (*vci == ATM_VCI_ANY) { @@ -105,7 +110,7 @@ } } while (old_p != p || old_c != c); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/clip.c Wed Jun 18 11:19:12 2003 @@ -737,7 +737,8 @@ set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); /* allow replies and avoid getting closed if signaling dies */ - bind_vcc(vcc,&atmarpd_dev); + vcc->dev = &atmarpd_dev; + vcc_insert_socket(vcc->sk); vcc->push = NULL; vcc->pop = NULL; /* crash */ vcc->push_oam = NULL; /* crash */ diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/common.c Wed Jun 18 11:19:12 2003 @@ -157,6 +157,29 @@ #endif +HLIST_HEAD(vcc_sklist); +rwlock_t vcc_sklist_lock = RW_LOCK_UNLOCKED; + +void __vcc_insert_socket(struct sock *sk) +{ + sk_add_node(sk, &vcc_sklist); +} + +void vcc_insert_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + __vcc_insert_socket(sk); + write_unlock_irq(&vcc_sklist_lock); +} + +void vcc_remove_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + sk_del_node_init(sk); + 
write_unlock_irq(&vcc_sklist_lock); +} + + static struct sk_buff *alloc_tx(struct atm_vcc *vcc,unsigned int size) { struct sk_buff *skb; @@ -175,16 +198,45 @@ } -int atm_create(struct socket *sock,int protocol,int family) +EXPORT_SYMBOL(vcc_sklist); +EXPORT_SYMBOL(vcc_sklist_lock); +EXPORT_SYMBOL(vcc_insert_socket); +EXPORT_SYMBOL(vcc_remove_socket); + +static void vcc_sock_destruct(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + if (atomic_read(&vcc->sk->sk_rmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: rmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_rmem_alloc)); + + if (atomic_read(&vcc->sk->sk_wmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: wmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_wmem_alloc)); + + kfree(sk->sk_protinfo); +} + +int vcc_create(struct socket *sock, int protocol, int family) { struct sock *sk; struct atm_vcc *vcc; sock->sk = NULL; - if (sock->type == SOCK_STREAM) return -EINVAL; - if (!(sk = alloc_atm_vcc_sk(family))) return -ENOMEM; - vcc = atm_sk(sk); - memset(&vcc->flags,0,sizeof(vcc->flags)); + if (sock->type == SOCK_STREAM) + return -EINVAL; + sk = sk_alloc(family, GFP_KERNEL, 1, NULL); + if (!sk) + return -ENOMEM; + sock_init_data(NULL, sk); + + vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); + if (!vcc) { + sk_free(sk); + return -ENOMEM; + } + + memset(vcc, 0, sizeof(*vcc)); + vcc->sk = sk; vcc->dev = NULL; vcc->callback = NULL; memset(&vcc->local,0,sizeof(struct sockaddr_atmsvc)); @@ -199,42 +251,49 @@ vcc->atm_options = vcc->aal_options = 0; init_waitqueue_head(&vcc->sleep); sk->sk_sleep = &vcc->sleep; + sk->sk_destruct = vcc_sock_destruct; sock->sk = sk; return 0; } -void atm_release_vcc_sk(struct sock *sk,int free_sk) +static void vcc_destroy_socket(struct sock *sk) { struct atm_vcc *vcc = atm_sk(sk); struct sk_buff *skb; - clear_bit(ATM_VF_READY,&vcc->flags); + clear_bit(ATM_VF_READY, &vcc->flags); if (vcc->dev) { - if (vcc->dev->ops->close) vcc->dev->ops->close(vcc); - if 
(vcc->push) vcc->push(vcc,NULL); /* atmarpd has no push */ + if (vcc->dev->ops->close) + vcc->dev->ops->close(vcc); + if (vcc->push) + vcc->push(vcc, NULL); /* atmarpd has no push */ + + vcc_remove_socket(sk); /* no more receive */ + while ((skb = skb_dequeue(&vcc->sk->sk_receive_queue))) { atm_return(vcc,skb->truesize); kfree_skb(skb); } module_put(vcc->dev->ops->owner); - atm_dev_release(vcc->dev); - if (atomic_read(&vcc->sk->sk_rmem_alloc)) - printk(KERN_WARNING "atm_release_vcc: strange ... " - "rmem_alloc == %d after closing\n", - atomic_read(&vcc->sk->sk_rmem_alloc)); - bind_vcc(vcc,NULL); + atm_dev_put(vcc->dev); } - - if (free_sk) free_atm_vcc_sk(sk); } -int atm_release(struct socket *sock) +int vcc_release(struct socket *sock) { - if (sock->sk) - atm_release_vcc_sk(sock->sk,1); + struct sock *sk = sock->sk; + + if (sk) { + sock_orphan(sk); + lock_sock(sk); + vcc_destroy_socket(sock->sk); + release_sock(sk); + sock_put(sk); + } + return 0; } @@ -289,7 +348,8 @@ if (vci > 0 && vci < ATM_NOT_RSV_VCI && !capable(CAP_NET_BIND_SERVICE)) return -EPERM; error = 0; - bind_vcc(vcc,dev); + vcc->dev = dev; + vcc_insert_socket(vcc->sk); switch (vcc->qos.aal) { case ATM_AAL0: error = atm_init_aal0(vcc); @@ -313,7 +373,7 @@ if (!error) error = adjust_tp(&vcc->qos.txtp,vcc->qos.aal); if (!error) error = adjust_tp(&vcc->qos.rxtp,vcc->qos.aal); if (error) { - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } DPRINTK("VCC %d.%d, AAL %d\n",vpi,vci,vcc->qos.aal); @@ -327,7 +387,7 @@ error = dev->ops->open(vcc,vpi,vci); if (error) { module_put(dev->ops->owner); - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } } @@ -371,7 +431,7 @@ dev = atm_dev_lookup(itf); error = __vcc_connect(vcc, dev, vpi, vci); if (error) { - atm_dev_release(dev); + atm_dev_put(dev); return error; } } else { @@ -385,7 +445,7 @@ spin_unlock(&atm_dev_lock); if (!__vcc_connect(vcc, dev, vpi, vci)) break; - atm_dev_release(dev); + atm_dev_put(dev); dev = NULL; 
spin_lock(&atm_dev_lock); } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Wed Jun 18 11:19:12 2003 +++ b/net/atm/common.h Wed Jun 18 11:19:12 2003 @@ -10,8 +10,8 @@ #include /* for poll_table */ -int atm_create(struct socket *sock,int protocol,int family); -int atm_release(struct socket *sock); +int vcc_create(struct socket *sock, int protocol, int family); +int vcc_release(struct socket *sock); int vcc_connect(struct socket *sock, int itf, short vpi, int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); @@ -24,7 +24,6 @@ int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen); -void atm_release_vcc_sk(struct sock *sk,int free_sk); void atm_shutdown_dev(struct atm_dev *dev); int atmpvc_init(void); diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/lec.c Wed Jun 18 11:19:12 2003 @@ -48,7 +48,7 @@ #include "lec.h" #include "lec_arpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" #if 0 #define DPRINTK printk @@ -810,7 +810,8 @@ lec_arp_init(priv); priv->itfnum = i; /* LANE2 addition */ priv->lecd = vcc; - bind_vcc(vcc, &lecatm_dev); + vcc->dev = &lecatm_dev; + vcc_insert_socket(vcc->sk); vcc->proto_data = dev_lec[i]; set_bit(ATM_VF_META,&vcc->flags); diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/mpc.c Wed Jun 18 11:19:12 2003 @@ -28,7 +28,7 @@ #include "lec.h" #include "mpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" /* * mpc.c: Implementation of MPOA client kernel part @@ -789,7 +789,8 @@ } mpc->mpoad_vcc = vcc; - bind_vcc(vcc, &mpc_dev); + vcc->dev = &mpc_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); diff -Nru a/net/atm/proc.c b/net/atm/proc.c --- a/net/atm/proc.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/proc.c Wed Jun 18 11:19:12 2003 @@ 
-334,9 +334,8 @@ static int atm_pvc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left, clip_info = 0; @@ -349,25 +348,20 @@ if (try_atm_clip_ops()) clip_info = 1; #endif - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMPVC && - vcc->dev && !left--) { - pvc_info(vcc,buf,clip_info); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMPVC && vcc->dev && !left--) { + pvc_info(vcc,buf,clip_info); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) - if (clip_info) - module_put(atm_clip_ops->owner); + if (clip_info) + module_put(atm_clip_ops->owner); #endif - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) if (clip_info) module_put(atm_clip_ops->owner); @@ -378,10 +372,9 @@ static int atm_vc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; int left; if (!pos) @@ -389,20 +382,16 @@ "Address"," Itf VPI VCI Fam Flags Reply Send buffer" " Recv buffer\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (!left--) { - vc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - 
spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (!left--) { + vc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } @@ -410,29 +399,24 @@ static int atm_svc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left; if (!pos) return sprintf(buf,"Itf VPI VCI State Remote\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMSVC && !left--) { - svc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMSVC && !left--) { + svc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/pvc.c Wed Jun 18 11:19:12 2003 @@ -17,10 +17,6 @@ #include "resources.h" /* devs and vccs */ #include "common.h" /* common for PVCs and SVCs */ -#ifndef NULL -#define NULL 0 -#endif - static int pvc_shutdown(struct socket *sock,int how) { @@ -109,7 +105,7 @@ static struct proto_ops pvc_proto_ops = { .family = PF_ATMPVC, - .release = atm_release, + .release = vcc_release, .bind = pvc_bind, .connect = pvc_connect, .socketpair = sock_no_socketpair, @@ -131,7 +127,7 @@ static int pvc_create(struct socket *sock,int protocol) { sock->ops = &pvc_proto_ops; - return 
atm_create(sock,protocol,PF_ATMPVC); + return vcc_create(sock, protocol, PF_ATMPVC); } diff -Nru a/net/atm/resources.c b/net/atm/resources.c --- a/net/atm/resources.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/resources.c Wed Jun 18 11:19:12 2003 @@ -23,11 +23,6 @@ #include "addr.h" -#ifndef NULL -#define NULL 0 -#endif - - LIST_HEAD(atm_devs); spinlock_t atm_dev_lock = SPIN_LOCK_UNLOCKED; @@ -91,7 +86,7 @@ spin_lock(&atm_dev_lock); if (number != -1) { if ((inuse = __atm_dev_lookup(number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); spin_unlock(&atm_dev_lock); __free_atm_dev(dev); return NULL; @@ -100,7 +95,7 @@ } else { dev->number = 0; while ((inuse = __atm_dev_lookup(dev->number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); dev->number++; } } @@ -402,78 +397,12 @@ else error = 0; done: - atm_dev_release(dev); + atm_dev_put(dev); return error; } -struct sock *alloc_atm_vcc_sk(int family) -{ - struct sock *sk; - struct atm_vcc *vcc; - - sk = sk_alloc(family, GFP_KERNEL, 1, NULL); - if (!sk) - return NULL; - vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); - if (!vcc) { - sk_free(sk); - return NULL; - } - sock_init_data(NULL, sk); - memset(vcc, 0, sizeof(*vcc)); - vcc->sk = sk; - - return sk; -} - - -static void unlink_vcc(struct atm_vcc *vcc) -{ - unsigned long flags; - if (vcc->dev) { - spin_lock_irqsave(&vcc->dev->lock, flags); - if (vcc->prev) - vcc->prev->next = vcc->next; - else - vcc->dev->vccs = vcc->next; - - if (vcc->next) - vcc->next->prev = vcc->prev; - else - vcc->dev->last = vcc->prev; - spin_unlock_irqrestore(&vcc->dev->lock, flags); - } -} - - -void free_atm_vcc_sk(struct sock *sk) -{ - unlink_vcc(atm_sk(sk)); - sk_free(sk); -} - -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev) -{ - unsigned long flags; - - unlink_vcc(vcc); - vcc->dev = dev; - if (dev) { - spin_lock_irqsave(&dev->lock, flags); - vcc->next = NULL; - vcc->prev = dev->last; - if (dev->vccs) - dev->last->next = vcc; - else - dev->vccs = vcc; - dev->last = vcc; - 
spin_unlock_irqrestore(&dev->lock, flags); - } -} - EXPORT_SYMBOL(atm_dev_register); EXPORT_SYMBOL(atm_dev_deregister); EXPORT_SYMBOL(atm_dev_lookup); EXPORT_SYMBOL(shutdown_atm_dev); -EXPORT_SYMBOL(bind_vcc); diff -Nru a/net/atm/resources.h b/net/atm/resources.h --- a/net/atm/resources.h Wed Jun 18 11:19:12 2003 +++ b/net/atm/resources.h Wed Jun 18 11:19:12 2003 @@ -14,8 +14,6 @@ extern spinlock_t atm_dev_lock; -struct sock *alloc_atm_vcc_sk(int family); -void free_atm_vcc_sk(struct sock *sk); int atm_dev_ioctl(unsigned int cmd, unsigned long arg); diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/signaling.c Wed Jun 18 11:19:12 2003 @@ -200,26 +200,22 @@ } -static void purge_vccs(struct atm_vcc *vcc) +static void purge_vcc(struct atm_vcc *vcc) { - while (vcc) { - if (vcc->sk->sk_family == PF_ATMSVC && - !test_bit(ATM_VF_META,&vcc->flags)) { - set_bit(ATM_VF_RELEASED,&vcc->flags); - vcc->reply = -EUNATCH; - vcc->sk->sk_err = EUNATCH; - wake_up(&vcc->sleep); - } - vcc = vcc->next; + if (vcc->sk->sk_family == PF_ATMSVC && + !test_bit(ATM_VF_META,&vcc->flags)) { + set_bit(ATM_VF_RELEASED,&vcc->flags); + vcc->reply = -EUNATCH; + vcc->sk->sk_err = EUNATCH; + wake_up(&vcc->sleep); } } static void sigd_close(struct atm_vcc *vcc) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; DPRINTK("sigd_close\n"); sigd = NULL; @@ -227,14 +223,14 @@ printk(KERN_ERR "sigd_close: closing with requests pending\n"); skb_queue_purge(&vcc->sk->sk_receive_queue); - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - purge_vccs(dev->vccs); - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct atm_vcc *vcc = atm_sk(s); + + if (vcc->dev) + purge_vcc(vcc); } - spin_unlock(&atm_dev_lock); + 
read_unlock(&vcc_sklist_lock); } @@ -257,7 +253,8 @@ if (sigd) return -EADDRINUSE; DPRINTK("sigd_attach\n"); sigd = vcc; - bind_vcc(vcc,&sigd_dev); + vcc->dev = &sigd_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); wake_up(&sigd_sleep); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Wed Jun 18 11:19:12 2003 +++ b/net/atm/svc.c Wed Jun 18 11:19:12 2003 @@ -88,18 +88,21 @@ static int svc_release(struct socket *sock) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; - if (!sock->sk) return 0; - vcc = ATM_SD(sock); - DPRINTK("svc_release %p\n",vcc); - clear_bit(ATM_VF_READY,&vcc->flags); - atm_release_vcc_sk(sock->sk,0); - svc_disconnect(vcc); - /* VCC pointer is used as a reference, so we must not free it - (thereby subjecting it to re-use) before all pending connections - are closed */ - free_atm_vcc_sk(sock->sk); + if (sk) { + vcc = ATM_SD(sock); + DPRINTK("svc_release %p\n", vcc); + clear_bit(ATM_VF_READY, &vcc->flags); + /* VCC pointer is used as a reference, so we must not free it + (thereby subjecting it to re-use) before all pending connections + are closed */ + sock_hold(sk); + vcc_release(sock); + svc_disconnect(vcc); + sock_put(sk); + } return 0; } @@ -542,7 +545,7 @@ int error; sock->ops = &svc_proto_ops; - error = atm_create(sock,protocol,AF_ATMSVC); + error = vcc_create(sock, protocol, AF_ATMSVC); if (error) return error; ATM_SD(sock)->callback = svc_callback; ATM_SD(sock)->local.sas_family = AF_ATMSVC; From linux-netdev@gmane.org Wed Jun 18 08:33:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 08:33:39 -0700 (PDT) Received: from main.gmane.org (main.gmane.org [80.91.224.249]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IFXW2x031822 for ; Wed, 18 Jun 2003 08:33:34 -0700 Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19SewC-0006Mn-00 for ; Wed, 18 Jun 2003 17:33:28 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: 
netdev@oss.sgi.com Received: from news by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19Sew9-0006M4-00 for ; Wed, 18 Jun 2003 17:33:25 +0200 From: Jason Lunz Subject: Re: e100-3.0.0_dev8 "Minneapolis Moline" release Date: Wed, 18 Jun 2003 15:33:25 +0000 (UTC) Organization: PBR Streetgang Lines: 97 Message-ID: References: X-Complaints-To: usenet@main.gmane.org User-Agent: slrn/0.9.7.4 (Linux) X-archive-position: 3395 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lunz@falooley.org Precedence: bulk X-list: netdev scott.feldman@intel.com said: > Also, any feedback on the code correctness, maintainability, credits, > etc. would help. Is the anonymous union in struct cb some kind of gcc3-only thing? It doesn't compile for my debian woody gcc 2.95 unless I name the union member. -- Jason Lunz Reflex Security lunz@reflexsecurity.com http://www.reflexsecurity.com/ --- e100.c Tue Jun 17 08:51:45 2003 +++ e100.c.union Wed Jun 18 11:27:45 2003 @@ -434,7 +434,7 @@ u16 eol; } tbd; } tcb; - }; + } data; struct cb *next, *prev; dma_addr_t dma_addr; struct sk_buff *skb; @@ -867,7 +867,7 @@ static void e100_configure(struct nic *nic, struct cb *cb, struct sk_buff *skb) { - struct config *config = &cb->config; + struct config *config = &cb->data.config; u8 *c = (u8 *)config; cb->command = cpu_to_le16(cb_config); @@ -942,7 +942,7 @@ dev_addr[3], dev_addr[4], dev_addr[5]); cb->command = cpu_to_le16(cb_iaaddr); - memcpy(cb->iaaddr, dev_addr, ETH_ALEN); + memcpy(cb->data.iaaddr, dev_addr, ETH_ALEN); } #define NCONFIG_AUTO_SWITCH 0x0080 @@ -1051,9 +1051,9 @@ u16 i, count = min(netdev->mc_count, E100_MAX_MULTICAST_ADDRS); cb->command = cpu_to_le16(cb_multi); - cb->multi.count = cpu_to_le16(count * ETH_ALEN); + cb->data.multi.count = cpu_to_le16(count * ETH_ALEN); for(i = 0; list && i < count; i++, list = list->next) - memcpy(&cb->multi.addr[i*ETH_ALEN], &list->dmi_addr, ETH_ALEN); + 
memcpy(&cb->data.multi.addr[i*ETH_ALEN], &list->dmi_addr, ETH_ALEN); } static void e100_set_multicast_list(struct net_device *netdev) @@ -1175,13 +1175,13 @@ struct sk_buff *skb) { cb->command = nic->tx_command; - cb->tcb.tbd_array = cb->dma_addr + offsetof(struct cb, tcb.tbd); - cb->tcb.tcb_byte_count = 0; - cb->tcb.threshold = nic->tx_threshold; - cb->tcb.tbd_count = 1; - cb->tcb.tbd.buf_addr = cpu_to_le32(pci_map_single(nic->pdev, + cb->data.tcb.tbd_array = cb->dma_addr + offsetof(struct cb, data.tcb.tbd); + cb->data.tcb.tcb_byte_count = 0; + cb->data.tcb.threshold = nic->tx_threshold; + cb->data.tcb.tbd_count = 1; + cb->data.tcb.tbd.buf_addr = cpu_to_le32(pci_map_single(nic->pdev, skb->data, skb->len, PCI_DMA_TODEVICE)); - cb->tcb.tbd.size = cpu_to_le16(skb->len); + cb->data.tcb.tbd.size = cpu_to_le16(skb->len); } static int e100_xmit_frame(struct sk_buff *skb, struct net_device *netdev) @@ -1217,8 +1217,8 @@ nic->net_stats.tx_bytes += cb->skb->len; pci_unmap_single(nic->pdev, - le32_to_cpu(cb->tcb.tbd.buf_addr), - le16_to_cpu(cb->tcb.tbd.size), + le32_to_cpu(cb->data.tcb.tbd.buf_addr), + le16_to_cpu(cb->data.tcb.tbd.size), PCI_DMA_TODEVICE); dev_kfree_skb_irq(cb->skb); tx_cleaned = 1; @@ -1241,8 +1241,8 @@ struct cb *cb = nic->cb_to_clean; if(cb->skb) { pci_unmap_single(nic->pdev, - le32_to_cpu(cb->tcb.tbd.buf_addr), - le16_to_cpu(cb->tcb.tbd.size), + le32_to_cpu(cb->data.tcb.tbd.buf_addr), + le16_to_cpu(cb->data.tcb.tbd.size), PCI_DMA_TODEVICE); dev_kfree_skb_irq(cb->skb); } From andi@averellmail.firstfloor.org Wed Jun 18 08:44:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 08:44:23 -0700 (PDT) Received: from zero.aec.at (Martin.Bulstrode@zero.aec.at [193.170.194.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IFiH2x032229 for ; Wed, 18 Jun 2003 08:44:17 -0700 Received: from fred.muc.de (Conmore.Apel.Brune@localhost.localdomain [127.0.0.1]) by zero.aec.at (8.11.6/8.11.2) with ESMTP id h5IB9jm27716; Wed, 18 Jun 2003 13:09:45 +0200 
Received: by fred.muc.de (Postfix on SuSE Linux 7.3 (i386), from userid 500) id 791295BBBD; Wed, 18 Jun 2003 13:09:46 +0200 (CEST) Date: Wed, 18 Jun 2003 13:09:46 +0200 From: Andi Kleen To: netdev@oss.sgi.com Cc: jgarzik@pobox.com Subject: [PATCH] Remove copied inet_aton code in bond_main.c Message-ID: <20030618110946.GA6851@averell> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-archive-position: 3396 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@muc.de Precedence: bulk X-list: netdev According to a report the my_inet_aton code in bond_main.c is copied from 4.4BSD, but it doesn't carry a BSD copyright license. In addition it is somewhat redundant with the standard in_aton. Convert it to use the linux function. Error handling is a bit worse than before, but not much. Patch for 2.5 bonding. The 2.4 version has the same problem, but afaik it is scheduled to be replaced by the 2.5 codebase anyways. -Andi --- linux/drivers/net/bonding/bond_main.c-o 2003-06-14 23:42:51.000000000 +0200 +++ linux/drivers/net/bonding/bond_main.c 2003-06-18 13:03:04.000000000 +0200 @@ -390,6 +390,7 @@ #include #include #include +#include #include #include #include @@ -2797,84 +2798,6 @@ mod_timer(&bond->arp_timer, next_timer); } -typedef uint32_t in_addr_t; - -int -my_inet_aton(char *cp, unsigned long *the_addr) { - static const in_addr_t max[4] = { 0xffffffff, 0xffffff, 0xffff, 0xff }; - in_addr_t val; - char c; - union iaddr { - uint8_t bytes[4]; - uint32_t word; - } res; - uint8_t *pp = res.bytes; - int digit,base; - - res.word = 0; - - c = *cp; - for (;;) { - /* - * Collect number up to ``.''. - * Values are specified as for C: - * 0x=hex, 0=octal, isdigit=decimal. 
- */ - if (!isdigit(c)) goto ret_0; - val = 0; base = 10; digit = 0; - for (;;) { - if (isdigit(c)) { - val = (val * base) + (c - '0'); - c = *++cp; - digit = 1; - } else { - break; - } - } - if (c == '.') { - /* - * Internet format: - * a.b.c.d - * a.b.c (with c treated as 16 bits) - * a.b (with b treated as 24 bits) - */ - if (pp > res.bytes + 2 || val > 0xff) { - goto ret_0; - } - *pp++ = val; - c = *++cp; - } else - break; - } - /* - * Check for trailing characters. - */ - if (c != '\0' && (!isascii(c) || !isspace(c))) { - goto ret_0; - } - /* - * Did we get a valid digit? - */ - if (!digit) { - goto ret_0; - } - - /* Check whether the last part is in its limits depending on - the number of parts in total. */ - if (val > max[pp - res.bytes]) { - goto ret_0; - } - - if (the_addr != NULL) { - *the_addr = res.word | htonl (val); - } - - return (1); - -ret_0: - return (0); -} - static int bond_sethwaddr(struct net_device *master, struct net_device *slave) { #ifdef BONDING_DEBUG @@ -3958,15 +3881,18 @@ for (arp_ip_count=0 ; (arp_ip_count < MAX_ARP_IP_TARGETS) && arp_ip_target[arp_ip_count]; arp_ip_count++ ) { - /* TODO: check and log bad ip address */ - if (my_inet_aton(arp_ip_target[arp_ip_count], - &arp_target[arp_ip_count]) == 0) { + /* not complete check, but should be good enough to + catch mistakes */ + if (!isdigit(arp_ip_target[arp_ip_count][0])) { printk(KERN_WARNING "bonding_init(): bad arp_ip_target module " "parameter (%s), ARP monitoring will not be " "performed\n", arp_ip_target[arp_ip_count]); arp_interval = 0; + } else { + u32 ip = in_aton(arp_ip_target[arp_ip_count]); + arp_target[arp_ip_count] = ip; } } From yoshfuji@linux-ipv6.org Wed Jun 18 09:06:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 09:06:57 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IG6j2x000588 for ; Wed, 18 Jun 2003 09:06:46 -0700 Received: from localhost 
(localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5IG7rBo015267; Thu, 19 Jun 2003 01:07:53 +0900 Date: Thu, 19 Jun 2003 01:07:53 +0900 (JST) Message-Id: <20030619.010753.75976590.yoshfuji@linux-ipv6.org> To: jeroen@unfix.org Cc: usagi-users@linux-ipv6.org, netdev@oss.sgi.com, info@sixxs.net Subject: Re: (usagi-users 02434) Re: IPv6 bugs introduced in 2.4.21 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <003401c335ad$1c0cc880$210d640a@unfix.org> References: <1055945454.7480.184.camel@slurv.ws.pasop.tomt.net> <003401c335ad$1c0cc880$210d640a@unfix.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3397 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <003401c335ad$1c0cc880$210d640a@unfix.org> (at Wed, 18 Jun 2003 17:20:10 +0200), "Jeroen Massar" says: > Is there a toggle for turning this behaviour off ? Switch forwarding off. Routers are REQUIRED to be enabled. 
-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From linux-netdev@gmane.org Wed Jun 18 09:09:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 09:09:19 -0700 (PDT) Received: from main.gmane.org (main.gmane.org [80.91.224.249]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IG992x000960 for ; Wed, 18 Jun 2003 09:09:10 -0700 Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19SfQ0-0000ZA-00 for ; Wed, 18 Jun 2003 18:04:16 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: netdev@oss.sgi.com Received: from news by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19SfGa-00083J-00 for ; Wed, 18 Jun 2003 17:54:32 +0200 From: Jason Lunz Subject: Re: e100-3.0.0_dev8 "Minneapolis Moline" release Date: Wed, 18 Jun 2003 15:54:32 +0000 (UTC) Organization: PBR Streetgang Lines: 163 Message-ID: References: X-Complaints-To: usenet@main.gmane.org User-Agent: slrn/0.9.7.4 (Linux) X-archive-position: 3398 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lunz@reflexsecurity.com Precedence: bulk X-list: netdev

scott.feldman@intel.com said:
> Your help in testing would be greatly appreciated. There are many 8255x
> devices supported by e100, so hopefully we'll get good coverage from the
> community.

It's running on my workstation with no problems. passes the "ping -qf"
load test. lspci says I have:

00:0b.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
        Flags: bus master, medium devsel, latency 32, IRQ 11
        Memory at d8100000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at e800 [size=64]
        Memory at d8000000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2

ethtool output seems normal, but what's with the rx-mini and rx-jumbo
max settings in the -g output?

[stoli](0) # ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes

[stoli](0) # ethtool -a eth0
Pause parameters for eth0:
Cannot get device pause settings: Operation not supported

[stoli](76) # ethtool -c eth0
Coalesce parameters for eth0:
Cannot get device coalesce settings: Operation not supported

[stoli](82) # ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             256
RX Mini:        134512692
RX Jumbo:       134512692
TX:             256
Current hardware settings:
RX:             64
RX Mini:        5
RX Jumbo:       4
TX:             64

[stoli](0) # ethtool -i eth0
driver: e100
version: 3.0.0_dev8
firmware-version: N/A
bus-info: 00:0b.0

[stoli](0) # ethtool -d eth0
SCB Status and Command Word
-------------
SCB Status Word (Lower Word)         0x0050
      RU Status:               Ready
      CU Status:               Suspended
      ---- Interrupts Pending ----
      Flow Control Pause:                no
      Early Receive:                     no
      Software Generated Interrupt:      no
      MDI Done:                          no
      RU Not In Ready State:             no
      CU Not in Active State:            no
      RU Received Frame:                 no
      CU Completed Command:              no
SCB Command Word (Upper Word)        0x0000
      RU Command:              No Command
      CU Command:              No Command
      Software Generated Interrupt:      no
      ---- Interrupts Masked ----
      ALL Interrupts:                    no
      Flow Control Pause:                no
      Early Receive:                     no
      RU Not In Ready State:             no
      CU Not in Active State:            no
      RU Received Frame:                 no
      CU Completed Command:              no

[stoli](0) # ethtool -e eth0
Offset          Values
------          ------
0x0000          00 02 b3 11 b9 e0 03 02 00 00 01 02 01 47 00 00
0x0010          13 72 10 83 a2 40 0c 00 86 80 00 00 00 00 00 00
0x0020          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060          a8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070          00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a c3

[stoli](0) # ethtool -k eth0
RX/TX checksumming parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
no checksumming/SG info available

[stoli](83) # ethtool -p eth0
[stoli](130) # ethtool -r eth0
[stoli](0) # ethtool -S eth0
NIC statistics:
     rx_packets: 1410143
     tx_packets: 1410029
     rx_bytes: 138271433
     tx_bytes: 138179379
     rx_errors: 41
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 0
     collisions: 136914
     rx_length_errors: 41
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0

[stoli](0) # ethtool -t eth0 online
The test result is PASS
The test extra info:
Link test     (on/offline)       0
Eeprom test   (on/offline)       0
Self test        (offline)       0
Mac loopback     (offline)       0
Phy loopback     (offline)       0

[stoli](0) # ethtool -t eth0 offline
The test result is PASS
The test extra info:
Link test     (on/offline)       0
Eeprom test   (on/offline)       0
Self test        (offline)       0
Mac loopback     (offline)       0
Phy loopback     (offline)       0

-- 
Jason Lunz                      Reflex Security
lunz@reflexsecurity.com         http://www.reflexsecurity.com/

From scott.feldman@intel.com Wed Jun 18 09:39:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 09:40:03 -0700 (PDT) Received: from caduceus.jf.intel.com (fmr06.intel.com [134.134.136.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IGdu2x001657 for ; Wed, 18 Jun 2003 09:39:57 -0700 Received: from petasus.jf.intel.com (petasus.jf.intel.com [10.7.209.6]) by caduceus.jf.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5IGYOV20200 for ; Wed, 18 Jun 2003 16:34:24 GMT Received: from
orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by petasus.jf.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5IGZFm15727 for ; Wed, 18 Jun 2003 16:35:15 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061809503412559 ; Wed, 18 Jun 2003 09:50:34 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 18 Jun 2003 09:39:49 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e100-3.0.0_dev8 "Minneapolis Moline" release Date: Wed, 18 Jun 2003 09:39:49 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e100-3.0.0_dev8 "Minneapolis Moline" release Thread-Index: AcM1r1kSxc0wJiBwTrija/GV48D6wAACHdLw From: "Feldman, Scott" To: "Jason Lunz" , X-OriginalArrivalTime: 18 Jun 2003 16:39:49.0992 (UTC) FILETIME=[3C639680:01C335B8] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5IGdu2x001657 X-archive-position: 3399 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > Is the anonymous union in struct cb some kind of gcc3-only > thing? It doesn't compile for my debian woody gcc 2.95 unless > I name the union member. I'm not sure, but it's easy enough to make it known to avoid the issue. I'll do that. 
-scott From scott.feldman@intel.com Wed Jun 18 09:52:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 09:52:29 -0700 (PDT) Received: from caduceus.jf.intel.com (fmr06.intel.com [134.134.136.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IGqM2x002145 for ; Wed, 18 Jun 2003 09:52:22 -0700 Received: from petasus.jf.intel.com (petasus.jf.intel.com [10.7.209.6]) by caduceus.jf.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5IGkpV29827 for ; Wed, 18 Jun 2003 16:46:51 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by petasus.jf.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5IGlgm25915 for ; Wed, 18 Jun 2003 16:47:42 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061810030005417 ; Wed, 18 Jun 2003 10:03:01 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 18 Jun 2003 09:52:16 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e100-3.0.0_dev8 "Minneapolis Moline" release Date: Wed, 18 Jun 2003 09:52:15 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e100-3.0.0_dev8 "Minneapolis Moline" release Thread-Index: AcM1tE358eiRVYN0TjCMoVriszGE4gABS+0g From: "Feldman, Scott" To: "Jason Lunz" , X-OriginalArrivalTime: 18 Jun 2003 16:52:16.0221 (UTC) FILETIME=[F92D18D0:01C335B9] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5IGqM2x002145 X-archive-position: 3400 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > It's 
running on my workstation with no problems. passes the
> "ping -qf" load test.

Do you have any other tests you can throw at it?

> ethtool output seems normal, but what's with the rx-mini and
> rx-jumbo max settings in the -g output?

Good catch. Needs this:

@@ -1848,8 +1848,12 @@ static int e100_ethtool(struct net_devic
 	case ETHTOOL_GRINGPARAM:
 		ecmd->ring.rx_max_pending = rfds->max;
 		ecmd->ring.tx_max_pending = cbs->max;
+		ecmd->ring.rx_mini_max_pending = 0;
+		ecmd->ring.rx_jumbo_max_pending = 0;
 		ecmd->ring.rx_pending = rfds->count;
 		ecmd->ring.tx_pending = cbs->count;
+		ecmd->ring.rx_mini_pending = 0;
+		ecmd->ring.rx_jumbo_pending = 0;
 		if(copy_to_user(useraddr, ecmd, sizeof(ecmd->ring)))
 			err = -EFAULT;
 		break;

-scott

From davem@redhat.com Wed Jun 18 10:20:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 10:20:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IHKH2x003345 for ; Wed, 18 Jun 2003 10:20:17 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA07376; Wed, 18 Jun 2003 10:15:36 -0700 Date: Wed, 18 Jun 2003 10:15:35 -0700 (PDT) Message-Id: <20030618.101535.112616670.davem@redhat.com> To: ak@suse.de Cc: ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol From: "David S. Miller" In-Reply-To: <20030618080118.GC23037@wotan.suse.de> References: <20030617220420.GA1169@averell> <20030617.150751.52901849.davem@redhat.com> <20030618080118.GC23037@wotan.suse.de> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3401 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Wed, 18 Jun 2003 10:01:18 +0200 I'm not going through the whole PPP layer now to audit it for non linear skb cleanliness... Then you're not making it a compatible protocol. And the printk will remain until someone fixes it. If I could get Rusty to fixup _ALL_ of netfilter, I can get you to fix just _ONE_ thing. From davem@redhat.com Wed Jun 18 10:30:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 10:31:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IHUv2x003708 for ; Wed, 18 Jun 2003 10:30:57 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA07446; Wed, 18 Jun 2003 10:26:03 -0700 Date: Wed, 18 Jun 2003 10:26:03 -0700 (PDT) Message-Id: <20030618.102603.48525259.davem@redhat.com> To: lunz@falooley.org Cc: netdev@oss.sgi.com Subject: Re: e100-3.0.0_dev8 "Minneapolis Moline" release From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3402 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jason Lunz Date: Wed, 18 Jun 2003 15:33:25 +0000 (UTC) Is the anonymous union in struct cb some kind of gcc3-only thing? It doesn't compile for my debian woody gcc 2.95 unless I name the union member. 
Yes, please don't use anonymous unions in kernel code that you expect
to compile with anything other than the most recent versions of gcc.

From andre@tomt.net Wed Jun 18 10:34:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 10:34:51 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IHYi2x004042 for ; Wed, 18 Jun 2003 10:34:45 -0700 Received: (qmail 11603 invoked by uid 1006); 18 Jun 2003 17:37:27 -0000 Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.16 (sophie: 2.14/3.69. spamassassin: 2.55. Clear:. Processed in 0.024727 secs); 18 Jun 2003 17:37:27 -0000 Received: from slask.tomt.net (HELO slurv.ws.pasop.tomt.net) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 18 Jun 2003 17:37:26 -0000 Subject: Re: (usagi-users 02435) Re: IPv6 bugs introduced in 2.4.21 From: Andre Tomt To: usagi-users@linux-ipv6.org Cc: netdev@oss.sgi.com, info@sixxs.net In-Reply-To: <003401c335ad$1c0cc880$210d640a@unfix.org> References: <003401c335ad$1c0cc880$210d640a@unfix.org> Content-Type: text/plain; charset=ISO-8859-1 Organization: Message-Id: <1055957670.7478.227.camel@slurv.ws.pasop.tomt.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4- Date: 18 Jun 2003 19:34:30 +0200 Content-Transfer-Encoding: 8bit X-archive-position: 3403 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev

On Wed, 2003-06-18 at 17:20, Jeroen Massar wrote:
> Notez bien that many people use :: and ::1 and ::2 etc as a unicast
> address.

I'm getting majorly confused now: things I've taken for granted with
IPv6, and used in IOS, *BSD, Windows and Linux for ages, suddenly stop
working with Linux, and are wrong (it seems).

Is there anything wrong with using $network::1/64 ? Other than not
being EUI64..

2001:730:f:1:: is invalid as a unicast address? (note it still works on
ptp-interfaces as long as you set nexthop to the dev, not the ip
address - but I expected that.)

A /127 does not match two addresses (:2a :2b in this case)?

> > The /127 matches both 2a and 2b, why does it end up at localhost?
>
> Routing, remove the route which goes over lo.

There is no route that goes over lo, other than one for a different
prefix, which is set to avoid loops on unmatched prefixes.

unreachable 2001:730:f::/48 dev lo metric 2048 error -101 mtu 16436 advmss 16376.

Because of routing? I don't understand. The /127 route matches both 2a
and 2b, and has no conflicting routes. It _should_ end up on the
aorta-dev as far as I can see.

2001:730:3::1:2a/127 via :: dev aorta proto kernel metric 256 mtu 1480 advmss 1420
(kernel autogenerated)

I tried removing it and adding a route without the via ::, but it still
does not work like before. Non-routed traffic still goes to the
bitbucket, but routed traffic works as well as ever.

-- 
Regards,
André Tomt

From shemminger@osdl.org Wed Jun 18 10:38:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 10:38:52 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IHch2x004370 for ; Wed, 18 Jun 2003 10:38:45 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5IHcNX14804; Wed, 18 Jun 2003 10:38:23 -0700 Date: Wed, 18 Jun 2003 10:38:23 -0700 From: Stephen Hemminger To: "David S.
Miller" Cc: netdev@oss.sgi.com Subject: osdl survives with 2.5 Message-Id: <20030618103823.4dae17f4.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3404 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev As part of effort to stabilize 2.5 kernel, the public web server http://www.osdl.org has been running 2.5 for over a month. Yesterday, with the announcement of Linus's move, the web server went from 30,000 hits a day to 577,000 hits in one day without a hiccup. Good work. From davem@redhat.com Wed Jun 18 11:02:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 11:02:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5II2V2x005097 for ; Wed, 18 Jun 2003 11:02:32 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA07420; Wed, 18 Jun 2003 10:21:22 -0700 Date: Wed, 18 Jun 2003 10:21:22 -0700 (PDT) Message-Id: <20030618.102122.59661992.davem@redhat.com> To: ak@colin2.muc.de Cc: wa@almesberger.net, ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol From: "David S. Miller" In-Reply-To: <20030618125159.GA15238@colin2.muc.de> References: <20030618080118.GC23037@wotan.suse.de> <20030618092126.A28100@almesberger.net> <20030618125159.GA15238@colin2.muc.de> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3405 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: 18 Jun 2003 14:51:59 +0200,Wed, 18 Jun 2003 14:51:59 +0200 On Wed, Jun 18, 2003 at 09:21:26AM -0300, Werner Almesberger wrote: > I know you find documentation unmanly, but maybe ... But what is the replacement? For moving over whole subsystems to non linear skbs it's a bit late in the release... We did it for all of netfilter just the other week, networking development still continues, it's not too late. Stop looking for excuses and just do the work Andi. From toml@us.ibm.com Wed Jun 18 12:23:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 12:23:17 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IJN02x008339 for ; Wed, 18 Jun 2003 12:23:07 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5IJMGpS204840; Wed, 18 Jun 2003 15:22:16 -0400 Received: from tomlt2.austin.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5IJMEnC113686; Wed, 18 Jun 2003 15:22:15 -0400 Subject: Flow cache flush oops From: Tom Lendacky To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, toml@us.ibm.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 18 Jun 2003 14:22:33 -0500 Message-Id: <1055964154.1818.59.camel@tomlt2.tomloffice.austin.ibm.com> Mime-Version: 1.0 X-archive-position: 3406 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
toml@us.ibm.com Precedence: bulk X-list: netdev

I consistently receive the following oops on 2.5.72 on an SMP machine
after flushing some IPSec policy's and sa's that I had established and
sent data through. I have not been able to reproduce this on a UP
machine. I haven't done a lot of debugging in an SMP environment so if
anyone else can reproduce this and help debug, it would be greatly
appreciated. The oops data is below.

Thanks,
Tom

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02bd20d
*pde = 00000000
Oops: 0000 [#1]
CPU:    1
EIP:    0060:[]    Not tainted
EFLAGS: 00010286
EIP is at flow_cache_flush_tasklet+0x3d/0xb0
eax: 00000000   ebx: 00000000   ecx: 0000000a   edx: 01408f80
esi: f7e7ff0c   edi: c04d2e80   ebp: 00000001   esp: f7e7fee8
ds: 007b   es: 007b   ss: 0068
Process events/1 (pid: 7, threadinfo=f7e7e000 task=f7f44ca0)
Stack: c0405920 f7e7ff0c f7e7ff14 00000293 c02bd391 f7e7ff0c f7e7ff0c 00000001
       00000000 00000000 00000001 00000001 00000001 f7e7ff1c f7e7ff1c f7532c60
       f7e7e000 f7035c00 f7e7ff4c f7f8c060 c0329a91 f7035c14 f7e7ff4c c0329b5b
Call Trace:
 [] flow_cache_flush+0xa1/0xbf
 [] xfrm_policy_gc_kill+0x71/0xa0
 [] xfrm_policy_gc_task+0x9b/0xb0
 [] worker_thread+0x237/0x330
 [] xfrm_policy_gc_task+0x0/0xb0
 [] default_wake_function+0x0/0x30
 [] ret_from_fork+0x6/0x14
 [] default_wake_function+0x0/0x30
 [] worker_thread+0x0/0x330
 [] kernel_thread_helper+0x5/0x18

Code: 8b 14 98 85 d2 74 36 8d b6 00 00 00 00 8d bf 00 00 00 00 8b

From davem@redhat.com Wed Jun 18 12:27:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 12:28:06 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IJRv2x008665 for ; Wed, 18 Jun 2003 12:27:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA08084; Wed, 18 Jun 2003 12:23:11 -0700 Date: Wed, 18 Jun 2003 12:23:11 -0700 (PDT)
Message-Id: <20030618.122311.118621111.davem@redhat.com> To: toml@us.ibm.com Cc: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, herbert@gondor.apana.org.au Subject: Re: Flow cache flush oops From: "David S. Miller" In-Reply-To: <1055964154.1818.59.camel@tomlt2.tomloffice.austin.ibm.com> References: <1055964154.1818.59.camel@tomlt2.tomloffice.austin.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3407 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Tom Lendacky Date: 18 Jun 2003 14:22:33 -0500 I consistently receive the following oops on 2.5.72 on an SMP machine after flushing some IPSec policy's and sa's that I had established and sent data through. I've forwarded your report to Herbert, who worked on this stuff. He should be able to figure it out. 
From jeroen@unfix.org Wed Jun 18 16:04:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 16:04:11 -0700 (PDT) Received: from purgatory.unfix.org (postfix@cust.92.136.adsl.cistron.nl [195.64.92.136]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5IN412x012238 for ; Wed, 18 Jun 2003 16:04:02 -0700 Received: from limbo (limbo.unfix.org [10.100.13.33]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by purgatory.unfix.org (Postfix) with ESMTP id A84B07FDE; Thu, 19 Jun 2003 01:03:58 +0200 (CEST) From: "Jeroen Massar" To: "'Andre Tomt'" , Cc: , Subject: RE: (usagi-users 02435) Re: IPv6 bugs introduced in 2.4.21 Date: Thu, 19 Jun 2003 01:03:56 +0200 Organization: Unfix Message-ID: <001501c335ed$e54dfa80$210d640a@unfix.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.3416 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal In-Reply-To: <1055957670.7478.227.camel@slurv.ws.pasop.tomt.net> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5IN412x012238 X-archive-position: 3408 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jeroen@unfix.org Precedence: bulk X-list: netdev

Andre Tomt wrote:

> On Wed, 2003-06-18 at 17:20, Jeroen Massar wrote:
> > Notez bien that many people use :: and ::1 and ::2 etc as a unicast
> > address.
>
> I'm getting majorly confused now: things I've taken for granted with
> IPv6, and used in IOS, *BSD, Windows and Linux for ages, suddenly stop
> working with Linux, and are wrong (it seems).

You have quite some short ages. IPv4 hasn't even been around for half
an age yet... though it's coming close :)

> Is there anything wrong with using $network::1/64 ? Other than not
> being EUI64..

Absolutely not. Actually one can fill in the last 64 bits oneself,
though there are some bits that should be left untouched because of DAD
and some rules.

> 2001:730:f:1:: is invalid as a unicast address? (note it still works on
> ptp-interfaces as long as you set nexthop to the dev, not the ip
> address - but I expected that.)
>
> A /127 does not match two addresses (:2a :2b in this case)?
>
> > > The /127 matches both 2a and 2b, why does it end up at localhost?
> >
> > Routing, remove the route which goes over lo.
>
> There is no route that goes over lo, other than one for a different
> prefix, which is set to avoid loops on unmatched prefixes.
>
> unreachable 2001:730:f::/48 dev lo metric 2048 error -101 mtu 16436 advmss 16376.

Setting an unreachable for a prefix routed to your destination is very
good practice indeed.

> Because of routing? I don't understand. The /127 route matches both 2a
> and 2b, and has no conflicting routes. It _should_ end up on the
> aorta-dev as far as I can see.
>
> 2001:730:3::1:2a/127 via :: dev aorta proto kernel metric 256 mtu 1480 advmss 1420
> (kernel autogenerated)

Via :: on device aorta.. which explains it quite well. Both IPs go to
the device and over the wire to the other side. But packets coming back
get dropped as they have no destination on your box. Use tcpdump and
you'll see :)

> I tried removing it and adding a route without the via ::, but it still
> does not work like before. Non-routed traffic still goes to the
> bitbucket, but routed traffic works as well as ever.

As said we still use /127, and will keep on using /127s, at the IPng
POP. My routing table entries for the tunnel currently look like:

3ffe:8114:1000::26 dev sixxs metric 1024 mtu 1280 advmss 1220
3ffe:8114:1000::26/127 via :: dev sixxs proto kernel metric 256 mtu 1280 advmss 1220

As you can see I've added a separate route to the 'remote' side
(3ffe:8114:1000::26). This way the packets go along quite cleanly.
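
[Editorial aside on the /127 question in this thread. A /127 prefix fixes
the first 127 bits and leaves only the last bit free, so
2001:730:3::1:2a/127 covers exactly ::1:2a and ::1:2b. The userspace
sketch below checks that with inet_pton(). It only illustrates the prefix
comparison and is not the kernel's route lookup; ipv6_prefix_match() is a
hypothetical helper name, not something taken from the thread.]

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not kernel code): return 1 if addr lies inside
 * prefix/prefix_len, 0 if it does not, -1 if either string fails to parse.
 */
int ipv6_prefix_match(const char *prefix, int prefix_len, const char *addr)
{
	struct in6_addr p, a;
	int full_bytes, rem_bits;

	assert(prefix_len >= 0 && prefix_len <= 128);

	if (inet_pton(AF_INET6, prefix, &p) != 1 ||
	    inet_pton(AF_INET6, addr, &a) != 1)
		return -1;

	full_bytes = prefix_len / 8;	/* bytes that must match entirely */
	rem_bits = prefix_len % 8;	/* leading bits of the next byte */

	if (memcmp(p.s6_addr, a.s6_addr, full_bytes) != 0)
		return 0;

	if (rem_bits) {
		/* Keep only the top rem_bits bits of the partial byte. */
		unsigned char mask = (unsigned char)(0xff << (8 - rem_bits));

		if ((p.s6_addr[full_bytes] ^ a.s6_addr[full_bytes]) & mask)
			return 0;
	}
	return 1;
}
```

[With this helper, ipv6_prefix_match("2001:730:3::1:2a", 127,
"2001:730:3::1:2b") returns 1 and the same call with "2001:730:3::1:2c"
returns 0, which is consistent with both tunnel endpoints matching the
kernel-generated /127 route quoted above.]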
Btw this is on 2.4.21-rc1 and on that system I had to remove the routes to lo too. Unicast my :) Greets, Jeroen From toml@us.ibm.com Wed Jun 18 16:12:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 16:12:41 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5INCa2x012631 for ; Wed, 18 Jun 2003 16:12:37 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.56.224.151]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5INB1pS182284; Wed, 18 Jun 2003 19:11:01 -0400 Received: from d01ml072.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by northrelay03.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5INAwgP026870; Wed, 18 Jun 2003 19:10:58 -0400 Subject: Re: Flow cache flush oops To: Herbert Xu Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Wed, 18 Jun 2003 18:10:48 -0500 X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/18/2003 07:10:59 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3409 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev If your compiler is producing anything like mine, then edi should be smp_processor_id() which looks a bit weird. Please send me your net/core/flow.o file so that I can compare it. What compiler are you using? And how many CPUs? The compiler version output is: gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7) The system has 4 CPUs. Herbert, I'll send the flow.o file to you separately. 
Thanks, Tom From davem@redhat.com Wed Jun 18 16:35:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 16:36:01 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5INZp2x013130 for ; Wed, 18 Jun 2003 16:35:52 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA08810; Wed, 18 Jun 2003 16:30:57 -0700 Date: Wed, 18 Jun 2003 16:30:57 -0700 (PDT) Message-Id: <20030618.163057.74745635.davem@redhat.com> To: shemminger@osdl.org Cc: ctindel@users.sourceforge.net, fubar@us.ibm.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.72] use alloc_netdev in bonding driver From: "David S. Miller" In-Reply-To: <20030617113510.5ae6a5a9.shemminger@osdl.org> References: <20030617113510.5ae6a5a9.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3410 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Tue, 17 Jun 2003 11:35:10 -0700 This patch replaces the allocation of an array of bonding device structures with allocating net_device's through alloc_netdev. The net_device, statistics, and net_device are created with one allocation via alloc_netdev. Applied, thanks Stephen. 
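
[Editorial aside on the quoted patch description. The point made there is
that the device structure and the driver's private data come out of a
single allocation instead of separate ones. The userspace sketch below
illustrates that pattern only. fake_dev, fake_bond_priv, fake_alloc_dev()
and the 32-byte DEV_ALIGN are assumptions made for illustration; they are
not the kernel's definitions and not the code from Stephen's patch.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEV_ALIGN 32		/* assumed offset granularity for the private area */

struct fake_dev {		/* hypothetical stand-in for a device structure */
	char name[16];
	void *priv;		/* points into the same allocation */
};

struct fake_bond_priv {		/* hypothetical driver-private state */
	unsigned long rx_packets;
	unsigned long tx_packets;
};

/*
 * Allocate the device plus sizeof_priv bytes of private data in one
 * zeroed block. The device part is rounded up so the private area
 * starts at an offset that is a multiple of DEV_ALIGN from the base.
 */
struct fake_dev *fake_alloc_dev(size_t sizeof_priv, const char *name)
{
	size_t dev_size = (sizeof(struct fake_dev) + DEV_ALIGN - 1) &
			  ~(size_t)(DEV_ALIGN - 1);
	struct fake_dev *dev;

	assert(name != NULL);

	dev = calloc(1, dev_size + sizeof_priv);
	if (!dev)
		return NULL;

	/* calloc zeroed the buffer, so the name stays NUL-terminated. */
	strncpy(dev->name, name, sizeof(dev->name) - 1);
	dev->priv = sizeof_priv ? (char *)dev + dev_size : NULL;
	return dev;
}
```

[A caller would do fake_alloc_dev(sizeof(struct fake_bond_priv), "bond0"),
treat dev->priv as the per-device state, and release everything with a
single free(dev). That one free is the practical benefit of the layout:
the device and its private data cannot be released separately.]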
From davem@redhat.com Wed Jun 18 16:38:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 16:38:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5INcf2x013459 for ; Wed, 18 Jun 2003 16:38:42 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA08825; Wed, 18 Jun 2003 16:34:02 -0700 Date: Wed, 18 Jun 2003 16:34:02 -0700 (PDT) Message-Id: <20030618.163402.41640357.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.72] Eliminate bogus function in Red Creek VPN From: "David S. Miller" In-Reply-To: <20030617163228.34d3daa4.shemminger@osdl.org> References: <20030617163228.34d3daa4.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3411 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev I've added both of your red creek vpn driver changes to my tree, thanks. 
From davem@redhat.com Wed Jun 18 16:40:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 16:40:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5INej2x013770 for ; Wed, 18 Jun 2003 16:40:45 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA08837; Wed, 18 Jun 2003 16:36:00 -0700 Date: Wed, 18 Jun 2003 16:36:00 -0700 (PDT) Message-Id: <20030618.163600.71097155.davem@redhat.com> To: wa@almesberger.net Cc: ak@muc.de, netdev@oss.sgi.com, mostrows@speakeasy.net, paulus@au.ibm.com Subject: Re: [PATCH] Convert pppoe to new style protocol From: "David S. Miller" In-Reply-To: <20030618092126.A28100@almesberger.net> References: <20030617.150751.52901849.davem@redhat.com> <20030618080118.GC23037@wotan.suse.de> <20030618092126.A28100@almesberger.net> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3412 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Werner Almesberger Date: Wed, 18 Jun 2003 09:21:26 -0300 + * + * DO NOT USE THIS IN NEW CODE ! skb_linearize will be for internal + * use by net/core/dev.c only. 
*/ int skb_linearize(struct sk_buff *skb, int gfp); I have a better idea, I just added the __deprecated tag to this function declaration :-) From herbert@gondor.apana.org.au Wed Jun 18 17:10:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 17:10:16 -0700 (PDT) Received: from arnor.me.apana.org.au (mail@[203.14.152.115]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5J09o2x014269 for ; Wed, 18 Jun 2003 17:09:57 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 19SmxH-0006dT-00; Thu, 19 Jun 2003 10:07:07 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 19SmxE-0006Xg-00; Thu, 19 Jun 2003 10:07:04 +1000 Date: Thu, 19 Jun 2003 10:07:04 +1000 To: Tom Lendacky Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Flow cache flush oops Message-ID: <20030619000704.GA25135@gondor.apana.org.au> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 3413 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Wed, Jun 18, 2003 at 06:10:48PM -0500, Tom Lendacky wrote: > > The compiler version output is: > gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7) Thanks, you've found two bugs in my code :) Firstly it's only initialising one CPU so only that one has a flow cache. Secondly flow_flush_cache is not checking whether the calling CPU has been initialised. I'll post a patch tonight. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ ) Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From pekkas@netcore.fi Wed Jun 18 22:37:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 18 Jun 2003 22:37:51 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5J5bf2x018534 for ; Wed, 18 Jun 2003 22:37:42 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5J5bO215129; Thu, 19 Jun 2003 08:37:24 +0300 Date: Thu, 19 Jun 2003 08:37:23 +0300 (EEST) From: Pekka Savola To: usagi-users@linux-ipv6.org cc: jeroen@unfix.org, , Subject: Re: (usagi-users 02436) Re: IPv6 bugs introduced in 2.4.21 In-Reply-To: <20030619.010753.75976590.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3414 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Thu, 19 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote: > In article <003401c335ad$1c0cc880$210d640a@unfix.org> (at Wed, 18 Jun 2003 17:20:10 +0200), "Jeroen Massar" says: > > > Is there a toggle for turning this behaviour off ? > > Switch forwarding off. > Routers are REQUIRED to be enabled. Does the same happen with any other prefix length (e.g. the prefix::/126)? Otherwise, it's just a (un)fortunate bug. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R.
Martin: A Clash of Kings From herbert@gondor.apana.org.au Thu Jun 19 02:38:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 02:38:42 -0700 (PDT) Received: from arnor.me.apana.org.au (mail@[203.14.152.115]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5J9c62x025148 for ; Thu, 19 Jun 2003 02:38:24 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 19SvpK-0000N5-00; Thu, 19 Jun 2003 19:35:30 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 19Svp8-00074A-00; Thu, 19 Jun 2003 19:35:18 +1000 Date: Thu, 19 Jun 2003 19:35:18 +1000 To: Tom Lendacky Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Flow cache flush oops Message-ID: <20030619093518.GA27025@gondor.apana.org.au> References: <20030619000704.GA25135@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="y0ulUmNC+osPPQO6" Content-Disposition: inline In-Reply-To: <20030619000704.GA25135@gondor.apana.org.au> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 3415 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --y0ulUmNC+osPPQO6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Thu, Jun 19, 2003 at 10:07:04AM +1000, herbert wrote: > > Thanks, you've found two bugs in my code :) Firstly it's only initialising > one CPU so only that one has a flow cache. Secondly flow_flush_cache is > not checking whether the calling CPU has been initialised. > > I'll post a patch tonight. Here is the patch. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ ) Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --y0ulUmNC+osPPQO6 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p Index: kernel-source-2.5/net/core/flow.c =================================================================== RCS file: /home/gondolin/herbert/src/CVS/debian/kernel-source-2.5/net/core/flow.c,v retrieving revision 1.5 diff -u -r1.5 flow.c --- kernel-source-2.5/net/core/flow.c 13 Jun 2003 11:22:17 -0000 1.5 +++ kernel-source-2.5/net/core/flow.c 19 Jun 2003 09:23:38 -0000 @@ -300,7 +300,8 @@ local_bh_disable(); smp_call_function(flow_cache_flush_per_cpu, &info, 1, 0); - flow_cache_flush_tasklet((unsigned long)&info); + if (test_bit(smp_processor_id(), &info.cpumap)) + flow_cache_flush_tasklet((unsigned long)&info); local_bh_enable(); wait_for_completion(&info.completion); @@ -308,12 +309,13 @@ up(&flow_flush_sem); } -static void __devinit flow_cache_cpu_online(int cpu) +static void __devinit flow_cache_cpu_prepare(int cpu) { struct tasklet_struct *tasklet; unsigned long order; flow_hash_rnd_recalc(cpu) = 1; + flow_count(cpu) = 0; for (order = 0; (PAGE_SIZE << order) < @@ -328,7 +330,10 @@ tasklet = flow_flush_tasklet(cpu); tasklet_init(tasklet, flow_cache_flush_tasklet, 0); +} +static void __devinit flow_cache_cpu_online(int cpu) +{ down(&flow_cache_cpu_sem); set_bit(cpu, &flow_cache_cpu_map); flow_cache_cpu_count++; @@ -341,6 +346,9 @@ unsigned long cpu = (unsigned long)cpu; switch (action) { case CPU_UP_PREPARE: + flow_cache_cpu_prepare(cpu); + break; + case CPU_ONLINE: flow_cache_cpu_online(cpu); break; } @@ -353,6 +361,8 @@ static int __init flow_cache_init(void) { + int i; + flow_cachep = kmem_cache_create("flow_cache", sizeof(struct flow_cache_entry), 0, SLAB_HWCACHE_ALIGN, @@ -370,8 +380,12 @@ flow_hash_rnd_timer.expires = jiffies + FLOW_HASH_RND_PERIOD; add_timer(&flow_hash_rnd_timer); -
flow_cache_cpu_online(smp_processor_id()); register_cpu_notifier(&flow_cache_cpu_nb); + for (i = 0; i < NR_CPUS; i++) + if (cpu_online(i)) { + flow_cache_cpu_prepare(i); + flow_cache_cpu_online(i); + } return 0; } --y0ulUmNC+osPPQO6-- From shmulik.hen@intel.com Thu Jun 19 05:11:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 05:12:19 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JCBw2x029623 for ; Thu, 19 Jun 2003 05:11:58 -0700 Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5JC7GO04888 for ; Thu, 19 Jun 2003 12:07:16 GMT Received: from fmsmsxvs041.fm.intel.com (fmsmsxvs041.fm.intel.com [132.233.42.126]) by petasus.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5JC59S02988 for ; Thu, 19 Jun 2003 12:05:09 GMT Received: from jrslxjul1.npdj.intel.com ([10.12.254.186]) by fmsmsxvs041.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061905082030542 ; Thu, 19 Jun 2003 05:08:29 -0700 Date: Thu, 19 Jun 2003 15:11:44 +0300 (IDT) From: Shmulik Hen X-X-Sender: Reply-To: Shmulik Hen To: bond-devel , linux-net , linux-netdev cc: Amir Noam , "Chad N. Tindel" , "David S. Miller" , Jay Vosburgh , Jeff Garzik , Noam Marom , Shmulik Hen , Tsippy Mendelson Subject: System hard locks with bonding and tcpdump Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3416 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shmulik.hen@intel.com Precedence: bulk X-list: netdev Hi, We've noticed on several occasions that when using tcpdump on one of bond's slaves the system will freeze in what seems to be a deadlock. 
The scenario is as follows: 4x Pentium 3 machine SMP kernel 2.4.20 + KDB bonding 20030513 in RLB mode (also happened in 20030320 in xor mode) 3x e100 slaves run bi-directional TCP stress traffic to multiple client using iperf do 'tcpdump -i ethX' on one of the slaves The system will occasionally freeze. When using the -p option to *not* set the interface into promiscuous mode, every thing is OK. Using KDB we were able to conclude that this is probably a deadlock between the br lock and either dev->queue_lock or dev->xmit_lock (see trace below). Going through the code of dev_queue_xmit(), we've noticed the following sentence in the comment block: "Check this and shot the lock. It is not prone from deadlocks." We're not sure what it means or who put it in there, but maybe it relates to what we see. Any help will be appreciated. Entering kdb (current=0xdf0a4000, pid 12008) on processor 3 due to Keyboard Entry [3]kdb> bt 0xdf0a4000 12008 11975 1 003 run 0xdf0a4370*iperf EBP EIP Function (args) 0xc021e87e .text.lock.dev+0x83 kernel .text 0xc0100000 0xc021e7fb 0xc021e980 0xdf0a5a44 0xc021c5ed dev_queue_xmit+0x9d (0xdeb71f00, 0xf7ed34f8, 0x10, 0xdeb71f00, 0xdef2e89c) kernel .text 0xc0100000 0xc021c550 0xc021c870 0xdf0a5a64 0xc0230127 ip_finish_output2+0xa7 (0xdeb71f00, 0xdeb71f00) kernel .text 0xc0100000 0xc0230080 0xc0230190 0xdf0a5a74 0xc022eead ip_output+0x4d (0xdeb71f00, 0x0, 0x29, 0x0, 0xdd5e4d00) kernel .text 0xc0100000 0xc022ee60 0xc022eec0 0xdf0a5a9c 0xc023025e ip_queue_xmit2+0xce (0xdeb71f00, 0xf68fa640, 0xf6cce9a0, 0xf6cce9a0, 0x286) kernel .text 0xc0100000 0xc0230190 0xc023050e 0xdf0a5aec 0xc022f0c8 ip_queue_xmit+0x208 (0xdeb71f00, 0xdef2e8b0, 0x5c8, 0xdeb71f00, 0xde356e40) kernel .text 0xc0100000 0xc022eec0 0xc022f180 0xdf0a5b2c 0xc023f404 tcp_transmit_skb+0x2c4 (0xde040880, 0xdeb71f00, 0xdeb71f00, 0xdeb71f00, 0xde0408e8) kernel .text 0xc0100000 0xc023f140 0xc023f5a0 0xdf0a5b58 0xc023ffa1 tcp_write_xmit+0x181 (0xde040880, 0x0, 0x0, 0x532861b9, 0xde0409b8) 
kernel .text 0xc0100000 0xc023fe20 0xc0240090 0xdf0a5b74 0xc023d133 __tcp_data_snd_check+0x93 (0xde040880, 0xde356e40, 0x6, 0xde3e5e6c, 0x46) kernel .text 0xc0100000 0xc023d0a0 0xc023d150 0xdf0a5bb8 0xc023d915 tcp_rcv_established+0x4c5 (0xde040880, 0xdeb71f00, 0xdf342054, 0x20, 0x0) kernel .text 0xc0100000 0xc023d450 0xc023dcd0 0xdf0a5bdc 0xc0245a2f tcp_v4_do_rcv+0x13f (0xde040880, 0xdeb71f00, 0x202, 0x246, 0x3) kernel .text 0xc0100000 0xc02458f0 0xc0245a40 0xdf0a5c14 0xc0245f00 tcp_v4_rcv+0x4c0 (0xdeb71f00, 0xe015dd60, 0x11, 0xc960c898, 0x6) kernel .text 0xc0100000 0xc0245a40 0xc0245fb0 0xdf0a5c3c 0xc022c47d ip_local_deliver_finish+0x12d (0xdeb71f00) kernel .text 0xc0100000 0xc022c350 0xc022c490 0xdf0a5c48 0xc022c132 ip_local_deliver+0x32 (0xdeb71f00, 0x4dfdb98f, 0x66fdb98f, 0x0, 0xdeb25400) kernel .text 0xc0100000 0xc022c100 0xc022c140 0xdf0a5c80 0xc022c669 ip_rcv_finish+0x1d9 (0xdeb71f00, 0xdeaabf00, 0x34, 0x5872be0, 0x0) kernel .text 0xc0100000 0xc022c490 0xc022c6cf 0xdf0a5ca4 0xc022c2ab ip_rcv+0x16b (0xdeb71f00, 0xdeb25400, 0xc0380ec4, 0xde7dec00, 0xc03ac00c) kernel .text 0xc0100000 0xc022c140 0xc022c350 0xdf0a5cc4 0xc021cdb4 netif_receive_skb+0xd4 (0xdeb71f00, 0x5ea1ea, 0xc03ac000,0x40, 0xc03ac0ec) kernel .text 0xc0100000 0xc021cce0 0xc021ce70 0xdf0a5ce8 0xc021cefa process_backlog+0x8a (0xc03ab5f0, 0x246, 0x3, 0xdeb25400,0xdeb7d860) kernel .text 0xc0100000 0xc021ce70 0xc021cfa0 0xdf0a5d34 0xc011f649 do_softirq+0xe9 (0x0, 0xc03d4d48, 0x60, 0xdeb7d860, 0xdeb25400) kernel .text 0xc0100000 0xc011f560 0xc011f650 0xc021e88f .text.lock.dev+0x94 kernel .text 0xc0100000 0xc021e7fb 0xc021e980 0xdf0a5d5c 0xc021c65f dev_queue_xmit+0x10f (0xdeb7d860, 0xde87a238, 0x10, 0xdeb7d860, 0xde2f6a9c) kernel .text 0xc0100000 0xc021c550 0xc021c870 0xdf0a5d7c 0xc0230127 ip_finish_output2+0xa7 (0xdeb7d860, 0xdeb7d860) kernel .text 0xc0100000 0xc0230080 0xc0230190 0xdf0a5d8c 0xc022eead ip_output+0x4d (0xdeb7d860, 0x0, 0x5a8, 0x5a8, 0xdd5e41c0) kernel .text 0xc0100000 0xc022ee60 
0xc022eec0 0xdf0a5db4 0xc023025e ip_queue_xmit2+0xce (0xdeb7d860, 0xf7bf0e40, 0xde010ca0, 0xdeb7d860, 0xdeb7d860) kernel .text 0xc0100000 0xc0230190 0xc023050e 0xdf0a5e04 0xc022f0c8 ip_queue_xmit+0x208 (0xdeb7d860, 0xde2f6ab0, 0x20, 0xdeb7d860, 0xc19ad800) kernel .text 0xc0100000 0xc022eec0 0xc022f180 0xdf0a5e44 0xc023f404 tcp_transmit_skb+0x2c4 (0xf79dc060, 0xdeb7d860, 0xde047854, 0xf79dc198, 0xf79dc060) kernel .text 0xc0100000 0xc023f140 0xc023f5a0 0xdf0a5e60 0xc02416d4 tcp_send_ack+0x84 (0xf79dc060, 0xc0217cf5, 0xf79dc060, 0xdf0a4000, 0xf298) kernel .text 0xc0100000 0xc0241650 0xc0241710 0xdf0a5e80 0xc02352d2 cleanup_rbuf+0xc2 (0xf79dc060, 0xb50, 0xdf0a5f60, 0x5a8, 0xdf0a5f68) kernel .text 0xc0100000 0xc0235210 0xc0235320 0xdf0a5ed8 0xc0235b10 tcp_recvmsg+0x3e0 (0xf79dc060, 0xdf0a5f68, 0xf298, 0x0, 0x0) kernel .text 0xc0100000 0xc0235730 0xc02360b0 0xdf0a5f00 0xc02538ea inet_recvmsg+0x4a (0xdf721980, 0xdf0a5f68, 0xfde8, 0x0, 0xdf0a5f1c) kernel .text 0xc0100000 0xc02538a0 0xc0253900 0xdf0a5f48 0xc0214fdf sock_recvmsg+0x4f (0xdf721980, 0xdf0a5f68, 0xfde8, 0x0, 0x41120d70) kernel .text 0xc0100000 0xc0214f90 0xc0215080 0xdf0a5f90 0xc021511c sock_read+0x9c (0xf5435520, 0x41120220, 0xfde8, 0xf5435540, 0x3eb75f16) kernel .text 0xc0100000 0xc0215080 0xc0215120 0xdf0a5fbc 0xc013d8bc sys_read+0x9c (0x9, 0x41120220, 0xfde8, 0x411201c0, 0x41120190) kernel .text 0xc0100000 0xc013d820 0xc013d970 0xc010774f system_call+0x33 kernel .text 0xc0100000 0xc010771c 0xc0107754 [3]kdb> cpu 0 Entering kdb (current=0xdeada000, pid 12009) on processor 0 due to cpu switch [0]kdb> bt 0xdeada000 12009 11975 1 000 run 0xdeada370*iperf EBP EIP Function (args) 0xdeadbd40 0xc0106317 __read_lock_failed+0x3 (0xdef99c30, 0xde7ded60, 0x8000, 0xdeb25400, 0xdef99c20) kernel .text 0xc0100000 0xc0106314 0xc0106328 0xc021e862 .text.lock.dev+0x67 kernel .text 0xc0100000 0xc021e7fb 0xc021e980 0xc021c3d9 dev_queue_xmit_nit+0x39 (0xdef99c20, 0xdeb25400, 0xdeada000, 0xdef99c20, 0xde9fbbe0) kernel .text 
0xc0100000 0xc021c3a0 0xc021c4b0 0xdeadbd5c 0xc021c6ff dev_queue_xmit+0x1af (0xdef99c20, 0xde9fbbf8, 0x10, 0xdef99c20, 0xde55909c) kernel .text 0xc0100000 0xc021c550 0xc021c870 0xdeadbd7c 0xc0230127 ip_finish_output2+0xa7 (0xdef99c20, 0xdef99c20) kernel .text 0xc0100000 0xc0230080 0xc0230190 0xdeadbd8c 0xc022eead ip_output+0x4d (0xdef99c20, 0x0, 0x5a8, 0x5a8, 0xc9688360) kernel .text 0xc0100000 0xc022ee60 0xc022eec0 0xdeadbdb4 0xc023025e ip_queue_xmit2+0xce (0xdef99c20, 0xf7bf0e40, 0xf6a42760, 0xdef99c20, 0xdef99c20) kernel .text 0xc0100000 0xc0230190 0xc023050e 0xdeadbe04 0xc022f0c8 ip_queue_xmit+0x208 (0xdef99c20, 0xde5590b0, 0x20, 0xdef99c20, 0xc19ad800) kernel .text 0xc0100000 0xc022eec0 0xc022f180 0xdeadbe44 0xc023f404 tcp_transmit_skb+0x2c4 (0xf7847b80, 0xdef99c20, 0xdf9ad854, 0xf7847cb8, 0xf7847b80) kernel .text 0xc0100000 0xc023f140 0xc023f5a0 0xdeadbe60 0xc02416d4 tcp_send_ack+0x84 (0xf7847b80, 0xc0217cf5, 0xf7847b80, 0xdeada000, 0xe748) kernel .text 0xc0100000 0xc0241650 0xc0241710 0xdeadbe80 0xc02352d2 cleanup_rbuf+0xc2 (0xf7847b80, 0x16a0, 0x0, 0xdeadbf68, 0xdf941b40) kernel .text 0xc0100000 0xc0235210 0xc0235320 0xdeadbed8 0xc0235b10 tcp_recvmsg+0x3e0 (0xf7847b80, 0xdeadbf68, 0xe748, 0x0, 0x0) kernel .text 0xc0100000 0xc0235730 0xc02360b0 0xdeadbf00 0xc02538ea inet_recvmsg+0x4a (0xdf941b40, 0xdeadbf68, 0xfde8, 0x0, 0xdeadbf1c) kernel .text 0xc0100000 0xc02538a0 0xc0253900 0xdeadbf48 0xc0214fdf sock_recvmsg+0x4f (0xdf941b40, 0xdeadbf68, 0xfde8, 0x0, 0x41131740) kernel .text 0xc0100000 0xc0214f90 0xc0215080 0xdeadbf90 0xc021511c sock_read+0x9c (0xdebfb420, 0x411300a0, 0xfde8, 0xdebfb440, 0x3eb75f16) kernel .text 0xc0100000 0xc0215080 0xc0215120 0xdeadbfbc 0xc013d8bc sys_read+0x9c (0xa, 0x411300a0, 0xfde8, 0x41130040, 0x41130010) kernel .text 0xc0100000 0xc013d820 0xc013d970 0xc010774f system_call+0x33 kernel .text 0xc0100000 0xc010771c 0xc0107754 [0]kdb> cpu 1 Entering kdb (current=0xde892000, pid 11942) on processor 1 due to cpu switch [1]kdb> bt 
0xde892000 11942 11941 1 001 run 0xde892370*iperf EBP EIP Function (args) 0xc021e87e .text.lock.dev+0x83 kernel .text 0xc0100000 0xc021e7fb 0xc021e980 0xde893b0c 0xc021c5ed dev_queue_xmit+0x9d (0xf7662d40, 0xde87a238, 0x10, 0xf7662d40, 0xdec2689c) kernel .text 0xc0100000 0xc021c550 0xc021c870 0xde893b2c 0xc0230127 ip_finish_output2+0xa7 (0xf7662d40, 0xf7662d40) kernel .text 0xc0100000 0xc0230080 0xc0230190 0xde893b3c 0xc022eead ip_output+0x4d (0xf7662d40, 0xe015dd60, 0xc960d89c, 0xdeb78660, 0xdd5e41c0) kernel .text 0xc0100000 0xc022ee60 0xc022eec0 0xde893b64 0xc023025e ip_queue_xmit2+0xce (0xf7662d40, 0x0, 0x1841b43, 0x2f, 0x286) kernel .text 0xc0100000 0xc0230190 0xc023050e 0xde893bb4 0xc022f0c8 ip_queue_xmit+0x208 (0xf7662d40, 0xdec268b0, 0x5c8, 0xf7662d40, 0xdf126840) kernel .text 0xc0100000 0xc022eec0 0xc022f180 0xde893bf4 0xc023f404 tcp_transmit_skb+0x2c4 (0xdf87cc20, 0xf7662d40, 0xded0af00, 0xded0af00, 0xdf87cc88) kernel .text 0xc0100000 0xc023f140 0xc023f5a0 0xde893c20 0xc023ffa1 tcp_write_xmit+0x181 (0xdf87cc20, 0x0, 0x0, 0x4fc022eb, 0xdf87cd58) kernel .text 0xc0100000 0xc023fe20 0xc0240090 0xde893c3c 0xc023d133 __tcp_data_snd_check+0x93 (0xdf87cc20, 0xdeab30a0, 0x6, 0x2, 0xf79dc198) kernel .text 0xc0100000 0xc023d0a0 0xc023d150 0xde893c80 0xc023d915 tcp_rcv_established+0x4c5 (0xdf87cc20, 0xded0af00, 0xde93c854, 0x20, 0x0) kernel .text 0xc0100000 0xc023d450 0xc023dcd0 0xde893ca4 0xc0245a2f tcp_v4_do_rcv+0x13f (0xdf87cc20, 0xded0af00, 0xf7bfd2e0) kernel .text 0xc0100000 0xc02458f0 0xc0245a40 0xde893cdc 0xc0245f00 tcp_v4_rcv+0x4c0 (0x80e88913, 0x15, 0x80e8, 0xde93c854, 0x0) kernel .text 0xc0100000 0xc0245a40 0xc0245fb0 0xde893cd8 0xc0230127 ip_finish_output2+0xa7 (0xded0af00) kernel .text 0xc0100000 0xc0230080 0xc0230190 0xde893d04 0xc022c47d ip_local_deliver_finish+0x12d (0xf6a42a80, 0x6, 0xded0af00, 0xde93c840, 0xdeb25400) kernel .text 0xc0100000 0xc022c350 0xc022c490 0xde893d10 0xc023025e ip_queue_xmit2+0xce (0xded0af00, 0x4dfdb98f, 0x6bfdb98f, 0x0, 
0xdeb25400) kernel .text 0xc0100000 0xc0230190 0xc023050e 0xde893d48 0xc022c669 ip_rcv_finish+0x1d9 (0xded0af00, 0xde7a15e0, 0x34, 0x5872be0, 0x0) kernel .text 0xc0100000 0xc022c490 0xc022c6cf 0xde893d6c 0xc022c2ab ip_rcv+0x16b (0xded0af00, 0xdeb25400, 0xc0380ec4, 0xde7dec00, 0xc03abccc) kernel .text 0xc0100000 0xc022c140 0xc022c350 0xde893d8c 0xc021cdb4 netif_receive_skb+0xd4 (0xded0af00, 0x5ea1ea, 0xc03abcc0,0x40, 0xc03abdac) kernel .text 0xc0100000 0xc021cce0 0xc021ce70 0xde893db0 0xc021cefa process_backlog+0x8a (0xc03ab5f0, 0x46, 0x1, 0xc03a7ec0, 0x36) kernel .text 0xc0100000 0xc021ce70 0xc021cfa0 0xde893dfc 0xc011f649 do_softirq+0xe9 (0x36, 0xde893e2c, 0xdd31ac40, 0x6c0, 0x20) kernel .text 0xc0100000 0xc011f560 0xc011f650 0xde893e24 0xc01094a6 do_IRQ+0xf6 (0x35343338, 0x10, 0x805dce0, 0x805dd00, 0xdea5ca78) kernel .text 0xc0100000 0xc01093b0 0xc01094b0 0xde893ef0 0xc010bfc8 call_do_IRQ+0x5 (0xde81fb40, 0xde893f68, 0xfde8, 0x0) kernel .text 0xc0100000 0xc010bfc3 0xc010bfd0 0xc0253941 inet_sendmsg+0x41 (0xde8abcc0, 0xde893f68, 0xfde8, 0xde893f20, 0x2ea6) kernel .text 0xc0100000 0xc0253900 0xc0253950 0xde893f4c 0xc0214f50 sock_sendmsg+0x70 (0xde8abcc0, 0xde893f68, 0xfde8, 0x8054f48, 0xfde8) kernel .text 0xc0100000 0xc0214ee0 0xc0214f90 0xde893f90 0xc02151c2 sock_write+0xa2 (0xf5865460, 0x8054f48, 0xfde8, 0xf5865480, 0x3eb75f16) kernel .text 0xc0100000 0xc0215120 0xc02151e0 0xde893fbc 0xc013da0c sys_write+0x9c (0x3, 0x8054f48, 0xfde8, 0x0, 0x8054eb8) kernel .text 0xc0100000 0xc013d970 0xc013dac0 0xc010774f system_call+0x33 kernel .text 0xc0100000 0xc010771c 0xc0107754 [1]kdb> cpu 2 Entering kdb (current=0xc9764000, pid 17303) on processor 2 due to cpu switch [2]kdb> bt 0xc9764000 17303 779 1 002 run 0xc9764370*tcpdump EBP EIP Function (args) 0xc9765e80 0xc01062fb __write_lock_failed+0x7 (0xc9765e98, 0xc021bb78, 0xdebfb520, 0xf72609a0, 0xc981b120) kernel .text 0xc0100000 0xc01062f4 0xc0106314 0xc026025b .text.lock.brlock+0x5 kernel .text 0xc0100000 0xc0260256 
0xc0260260 0xc0260309 get_options+0x49 (0xc99bb900, 0xc9765eec, 0x14, 0x0, 0x3000011) kernel .text 0xc0100000 0xc02602c0 0xc0260310 0xc9765f78 0xc0215c14 sys_bind+0x64 (0x3, 0xbffff780, 0x14, 0xf58650e0, 0x8933) kernel .text 0xc0100000 0xc0215bb0 0xc0215c40 0xc9765fbc 0xc02168be sys_socketcall+0x8e (0x2, 0xbffff770, 0x80a616c, 0xbffff8a0, 0x3) kernel .text 0xc0100000 0xc0216830 0xc0216a70 0xc010774f system_call+0x33 kernel .text 0xc0100000 0xc010771c 0xc0107754 [2]kdb> -- | Shmulik Hen | | Israel Design Center (Jerusalem) | | LAN Access Division | | Intel Communications Group, Intel corp. | From pb@bieringer.de Thu Jun 19 05:44:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 05:44:20 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JCi72x030198 for ; Thu, 19 Jun 2003 05:44:08 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 1ACF61387A; Thu, 19 Jun 2003 14:44:01 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 3AE321387E for ; Thu, 19 Jun 2003 14:44:00 +0200 (CEST) X-AV-Checked: Thu Jun 19 14:44:00 2003 smtp2.aerasec.de Received: from [192.168.1.2] (pD950F34A.dip.t-dialin.net [217.80.243.74]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 82F2F1387A for ; Thu, 19 Jun 2003 14:43:59 +0200 (CEST) Date: Thu, 19 Jun 2003 14:43:53 +0200 From: Peter Bieringer To: Maillist netdev Subject: Re: sundance driver does not work with D-Link DFE-580TX Message-ID: <57880000.1056026633@worker.muc.bieringer.de> In-Reply-To: <200305261604.h4QG4iQY020725@sandelman.ottawa.on.ca> References: <200305261604.h4QG4iQY020725@sandelman.ottawa.on.ca> X-Mailer: Mulberry/3.0.3 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit 
Content-Disposition: inline X-archive-position: 3417 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Monday, May 26, 2003 12:04:43 PM -0400 Michael Richardson wrote: > >>>>>> "Peter" == Peter Bieringer writes: > Peter> pls. take a look into: > > Peter> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=69000 > > Peter> Looks like current included "sundance.c" driver doesn't work properly > Peter> together with current "mii.c" on a D-Link DFE-580TX. > > This is a redhat bug. Stock kernels do not have this problem. > Nor does pure-Becker code. I (and others, independently) finally found the bug: RHL builds its kernels with CONFIG_SUNDANCE_MMIO=y, which causes the problem. It looks like this option should be removed or the driver should be improved (perhaps with autodetection?). IMHO it doesn't make much sense to add a compile option that, when enabled, breaks the use of NICs that are already available. Peter -- Dr.
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From shmulik.hen@intel.com Thu Jun 19 06:15:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 06:15:21 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JDF82x030794 for ; Thu, 19 Jun 2003 06:15:09 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5JDARh06509 for ; Thu, 19 Jun 2003 13:10:27 GMT Received: from fmsmsxvs043.fm.intel.com (fmsmsxvs043.fm.intel.com [132.233.42.129]) by talaria.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5JDFdp06751 for ; Thu, 19 Jun 2003 13:15:39 GMT Received: from jrslxjul1.npdj.intel.com ([10.12.254.186]) by fmsmsxvs043.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061906122402381 ; Thu, 19 Jun 2003 06:12:27 -0700 Date: Thu, 19 Jun 2003 16:15:02 +0300 (IDT) From: Shmulik Hen X-X-Sender: Reply-To: Shmulik Hen To: "Hen, Shmulik" cc: bond-devel , linux-net , linux-netdev , "Noam, Amir" , "Chad N. Tindel" , "David S. Miller" , Jay Vosburgh , Jeff Garzik , "Marom, Noam" , "Mendelson, Tsippy" Subject: Re: System hard locks with bonding and tcpdump In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by oss.sgi.com id h5JDF82x030794 X-archive-position: 3418 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shmulik.hen@intel.com Precedence: bulk X-list: netdev On Thu, 19 Jun 2003, Hen, Shmulik wrote: >         The system will occasionally freeze. 
When using the -p option to > *not* set the interface into promiscuous mode, every thing is OK. Using > KDB we were able to conclude that this is probably a deadlock between the > br lock and either dev->queue_lock or dev->xmit_lock (see trace below). > > > > Entering kdb (current=0xc9764000, pid 17303) on processor 2 due to cpu switch > [2]kdb> bt > 0xc9764000 17303 779 1 002 run 0xc9764370*tcpdump > EBP EIP Function (args) > 0xc9765e80 0xc01062fb __write_lock_failed+0x7 (0xc9765e98, 0xc021bb78, 0xdebfb520, 0xf72609a0, 0xc981b120) > kernel .text 0xc0100000 0xc01062f4 0xc0106314 > 0xc026025b .text.lock.brlock+0x5 > kernel .text 0xc0100000 0xc0260256 0xc0260260 > We now have more information about this bug, and it suggests that this may be a kernel bug. A more thorough investigation by Tsippy Mendelson shows the following: The deadlock is not between the br lock and the dev locks, but between different lock attempts on the same br lock. Following the transmit path a TCP packet takes while tcpdump is running, nf_hook_slow() first does a br_read_lock_bh(BR_NETPROTO_LOCK), and later, further down the path, dev_queue_xmit_nit() does a br_read_lock(BR_NETPROTO_LOCK). In between, tcpdump tries to take BR_NETPROTO_LOCK for writing (as seen in the trace). So the writer waits for the first read lock to be released, while the second read lock, taken on the same CPU that already holds the first one, waits for the writer: deadlock! The funny thing is that just above the place where the first lock is taken the following comment appears: "We may already have this, but read-locks nest anyway" Any thoughts/comments about what can be done? -- | Shmulik Hen | | Israel Design Center (Jerusalem) | | LAN Access Division | | Intel Communications Group, Intel corp.
| From shmulik.hen@intel.com Thu Jun 19 06:32:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 06:32:47 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JDWZ2x031200 for ; Thu, 19 Jun 2003 06:32:35 -0700 Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5JDRsh16247 for ; Thu, 19 Jun 2003 13:27:54 GMT Received: from fmsmsxvs041.fm.intel.com (fmsmsxvs041.fm.intel.com [132.233.42.126]) by petasus.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5JDPkL29658 for ; Thu, 19 Jun 2003 13:25:47 GMT Received: from jrslxjul1.npdj.intel.com ([10.12.254.186]) by fmsmsxvs041.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061906290530554 ; Thu, 19 Jun 2003 06:29:08 -0700 Date: Thu, 19 Jun 2003 16:32:29 +0300 (IDT) From: Shmulik Hen X-X-Sender: Reply-To: Shmulik Hen To: linux-net , linux-netdev , bond-devel cc: Amir Noam , Noam Marom , Scott Feldman , Shmulik Hen , "Chad N. Tindel" , "David S. Miller" , Jay Vosburgh , Jeff Garzik , Tsippy Mendelson Subject: [bonding][BUG] UDP Tx stops after link disconnection of active slave Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3419 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shmulik.hen@intel.com Precedence: bulk X-list: netdev Hi, We've noticed that when configuring bonding in active-backup mode with two slaves and running UDP stress traffic to several clients, traffic stops when the link of the current active slave is disconnected. When the slave is re-connected, traffic resumes even though it is no longer the active slave.
While traffic is stopped it looks as though the application generating the traffic (iperf, netperf, etc.) is hung and doesn't generate new packets. Once link is restored the application wakes up again and stars sending new packets. Further investigation showed that slave adapters tend to stop processing the queue when their link is down, and therefore any packets that were already passed from bonding to the disconnected slave are *stuck* in the queue, thus causing the resources associated with those packets not to free and so the application hits it's resource limits and stalls. Upon link connection, the packets in the queue are processed and the application is free to allocate new resources. Is there any safe way for bonding to empty the queue of slaves that lost their link/ became inactive? -- | Shmulik Hen | | Israel Design Center (Jerusalem) | | LAN Access Division | | Intel Communications Group, Intel corp. | From toml@us.ibm.com Thu Jun 19 07:09:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 07:09:34 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JE9G2x032411 for ; Thu, 19 Jun 2003 07:09:25 -0700 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5JE5TpS190022; Thu, 19 Jun 2003 10:05:30 -0400 Received: from d01ml072.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5JE5K82181920; Thu, 19 Jun 2003 10:05:26 -0400 Subject: Re: Flow cache flush oops To: Herbert Xu Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Thu, 19 Jun 2003 09:05:16 -0500 X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 06/19/2003 10:05:26 AM 
MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 3420 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev Here is the patch. Installed and working. Thank you for the quick response. Tom From ctindel@calma.pair.com Thu Jun 19 08:35:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 08:35:08 -0700 (PDT) Received: from calma.pair.com (calma.pair.com [209.68.1.95]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JFZ02x006283 for ; Thu, 19 Jun 2003 08:35:00 -0700 Received: (qmail 76381 invoked by uid 3059); 19 Jun 2003 15:34:59 -0000 Date: Thu, 19 Jun 2003 11:34:59 -0400 From: "Chad N. Tindel" To: Shmulik Hen Cc: linux-net , linux-netdev , bond-devel , Amir Noam , Noam Marom , Scott Feldman , "David S. Miller" , Jay Vosburgh , Jeff Garzik , Tsippy Mendelson Subject: Re: [bonding][BUG] UDP Tx stops after link disconnection of active slave Message-ID: <20030619153459.GA75674@calma.pair.com> Mail-Followup-To: Shmulik Hen , linux-net , linux-netdev , bond-devel , Amir Noam , Noam Marom , Scott Feldman , "David S. Miller" , Jay Vosburgh , Jeff Garzik , Tsippy Mendelson References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-archive-position: 3421 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chad@tindel.net Precedence: bulk X-list: netdev > We've noticed that when using configuring bonding to active backup > mode with two slaves and running UDP stress traffic to several clients, > traffic will stop when disconnecting the link for the current active > slave. When re-connecting the slave the traffic will resume even though it > is not the active slave anymore. 
While traffic is stopped it looks as > though the application generating the traffic (iperf, netperf, etc.) is > hung and doesn't generate new packets. Once link is restored the > application wakes up again and stars sending new packets. Further > investigation showed that slave adapters tend to stop processing the queue > when their link is down, and therefore any packets that were already > passed from bonding to the disconnected slave are *stuck* in the queue, > thus causing the resources associated with those packets not to free and > so the application hits it's resource limits and stalls. Upon link > connection, the packets in the queue are processed and the application is > free to allocate new resources. > > Is there any safe way for bonding to empty the queue of slaves that lost > their link/ became inactive? Interesting. Perhaps bonding should go through the queue and move all the pending packets over into the new active slave? It'd be nice if we could just transfer the entire IP stack around as one big object... 
Chad From jgarzik@pobox.com Thu Jun 19 10:20:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 10:20:15 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JHK32x009623 for ; Thu, 19 Jun 2003 10:20:06 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19T34r-000286-6q; Thu, 19 Jun 2003 18:20:02 +0100 Message-ID: <3EF1F0B6.8020505@pobox.com> Date: Thu, 19 Jun 2003 13:19:50 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: Peter Bieringer CC: Maillist netdev Subject: Re: sundance driver does not work with D-Link DFE-580TX References: <200305261604.h4QG4iQY020725@sandelman.ottawa.on.ca> <57880000.1056026633@worker.muc.bieringer.de> In-Reply-To: <57880000.1056026633@worker.muc.bieringer.de> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3422 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Peter Bieringer wrote: > I (also others independed) finally find the bug: > > RHL uses CONFIG_SUNDANCE_MMIO=y on kernel building which causes the > problem. FWIW we've changed this to N. 
Jeff
From shemminger@osdl.org Thu Jun 19 12:13:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:13:49 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJDd2x013257 for ; Thu, 19 Jun 2003 12:13:42 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5JJDPX14315; Thu, 19 Jun 2003 12:13:25 -0700 Date: Thu, 19 Jun 2003 12:13:25 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] add prefetch to skb_queue_walk Message-Id: <20030619121325.0a2059ee.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3423 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

skb_queue_walk macro can use prefetch's (like list_for_each)

--- include/linux/skbuff.h.orig 2003-06-19 12:08:17.000000000 -0700
+++ include/linux/skbuff.h 2003-06-19 12:08:43.000000000 -0700
@@ -1149,9 +1149,9 @@
 }
 #define skb_queue_walk(queue, skb) \
-        for (skb = (queue)->next; \
+        for (skb = (queue)->next, prefetch(skb->next); \
              (skb != (struct sk_buff *)(queue)); \
-             skb = skb->next)
+             skb = skb->next, prefetch(skb->next))
 extern struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags,

From davem@redhat.com Thu Jun 19 12:17:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:17:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJHj2x013572
for ; Thu, 19 Jun 2003 12:17:45 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA10946; Thu, 19 Jun 2003 12:11:08 -0700 Date: Thu, 19 Jun 2003 12:11:08 -0700 (PDT) Message-Id: <20030619.121108.21920417.davem@redhat.com> To: herbert@gondor.apana.org.au Cc: toml@us.ibm.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Flow cache flush oops From: "David S. Miller" In-Reply-To: <20030619093518.GA27025@gondor.apana.org.au> References: <20030619000704.GA25135@gondor.apana.org.au> <20030619093518.GA27025@gondor.apana.org.au> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3424 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Herbert Xu Date: Thu, 19 Jun 2003 19:35:18 +1000 Here is the patch. Applied, thanks Herbert. Reviewing this made me notice that we don't check the __get_free_pages() return value. I'll fix that. Thanks again. From shemminger@osdl.org Thu Jun 19 12:20:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:20:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJKb2x013902 for ; Thu, 19 Jun 2003 12:20:38 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5JJKOX16398; Thu, 19 Jun 2003 12:20:24 -0700 Date: Thu, 19 Jun 2003 12:20:24 -0700 From: Stephen Hemminger To: "David S. 
Miller" , Michal Ostrowski Cc: netdev@oss.sgi.com Subject: [PATCH] missing owner field entry on pppoe /proc Message-Id: <20030619122024.3d21c0ec.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3425 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

The /proc file operations structure for PPPoE is missing an owner entry. This means that the module can still be unloaded while someone has the /proc file open.

Patch against 2.5.72

diff -Nru a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c Thu Jun 19 12:17:27 2003
+++ b/drivers/net/pppoe.c Thu Jun 19 12:17:28 2003
@@ -1061,6 +1061,7 @@
 }
 static struct file_operations pppoe_seq_fops = {
+        .owner = THIS_MODULE,
         .open = pppoe_seq_open,
         .read = seq_read,
         .llseek = seq_lseek,

From shemminger@osdl.org Thu Jun 19 12:24:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:24:53 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJOj2x014212 for ; Thu, 19 Jun 2003 12:24:45 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5JJOXX17403; Thu, 19 Jun 2003 12:24:34 -0700 Date: Thu, 19 Jun 2003 12:24:33 -0700 From: Stephen Hemminger To: "David S.
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.72] mark assert error cases in skbuff unlikely Message-Id: <20030619122433.2dc720ec.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3426 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

skbuff.h includes a number of checks for error conditions that the compiler can optimize better when they are marked as unlikely. Use BUG_ON where possible.

--- include/linux/skbuff.h.orig 2003-06-19 12:10:27.000000000 -0700
+++ include/linux/skbuff.h 2003-06-19 11:44:43.000000000 -0700
@@ -802,12 +802,9 @@
         skb_shinfo(skb)->nr_frags = i+1;
 }
-#define SKB_PAGE_ASSERT(skb) do { if (skb_shinfo(skb)->nr_frags) \
-                                        BUG(); } while (0)
-#define SKB_FRAG_ASSERT(skb) do { if (skb_shinfo(skb)->frag_list) \
-                                        BUG(); } while (0)
-#define SKB_LINEAR_ASSERT(skb) do { if (skb_is_nonlinear(skb)) \
-                                        BUG(); } while (0)
+#define SKB_PAGE_ASSERT(skb) BUG_ON(skb_shinfo(skb)->nr_frags)
+#define SKB_FRAG_ASSERT(skb) BUG_ON(skb_shinfo(skb)->frag_list)
+#define SKB_LINEAR_ASSERT(skb) BUG_ON(skb_is_nonlinear(skb))
 /*
  * Add data to an sk_buff
@@ -836,7 +833,7 @@
         SKB_LINEAR_ASSERT(skb);
         skb->tail += len;
         skb->len += len;
-        if (skb->tail>skb->end)
+        if (unlikely(skb->tail>skb->end))
                 skb_over_panic(skb, len, current_text_addr());
         return tmp;
 }
@@ -861,7 +858,7 @@
 {
         skb->data -= len;
         skb->len += len;
-        if (skb->data<skb->head)
+        if (unlikely(skb->data<skb->head))
                 skb_under_panic(skb, len, current_text_addr());
         return skb->data;
 }
@@ -869,8 +866,7 @@
 static inline char *__skb_pull(struct sk_buff *skb, unsigned int len)
 {
         skb->len -= len;
-        if (skb->len < skb->data_len)
-                BUG();
+        BUG_ON(skb->len < skb->data_len);
         return skb->data += len;
 }
@@ -1132,8 +1128,7 @@
 static inline void *kmap_skb_frag(const skb_frag_t *frag)
 {
 #ifdef CONFIG_HIGHMEM
-        if (in_irq())
-                BUG();
+        BUG_ON(in_irq());
         local_bh_disable();
 #endif

From davem@redhat.com Thu Jun 19 12:39:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:39:54 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJdn2x031039 for ; Thu, 19 Jun 2003 12:39:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA11065; Thu, 19 Jun 2003 12:34:59 -0700 Date: Thu, 19 Jun 2003 12:34:59 -0700 (PDT) Message-Id: <20030619.123459.35669634.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] add prefetch to skb_queue_walk From: "David S. Miller" In-Reply-To: <20030619121325.0a2059ee.shemminger@osdl.org> References: <20030619121325.0a2059ee.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3427 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Stephen Hemminger
   Date: Thu, 19 Jun 2003 12:13:25 -0700

   skb_queue_walk macro can use prefetch's (like list_for_each)

Applied, thanks.
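[Editor's sketch, not part of the thread.] The skb_queue_walk change discussed above issues a prefetch for the following element while the current one is being processed. The user-space fragment below illustrates the same idea on a generic circular list. Everything in it is an assumption made for illustration: `struct node`, `list_init`, `list_add_tail`, `list_walk` and `list_sum` are hypothetical names, the list head is a sentinel of the same node type (the kernel instead casts a separate `sk_buff_head`), and `prefetch` is mapped onto the GCC/Clang builtin `__builtin_prefetch` rather than the kernel's per-architecture helper.

```c
#include <stddef.h>

/* Assumption: GCC/Clang builtin standing in for the kernel's prefetch().
 * It is only a hint and does not fault on an unmapped address. */
#define prefetch(p) __builtin_prefetch(p)

/* Hypothetical stand-in for struct sk_buff: only the links and a payload. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Make head an empty circular list (head points at itself). */
static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link n in just before the sentinel, i.e. at the tail. */
static void list_add_tail(struct node *head, struct node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Same shape as the patched skb_queue_walk: step to an element, then
 * immediately hint that the element after it will be needed soon. */
#define list_walk(head, n)                        \
    for (n = (head)->next, prefetch(n->next);     \
         n != (head);                             \
         n = n->next, prefetch(n->next))

/* Example consumer: add up every payload in the list. */
static int list_sum(struct node *head)
{
    struct node *n;
    int sum = 0;

    list_walk(head, n)
        sum += n->value;
    return sum;
}
```

The prefetch does not change what the loop computes; whether it helps depends on how much work the loop body does per element and on the cache behaviour of the machine, so it is best treated as a hint to measure rather than a guaranteed win.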
From davem@redhat.com Thu Jun 19 12:41:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:41:19 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJfF2x001357 for ; Thu, 19 Jun 2003 12:41:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA11078; Thu, 19 Jun 2003 12:36:25 -0700 Date: Thu, 19 Jun 2003 12:36:25 -0700 (PDT) Message-Id: <20030619.123625.38706439.davem@redhat.com> To: shemminger@osdl.org Cc: mostrows@styx.uwaterloo.ca, netdev@oss.sgi.com Subject: Re: [PATCH] missing owner field entry on pppoe /proc From: "David S. Miller" In-Reply-To: <20030619122024.3d21c0ec.shemminger@osdl.org> References: <20030619122024.3d21c0ec.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3428 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Thu, 19 Jun 2003 12:20:24 -0700 The /proc file operations for PPPOE is missing an owner entry. This means that if someone has /proc file open, it is still possible to unload the module. Applied, thanks. 
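[Editor's sketch, not part of the thread.] The owner-field fix above can be pictured with a small user-space model. This is not kernel code and not the kernel's implementation: `struct module_model`, `struct fops_model`, `model_open`, `model_release` and `model_unload` are hypothetical names invented here, and the reference count is a plain integer. The model only shows the effect described in the patch: when the operations table records its owner, an open pins the module, and unloading is refused until the file is released.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a loadable module: a use count and a loaded flag. */
struct module_model {
    int refcount;
    bool loaded;
};

/* Hypothetical model of a file operations table: only the owner field.
 * owner == NULL models the missing .owner entry from the bug report. */
struct fops_model {
    struct module_model *owner;
};

/* Model of open(): pin the owning module if the table names one. */
static bool model_open(const struct fops_model *fops)
{
    if (fops->owner != NULL) {
        if (!fops->owner->loaded)
            return false;          /* module already gone, refuse the open */
        fops->owner->refcount++;
    }
    return true;
}

/* Model of the final close: drop the reference taken at open time. */
static void model_release(const struct fops_model *fops)
{
    if (fops->owner != NULL)
        fops->owner->refcount--;
}

/* Model of module unload: refused while any reference is held. */
static bool model_unload(struct module_model *mod)
{
    if (mod->refcount > 0)
        return false;
    mod->loaded = false;
    return true;
}
```

With `owner` left NULL, `model_open` succeeds but takes no reference, so `model_unload` also succeeds while the file is still open, which is the situation the patch describes. With `owner` set, the unload fails until `model_release` has run.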
From davem@redhat.com Thu Jun 19 12:42:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 12:42:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JJgR2x001867 for ; Thu, 19 Jun 2003 12:42:28 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA11086; Thu, 19 Jun 2003 12:37:40 -0700 Date: Thu, 19 Jun 2003 12:37:40 -0700 (PDT) Message-Id: <20030619.123740.35020381.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH 2.5.72] mark assert error cases in skbuff unlikely From: "David S. Miller" In-Reply-To: <20030619122433.2dc720ec.shemminger@osdl.org> References: <20030619122433.2dc720ec.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3429 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Thu, 19 Jun 2003 12:24:33 -0700 skbuff.h includes a number of checks for error conditions which can be optimized by the compiler. Use BUG_ON where possible. Applied, thanks. From shemminger@osdl.org Thu Jun 19 15:19:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 15:19:38 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JMJR2x012701 for ; Thu, 19 Jun 2003 15:19:28 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5JMJEX29810; Thu, 19 Jun 2003 15:19:14 -0700 Date: Thu, 19 Jun 2003 15:19:14 -0700 From: Stephen Hemminger To: "David S. 
Miller" , Paul Mackerras Cc: netdev@oss.sgi.com, linux-ppp@vger.kernel.org Subject: [PATCH 2.5.72] ppp_async tty discipline module ref counting Message-Id: <20030619151914.22988936.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3430 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

Change ppp_async to use the owner field to do the module reference counting and eliminate the MOD_INC/DEC calls.

diff -Nru a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
--- a/drivers/net/ppp_async.c Thu Jun 19 15:15:56 2003
+++ b/drivers/net/ppp_async.c Thu Jun 19 15:15:56 2003
@@ -147,7 +147,6 @@
         struct asyncppp *ap;
         int err;
-        MOD_INC_USE_COUNT;
         err = -ENOMEM;
         ap = kmalloc(sizeof(*ap), GFP_KERNEL);
         if (ap == 0)
@@ -183,7 +182,6 @@
 out_free:
         kfree(ap);
 out:
-        MOD_DEC_USE_COUNT;
         return err;
 }
@@ -223,7 +221,6 @@
         if (ap->tpkt != 0)
                 kfree_skb(ap->tpkt);
         kfree(ap);
-        MOD_DEC_USE_COUNT;
 }
 /*
@@ -351,6 +348,7 @@
 static struct tty_ldisc ppp_ldisc = {
+        .owner = THIS_MODULE,
         .magic = TTY_LDISC_MAGIC,
         .name = "ppp",
         .open = ppp_asynctty_open,

From herbert@gondor.apana.org.au Thu Jun 19 15:47:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 15:47:42 -0700 (PDT) Received: from arnor.me.apana.org.au (mail@[203.14.152.115]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5JMlQ2x013817 for ; Thu, 19 Jun 2003 15:47:29 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 19T87n-0004gR-00; Fri, 20 Jun 2003 08:43:23 +1000 Received:
from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 19T87Y-0008N0-00; Fri, 20 Jun 2003 08:43:08 +1000 Date: Fri, 20 Jun 2003 08:43:08 +1000 To: "David S. Miller" Cc: toml@us.ibm.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Flow cache flush oops Message-ID: <20030619224308.GA32165@gondor.apana.org.au> References: <20030619000704.GA25135@gondor.apana.org.au> <20030619093518.GA27025@gondor.apana.org.au> <20030619.121108.21920417.davem@redhat.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="vtzGhvizbBRQ85DL" Content-Disposition: inline In-Reply-To: <20030619.121108.21920417.davem@redhat.com> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 3431 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --vtzGhvizbBRQ85DL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Thu, Jun 19, 2003 at 12:11:08PM -0700, David S. Miller wrote: > > Reviewing this made me notice that we don't check the > __get_free_pages() return value. I'll fix that. Thanks. Perhaps we should check that in the init function as well. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--vtzGhvizbBRQ85DL
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=p

--- x/net/core/flow.c.orig 2003-06-20 08:35:54.000000000 +1000
+++ x/net/core/flow.c 2003-06-20 08:41:37.000000000 +1000
@@ -388,11 +388,14 @@
         add_timer(&flow_hash_rnd_timer);
         register_cpu_notifier(&flow_cache_cpu_nb);
-        for (i = 0; i < NR_CPUS; i++)
-                if (cpu_online(i)) {
-                        flow_cache_cpu_prepare(i);
-                        flow_cache_cpu_online(i);
-                }
+        for (i = 0; i < NR_CPUS; i++) {
+                if (!cpu_online(i))
+                        continue;
+                if (flow_cache_cpu_prepare(i) == NOTIFY_OK &&
+                    flow_cache_cpu_online(i) == NOTIFY_OK)
+                        continue;
+                panic("NET: failed to initialise flow cache hash table\n");
+        }
         return 0;
 }

--vtzGhvizbBRQ85DL--
From davem@redhat.com Thu Jun 19 18:53:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 18:53:26 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K1rF2x016168 for ; Thu, 19 Jun 2003 18:53:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA12055; Thu, 19 Jun 2003 18:47:08 -0700 Date: Thu, 19 Jun 2003 18:47:07 -0700 (PDT) Message-Id: <20030619.184707.74741850.davem@redhat.com> To: herbert@gondor.apana.org.au Cc: toml@us.ibm.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Flow cache flush oops From: "David S. Miller" In-Reply-To: <20030619224308.GA32165@gondor.apana.org.au> References: <20030619093518.GA27025@gondor.apana.org.au> <20030619.121108.21920417.davem@redhat.com> <20030619224308.GA32165@gondor.apana.org.au> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3432 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Herbert Xu Date: Fri, 20 Jun 2003 08:43:08 +1000 Perhaps we should check that in the init function as well. Yep, I've applied your patch. Thanks. From jgarzik@pobox.com Thu Jun 19 19:07:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 19:07:39 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K27S2x017032 for ; Thu, 19 Jun 2003 19:07:33 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19TBJH-00015C-HY; Fri, 20 Jun 2003 03:07:27 +0100 Message-ID: <3EF26C54.70008@pobox.com> Date: Thu, 19 Jun 2003 22:07:16 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: netdev@oss.sgi.com Subject: Re: [PATCH] Remove copied inet_aton code in bond_main.c References: <20030618110946.GA6851@averell> In-Reply-To: <20030618110946.GA6851@averell> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3433 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev thanks, applied to 2.4 and 2.5. 
From garzik@gtf.org Thu Jun 19 20:46:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 20:46:39 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K3kW2x017919 for ; Thu, 19 Jun 2003 20:46:33 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 5B934665A; Thu, 19 Jun 2003 23:46:26 -0400 (EDT) Date: Thu, 19 Jun 2003 23:46:26 -0400 From: Jeff Garzik To: torvalds@transmeta.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [BK PATCHES] net driver merges Message-ID: <20030620034626.GA3366@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 3434 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev

Linus, please do a

        bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.5

Others may download the patch from

ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.72-bk2-netdrvr1.patch.bz2

This will update the following files:

 drivers/net/amd8111e.c          |   12 +++--
 drivers/net/bonding/bond_main.c |   88 +++------------------------------------
 drivers/net/ixgb/ixgb_ethtool.c |    1
 drivers/net/pcmcia/xirc2ps_cs.c |   89 ++++++++++++++++------------------------
 drivers/net/pcnet32.c           |    2
 drivers/net/sis900.c            |    1
 drivers/net/tulip/Kconfig       |    2
 7 files changed, 54 insertions(+), 141 deletions(-)

through these ChangeSets:

(03/06/19 1.1366)
   [netdrvr tulip] Kconfig help text fix

   While there is a separate driver for 2104x tulips (CONFIG_DE2104X), drivers/net/tulip/Kconfig states that CONFIG_TULIP also supports 2104x tulips. This is not the case since that support was removed in December 2001. A user with an old tulip may thus be tricked into configuring the wrong driver. (I was, on my PMac 4400.)

   The patch below removes this misinformation from tulip's Kconfig.

(03/06/19 1.1365)
   [netdrvr sis900] add new phy id to phy table

   (pulled change from 2.4)

(03/06/19 1.1364)
   [PATCH] xirc2ps_cs update

   the second patch: replaces busy_loop with a simple macro doing a schedule_timeout. busy_loop was never called from interrupt context anyway, so no need for that. and the sti() is gone.

   rgds
   -daniel

(03/06/19 1.1363)
   [PATCH] xirc2ps_cs update

   hi

   this patch does:
   - net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev
   - xirc2ps_detach calls xirc2ps_release if necessary (like the other drivers)

   against 2.5.70-bk.

   rgds
   -daniel

(03/06/19 1.1362)
   [PATCH] Remove warning due to comparison in drivers/net/pcnet32.c

   drivers/net/pcnet32.c: In function `pcnet32_init_ring':
   drivers/net/pcnet32.c:1006: warning: comparison between pointer and integer

(03/06/19 1.1361)
   [netdrvr ixgb] fix clash with newly-updated ethtool.h

(03/06/19 1.1360)
   [netdrvr amd8111e] fix spinlock recursion / if close failure

(03/06/19 1.1359)
   [PATCH] Remove copied inet_aton code in bond_main.c

   According to a report the my_inet_aton code in bond_main.c is copied from 4.4BSD, but it doesn't carry a BSD copyright license. In addition it is somewhat redundant with the standard in_aton. Convert it to use the linux function.

   Error handling is a bit worse than before, but not much.

   Patch for 2.5 bonding. The 2.4 version has the same problem, but afaik it is scheduled to be replaced by the 2.5 codebase anyways.
-Andi From garzik@gtf.org Thu Jun 19 20:59:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 20:59:30 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K3xJ2x018297 for ; Thu, 19 Jun 2003 20:59:19 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id EAD2E665A; Thu, 19 Jun 2003 23:59:13 -0400 (EDT) Date: Thu, 19 Jun 2003 23:59:13 -0400 From: Jeff Garzik To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [PATCHES] 2.4.x net driver updates Message-ID: <20030620035913.GA3878@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 3435 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev This is the patchset going to Marcelo after he releases 2.4.22-pre1. 
BK users may issue

        bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4

Others may download the patch from

ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-netdrvr1.patch.bz2

This will update the following files:

 drivers/net/bonding.c                | 3434 ------------------------
 Documentation/Configure.help         |    9
 Documentation/networking/bonding.txt |  537 ++-
 Documentation/networking/ifenslave.c |  496 ++-
 drivers/net/3c59x.c                  |   13
 drivers/net/8139cp.c                 |    9
 drivers/net/8139too.c                |    6
 drivers/net/Config.in                |    3
 drivers/net/Makefile                 |    8
 drivers/net/amd8111e.c               | 1075 ++++---
 drivers/net/amd8111e.h               |  968 +++---
 drivers/net/arcnet/arcnet.c          |    2
 drivers/net/arcnet/rfc1201.c         |    6
 drivers/net/bonding.c                |  266 +
 drivers/net/bonding/Makefile         |   18
 drivers/net/bonding/bond_3ad.c       | 2667 ++++++++++++++++++-
 drivers/net/bonding/bond_3ad.h       |  342 ++
 drivers/net/bonding/bond_alb.c       | 1585 +++++++++++
 drivers/net/bonding/bond_alb.h       |  129
 drivers/net/bonding/bond_main.c      | 4883 +++++++++++++++++++++++++++++++----
 drivers/net/bonding/bonding.h        |  209 +
 drivers/net/cs89x0.c                 |   11
 drivers/net/dl2k.h                   |    1
 drivers/net/e100/e100.h              |   30
 drivers/net/e100/e100_main.c         |  389 +-
 drivers/net/e100/e100_phy.c          |    7
 drivers/net/e100/e100_test.c         |  155 -
 drivers/net/e1000/Makefile           |    2
 drivers/net/e1000/e1000.h            |    8
 drivers/net/e1000/e1000_ethtool.c    |  959 ++++++
 drivers/net/e1000/e1000_hw.c         |   23
 drivers/net/e1000/e1000_hw.h         |    8
 drivers/net/e1000/e1000_main.c       |  312 +-
 drivers/net/e1000/e1000_osdep.h      |    2
 drivers/net/eepro.c                  |    2
 drivers/net/eepro100.c               |    1
 drivers/net/ns83820.c                |    2
 drivers/net/pci-skeleton.c           |    4
 drivers/net/pcnet32.c                |    9
 drivers/net/r8169.c                  |   52
 drivers/net/sis900.c                 |  100
 drivers/net/sk98lin/skge.c           |    2
 drivers/net/sundance.c               |  144 -
 drivers/net/tg3.c                    |    2
 drivers/net/tlan.c                   |  258 +
 drivers/net/tlan.h                   |    7
 drivers/net/tokenring/olympic.c      |    3
 drivers/net/tulip/tulip_core.c       |    7
 drivers/net/typhoon.c                |    4
 drivers/net/via-rhine.c              |    2
 drivers/net/wireless/airo.c          |    2
 include/linux/ethtool.h              |   27
 include/linux/if_arcnet.h            |    4
 include/linux/if_bonding.h           |  101
 include/linux/if_vlan.h              |    1
 include/linux/skbuff.h               |    4
 include/net/if_inet6.h               |    5
 include/net/irda/irlan_common.h      |    2
 net/core/dev.c                       |    4
 net/core/skbuff.c                    |    3
 net/ipv6/addrconf.c                  |   13
 net/ipv6/ndisc.c                     |    3
 net/irda/irlan/irlan_eth.c           |    6
 63 files changed, 13372 insertions(+), 5974 deletions(-)

through these ChangeSets:

(03/06/19 1.1236)
   [PATCH] PATCH: fix bug in drivers/net/cs89x0.c:set_mac_address()

   Hello Andrew, Jeff and Alan,

   the following patch fixes a bug in the CS89xx net device which would set new MAC address through SIOCSIFHWADDR _only_ when net_debug is set, which is obviously not what it was meant to do. The original code bogusly interpreted the addr argument as a buffer containing the MAC address instead of a struct sockaddr.

   Applies as-is to 2.4.20 and with offset to 2.5.69. Please forward it to Linus and Marcelo.

   This bug has been found and fixed by Stefano Fedrigo.

(03/06/19 1.1235)
   [netdrvr sis900] minor fixes from 2.5

   spelling, C99 initializers, jiffy wrap, set_bit

(03/06/19 1.1234)
   [netdrvr sis900] make function headers readable by kernel-doc tool

(03/06/19 1.1233)
   [PATCH] Remove warning due to comparison in drivers/net/pcnet32.c

   drivers/net/pcnet32.c: In function `pcnet32_init_ring':
   drivers/net/pcnet32.c:1006: warning: comparison between pointer and integer

(03/06/19 1.1232)
   [PATCH] new eepro100 PCI ID

   From: Tom Alsberg

   Add support for a new eepro100 PCI ID.

(03/06/19 1.1231)
   [PATCH] Remove copied inet_aton code in bond_main.c

   According to a report the my_inet_aton code in bond_main.c is copied from 4.4BSD, but it doesn't carry a BSD copyright license. In addition it is somewhat redundant with the standard in_aton. Convert it to use the linux function.

   Error handling is a bit worse than before, but not much.

   Patch for 2.5 bonding. The 2.4 version has the same problem, but afaik it is scheduled to be replaced by the 2.5 codebase anyways.
-Andi (03/06/19 1.1230) [PATCH] Additional 3c980 device support From: "J.A. Magallon" Adds support for a couple of 3c980 variants which are in pci.ids, but not in the driver. (03/06/08 1.1226) [netdrvr amd8111e] bug fix: move stats update after irq free (03/06/08 1.1225) [e1000] Whitespace cleanup * Whitespace cleanup (03/06/08 1.1224) [e1000] Miscellaneous code cleanup * Added Change Log entries * Miscellaneous code cleanup (03/06/08 1.1223) [e1000] Fixed LED coloring on 82541/82547 controllers * LED colors on 82541 and 82547 controllers were incorrect. The LED mode register didn't have the proper configuration. (03/06/08 1.1222) [e1000] Removed strong branded device ids * Removed strong branded device ids from the device id table along with the associated branding strings. (03/06/08 1.1221) [e1000] Added support for 82546 Quad-port adapter * Added support for 82546 Quad-port adapter (03/06/08 1.1220) [e1000] Added ethtool test ioctl * Added routines for the Ethtool Test ioctl. * Added more statistics for the Ethtool statistics dump. * Added more registers for the register dump. (03/06/08 1.1219) [e1000] TSO fix * Premature write-back of descriptors during TSO causing resources to be returned too early on ppc64. Fix is to wait until last descriptor of frame is written back, then return resources back to OS. * 82544 hang caused by setting RS bit in context descriptor. Exposes known hang in 82544. Fix is same as above - set RS bit only in last descriptor. (03/06/08 1.1218) [e100] misc * Removed leftovers from removal of /proc support and IDIAG support * Cleaned up reporting of h/w init failure messages * Add 1/2 second delay after PHY reset to allow link partner to see and respond to reset, per IEEE 802.3. (03/06/08 1.1217) [e100] set netdev members before registration * Bug fix: set netdev members before netdev registration to avoid races. (03/06/08 1.1216) [e100] use skb_headlen() rather than rolling own. * Cleanup: use skb_headlen() rather than rolling own.
Sync w/ 2.5 driver. (03/06/08 1.1215) [e100] VLAN configuration was lost after ethtool diags run * Bug fix: ethtool diags would call e100_up/e100_down, which overwrite current VLAN settings. Move initialization of config regs out of up/down. (03/06/08 1.1214) [e100] fixed stalled stats collection * Bug fix: In the rare event of a failed command to dump stats, stat collection would stop, giving the illusion that traffic had stopped. Fixed by issuing stat dump in watchdog regardless of the status of previous attempt to dump stats. (03/06/08 1.1213) [e100] full stop/start on ethtool set speed/duplex/autoneg * Cleanup ethtool/mii_ioctl sets of speed/duplex/autoneg by stop/set/start driver to ensure sets stick. Must hold xmit_lock around stop/start. (03/06/08 1.1212) [e100] cleanup Tx resources before running ethtool diags * Bug fix: clean up Tx resources before running ethtool diags. (03/06/08 1.1211) [e100] Add MDI/MDI-X status to ethtool reg dump * Add MDI/MDI-X (crossover cable) status to ethtool reg dump. (03/06/08 1.1210) [e100] Add ethtool cable diag test * Feature add: ethtool cable diag test. * Some cleanup of the ethtool diags. * Fixed bug in return code for ethtool diag results. (03/06/08 1.1209) [e100] Add ethtool parameter support * Feature add: ethtool parameter support: Tx/Rx ring size, Rx xsum offloading, flow control. (03/06/08 1.1208) [e100] move e100_asf_enable under CONFIG_PM to avoid warning * Bug fix: move e100_asf_enable under CONFIG_PM to avoid compile warning. [Stephen Rothwell (sfr@canb.auug.org.ua)] (03/06/08 1.1207) [e100] Remove "Freeing alive device" warning * Bug fix: don't call any netif_carrier_* until netdev is registered. [Andrew Morton (akpm@dideo.com)] (03/06/06 1.1205) [PATCH] Bonding 2.4 update patch 6 Fix to the ifenslave -c fix, fix to version control (plus change log update). I've got an additional fix for version control that I'll send you on Monday.
Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1204) [PATCH] Bonding 2.4 update patch 5 Fix to prevent routes on the bonding device from being lost during enslavement processing. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1203) [PATCH] Bonding 2.4 update patch 4 A fix for ifenslave -c. Later patches have fixes for this fix. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1202) [PATCH] Bonding 2.4 update patch 3 A patch with some miscellaneous little stuff (comments, mode names, fix a printk). Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1201) [PATCH] Bonding 2.4 update patch 2 Small patch to fix endless failover problem in the ARP monitor. Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1200) [PATCH] Bonding 2.4 update patch 1 Documentation. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/bonding.txt (03/06/06 1.1199) [PATCH] remove ethtool privileged references dev_ioctl already checks capable(CAP_NET_ADMIN) for SIOCETHTOOL, so privileged references are not necessary. (03/06/06 1.1198) [PATCH] 10GbE ethtool support Add 10GbE support for ethtool. (03/06/05 1.1197) [netdrvr amd8111e] link against mii lib (03/06/04 1.1196) [netdrvr] gcc 3.3 cleanups Mostly marking 64-bit constants as ULL.
(03/05/29 1.1185.1.52) [netdrvr amd8111e] remove out-of-tree feature that snuck in (03/05/29 1.1185.1.51) [netdrvr amd8111e] interrupt coalescing, libmii, bug fixes * Dynamic interrupt coalescing * mii lib support * dynamic IPG support (disabled by default) * jumbo frame fix * vlan fix * rx irq coalescing fix (03/05/29 1.1185.1.50) [netdrvr tlan] fix 64-bit issues (03/05/29 1.1185.1.49) [netdrvr r8169] sync with 2.5 (backport whitespace cleanups) (03/05/29 1.1185.1.48) [netdrvr r8169] use alloc_etherdev (fix race), pci_disable_device (03/05/29 1.1185.1.47) [netdrvr olympic] fix build with gcc 3.3 (03/05/29 1.1185.6.3) [netdrvr 8139too] add comment, whitespace cleanup (03/05/28 1.1185.6.2) [netdrvr] s/init_etherdev/alloc_etherdev/ in code comments, in 8139too and pci-skeleton drivers. (03/05/28 1.1185.6.1) [netdrvr tlan] backport fixes and cleanups from 2.5 * alloc_etherdev (fixes race) * PCI DMA API * C99 initializers * speling fixes * use pci_{request,release}_regions for PCI devices * propagate error returns back from pci_xxx functions * call pci_set_dma_mask * use keventd for adapter error reset (2.5 uses workqueue) (03/05/27 1.1185.1.45) [netdrvr pcnet32] bug fixes I would like to see a couple of the pcnet32 changes that I think we can agree on be put into the trees so a couple of the potential defects can be avoided. The following patch contains just these pieces. The only controversial one is an arbitrary change in the number of iterations in a while loop spinning on hardware state. No matter how this is done, I am not especially fond of this bit of code as it has no reasonable error recovery path -- however, as a half-way, incremental solution, increasing the polling time should help as the 100 value was certainly found to be insufficient. 1000 may not be sufficient either, but it is certainly no worse. Both of the other changes were hit in testing (and I believe the wmb() at a customer even), so it would help reduce some debug if these go in.
Any feedback is appreciated - thanks. (03/05/27 1.1185.1.44) [netdrvr eepro] update MODULE_AUTHOR per old-author request (03/05/27 1.1185.1.43) [netdrvr sundance] fix another flow control bug (03/05/27 1.1185.1.42) [netdrvr sundance] fix flow control bug (03/05/27 1.1185.1.41) [netdrvr bonding] fix ABI version control problem This fix makes bonding not commit to a specific ABI version if the ioctl command is not supported by bonding. (It also removes the '\n' in the continuous printk reporting the link down event in bond_mii_monitor - it got in there by mistake in our previous patch set and caused log messages to appear funny in some situations). (03/05/27 1.1185.1.40) [netdrvr bonding] fix long failover in 802.3ad mode This patch fixes the bug reported by Jay on April 3rd regarding long failover time when releasing the last slave in the active aggregator. The fix, as suggested by Jay, is to follow the spec recommendation and send a LACPDU to the partner saying this port is no longer aggregatable and therefore trigger an immediate re-selection of a new aggregator instead of waiting the entire expiration timeout. (03/05/25 1.1185.1.39) IPv6 over ARCnet (RFC2497) support, IPv6 part. (03/05/25 1.1185.1.38) IPv6 over ARCnet (RFC2497) support, driver part (03/05/25 1.1185.1.37) [irda] module refcounts for irlan (03/05/23 1.1185.3.7) [bonding] small cleanups (03/05/23 1.1185.3.6) [bonding] add rcv load balancing mode This patch adds a new mode that enables receive load balancing for IPv4 traffic on top of the transmit load balancing mode. This capability is achieved by intercepting and manipulating the ARP negotiation to teach clients several MAC addresses for the bond and thus distribute incoming traffic among all slaves with the highest link speed. 
In order to function properly, slaves are required to be able to have their MAC address set even while the interface is up since once the primary slave loses its link, the new primary slave (and only it) must be able to take over and receive the incoming traffic instead. If a non-primary slave loses its link, ARP packets will be sent to all clients communicating through it in order to teach them a replacement MAC address, and the primary slave will be put in promiscuous mode for 10 seconds for fault tolerance reasons. This patch is against bonding-20030415, but must come only after the locking scheme changing patch since it uses dev_set_promiscuity() that would otherwise cause a system hang. (03/05/23 1.1185.3.5) [bonding] support xmit load balancing mode (03/05/23 1.1185.3.4) [bonding] much improved locking This patch replaces the use of lock_irqsave/unlock_irqrestore in bonding with lock/unlock or lock_bh/unlock_bh as appropriate according to context. This change is based on a previous discussion regarding the fact that holding a lock_irqsave doesn't prevent softirqs from running which can cause deadlocks in certain situations. This new locking scheme has already undergone a massive testing cycle by our QA group and we feel it is ready for release (some new modes and enhancements will not work properly without it). (03/05/23 1.1185.3.3) [bonding] better 802.3ad mode control, some cleanup This patch adds the lacp_rate module param to enable better control over the IEEE 802.3ad mode. This param controls the rate at which the partner system is asked to send LACPDUs to bonding. Two options exist: - slow (or 0) - LACPDUs are 30 seconds apart - fast (or 1) - LACPDUs are 1 second apart The default is slow (like most switches around). There are also some code beautifications (mainly converting comments to C style in code segments we added in the past).
(03/05/23 1.1185.3.2) [bonding] ABI versioning This patch adds user-land to kernel ABI version control in bonding to restore backward compatibility between different versions of ifenslave and the bonding module. It uses ethtool's GDRVINFO ioctl to pass the ABI version number between ifenslave and the bonding module in both directions so both the driver and the application can tell which partner they're working against and take the appropriate measures when enslaving/releasing an interface. The bonding module remembers the ABI version received from the application, and from that moment on will deny enslave and release commands from an application using a different ABI version, which means that if you want to switch to an ifenslave with a different ABI version (or with none at all), you'll first have to re-load the bonding module. This patch also changes the driver/application versioning scheme to contain 3 fields X.Y.Z with the following meaning: X - Major version - big behavior changes Y - Minor version - addition of features Z - Extra version - minor changes and bug fixes There are also three minor bug fixes: 1. Prevent enslaving an interface that is already a slave. 2. Prevent enslaving if the bond is down. 3. In bond_release_all, save old value of current_slave before assigning NULL to it to enable using its original value later on. This patch is against bonding-20030415. (03/04/27 1.1137.1.6) [netdrvr e1000] add TSO support -- disabled * Copy TSO support for 2.5 e1000. Wrapped with NETIF_F_TSO, so not currently enabled in 2.4. Done to keep 2.4 and 2.5 drivers in-sync as much as possible.
(03/04/27 1.1137.1.5) [netdrvr e1000] add support for NAPI * Copy NAPI support from 2.5 e1000 driver * Add CONFIG_E1000_NAPI option (03/04/27 1.1137.1.4) [netdrvr tulip] support DM910x chip from ALi (03/04/27 1.1137.1.3) Remove duplicate CONFIG_TULIP_MWI entry in Configure.help Noticed by Geert Uytterhoeven (03/04/27 1.1137.1.2) [netdrvr 8139cp] enable MWI via pci_set_mwi, rather than manually (03/04/26 1.1131.2.6) [netdrvr typhoon] s/#if/#ifdef/ for a CONFIG_ var (03/04/25 1.1131.2.5) [netdrvr sundance] small cleanups from 2.5 - s/long flag/unsigned long flag/ - C99 initializers (03/04/25 1.1131.2.4) [netdrvr sundance] bug fixes, VLAN support - Fix tx bugs in big-endian machines - Remove unused max_interrupt_work module parameter, the new NAPI-like rx scheme doesn't need it. - Remove redundant get_stats() in intr_handler(), those I/O accesses could affect performance in ARM-based system - Add Linux software VLAN support - Fix bug of custom mac address (StationAddr register only accepts word writes) (03/04/25 1.1131.2.3) [netdrvr via-rhine] fix promisc mode I found a via-rhine bug, it can't receive BPDU (mac: 0180c2000000) in promiscuous mode. Fill all "1" in hash table to fix this problem in promiscuous mode. (RCR remain 0x1c, write it as 0x1f don't work) (03/04/25 1.1131.2.2) [wireless airo] fix end-of-array test FYI statsLabels[] is an array of char*, so the fix below is pretty obvious. (03/04/25 1.1131.2.1) [PATCH] fix .text.exit error in drivers/net/r8169.c In drivers/net/r8169.c the function rtl8169_remove_one is __devexit but the pointer to it didn't use __devexit_p resulting in a .text.exit compile error when !CONFIG_HOTPLUG.
The fix is simple: (03/04/17 1.1101.8.7) [bonding] add support for IEEE 802.3ad Dynamic link aggregation Contributed by Shmulik Hen @ Intel, merge by Jay Vosburgh @ IBM (03/04/17 1.1101.8.6) [bonding] move private decls into new drv/net/bonding/bonding.h file (03/04/17 1.1101.8.5) [bonding] move driver into new drivers/net/bonding directory (03/04/17 1.1101.8.4) [bonding] Moved setting slave mac addr, and open, from app to the driver This patch enables support of modes that need to use the unique mac address of each slave. It moves setting the slave's mac address and opening it from the application to the driver. This breaks backward compatibility between the new driver and older applications ! It also blocks possibility of enslaving before the master is up (to prevent putting the system in an unstable state), and removes the code that unconditionally restores all base driver's flags (flags are automatically restored once all undo stages are done in proper order). Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.3) [bonding] add support for getting slave's speed and duplex via ethtool Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.2) [bonding] fix comment to prevent future merge difficulties Contributed by Jay Vosburgh @ IBM (03/04/17 1.1101.8.1) [net] store physical device a packet arrives in on (Needed for bonding) Contributed by Jay Vosburgh @ IBM, Shmulik Hen @ Intel, and others. 
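[Editorial note on ChangeSet 1.1236 above] The cs89x0 changelog says the old set_mac_address() treated its addr argument as a raw MAC buffer instead of a struct sockaddr. The sketch below is a userspace illustration of that one mistake only, not the driver code: toy_netdev, set_mac_buggy() and set_mac_fixed() are hypothetical names, and the net_debug part of the bug is not modelled.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#define TOY_ETH_ALEN 6

/* Hypothetical stand-in for the kernel's struct net_device. */
struct toy_netdev {
    unsigned char dev_addr[TOY_ETH_ALEN];
};

/*
 * Buggy variant: treats p as if it pointed straight at the MAC bytes.
 * When p is really a struct sockaddr, this copies the leading
 * address-family bytes followed by only part of the MAC.
 */
static void set_mac_buggy(struct toy_netdev *dev, const void *p)
{
    assert(dev != NULL && p != NULL);
    memcpy(dev->dev_addr, p, TOY_ETH_ALEN);
}

/*
 * Fixed variant: p is a struct sockaddr and the hardware address
 * lives in its sa_data member.
 */
static void set_mac_fixed(struct toy_netdev *dev, const void *p)
{
    const struct sockaddr *sa = p;

    assert(dev != NULL && sa != NULL);
    memcpy(dev->dev_addr, sa->sa_data, TOY_ETH_ALEN);
}
```

Calling both variants with the same struct sockaddr shows the difference: only set_mac_fixed() leaves the six bytes from sa_data in dev_addr.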
From xose@wanadoo.es Thu Jun 19 21:44:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 21:44:40 -0700 (PDT) Received: from smtp12.eresmas.com ([62.81.235.112]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K4iS2x018927 for ; Thu, 19 Jun 2003 21:44:30 -0700 Received: from [192.168.108.51] (helo=mx01.eresmas.com) by smtp12.eresmas.com with esmtp (Exim 4.10) id 19ScO3-0004Jm-00; Wed, 18 Jun 2003 14:50:03 +0200 Received: from [80.103.10.142] (helo=wanadoo.es) by mx01.eresmas.com with esmtp (Exim 4.12) id 19ScO4-000832-00; Wed, 18 Jun 2003 14:50:05 +0200 Message-ID: <3EF05FB6.8030306@wanadoo.es> Date: Wed, 18 Jun 2003 14:48:54 +0200 From: Xose Vazquez Perez User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003 X-Accept-Language: gl, es, en MIME-Version: 1.0 To: netdev@oss.sgi.com, scott.feldman@intel.com X-Enigmail-Version: 0.63.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Subject: Red: e100-3.0.0_dev8 "Minneapolis Moline" releas Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-SA-Exim-Scanned: Yes X-archive-position: 3436 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xose@wanadoo.es Precedence: bulk X-list: netdev >[Someone once suggested e100 as the first nominee into the driver hall >of shame. I'd like to revoke that nomination with this rewrite of >e100]. > > DON'T USE THIS DRIVER ON A PRODUCTION SYSTEM! > >http://sf.net/projects/e1000, download e100-3.0.0_dev8 (tar file or kernel patches). ^^^ ftp has this file: e100-3.0.0_dev6.tar.gz 17-Jun-2003 05:24 43K there is not dev8, it should be dev6 at web page. regards, -- Software is like sex, it's better when it's bug free. 
From vinay-rc@naturesoft.net Thu Jun 19 23:24:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 19 Jun 2003 23:25:02 -0700 (PDT) Received: from naturesoft.net ([203.145.184.221]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K6On2x020148 for ; Thu, 19 Jun 2003 23:24:51 -0700 Received: from [192.168.0.15] (helo=lima.royalchallenge.com) by naturesoft.net with esmtp (Exim 3.35 #1) id 19TFDy-0005wv-00; Fri, 20 Jun 2003 11:48:14 +0530 Subject: [PATCH 2.4.21][FIX] use mod_timer From: Vinay K Nallamothu To: Marcelo Tosatti Cc: netdev@oss.sgi.com, Jesse Barnes , Alan Cox , LKML Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 20 Jun 2003 12:06:40 +0530 Message-Id: <1056091000.1200.23.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-archive-position: 3437 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay-rc@naturesoft.net Precedence: bulk X-list: netdev Hi, This patch makes use of mod_timer instead of {del,add}_timer. 
Most of the patches already in -ac series since 2.4.21-rc2 and few of the networking fixes in 2.5.69 The following files are affected: arch/ia64/sn/kernel/irq.c arch/ia64/sn/kernel/mca.c drivers/block/floppy.c drivers/net/wan/sdla_chdlc.c drivers/net/wan/sdla_fr.c drivers/net/wan/sdla_x25.c net/core/dst.c net/sched/sch_cbq.c net/sched/sch_csz.c net/sched/sch_htb.c diff -urN linux-2.4.21/arch/ia64/sn/kernel/irq.c linux-2.4.21-nvk/arch/ia64/sn/kernel/irq.c --- linux-2.4.21/arch/ia64/sn/kernel/irq.c 2003-06-14 10:09:52.000000000 +0530 +++ linux-2.4.21-nvk/arch/ia64/sn/kernel/irq.c 2003-06-20 10:38:47.000000000 +0530 @@ -303,9 +303,7 @@ bridge->b_force_always[intr_test_registered[i].slot].intr = 1; } } - del_timer(&intr_test_timer); - intr_test_timer.expires = jiffies + HZ/100; - add_timer(&intr_test_timer); + mod_timer(&intr_test_timer, jiffies + HZ/100); } void diff -urN linux-2.4.21/arch/ia64/sn/kernel/mca.c linux-2.4.21-nvk/arch/ia64/sn/kernel/mca.c --- linux-2.4.21/arch/ia64/sn/kernel/mca.c 2003-06-14 10:09:52.000000000 +0530 +++ linux-2.4.21-nvk/arch/ia64/sn/kernel/mca.c 2003-06-20 11:00:11.000000000 +0530 @@ -123,9 +123,7 @@ static void sn_cpei_timer_handler(unsigned long dummy) { sn_cpei_handler(-1, NULL, NULL); - del_timer(&sn_cpei_timer); - sn_cpei_timer.expires = jiffies + CPEI_INTERVAL; - add_timer(&sn_cpei_timer); + mod_timer(&sn_cpei_timer, jiffies + CPEI_INTERVAL); } void @@ -147,9 +145,7 @@ unsigned long *pi_ce_error_inject_reg = 0xc00000092fffff00; *pi_ce_error_inject_reg = 0x0000000000000100; - del_timer(&sn_ce_timer); - sn_ce_timer.expires = jiffies + CPEI_INTERVAL; - add_timer(&sn_ce_timer); + mod_timer(&sn_ce_timer, jiffies + CPEI_INTERVAL); } sn_init_ce_timer() { diff -urN linux-2.4.21/drivers/net/wan/sdla_chdlc.c linux-2.4.21-nvk/drivers/net/wan/sdla_chdlc.c --- linux-2.4.21/drivers/net/wan/sdla_chdlc.c 2003-06-14 10:03:17.000000000 +0530 +++ linux-2.4.21-nvk/drivers/net/wan/sdla_chdlc.c 2003-06-20 10:38:40.000000000 +0530 @@ -1089,13 +1089,11 @@ 
set_bit(0,&chdlc_priv_area->config_chdlc); chdlc_priv_area->config_chdlc_timeout=jiffies; - del_timer(&chdlc_priv_area->poll_delay_timer); /* Start the CHDLC configuration after 1sec delay. * This will give the interface initilization time * to finish its configuration */ - chdlc_priv_area->poll_delay_timer.expires=jiffies+HZ; - add_timer(&chdlc_priv_area->poll_delay_timer); + mod_timer(&chdlc_priv_area->poll_delay_timer, jiffies + HZ); return err; } diff -urN linux-2.4.21/drivers/net/wan/sdla_fr.c linux-2.4.21-nvk/drivers/net/wan/sdla_fr.c --- linux-2.4.21/drivers/net/wan/sdla_fr.c 2003-06-14 10:03:17.000000000 +0530 +++ linux-2.4.21-nvk/drivers/net/wan/sdla_fr.c 2003-06-20 10:38:31.000000000 +0530 @@ -4541,9 +4541,7 @@ { fr_channel_t* chan = dev->priv; - del_timer(&chan->fr_arp_timer); - chan->fr_arp_timer.expires = jiffies + (chan->inarp_interval * HZ); - add_timer(&chan->fr_arp_timer); + mod_timer(&chan->fr_arp_timer, jiffies + chan->inarp_interval * HZ); return; } diff -urN linux-2.4.21/drivers/net/wan/sdla_ppp.c linux-2.4.21-nvk/drivers/net/wan/sdla_ppp.c --- linux-2.4.21/drivers/net/wan/sdla_ppp.c 2003-06-14 10:03:17.000000000 +0530 +++ linux-2.4.21-nvk/drivers/net/wan/sdla_ppp.c 2003-06-20 10:34:06.000000000 +0530 @@ -841,9 +841,7 @@ /* Start the PPP configuration after 1sec delay. 
* This will give the interface initilization time * to finish its configuration */ - del_timer(&ppp_priv_area->poll_delay_timer); - ppp_priv_area->poll_delay_timer.expires = jiffies+HZ; - add_timer(&ppp_priv_area->poll_delay_timer); + mod_timer(&ppp_priv_area->poll_delay_timer, jiffies + HZ); return 0; } diff -urN linux-2.4.21/drivers/net/wan/sdla_x25.c linux-2.4.21-nvk/drivers/net/wan/sdla_x25.c --- linux-2.4.21/drivers/net/wan/sdla_x25.c 2003-06-14 10:10:14.000000000 +0530 +++ linux-2.4.21-nvk/drivers/net/wan/sdla_x25.c 2003-06-20 10:33:30.000000000 +0530 @@ -1267,9 +1267,7 @@ connect(card); S508_S514_unlock(card, &smp_flags); - del_timer(&card->u.x.x25_timer); - card->u.x.x25_timer.expires=jiffies+HZ; - add_timer(&card->u.x.x25_timer); + mod_timer(&card->u.x.x25_timer, jiffies + HZ); } } /* Device is not up until the we are in connected state */ diff -urN linux-2.4.21/net/core/dst.c linux-2.4.21-nvk/net/core/dst.c --- linux-2.4.21/net/core/dst.c 2003-06-14 10:03:10.000000000 +0530 +++ linux-2.4.21-nvk/net/core/dst.c 2003-06-20 10:33:16.000000000 +0530 @@ -131,11 +131,9 @@ dst->next = dst_garbage_list; dst_garbage_list = dst; if (dst_gc_timer_inc > DST_GC_INC) { - del_timer(&dst_gc_timer); dst_gc_timer_inc = DST_GC_INC; dst_gc_timer_expires = DST_GC_MIN; - dst_gc_timer.expires = jiffies + dst_gc_timer_expires; - add_timer(&dst_gc_timer); + mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); } spin_unlock_bh(&dst_lock); diff -urN linux-2.4.21/net/sched/sch_cbq.c linux-2.4.21-nvk/net/sched/sch_cbq.c --- linux-2.4.21/net/sched/sch_cbq.c 2003-06-14 10:03:13.000000000 +0530 +++ linux-2.4.21-nvk/net/sched/sch_cbq.c 2003-06-20 10:38:53.000000000 +0530 @@ -1056,11 +1056,9 @@ sch->stats.overlimits++; if (q->wd_expires && !netif_queue_stopped(sch->dev)) { long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); if (delay <= 0) delay = 1; - q->wd_timer.expires = jiffies + delay; - add_timer(&q->wd_timer); + mod_timer(&q->wd_timer, jiffies + delay); 
sch->flags |= TCQ_F_THROTTLED; } } diff -urN linux-2.4.21/net/sched/sch_csz.c linux-2.4.21-nvk/net/sched/sch_csz.c --- linux-2.4.21/net/sched/sch_csz.c 2003-06-14 10:10:35.000000000 +0530 +++ linux-2.4.21-nvk/net/sched/sch_csz.c 2003-06-20 10:38:59.000000000 +0530 @@ -708,11 +708,9 @@ */ if (q->wd_expires) { unsigned long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); if (delay == 0) delay = 1; - q->wd_timer.expires = jiffies + delay; - add_timer(&q->wd_timer); + mod_timer(&q->wd_timer, jiffies + delay); sch->stats.overlimits++; } #endif diff -urN linux-2.4.21/net/sched/sch_htb.c linux-2.4.21-nvk/net/sched/sch_htb.c --- linux-2.4.21/net/sched/sch_htb.c 2003-06-14 10:10:35.000000000 +0530 +++ linux-2.4.21-nvk/net/sched/sch_htb.c 2003-06-20 10:39:05.000000000 +0530 @@ -986,9 +986,7 @@ printk(KERN_INFO "HTB delay %ld > 5sec\n", delay); delay = 5*HZ; } - del_timer(&q->timer); - q->timer.expires = jiffies + delay; - add_timer(&q->timer); + mod_timer(&q->timer, jiffies + delay); sch->flags |= TCQ_F_THROTTLED; sch->stats.overlimits++; HTB_DBG(3,1,"htb_deq t_delay=%ld\n",delay); diff -urN linux-2.4.21/drivers/block/floppy.c linux-2.4.21-nvk/drivers/block/floppy.c --- linux-2.4.21/drivers/block/floppy.c 2003-06-14 10:03:23.000000000 +0530 +++ linux-2.4.21-nvk/drivers/block/floppy.c 2003-06-20 10:44:28.000000000 +0530 @@ -652,15 +652,16 @@ static void reschedule_timeout(int drive, const char *message, int marg) { + unsigned long delay; + if (drive == CURRENTD) drive = current_drive; - del_timer(&fd_timeout); if (drive < 0 || drive > N_DRIVE) { - fd_timeout.expires = jiffies + 20UL*HZ; + delay = 20UL*HZ; drive=0; } else - fd_timeout.expires = jiffies + UDP->timeout; - add_timer(&fd_timeout); + delay = UDP->timeout; + mod_timer(&fd_timeout, delay + jiffies); if (UDP->flags & FD_DEBUG){ DPRINT("reschedule timeout "); printk(message, marg); From jmorris@intercode.com.au Fri Jun 20 02:11:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 
02:12:04 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:EjFKJMBCjsUQ0UsGvzjMPvKgcQJD5UoI@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5K9Bm2x032130 for ; Fri, 20 Jun 2003 02:11:52 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5K9Aor25595; Fri, 20 Jun 2003 19:10:50 +1000 Date: Fri, 20 Jun 2003 19:10:50 +1000 (EST) From: James Morris To: Vinay K Nallamothu cc: Marcelo Tosatti , , Jesse Barnes , Alan Cox , LKML , "David S. Miller" Subject: Re: [PATCH 2.4.21][FIX] use mod_timer In-Reply-To: <1056091000.1200.23.camel@lima.royalchallenge.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3438 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On 20 Jun 2003, Vinay K Nallamothu wrote: > Hi, > > This patch makes use of mod_timer instead of {del,add}_timer. 
> > Most of the patches already in -ac series since 2.4.21-rc2 and few of > the networking fixes in 2.5.69 > > The following files are affected: > > arch/ia64/sn/kernel/irq.c > arch/ia64/sn/kernel/mca.c > drivers/block/floppy.c > drivers/net/wan/sdla_chdlc.c > drivers/net/wan/sdla_fr.c > drivers/net/wan/sdla_x25.c > net/core/dst.c > net/sched/sch_cbq.c > net/sched/sch_csz.c > net/sched/sch_htb.c FYI, the status of the networking patches is: 2.5-bk 2.4.21-ac1 2.4-bk net/core/dst.c yes no no net/sched/sch_cbq.c yes yes no net/sched/sch_csz.c yes yes no net/sched/sch_htb.c yes yes no - James -- James Morris From yoshfuji@linux-ipv6.org Fri Jun 20 07:42:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 07:42:56 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KEge2x025312 for ; Fri, 20 Jun 2003 07:42:42 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5KEhnBo003200; Fri, 20 Jun 2003 23:43:49 +0900 Date: Fri, 20 Jun 2003 23:43:49 +0900 (JST) Message-Id: <20030620.234349.50405228.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH] [IPV6] clean-up advmss calculation From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3439 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This patch introduces ipv6_advmss() and clean-up advmss calculation. Thanks. Index: linux-2.5/net/ipv6/route.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/route.c,v retrieving revision 1.40 diff -u -r1.40 route.c --- linux-2.5/net/ipv6/route.c 9 Jun 2003 17:26:52 -0000 1.40 +++ linux-2.5/net/ipv6/route.c 20 Jun 2003 13:28:03 -0000 @@ -600,6 +600,22 @@ return mtu; } +static inline unsigned int ipv6_advmss(unsigned int mtu) +{ + if (mtu < ip6_rt_min_advmss) + mtu = ip6_rt_min_advmss; + + /* + * Maximal non-jumbo IPv6 payload is 65535 and + * corresponding MSS is 65535 - tcp_header_size. + * 65535 is also valid and means: "any MSS, + * rely only on pmtu discovery" + */ + if (mtu > 65535 - sizeof(struct tcphdr)) + mtu = 65535; + return mtu; +} + static int ipv6_get_hoplimit(struct net_device *dev) { int hoplimit = ipv6_devconf.hop_limit; @@ -790,16 +806,7 @@ if (!rt->u.dst.metrics[RTAX_MTU-1]) rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(dev); if (!rt->u.dst.metrics[RTAX_ADVMSS-1]) - rt->u.dst.metrics[RTAX_ADVMSS-1] = - max_t(unsigned int, dst_pmtu(&rt->u.dst) - 60, - ip6_rt_min_advmss); - - /* Maximal non-jumbo IPv6 payload is 65535 and corresponding - MSS is 65535 - tcp_header_size. 
65535 is also valid and - means: "any MSS, rely only on pmtu discovery" - */ - if (dst_metric(&rt->u.dst, RTAX_ADVMSS) > 65535-20) - rt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst)); rt->u.dst.dev = dev; return rt6_ins(rt, nlh, _rtattr); @@ -952,9 +959,7 @@ nrt->rt6i_nexthop = neigh_clone(neigh); /* Reset pmtu, it may be better */ nrt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(neigh->dev); - nrt->u.dst.metrics[RTAX_ADVMSS-1] = max_t(unsigned int, dst_pmtu(&nrt->u.dst) - 60, ip6_rt_min_advmss); - if (nrt->u.dst.metrics[RTAX_ADVMSS-1] > 65535-20) - nrt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + nrt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&nrt->u.dst)); if (rt6_ins(nrt, NULL, NULL)) goto out; @@ -1214,9 +1219,7 @@ rt->u.dst.output = ip6_output; rt->rt6i_dev = &loopback_dev; rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(rt->rt6i_dev); - rt->u.dst.metrics[RTAX_ADVMSS-1] = max_t(unsigned int, dst_pmtu(&rt->u.dst) - 60, ip6_rt_min_advmss); - if (rt->u.dst.metrics[RTAX_ADVMSS-1] > 65535-20) - rt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst)); rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ipv6_get_hoplimit(rt->rt6i_dev); rt->u.dst.obsolete = -1; @@ -1312,9 +1315,7 @@ (dst_pmtu(&rt->u.dst) < arg->mtu && dst_pmtu(&rt->u.dst) == idev->cnf.mtu6))) rt->u.dst.metrics[RTAX_MTU-1] = arg->mtu; - rt->u.dst.metrics[RTAX_ADVMSS-1] = max_t(unsigned int, arg->mtu - 60, ip6_rt_min_advmss); - if (rt->u.dst.metrics[RTAX_ADVMSS-1] > 65535-20) - rt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(arg->mtu); return 0; } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Fri Jun 20 10:08:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 10:08:54 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by 
oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KH8h2x026991 for ; Fri, 20 Jun 2003 10:08:44 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA13616; Fri, 20 Jun 2003 10:03:29 -0700 Date: Fri, 20 Jun 2003 10:03:29 -0700 (PDT) Message-Id: <20030620.100329.74736219.davem@redhat.com> To: vinay-rc@naturesoft.net Cc: marcelo@hera.kernel.org, netdev@oss.sgi.com, jbarnes@sgi.com, alan@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2.4.21][FIX] use mod_timer From: "David S. Miller" In-Reply-To: <1056091000.1200.23.camel@lima.royalchallenge.com> References: <1056091000.1200.23.camel@lima.royalchallenge.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3440 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Vinay K Nallamothu Date: 20 Jun 2003 12:06:40 +0530 Hi, This patch makes use of mod_timer instead of {del,add}_timer. I applied all of the networking ones already and sent them off to Marcelo the other day, just waiting for him to eat them. 
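For readers unfamiliar with the conversion discussed in this thread, the idea can be sketched in userspace C. The kernel documents mod_timer(timer, expires) as equivalent to del_timer(timer), setting timer->expires, then add_timer(timer). Everything named toy_* below is a hypothetical stand-in invented for illustration, not kernel code, and the sketch ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct timer_list. */
struct toy_timer {
    unsigned long expires; /* tick at which the timer should fire */
    bool pending;          /* true while the timer is queued */
};

/* Dequeue the timer; returns true if it was pending. */
static bool toy_del_timer(struct toy_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Queue a timer; the caller must ensure it is not already pending. */
static void toy_add_timer(struct toy_timer *t)
{
    assert(!t->pending);
    t->pending = true;
}

/*
 * Re-arm in a single call: the three-step sequence
 * (delete, set expiry, add) folded into one function.
 * Returns true if the timer was pending beforehand.
 */
static bool toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
    bool was_pending = toy_del_timer(t);

    t->expires = expires;
    toy_add_timer(t);
    return was_pending;
}
```

A caller that used to write the delete, assign, add sequence by hand would call toy_mod_timer(&t, new_expiry) instead. In the real kernel the single call also performs the update under the timer lock, which this sketch does not model.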
From davem@redhat.com Fri Jun 20 10:10:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 10:10:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KHAP2x027219 for ; Fri, 20 Jun 2003 10:10:25 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA13637; Fri, 20 Jun 2003 10:05:01 -0700 Date: Fri, 20 Jun 2003 10:05:00 -0700 (PDT) Message-Id: <20030620.100500.41646097.davem@redhat.com> To: jmorris@intercode.com.au Cc: vinay-rc@naturesoft.net, marcelo@hera.kernel.org, netdev@oss.sgi.com, jbarnes@sgi.com, alan@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2.4.21][FIX] use mod_timer From: "David S. Miller" In-Reply-To: References: <1056091000.1200.23.camel@lima.royalchallenge.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3441 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

From: James Morris Date: Fri, 20 Jun 2003 19:10:50 +1000 (EST)

FYI, the status of the networking patches is:

                      2.5-bk  2.4.21-ac1  2.4-bk
net/core/dst.c        yes     no          no
net/sched/sch_cbq.c   yes     yes         no
net/sched/sch_csz.c   yes     yes         no
net/sched/sch_htb.c   yes     yes         no

They are, however, in my BK tree and I did attempt to push them to Marcelo some nights ago.
From scott.feldman@intel.com Fri Jun 20 12:45:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 12:45:26 -0700 (PDT) Received: from hermes.sc.intel.com (fmr03.intel.com [143.183.121.5]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KJjC31003456 for ; Fri, 20 Jun 2003 12:45:15 -0700 Received: from petasus.sc.intel.com (petasus.sc.intel.com [10.3.253.4]) by hermes.sc.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5K66xa01046 for ; Fri, 20 Jun 2003 06:06:59 GMT Received: from fmsmsxvs042.fm.intel.com (fmsmsxvs042.fm.intel.com [132.233.42.128]) by petasus.sc.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5K69XM12076 for ; Fri, 20 Jun 2003 06:09:33 GMT Received: from [134.134.3.229] ([134.134.3.229]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061923151311127 ; Thu, 19 Jun 2003 23:15:13 -0700 Date: Thu, 19 Jun 2003 23:23:42 -0700 (PDT) From: "Feldman, Scott" X-X-Sender: scott.feldman@localhost.localdomain Reply-To: "Feldman, Scott" To: Jeff Garzik cc: "Feldman, Scott" , Subject: [PATCH net-drivers-2.5] Remove CAP_NET_ADMIN check for SIOCETHTOOL's Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3443 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev dev_ioctl already checks capable(CAP_NET_ADMIN), so no need to do so in drivers. 
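The pattern the description relies on, one privilege check at the generic entry point instead of a copy in every driver, can be sketched as follows. This is a minimal userspace model under stated assumptions: all toy_* names are hypothetical, the net_admin flag merely stands in for capable(CAP_NET_ADMIN), and whether the generic layer really covers every SIOCETHTOOL sub-command in a given tree should be verified against net/core/dev.c rather than taken from this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command codes; not the real ethtool constants. */
enum toy_cmd { TOY_GET_SPEED, TOY_SET_SPEED };

/* Hypothetical caller context. */
struct toy_caller {
    bool net_admin; /* models capable(CAP_NET_ADMIN) for this caller */
};

/* Hypothetical device state. */
struct toy_nic {
    int speed;
};

/* Driver handler: does the work only, with no permission test of its own. */
static int toy_driver_ioctl(struct toy_nic *nic, enum toy_cmd cmd, int *arg)
{
    switch (cmd) {
    case TOY_GET_SPEED:
        *arg = nic->speed;
        return 0;
    case TOY_SET_SPEED:
        nic->speed = *arg;
        return 0;
    }
    return -EINVAL;
}

/* Generic entry point: the single place where privilege is checked. */
static int toy_dev_ioctl(const struct toy_caller *who, struct toy_nic *nic,
                         enum toy_cmd cmd, int *arg)
{
    if (!who->net_admin)
        return -EPERM;
    return toy_driver_ioctl(nic, cmd, arg);
}
```

With this layering an unprivileged caller is rejected before any driver code runs, which is why a second check inside the driver would be redundant. The argument only holds if no other path reaches the driver handler without passing through the generic entry point.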
diff -Nuarp net-drivers-2.5/drivers/net/acenic.c net-drivers-2.5/drivers/net.mod/acenic.c --- net-drivers-2.5/drivers/net/acenic.c 2003-06-19 22:36:55.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/acenic.c 2003-06-19 22:37:52.000000000 -0700 @@ -3026,9 +3026,6 @@ static int ace_ioctl(struct net_device * return 0; case ETHTOOL_SSET: - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - link = readl(&regs->GigLnkState); if (link & LNK_1000MB) speed = SPEED_1000; diff -Nuarp net-drivers-2.5/drivers/net/e100/e100_main.c net-drivers-2.5/drivers/net.mod/e100/e100_main.c --- net-drivers-2.5/drivers/net/e100/e100_main.c 2003-06-19 22:17:58.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/e100/e100_main.c 2003-06-19 22:43:56.000000000 -0700 @@ -3424,10 +3424,6 @@ e100_ethtool_set_settings(struct net_dev int ethtool_new_speed_duplex; struct ethtool_cmd ecmd; - if (!capable(CAP_NET_ADMIN)) { - return -EPERM; - } - bdp = dev->priv; if (copy_from_user(&ecmd, ifr->ifr_data, sizeof (ecmd))) { return -EFAULT; @@ -3545,8 +3541,6 @@ e100_ethtool_gregs(struct net_device *de void *addr = ifr->ifr_data; u16 mdi_reg; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; bdp = dev->priv; if(copy_from_user(&regs, addr, sizeof(regs))) @@ -3574,9 +3568,6 @@ e100_ethtool_nway_rst(struct net_device { struct e100_private *bdp; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - bdp = dev->priv; if ((bdp->speed_duplex_caps & SUPPORTED_Autoneg) && @@ -3632,9 +3623,6 @@ e100_ethtool_eeprom(struct net_device *d void *ptr; u8 *eeprom_data_bytes = (u8 *)eeprom_data; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - bdp = dev->priv; if (copy_from_user(&ecmd, ifr->ifr_data, sizeof (ecmd))) @@ -3912,9 +3900,6 @@ e100_ethtool_wol(struct net_device *dev, struct ethtool_wolinfo wolinfo; int res = 0; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - bdp = dev->priv; if (copy_from_user(&wolinfo, ifr->ifr_data, sizeof (wolinfo))) { diff -Nuarp net-drivers-2.5/drivers/net/e1000/e1000_ethtool.c 
net-drivers-2.5/drivers/net.mod/e1000/e1000_ethtool.c --- net-drivers-2.5/drivers/net/e1000/e1000_ethtool.c 2003-05-29 17:31:26.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/e1000/e1000_ethtool.c 2003-06-19 22:45:52.000000000 -0700 @@ -1289,8 +1289,6 @@ e1000_ethtool_ioctl(struct net_device *n } case ETHTOOL_SSET: { struct ethtool_cmd ecmd; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(copy_from_user(&ecmd, addr, sizeof(ecmd))) return -EFAULT; return e1000_ethtool_sset(adapter, &ecmd); @@ -1363,8 +1361,6 @@ e1000_ethtool_ioctl(struct net_device *n return 0; } case ETHTOOL_NWAY_RST: { - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(netif_running(netdev)) { e1000_down(adapter); e1000_up(adapter); @@ -1393,8 +1389,6 @@ e1000_ethtool_ioctl(struct net_device *n } case ETHTOOL_SWOL: { struct ethtool_wolinfo wol; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(copy_from_user(&wol, addr, sizeof(wol)) != 0) return -EFAULT; return e1000_ethtool_swol(adapter, &wol); @@ -1436,9 +1430,6 @@ err_geeprom_ioctl: case ETHTOOL_SEEPROM: { struct ethtool_eeprom eeprom; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - if(copy_from_user(&eeprom, addr, sizeof(eeprom))) return -EFAULT; @@ -1470,9 +1461,6 @@ err_geeprom_ioctl: } test = { {ETHTOOL_TEST} }; int err; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - if(copy_from_user(&test.eth_test, addr, sizeof(test.eth_test))) return -EFAULT; diff -Nuarp net-drivers-2.5/drivers/net/ioc3-eth.c net-drivers-2.5/drivers/net.mod/ioc3-eth.c --- net-drivers-2.5/drivers/net/ioc3-eth.c 2003-05-29 17:31:26.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/ioc3-eth.c 2003-06-19 22:49:12.000000000 -0700 @@ -1844,9 +1844,6 @@ static int ioc3_ioctl(struct net_device return -EFAULT; return 0; } else if (ecmd.cmd == ETHTOOL_SSET) { - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* Verify the settings we care about. 
*/ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) diff -Nuarp net-drivers-2.5/drivers/net/ixgb/ixgb_ethtool.c net-drivers-2.5/drivers/net.mod/ixgb/ixgb_ethtool.c --- net-drivers-2.5/drivers/net/ixgb/ixgb_ethtool.c 2003-06-19 22:17:58.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/ixgb/ixgb_ethtool.c 2003-06-19 22:52:06.000000000 -0700 @@ -448,8 +448,6 @@ ixgb_ethtool_ioctl(struct net_device *ne case ETHTOOL_SSET:{ struct ethtool_cmd ecmd; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; if (copy_from_user(&ecmd, addr, sizeof (ecmd))) return -EFAULT; return ixgb_ethtool_sset(adapter, &ecmd); @@ -482,9 +480,6 @@ ixgb_ethtool_ioctl(struct net_device *ne #endif /* ETHTOOL_GREGS */ case ETHTOOL_NWAY_RST:{ IXGB_DBG("ETHTOOL_NWAY_RST\n"); - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - ixgb_down(adapter); ixgb_up(adapter); @@ -539,9 +534,6 @@ ixgb_ethtool_ioctl(struct net_device *ne struct ethtool_eeprom eeprom; IXGB_DBG("ETHTOOL_SEEPROM\n"); - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - if (copy_from_user(&eeprom, addr, sizeof (eeprom))) return -EFAULT; diff -Nuarp net-drivers-2.5/drivers/net/sungem.c net-drivers-2.5/drivers/net.mod/sungem.c --- net-drivers-2.5/drivers/net/sungem.c 2003-06-19 22:17:58.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/sungem.c 2003-06-19 22:54:01.000000000 -0700 @@ -2384,9 +2384,6 @@ static int gem_ethtool_ioctl(struct net_ return 0; case ETHTOOL_SSET: - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* Verify the settings we care about. 
*/ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) diff -Nuarp net-drivers-2.5/drivers/net/sunhme.c net-drivers-2.5/drivers/net.mod/sunhme.c --- net-drivers-2.5/drivers/net/sunhme.c 2003-05-29 17:31:26.000000000 -0700 +++ net-drivers-2.5/drivers/net.mod/sunhme.c 2003-06-19 22:54:47.000000000 -0700 @@ -2481,9 +2481,6 @@ static int happy_meal_ioctl(struct net_d return -EFAULT; return 0; } else if (ecmd.cmd == ETHTOOL_SSET) { - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* Verify the settings we care about. */ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) From scott.feldman@intel.com Fri Jun 20 12:45:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 12:45:23 -0700 (PDT) Received: from hermes.sc.intel.com (fmr03.intel.com [143.183.121.5]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KJjC2x003456 for ; Fri, 20 Jun 2003 12:45:15 -0700 Received: from petasus.sc.intel.com (petasus.sc.intel.com [10.3.253.4]) by hermes.sc.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5K6CXa03589 for ; Fri, 20 Jun 2003 06:12:33 GMT Received: from fmsmsxvs042.fm.intel.com (fmsmsxvs042.fm.intel.com [132.233.42.128]) by petasus.sc.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5K6F7M15035 for ; Fri, 20 Jun 2003 06:15:07 GMT Received: from [134.134.3.229] ([134.134.3.229]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003061923204718783 ; Thu, 19 Jun 2003 23:20:47 -0700 Date: Thu, 19 Jun 2003 23:29:16 -0700 (PDT) From: "Feldman, Scott" X-X-Sender: scott.feldman@localhost.localdomain Reply-To: "Feldman, Scott" To: Jeff Garzik cc: "Feldman, Scott" , Subject: [PATCH net-drivers-2.4] Remove CAP_NET_ADMIN check for SIOCETHTOOL's Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3442 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev dev_ioctl already checks capable(CAP_NET_ADMIN), so no need to do so in drivers. diff -Nuarp net-drivers-2.4/drivers/net/acenic.c net-drivers-2.4/drivers/net.mod/acenic.c --- net-drivers-2.4/drivers/net/acenic.c 2003-03-22 08:07:45.000000000 -0800 +++ net-drivers-2.4/drivers/net.mod/acenic.c 2003-06-19 23:00:08.000000000 -0700 @@ -3017,9 +3017,6 @@ static int ace_ioctl(struct net_device * return 0; case ETHTOOL_SSET: - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - link = readl(&regs->GigLnkState); if (link & LNK_1000MB) speed = SPEED_1000; diff -Nuarp net-drivers-2.4/drivers/net/e100/e100_main.c net-drivers-2.4/drivers/net.mod/e100/e100_main.c --- net-drivers-2.4/drivers/net/e100/e100_main.c 2003-06-19 22:12:09.000000000 -0700 +++ net-drivers-2.4/drivers/net.mod/e100/e100_main.c 2003-06-19 23:00:59.000000000 -0700 @@ -3422,10 +3422,6 @@ e100_ethtool_set_settings(struct net_dev int ethtool_new_speed_duplex; struct ethtool_cmd ecmd; - if (!capable(CAP_NET_ADMIN)) { - return -EPERM; - } - bdp = dev->priv; if (copy_from_user(&ecmd, ifr->ifr_data, sizeof (ecmd))) { return -EFAULT; @@ -3543,8 +3539,6 @@ e100_ethtool_gregs(struct net_device *de void *addr = ifr->ifr_data; u16 mdi_reg; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; bdp = dev->priv; if(copy_from_user(&regs, addr, sizeof(regs))) @@ -3572,9 +3566,6 @@ e100_ethtool_nway_rst(struct net_device { struct e100_private *bdp; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - bdp = dev->priv; if ((bdp->speed_duplex_caps & SUPPORTED_Autoneg) && @@ -3630,9 +3621,6 @@ e100_ethtool_eeprom(struct net_device *d void *ptr; u8 *eeprom_data_bytes = (u8 *)eeprom_data; - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - bdp = dev->priv; if (copy_from_user(&ecmd, ifr->ifr_data, sizeof (ecmd))) @@ -3910,9 +3898,6 @@ e100_ethtool_wol(struct net_device *dev, struct ethtool_wolinfo wolinfo; int res = 0; - if (!capable(CAP_NET_ADMIN)) - 
return -EPERM; - bdp = dev->priv; if (copy_from_user(&wolinfo, ifr->ifr_data, sizeof (wolinfo))) { diff -Nuarp net-drivers-2.4/drivers/net/e1000/e1000_ethtool.c net-drivers-2.4/drivers/net.mod/e1000/e1000_ethtool.c --- net-drivers-2.4/drivers/net/e1000/e1000_ethtool.c 2003-06-19 22:12:09.000000000 -0700 +++ net-drivers-2.4/drivers/net.mod/e1000/e1000_ethtool.c 2003-06-19 23:01:43.000000000 -0700 @@ -1289,8 +1289,6 @@ e1000_ethtool_ioctl(struct net_device *n } case ETHTOOL_SSET: { struct ethtool_cmd ecmd; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(copy_from_user(&ecmd, addr, sizeof(ecmd))) return -EFAULT; return e1000_ethtool_sset(adapter, &ecmd); @@ -1363,8 +1361,6 @@ e1000_ethtool_ioctl(struct net_device *n return 0; } case ETHTOOL_NWAY_RST: { - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(netif_running(netdev)) { e1000_down(adapter); e1000_up(adapter); @@ -1393,8 +1389,6 @@ e1000_ethtool_ioctl(struct net_device *n } case ETHTOOL_SWOL: { struct ethtool_wolinfo wol; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; if(copy_from_user(&wol, addr, sizeof(wol)) != 0) return -EFAULT; return e1000_ethtool_swol(adapter, &wol); @@ -1436,9 +1430,6 @@ err_geeprom_ioctl: case ETHTOOL_SEEPROM: { struct ethtool_eeprom eeprom; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - if(copy_from_user(&eeprom, addr, sizeof(eeprom))) return -EFAULT; @@ -1470,9 +1461,6 @@ err_geeprom_ioctl: } test = { {ETHTOOL_TEST} }; int err; - if(!capable(CAP_NET_ADMIN)) - return -EPERM; - if(copy_from_user(&test.eth_test, addr, sizeof(test.eth_test))) return -EFAULT; diff -Nuarp net-drivers-2.4/drivers/net/ioc3-eth.c net-drivers-2.4/drivers/net.mod/ioc3-eth.c --- net-drivers-2.4/drivers/net/ioc3-eth.c 2003-03-22 08:07:46.000000000 -0800 +++ net-drivers-2.4/drivers/net.mod/ioc3-eth.c 2003-06-19 23:02:37.000000000 -0700 @@ -1841,9 +1841,6 @@ static int ioc3_ioctl(struct net_device return -EFAULT; return 0; } else if (ecmd.cmd == ETHTOOL_SSET) { - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* 
Verify the settings we care about. */ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) diff -Nuarp net-drivers-2.4/drivers/net/sungem.c net-drivers-2.4/drivers/net.mod/sungem.c --- net-drivers-2.4/drivers/net/sungem.c 2003-03-22 08:07:47.000000000 -0800 +++ net-drivers-2.4/drivers/net.mod/sungem.c 2003-06-19 23:03:16.000000000 -0700 @@ -2616,9 +2616,6 @@ static int gem_ethtool_ioctl(struct net_ return 0; case ETHTOOL_SSET: - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* Verify the settings we care about. */ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) diff -Nuarp net-drivers-2.4/drivers/net/sunhme.c net-drivers-2.4/drivers/net.mod/sunhme.c --- net-drivers-2.4/drivers/net/sunhme.c 2003-03-22 08:07:47.000000000 -0800 +++ net-drivers-2.4/drivers/net.mod/sunhme.c 2003-06-19 23:03:36.000000000 -0700 @@ -2480,9 +2480,6 @@ static int happy_meal_ioctl(struct net_d return -EFAULT; return 0; } else if (ecmd.cmd == ETHTOOL_SSET) { - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - /* Verify the settings we care about. 
*/ if (ecmd.autoneg != AUTONEG_ENABLE && ecmd.autoneg != AUTONEG_DISABLE) From krkumar@us.ibm.com Fri Jun 20 13:53:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 13:53:40 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KKrK2x009364 for ; Fri, 20 Jun 2003 13:53:27 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e3.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5KKrDE2161946; Fri, 20 Jun 2003 16:53:13 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5KKrAUZ144086; Fri, 20 Jun 2003 16:53:11 -0400 Message-ID: <3EF37458.3070103@us.ibm.com> Date: Fri, 20 Jun 2003 13:53:44 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: davem@redhat.com, kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: [PATCH] Prefix List against 2.5.70 (re-done) Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3444 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi, The earlier patch to implement the prefix list has been redone to use fib. Following are the implementation details and a couple of issues : 1. I change the netlink_dump_start to pass another parameter, , which is stored in a new field in the cb, . All users of this function have been changed to pass a -1 since they don't care about the type, except the generic routine rtnetlink_rcv_msg() which calculates the type and stores it. So the same routine which is used to dump route table can be used to dump the prefix list by checking the type. 
It might be possible to derive the type from the table offset, but that is more complicated (probably doable).

2. Added yoshifuji's patch to store the M/O flags (now it is needed).

3. Added user interface for retrieving M/O flags. This is a separate interface from the one for getting the prefix list since the flags are per interface while the prefix list is per route. However, these two can be merged into one if needed.

4. Changed the usage of RTF_ADDRCONF so that it is set only when the action is performed due to receipt of an RA.

5. Although this patch now uses only the routing table for updating and accessing the prefix list, I did a performance analysis of this approach vs. storing the plist on the idev. The results follow:

System: 1 CPU, 866 MHz, 256 MB memory

For 1000 VLAN devices (4036 route entries get created automatically as part of address assignment), retrieve prefix list for (system times only):

#devices  #iterations for each dev  plist on IDEV  plist in RTTABLE  %
200       100                       3.95 secs      40.14 secs        916%
1000      10                        2.60 secs      20.98 secs        706%
200       1000                      38.44 secs     400.76 secs       942%

6. I have kept #ifdef PREFIXLIST in a few places; I can modify the patch to remove that if required.

7. I removed the /proc interface since I was not able to cleanly use seq_file with fib6_walk(). If needed, I can work on this later (but will need some input on how to proceed). So currently, the only user interface is rtnetlink.

8. The patch can be extended to issue events on new prefix addition and on prefix deletion. I can do that if required.

9. I have tested both interfaces (prefix list and get O/M flags) using rtnetlink; no issues found.

Please let me know if this looks acceptable, in which case I can also send the patch for the 2.4 kernel.
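The approach in item 1, a request type stored in the dump callback so that one walker can serve both the route dump and the prefix-list dump, can be sketched in userspace C. All toy_* names are hypothetical and the route entry is heavily reduced. The prefix filter models only two of the criteria the patch applies (autoconfigured, no next hop) and omits the link-local, loopback, and multicast checks.

```c
#include <stddef.h>

/* Hypothetical request types, standing in for RTM_GETROUTE / RTM_GETPLIST. */
enum toy_type { TOY_GETROUTE, TOY_GETPLIST };

/* Hypothetical, much-reduced route entry. */
struct toy_route {
    int plen;        /* prefix length */
    int autoconf;    /* nonzero if learned from a router advertisement */
    int has_nexthop; /* nonzero if the route goes via a gateway */
};

/* Dump state; 'type' plays the role of the new callback field. */
struct toy_cb {
    enum toy_type type;
    int emitted;     /* number of records written so far */
};

typedef void (*toy_visit_fn)(const struct toy_route *rt, struct toy_cb *cb);

/* Full route dump: every entry is reported. */
static void toy_visit_route(const struct toy_route *rt, struct toy_cb *cb)
{
    (void)rt;
    cb->emitted++;
}

/* Prefix dump: only autoconfigured entries without a next hop. */
static void toy_visit_prefix(const struct toy_route *rt, struct toy_cb *cb)
{
    if (rt->has_nexthop || !rt->autoconf)
        return;
    cb->emitted++;
}

/* One walker serves both requests; the stored type selects the visitor. */
static int toy_dump(const struct toy_route *tbl, size_t n, struct toy_cb *cb)
{
    toy_visit_fn visit = (cb->type == TOY_GETPLIST) ? toy_visit_prefix
                                                    : toy_visit_route;
    size_t i;

    for (i = 0; i < n; i++)
        visit(&tbl[i], cb);
    return cb->emitted;
}
```

The design trade-off is the one the numbers in item 5 suggest: reusing the routing table avoids a second data structure, but the prefix dump must visit every route and discard most of them, whereas a per-device list would hold only the prefixes.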
Thanks, - KK diff -ruN linux-2.5.70.org/include/linux/ipv6_route.h linux-2.5.70.new/include/linux/ipv6_route.h --- linux-2.5.70.org/include/linux/ipv6_route.h 2003-05-26 18:00:25.000000000 -0700 +++ linux-2.5.70.new/include/linux/ipv6_route.h 2003-06-20 01:45:17.000000000 -0700 @@ -44,4 +44,16 @@ #define RTMSG_NEWROUTE 0x21 #define RTMSG_DELROUTE 0x22 +#ifdef CONFIG_IPV6_PREFIXLIST + +/* Structure to return prefix and prefix length for all devices */ + +struct in6_prefix_msg +{ + int ifindex; + int prefix_len; + struct in6_addr prefix; +}; +#endif + #endif diff -ruN linux-2.5.70.org/include/linux/netlink.h linux-2.5.70.new/include/linux/netlink.h --- linux-2.5.70.org/include/linux/netlink.h 2003-05-26 18:00:56.000000000 -0700 +++ linux-2.5.70.new/include/linux/netlink.h 2003-06-20 05:00:47.000000000 -0700 @@ -132,6 +132,7 @@ int (*dump)(struct sk_buff * skb, struct netlink_callback *cb); int (*done)(struct netlink_callback *cb); int family; + int type; /* for overloading functions */ long args[4]; }; @@ -161,7 +162,7 @@ __nlmsg_put(skb, pid, seq, type, len); }) extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb, - struct nlmsghdr *nlh, + struct nlmsghdr *nlh, int type, int (*dump)(struct sk_buff *skb, struct netlink_callback*), int (*done)(struct netlink_callback*)); diff -ruN linux-2.5.70.org/include/linux/rtnetlink.h linux-2.5.70.new/include/linux/rtnetlink.h --- linux-2.5.70.org/include/linux/rtnetlink.h 2003-05-26 18:00:46.000000000 -0700 +++ linux-2.5.70.new/include/linux/rtnetlink.h 2003-06-20 01:36:19.000000000 -0700 @@ -47,7 +47,14 @@ #define RTM_DELTFILTER (RTM_BASE+29) #define RTM_GETTFILTER (RTM_BASE+30) -#define RTM_MAX (RTM_BASE+31) +#define RTM_GETOMFLAGS (RTM_BASE+34) + +#ifndef CONFIG_IPV6_PREFIXLIST +#define RTM_MAX (RTM_GETOMFLAGS+1) +#else +#define RTM_GETPLIST (RTM_BASE+38) +#define RTM_MAX (RTM_GETPLIST+1) +#endif /* Generic structure for encapsulation optional route information. 
@@ -61,6 +68,14 @@ unsigned short rta_type; }; +/* Structure to return per interface device flags */ + +struct ifp_if6info +{ + int ifindex; + int flags; +}; + /* Macros to handle rtattributes */ #define RTA_ALIGNTO 4 @@ -201,9 +216,10 @@ RTA_FLOW, RTA_CACHEINFO, RTA_SESSION, + RTA_RA6INFO, /* No support yet, send event on prefix event */ }; -#define RTA_MAX RTA_SESSION +#define RTA_MAX RTA_RA6INFO #define RTM_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct rtmsg)))) #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg)) diff -ruN linux-2.5.70.org/include/net/if_inet6.h linux-2.5.70.new/include/net/if_inet6.h --- linux-2.5.70.org/include/net/if_inet6.h 2003-05-26 18:00:59.000000000 -0700 +++ linux-2.5.70.new/include/net/if_inet6.h 2003-06-20 02:01:39.000000000 -0700 @@ -17,6 +17,9 @@ #include +/* inet6_dev.if_flags */ +#define IF_RA_OTHERCONF 0x80 +#define IF_RA_MANAGED 0x40 #define IF_RA_RCVD 0x20 #define IF_RS_SENT 0x10 diff -ruN linux-2.5.70.org/net/core/rtnetlink.c linux-2.5.70.new/net/core/rtnetlink.c --- linux-2.5.70.org/net/core/rtnetlink.c 2003-05-26 18:01:03.000000000 -0700 +++ linux-2.5.70.new/net/core/rtnetlink.c 2003-06-19 06:05:34.000000000 -0700 @@ -380,7 +380,7 @@ if (link->dumpit == NULL) goto err_inval; - if ((*errp = netlink_dump_start(rtnl, skb, nlh, + if ((*errp = netlink_dump_start(rtnl, skb, nlh, type, link->dumpit, rtnetlink_done)) != 0) { return -1; diff -ruN linux-2.5.70.org/net/ipv4/tcp_diag.c linux-2.5.70.new/net/ipv4/tcp_diag.c --- linux-2.5.70.org/net/ipv4/tcp_diag.c 2003-05-26 18:00:20.000000000 -0700 +++ linux-2.5.70.new/net/ipv4/tcp_diag.c 2003-06-19 06:09:45.000000000 -0700 @@ -591,7 +591,7 @@ if (tcpdiag_bc_audit(RTA_DATA(rta), RTA_PAYLOAD(rta))) goto err_inval; } - return netlink_dump_start(tcpnl, skb, nlh, + return netlink_dump_start(tcpnl, skb, nlh, -1, tcpdiag_dump, tcpdiag_dump_done); } else { diff -ruN linux-2.5.70.org/net/ipv6/Kconfig linux-2.5.70.new/net/ipv6/Kconfig --- 
linux-2.5.70.org/net/ipv6/Kconfig 2003-05-26 18:00:40.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/Kconfig 2003-06-19 05:37:11.000000000 -0700 @@ -42,4 +42,13 @@ If unsure, say Y. +config IPV6_PREFIXLIST + bool "IPv6: Prefix List" + depends on IPV6 + ---help--- + For applications needing to retrieve the list of prefixes supported + on the system. Defined in RFC2461. + + If unsure, say Y. + source "net/ipv6/netfilter/Kconfig" diff -ruN linux-2.5.70.org/net/ipv6/addrconf.c linux-2.5.70.new/net/ipv6/addrconf.c --- linux-2.5.70.org/net/ipv6/addrconf.c 2003-05-26 18:00:58.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/addrconf.c 2003-06-20 01:34:14.000000000 -0700 @@ -124,7 +124,7 @@ static int addrconf_ifdown(struct net_device *dev, int how); -static void addrconf_dad_start(struct inet6_ifaddr *ifp); +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags); static void addrconf_dad_timer(unsigned long data); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_rs_timer(unsigned long data); @@ -738,7 +738,7 @@ ift->prefered_lft = tmp_prefered_lft; ift->tstamp = ifp->tstamp; spin_unlock_bh(&ift->lock); - addrconf_dad_start(ift); + addrconf_dad_start(ift, 0); in6_ifa_put(ift); in6_dev_put(idev); out: @@ -1234,7 +1234,7 @@ rtmsg.rtmsg_dst_len = 8; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; rtmsg.rtmsg_ifindex = dev->ifindex; - rtmsg.rtmsg_flags = RTF_UP|RTF_ADDRCONF; + rtmsg.rtmsg_flags = RTF_UP; rtmsg.rtmsg_type = RTMSG_NEWROUTE; ip6_route_add(&rtmsg, NULL, NULL); } @@ -1261,7 +1261,7 @@ struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xFE800000), 0, 0, 0); - addrconf_prefix_route(&addr, 64, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&addr, 64, dev, 0, 0); } static struct inet6_dev *addrconf_add_dev(struct net_device *dev) @@ -1401,7 +1401,7 @@ } create = 1; - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, RTF_ADDRCONF); } if (ifp && valid_lft == 0) { @@ -1552,7 +1552,7 @@ ifp = ipv6_add_addr(idev, pfx, plen, scope, 
IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); return 0; } @@ -1727,7 +1727,7 @@ ifp = ipv6_add_addr(idev, addr, 64, IFA_LINK, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); } } @@ -1965,8 +1965,7 @@ memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); rtmsg.rtmsg_type = RTMSG_NEWROUTE; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; - rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | - RTF_DEFAULT | RTF_UP); + rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_DEFAULT | RTF_UP); rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; @@ -1980,7 +1979,7 @@ /* * Duplicate Address Detection */ -static void addrconf_dad_start(struct inet6_ifaddr *ifp) +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags) { struct net_device *dev; unsigned long rand_num; @@ -1990,7 +1989,7 @@ addrconf_join_solict(dev, &ifp->addr); if (ifp->prefix_len != 128 && (ifp->flags&IFA_F_PERMANENT)) - addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, flags); net_srandom(ifp->addr.s6_addr32[3]); rand_num = net_random() % (ifp->idev->cnf.rtr_solicit_delay ? 
: 1); @@ -2389,6 +2388,42 @@ netlink_broadcast(rtnl, skb, 0, RTMGRP_IPV6_IFADDR, GFP_ATOMIC); } +int inet6_dump_omflags(struct sk_buff *skb, struct netlink_callback *cb) +{ + int flags; + struct ifp_if6info *ifp; + struct net_device *dev; + struct inet6_dev *idev; + struct nlmsghdr *nlh; + unsigned char *cur_tail, *org_tail = skb->tail; + + read_lock(&dev_base_lock); + for (dev = dev_base; dev; dev = dev->next) { + if (dev->flags & IFF_LOOPBACK) + continue; + if ((idev = in6_dev_get(dev)) == NULL) + continue; + flags = idev->if_flags; + in6_dev_put(idev); + cur_tail = skb->tail; + nlh = NLMSG_PUT(skb, NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, RTM_GETOMFLAGS, + sizeof(*ifp)); + ifp = NLMSG_DATA(nlh); + ifp->ifindex = dev->ifindex; + ifp->flags = flags; + nlh->nlmsg_len = skb->tail - cur_tail; + } + read_unlock(&dev_base_lock); + return skb->len; + +nlmsg_failure: + read_unlock(&dev_base_lock); + printk(KERN_INFO "inet6_dump_omflags:skb size not enough\n"); + skb_trim(skb, org_tail - skb->data); + return -1; +} + static struct rtnetlink_link inet6_rtnetlink_table[RTM_MAX - RTM_BASE + 1] = { [RTM_NEWADDR - RTM_BASE] = { .doit = inet6_rtm_newaddr, }, [RTM_DELADDR - RTM_BASE] = { .doit = inet6_rtm_deladdr, }, @@ -2397,6 +2432,10 @@ [RTM_DELROUTE - RTM_BASE] = { .doit = inet6_rtm_delroute, }, [RTM_GETROUTE - RTM_BASE] = { .doit = inet6_rtm_getroute, .dumpit = inet6_dump_fib, }, + [RTM_GETOMFLAGS - RTM_BASE] = { .dumpit = inet6_dump_omflags, }, +#ifdef CONFIG_IPV6_PREFIXLIST + [RTM_GETPLIST - RTM_BASE] = { .dumpit = inet6_dump_fib, }, +#endif }; static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) @@ -2730,7 +2769,7 @@ #ifdef CONFIG_PROC_FS proc_net_create("if_inet6", 0, iface_proc_info); #endif - + addrconf_verify(0); rtnetlink_links[PF_INET6] = inet6_rtnetlink_table; #ifdef CONFIG_SYSCTL diff -ruN linux-2.5.70.org/net/ipv6/ndisc.c linux-2.5.70.new/net/ipv6/ndisc.c --- linux-2.5.70.org/net/ipv6/ndisc.c 2003-05-26 18:00:41.000000000 -0700 +++ 
linux-2.5.70.new/net/ipv6/ndisc.c 2003-06-20 02:00:53.000000000 -0700 @@ -1049,6 +1049,16 @@ */ in6_dev->if_flags |= IF_RA_RCVD; } + /* + * Remember the managed/otherconf flags from most recently + * received RA message (RFC 2462) -- yoshfuji + */ + in6_dev->if_flags = (in6_dev->if_flags & ~(IF_RA_MANAGED | + IF_RA_OTHERCONF)) | + (ra_msg->icmph.icmp6_addrconf_managed ? + IF_RA_MANAGED : 0) | + (ra_msg->icmph.icmp6_addrconf_other ? + IF_RA_OTHERCONF : 0); lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); diff -ruN linux-2.5.70.org/net/ipv6/route.c linux-2.5.70.new/net/ipv6/route.c --- linux-2.5.70.org/net/ipv6/route.c 2003-05-26 18:00:45.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/route.c 2003-06-20 02:05:48.000000000 -0700 @@ -1520,6 +1520,68 @@ return 0; } +#ifdef CONFIG_IPV6_PREFIXLIST +static int rt6_fill_prefix(struct sk_buff *skb, struct rt6_info *rt, + int type, u32 pid, u32 seq) +{ + struct in6_prefix_msg *pmsg; + struct nlmsghdr *nlh; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, type, sizeof(*pmsg)); + pmsg = NLMSG_DATA(nlh); + pmsg->ifindex = rt->rt6i_dev->ifindex; + pmsg->prefix_len = rt->rt6i_dst.plen; + ipv6_addr_copy(&pmsg->prefix, &rt->rt6i_dst.addr); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "rt6_fill_prefix:skb size not enough\n"); + skb_trim(skb, b - skb->data); + return -1; +} + +static int rt6_dump_route_prefix(struct rt6_info *rt, void *p_arg) +{ + int addr_type; + struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; + + /* + * Definition of a prefix : + * - Should be autoconfigured + * - No nexthop + * - Not a linklocal, loopback or multicast type. 
+ */ + if (rt->rt6i_nexthop || (rt->rt6i_flags & RTF_ADDRCONF) == 0) + return 0; + addr_type = ipv6_addr_type(&rt->rt6i_dst.addr); + if ((addr_type & (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK | + IPV6_ADDR_MULTICAST)) != 0 || + addr_type == IPV6_ADDR_ANY) + return 0; + return rt6_fill_prefix(arg->skb, rt, RTM_GETPLIST, + NETLINK_CB(arg->cb->skb).pid, arg->cb->nlh->nlmsg_seq); +} + +static int fib6_dump_prefix(struct fib6_walker_t *w) +{ + int res; + struct rt6_info *rt; + + for (rt = w->leaf; rt; rt = rt->u.next) { + res = rt6_dump_route_prefix(rt, w->args); + if (res < 0) { + /* Frame is full, suspend walking */ + w->leaf = rt; + return 1; + } + } + w->leaf = NULL; + return 0; +} +#endif + static void fib6_dump_end(struct netlink_callback *cb) { struct fib6_walker_t *w = (void*)cb->args[0]; @@ -1547,6 +1609,13 @@ struct fib6_walker_t *w; int res; +#ifdef CONFIG_IPV6_PREFIXLIST + BUG_TRAP(cb->type + RTM_BASE == RTM_GETROUTE || + cb->type + RTM_BASE == RTM_GETPLIST); +#else + BUG_TRAP(cb->type + RTM_BASE == RTM_GETROUTE); +#endif + arg.skb = skb; arg.cb = cb; @@ -1568,7 +1637,12 @@ RT6_TRACE("dump<%p", w); memset(w, 0, sizeof(*w)); w->root = &ip6_routing_table; - w->func = fib6_dump_node; + if (cb->type + RTM_BASE == RTM_GETROUTE) + w->func = fib6_dump_node; +#ifdef CONFIG_IPV6_PREFIXLIST + else + w->func = fib6_dump_prefix; +#endif w->args = &arg; cb->args[0] = (long)w; read_lock_bh(&rt6_lock); diff -ruN linux-2.5.70.org/net/netlink/af_netlink.c linux-2.5.70.new/net/netlink/af_netlink.c --- linux-2.5.70.org/net/netlink/af_netlink.c 2003-05-26 18:00:40.000000000 -0700 +++ linux-2.5.70.new/net/netlink/af_netlink.c 2003-06-19 06:14:26.000000000 -0700 @@ -842,7 +842,7 @@ } int netlink_dump_start(struct sock *ssk, struct sk_buff *skb, - struct nlmsghdr *nlh, + struct nlmsghdr *nlh, int type, int (*dump)(struct sk_buff *skb, struct netlink_callback*), int (*done)(struct netlink_callback*)) { @@ -858,6 +858,7 @@ cb->dump = dump; cb->done = done; cb->nlh = nlh; + cb->type 
= type;
 	atomic_inc(&skb->users);
 	cb->skb = skb;
diff -ruN linux-2.5.70.org/net/xfrm/xfrm_user.c linux-2.5.70.new/net/xfrm/xfrm_user.c
--- linux-2.5.70.org/net/xfrm/xfrm_user.c	2003-05-26 18:00:41.000000000 -0700
+++ linux-2.5.70.new/net/xfrm/xfrm_user.c	2003-06-19 06:10:17.000000000 -0700
@@ -869,7 +869,7 @@
 		if (link->dump == NULL)
 			goto err_einval;
-		if ((*errp = netlink_dump_start(xfrm_nl, skb, nlh,
+		if ((*errp = netlink_dump_start(xfrm_nl, skb, nlh, -1,
 						link->dump, xfrm_done)) != 0) {
 			return -1;

From chas@locutus.cmf.nrl.navy.mil Fri Jun 20 13:55:28 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 13:55:37 -0700 (PDT)
Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KKtQ2x009979 for ; Fri, 20 Jun 2003 13:55:27 -0700
Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5KKt9sG018696; Fri, 20 Jun 2003 16:55:14 -0400 (EDT)
Message-Id: <200306202055.h5KKt9sG018696@ginger.cmf.nrl.navy.mil>
To: "David S. Miller"
cc: netdev@oss.sgi.com
Reply-To: chas3@users.sourceforge.net
Subject: Re: [PATCH][ATM][3/3] assorted changes for atm
In-reply-to: Your message of "Tue, 17 Jun 2003 10:31:45 PDT." <20030617.103145.26534124.davem@redhat.com>
Date: Fri, 20 Jun 2003 16:53:05 -0400
From: chas williams
X-Spam-Score: () hits=-0.3
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-archive-position: 3445
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

here is yet another version of the vcc sklist conversion. thanks to
acme@conectiva.com.br for pointing out my brain damage about testing
the iterator during list traversal.
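
The message does not spell out what "testing the iterator" refers to. As a rough illustration of one common form of that mistake, here is a minimal sketch in plain C. It assumes a hypothetical struct entry and a hand-rolled singly linked list, not the kernel's struct atm_vcc or sk_for_each(), and the function names are invented for the example. The point is that a pointer assigned on every pass of a loop is still non-NULL after a full traversal, so it cannot be used to tell "found" from "not found".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VCC on a shared list (not struct atm_vcc). */
struct entry {
	int dev;		/* id of the owning device */
	int vpi, vci;
	struct entry *next;
};

/*
 * Return the matching entry, or NULL if there is none.  Returning from
 * inside the loop means no caller has to inspect a loop variable after
 * the traversal has finished.
 */
static struct entry *find_entry(struct entry *head, int dev, int vpi, int vci)
{
	struct entry *e;

	for (e = head; e; e = e->next) {
		if (e->dev != dev)
			continue;	/* belongs to another device */
		if (e->vpi == vpi && e->vci == vci)
			return e;
	}
	return NULL;
}

/*
 * Buggy variant, kept for comparison: 'found' is assigned on every pass,
 * so after a full traversal it points at the last entry visited rather
 * than being NULL.
 */
static struct entry *find_entry_buggy(struct entry *head, int dev, int vpi, int vci)
{
	struct entry *e, *found = NULL;

	for (e = head; e; e = e->next) {
		found = e;
		if (found->dev != dev)
			continue;
		if (found->vpi == vpi && found->vci == vci)
			break;
	}
	return found;	/* wrong: non-NULL even when nothing matched */
}

/* Small self-check exercising both variants on a three-entry list. */
static void check_lookup(void)
{
	struct entry c = { 2, 0, 40, NULL };
	struct entry b = { 1, 0, 33, &c };
	struct entry a = { 1, 0, 32, &b };

	assert(find_entry(&a, 1, 0, 33) == &b);
	assert(find_entry(&a, 1, 0, 99) == NULL);
	assert(find_entry_buggy(&a, 1, 0, 99) == &c);	/* stale pointer */
}
```

Calling check_lookup() from a main() exercises both variants. Returning early from a helper is only one way to avoid the problem; an explicit "found" flag set just before the break works as well.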
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1359  -> 1.1360
#	    drivers/atm/he.c	1.15    -> 1.16
#	  net/atm/atm_misc.c	1.7     -> 1.8
#	   drivers/atm/eni.c	1.17    -> 1.18
#	      net/atm/proc.c	1.20    -> 1.21
#	       net/atm/pvc.c	1.16    -> 1.17
#	drivers/atm/idt77252.c	1.17    -> 1.18
#	       net/atm/lec.c	1.29    -> 1.30
#	drivers/atm/atmtcp.c	1.10    -> 1.11
#	       net/atm/svc.c	1.18    -> 1.19
#	    net/atm/common.h	1.12    -> 1.13
#	 net/atm/signaling.c	1.14    -> 1.15
#	 net/atm/resources.h	1.7     -> 1.8
#	       net/atm/mpc.c	1.20    -> 1.21
#	 net/atm/resources.c	1.13    -> 1.14
#	      net/atm/clip.c	1.17    -> 1.18
#	drivers/atm/fore200e.c	1.18    -> 1.19
#	    net/atm/common.c	1.35    -> 1.36
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/20	chas@relax.cmf.nrl.navy.mil	1.1360
# move vcc's to global sk-based linked list
# --------------------------------------------
#
diff -Nru a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c
--- a/drivers/atm/atmtcp.c	Fri Jun 20 16:53:20 2003
+++ b/drivers/atm/atmtcp.c	Fri Jun 20 16:53:20 2003
@@ -153,9 +153,10 @@
 static int atmtcp_v_ioctl(struct atm_dev *dev,unsigned int cmd,void *arg)
 {
-	unsigned long flags;
 	struct atm_cirange ci;
 	struct atm_vcc *vcc;
+	struct hlist_node *node;
+	struct sock *s;
 	if (cmd != ATM_SETCIRANGE) return -ENOIOCTLCMD;
 	if (copy_from_user(&ci,(void *) arg,sizeof(ci))) return -EFAULT;
@@ -163,14 +164,18 @@
 	if (ci.vci_bits == ATM_CI_MAX) ci.vci_bits = MAX_VCI_BITS;
 	if (ci.vpi_bits > MAX_VPI_BITS || ci.vpi_bits < 0 ||
 	    ci.vci_bits > MAX_VCI_BITS || ci.vci_bits < 0) return -EINVAL;
-	spin_lock_irqsave(&dev->lock, flags);
-	for (vcc = dev->vccs; vcc; vcc = vcc->next)
+	read_lock(&vcc_sklist_lock);
+	sk_for_each(s, node, &vcc_sklist) {
+		vcc = atm_sk(s);
+		if (vcc->dev != dev)
+			continue;
 		if ((vcc->vpi >> ci.vpi_bits) ||
 		    (vcc->vci >> ci.vci_bits)) {
-
spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EBUSY; } - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); dev->ci_range = ci; return 0; } @@ -233,9 +238,10 @@ static void atmtcp_c_close(struct atm_vcc *vcc) { - unsigned long flags; struct atm_dev *atmtcp_dev; struct atmtcp_dev_data *dev_data; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; atmtcp_dev = (struct atm_dev *) vcc->dev_data; @@ -246,19 +252,24 @@ kfree(dev_data); shutdown_atm_dev(atmtcp_dev); vcc->dev_data = NULL; - spin_lock_irqsave(&atmtcp_dev->lock, flags); - for (walk = atmtcp_dev->vccs; walk; walk = walk->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != atmtcp_dev) + continue; wake_up(&walk->sleep); - spin_unlock_irqrestore(&atmtcp_dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); } static int atmtcp_c_send(struct atm_vcc *vcc,struct sk_buff *skb) { - unsigned long flags; struct atm_dev *dev; struct atmtcp_hdr *hdr; - struct atm_vcc *out_vcc; + struct sock *s; + struct hlist_node *node; + struct atm_vcc *out_vcc = NULL; struct sk_buff *new_skb; int result = 0; @@ -270,13 +281,17 @@ (struct atmtcp_control *) skb->data); goto done; } - spin_lock_irqsave(&dev->lock, flags); - for (out_vcc = dev->vccs; out_vcc; out_vcc = out_vcc->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + out_vcc = atm_sk(s); + if (out_vcc->dev != dev) + continue; if (out_vcc->vpi == ntohs(hdr->vpi) && out_vcc->vci == ntohs(hdr->vci) && out_vcc->qos.rxtp.traffic_class != ATM_NONE) break; - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); if (!out_vcc) { atomic_inc(&vcc->stats->tx_err); goto done; @@ -366,7 +381,7 @@ if (itf != -1) dev = atm_dev_lookup(itf); if (dev) { if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } if (PRIV(dev)->vcc) return -EBUSY; @@ -378,7 
+393,8 @@ if (error) return error; } PRIV(dev)->vcc = vcc; - bind_vcc(vcc,&atmtcp_control_dev); + vcc->dev = &atmtcp_control_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); vcc->dev_data = dev; @@ -402,7 +418,7 @@ dev = atm_dev_lookup(itf); if (!dev) return -ENODEV; if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } dev_data = PRIV(dev); @@ -410,7 +426,7 @@ dev_data->persist = 0; if (PRIV(dev)->vcc) return 0; kfree(dev_data); - atm_dev_release(dev); + atm_dev_put(dev); shutdown_atm_dev(dev); return 0; } diff -Nru a/drivers/atm/eni.c b/drivers/atm/eni.c --- a/drivers/atm/eni.c Fri Jun 20 16:53:20 2003 +++ b/drivers/atm/eni.c Fri Jun 20 16:53:20 2003 @@ -1887,10 +1887,11 @@ static int get_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) *vpi = 0; if (*vci == ATM_VCI_ANY) { for (*vci = ATM_NOT_RSV_VCI; *vci < NR_VCI; (*vci)++) { @@ -1898,40 +1899,48 @@ ENI_DEV(vcc->dev)->rx_map[*vci]) continue; if (vcc->qos.txtp.traffic_class != ATM_NONE) { - for (walk = vcc->dev->vccs; walk; - walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) break; - if (walk) continue; + } + if (node) + continue; } break; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return *vci == NR_VCI ? 
-EADDRINUSE : 0; } if (*vci == ATM_VCI_UNSPEC) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (vcc->qos.rxtp.traffic_class != ATM_NONE && ENI_DEV(vcc->dev)->rx_map[*vci]) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } if (vcc->qos.txtp.traffic_class == ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); return 0; } @@ -2139,7 +2148,8 @@ static int eni_proc_read(struct atm_dev *dev,loff_t *pos,char *page) { - unsigned long flags; + struct hlist_node *node; + struct sock *s; static const char *signal[] = { "LOST","unknown","okay" }; struct eni_dev *eni_dev = ENI_DEV(dev); struct atm_vcc *vcc; @@ -2212,11 +2222,15 @@ return sprintf(page,"%10sbacklog %u packets\n","", skb_queue_len(&tx->backlog)); } - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) { - struct eni_vcc *eni_vcc = ENI_VCC(vcc); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct eni_vcc *eni_vcc; int length; + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; + eni_vcc = ENI_VCC(vcc); if (--left) continue; length = sprintf(page,"vcc %4d: ",vcc->vci); if (eni_vcc->rx) { @@ -2231,10 +2245,10 @@ length += sprintf(page+length,"tx[%d], txing %d bytes", eni_vcc->tx->index,eni_vcc->txing); page[length] = '\n'; - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return length+1; } - 
spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); for (i = 0; i < eni_dev->free_len; i++) { struct eni_free *fe = eni_dev->free_list+i; unsigned long offset; diff -Nru a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c --- a/drivers/atm/fore200e.c Fri Jun 20 16:53:20 2003 +++ b/drivers/atm/fore200e.c Fri Jun 20 16:53:20 2003 @@ -1069,18 +1069,23 @@ static struct atm_vcc* fore200e_find_vcc(struct fore200e* fore200e, struct rpd* rpd) { - unsigned long flags; + struct sock *s; struct atm_vcc* vcc; + struct hlist_node *node; - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { - - if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) - break; + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev != fore200e->atm_dev) + continue; + if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) { + read_unlock(&vcc_sklist_lock); + return vcc; + } } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); - - return vcc; + read_unlock(&vcc_sklist_lock); + + return NULL; } @@ -1350,20 +1355,26 @@ static int fore200e_walk_vccs(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; struct atm_vcc* walk; + struct sock *s; + struct hlist_node *node; /* find a free VPI */ - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { - for (*vpi = 0, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vpi = 0; +restart_vpi_search: + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + goto restart_vpi_search; } } } @@ -1371,16 +1382,21 @@ /* find a free VCI */ if (*vci == ATM_VCI_ANY) { - for (*vci = ATM_NOT_RSV_VCI, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vci = ATM_NOT_RSV_VCI; +restart_vci_search: + sk_for_each(s, 
node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vpi = *vpi) && (walk->vci == *vci)) { *vci = walk->vci + 1; - walk = vcc->dev->vccs; + goto restart_vci_search; } } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } @@ -2642,7 +2658,8 @@ static int fore200e_proc_read(struct atm_dev *dev,loff_t* pos,char* page) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct fore200e* fore200e = FORE200E_DEV(dev); int len, left = *pos; @@ -2889,8 +2906,12 @@ len = sprintf(page,"\n" " VCCs:\n address\tVPI.VCI:AAL\t(min/max tx PDU size) (min/max rx PDU size)\n"); - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + + if (vcc->dev != fore200e->atm_dev) + continue; fore200e_vcc = FORE200E_VCC(vcc); @@ -2904,7 +2925,7 @@ fore200e_vcc->rx_max_pdu ); } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); + read_unlock(&vcc_sklist_lock); return len; } diff -Nru a/drivers/atm/he.c b/drivers/atm/he.c --- a/drivers/atm/he.c Fri Jun 20 16:53:20 2003 +++ b/drivers/atm/he.c Fri Jun 20 16:53:20 2003 @@ -79,7 +79,6 @@ #include #define USE_TASKLET -#define USE_HE_FIND_VCC #undef USE_SCATTERGATHER #undef USE_CHECKSUM_HW /* still confused about this */ #define USE_RBPS @@ -328,25 +327,25 @@ he_writel_rcm(dev, val, 0x00000 | (cid << 3) | 7) static __inline__ struct atm_vcc* -he_find_vcc(struct he_dev *he_dev, unsigned cid) +__find_vcc(struct he_dev *he_dev, unsigned cid) { - unsigned long flags; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; short vpi; int vci; vpi = cid >> he_dev->vcibits; vci = cid & ((1 << he_dev->vcibits) - 1); - spin_lock_irqsave(&he_dev->atm_dev->lock, flags); - for (vcc = he_dev->atm_dev->vccs; vcc; vcc = vcc->next) - if (vcc->vci == vci && vcc->vpi == vpi - && 
vcc->qos.rxtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev == he_dev->atm_dev && + vcc->vci == vci && vcc->vpi == vpi && + vcc->qos.rxtp.traffic_class != ATM_NONE) { return vcc; - } - - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + } + } return NULL; } @@ -1566,17 +1565,6 @@ reg |= RX_ENABLE; he_writel(he_dev, reg, RC_CONFIG); -#ifndef USE_HE_FIND_VCC - he_dev->he_vcc_table = kmalloc(sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits)), GFP_KERNEL); - if (he_dev->he_vcc_table == NULL) { - hprintk("failed to alloc he_vcc_table\n"); - return -ENOMEM; - } - memset(he_dev->he_vcc_table, 0, sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits))); -#endif - for (i = 0; i < HE_NUM_CS_STPER; ++i) { he_dev->cs_stper[i].inuse = 0; he_dev->cs_stper[i].pcr = -1; @@ -1712,11 +1700,6 @@ he_dev->tpd_base, he_dev->tpd_base_phys); #endif -#ifndef USE_HE_FIND_VCC - if (he_dev->he_vcc_table) - kfree(he_dev->he_vcc_table); -#endif - if (he_dev->pci_dev) { pci_read_config_word(he_dev->pci_dev, PCI_COMMAND, &command); command &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); @@ -1798,6 +1781,7 @@ int pdus_assembled = 0; int updated = 0; + read_lock(&vcc_sklist_lock); while (he_dev->rbrq_head != rbrq_tail) { ++updated; @@ -1823,13 +1807,10 @@ buf_len = RBRQ_BUFLEN(he_dev->rbrq_head) * 4; cid = RBRQ_CID(he_dev->rbrq_head); -#ifdef USE_HE_FIND_VCC if (cid != lastcid) - vcc = he_find_vcc(he_dev, cid); + vcc = __find_vcc(he_dev, cid); lastcid = cid; -#else - vcc = HE_LOOKUP_VCC(he_dev, cid); -#endif + if (vcc == NULL) { hprintk("vcc == NULL (cid 0x%x)\n", cid); if (!RBRQ_HBUF_ERR(he_dev->rbrq_head)) @@ -1966,6 +1947,7 @@ RBRQ_MASK(++he_dev->rbrq_head)); } + read_unlock(&vcc_sklist_lock); if (updated) { if (updated > he_dev->rbrq_peak) @@ -2565,10 +2547,6 @@ #endif spin_unlock_irqrestore(&he_dev->global_lock, flags); - -#ifndef 
USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = vcc; -#endif } open_failed: @@ -2634,9 +2612,6 @@ if (timeout == 0) hprintk("close rx timeout cid 0x%x\n", cid); -#ifndef USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = NULL; -#endif HPRINTK("close rx cid 0x%x complete\n", cid); } diff -Nru a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c --- a/drivers/atm/idt77252.c Fri Jun 20 16:53:20 2003 +++ b/drivers/atm/idt77252.c Fri Jun 20 16:53:20 2003 @@ -2403,37 +2403,43 @@ static int idt77252_find_vcc(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; + struct sock *s; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { *vpi = 0; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } if (*vci == ATM_VCI_ANY) { *vci = ATM_NOT_RSV_VCI; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vci)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/net/atm/atm_misc.c b/net/atm/atm_misc.c --- a/net/atm/atm_misc.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/atm_misc.c Fri Jun 20 16:53:20 2003 @@ -47,15 +47,21 @@ static int check_ci(struct atm_vcc *vcc,short vpi,int vci) { + struct hlist_node *node; + struct sock *s; struct atm_vcc *walk; - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vpi 
== vpi && walk->vci == vci && ((walk->qos.txtp.traffic_class != ATM_NONE && vcc->qos.txtp.traffic_class != ATM_NONE) || (walk->qos.rxtp.traffic_class != ATM_NONE && vcc->qos.rxtp.traffic_class != ATM_NONE))) return -EADDRINUSE; + } /* allow VCCs with same VPI/VCI iff they don't collide on TX/RX (but we may refuse such sharing for other reasons, e.g. if protocol requires to have both channels) */ @@ -65,17 +71,16 @@ int atm_find_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; static short p = 0; /* poor man's per-device cache */ static int c = 0; short old_p; int old_c; int err; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi != ATM_VPI_ANY && *vci != ATM_VCI_ANY) { err = check_ci(vcc,*vpi,*vci); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return err; } /* last scan may have left values out of bounds for current device */ @@ -90,7 +95,7 @@ if (!check_ci(vcc,p,c)) { *vpi = p; *vci = c; - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (*vci == ATM_VCI_ANY) { @@ -105,7 +110,7 @@ } } while (old_p != p || old_c != c); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/clip.c Fri Jun 20 16:53:20 2003 @@ -737,7 +737,8 @@ set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); /* allow replies and avoid getting closed if signaling dies */ - bind_vcc(vcc,&atmarpd_dev); + vcc->dev = &atmarpd_dev; + vcc_insert_socket(vcc->sk); vcc->push = NULL; vcc->pop = NULL; /* crash */ vcc->push_oam = NULL; /* crash */ diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/common.c Fri Jun 20 16:53:20 2003 @@ -157,6 +157,29 @@ #endif +HLIST_HEAD(vcc_sklist); +rwlock_t vcc_sklist_lock = RW_LOCK_UNLOCKED; + +void 
__vcc_insert_socket(struct sock *sk) +{ + sk_add_node(sk, &vcc_sklist); +} + +void vcc_insert_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + __vcc_insert_socket(sk); + write_unlock_irq(&vcc_sklist_lock); +} + +void vcc_remove_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + sk_del_node_init(sk); + write_unlock_irq(&vcc_sklist_lock); +} + + static struct sk_buff *alloc_tx(struct atm_vcc *vcc,unsigned int size) { struct sk_buff *skb; @@ -175,16 +198,45 @@ } -int atm_create(struct socket *sock,int protocol,int family) +EXPORT_SYMBOL(vcc_sklist); +EXPORT_SYMBOL(vcc_sklist_lock); +EXPORT_SYMBOL(vcc_insert_socket); +EXPORT_SYMBOL(vcc_remove_socket); + +static void vcc_sock_destruct(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + if (atomic_read(&vcc->sk->sk_rmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: rmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_rmem_alloc)); + + if (atomic_read(&vcc->sk->sk_wmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: wmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_wmem_alloc)); + + kfree(sk->sk_protinfo); +} + +int vcc_create(struct socket *sock, int protocol, int family) { struct sock *sk; struct atm_vcc *vcc; sock->sk = NULL; - if (sock->type == SOCK_STREAM) return -EINVAL; - if (!(sk = alloc_atm_vcc_sk(family))) return -ENOMEM; - vcc = atm_sk(sk); - memset(&vcc->flags,0,sizeof(vcc->flags)); + if (sock->type == SOCK_STREAM) + return -EINVAL; + sk = sk_alloc(family, GFP_KERNEL, 1, NULL); + if (!sk) + return -ENOMEM; + sock_init_data(NULL, sk); + + vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); + if (!vcc) { + sk_free(sk); + return -ENOMEM; + } + + memset(vcc, 0, sizeof(*vcc)); + vcc->sk = sk; vcc->dev = NULL; vcc->callback = NULL; memset(&vcc->local,0,sizeof(struct sockaddr_atmsvc)); @@ -199,42 +251,48 @@ vcc->atm_options = vcc->aal_options = 0; init_waitqueue_head(&vcc->sleep); sk->sk_sleep = &vcc->sleep; + sk->sk_destruct = vcc_sock_destruct; sock->sk = sk; 
return 0; } -void atm_release_vcc_sk(struct sock *sk,int free_sk) +static void vcc_destroy_socket(struct sock *sk) { struct atm_vcc *vcc = atm_sk(sk); struct sk_buff *skb; - clear_bit(ATM_VF_READY,&vcc->flags); + clear_bit(ATM_VF_READY, &vcc->flags); if (vcc->dev) { - if (vcc->dev->ops->close) vcc->dev->ops->close(vcc); - if (vcc->push) vcc->push(vcc,NULL); /* atmarpd has no push */ + if (vcc->dev->ops->close) + vcc->dev->ops->close(vcc); + if (vcc->push) + vcc->push(vcc, NULL); /* atmarpd has no push */ + + vcc_remove_socket(sk); /* no more receive */ + while ((skb = skb_dequeue(&vcc->sk->sk_receive_queue))) { atm_return(vcc,skb->truesize); kfree_skb(skb); } module_put(vcc->dev->ops->owner); - atm_dev_release(vcc->dev); - if (atomic_read(&vcc->sk->sk_rmem_alloc)) - printk(KERN_WARNING "atm_release_vcc: strange ... " - "rmem_alloc == %d after closing\n", - atomic_read(&vcc->sk->sk_rmem_alloc)); - bind_vcc(vcc,NULL); + atm_dev_put(vcc->dev); } - - if (free_sk) free_atm_vcc_sk(sk); } -int atm_release(struct socket *sock) +int vcc_release(struct socket *sock) { - if (sock->sk) - atm_release_vcc_sk(sock->sk,1); + struct sock *sk = sock->sk; + + if (sk) { + lock_sock(sk); + vcc_destroy_socket(sock->sk); + release_sock(sk); + sock_put(sk); + } + return 0; } @@ -289,7 +347,8 @@ if (vci > 0 && vci < ATM_NOT_RSV_VCI && !capable(CAP_NET_BIND_SERVICE)) return -EPERM; error = 0; - bind_vcc(vcc,dev); + vcc->dev = dev; + vcc_insert_socket(vcc->sk); switch (vcc->qos.aal) { case ATM_AAL0: error = atm_init_aal0(vcc); @@ -313,7 +372,7 @@ if (!error) error = adjust_tp(&vcc->qos.txtp,vcc->qos.aal); if (!error) error = adjust_tp(&vcc->qos.rxtp,vcc->qos.aal); if (error) { - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } DPRINTK("VCC %d.%d, AAL %d\n",vpi,vci,vcc->qos.aal); @@ -327,7 +386,7 @@ error = dev->ops->open(vcc,vpi,vci); if (error) { module_put(dev->ops->owner); - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } } @@ -371,7 +430,7 @@ dev = 
atm_dev_lookup(itf); error = __vcc_connect(vcc, dev, vpi, vci); if (error) { - atm_dev_release(dev); + atm_dev_put(dev); return error; } } else { @@ -385,7 +444,7 @@ spin_unlock(&atm_dev_lock); if (!__vcc_connect(vcc, dev, vpi, vci)) break; - atm_dev_release(dev); + atm_dev_put(dev); dev = NULL; spin_lock(&atm_dev_lock); } diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Fri Jun 20 16:53:20 2003 +++ b/net/atm/common.h Fri Jun 20 16:53:20 2003 @@ -10,8 +10,8 @@ #include /* for poll_table */ -int atm_create(struct socket *sock,int protocol,int family); -int atm_release(struct socket *sock); +int vcc_create(struct socket *sock, int protocol, int family); +int vcc_release(struct socket *sock); int vcc_connect(struct socket *sock, int itf, short vpi, int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); @@ -24,7 +24,6 @@ int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen); -void atm_release_vcc_sk(struct sock *sk,int free_sk); void atm_shutdown_dev(struct atm_dev *dev); int atmpvc_init(void); diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/lec.c Fri Jun 20 16:53:20 2003 @@ -48,7 +48,7 @@ #include "lec.h" #include "lec_arpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" #if 0 #define DPRINTK printk @@ -810,7 +810,8 @@ lec_arp_init(priv); priv->itfnum = i; /* LANE2 addition */ priv->lecd = vcc; - bind_vcc(vcc, &lecatm_dev); + vcc->dev = &lecatm_dev; + vcc_insert_socket(vcc->sk); vcc->proto_data = dev_lec[i]; set_bit(ATM_VF_META,&vcc->flags); diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/mpc.c Fri Jun 20 16:53:20 2003 @@ -28,7 +28,7 @@ #include "lec.h" #include "mpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" /* * mpc.c: Implementation of MPOA client kernel part @@ -789,7 +789,8 @@ } mpc->mpoad_vcc = 
vcc; - bind_vcc(vcc, &mpc_dev); + vcc->dev = &mpc_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); diff -Nru a/net/atm/proc.c b/net/atm/proc.c --- a/net/atm/proc.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/proc.c Fri Jun 20 16:53:20 2003 @@ -334,9 +334,8 @@ static int atm_pvc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left, clip_info = 0; @@ -349,25 +348,20 @@ if (try_atm_clip_ops()) clip_info = 1; #endif - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMPVC && - vcc->dev && !left--) { - pvc_info(vcc,buf,clip_info); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMPVC && vcc->dev && !left--) { + pvc_info(vcc,buf,clip_info); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) - if (clip_info) - module_put(atm_clip_ops->owner); + if (clip_info) + module_put(atm_clip_ops->owner); #endif - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) if (clip_info) module_put(atm_clip_ops->owner); @@ -378,10 +372,9 @@ static int atm_vc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; int left; if (!pos) @@ -389,20 +382,16 @@ "Address"," Itf VPI VCI Fam Flags Reply Send buffer" " Recv buffer\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, 
&atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (!left--) { - vc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (!left--) { + vc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } @@ -410,29 +399,24 @@ static int atm_svc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left; if (!pos) return sprintf(buf,"Itf VPI VCI State Remote\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMSVC && !left--) { - svc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMSVC && !left--) { + svc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/pvc.c Fri Jun 20 16:53:20 2003 @@ -17,10 +17,6 @@ #include "resources.h" /* devs and vccs */ #include "common.h" /* common for PVCs and SVCs */ -#ifndef NULL -#define NULL 0 -#endif - static int pvc_shutdown(struct socket *sock,int how) { @@ -109,7 +105,7 @@ static struct proto_ops pvc_proto_ops = { 
.family = PF_ATMPVC, - .release = atm_release, + .release = vcc_release, .bind = pvc_bind, .connect = pvc_connect, .socketpair = sock_no_socketpair, @@ -131,7 +127,7 @@ static int pvc_create(struct socket *sock,int protocol) { sock->ops = &pvc_proto_ops; - return atm_create(sock,protocol,PF_ATMPVC); + return vcc_create(sock, protocol, PF_ATMPVC); } diff -Nru a/net/atm/resources.c b/net/atm/resources.c --- a/net/atm/resources.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/resources.c Fri Jun 20 16:53:20 2003 @@ -23,11 +23,6 @@ #include "addr.h" -#ifndef NULL -#define NULL 0 -#endif - - LIST_HEAD(atm_devs); spinlock_t atm_dev_lock = SPIN_LOCK_UNLOCKED; @@ -91,7 +86,7 @@ spin_lock(&atm_dev_lock); if (number != -1) { if ((inuse = __atm_dev_lookup(number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); spin_unlock(&atm_dev_lock); __free_atm_dev(dev); return NULL; @@ -100,7 +95,7 @@ } else { dev->number = 0; while ((inuse = __atm_dev_lookup(dev->number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); dev->number++; } } @@ -402,78 +397,12 @@ else error = 0; done: - atm_dev_release(dev); + atm_dev_put(dev); return error; } -struct sock *alloc_atm_vcc_sk(int family) -{ - struct sock *sk; - struct atm_vcc *vcc; - - sk = sk_alloc(family, GFP_KERNEL, 1, NULL); - if (!sk) - return NULL; - vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); - if (!vcc) { - sk_free(sk); - return NULL; - } - sock_init_data(NULL, sk); - memset(vcc, 0, sizeof(*vcc)); - vcc->sk = sk; - - return sk; -} - - -static void unlink_vcc(struct atm_vcc *vcc) -{ - unsigned long flags; - if (vcc->dev) { - spin_lock_irqsave(&vcc->dev->lock, flags); - if (vcc->prev) - vcc->prev->next = vcc->next; - else - vcc->dev->vccs = vcc->next; - - if (vcc->next) - vcc->next->prev = vcc->prev; - else - vcc->dev->last = vcc->prev; - spin_unlock_irqrestore(&vcc->dev->lock, flags); - } -} - - -void free_atm_vcc_sk(struct sock *sk) -{ - unlink_vcc(atm_sk(sk)); - sk_free(sk); -} - -void bind_vcc(struct atm_vcc *vcc,struct 
atm_dev *dev) -{ - unsigned long flags; - - unlink_vcc(vcc); - vcc->dev = dev; - if (dev) { - spin_lock_irqsave(&dev->lock, flags); - vcc->next = NULL; - vcc->prev = dev->last; - if (dev->vccs) - dev->last->next = vcc; - else - dev->vccs = vcc; - dev->last = vcc; - spin_unlock_irqrestore(&dev->lock, flags); - } -} - EXPORT_SYMBOL(atm_dev_register); EXPORT_SYMBOL(atm_dev_deregister); EXPORT_SYMBOL(atm_dev_lookup); EXPORT_SYMBOL(shutdown_atm_dev); -EXPORT_SYMBOL(bind_vcc); diff -Nru a/net/atm/resources.h b/net/atm/resources.h --- a/net/atm/resources.h Fri Jun 20 16:53:20 2003 +++ b/net/atm/resources.h Fri Jun 20 16:53:20 2003 @@ -14,8 +14,6 @@ extern spinlock_t atm_dev_lock; -struct sock *alloc_atm_vcc_sk(int family); -void free_atm_vcc_sk(struct sock *sk); int atm_dev_ioctl(unsigned int cmd, unsigned long arg); diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/signaling.c Fri Jun 20 16:53:20 2003 @@ -200,26 +200,22 @@ } -static void purge_vccs(struct atm_vcc *vcc) +static void purge_vcc(struct atm_vcc *vcc) { - while (vcc) { - if (vcc->sk->sk_family == PF_ATMSVC && - !test_bit(ATM_VF_META,&vcc->flags)) { - set_bit(ATM_VF_RELEASED,&vcc->flags); - vcc->reply = -EUNATCH; - vcc->sk->sk_err = EUNATCH; - wake_up(&vcc->sleep); - } - vcc = vcc->next; + if (vcc->sk->sk_family == PF_ATMSVC && + !test_bit(ATM_VF_META,&vcc->flags)) { + set_bit(ATM_VF_RELEASED,&vcc->flags); + vcc->reply = -EUNATCH; + vcc->sk->sk_err = EUNATCH; + wake_up(&vcc->sleep); } } static void sigd_close(struct atm_vcc *vcc) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; DPRINTK("sigd_close\n"); sigd = NULL; @@ -227,14 +223,14 @@ printk(KERN_ERR "sigd_close: closing with requests pending\n"); skb_queue_purge(&vcc->sk->sk_receive_queue); - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - 
spin_lock_irqsave(&dev->lock, flags); - purge_vccs(dev->vccs); - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct atm_vcc *vcc = atm_sk(s); + + if (vcc->dev) + purge_vcc(vcc); } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); } @@ -257,7 +253,8 @@ if (sigd) return -EADDRINUSE; DPRINTK("sigd_attach\n"); sigd = vcc; - bind_vcc(vcc,&sigd_dev); + vcc->dev = &sigd_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); wake_up(&sigd_sleep); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Fri Jun 20 16:53:20 2003 +++ b/net/atm/svc.c Fri Jun 20 16:53:20 2003 @@ -88,18 +88,21 @@ static int svc_release(struct socket *sock) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; - if (!sock->sk) return 0; - vcc = ATM_SD(sock); - DPRINTK("svc_release %p\n",vcc); - clear_bit(ATM_VF_READY,&vcc->flags); - atm_release_vcc_sk(sock->sk,0); - svc_disconnect(vcc); - /* VCC pointer is used as a reference, so we must not free it - (thereby subjecting it to re-use) before all pending connections - are closed */ - free_atm_vcc_sk(sock->sk); + if (sk) { + vcc = ATM_SD(sock); + DPRINTK("svc_release %p\n", vcc); + clear_bit(ATM_VF_READY, &vcc->flags); + /* VCC pointer is used as a reference, so we must not free it + (thereby subjecting it to re-use) before all pending connections + are closed */ + sock_hold(sk); + vcc_release(sock); + svc_disconnect(vcc); + sock_put(sk); + } return 0; } @@ -542,7 +545,7 @@ int error; sock->ops = &svc_proto_ops; - error = atm_create(sock,protocol,AF_ATMSVC); + error = vcc_create(sock, protocol, AF_ATMSVC); if (error) return error; ATM_SD(sock)->callback = svc_callback; ATM_SD(sock)->local.sas_family = AF_ATMSVC; From chas@locutus.cmf.nrl.navy.mil Fri Jun 20 14:34:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 14:35:02 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil 
    (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5KLYk2x010652 for ; Fri, 20 Jun 2003 14:34:46 -0700
Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5KLYXsG019262; Fri, 20 Jun 2003 17:34:33 -0400 (EDT)
Message-Id: <200306202134.h5KLYXsG019262@ginger.cmf.nrl.navy.mil>
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Reply-To: chas3@users.sourceforge.net
Subject: Re: [PATCH][ATM][3/3] assorted changes for atm
In-reply-to: Your message of "Fri, 20 Jun 2003 16:53:05 EDT." <200306202055.h5KKt9sG018696@ginger.cmf.nrl.navy.mil>
Date: Fri, 20 Jun 2003 17:32:30 -0400
From: chas williams
X-Spam-Score: () hits=-0.3
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-archive-position: 3446
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

ok, one more time. maybe i could post the complete patch this time.

[atm]: move vcc's to global sk-based linked list

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1359 -> 1.1360
# drivers/atm/he.c 1.15 -> 1.16
# net/atm/atm_misc.c 1.7 -> 1.8
# drivers/atm/eni.c 1.17 -> 1.18
# net/atm/proc.c 1.20 -> 1.21
# net/atm/pvc.c 1.16 -> 1.17
# drivers/atm/idt77252.c 1.17 -> 1.18
# net/atm/lec.c 1.29 -> 1.30
# drivers/atm/atmtcp.c 1.10 -> 1.11
# net/atm/svc.c 1.18 -> 1.19
# net/atm/common.h 1.12 -> 1.13
# net/atm/signaling.c 1.14 -> 1.15
# net/atm/resources.h 1.7 -> 1.8
# net/atm/mpc.c 1.20 -> 1.21
# include/linux/atmdev.h 1.18 -> 1.19
# net/atm/resources.c 1.13 -> 1.14
# net/atm/clip.c 1.17 -> 1.18
# drivers/atm/fore200e.c 1.18 -> 1.19
# net/atm/common.c 1.35 -> 1.36
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/20 chas@relax.cmf.nrl.navy.mil 1.1360
# move vcc's to global sk-based linked list
# --------------------------------------------
#
diff -Nru a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c
--- a/drivers/atm/atmtcp.c Fri Jun 20 17:33:36 2003
+++ b/drivers/atm/atmtcp.c Fri Jun 20 17:33:36 2003
@@ -153,9 +153,10 @@
static int atmtcp_v_ioctl(struct atm_dev *dev,unsigned int cmd,void *arg) { - unsigned long flags; struct atm_cirange ci; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; if (cmd != ATM_SETCIRANGE) return -ENOIOCTLCMD; if (copy_from_user(&ci,(void *) arg,sizeof(ci))) return -EFAULT;
@@ -163,14 +164,18 @@
if (ci.vci_bits == ATM_CI_MAX) ci.vci_bits = MAX_VCI_BITS; if (ci.vpi_bits > MAX_VPI_BITS || ci.vpi_bits < 0 || ci.vci_bits > MAX_VCI_BITS || ci.vci_bits < 0) return -EINVAL; - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; if ((vcc->vpi >> ci.vpi_bits) || (vcc->vci >> ci.vci_bits)) { - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EBUSY; } - spin_unlock_irqrestore(&dev->lock, flags); + } +
read_unlock(&vcc_sklist_lock); dev->ci_range = ci; return 0; } @@ -233,9 +238,10 @@ static void atmtcp_c_close(struct atm_vcc *vcc) { - unsigned long flags; struct atm_dev *atmtcp_dev; struct atmtcp_dev_data *dev_data; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; atmtcp_dev = (struct atm_dev *) vcc->dev_data; @@ -246,19 +252,24 @@ kfree(dev_data); shutdown_atm_dev(atmtcp_dev); vcc->dev_data = NULL; - spin_lock_irqsave(&atmtcp_dev->lock, flags); - for (walk = atmtcp_dev->vccs; walk; walk = walk->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != atmtcp_dev) + continue; wake_up(&walk->sleep); - spin_unlock_irqrestore(&atmtcp_dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); } static int atmtcp_c_send(struct atm_vcc *vcc,struct sk_buff *skb) { - unsigned long flags; struct atm_dev *dev; struct atmtcp_hdr *hdr; - struct atm_vcc *out_vcc; + struct sock *s; + struct hlist_node *node; + struct atm_vcc *out_vcc = NULL; struct sk_buff *new_skb; int result = 0; @@ -270,13 +281,17 @@ (struct atmtcp_control *) skb->data); goto done; } - spin_lock_irqsave(&dev->lock, flags); - for (out_vcc = dev->vccs; out_vcc; out_vcc = out_vcc->next) + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + out_vcc = atm_sk(s); + if (out_vcc->dev != dev) + continue; if (out_vcc->vpi == ntohs(hdr->vpi) && out_vcc->vci == ntohs(hdr->vci) && out_vcc->qos.rxtp.traffic_class != ATM_NONE) break; - spin_unlock_irqrestore(&dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); if (!out_vcc) { atomic_inc(&vcc->stats->tx_err); goto done; @@ -366,7 +381,7 @@ if (itf != -1) dev = atm_dev_lookup(itf); if (dev) { if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } if (PRIV(dev)->vcc) return -EBUSY; @@ -378,7 +393,8 @@ if (error) return error; } PRIV(dev)->vcc = vcc; - bind_vcc(vcc,&atmtcp_control_dev); + vcc->dev = &atmtcp_control_dev; + 
vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); vcc->dev_data = dev; @@ -402,7 +418,7 @@ dev = atm_dev_lookup(itf); if (!dev) return -ENODEV; if (dev->ops != &atmtcp_v_dev_ops) { - atm_dev_release(dev); + atm_dev_put(dev); return -EMEDIUMTYPE; } dev_data = PRIV(dev); @@ -410,7 +426,7 @@ dev_data->persist = 0; if (PRIV(dev)->vcc) return 0; kfree(dev_data); - atm_dev_release(dev); + atm_dev_put(dev); shutdown_atm_dev(dev); return 0; } diff -Nru a/drivers/atm/eni.c b/drivers/atm/eni.c --- a/drivers/atm/eni.c Fri Jun 20 17:33:36 2003 +++ b/drivers/atm/eni.c Fri Jun 20 17:33:36 2003 @@ -1887,10 +1887,11 @@ static int get_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) *vpi = 0; if (*vci == ATM_VCI_ANY) { for (*vci = ATM_NOT_RSV_VCI; *vci < NR_VCI; (*vci)++) { @@ -1898,40 +1899,48 @@ ENI_DEV(vcc->dev)->rx_map[*vci]) continue; if (vcc->qos.txtp.traffic_class != ATM_NONE) { - for (walk = vcc->dev->vccs; walk; - walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) break; - if (walk) continue; + } + if (node) + continue; } break; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return *vci == NR_VCI ? 
-EADDRINUSE : 0; } if (*vci == ATM_VCI_UNSPEC) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (vcc->qos.rxtp.traffic_class != ATM_NONE && ENI_DEV(vcc->dev)->rx_map[*vci]) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } if (vcc->qos.txtp.traffic_class == ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vci == *vci && walk->qos.txtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + } + read_unlock(&vcc_sklist_lock); return 0; } @@ -2139,7 +2148,8 @@ static int eni_proc_read(struct atm_dev *dev,loff_t *pos,char *page) { - unsigned long flags; + struct hlist_node *node; + struct sock *s; static const char *signal[] = { "LOST","unknown","okay" }; struct eni_dev *eni_dev = ENI_DEV(dev); struct atm_vcc *vcc; @@ -2212,11 +2222,15 @@ return sprintf(page,"%10sbacklog %u packets\n","", skb_queue_len(&tx->backlog)); } - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) { - struct eni_vcc *eni_vcc = ENI_VCC(vcc); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct eni_vcc *eni_vcc; int length; + vcc = atm_sk(s); + if (vcc->dev != dev) + continue; + eni_vcc = ENI_VCC(vcc); if (--left) continue; length = sprintf(page,"vcc %4d: ",vcc->vci); if (eni_vcc->rx) { @@ -2231,10 +2245,10 @@ length += sprintf(page+length,"tx[%d], txing %d bytes", eni_vcc->tx->index,eni_vcc->txing); page[length] = '\n'; - spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); return length+1; } - 
spin_unlock_irqrestore(&dev->lock, flags); + read_unlock(&vcc_sklist_lock); for (i = 0; i < eni_dev->free_len; i++) { struct eni_free *fe = eni_dev->free_list+i; unsigned long offset; diff -Nru a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c --- a/drivers/atm/fore200e.c Fri Jun 20 17:33:36 2003 +++ b/drivers/atm/fore200e.c Fri Jun 20 17:33:36 2003 @@ -1069,18 +1069,23 @@ static struct atm_vcc* fore200e_find_vcc(struct fore200e* fore200e, struct rpd* rpd) { - unsigned long flags; + struct sock *s; struct atm_vcc* vcc; + struct hlist_node *node; - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { - - if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) - break; + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev != fore200e->atm_dev) + continue; + if (vcc->vpi == rpd->atm_header.vpi && vcc->vci == rpd->atm_header.vci) { + read_unlock(&vcc_sklist_lock); + return vcc; + } } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); - - return vcc; + read_unlock(&vcc_sklist_lock); + + return NULL; } @@ -1350,20 +1355,26 @@ static int fore200e_walk_vccs(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; struct atm_vcc* walk; + struct sock *s; + struct hlist_node *node; /* find a free VPI */ - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { - for (*vpi = 0, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vpi = 0; +restart_vpi_search: + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + goto restart_vpi_search; } } } @@ -1371,16 +1382,21 @@ /* find a free VCI */ if (*vci == ATM_VCI_ANY) { - for (*vci = ATM_NOT_RSV_VCI, walk = vcc->dev->vccs; walk; walk = walk->next) { + *vci = ATM_NOT_RSV_VCI; +restart_vci_search: + sk_for_each(s, 
node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vpi = *vpi) && (walk->vci == *vci)) { *vci = walk->vci + 1; - walk = vcc->dev->vccs; + goto restart_vci_search; } } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } @@ -2642,7 +2658,8 @@ static int fore200e_proc_read(struct atm_dev *dev,loff_t* pos,char* page) { - unsigned long flags; + struct sock *s; + struct hlist_node *node; struct fore200e* fore200e = FORE200E_DEV(dev); int len, left = *pos; @@ -2889,8 +2906,12 @@ len = sprintf(page,"\n" " VCCs:\n address\tVPI.VCI:AAL\t(min/max tx PDU size) (min/max rx PDU size)\n"); - spin_lock_irqsave(&fore200e->atm_dev->lock, flags); - for (vcc = fore200e->atm_dev->vccs; vcc; vcc = vcc->next) { + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + + if (vcc->dev != fore200e->atm_dev) + continue; fore200e_vcc = FORE200E_VCC(vcc); @@ -2904,7 +2925,7 @@ fore200e_vcc->rx_max_pdu ); } - spin_unlock_irqrestore(&fore200e->atm_dev->lock, flags); + read_unlock(&vcc_sklist_lock); return len; } diff -Nru a/drivers/atm/he.c b/drivers/atm/he.c --- a/drivers/atm/he.c Fri Jun 20 17:33:36 2003 +++ b/drivers/atm/he.c Fri Jun 20 17:33:36 2003 @@ -79,7 +79,6 @@ #include #define USE_TASKLET -#define USE_HE_FIND_VCC #undef USE_SCATTERGATHER #undef USE_CHECKSUM_HW /* still confused about this */ #define USE_RBPS @@ -328,25 +327,25 @@ he_writel_rcm(dev, val, 0x00000 | (cid << 3) | 7) static __inline__ struct atm_vcc* -he_find_vcc(struct he_dev *he_dev, unsigned cid) +__find_vcc(struct he_dev *he_dev, unsigned cid) { - unsigned long flags; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; short vpi; int vci; vpi = cid >> he_dev->vcibits; vci = cid & ((1 << he_dev->vcibits) - 1); - spin_lock_irqsave(&he_dev->atm_dev->lock, flags); - for (vcc = he_dev->atm_dev->vccs; vcc; vcc = vcc->next) - if (vcc->vci == vci && vcc->vpi == vpi - && 
vcc->qos.rxtp.traffic_class != ATM_NONE) { - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->dev == he_dev->atm_dev && + vcc->vci == vci && vcc->vpi == vpi && + vcc->qos.rxtp.traffic_class != ATM_NONE) { return vcc; - } - - spin_unlock_irqrestore(&he_dev->atm_dev->lock, flags); + } + } return NULL; } @@ -1566,17 +1565,6 @@ reg |= RX_ENABLE; he_writel(he_dev, reg, RC_CONFIG); -#ifndef USE_HE_FIND_VCC - he_dev->he_vcc_table = kmalloc(sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits)), GFP_KERNEL); - if (he_dev->he_vcc_table == NULL) { - hprintk("failed to alloc he_vcc_table\n"); - return -ENOMEM; - } - memset(he_dev->he_vcc_table, 0, sizeof(struct he_vcc_table) * - (1 << (he_dev->vcibits + he_dev->vpibits))); -#endif - for (i = 0; i < HE_NUM_CS_STPER; ++i) { he_dev->cs_stper[i].inuse = 0; he_dev->cs_stper[i].pcr = -1; @@ -1712,11 +1700,6 @@ he_dev->tpd_base, he_dev->tpd_base_phys); #endif -#ifndef USE_HE_FIND_VCC - if (he_dev->he_vcc_table) - kfree(he_dev->he_vcc_table); -#endif - if (he_dev->pci_dev) { pci_read_config_word(he_dev->pci_dev, PCI_COMMAND, &command); command &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); @@ -1798,6 +1781,7 @@ int pdus_assembled = 0; int updated = 0; + read_lock(&vcc_sklist_lock); while (he_dev->rbrq_head != rbrq_tail) { ++updated; @@ -1823,13 +1807,10 @@ buf_len = RBRQ_BUFLEN(he_dev->rbrq_head) * 4; cid = RBRQ_CID(he_dev->rbrq_head); -#ifdef USE_HE_FIND_VCC if (cid != lastcid) - vcc = he_find_vcc(he_dev, cid); + vcc = __find_vcc(he_dev, cid); lastcid = cid; -#else - vcc = HE_LOOKUP_VCC(he_dev, cid); -#endif + if (vcc == NULL) { hprintk("vcc == NULL (cid 0x%x)\n", cid); if (!RBRQ_HBUF_ERR(he_dev->rbrq_head)) @@ -1966,6 +1947,7 @@ RBRQ_MASK(++he_dev->rbrq_head)); } + read_unlock(&vcc_sklist_lock); if (updated) { if (updated > he_dev->rbrq_peak) @@ -2565,10 +2547,6 @@ #endif spin_unlock_irqrestore(&he_dev->global_lock, flags); - -#ifndef 
USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = vcc; -#endif } open_failed: @@ -2634,9 +2612,6 @@ if (timeout == 0) hprintk("close rx timeout cid 0x%x\n", cid); -#ifndef USE_HE_FIND_VCC - HE_LOOKUP_VCC(he_dev, cid) = NULL; -#endif HPRINTK("close rx cid 0x%x complete\n", cid); } diff -Nru a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c --- a/drivers/atm/idt77252.c Fri Jun 20 17:33:36 2003 +++ b/drivers/atm/idt77252.c Fri Jun 20 17:33:36 2003 @@ -2403,37 +2403,43 @@ static int idt77252_find_vcc(struct atm_vcc *vcc, short *vpi, int *vci) { - unsigned long flags; + struct sock *s; struct atm_vcc *walk; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi == ATM_VPI_ANY) { *vpi = 0; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vpi)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } if (*vci == ATM_VCI_ANY) { *vci = ATM_NOT_RSV_VCI; - walk = vcc->dev->vccs; - while (walk) { + s = sk_head(&vcc_sklist); + while (s) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if ((walk->vci == *vci) && (walk->vpi == *vpi)) { (*vci)++; - walk = vcc->dev->vccs; + s = sk_head(&vcc_sklist); continue; } - walk = walk->next; + s = sk_next(s); } } - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Fri Jun 20 17:33:36 2003 +++ b/include/linux/atmdev.h Fri Jun 20 17:33:36 2003 @@ -293,7 +293,6 @@ struct k_atm_aal_stats *stats; /* pointer to AAL stats group */ wait_queue_head_t sleep; /* if socket is busy */ struct sock *sk; /* socket backpointer */ - struct atm_vcc *prev,*next; /* SVC part --- may move later ------------------------------------- */ short itf; /* interface number */ struct sockaddr_atmsvc 
local; @@ -320,8 +319,6 @@ /* (NULL) */ const char *type; /* device type name */ int number; /* device index */ - struct atm_vcc *vccs; /* VCC table (or NULL) */ - struct atm_vcc *last; /* last VCC (or undefined) */ void *dev_data; /* per-device data */ void *phy_data; /* private PHY date */ unsigned long flags; /* device flags (ATM_DF_*) */ @@ -390,6 +387,9 @@ unsigned long atm_options; /* ATM layer options */ }; +extern struct hlist_head vcc_sklist; +extern rwlock_t vcc_sklist_lock; + #define ATM_SKB(skb) (((struct atm_skb_data *) (skb)->cb)) struct atm_dev *atm_dev_register(const char *type,const struct atmdev_ops *ops, @@ -397,7 +397,8 @@ struct atm_dev *atm_dev_lookup(int number); void atm_dev_deregister(struct atm_dev *dev); void shutdown_atm_dev(struct atm_dev *dev); -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev); +void vcc_insert_socket(struct sock *sk); +void vcc_remove_socket(struct sock *sk); /* @@ -436,7 +437,7 @@ } -static inline void atm_dev_release(struct atm_dev *dev) +static inline void atm_dev_put(struct atm_dev *dev) { atomic_dec(&dev->refcnt); diff -Nru a/net/atm/atm_misc.c b/net/atm/atm_misc.c --- a/net/atm/atm_misc.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/atm_misc.c Fri Jun 20 17:33:36 2003 @@ -47,15 +47,21 @@ static int check_ci(struct atm_vcc *vcc,short vpi,int vci) { + struct hlist_node *node; + struct sock *s; struct atm_vcc *walk; - for (walk = vcc->dev->vccs; walk; walk = walk->next) + sk_for_each(s, node, &vcc_sklist) { + walk = atm_sk(s); + if (walk->dev != vcc->dev) + continue; if (test_bit(ATM_VF_ADDR,&walk->flags) && walk->vpi == vpi && walk->vci == vci && ((walk->qos.txtp.traffic_class != ATM_NONE && vcc->qos.txtp.traffic_class != ATM_NONE) || (walk->qos.rxtp.traffic_class != ATM_NONE && vcc->qos.rxtp.traffic_class != ATM_NONE))) return -EADDRINUSE; + } /* allow VCCs with same VPI/VCI iff they don't collide on TX/RX (but we may refuse such sharing for other reasons, e.g. 
if protocol requires to have both channels) */ @@ -65,17 +71,16 @@ int atm_find_ci(struct atm_vcc *vcc,short *vpi,int *vci) { - unsigned long flags; static short p = 0; /* poor man's per-device cache */ static int c = 0; short old_p; int old_c; int err; - spin_lock_irqsave(&vcc->dev->lock, flags); + read_lock(&vcc_sklist_lock); if (*vpi != ATM_VPI_ANY && *vci != ATM_VCI_ANY) { err = check_ci(vcc,*vpi,*vci); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return err; } /* last scan may have left values out of bounds for current device */ @@ -90,7 +95,7 @@ if (!check_ci(vcc,p,c)) { *vpi = p; *vci = c; - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return 0; } if (*vci == ATM_VCI_ANY) { @@ -105,7 +110,7 @@ } } while (old_p != p || old_c != c); - spin_unlock_irqrestore(&vcc->dev->lock, flags); + read_unlock(&vcc_sklist_lock); return -EADDRINUSE; } diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/clip.c Fri Jun 20 17:33:36 2003 @@ -737,7 +737,8 @@ set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); /* allow replies and avoid getting closed if signaling dies */ - bind_vcc(vcc,&atmarpd_dev); + vcc->dev = &atmarpd_dev; + vcc_insert_socket(vcc->sk); vcc->push = NULL; vcc->pop = NULL; /* crash */ vcc->push_oam = NULL; /* crash */ diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/common.c Fri Jun 20 17:33:36 2003 @@ -157,6 +157,29 @@ #endif +HLIST_HEAD(vcc_sklist); +rwlock_t vcc_sklist_lock = RW_LOCK_UNLOCKED; + +void __vcc_insert_socket(struct sock *sk) +{ + sk_add_node(sk, &vcc_sklist); +} + +void vcc_insert_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + __vcc_insert_socket(sk); + write_unlock_irq(&vcc_sklist_lock); +} + +void vcc_remove_socket(struct sock *sk) +{ + write_lock_irq(&vcc_sklist_lock); + sk_del_node_init(sk); + 
write_unlock_irq(&vcc_sklist_lock); +} + + static struct sk_buff *alloc_tx(struct atm_vcc *vcc,unsigned int size) { struct sk_buff *skb; @@ -175,16 +198,45 @@ } -int atm_create(struct socket *sock,int protocol,int family) +EXPORT_SYMBOL(vcc_sklist); +EXPORT_SYMBOL(vcc_sklist_lock); +EXPORT_SYMBOL(vcc_insert_socket); +EXPORT_SYMBOL(vcc_remove_socket); + +static void vcc_sock_destruct(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + if (atomic_read(&vcc->sk->sk_rmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: rmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_rmem_alloc)); + + if (atomic_read(&vcc->sk->sk_wmem_alloc)) + printk(KERN_DEBUG "vcc_sock_destruct: wmem leakage (%d bytes) detected.\n", atomic_read(&sk->sk_wmem_alloc)); + + kfree(sk->sk_protinfo); +} + +int vcc_create(struct socket *sock, int protocol, int family) { struct sock *sk; struct atm_vcc *vcc; sock->sk = NULL; - if (sock->type == SOCK_STREAM) return -EINVAL; - if (!(sk = alloc_atm_vcc_sk(family))) return -ENOMEM; - vcc = atm_sk(sk); - memset(&vcc->flags,0,sizeof(vcc->flags)); + if (sock->type == SOCK_STREAM) + return -EINVAL; + sk = sk_alloc(family, GFP_KERNEL, 1, NULL); + if (!sk) + return -ENOMEM; + sock_init_data(NULL, sk); + + vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); + if (!vcc) { + sk_free(sk); + return -ENOMEM; + } + + memset(vcc, 0, sizeof(*vcc)); + vcc->sk = sk; vcc->dev = NULL; vcc->callback = NULL; memset(&vcc->local,0,sizeof(struct sockaddr_atmsvc)); @@ -199,42 +251,48 @@ vcc->atm_options = vcc->aal_options = 0; init_waitqueue_head(&vcc->sleep); sk->sk_sleep = &vcc->sleep; + sk->sk_destruct = vcc_sock_destruct; sock->sk = sk; return 0; } -void atm_release_vcc_sk(struct sock *sk,int free_sk) +static void vcc_destroy_socket(struct sock *sk) { struct atm_vcc *vcc = atm_sk(sk); struct sk_buff *skb; - clear_bit(ATM_VF_READY,&vcc->flags); + clear_bit(ATM_VF_READY, &vcc->flags); if (vcc->dev) { - if (vcc->dev->ops->close) vcc->dev->ops->close(vcc); - if 
(vcc->push) vcc->push(vcc,NULL); /* atmarpd has no push */ + if (vcc->dev->ops->close) + vcc->dev->ops->close(vcc); + if (vcc->push) + vcc->push(vcc, NULL); /* atmarpd has no push */ + + vcc_remove_socket(sk); /* no more receive */ + while ((skb = skb_dequeue(&vcc->sk->sk_receive_queue))) { atm_return(vcc,skb->truesize); kfree_skb(skb); } module_put(vcc->dev->ops->owner); - atm_dev_release(vcc->dev); - if (atomic_read(&vcc->sk->sk_rmem_alloc)) - printk(KERN_WARNING "atm_release_vcc: strange ... " - "rmem_alloc == %d after closing\n", - atomic_read(&vcc->sk->sk_rmem_alloc)); - bind_vcc(vcc,NULL); + atm_dev_put(vcc->dev); } - - if (free_sk) free_atm_vcc_sk(sk); } -int atm_release(struct socket *sock) +int vcc_release(struct socket *sock) { - if (sock->sk) - atm_release_vcc_sk(sock->sk,1); + struct sock *sk = sock->sk; + + if (sk) { + lock_sock(sk); + vcc_destroy_socket(sock->sk); + release_sock(sk); + sock_put(sk); + } + return 0; } @@ -289,7 +347,8 @@ if (vci > 0 && vci < ATM_NOT_RSV_VCI && !capable(CAP_NET_BIND_SERVICE)) return -EPERM; error = 0; - bind_vcc(vcc,dev); + vcc->dev = dev; + vcc_insert_socket(vcc->sk); switch (vcc->qos.aal) { case ATM_AAL0: error = atm_init_aal0(vcc); @@ -313,7 +372,7 @@ if (!error) error = adjust_tp(&vcc->qos.txtp,vcc->qos.aal); if (!error) error = adjust_tp(&vcc->qos.rxtp,vcc->qos.aal); if (error) { - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } DPRINTK("VCC %d.%d, AAL %d\n",vpi,vci,vcc->qos.aal); @@ -327,7 +386,7 @@ error = dev->ops->open(vcc,vpi,vci); if (error) { module_put(dev->ops->owner); - bind_vcc(vcc,NULL); + vcc_remove_socket(vcc->sk); return error; } } @@ -371,7 +430,7 @@ dev = atm_dev_lookup(itf); error = __vcc_connect(vcc, dev, vpi, vci); if (error) { - atm_dev_release(dev); + atm_dev_put(dev); return error; } } else { @@ -385,7 +444,7 @@ spin_unlock(&atm_dev_lock); if (!__vcc_connect(vcc, dev, vpi, vci)) break; - atm_dev_release(dev); + atm_dev_put(dev); dev = NULL; spin_lock(&atm_dev_lock); } diff 
-Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Fri Jun 20 17:33:36 2003 +++ b/net/atm/common.h Fri Jun 20 17:33:36 2003 @@ -10,8 +10,8 @@ #include /* for poll_table */ -int atm_create(struct socket *sock,int protocol,int family); -int atm_release(struct socket *sock); +int vcc_create(struct socket *sock, int protocol, int family); +int vcc_release(struct socket *sock); int vcc_connect(struct socket *sock, int itf, short vpi, int vci); int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags); @@ -24,7 +24,6 @@ int vcc_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen); -void atm_release_vcc_sk(struct sock *sk,int free_sk); void atm_shutdown_dev(struct atm_dev *dev); int atmpvc_init(void); diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/lec.c Fri Jun 20 17:33:36 2003 @@ -48,7 +48,7 @@ #include "lec.h" #include "lec_arpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" #if 0 #define DPRINTK printk @@ -810,7 +810,8 @@ lec_arp_init(priv); priv->itfnum = i; /* LANE2 addition */ priv->lecd = vcc; - bind_vcc(vcc, &lecatm_dev); + vcc->dev = &lecatm_dev; + vcc_insert_socket(vcc->sk); vcc->proto_data = dev_lec[i]; set_bit(ATM_VF_META,&vcc->flags); diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/mpc.c Fri Jun 20 17:33:36 2003 @@ -28,7 +28,7 @@ #include "lec.h" #include "mpc.h" -#include "resources.h" /* for bind_vcc() */ +#include "resources.h" /* * mpc.c: Implementation of MPOA client kernel part @@ -789,7 +789,8 @@ } mpc->mpoad_vcc = vcc; - bind_vcc(vcc, &mpc_dev); + vcc->dev = &mpc_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); diff -Nru a/net/atm/proc.c b/net/atm/proc.c --- a/net/atm/proc.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/proc.c Fri Jun 20 17:33:36 2003 @@ -334,9 +334,8 @@ static int 
atm_pvc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left, clip_info = 0; @@ -349,25 +348,20 @@ if (try_atm_clip_ops()) clip_info = 1; #endif - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMPVC && - vcc->dev && !left--) { - pvc_info(vcc,buf,clip_info); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMPVC && vcc->dev && !left--) { + pvc_info(vcc,buf,clip_info); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) - if (clip_info) - module_put(atm_clip_ops->owner); + if (clip_info) + module_put(atm_clip_ops->owner); #endif - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); #if defined(CONFIG_ATM_CLIP) || defined(CONFIG_ATM_CLIP_MODULE) if (clip_info) module_put(atm_clip_ops->owner); @@ -378,10 +372,9 @@ static int atm_vc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; struct atm_vcc *vcc; + struct hlist_node *node; + struct sock *s; int left; if (!pos) @@ -389,20 +382,16 @@ "Address"," Itf VPI VCI Fam Flags Reply Send buffer" " Recv buffer\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (!left--) { - vc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, 
flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (!left--) { + vc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } @@ -410,29 +399,24 @@ static int atm_svc_info(loff_t pos,char *buf) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; struct atm_vcc *vcc; int left; if (!pos) return sprintf(buf,"Itf VPI VCI State Remote\n"); left = pos-1; - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - for (vcc = dev->vccs; vcc; vcc = vcc->next) - if (vcc->sk->sk_family == PF_ATMSVC && !left--) { - svc_info(vcc,buf); - spin_unlock_irqrestore(&dev->lock, flags); - spin_unlock(&atm_dev_lock); - return strlen(buf); - } - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + vcc = atm_sk(s); + if (vcc->sk->sk_family == PF_ATMSVC && !left--) { + svc_info(vcc,buf); + read_unlock(&vcc_sklist_lock); + return strlen(buf); + } } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); return 0; } diff -Nru a/net/atm/pvc.c b/net/atm/pvc.c --- a/net/atm/pvc.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/pvc.c Fri Jun 20 17:33:36 2003 @@ -17,10 +17,6 @@ #include "resources.h" /* devs and vccs */ #include "common.h" /* common for PVCs and SVCs */ -#ifndef NULL -#define NULL 0 -#endif - static int pvc_shutdown(struct socket *sock,int how) { @@ -109,7 +105,7 @@ static struct proto_ops pvc_proto_ops = { .family = PF_ATMPVC, - .release = atm_release, + .release = vcc_release, .bind = pvc_bind, .connect = pvc_connect, .socketpair = sock_no_socketpair, @@ -131,7 +127,7 @@ static int pvc_create(struct socket *sock,int protocol) { sock->ops = &pvc_proto_ops; - return atm_create(sock,protocol,PF_ATMPVC); + return vcc_create(sock, 
protocol, PF_ATMPVC); } diff -Nru a/net/atm/resources.c b/net/atm/resources.c --- a/net/atm/resources.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/resources.c Fri Jun 20 17:33:36 2003 @@ -23,11 +23,6 @@ #include "addr.h" -#ifndef NULL -#define NULL 0 -#endif - - LIST_HEAD(atm_devs); spinlock_t atm_dev_lock = SPIN_LOCK_UNLOCKED; @@ -91,7 +86,7 @@ spin_lock(&atm_dev_lock); if (number != -1) { if ((inuse = __atm_dev_lookup(number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); spin_unlock(&atm_dev_lock); __free_atm_dev(dev); return NULL; @@ -100,7 +95,7 @@ } else { dev->number = 0; while ((inuse = __atm_dev_lookup(dev->number))) { - atm_dev_release(inuse); + atm_dev_put(inuse); dev->number++; } } @@ -402,78 +397,12 @@ else error = 0; done: - atm_dev_release(dev); + atm_dev_put(dev); return error; } -struct sock *alloc_atm_vcc_sk(int family) -{ - struct sock *sk; - struct atm_vcc *vcc; - - sk = sk_alloc(family, GFP_KERNEL, 1, NULL); - if (!sk) - return NULL; - vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); - if (!vcc) { - sk_free(sk); - return NULL; - } - sock_init_data(NULL, sk); - memset(vcc, 0, sizeof(*vcc)); - vcc->sk = sk; - - return sk; -} - - -static void unlink_vcc(struct atm_vcc *vcc) -{ - unsigned long flags; - if (vcc->dev) { - spin_lock_irqsave(&vcc->dev->lock, flags); - if (vcc->prev) - vcc->prev->next = vcc->next; - else - vcc->dev->vccs = vcc->next; - - if (vcc->next) - vcc->next->prev = vcc->prev; - else - vcc->dev->last = vcc->prev; - spin_unlock_irqrestore(&vcc->dev->lock, flags); - } -} - - -void free_atm_vcc_sk(struct sock *sk) -{ - unlink_vcc(atm_sk(sk)); - sk_free(sk); -} - -void bind_vcc(struct atm_vcc *vcc,struct atm_dev *dev) -{ - unsigned long flags; - - unlink_vcc(vcc); - vcc->dev = dev; - if (dev) { - spin_lock_irqsave(&dev->lock, flags); - vcc->next = NULL; - vcc->prev = dev->last; - if (dev->vccs) - dev->last->next = vcc; - else - dev->vccs = vcc; - dev->last = vcc; - spin_unlock_irqrestore(&dev->lock, flags); - } -} - 
EXPORT_SYMBOL(atm_dev_register); EXPORT_SYMBOL(atm_dev_deregister); EXPORT_SYMBOL(atm_dev_lookup); EXPORT_SYMBOL(shutdown_atm_dev); -EXPORT_SYMBOL(bind_vcc); diff -Nru a/net/atm/resources.h b/net/atm/resources.h --- a/net/atm/resources.h Fri Jun 20 17:33:36 2003 +++ b/net/atm/resources.h Fri Jun 20 17:33:36 2003 @@ -14,8 +14,6 @@ extern spinlock_t atm_dev_lock; -struct sock *alloc_atm_vcc_sk(int family); -void free_atm_vcc_sk(struct sock *sk); int atm_dev_ioctl(unsigned int cmd, unsigned long arg); diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/signaling.c Fri Jun 20 17:33:36 2003 @@ -200,26 +200,22 @@ } -static void purge_vccs(struct atm_vcc *vcc) +static void purge_vcc(struct atm_vcc *vcc) { - while (vcc) { - if (vcc->sk->sk_family == PF_ATMSVC && - !test_bit(ATM_VF_META,&vcc->flags)) { - set_bit(ATM_VF_RELEASED,&vcc->flags); - vcc->reply = -EUNATCH; - vcc->sk->sk_err = EUNATCH; - wake_up(&vcc->sleep); - } - vcc = vcc->next; + if (vcc->sk->sk_family == PF_ATMSVC && + !test_bit(ATM_VF_META,&vcc->flags)) { + set_bit(ATM_VF_RELEASED,&vcc->flags); + vcc->reply = -EUNATCH; + vcc->sk->sk_err = EUNATCH; + wake_up(&vcc->sleep); } } static void sigd_close(struct atm_vcc *vcc) { - unsigned long flags; - struct atm_dev *dev; - struct list_head *p; + struct hlist_node *node; + struct sock *s; DPRINTK("sigd_close\n"); sigd = NULL; @@ -227,14 +223,14 @@ printk(KERN_ERR "sigd_close: closing with requests pending\n"); skb_queue_purge(&vcc->sk->sk_receive_queue); - spin_lock(&atm_dev_lock); - list_for_each(p, &atm_devs) { - dev = list_entry(p, struct atm_dev, dev_list); - spin_lock_irqsave(&dev->lock, flags); - purge_vccs(dev->vccs); - spin_unlock_irqrestore(&dev->lock, flags); + read_lock(&vcc_sklist_lock); + sk_for_each(s, node, &vcc_sklist) { + struct atm_vcc *vcc = atm_sk(s); + + if (vcc->dev) + purge_vcc(vcc); } - spin_unlock(&atm_dev_lock); + read_unlock(&vcc_sklist_lock); } @@ -257,7 +253,8 @@ if 
(sigd) return -EADDRINUSE; DPRINTK("sigd_attach\n"); sigd = vcc; - bind_vcc(vcc,&sigd_dev); + vcc->dev = &sigd_dev; + vcc_insert_socket(vcc->sk); set_bit(ATM_VF_META,&vcc->flags); set_bit(ATM_VF_READY,&vcc->flags); wake_up(&sigd_sleep); diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Fri Jun 20 17:33:36 2003 +++ b/net/atm/svc.c Fri Jun 20 17:33:36 2003 @@ -88,18 +88,21 @@ static int svc_release(struct socket *sock) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; - if (!sock->sk) return 0; - vcc = ATM_SD(sock); - DPRINTK("svc_release %p\n",vcc); - clear_bit(ATM_VF_READY,&vcc->flags); - atm_release_vcc_sk(sock->sk,0); - svc_disconnect(vcc); - /* VCC pointer is used as a reference, so we must not free it - (thereby subjecting it to re-use) before all pending connections - are closed */ - free_atm_vcc_sk(sock->sk); + if (sk) { + vcc = ATM_SD(sock); + DPRINTK("svc_release %p\n", vcc); + clear_bit(ATM_VF_READY, &vcc->flags); + /* VCC pointer is used as a reference, so we must not free it + (thereby subjecting it to re-use) before all pending connections + are closed */ + sock_hold(sk); + vcc_release(sock); + svc_disconnect(vcc); + sock_put(sk); + } return 0; } @@ -542,7 +545,7 @@ int error; sock->ops = &svc_proto_ops; - error = atm_create(sock,protocol,AF_ATMSVC); + error = vcc_create(sock, protocol, AF_ATMSVC); if (error) return error; ATM_SD(sock)->callback = svc_callback; ATM_SD(sock)->local.sas_family = AF_ATMSVC; From shemminger@osdl.org Fri Jun 20 17:33:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 17:33:33 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5L0XT2x012934 for ; Fri, 20 Jun 2003 17:33:29 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5L0XOX27834 for ; Fri, 20 Jun 2003 17:33:24 -0700 Date: Fri, 20 Jun 2003 17:33:24 -0700 From: Stephen Hemminger To: 
netdev@oss.sgi.com
Subject: [PATCH 2.5.72] (2/3) PPPoE fix oops on /proc/net/pppoe
Message-Id: <20030620173324.47071b18.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 3449
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

Reading /proc/net/pppoe will oops because it calls hash_item() when the
object pointer (po) is NULL.

diff -Nru a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c	Fri Jun 20 17:16:43 2003
+++ b/drivers/net/pppoe.c	Fri Jun 20 17:16:43 2003
@@ -1015,8 +1015,9 @@
 		goto out;
 	}
 	po = v;
-	po = po->next;
-	if (!po) {
+	if (po->next)
+		po = po->next;
+	else {
 		int hash = hash_item(po->pppoe_pa.sid, po->pppoe_pa.remote);
 
 		while (++hash < PPPOE_HASH_SIZE) {

From shemminger@osdl.org Fri Jun 20 17:32:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 17:32:42 -0700 (PDT)
Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5L0Wc2x012633
	for ; Fri, 20 Jun 2003 17:32:39 -0700
Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60])
	by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5L0W8X27137;
	Fri, 20 Jun 2003 17:32:08 -0700
Date: Fri, 20 Jun 2003 17:32:08 -0700
From: Stephen Hemminger
To: "David S.
Miller" , Andi Kleen
Cc: mostrows@speakeasy.net, paulus@au.ibm.com, netdev@oss.sgi.com
Subject: [PATCH 2.5.72] (3/3) Convert PPPoE to new style protocol (redeux)
Message-Id: <20030620173208.56a8a00c.shemminger@osdl.org>
Organization: Open Source Development Lab
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
 /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 3448
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

This is a redo of Andi's patch to support the new style protocol for PPPoE,
but without linearizing.  Basically, it pushes the pullup logic down into
ppp_generic where needed, and adds a length check in the receive path.

For the normal case of PPPoE, which is uncompressed, it should pass the
non-linear socket buffer through to IP.  But for the compressed cases,
the skbuff is effectively linearized.

Tested on 8-way SMP server with one client.
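
For anyone not familiar with the pull-up idiom mentioned above, here is a
rough userspace sketch of what a "may pull" check does before a header is
read.  This is only a toy model, not kernel code: struct toy_buf,
toy_may_pull() and toy_ppp_proto() are made-up names for illustration, and
the real pskb_may_pull() works on struct sk_buff with paged fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a non-linear buffer: a linear head plus one fragment.
 * Hypothetical type, loosely modelled on the sk_buff head/frags split. */
struct toy_buf {
	unsigned char head[64];		/* linear area, directly addressable */
	size_t head_len;		/* valid bytes in head[] */
	const unsigned char *frag;	/* remaining data, not yet in head[] */
	size_t frag_len;		/* bytes left in the fragment */
};

/* Total length of the buffer (head plus fragment). */
static size_t toy_len(const struct toy_buf *b)
{
	return b->head_len + b->frag_len;
}

/* Make sure the first len bytes sit in the linear area.
 * Returns 1 on success, 0 if the buffer is too short or head[] too small. */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
	size_t need;

	if (len <= b->head_len)
		return 1;		/* already linear, nothing to do */
	if (len > toy_len(b) || len > sizeof(b->head))
		return 0;		/* not enough data: caller must drop */

	need = len - b->head_len;
	memcpy(b->head + b->head_len, b->frag, need);
	b->head_len += need;
	b->frag += need;
	b->frag_len -= need;
	return 1;
}

/* Read the 16-bit PPP protocol field, or return -1 for a short frame. */
static int toy_ppp_proto(struct toy_buf *b)
{
	if (!toy_may_pull(b, 2))
		return -1;
	assert(b->head_len >= 2);
	return (b->head[0] << 8) | b->head[1];
}
```

The point of the pattern is that the length check and the linearization of
just the header bytes happen in one step, so the payload can stay in its
fragment unless a later stage (for example a decompressor) asks for more.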
diff -urNp -X dontdiff linux-2.5-pppoe/drivers/net/ppp_generic.c pppoe-2.5/drivers/net/ppp_generic.c --- linux-2.5-pppoe/drivers/net/ppp_generic.c 2003-06-20 16:53:43.000000000 -0700 +++ pppoe-2.5/drivers/net/ppp_generic.c 2003-06-20 16:54:51.000000000 -0700 @@ -1348,11 +1348,18 @@ ppp_input(struct ppp_channel *chan, stru struct channel *pch = chan->ppp; int proto; - if (pch == 0 || skb->len == 0) { - kfree_skb(skb); - return; - } + if (pch == 0) + goto drop; + /* need to have PPP header */ + if (!pskb_may_pull(skb, 2)) { + if (pch->ppp) { + ++pch->ppp->stats.rx_length_errors; + ppp_receive_error(pch->ppp); + } + goto drop; + } + proto = PPP_PROTO(skb); read_lock_bh(&pch->upl); if (pch->ppp == 0 || proto >= 0xc000 || proto == PPP_CCPFRAG) { @@ -1367,6 +1374,10 @@ ppp_input(struct ppp_channel *chan, stru ppp_do_recv(pch->ppp, skb, pch); } read_unlock_bh(&pch->upl); + return; + drop: + kfree_skb(skb); + return; } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1398,23 +1409,13 @@ ppp_input_error(struct ppp_channel *chan static void ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch) { - if (skb->len >= 2) { #ifdef CONFIG_PPP_MULTILINK - /* XXX do channel-level decompression here */ - if (PPP_PROTO(skb) == PPP_MP) - ppp_receive_mp_frame(ppp, skb, pch); - else + /* XXX do channel-level decompression here */ + if (PPP_PROTO(skb) == PPP_MP) + ppp_receive_mp_frame(ppp, skb, pch); + else #endif /* CONFIG_PPP_MULTILINK */ - ppp_receive_nonmp_frame(ppp, skb); - return; - } - - if (skb->len > 0) - /* note: a 0-length skb is used as an error indication */ - ++ppp->stats.rx_length_errors; - - kfree_skb(skb); - ppp_receive_error(ppp); + ppp_receive_nonmp_frame(ppp, skb); } static void @@ -1446,7 +1447,8 @@ ppp_receive_nonmp_frame(struct ppp *ppp, /* decompress VJ compressed packets */ if (ppp->vj == 0 || (ppp->flags & SC_REJ_COMP_TCP)) goto err; - if (skb_tailroom(skb) < 124) { + + if (skb_tailroom(skb) < 124 || 
skb_is_nonlinear(skb) ) { /* copy to a new sk_buff with more tailroom */ ns = dev_alloc_skb(skb->len + 128); if (ns == 0) { @@ -1474,6 +1476,13 @@ ppp_receive_nonmp_frame(struct ppp *ppp, case PPP_VJC_UNCOMP: if (ppp->vj == 0 || (ppp->flags & SC_REJ_COMP_TCP)) goto err; + + /* Until we fix the decompressor need to make sure + * data portion is linear. + */ + if (!pskb_may_pull(skb, skb->len)) + goto err; + if (slhc_remember(ppp->vj, skb->data + 2, skb->len - 2) <= 0) { printk(KERN_ERR "PPP: VJ uncompressed error\n"); goto err; @@ -1551,6 +1560,12 @@ ppp_decompress_frame(struct ppp *ppp, st struct sk_buff *ns; int len; + /* Until we fix all the decompressor's need to make sure + * data portion is linear. + */ + if (!pskb_may_pull(skb, skb->len)) + goto err; + if (proto == PPP_COMP) { ns = dev_alloc_skb(ppp->mru + PPP_HDRLEN); if (ns == 0) { @@ -1603,7 +1618,7 @@ ppp_receive_mp_frame(struct ppp *ppp, st struct list_head *l; int mphdrlen = (ppp->flags & SC_MP_SHORTSEQ)? MPHDRLEN_SSN: MPHDRLEN; - if (skb->len < mphdrlen + 1 || ppp->mrru == 0) + if (!pskb_may_pull(skb, mphdrlen + 1) || ppp->mrru == 0) goto err; /* no good, throw it away */ /* Decode sequence number and begin/end bits */ @@ -2021,7 +2036,7 @@ ppp_ccp_peek(struct ppp *ppp, struct sk_ unsigned char *dp = skb->data + 2; int len; - if (skb->len < CCP_HDRLEN + 2 + if (!pskb_may_pull(skb, CCP_HDRLEN + 2) || skb->len < (len = CCP_LENGTH(dp)) + 2) return; /* too short */ @@ -2056,6 +2071,10 @@ ppp_ccp_peek(struct ppp *ppp, struct sk_ case CCP_CONFACK: if ((ppp->flags & (SC_CCP_OPEN | SC_CCP_UP)) != SC_CCP_OPEN) break; + + if (!pskb_may_pull(skb, len)) + break; + dp += CCP_HDRLEN; len -= CCP_HDRLEN; if (len < CCP_OPT_MINLEN || len < CCP_OPT_LENGTH(dp)) diff -urNp -X dontdiff linux-2.5-pppoe/drivers/net/pppoe.c pppoe-2.5/drivers/net/pppoe.c --- linux-2.5-pppoe/drivers/net/pppoe.c 2003-06-20 17:14:14.000000000 -0700 +++ pppoe-2.5/drivers/net/pppoe.c 2003-06-20 17:16:10.000000000 -0700 @@ -333,7 +333,11 @@ static 
int pppoe_rcv_core(struct sock *s struct pppox_opt *relay_po = NULL; if (sk->sk_state & PPPOX_BOUND) { + struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; + int len = ntohs(ph->length); skb_pull(skb, sizeof(struct pppoe_hdr)); + skb_trim(skb, len); + ppp_input(&po->chan, skb); } else if (sk->sk_state & PPPOX_RELAY) { relay_po = get_item_by_addr(&po->pppoe_relay); @@ -371,17 +375,22 @@ static int pppoe_rcv(struct sk_buff *skb struct packet_type *pt) { - struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; + struct pppoe_hdr *ph; struct pppox_opt *po; - struct sock *sk ; + struct sock *sk; int ret; - po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source); + if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr))) + goto drop; - if (!po) { - kfree_skb(skb); - return NET_RX_DROP; - } + if (!(skb = skb_share_check(skb, GFP_ATOMIC))) + goto out; + + ph = (struct pppoe_hdr *) skb->nh.raw; + + po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source); + if (!po) + goto drop; sk = po->sk; bh_lock_sock(sk); @@ -398,6 +407,10 @@ static int pppoe_rcv(struct sk_buff *skb sock_put(sk); return ret; +drop: + kfree_skb(skb); +out: + return NET_RX_DROP; } /************************************************************************ @@ -411,9 +424,16 @@ static int pppoe_disc_rcv(struct sk_buff struct packet_type *pt) { - struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; + struct pppoe_hdr *ph; struct pppox_opt *po; + if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr))) + goto abort; + + if (!(skb = skb_share_check(skb, GFP_ATOMIC))) + goto out; + + ph = (struct pppoe_hdr *) skb->nh.raw; if (ph->code != PADT_CODE) goto abort; @@ -441,17 +461,20 @@ static int pppoe_disc_rcv(struct sk_buff abort: kfree_skb(skb); +out: return NET_RX_SUCCESS; /* Lies... 
:-) */ } static struct packet_type pppoes_ptype = { .type = __constant_htons(ETH_P_PPP_SES), .func = pppoe_rcv, + .data = (void *)1, }; static struct packet_type pppoed_ptype = { .type = __constant_htons(ETH_P_PPP_DISC), .func = pppoe_disc_rcv, + .data = (void *)1, }; /*********************************************************************** From shemminger@osdl.org Fri Jun 20 17:31:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 20 Jun 2003 17:32:03 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5L0Vs2x012594 for ; Fri, 20 Jun 2003 17:31:55 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5L0VeX26912; Fri, 20 Jun 2003 17:31:40 -0700 Date: Fri, 20 Jun 2003 17:31:39 -0700 From: Stephen Hemminger To: "David S. Miller" , Michal Ostrowski Cc: netdev@oss.sgi.com, mostrows@speakeasy.net Subject: [PATCH 2.5.72] (1/3) PPPoE cleanup [TRIVIAL] Message-Id: <20030620173139.01f935c7.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3447 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This is a cleanup patch, no change in functionality. - Get rid of debug macro's that aren't used anywhere in the code. 
- Make functions and data structures static where possible - C99 initializer for ppoe_chan_ops - fix whitespace typo diff -Nru a/drivers/net/pppoe.c b/drivers/net/pppoe.c --- a/drivers/net/pppoe.c Fri Jun 20 16:38:03 2003 +++ b/drivers/net/pppoe.c Fri Jun 20 16:38:03 2003 @@ -77,27 +77,14 @@ #include -static int __attribute__((unused)) pppoe_debug = 7; #define PPPOE_HASH_BITS 4 #define PPPOE_HASH_SIZE (1<sk , skb)) + if (!__pppoe_xmit( relay_po->sk, skb)) goto abort_put; } else { sock_queue_rcv_skb(sk, skb); @@ -460,12 +444,12 @@ return NET_RX_SUCCESS; /* Lies... :-) */ } -struct packet_type pppoes_ptype = { +static struct packet_type pppoes_ptype = { .type = __constant_htons(ETH_P_PPP_SES), .func = pppoe_rcv, }; -struct packet_type pppoed_ptype = { +static struct packet_type pppoed_ptype = { .type = __constant_htons(ETH_P_PPP_DISC), .func = pppoe_disc_rcv, }; @@ -522,7 +506,7 @@ goto out; } -int pppoe_release(struct socket *sock) +static int pppoe_release(struct socket *sock) { struct sock *sk = sock->sk; struct pppox_opt *po; @@ -559,7 +543,7 @@ } -int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, +static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, int sockaddr_len, int flags) { struct sock *sk = sock->sk; @@ -648,7 +632,7 @@ } -int pppoe_getname(struct socket *sock, struct sockaddr *uaddr, +static int pppoe_getname(struct socket *sock, struct sockaddr *uaddr, int *usockaddr_len, int peer) { int len = sizeof(struct sockaddr_pppox); @@ -667,7 +651,7 @@ } -int pppoe_ioctl(struct socket *sock, unsigned int cmd, +static int pppoe_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock->sk; @@ -769,7 +753,7 @@ } -int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, +static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len) { struct sk_buff *skb = NULL; @@ -847,7 +831,7 @@ * xmit function for internal use. 
* ***********************************************************************/ -int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) +static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) { struct pppox_opt *po = pppox_sk(sk); struct net_device *dev = po->pppoe_dev; @@ -921,16 +905,18 @@ * sends PPP frame over PPPoE socket * ***********************************************************************/ -int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb) +static int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb) { struct sock *sk = (struct sock *) chan->private; return __pppoe_xmit(sk, skb); } -struct ppp_channel_ops pppoe_chan_ops = { pppoe_xmit , NULL }; +static struct ppp_channel_ops pppoe_chan_ops = { + .start_xmit = pppoe_xmit, +}; -int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock, +static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len, int flags) { struct sock *sk = sock->sk; @@ -1071,7 +1057,7 @@ /* ->ioctl are set at pppox_create */ -struct proto_ops pppoe_ops = { +static struct proto_ops pppoe_ops = { .family = AF_PPPOX, .owner = THIS_MODULE, .release = pppoe_release, @@ -1090,14 +1076,14 @@ .mmap = sock_no_mmap }; -struct pppox_proto pppoe_proto = { +static struct pppox_proto pppoe_proto = { .create = pppoe_create, .ioctl = pppoe_ioctl, .owner = THIS_MODULE, }; -int __init pppoe_init(void) +static int __init pppoe_init(void) { int err = register_pppox_proto(PX_PROTO_OE, &pppoe_proto); @@ -1125,7 +1111,7 @@ goto out; } -void __exit pppoe_exit(void) +static void __exit pppoe_exit(void) { unregister_pppox_proto(PX_PROTO_OE); dev_remove_pack(&pppoes_ptype); From alan@lxorguk.ukuu.org.uk Sat Jun 21 05:39:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 21 Jun 2003 05:39:24 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5LCdB2x026140 for ; Sat, 21 Jun 2003 05:39:12 -0700 
Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1])
	by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5LCb2RQ026223;
	Sat, 21 Jun 2003 13:37:03 +0100
Received: (from alan@localhost)
	by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5LCasoh026221;
	Sat, 21 Jun 2003 13:36:54 +0100
X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f
Subject: Re: patch for common networking error messages
From: Alan Cox
To: "David S. Miller"
Cc: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com,
	jgarzik@pobox.com, kenistonj@us.ibm.com, lkessler@us.ibm.com,
	Linux Kernel Mailing List , netdev@oss.sgi.com, niv@us.ibm.com
In-Reply-To: <20030616.155533.63022973.davem@redhat.com>
References: <20030616.155533.63022973.davem@redhat.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5)
Date: 21 Jun 2003 13:36:54 +0100
X-archive-position: 3450
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: alan@lxorguk.ukuu.org.uk
Precedence: bulk
X-list: netdev

On Mon, 2003-06-16 at 23:55, David S. Miller wrote:
> Let me know when you're back on planet earth ok?
>
> Standardizing strings is an absolutely FRUITLESS exercise.

Standardising strings is a real help for end users, but it's not the way
to approach logging issues, I agree.
From hadi@shell.cyberus.ca Sat Jun 21 07:27:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 21 Jun 2003 07:28:05 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5LERp2x027028
	for ; Sat, 21 Jun 2003 07:27:52 -0700
Received: from hadi (helo=localhost)
	by shell.cyberus.ca with local-esmtp (Exim 4.14)
	id 19TjKm-000I1o-Ix; Sat, 21 Jun 2003 10:27:16 -0400
Date: Sat, 21 Jun 2003 10:27:16 -0400 (EDT)
From: Jamal Hadi
To: Alan Cox
cc: "David S. Miller" , girouard@us.ibm.com, stekloff@us.ibm.com,
	janiceg@us.ibm.com, jgarzik@pobox.com, kenistonj@us.ibm.com,
	lkessler@us.ibm.com, Linux Kernel Mailing List , netdev@oss.sgi.com,
	niv@us.ibm.com
Subject: Re: patch for common networking error messages
In-Reply-To: <1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk>
Message-ID: <20030621100959.C69143@shell.cyberus.ca>
References: <20030616.155533.63022973.davem@redhat.com>
	<1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3451
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Sat, 21 Jun 2003, Alan Cox wrote:

> On Mon, 2003-06-16 at 23:55, David S. Miller wrote:
> > Let me know when you're back on planet earth ok?
> >
> > Standardizing strings is an absolutely FRUITLESS exercise.
>
> Standardising strings is a real help for end users, but it's not the way
> to approach logging issues, I agree.

Now that XML is the holy grail, I've seen people actually preach XML
strings as an encoding for protocols ;->
The argument I have seen put forward is that strings are easier for users
to read than binary encoding ;-> Therefore they can debug problems.
There may be cases where this is valid[1] - the only problem is that a lot
of loonies will think this is the next sliced bread.
What about all that bandwidth stoopid XML consumes?
"Bandwidth? Who has a problem with bandwidth?" ;->
What about all that involved processing of stoopid XML?
"CPU? Who has CPU problems?"
Intel has a 10Gige NIC, a 2Mhz cpu, and 4G DDR RAM for your hungry
applications.
It's a conspiracy, I tell ya ;->

cheers,
jamal

[1] For people who use expect, for example, to send string commands
to a remote system to configure things: when expect (simple req-resp)
becomes too simple you may need something more sophisticated. They are
already sending strings across TCP, probably.
In fact an IETF working group has been formed to standardize this.
http://www.ietf.org/html.charters/netconf-charter.html
There's a draft at:
http://www.ietf.org/internet-drafts/draft-enns-xmlconf-spec-00.txt
The only unfortunate side effect of this is that you will see a lot of
idjots putting XML in protocols from now on, just because.

From yoshfuji@linux-ipv6.org Sat Jun 21 07:35:33 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 21 Jun 2003 07:35:43 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5LEZV2x027403
	for ; Sat, 21 Jun 2003 07:35:32 -0700
Received: from localhost (localhost [127.0.0.1])
	by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5LEaYBo009816;
	Sat, 21 Jun 2003 23:36:34 +0900
Date: Sat, 21 Jun 2003 23:36:34 +0900 (JST)
Message-Id: <20030621.233634.67057417.yoshfuji@linux-ipv6.org>
To: krkumar@us.ibm.com
Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done)
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To: <3EF37458.3070103@us.ibm.com>
References: <3EF37458.3070103@us.ibm.com>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL:
 http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU]
 $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3452
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

Grr, I've almost lost this...

In article <3EF37458.3070103@us.ibm.com> (at Fri, 20 Jun 2003 13:53:44 -0700), Krishna Kumar says:

> 1. I change the netlink_dump_start to pass another parameter, , which is
> stored in a new field in the cb, . All users of this function have been
> changed to pass a -1 since they don't care about the type, except the
> generic routine rtnetlink_rcv_msg() which calculates the type and stores it.
> So the same routine which is used to dump route table can be used to dump
> the prefix list by checking the type. It might be possible to derive the
> type from the table offset, but that is more complicated (probably doable).

I think this is not required.
Rename inet6_dump_fib() to __inet6_dump_fib() and introduce an extra
argument, then call it like

  inet6_dump_fib() {
    __inet6_dump_fib(...,0);
  }

and

  inet6_dump_prefix() {
    __inet6_dump_fib(...,1);
  }

etc.

> 3. Added user interface for retrieving M/O flags. This is a separate interface
> from the one for getting the prefix list since the flags are per interface
> while the prefix list is per route. However these two can be merged into one
> if needed.

Hmm, what I expected is to get the information via an RTA_NEWLINK message,
because this is a per-interface thing.

> 5. Though this patch is modified to use only routing table for updating and
> accessing the prefix list, I did a performance analysis for this approach vs
> storing the plist on the idev.
> Following is the result :
>
> System : 1 CPU. 866 MHz, 256MB memory
> For 1000 VLAN devices (4036 route entries gets created automatically as part
> of address assignment), retrieve prefix list for (system times only) :
>
> #devices  #iteration for each dev  plist on IDEV  plist in RTTABLE  %
> 200       100                      3.95 secs      40.14 secs        916%
> 1000      10                       2.60 secs      20.98 secs        706%
> 200       1000                     38.44 secs     400.76 secs       942%

Hmm... Well, what should we do...

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From jmorris@intercode.com.au Sun Jun 22 09:00:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 22 Jun 2003 09:00:18 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:fo4m+O9RHgtb8v+x7nnr/Bmon1WTMnHA@blackbird.intercode.com.au [203.32.101.10])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5MG072x017478
	for ; Sun, 22 Jun 2003 09:00:09 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12])
	by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5MFxkr13067;
	Mon, 23 Jun 2003 01:59:48 +1000
Date: Mon, 23 Jun 2003 01:59:46 +1000 (EST)
From: James Morris
To: Stephen Hemminger
cc: "David S. Miller" , Michal Ostrowski , ,
Subject: Re: [PATCH 2.5.72] (1/3) PPPoE cleanup [TRIVIAL]
In-Reply-To: <20030620173139.01f935c7.shemminger@osdl.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3453
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Fri, 20 Jun 2003, Stephen Hemminger wrote:

> This is a cleanup patch, no change in functionality.
> - Get rid of debug macro's that aren't used anywhere in the code.

I'm guessing that Michal wants to leave these in for future debugging?

- James
-- 
James Morris

From mostrows@speakeasy.net Sun Jun 22 16:26:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 22 Jun 2003 16:26:18 -0700 (PDT)
Received: from mail.speakeasy.net (mail14.speakeasy.net [216.254.0.214])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5MNQ92x022981
	for ; Sun, 22 Jun 2003 16:26:09 -0700
Received: (qmail 20201 invoked by uid 64014); 22 Jun 2003 23:26:08 -0000
Received: from mostrows@speakeasy.net by mail14.speakeasy.net
	with AmikaGuardian-Server-1.1.2c-csav (Processed in 0.110777 secs);
	22 Jun 2003 23:26:08 -0000
Received: from unknown (HELO brick.watson.ibm.com) (mostrows@[129.34.20.17])
	(envelope-sender )
	by mail14.speakeasy.net (qmail-ldap-1.03) with SMTP
	for ; 22 Jun 2003 23:26:08 -0000
Subject: Re: [PATCH 2.5.72] (1/3) PPPoE cleanup [TRIVIAL]
From: Michal Ostrowski
To: James Morris
Cc: Stephen Hemminger , "David S. Miller" , netdev@oss.sgi.com
In-Reply-To:
References:
Content-Type: text/plain
Message-Id: <1056324366.26751.123.camel@brick.watson.ibm.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.0
Date: 22 Jun 2003 19:26:06 -0400
Content-Transfer-Encoding: 7bit
X-archive-position: 3454
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mostrows@speakeasy.net
Precedence: bulk
X-list: netdev

Debugging stuff can go. I kept it in for future debugging but haven't
found it of much use.

On Sun, 2003-06-22 at 11:59, James Morris wrote:
> On Fri, 20 Jun 2003, Stephen Hemminger wrote:
>
> > This is a cleanup patch, no change in functionality.
> > - Get rid of debug macro's that aren't used anywhere in the code.
>
> I'm guessing that Michal wants to leave these in for future debugging?
> > > - James -- Michal Ostrowski From davem@redhat.com Sun Jun 22 17:52:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 22 Jun 2003 17:52:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5N0qA2x001435 for ; Sun, 22 Jun 2003 17:52:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA23781; Sun, 22 Jun 2003 17:46:42 -0700 Date: Sun, 22 Jun 2003 17:46:41 -0700 (PDT) Message-Id: <20030622.174641.74727201.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, kenistonj@us.ibm.com, lkessler@us.ibm.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, niv@us.ibm.com Subject: Re: patch for common networking error messages From: "David S. Miller" In-Reply-To: <1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk> References: <20030616.155533.63022973.davem@redhat.com> <1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3455 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 21 Jun 2003 13:36:54 +0100 Standardising strings is a real help for end users, I agree. But my objections are in the context of doing this inside the kernel, where such things do not belong. 
From jmorris@intercode.com.au Mon Jun 23 03:26:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 03:27:06 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:uylHcQ8Wgjw9Jip+EuEdBMKLc69XTeWV@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NAQq2x016913 for ; Mon, 23 Jun 2003 03:26:54 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5NAQfr16563; Mon, 23 Jun 2003 20:26:42 +1000 Date: Mon, 23 Jun 2003 20:26:40 +1000 (EST) From: James Morris To: =?iso-2022-jp?Q?YOSHIFUJI_Hideaki_=2F_=1B$B5HF#1QL=40=1B=28B?= cc: netdev@oss.sgi.com, "David S. Miller" Subject: Re: [PATCH] [IPV6] clean-up advmss calculation (fwd) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3456 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 23 Jun 2003, James Morris wrote: > This patch introduces ipv6_advmss() and clean-up advmss calculation. > Thanks. I've applied this to a bk tree at: bk://kernel.bkbits.net/jmorris/net-2.5 I'll be trialling collecting some of the networking patches here so that Dave can pull them into his tree, instead of having to apply every single patch which is posted. 
- James -- James Morris From alan@lxorguk.ukuu.org.uk Mon Jun 23 04:56:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 04:56:32 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NBuO2x019064 for ; Mon, 23 Jun 2003 04:56:25 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5NBsKRQ013624; Mon, 23 Jun 2003 12:54:21 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5NBsFjb013622; Mon, 23 Jun 2003 12:54:15 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: patch for common networking error messages From: Alan Cox To: "David S. Miller" Cc: girouard@us.ibm.com, stekloff@us.ibm.com, janiceg@us.ibm.com, jgarzik@pobox.com, kenistonj@us.ibm.com, lkessler@us.ibm.com, Linux Kernel Mailing List , netdev@oss.sgi.com, niv@us.ibm.com In-Reply-To: <20030622.174641.74727201.davem@redhat.com> References: <20030616.155533.63022973.davem@redhat.com> <1056199013.25974.27.camel@dhcp22.swansea.linux.org.uk> <20030622.174641.74727201.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056369251.13529.17.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 23 Jun 2003 12:54:12 +0100 X-archive-position: 3457 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Llu, 2003-06-23 at 01:46, David S. Miller wrote: > From: Alan Cox > Date: 21 Jun 2003 13:36:54 +0100 > > Standardising strings is a real help for end users, > > I agree. But my objections are in the context of doing this > inside the kernel, where such things do not belong. 
Standardising strings for end users in the kernel is also good because it both saves space and makes things more consistent for the poor human wondering what blew up. Standardising them for programs to parse is not a good idea From us15@os.inf.tu-dresden.de Mon Jun 23 10:46:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 10:46:17 -0700 (PDT) Received: from Hell.WH8.TU-Dresden.De (Hell.WH8.tu-dresden.de [141.30.225.3]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NHk62x024133 for ; Mon, 23 Jun 2003 10:46:08 -0700 Received: from Corona.WH8.TU-Dresden.De (Corona.WH8.TU-Dresden.De [141.30.225.56]) by Hell.WH8.TU-Dresden.De (8.12.9-Hell/8.12.9/Slackware) with ESMTP id h5NHk3Pg030083 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 23 Jun 2003 19:46:03 +0200 Date: Mon, 23 Jun 2003 19:45:57 +0200 From: "Udo A. Steinberg" To: "Feldman, Scott" Cc: , Subject: Re: e100-3.0.0_dev8 "Minneapolis Moline" release Message-Id: <20030623194557.2f7e4e9c.us15@os.inf.tu-dresden.de> In-Reply-To: References: Organization: Fiasco Core Team X-GPG-Key: 1024D/233B9D29 (wwwkeys.pgp.net) X-GPG-Fingerprint: CE1F 5FDD 3C01 BE51 2106 292E 9E14 735D 233B 9D29 X-Fiasco-Rulez: Yes X-Mailer: X-Mailer 5.0 Gold Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="pgp-sha1"; boundary="=.qBXK2lmbRfaluS" X-archive-position: 3458 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: us15@os.inf.tu-dresden.de Precedence: bulk X-list: netdev --=.qBXK2lmbRfaluS Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Tue, 17 Jun 2003 19:48:02 -0700 Feldman, Scott (FS) wrote: FS> http://sf.net/projects/e1000, download e100-3.0.0_dev8 (tar file or FS> kernel patches). FS> FS> Your help in testing would be greatly appreciated. 
There are many 8255x FS> devices supported by e100, so hopefully we'll get good coverage from the FS> community. Hi Scott, Good work! I've been using the driver non-stop for about a week now and there have been exactly zero problems. I think, if possible, the driver should be merged into the mainstream kernels to get even wider testing. It certainly seems stable enough to me. -Udo. --=.qBXK2lmbRfaluS Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.3.1 (GNU/Linux) iD8DBQE+9zzanhRzXSM7nSkRAsLEAJ9C8e9r6JgegR4HG9bsFXV4gpQItQCeM769 bFUSz2LgSc5Falb4N2/9tRM= =TG4Z -----END PGP SIGNATURE----- --=.qBXK2lmbRfaluS-- From shemminger@osdl.org Mon Jun 23 11:58:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 11:58:21 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NIwB2x025400 for ; Mon, 23 Jun 2003 11:58:12 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5NIvtX18348; Mon, 23 Jun 2003 11:57:55 -0700 Date: Mon, 23 Jun 2003 11:57:55 -0700 From: Stephen Hemminger To: "David S. Miller" , Alexey Kuznetsov Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.73] update teql scheduler to dynamic net device Message-Id: <20030623115755.754205ab.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3459 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change teql scheduler to: - dynamically allocate and free the network device. 
previously, used static network device. - support multiple equalizers (default one) via module parameter (max_equalizers) previously, limited to one. Tested with 2.5.73 on SMP. --- linux-2.5.73/net/sched/sch_teql.c 2003-06-23 11:39:54.000000000 -0700 +++ linux-2.5-sysfs/net/sched/sch_teql.c 2003-06-23 11:42:22.000000000 -0700 @@ -67,8 +67,9 @@ struct teql_master { struct Qdisc_ops qops; - struct net_device dev; + struct net_device *dev; struct Qdisc *slaves; + struct list_head master_list; struct net_device_stats stats; }; @@ -122,13 +123,13 @@ teql_dequeue(struct Qdisc* sch) skb = __skb_dequeue(&dat->q); if (skb == NULL) { - struct net_device *m = dat->m->dev.qdisc->dev; + struct net_device *m = dat->m->dev->qdisc->dev; if (m) { dat->m->slaves = sch; netif_wake_queue(m); } } - sch->q.qlen = dat->q.qlen + dat->m->dev.qdisc->q.qlen; + sch->q.qlen = dat->q.qlen + dat->m->dev->qdisc->q.qlen; return skb; } @@ -165,9 +166,9 @@ teql_destroy(struct Qdisc* sch) master->slaves = NEXT_SLAVE(q); if (q == master->slaves) { master->slaves = NULL; - spin_lock_bh(&master->dev.queue_lock); - qdisc_reset(master->dev.qdisc); - spin_unlock_bh(&master->dev.queue_lock); + spin_lock_bh(&master->dev->queue_lock); + qdisc_reset(master->dev->qdisc); + spin_unlock_bh(&master->dev->queue_lock); } } skb_queue_purge(&dat->q); @@ -185,10 +186,10 @@ static int teql_qdisc_init(struct Qdisc struct teql_master *m = (struct teql_master*)sch->ops; struct teql_sched_data *q = (struct teql_sched_data *)sch->data; - if (dev->hard_header_len > m->dev.hard_header_len) + if (dev->hard_header_len > m->dev->hard_header_len) return -EINVAL; - if (&m->dev == dev) + if (m->dev == dev) return -ELOOP; q->m = m; @@ -196,29 +197,29 @@ static int teql_qdisc_init(struct Qdisc skb_queue_head_init(&q->q); if (m->slaves) { - if (m->dev.flags & IFF_UP) { - if ((m->dev.flags&IFF_POINTOPOINT && !(dev->flags&IFF_POINTOPOINT)) - || (m->dev.flags&IFF_BROADCAST && !(dev->flags&IFF_BROADCAST)) - || (m->dev.flags&IFF_MULTICAST 
&& !(dev->flags&IFF_MULTICAST)) - || dev->mtu < m->dev.mtu) + if (m->dev->flags & IFF_UP) { + if ((m->dev->flags&IFF_POINTOPOINT && !(dev->flags&IFF_POINTOPOINT)) + || (m->dev->flags&IFF_BROADCAST && !(dev->flags&IFF_BROADCAST)) + || (m->dev->flags&IFF_MULTICAST && !(dev->flags&IFF_MULTICAST)) + || dev->mtu < m->dev->mtu) return -EINVAL; } else { if (!(dev->flags&IFF_POINTOPOINT)) - m->dev.flags &= ~IFF_POINTOPOINT; + m->dev->flags &= ~IFF_POINTOPOINT; if (!(dev->flags&IFF_BROADCAST)) - m->dev.flags &= ~IFF_BROADCAST; + m->dev->flags &= ~IFF_BROADCAST; if (!(dev->flags&IFF_MULTICAST)) - m->dev.flags &= ~IFF_MULTICAST; - if (dev->mtu < m->dev.mtu) - m->dev.mtu = dev->mtu; + m->dev->flags &= ~IFF_MULTICAST; + if (dev->mtu < m->dev->mtu) + m->dev->mtu = dev->mtu; } q->next = NEXT_SLAVE(m->slaves); NEXT_SLAVE(m->slaves) = sch; } else { q->next = sch; m->slaves = sch; - m->dev.mtu = dev->mtu; - m->dev.flags = (m->dev.flags&~FMASK)|(dev->flags&FMASK); + m->dev->mtu = dev->mtu; + m->dev->flags = (m->dev->flags&~FMASK)|(dev->flags&FMASK); } return 0; } @@ -379,9 +380,9 @@ static int teql_master_open(struct net_d flags &= ~IFF_MULTICAST; } while ((q = NEXT_SLAVE(q)) != m->slaves); - m->dev.mtu = mtu; - m->dev.flags = (m->dev.flags&~FMASK) | flags; - netif_start_queue(&m->dev); + m->dev->mtu = mtu; + m->dev->flags = (m->dev->flags&~FMASK) | flags; + netif_start_queue(m->dev); return 0; } @@ -417,8 +418,30 @@ static int teql_master_mtu(struct net_de return 0; } -static int teql_master_init(struct net_device *dev) +static __init int teql_master_init(struct net_device *dev) { + struct teql_master *master = dev->priv; + struct Qdisc_ops *ops = &master->qops; + + master->dev = dev; + + strlcpy(ops->id, dev->name, IFNAMSIZ); + ops->priv_size = sizeof(struct teql_sched_data); + + ops->enqueue = teql_enqueue; + ops->dequeue = teql_dequeue; + ops->requeue = teql_requeue; + ops->init = teql_qdisc_init; + ops->reset = teql_reset; + ops->destroy = teql_destroy; + ops->owner = 
THIS_MODULE; + + return register_qdisc(ops); +} + +static __init void teql_master_setup(struct net_device *dev) +{ + dev->init = teql_master_init; dev->open = teql_master_open; dev->hard_start_xmit = teql_master_xmit; dev->stop = teql_master_close; @@ -429,62 +452,58 @@ static int teql_master_init(struct net_d dev->tx_queue_len = 100; dev->flags = IFF_NOARP; dev->hard_header_len = LL_MAX_HEADER; - return 0; + SET_MODULE_OWNER(dev); } -static struct teql_master the_master = { -{ - .next = NULL, - .cl_ops = NULL, - .id = "", - .priv_size = sizeof(struct teql_sched_data), - .enqueue = teql_enqueue, - .dequeue = teql_dequeue, - .requeue = teql_requeue, - .drop = NULL, - .init = teql_qdisc_init, - .reset = teql_reset, - .destroy = teql_destroy, - .dump = NULL, - .owner = THIS_MODULE, -},}; - - -#ifdef MODULE -int init_module(void) -#else +static LIST_HEAD(master_dev_list); +static spinlock_t master_dev_lock = SPIN_LOCK_UNLOCKED; +static int max_equalizers = 1; +MODULE_PARM(max_equalizers, "i"); +MODULE_PARM_DESC(max_equalizers, "Max number of link equalizers"); + int __init teql_init(void) -#endif { - int err; - - rtnl_lock(); + int i; + int err = 0; - the_master.dev.priv = (void*)&the_master; - err = dev_alloc_name(&the_master.dev, "teql%d"); - if (err < 0) - return err; - memcpy(the_master.qops.id, the_master.dev.name, IFNAMSIZ); - the_master.dev.init = teql_master_init; - - SET_MODULE_OWNER(&the_master.dev); - err = register_netdevice(&the_master.dev); - if (err == 0) { - err = register_qdisc(&the_master.qops); - if (err) - unregister_netdevice(&the_master.dev); + for (i = 0; i < max_equalizers; i++) { + struct net_device *dev; + struct teql_master *master; + + dev = alloc_netdev(sizeof(struct teql_master), + "teql%d", teql_master_setup); + if (!dev) + return -ENOMEM; + + if ((err = register_netdev(dev))) + goto out; + + master = dev->priv; + spin_lock(&master_dev_lock); + list_add_tail(&master->master_list, &master_dev_list); + spin_unlock(&master_dev_lock); } - 
rtnl_unlock(); + out: return err; } -#ifdef MODULE -void cleanup_module(void) +static void __exit teql_exit(void) { - rtnl_lock(); - unregister_qdisc(&the_master.qops); - unregister_netdevice(&the_master.dev); - rtnl_unlock(); + struct teql_master *master, *nxt; + + spin_lock(&master_dev_lock); + list_for_each_entry_safe(master, nxt, &master_dev_list, master_list) { + + list_del(&master->master_list); + + unregister_qdisc(&master->qops); + unregister_netdev(master->dev); + kfree(master->dev); + } + spin_unlock(&master_dev_lock); } -#endif + +module_init(teql_init); +module_exit(teql_exit); + MODULE_LICENSE("GPL"); From shemminger@osdl.org Mon Jun 23 12:03:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 12:04:01 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NJ3u2x025778 for ; Mon, 23 Jun 2003 12:03:57 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5NJ3gX20852; Mon, 23 Jun 2003 12:03:42 -0700 Date: Mon, 23 Jun 2003 12:03:42 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [RFT] remove skb_linearize from igmp.c Message-Id: <20030623120342.470cf504.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3460 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This patch gets rid of the deprecated skb_linearize call in IGMP by using pskb_may_pull like ip_input does. 
Could someone who actually receives IGMP packets test this? diff -Nru a/net/ipv4/igmp.c b/net/ipv4/igmp.c --- a/net/ipv4/igmp.c Mon Jun 23 11:59:56 2003 +++ b/net/ipv4/igmp.c Mon Jun 23 11:59:56 2003 @@ -838,28 +838,19 @@ int igmp_rcv(struct sk_buff *skb) { /* This basically follows the spec line by line -- see RFC1112 */ - struct igmphdr *ih = skb->h.igmph; + struct igmphdr *ih; struct in_device *in_dev = in_dev_get(skb->dev); int len = skb->len; - if (in_dev==NULL) { - kfree_skb(skb); - return 0; - } - - if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) { - kfree_skb(skb); - return -ENOMEM; - } - ih = skb->h.igmph; - } + if (in_dev==NULL) + goto out; - if (len < sizeof(struct igmphdr) || ip_compute_csum((void *)ih, len)) { - in_dev_put(in_dev); - kfree_skb(skb); - return 0; - } + if (!pskb_may_pull(skb, sizeof(struct igmphdr))) + goto drop; + + ih = skb->h.igmph; + if (ip_compute_csum((void *)ih, len)) + goto drop; switch (ih->type) { case IGMP_HOST_MEMBERSHIP_QUERY: @@ -887,7 +878,9 @@ default: NETDEBUG(printk(KERN_DEBUG "New IGMP type=%d, why we do not know about it?\n", ih->type)); } + drop: in_dev_put(in_dev); + out: kfree_skb(skb); return 0; } From davem@redhat.com Mon Jun 23 12:07:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 12:08:02 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NJ7w2x026129 for ; Mon, 23 Jun 2003 12:07:59 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA26149; Mon, 23 Jun 2003 12:02:29 -0700 Date: Mon, 23 Jun 2003 12:02:29 -0700 (PDT) Message-Id: <20030623.120229.59679308.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [RFT] remove skb_linearize from igmp.c From: "David S. 
Miller" In-Reply-To: <20030623120342.470cf504.shemminger@osdl.org> References: <20030623120342.470cf504.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3461 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 23 Jun 2003 12:03:42 -0700 Could someone who actually receives IGMP packets test this? Don't bother, your patch is buggy. int len = skb->len; ... + if (!pskb_may_pull(skb, sizeof(struct igmphdr))) + goto drop; + + ih = skb->h.igmph; + if (ip_compute_csum((void *)ih, len)) + goto drop; You're only verifying that "sizeof(struct igmphdr)" is available at skb->data, then you dereference "len" bytes via the call to ip_compute_csum(). From shemminger@osdl.org Mon Jun 23 12:41:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 12:41:37 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NJfU2x026650 for ; Mon, 23 Jun 2003 12:41:31 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5NJfIX30528; Mon, 23 Jun 2003 12:41:18 -0700 Date: Mon, 23 Jun 2003 12:41:18 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: Re: [RFT] remove skb_linearize from igmp.c Message-Id: <20030623124118.620f8339.shemminger@osdl.org> In-Reply-To: <20030623.120229.59679308.davem@redhat.com> References: <20030623120342.470cf504.shemminger@osdl.org> <20030623.120229.59679308.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3462 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Try again... this time add pullup logic to the query processing, and use skb_checksum to handle non-linear buffers. --- linux-2.5.73/net/ipv4/igmp.c 2003-06-23 11:39:50.000000000 -0700 +++ linux-2.5-sysfs/net/ipv4/igmp.c 2003-06-23 12:37:57.000000000 -0700 @@ -757,17 +757,16 @@ static void igmp_heard_report(struct in_ read_unlock(&in_dev->lock); } -static void igmp_heard_query(struct in_device *in_dev, struct igmphdr *ih, - int len) +static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb) { + struct igmphdr *ih = skb->h.igmph; struct igmpv3_query *ih3 = (struct igmpv3_query *)ih; struct ip_mc_list *im; u32 group = ih->group; int max_delay; int mark = 0; - - if (len == 8) { + if (skb->len == 8) { if (ih->code == 0) { /* Alas, old v1 router presents here. 
*/ @@ -787,9 +786,14 @@ static void igmp_heard_query(struct in_d __in_dev_put(in_dev); /* clear deleted report items */ igmpv3_clear_delrec(in_dev); - } else if (len < 12) { + } else if (skb->len < 12) { return; /* ignore bogus packet; freed by caller */ } else { /* v3 */ + if (!pskb_may_pull(skb, sizeof(struct igmpv3_query))) + return; + + ih3 = (struct igmpv3_query *)(ih = skb->h.igmph); + max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE); if (!max_delay) max_delay = 1; /* can't mod w/ 0 */ @@ -803,7 +807,13 @@ static void igmp_heard_query(struct in_d return; } /* mark sources to include, if group & source-specific */ - mark = ih3->nsrcs != 0; + if ((mark = (ih3->nsrcs != 0))) { + if (!pskb_may_pull(skb, sizeof(struct igmpv3_query) + + ntohs(ih3->nsrcs) * sizeof(ih3->srcs[0]))) + return; + + ih3 = (struct igmpv3_query *)(ih = skb->h.igmph); + } } /* @@ -838,32 +848,23 @@ static void igmp_heard_query(struct in_d int igmp_rcv(struct sk_buff *skb) { /* This basically follows the spec line by line -- see RFC1112 */ - struct igmphdr *ih = skb->h.igmph; + struct igmphdr *ih; struct in_device *in_dev = in_dev_get(skb->dev); - int len = skb->len; - if (in_dev==NULL) { - kfree_skb(skb); - return 0; - } + if (in_dev==NULL) + goto out; - if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) { - kfree_skb(skb); - return -ENOMEM; - } - ih = skb->h.igmph; - } + if ((u16)csum_fold(skb_checksum(skb, 0, skb->len, 0))) + goto drop; - if (len < sizeof(struct igmphdr) || ip_compute_csum((void *)ih, len)) { - in_dev_put(in_dev); - kfree_skb(skb); - return 0; - } + if (!pskb_may_pull(skb, sizeof(struct igmphdr))) + goto drop; + + ih = skb->h.igmph; switch (ih->type) { case IGMP_HOST_MEMBERSHIP_QUERY: - igmp_heard_query(in_dev, ih, len); + igmp_heard_query(in_dev, skb); break; case IGMP_HOST_MEMBERSHIP_REPORT: case IGMPV2_HOST_MEMBERSHIP_REPORT: @@ -887,7 +888,9 @@ int igmp_rcv(struct sk_buff *skb) default: NETDEBUG(printk(KERN_DEBUG "New IGMP type=%d, why we do 
not know about it?\n", ih->type)); } + drop: in_dev_put(in_dev); + out: kfree_skb(skb); return 0; } From davem@redhat.com Mon Jun 23 12:48:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 12:48:47 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NJme2x027038 for ; Mon, 23 Jun 2003 12:48:41 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA26309; Mon, 23 Jun 2003 12:43:04 -0700 Date: Mon, 23 Jun 2003 12:43:03 -0700 (PDT) Message-Id: <20030623.124303.48505809.davem@redhat.com> To: shemminger@osdl.org Cc: ak@muc.de, mostrows@speakeasy.net, paulus@au.ibm.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.72] (3/3) Convert PPPoE to new style protocol (redeux) From: "David S. Miller" In-Reply-To: <20030620173208.56a8a00c.shemminger@osdl.org> References: <20030620173208.56a8a00c.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3463 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev I've applied all of your pppoe patches Stephen, thanks a lot. 
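[Editorial sketch, not part of the thread.] The review of the first IGMP patch above turns on one point: pskb_may_pull(skb, n) only guarantees that n bytes are linear at skb->data, so a checksum over skb->len bytes has to either pull that many bytes or walk the fragments. The revised patch takes the second route with skb_checksum(). The userspace model below shows why a fragment-aware sum gives the same answer as a sum over a linear copy, including when a fragment boundary falls in the middle of a 16-bit word. Everything in it is an assumption made for illustration: struct frag and frags_checksum() are hypothetical names, not kernel APIs, and the real skb_checksum() returns an unfolded partial sum rather than the final folded value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one piece of a non-linear packet buffer. */
struct frag {
    const uint8_t *data;
    size_t len;
};

/* Add one fragment's bytes to the running sum as big-endian 16-bit words.
 * *odd records whether the running byte offset is odd, so a word that
 * straddles two fragments is still summed correctly. */
static uint64_t csum_add_frag(uint64_t sum, const struct frag *f, int *odd)
{
    for (size_t i = 0; i < f->len; i++) {
        if (*odd)
            sum += f->data[i];                 /* low byte of the word */
        else
            sum += (uint64_t)f->data[i] << 8;  /* high byte of the word */
        *odd = !*odd;
    }
    return sum;
}

/* Fold the carries back in and complement (RFC 1071 style). */
static uint16_t csum_fold64(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Internet checksum over all fragments, without linearizing them.
 * A result of 0 means a message with its checksum field filled in
 * verifies correctly. */
uint16_t frags_checksum(const struct frag *frags, size_t nfrags)
{
    uint64_t sum = 0;
    int odd = 0;

    assert(frags != NULL || nfrags == 0);
    for (size_t i = 0; i < nfrags; i++)
        sum = csum_add_frag(sum, &frags[i], &odd);
    return csum_fold64(sum);
}
```

Feeding it an 8-byte IGMP query as one fragment, or as a 3-byte fragment followed by a 5-byte one, yields the same value, which is the property the revised patch relies on when it checksums before pulling the header.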
From davem@redhat.com Mon Jun 23 13:19:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 13:20:00 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NKJs2x027648 for ; Mon, 23 Jun 2003 13:19:55 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA26472; Mon, 23 Jun 2003 13:13:44 -0700
Date: Mon, 23 Jun 2003 13:13:43 -0700 (PDT)
Message-Id: <20030623.131343.88494977.davem@redhat.com>
To: shemminger@osdl.org
Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.73] update teql scheduler to dynamic net device
From: "David S. Miller"
In-Reply-To: <20030623115755.754205ab.shemminger@osdl.org>
References: <20030623115755.754205ab.shemminger@osdl.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3464
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Stephen Hemminger
   Date: Mon, 23 Jun 2003 11:57:55 -0700

   Change teql scheduler to:
   - dynamically allocate and free the network device.
     previously, used static network device.
   - support multiple equalizers (default one) via module parameter (max_equalizers)
     previously, limited to one.

Applied, thanks Stephen.
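[Editorial sketch, not part of the thread.] The teql patch applied above replaces one static master with a list of dynamically allocated ones, built in a loop at init time and torn down with list_for_each_entry_safe() at exit. The userspace model below shows that create/destroy shape with a plain singly linked list. All names here (struct master, masters_create, masters_destroy, the fail_at hook) are hypothetical and stand in for the kernel structures. The unwind on a mid-loop failure is my own addition for the sketch; it is not something the posted teql_init() does, and the thread does not discuss it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct teql_master: one per equalizer. */
struct master {
    char name[16];
    struct master *next;
};

/* Free every master on the list. The next pointer is read before the
 * node is freed, which is the same idea as list_for_each_entry_safe().
 * Returns the number of nodes released and leaves *head NULL. */
int masters_destroy(struct master **head)
{
    int freed = 0;
    struct master *m = *head;

    while (m != NULL) {
        struct master *nxt = m->next;  /* save before free */
        free(m);
        m = nxt;
        freed++;
    }
    *head = NULL;
    return freed;
}

/* Create `count` masters named teql0, teql1, ... appended in order.
 * fail_at is a test hook that simulates an allocation failure at that
 * index; pass -1 for none. On failure everything created so far is
 * released, so the caller never sees a half-built list.
 * Returns 0 on success, -1 on failure. */
int masters_create(struct master **head, int count, int fail_at)
{
    struct master **tail = head;

    assert(head != NULL && *head == NULL);
    for (int i = 0; i < count; i++) {
        struct master *m = (i == fail_at) ? NULL : calloc(1, sizeof(*m));

        if (m == NULL) {
            masters_destroy(head);     /* unwind the earlier entries */
            return -1;
        }
        snprintf(m->name, sizeof(m->name), "teql%d", i);
        *tail = m;                     /* append at the tail */
        tail = &m->next;
    }
    return 0;
}
```

In this model a caller would run masters_create(&head, max_equalizers, -1) at start-up and masters_destroy(&head) at shutdown; the kernel version additionally has to register and unregister each device and qdisc, which the sketch leaves out.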
From davem@redhat.com Mon Jun 23 13:23:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 13:23:32 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NKNR2x028019 for ; Mon, 23 Jun 2003 13:23:28 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA26491; Mon, 23 Jun 2003 13:17:58 -0700 Date: Mon, 23 Jun 2003 13:17:58 -0700 (PDT) Message-Id: <20030623.131758.23035859.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [RFT] remove skb_linearize from igmp.c From: "David S. Miller" In-Reply-To: <20030623124118.620f8339.shemminger@osdl.org> References: <20030623120342.470cf504.shemminger@osdl.org> <20030623.120229.59679308.davem@redhat.com> <20030623124118.620f8339.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3465 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 23 Jun 2003 12:41:18 -0700 Try again... this time add pullup logic to the query processing, and use skb_checksum to handle non-linear buffers. 
I'll let this one sit for a day or so in order to get some testing done :-) From scott.feldman@intel.com Mon Jun 23 15:46:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 15:46:57 -0700 (PDT) Received: from hermes.jf.intel.com (fmr05.intel.com [134.134.136.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NMkn2x029621 for ; Mon, 23 Jun 2003 15:46:50 -0700 Received: from talaria.jf.intel.com (talaria.jf.intel.com [10.7.209.7]) by hermes.jf.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5NMiej15975 for ; Mon, 23 Jun 2003 22:44:40 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by talaria.jf.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5NMDsx06943 for ; Mon, 23 Jun 2003 22:13:54 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003062315573700750 ; Mon, 23 Jun 2003 15:57:37 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 23 Jun 2003 15:46:43 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: e100 "Twin City" release Date: Mon, 23 Jun 2003 15:46:42 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e100 "Twin City" release Thread-Index: AcM5ydSjkLB7q8bAT86v8xYYjJRHbA== From: "Feldman, Scott" To: , X-OriginalArrivalTime: 23 Jun 2003 22:46:43.0242 (UTC) FILETIME=[515FB8A0:01C339D9] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5NMkn2x029621 X-archive-position: 3466 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev 
Thanks to Jason, Udo, Zose, and David for the feedback so far.

New version posted: http://sf.net/projects/e1000, download e100-3.0.0_dev9.

Changes from e100-3.0.0_dev8:

* Added Device ID 0x1029.
* Named union for compilers not supporting anonymous unions.
* Initialize struct for ETHTOOL_GRINGPARAM.

DON'T USE THIS DRIVER ON A PRODUCTION SYSTEM!

-scott

From willy@www.linux.org.uk Mon Jun 23 15:58:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 15:58:40 -0700 (PDT)
Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5NMwR2x029983 for ; Mon, 23 Jun 2003 15:58:28 -0700
Received: from willy by www.linux.org.uk with local (Exim 4.14) id 19TK0u-0008Qf-Pi for netdev@oss.sgi.com; Fri, 20 Jun 2003 12:25:04 +0100
Date: Fri, 20 Jun 2003 12:25:04 +0100
From: Matthew Wilcox
To: netdev@oss.sgi.com
Subject: [PATCH] more CONFIG_NET removals
Message-ID: <20030620112504.GM24357@parcelfarce.linux.theplanet.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
X-archive-position: 3467
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: willy@debian.org
Precedence: bulk
X-list: netdev

Anyone see a problem with this patch against 2.5.72? It worksforme.

- Remove CONFIG_NET from files which are no longer built when CONFIG_NET
  isn't set.
- Redo Makefiles a little to remove some ifeqs. Also don't build compat.o
  if CONFIG_NET is unset.
- Remove ifdefs around extern declarations - Remove from sysctl_net.c Index: net/Makefile =================================================================== RCS file: /var/cvs/linux-2.5/net/Makefile,v retrieving revision 1.11 diff -u -p -r1.11 Makefile --- net/Makefile 8 Apr 2003 15:20:57 -0000 1.11 +++ net/Makefile 20 Jun 2003 10:45:38 -0000 @@ -7,9 +7,10 @@ obj-y := nonet.o -obj-$(CONFIG_NET) := socket.o core/ - -obj-$(CONFIG_COMPAT) += compat.o +net-$(CONFIG_COMPAT) += compat.o +net-$(CONFIG_MODULES) += netsyms.o +net-$(CONFIG_SYSCTL) += sysctl_net.o +obj-$(CONFIG_NET) := socket.o core/ $(net-y) # LLC has to be linked before the files in net/802/ obj-$(CONFIG_LLC) += llc/ @@ -38,8 +39,3 @@ obj-$(CONFIG_DECNET) += decnet/ obj-$(CONFIG_ECONET) += econet/ obj-$(CONFIG_VLAN_8021Q) += 8021q/ obj-$(CONFIG_IP_SCTP) += sctp/ - -ifeq ($(CONFIG_NET),y) -obj-$(CONFIG_MODULES) += netsyms.o -obj-$(CONFIG_SYSCTL) += sysctl_net.o -endif Index: net/netsyms.c =================================================================== RCS file: /var/cvs/linux-2.5/net/netsyms.c,v retrieving revision 1.22 diff -u -p -r1.22 netsyms.c --- net/netsyms.c 14 Jun 2003 22:16:08 -0000 1.22 +++ net/netsyms.c 20 Jun 2003 10:45:38 -0000 @@ -36,12 +36,9 @@ #include #endif /* CONFIG_NET_DIVERT */ -#ifdef CONFIG_NET extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; -#endif -#ifdef CONFIG_INET #include #include #include @@ -80,8 +77,6 @@ extern int tcp_port_rover; extern int udp_port_rover; #endif -#endif - #include #ifdef CONFIG_IPX_MODULE @@ -557,7 +552,6 @@ EXPORT_SYMBOL(unregister_netdevice_notif EXPORT_SYMBOL(call_netdevice_notifiers); /* support for loadable net drivers */ -#ifdef CONFIG_NET EXPORT_SYMBOL(loopback_dev); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(unregister_netdevice); @@ -693,5 +687,3 @@ EXPORT_SYMBOL(wireless_spy_update); #endif /* CONFIG_NET_RADIO */ EXPORT_SYMBOL(linkwatch_fire_event); - -#endif /* CONFIG_NET */ Index: net/sysctl_net.c 
=================================================================== RCS file: /var/cvs/linux-2.5/net/sysctl_net.c,v retrieving revision 1.3 diff -u -p -r1.3 sysctl_net.c --- net/sysctl_net.c 30 Aug 2002 20:00:44 -0000 1.3 +++ net/sysctl_net.c 20 Jun 2003 10:45:38 -0000 @@ -13,26 +13,13 @@ */ #include -#include #include -#ifdef CONFIG_INET extern struct ctl_table ipv4_table[]; -#endif - extern struct ctl_table core_table[]; - -#ifdef CONFIG_NET extern struct ctl_table ether_table[]; -#endif - -#ifdef CONFIG_IPV6 extern struct ctl_table ipv6_table[]; -#endif - -#ifdef CONFIG_TR extern struct ctl_table tr_table[]; -#endif struct ctl_table net_table[] = { { @@ -41,14 +28,12 @@ struct ctl_table net_table[] = { .mode = 0555, .child = core_table, }, -#ifdef CONFIG_NET { .ctl_name = NET_ETHER, .procname = "ethernet", .mode = 0555, .child = ether_table, }, -#endif #ifdef CONFIG_INET { .ctl_name = NET_IPV4, Index: net/core/Makefile =================================================================== RCS file: /var/cvs/linux-2.5/net/core/Makefile,v retrieving revision 1.9 diff -u -p -r1.9 Makefile --- net/core/Makefile 27 May 2003 17:29:33 -0000 1.9 +++ net/core/Makefile 20 Jun 2003 10:45:38 -0000 @@ -4,13 +4,9 @@ obj-y := sock.o skbuff.o iovec.o datagram.o scm.o -ifeq ($(CONFIG_SYSCTL),y) -ifeq ($(CONFIG_NET),y) -obj-y += sysctl_net_core.o -endif -endif +obj-$(CONFIG_SYSCTL) += sysctl_net_core.o -obj-$(CONFIG_NET) += flow.o dev.o net-sysfs.o dev_mcast.o dst.o neighbour.o \ +obj-y += flow.o dev.o net-sysfs.o dev_mcast.o dst.o neighbour.o \ rtnetlink.o utils.o link_watch.o filter.o obj-$(CONFIG_NETFILTER) += netfilter.o Index: net/core/sysctl_net_core.c =================================================================== RCS file: /var/cvs/linux-2.5/net/core/sysctl_net_core.c,v retrieving revision 1.4 diff -u -p -r1.4 sysctl_net_core.c --- net/core/sysctl_net_core.c 5 May 2003 17:09:51 -0000 1.4 +++ net/core/sysctl_net_core.c 20 Jun 2003 10:45:38 -0000 @@ -34,7 +34,6 @@ extern 
char sysctl_divert_version[]; #endif /* CONFIG_NET_DIVERT */ ctl_table core_table[] = { -#ifdef CONFIG_NET { .ctl_name = NET_CORE_WMEM_MAX, .procname = "wmem_max", @@ -159,7 +158,6 @@ ctl_table core_table[] = { .proc_handler = &proc_dostring }, #endif /* CONFIG_NET_DIVERT */ -#endif /* CONFIG_NET */ { .ctl_name = 0 } }; #endif -- "It's not Hollywood. War is real, war is primarily not about defeat or victory, it is about death. I've seen thousands and thousands of dead bodies. Do you think I want to have an academic debate on this subject?" -- Robert Fisk From garzik@gtf.org Mon Jun 23 19:32:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 19:32:25 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5O2WH2x003188 for ; Mon, 23 Jun 2003 19:32:18 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id DED376644; Mon, 23 Jun 2003 22:32:11 -0400 (EDT) Date: Mon, 23 Jun 2003 22:32:11 -0400 From: Jeff Garzik To: torvalds@transmeta.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [BK PATCHES] net driver merges Message-ID: <20030624023211.GA2592@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 3468 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Linus, please do a bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.5 Others may download the patch from ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.73-bk1-netdrvr1.patch.bz2 This will update the following files: drivers/net/sk98lin/skge.c | 598 +++++++++++++++++++++---------------------- drivers/net/wan/sdla_chdlc.c | 4 drivers/net/wan/sdla_fr.c | 4 drivers/net/wan/sdla_ppp.c | 4 drivers/net/wan/sdla_x25.c | 4 5 files changed, 297 
insertions(+), 317 deletions(-) through these ChangeSets: (03/06/23 1.1413) [netdrvr sk98lin] PCI API conversion, and some cleanups - PCI API init style conversion for drivers/net/sk98lin/skge.c; - new helpers: SkGeDev{Init/CleanUp}; - sk_devs_lock moved around as it's needed early. Compiles without error. Untested. (03/06/23 1.1412) [PATCH] [PATCH 2.5.72] Use mod_timer in drivers_net_wan_sdla_chdlc.c From: Vinay K Nallamothu (03/06/23 1.1411) [PATCH] [PATCH 2.5.72] Use mod_timer in drivers_net_wan_sdla_x25.c From: Vinay K Nallamothu (03/06/23 1.1410) [PATCH] [PATCH 2.5.72] Use mod_timer in drivers_net_wan_sdla_ppp.c From: Vinay K Nallamothu (03/06/23 1.1409) [PATCH] {PATCH 2.5.72] Use mod_timer in drivers_net_wan_sdla_fr.c From: Vinay K Nallamothu From yoshfuji@linux-ipv6.org Mon Jun 23 20:39:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 20:39:23 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5O3dG2x003889 for ; Mon, 23 Jun 2003 20:39:17 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5O3eUBo002588; Tue, 24 Jun 2003 12:40:31 +0900 Date: Tue, 24 Jun 2003 12:40:30 +0900 (JST) Message-Id: <20030624.124030.123761150.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: Fix large packet length check From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; 
charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3470 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. There were two errors in length check in the output path. We could not send large packet (65535bytes). This patch fixes the problem. Patch against [PATCH] IPV6: use macro for maximum payload length patch. Thanks. --- linux-2.5+advmss+magic/net/ipv6/ip6_output.c.orig Tue Jun 24 12:34:12 2003 +++ linux-2.5+advmss+magic/net/ipv6/ip6_output.c Tue Jun 24 12:32:34 2003 @@ -1265,7 +1265,7 @@ maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr); if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { - if (inet->cork.length + length > IPV6_MAXPLEN - fragheaderlen) { + if (inet->cork.length + length > sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) { ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); return -EMSGSIZE; } @@ -1461,7 +1461,7 @@ *(u32*)hdr = fl->fl6_flowlabel | htonl(0x60000000); - if (skb->len <= IPV6_MAXPLEN) + if (skb->len <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); else hdr->payload_len = 0; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 23 20:39:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 20:39:22 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5O3d82x003884 for ; Mon, 23 Jun 2003 20:39:09 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5O3eLBo002582; Tue, 24 Jun 2003 12:40:21 +0900 Date: Tue, 24 Jun 2003 12:40:21 +0900 (JST) Message-Id: <20030624.124021.119708603.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, 
yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: use macro for maximum payload length From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3469 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Use macro for maximum payload length. Patch is against "[PATCH] [IPV6] clean-up advmss calculation" patch. Thanks. Index: linux-2.5/include/net/ipv6.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/ipv6.h,v retrieving revision 1.19 diff -u -r1.19 ipv6.h --- linux-2.5/include/net/ipv6.h 9 Jun 2003 17:26:52 -0000 1.19 +++ linux-2.5/include/net/ipv6.h 24 Jun 2003 01:58:26 -0000 @@ -23,6 +23,8 @@ #define SIN6_LEN_RFC2133 24 +#define IPV6_MAXPLEN 65535 + /* * NextHeader field of IPv6 header */ Index: linux-2.5/net/ipv6/exthdrs.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/exthdrs.c,v retrieving revision 1.13 diff -u -r1.13 exthdrs.c --- linux-2.5/net/ipv6/exthdrs.c 20 May 2003 06:49:54 -0000 1.13 +++ linux-2.5/net/ipv6/exthdrs.c 24 Jun 2003 01:58:26 -0000 @@ -432,7 +432,7 @@ } pkt_len = ntohl(*(u32*)(skb->nh.raw+optoff+2)); - if (pkt_len < 0x10000) { + if (pkt_len <= IPV6_MAXPLEN) { icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2); return 0; } Index: linux-2.5/net/ipv6/ip6_output.c 
=================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ip6_output.c,v retrieving revision 1.29 diff -u -r1.29 ip6_output.c --- linux-2.5/net/ipv6/ip6_output.c 21 Jun 2003 16:20:41 -0000 1.29 +++ linux-2.5/net/ipv6/ip6_output.c 24 Jun 2003 01:58:27 -0000 @@ -621,7 +621,7 @@ if (opt) pktlength += opt->opt_flen + opt->opt_nflen; - if (pktlength > 0xFFFF + sizeof(struct ipv6hdr)) { + if (pktlength > sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { /* Jumbo datagram. It is assumed, that in the case of hdrincl jumbo option is supplied by user. @@ -1264,8 +1264,8 @@ fragheaderlen = sizeof(struct ipv6hdr) + (opt ? opt->opt_nflen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr); - if (mtu < 65576) { - if (inet->cork.length + length > 0xFFFF - fragheaderlen) { + if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { + if (inet->cork.length + length > IPV6_MAXPLEN - fragheaderlen) { ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); return -EMSGSIZE; } @@ -1461,7 +1461,7 @@ *(u32*)hdr = fl->fl6_flowlabel | htonl(0x60000000); - if (skb->len < 65536) + if (skb->len <= IPV6_MAXPLEN) hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); else hdr->payload_len = 0; Index: linux-2.5/net/ipv6/reassembly.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/reassembly.c,v retrieving revision 1.16 diff -u -r1.16 reassembly.c --- linux-2.5/net/ipv6/reassembly.c 21 Jun 2003 16:16:59 -0000 1.16 +++ linux-2.5/net/ipv6/reassembly.c 24 Jun 2003 01:58:27 -0000 @@ -425,7 +425,7 @@ end = offset + (ntohs(skb->nh.ipv6h->payload_len) - ((u8 *) (fhdr + 1) - (u8 *) (skb->nh.ipv6h + 1))); - if ((unsigned int)end >= 65536) { + if ((unsigned int)end > IPV6_MAXPLEN) { icmpv6_param_prob(skb,ICMPV6_HDR_FIELD, (u8*)&fhdr->frag_off - skb->nh.raw); return; } @@ -597,7 +597,7 @@ /* Unfragmented part is taken from the first segment. 
*/ payload_len = (head->data - head->nh.raw) - sizeof(struct ipv6hdr) + fq->len - sizeof(struct frag_hdr); - if (payload_len > 65535) + if (payload_len > IPV6_MAXPLEN) goto out_oversize; /* Head of list must not be cloned. */ Index: linux-2.5/net/ipv6/route.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/route.c,v retrieving revision 1.40 diff -u -r1.40+ route.c --- linux-2.5/net/ipv6/route.c Tue Jun 24 12:10:22 2003 +++ linux-2.5/net/ipv6/route.c Tue Jun 24 12:11:23 2003 @@ -606,13 +606,13 @@ mtu = ip6_rt_min_advmss; /* - * Maximal non-jumbo IPv6 payload is 65535 and - * corresponding MSS is 65535 - tcp_header_size. - * 65535 is also valid and means: "any MSS, + * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and + * corresponding MSS is IPV6_MAXPLEN - tcp_header_size. + * IPV6_MAXPLEN is also valid and means: "any MSS, * rely only on pmtu discovery" */ - if (mtu > 65535 - sizeof(struct tcphdr)) - mtu = 65535; + if (mtu > IPV6_MAXPLEN - sizeof(struct tcphdr)) + mtu = IPV6_MAXPLEN; return mtu; } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From chas@locutus.cmf.nrl.navy.mil Mon Jun 23 20:53:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 23 Jun 2003 20:53:22 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5O3rB2x004582 for ; Mon, 23 Jun 2003 20:53:12 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5O3r1sG025541 for ; Mon, 23 Jun 2003 23:53:01 -0400 (EDT) Message-Id: <200306240353.h5O3r1sG025541@ginger.cmf.nrl.navy.mil> To: netdev@oss.sgi.com Subject: [rfc] more atm cleanup Reply-To: chas3@users.sourceforge.net Date: Mon, 23 Jun 2003 23:50:54 -0400 From: chas williams X-Spam-Score: () hits=0.5 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www 
. roaringpenguin . com / mimedefang) X-archive-position: 3471 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev here's a couple of changes, in short, protect/setup br2684 and pppoatm ioctl's with a mutex, get rid of sleep in vcc and just use sock->sleep, replace wake_up() with sk_state_change and sk_data_ready (sk_write_space needs some thinking so we have one wake_up not converted). vcc->callback() also goes away in favor of sk_state_change(). # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1363 -> 1.1364 # net/atm/br2684.c 1.3 -> 1.4 # net/atm/common.h 1.14 -> 1.15 # net/atm/common.c 1.39 -> 1.40 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/21 chas@relax.cmf.nrl.navy.mil 1.1364 # common.h, common.c, br2684.c: # cleanup br2684_ioctl_hook # -------------------------------------------- # diff -Nru a/net/atm/br2684.c b/net/atm/br2684.c --- a/net/atm/br2684.c Mon Jun 23 09:45:37 2003 +++ b/net/atm/br2684.c Mon Jun 23 09:45:37 2003 @@ -16,9 +16,12 @@ #include #include #include +#include +#include #include +#include "common.h" #include "ipcommon.h" /* @@ -768,8 +771,6 @@ extern struct proc_dir_entry *atm_proc_root; /* from proc.c */ -extern int (*br2684_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); - /* the following avoids some spurious warnings from the compiler */ #define UNUSED __attribute__((unused)) @@ -779,14 +780,14 @@ if ((p = create_proc_entry("br2684", 0, atm_proc_root)) == NULL) return -ENOMEM; p->proc_fops = &br2684_proc_operations; - br2684_ioctl_hook = br2684_ioctl; + br2684_ioctl_set(br2684_ioctl); return 0; } static void __exit UNUSED br2684_exit(void) { struct br2684_dev 
*brdev; - br2684_ioctl_hook = NULL; + br2684_ioctl_set(NULL); remove_proc_entry("br2684", atm_proc_root); while (!list_empty(&br2684_devs)) { brdev = list_entry_brdev(br2684_devs.next); diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Mon Jun 23 09:45:37 2003 +++ b/net/atm/common.c Mon Jun 23 09:45:37 2003 @@ -145,9 +145,18 @@ #endif #if defined(CONFIG_ATM_BR2684) || defined(CONFIG_ATM_BR2684_MODULE) -int (*br2684_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); +static DECLARE_MUTEX(br2684_ioctl_mutex); + +static int (*br2684_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); + +void br2684_ioctl_set(int (*hook)(struct atm_vcc *, unsigned int, unsigned long)) +{ + down(&br2684_ioctl_mutex); + br2684_ioctl_hook = hook; + up(&br2684_ioctl_mutex); +} #ifdef CONFIG_ATM_BR2684_MODULE -EXPORT_SYMBOL(br2684_ioctl_hook); +EXPORT_SYMBOL(br2684_ioctl_set); #endif #endif @@ -886,11 +895,12 @@ goto done; #endif #if defined(CONFIG_ATM_BR2684) || defined(CONFIG_ATM_BR2684_MODULE) - if (br2684_ioctl_hook) { + down(&br2684_ioctl_mutex); + if (br2684_ioctl_hook) error = br2684_ioctl_hook(vcc, cmd, arg); - if (error != -ENOIOCTLCMD) - goto done; - } + up(&br2684_ioctl_mutex); + if (error != -ENOIOCTLCMD) + goto done; #endif error = atm_dev_ioctl(cmd, arg); diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Mon Jun 23 09:45:37 2003 +++ b/net/atm/common.h Mon Jun 23 09:45:37 2003 @@ -27,6 +27,7 @@ void atm_shutdown_dev(struct atm_dev *dev); void pppoatm_ioctl_set(int (*hook)(struct atm_vcc *, unsigned int, unsigned long)); +void br2684_ioctl_set(int (*hook)(struct atm_vcc *, unsigned int, unsigned long)); int atmpvc_init(void); void atmpvc_exit(void); # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1362 -> 1.1363 # net/atm/pppoatm.c 1.7 -> 1.8 # net/atm/common.h 1.13 -> 1.14 # net/atm/common.c 1.38 -> 1.39 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/21 chas@relax.cmf.nrl.navy.mil 1.1363 # pppoatm.c, common.h, common.c: # cleanup pppoatm_ioctl_hook # -------------------------------------------- # diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Mon Jun 23 09:45:58 2003 +++ b/net/atm/common.c Mon Jun 23 09:45:58 2003 @@ -129,8 +129,19 @@ #endif #if defined(CONFIG_PPPOATM) || defined(CONFIG_PPPOATM_MODULE) -int (*pppoatm_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); -EXPORT_SYMBOL(pppoatm_ioctl_hook); +static DECLARE_MUTEX(pppoatm_ioctl_mutex); + +static int (*pppoatm_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); + +void pppoatm_ioctl_set(int (*hook)(struct atm_vcc *, unsigned int, unsigned long)) +{ + down(&pppoatm_ioctl_mutex); + pppoatm_ioctl_hook = hook; + up(&pppoatm_ioctl_mutex); +} +#ifdef CONFIG_PPPOATM_MODULE +EXPORT_SYMBOL(pppoatm_ioctl_set); +#endif #endif #if defined(CONFIG_ATM_BR2684) || defined(CONFIG_ATM_BR2684_MODULE) @@ -865,12 +876,14 @@ default: break; } + error = -ENOIOCTLCMD; #if defined(CONFIG_PPPOATM) || defined(CONFIG_PPPOATM_MODULE) - if (pppoatm_ioctl_hook) { + down(&pppoatm_ioctl_mutex); + if (pppoatm_ioctl_hook) error = pppoatm_ioctl_hook(vcc, cmd, arg); - if (error != -ENOIOCTLCMD) - goto done; - } + up(&pppoatm_ioctl_mutex); + if (error != -ENOIOCTLCMD) + goto done; #endif #if defined(CONFIG_ATM_BR2684) || defined(CONFIG_ATM_BR2684_MODULE) if (br2684_ioctl_hook) { diff -Nru a/net/atm/common.h b/net/atm/common.h --- a/net/atm/common.h Mon Jun 23 09:45:58 2003 +++ b/net/atm/common.h Mon Jun 23 09:45:58 2003 @@ -26,6 +26,8 @@ void atm_shutdown_dev(struct atm_dev *dev); +void pppoatm_ioctl_set(int (*hook)(struct atm_vcc *, unsigned int, unsigned long)); + int 
atmpvc_init(void); void atmpvc_exit(void); int atmsvc_init(void); diff -Nru a/net/atm/pppoatm.c b/net/atm/pppoatm.c --- a/net/atm/pppoatm.c Mon Jun 23 09:45:58 2003 +++ b/net/atm/pppoatm.c Mon Jun 23 09:45:58 2003 @@ -44,6 +44,8 @@ #include #include +#include "common.h" + #if 0 #define DPRINTK(format, args...) \ printk(KERN_DEBUG "pppoatm: " format, ##args) @@ -344,17 +346,15 @@ /* the following avoids some spurious warnings from the compiler */ #define UNUSED __attribute__((unused)) -extern int (*pppoatm_ioctl_hook)(struct atm_vcc *, unsigned int, unsigned long); - static int __init UNUSED pppoatm_init(void) { - pppoatm_ioctl_hook = pppoatm_ioctl; + pppoatm_ioctl_set(pppoatm_ioctl); return 0; } static void __exit UNUSED pppoatm_exit(void) { - pppoatm_ioctl_hook = NULL; + pppoatm_ioctl_set(NULL); } module_init(pppoatm_init); # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1365 -> 1.1366 # net/atm/lec.c 1.31 -> 1.32 # net/atm/signaling.c 1.18 -> 1.19 # net/atm/mpc.c 1.22 -> 1.23 # net/atm/raw.c 1.5 -> 1.6 # net/atm/clip.c 1.19 -> 1.20 # net/atm/common.c 1.40 -> 1.41 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/23 chas@relax.cmf.nrl.navy.mil 1.1366 # use sk_data_ready and sk_change_state instead of wake_up # -------------------------------------------- # diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/clip.c Mon Jun 23 10:57:48 2003 @@ -67,7 +67,7 @@ ctrl->ip = ip; atm_force_charge(atmarpd,skb->truesize); skb_queue_tail(&atmarpd->sk->sk_receive_queue, skb); - wake_up(atmarpd->sk->sk_sleep); + atmarpd->sk->sk_data_ready(atmarpd->sk, skb->len); return 0; } diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/common.c Mon Jun 23 10:57:48 2003 @@ -328,7 +328,7 @@ set_bit(ATM_VF_CLOSE, &vcc->flags); vcc->reply = reply; vcc->sk->sk_err = -reply; - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_state_change(vcc->sk); } diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/lec.c Mon Jun 23 10:57:48 2003 @@ -134,7 +134,7 @@ priv = (struct lec_priv *)dev->priv; atm_force_charge(priv->lecd, skb2->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb2); - wake_up(priv->lecd->sk->sk_sleep); + priv->lecd->sk->sk_data_ready(priv->lecd->sk, skb2->len); } return; @@ -513,7 +513,7 @@ memcpy(skb2->data, mesg, sizeof(struct atmlec_msg)); atm_force_charge(priv->lecd, skb2->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb2); - wake_up(priv->lecd->sk->sk_sleep); + priv->lecd->sk->sk_data_ready(priv->lecd->sk, skb2->len); } if (f != NULL) br_fdb_put_hook(f); #endif /* defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE) */ @@ -598,13 +598,13 @@ 
atm_force_charge(priv->lecd, skb->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb); - wake_up(priv->lecd->sk->sk_sleep); + priv->lecd->sk->sk_data_ready(priv->lecd->sk, skb->len); if (data != NULL) { DPRINTK("lec: about to send %d bytes of data\n", data->len); atm_force_charge(priv->lecd, data->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, data); - wake_up(priv->lecd->sk->sk_sleep); + priv->lecd->sk->sk_data_ready(priv->lecd->sk, skb->len); } return 0; @@ -686,7 +686,7 @@ if (memcmp(skb->data, lec_ctrl_magic, 4) ==0) { /* Control frame, to daemon*/ DPRINTK("%s: To daemon\n",dev->name); skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_data_ready(vcc->sk, skb->len); } else { /* Data frame, queue to protocol handlers */ unsigned char *dst; diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/mpc.c Mon Jun 23 10:57:48 2003 @@ -669,7 +669,7 @@ dprintk("mpoa: (%s) mpc_push: control packet arrived\n", dev->name); /* Pass control packets to daemon */ skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_data_ready(vcc->sk, skb->len); return; } @@ -947,7 +947,7 @@ memcpy(skb->data, mesg, sizeof(struct k_message)); atm_force_charge(mpc->mpoad_vcc, skb->truesize); skb_queue_tail(&mpc->mpoad_vcc->sk->sk_receive_queue, skb); - wake_up(mpc->mpoad_vcc->sk->sk_sleep); + mpc->mpoad_vcc->sk->sk_data_ready(mpc->mpoad_vcc->sk, skb->len); return 0; } @@ -1226,7 +1226,7 @@ atm_force_charge(vcc, skb->truesize); skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_data_ready(vcc->sk, skb->len); dprintk("mpoa: purge_egress_shortcut: exiting:\n"); return; diff -Nru a/net/atm/raw.c b/net/atm/raw.c --- a/net/atm/raw.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/raw.c Mon Jun 23 10:57:48 2003 @@ -29,7 +29,7 @@ { if (skb) { skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - 
wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_data_ready(vcc->sk, skb->len); } } diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Mon Jun 23 10:57:48 2003 +++ b/net/atm/signaling.c Mon Jun 23 10:57:48 2003 @@ -63,7 +63,7 @@ #endif atm_force_charge(sigd,skb->truesize); skb_queue_tail(&sigd->sk->sk_receive_queue,skb); - wake_up(sigd->sk->sk_sleep); + sigd->sk->sk_data_ready(sigd->sk, skb->len); } @@ -206,7 +206,7 @@ set_bit(ATM_VF_RELEASED,&vcc->flags); vcc->reply = -EUNATCH; vcc->sk->sk_err = EUNATCH; - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_state_change(vcc->sk); } } # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1360 -> 1.1361 # net/atm/svc.c 1.19 -> 1.20 # net/atm/signaling.c 1.15 -> 1.16 # include/linux/atmdev.h 1.19 -> 1.20 # net/atm/common.c 1.36 -> 1.37 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/20 chas@relax.cmf.nrl.navy.mil 1.1361 # use sk_state_change() and eliminate vcc->callback() # -------------------------------------------- # diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Mon Jun 23 09:54:19 2003 +++ b/include/linux/atmdev.h Mon Jun 23 09:54:19 2003 @@ -297,7 +297,6 @@ short itf; /* interface number */ struct sockaddr_atmsvc local; struct sockaddr_atmsvc remote; - void (*callback)(struct atm_vcc *vcc); int reply; /* also used by ATMTCP */ /* Multipoint part ------------------------------------------------- */ struct atm_vcc *session; /* session VCC descriptor */ diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Mon Jun 23 09:54:19 2003 +++ b/net/atm/common.c Mon Jun 23 09:54:19 2003 @@ -215,6 +215,14 @@ kfree(sk->sk_protinfo); } + +static void vcc_def_wakeup(struct sock *sk) +{ + read_lock(&sk->sk_callback_lock); + 
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) + wake_up(sk->sk_sleep); + read_unlock(&sk->sk_callback_lock); +} int vcc_create(struct socket *sock, int protocol, int family) { @@ -228,6 +236,7 @@ if (!sk) return -ENOMEM; sock_init_data(NULL, sk); + sk->sk_state_change = vcc_def_wakeup; vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); if (!vcc) { @@ -238,7 +247,6 @@ memset(vcc, 0, sizeof(*vcc)); vcc->sk = sk; vcc->dev = NULL; - vcc->callback = NULL; memset(&vcc->local,0,sizeof(struct sockaddr_atmsvc)); memset(&vcc->remote,0,sizeof(struct sockaddr_atmsvc)); vcc->qos.txtp.max_sdu = 1 << 16; /* for meta VCs */ diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Mon Jun 23 09:54:19 2003 +++ b/net/atm/signaling.c Mon Jun 23 09:54:19 2003 @@ -137,11 +137,8 @@ } vcc->sk->sk_ack_backlog++; skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - if (vcc->callback) { - DPRINTK("waking vcc->sleep 0x%p\n", - &vcc->sleep); - vcc->callback(vcc); - } + DPRINTK("waking vcc->sleep 0x%p\n", &vcc->sleep); + vcc->sk->sk_state_change(vcc->sk); as_indicate_complete: release_sock(vcc->sk); return 0; @@ -159,7 +156,7 @@ (int) msg->type); return -EINVAL; } - if (vcc->callback) vcc->callback(vcc); + vcc->sk->sk_state_change(vcc->sk); dev_kfree_skb(skb); return 0; } diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Mon Jun 23 09:54:19 2003 +++ b/net/atm/svc.c Mon Jun 23 09:54:19 2003 @@ -43,14 +43,6 @@ */ -void svc_callback(struct atm_vcc *vcc) -{ - wake_up(&vcc->sleep); -} - - - - static int svc_shutdown(struct socket *sock,int how) { return 0; @@ -547,7 +539,6 @@ sock->ops = &svc_proto_ops; error = vcc_create(sock, protocol, AF_ATMSVC); if (error) return error; - ATM_SD(sock)->callback = svc_callback; ATM_SD(sock)->local.sas_family = AF_ATMSVC; ATM_SD(sock)->remote.sas_family = AF_ATMSVC; return 0; # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch 
command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1361 -> 1.1362 # net/atm/lec.c 1.30 -> 1.31 # net/atm/svc.c 1.20 -> 1.21 # drivers/atm/atmtcp.c 1.11 -> 1.12 # net/atm/signaling.c 1.16 -> 1.17 # net/atm/mpc.c 1.21 -> 1.22 # include/linux/atmdev.h 1.20 -> 1.21 # net/atm/raw.c 1.4 -> 1.5 # net/atm/clip.c 1.18 -> 1.19 # net/atm/common.c 1.37 -> 1.38 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/06/23 chas@relax.cmf.nrl.navy.mil 1.1362 # eliminate vcc->sleep() # -------------------------------------------- # diff -Nru a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c --- a/drivers/atm/atmtcp.c Mon Jun 23 10:58:07 2003 +++ b/drivers/atm/atmtcp.c Mon Jun 23 10:58:07 2003 @@ -66,7 +66,7 @@ *(struct atm_vcc **) &new_msg->vcc = vcc; old_test = test_bit(flag,&vcc->flags); out_vcc->push(out_vcc,skb); - add_wait_queue(&vcc->sleep,&wait); + add_wait_queue(vcc->sk->sk_sleep, &wait); while (test_bit(flag,&vcc->flags) == old_test) { mb(); out_vcc = PRIV(vcc->dev) ? 
PRIV(vcc->dev)->vcc : NULL; @@ -78,7 +78,7 @@ schedule(); } current->state = TASK_RUNNING; - remove_wait_queue(&vcc->sleep,&wait); + remove_wait_queue(vcc->sk->sk_sleep, &wait); return error; } @@ -103,7 +103,7 @@ msg->type); return -EINVAL; } - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); return 0; } @@ -257,7 +257,7 @@ walk = atm_sk(s); if (walk->dev != atmtcp_dev) continue; - wake_up(&walk->sleep); + wake_up(walk->sk->sk_sleep); } read_unlock(&vcc_sklist_lock); } diff -Nru a/include/linux/atmdev.h b/include/linux/atmdev.h --- a/include/linux/atmdev.h Mon Jun 23 10:58:07 2003 +++ b/include/linux/atmdev.h Mon Jun 23 10:58:07 2003 @@ -291,7 +291,6 @@ void *dev_data; /* per-device data */ void *proto_data; /* per-protocol data */ struct k_atm_aal_stats *stats; /* pointer to AAL stats group */ - wait_queue_head_t sleep; /* if socket is busy */ struct sock *sk; /* socket backpointer */ /* SVC part --- may move later ------------------------------------- */ short itf; /* interface number */ diff -Nru a/net/atm/clip.c b/net/atm/clip.c --- a/net/atm/clip.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/clip.c Mon Jun 23 10:58:07 2003 @@ -67,7 +67,7 @@ ctrl->ip = ip; atm_force_charge(atmarpd,skb->truesize); skb_queue_tail(&atmarpd->sk->sk_receive_queue, skb); - wake_up(&atmarpd->sleep); + wake_up(atmarpd->sk->sk_sleep); return 0; } diff -Nru a/net/atm/common.c b/net/atm/common.c --- a/net/atm/common.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/common.c Mon Jun 23 10:58:07 2003 @@ -235,7 +235,7 @@ sk = sk_alloc(family, GFP_KERNEL, 1, NULL); if (!sk) return -ENOMEM; - sock_init_data(NULL, sk); + sock_init_data(sock, sk); sk->sk_state_change = vcc_def_wakeup; vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); @@ -257,8 +257,6 @@ vcc->push_oam = NULL; vcc->vpi = vcc->vci = 0; /* no VCI/VPI yet */ vcc->atm_options = vcc->aal_options = 0; - init_waitqueue_head(&vcc->sleep); - sk->sk_sleep = &vcc->sleep; sk->sk_destruct = vcc_sock_destruct; sock->sk = sk; return 0; @@ -310,7 
+308,7 @@ set_bit(ATM_VF_CLOSE, &vcc->flags); vcc->reply = reply; vcc->sk->sk_err = -reply; - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); } @@ -557,7 +555,7 @@ } /* verify_area is done by net/socket.c */ eff = (size+3) & ~3; /* align to word boundary */ - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); error = 0; while (!(skb = alloc_tx(vcc,eff))) { if (m->msg_flags & MSG_DONTWAIT) { @@ -578,9 +576,9 @@ error = -EPIPE; break; } - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); } - finish_wait(&vcc->sleep, &wait); + finish_wait(sk->sk_sleep, &wait); if (error) goto out; skb->dev = NULL; /* for paths shared with net_device interfaces */ @@ -605,7 +603,7 @@ unsigned int mask; vcc = ATM_SD(sock); - poll_wait(file,&vcc->sleep,wait); + poll_wait(file, vcc->sk->sk_sleep, wait); mask = 0; if (skb_peek(&vcc->sk->sk_receive_queue)) mask |= POLLIN | POLLRDNORM; diff -Nru a/net/atm/lec.c b/net/atm/lec.c --- a/net/atm/lec.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/lec.c Mon Jun 23 10:58:07 2003 @@ -134,7 +134,7 @@ priv = (struct lec_priv *)dev->priv; atm_force_charge(priv->lecd, skb2->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb2); - wake_up(&priv->lecd->sleep); + wake_up(priv->lecd->sk->sk_sleep); } return; @@ -513,7 +513,7 @@ memcpy(skb2->data, mesg, sizeof(struct atmlec_msg)); atm_force_charge(priv->lecd, skb2->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb2); - wake_up(&priv->lecd->sleep); + wake_up(priv->lecd->sk->sk_sleep); } if (f != NULL) br_fdb_put_hook(f); #endif /* defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE) */ @@ -598,13 +598,13 @@ atm_force_charge(priv->lecd, skb->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, skb); - wake_up(&priv->lecd->sleep); + wake_up(priv->lecd->sk->sk_sleep); if (data != NULL) { DPRINTK("lec: about to send %d bytes of data\n", 
data->len); atm_force_charge(priv->lecd, data->truesize); skb_queue_tail(&priv->lecd->sk->sk_receive_queue, data); - wake_up(&priv->lecd->sleep); + wake_up(priv->lecd->sk->sk_sleep); } return 0; @@ -686,7 +686,7 @@ if (memcmp(skb->data, lec_ctrl_magic, 4) ==0) { /* Control frame, to daemon*/ DPRINTK("%s: To daemon\n",dev->name); skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); } else { /* Data frame, queue to protocol handlers */ unsigned char *dst; diff -Nru a/net/atm/mpc.c b/net/atm/mpc.c --- a/net/atm/mpc.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/mpc.c Mon Jun 23 10:58:07 2003 @@ -669,7 +669,7 @@ dprintk("mpoa: (%s) mpc_push: control packet arrived\n", dev->name); /* Pass control packets to daemon */ skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); return; } @@ -947,7 +947,7 @@ memcpy(skb->data, mesg, sizeof(struct k_message)); atm_force_charge(mpc->mpoad_vcc, skb->truesize); skb_queue_tail(&mpc->mpoad_vcc->sk->sk_receive_queue, skb); - wake_up(&mpc->mpoad_vcc->sleep); + wake_up(mpc->mpoad_vcc->sk->sk_sleep); return 0; } @@ -1226,7 +1226,7 @@ atm_force_charge(vcc, skb->truesize); skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); dprintk("mpoa: purge_egress_shortcut: exiting:\n"); return; diff -Nru a/net/atm/raw.c b/net/atm/raw.c --- a/net/atm/raw.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/raw.c Mon Jun 23 10:58:07 2003 @@ -29,7 +29,7 @@ { if (skb) { skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); } } @@ -40,7 +40,7 @@ skb->truesize); atomic_sub(skb->truesize, &vcc->sk->sk_wmem_alloc); dev_kfree_skb_any(skb); - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); } diff -Nru a/net/atm/signaling.c b/net/atm/signaling.c --- a/net/atm/signaling.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/signaling.c Mon Jun 23 10:58:07 2003 @@ -61,7 +61,7 @@ #endif 
atm_force_charge(sigd,skb->truesize); skb_queue_tail(&sigd->sk->sk_receive_queue,skb); - wake_up(&sigd->sleep); + wake_up(sigd->sk->sk_sleep); } @@ -137,7 +137,7 @@ } vcc->sk->sk_ack_backlog++; skb_queue_tail(&vcc->sk->sk_receive_queue, skb); - DPRINTK("waking vcc->sleep 0x%p\n", &vcc->sleep); + DPRINTK("waking vcc->sk->sk_sleep 0x%p\n", vcc->sk->sk_sleep); vcc->sk->sk_state_change(vcc->sk); as_indicate_complete: release_sock(vcc->sk); @@ -204,7 +204,7 @@ set_bit(ATM_VF_RELEASED,&vcc->flags); vcc->reply = -EUNATCH; vcc->sk->sk_err = EUNATCH; - wake_up(&vcc->sleep); + wake_up(vcc->sk->sk_sleep); } } diff -Nru a/net/atm/svc.c b/net/atm/svc.c --- a/net/atm/svc.c Mon Jun 23 10:58:07 2003 +++ b/net/atm/svc.c Mon Jun 23 10:58:07 2003 @@ -56,13 +56,13 @@ DPRINTK("svc_disconnect %p\n",vcc); if (test_bit(ATM_VF_REGIS,&vcc->flags)) { - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_close,NULL,NULL,NULL); while (!test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { schedule(); - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } - finish_wait(&vcc->sleep, &wait); + finish_wait(vcc->sk->sk_sleep, &wait); } /* beware - socket is still in use by atmsigd until the last as_indicate has been answered */ @@ -138,13 +138,13 @@ } vcc->local = *addr; vcc->reply = WAITING; - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_bind,NULL,NULL,&vcc->local); while (vcc->reply == WAITING && sigd) { schedule(); - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } - finish_wait(&vcc->sleep, &wait); + finish_wait(sk->sk_sleep, &wait); clear_bit(ATM_VF_REGIS,&vcc->flags); /* doesn't count */ if (!sigd) { error = -EUNATCH; @@ -219,10 +219,10 @@ } vcc->remote = *addr; vcc->reply 
= WAITING; - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); sigd_enq(vcc,as_connect,NULL,NULL,&vcc->remote); if (flags & O_NONBLOCK) { - finish_wait(&vcc->sleep, &wait); + finish_wait(sk->sk_sleep, &wait); sock->state = SS_CONNECTING; error = -EINPROGRESS; goto out; @@ -231,7 +231,7 @@ while (vcc->reply == WAITING && sigd) { schedule(); if (!signal_pending(current)) { - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); continue; } DPRINTK("*ABORT*\n"); @@ -249,13 +249,13 @@ */ sigd_enq(vcc,as_close,NULL,NULL,NULL); while (vcc->reply == WAITING && sigd) { - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); schedule(); } if (!vcc->reply) while (!test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { - prepare_to_wait(&vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); schedule(); } clear_bit(ATM_VF_REGIS,&vcc->flags); @@ -265,7 +265,7 @@ error = -EINTR; break; } - finish_wait(&vcc->sleep, &wait); + finish_wait(sk->sk_sleep, &wait); if (error) goto out; if (!sigd) { @@ -312,13 +312,13 @@ goto out; } vcc->reply = WAITING; - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_listen,NULL,NULL,&vcc->local); while (vcc->reply == WAITING && sigd) { schedule(); - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } - finish_wait(&vcc->sleep, &wait); + finish_wait(sk->sk_sleep, &wait); if (!sigd) { error = -EUNATCH; goto out; @@ -354,7 +354,7 @@ while (1) { DEFINE_WAIT(wait); - prepare_to_wait(&old_vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(old_vcc->sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); while (!(skb = skb_dequeue(&old_vcc->sk->sk_receive_queue)) && sigd) { if 
(test_bit(ATM_VF_RELEASED,&old_vcc->flags)) break; @@ -373,9 +373,9 @@ error = -ERESTARTSYS; break; } - prepare_to_wait(&old_vcc->sleep, &wait, TASK_INTERRUPTIBLE); + prepare_to_wait(old_vcc->sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); } - finish_wait(&old_vcc->sleep, &wait); + finish_wait(old_vcc->sk->sk_sleep, &wait); if (error) goto out; if (!skb) { @@ -400,15 +400,15 @@ } /* wait should be short, so we ignore the non-blocking flag */ new_vcc->reply = WAITING; - prepare_to_wait(&new_vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(new_vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(new_vcc,as_accept,old_vcc,NULL,NULL); while (new_vcc->reply == WAITING && sigd) { release_sock(sk); schedule(); lock_sock(sk); - prepare_to_wait(&new_vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(new_vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } - finish_wait(&new_vcc->sleep, &wait); + finish_wait(new_vcc->sk->sk_sleep, &wait); if (!sigd) { error = -EUNATCH; goto out; @@ -444,14 +444,14 @@ DEFINE_WAIT(wait); vcc->reply = WAITING; - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq2(vcc,as_modify,NULL,NULL,&vcc->local,qos,0); while (vcc->reply == WAITING && !test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { schedule(); - prepare_to_wait(&vcc->sleep, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } - finish_wait(&vcc->sleep, &wait); + finish_wait(vcc->sk->sk_sleep, &wait); if (!sigd) return -EUNATCH; return vcc->reply; } From jmorris@intercode.com.au Tue Jun 24 01:47:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 01:47:53 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:3p3hSQ/fSLaebeztXBQnhVQ6E9Vmh4Y6@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5O8li2x008678 for ; Tue, 24 Jun 2003 01:47:46 -0700 Received: from excalibur.intercode.com.au 
(excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5O8lRr22148; Tue, 24 Jun 2003 18:47:28 +1000 Date: Tue, 24 Jun 2003 18:47:26 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: "David S. Miller" , Subject: Re: [PATCH] IPV6: Fix large packet length check In-Reply-To: <20030624.124030.123761150.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3472 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 24 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > This patch fixes the problem. > Patch against [PATCH] IPV6: use macro for maximum payload length patch. I've applied both of these to bk://kernel.bkbits.net/jmorris/net-2.5 - James -- James Morris From jmorris@intercode.com.au Tue Jun 24 05:01:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 05:01:46 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:YlYkYXvmxNPT92R99n4Thdps+2yTGsr3@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OC1X2x022824 for ; Tue, 24 Jun 2003 05:01:35 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5OC1Qr22835; Tue, 24 Jun 2003 22:01:26 +1000 Date: Tue, 24 Jun 2003 22:01:25 +1000 (EST) From: James Morris To: Matthew Wilcox cc: netdev@oss.sgi.com Subject: Re: [PATCH] more CONFIG_NET removals In-Reply-To: <20030620112504.GM24357@parcelfarce.linux.theplanet.co.uk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3473 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
jmorris@intercode.com.au Precedence: bulk X-list: netdev On Fri, 20 Jun 2003, Matthew Wilcox wrote: > Anyone see a problem with this patch against 2.5.72? It worksforme. > +net-$(CONFIG_COMPAT) += compat.o > +net-$(CONFIG_MODULES) += netsyms.o > +net-$(CONFIG_SYSCTL) += sysctl_net.o > +obj-$(CONFIG_NET) := socket.o core/ $(net-y) Some of the net/compat.c functions (e.g. compat_sys_setsockopt) are still needed to allow the kernel to build. Perhaps use cond_syscall() for these? - James -- James Morris From us15@os.inf.tu-dresden.de Tue Jun 24 07:29:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 07:29:36 -0700 (PDT) Received: from Hell.WH8.TU-Dresden.De (Hell.WH8.tu-dresden.de [141.30.225.3]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OETP2x025906 for ; Tue, 24 Jun 2003 07:29:27 -0700 Received: from Corona.WH8.TU-Dresden.De (Corona.WH8.TU-Dresden.De [141.30.225.56]) by Hell.WH8.TU-Dresden.De (8.12.9-Hell/8.12.9/Slackware) with ESMTP id h5OETNPg017978 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 24 Jun 2003 16:29:23 +0200 Date: Tue, 24 Jun 2003 16:29:19 +0200 From: "Udo A. 
Steinberg" To: "Feldman, Scott" , Linux Network Mailing List , "netdev@oss.sgi.com" Subject: Linux-2.5.73 + e100 Message-Id: <20030624162919.09a45dc0.us15@os.inf.tu-dresden.de> Organization: Fiasco Core Team X-GPG-Key: 1024D/233B9D29 (wwwkeys.pgp.net) X-GPG-Fingerprint: CE1F 5FDD 3C01 BE51 2106 292E 9E14 735D 233B 9D29 X-Fiasco-Rulez: Yes X-Mailer: X-Mailer 5.0 Gold Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="pgp-sha1"; boundary="=.7MjGUDTPHVX9dg" X-archive-position: 3474 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: us15@os.inf.tu-dresden.de Precedence: bulk X-list: netdev --=.7MjGUDTPHVX9dg Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Hi, Just tried out the e100 driver which comes with 2.5.73, i.e. not the new one from sourceforge, with the following result. The machine is a dual Xeon box. Regards, -Udo. Intel(R) PRO/100 Network Driver - version 2.3.13-k1 Copyright (c) 2003 Intel Corporation e100: selftest OK. irq 9: nobody cared! 
Call Trace: [] __report_bad_irq+0x2a/0x8b [] note_interrupt+0x6f/0x9f [] do_IRQ+0x161/0x192 [] common_interrupt+0x18/0x20 [] e100_mdi_read+0x50/0xdb [] e100_auto_neg+0xeb/0x112 [] e100_mdi_write+0xb0/0xdb [] e100_phy_set_speed_duplex+0x38/0xa7 [] e100_phy_init+0x71/0x80 [] e100_hw_init+0x13/0x10b [] e100_rd_pwa_no+0x31/0x3f [] e100_init+0xee/0x118 [] e100_found1+0x226/0x3f9 [] pci_device_probe_static+0x52/0x63 [] __pci_device_probe+0x3b/0x4d [] pci_device_probe+0x2f/0x4d [] bus_match+0x45/0x73 [] driver_attach+0x59/0x5d [] bus_add_driver+0x94/0xa7 [] pci_register_driver+0x80/0xa8 [] e100_init_module+0x17/0x57 [] do_initcalls+0x27/0x92 [] init_workqueues+0xf/0x26 [] init+0x5b/0x1f6 [] init+0x0/0x1f6 [] kernel_thread_helper+0x5/0xb handlers: [] (acpi_irq+0x0/0x16) Disabling IRQ #9 e100: eth0: Intel(R) PRO/100 Network Connection Hardware receive checksums enabled --=.7MjGUDTPHVX9dg Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.3.1 (GNU/Linux) iD8DBQE++GBCnhRzXSM7nSkRAjCKAJ9jdbHFMh3Y0qjYoNl+kzyqzpj0sACeI2vt d/zcQaJZmYB26h0Fk4eXDBY= =3Pup -----END PGP SIGNATURE----- --=.7MjGUDTPHVX9dg-- From chas@locutus.cmf.nrl.navy.mil Tue Jun 24 10:35:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 10:35:26 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OHZH2x031028 for ; Tue, 24 Jun 2003 10:35:18 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5OHZCsG004098 for ; Tue, 24 Jun 2003 13:35:12 -0400 (EDT) Message-Id: <200306241735.h5OHZCsG004098@ginger.cmf.nrl.navy.mil> To: netdev@oss.sgi.com Subject: [rfc] sk_write_space() for atm Reply-To: chas3@users.sourceforge.net Date: Tue, 24 Jun 2003 13:33:05 -0400 From: chas williams X-Spam-Score: () hits=0.5 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . 
com / mimedefang) X-archive-position: 3475 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev

i am thinking about the following for the atm protocol. the writable for atm has always been when you have enough space to send the next pdu. i suppose this should be preserved to be completely compat, but it might not be the best choice.

poll is interesting also. it seems to me that vcc->reply should be at least copied, since it could change during the poll function (or so i imagine). it's probably a better idea to just change WAITING to be a bit inside vcc->flags and remove vcc->error in favor of sk->sk_err.

===== net/atm/common.c 1.41 vs edited ===== --- 1.41/net/atm/common.c Mon Jun 23 10:57:01 2003 +++ edited/net/atm/common.c Tue Jun 24 11:58:16 2003 @@ -243,6 +243,29 @@ wake_up(sk->sk_sleep); read_unlock(&sk->sk_callback_lock); } + +static inline int vcc_writable(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + return (vcc->qos.txtp.max_sdu + + atomic_read(&sk->sk_wmem_alloc)) <= sk->sk_sndbuf; +} + +static void vcc_write_space(struct sock *sk) +{ + read_lock(&sk->sk_callback_lock); + + if (vcc_writable(sk)) { + if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) + wake_up_interruptible(sk->sk_sleep); + + sk_wake_async(sk, 2, POLL_OUT); + } + + read_unlock(&sk->sk_callback_lock); +} + int vcc_create(struct socket *sock, int protocol, int family) { @@ -257,6 +280,7 @@ return -ENOMEM; sock_init_data(sock, sk); sk->sk_state_change = vcc_def_wakeup; + sk->sk_write_space = vcc_write_space; vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); if (!vcc) { @@ -617,29 +641,39 @@ } -unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait) +unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; + volatile int reply; unsigned int mask; - vcc =
ATM_SD(sock); - poll_wait(file, vcc->sk->sk_sleep, wait); + poll_wait(file, sk->sk_sleep, wait); mask = 0; - if (skb_peek(&vcc->sk->sk_receive_queue)) - mask |= POLLIN | POLLRDNORM; - if (test_bit(ATM_VF_RELEASED,&vcc->flags) || - test_bit(ATM_VF_CLOSE,&vcc->flags)) + + vcc = ATM_SD(sock); + reply = vcc->reply; + + /* exceptional events */ + if (sk->sk_err || (reply && reply != WAITING)) + mask = POLLERR; + + if (test_bit(ATM_VF_RELEASED, &vcc->flags) || + test_bit(ATM_VF_CLOSE, &vcc->flags)) mask |= POLLHUP; - if (sock->state != SS_CONNECTING) { - if (vcc->qos.txtp.traffic_class != ATM_NONE && - vcc->qos.txtp.max_sdu + - atomic_read(&vcc->sk->sk_wmem_alloc) <= vcc->sk->sk_sndbuf) - mask |= POLLOUT | POLLWRNORM; - } - else if (vcc->reply != WAITING) { - mask |= POLLOUT | POLLWRNORM; - if (vcc->reply) mask |= POLLERR; - } + + /* readable? */ + if (!skb_queue_empty(&sk->sk_receive_queue)) + mask |= POLLIN | POLLRDNORM; + + /* writable? */ + if (sock->state == SS_CONNECTING && reply == WAITING) + return mask; + + if (vcc->qos.txtp.traffic_class != ATM_NONE && + vcc_writable(vcc->sk)) + mask |= POLLOUT | POLLWRNORM | POLLWRBAND; + return mask; } ===== net/atm/common.h 1.15 vs edited ===== --- 1.15/net/atm/common.h Mon Jun 23 10:51:10 2003 +++ edited/net/atm/common.h Mon Jun 23 11:04:38 2003 @@ -17,7 +17,7 @@ int size, int flags); int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len); -unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); +unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait); int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); int vcc_setsockopt(struct socket *sock, int level, int optname, char *optval, int optlen); ===== net/atm/pvc.c 1.17 vs edited ===== --- 1.17/net/atm/pvc.c Fri Jun 20 17:33:02 2003 +++ edited/net/atm/pvc.c Mon Jun 23 11:05:07 2003 @@ -111,7 +111,7 @@ .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = 
pvc_getname, - .poll = atm_poll, + .poll = vcc_poll, .ioctl = vcc_ioctl, .listen = sock_no_listen, .shutdown = pvc_shutdown, ===== net/atm/raw.c 1.6 vs edited ===== --- 1.6/net/atm/raw.c Mon Jun 23 10:57:02 2003 +++ edited/net/atm/raw.c Mon Jun 23 11:06:57 2003 @@ -40,7 +40,7 @@ skb->truesize); atomic_sub(skb->truesize, &vcc->sk->sk_wmem_alloc); dev_kfree_skb_any(skb); - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_write_space(vcc->sk); } ===== net/atm/svc.c 1.21 vs edited ===== --- 1.21/net/atm/svc.c Mon Jun 23 10:45:54 2003 +++ edited/net/atm/svc.c Mon Jun 23 11:05:17 2003 @@ -519,7 +519,7 @@ .socketpair = sock_no_socketpair, .accept = svc_accept, .getname = svc_getname, - .poll = atm_poll, + .poll = vcc_poll, .ioctl = vcc_ioctl, .listen = svc_listen, .shutdown = svc_shutdown, From us15@os.inf.tu-dresden.de Tue Jun 24 13:58:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 13:58:14 -0700 (PDT) Received: from Hell.WH8.TU-Dresden.De (Hell.WH8.tu-dresden.de [141.30.225.3]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OKw32x001373 for ; Tue, 24 Jun 2003 13:58:05 -0700 Received: from Corona.WH8.TU-Dresden.De (Corona.WH8.TU-Dresden.De [141.30.225.56]) by Hell.WH8.TU-Dresden.De (8.12.9-Hell/8.12.9/Slackware) with ESMTP id h5OKw3Pg001407 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 24 Jun 2003 22:58:03 +0200 Date: Tue, 24 Jun 2003 22:58:02 +0200 From: "Udo A. 
Steinberg" To: Linux Network Mailing List Cc: "netdev@oss.sgi.com" Subject: Re: Linux-2.5.73 + e100 Message-Id: <20030624225802.3d33d664.us15@os.inf.tu-dresden.de> In-Reply-To: <20030624162919.09a45dc0.us15@os.inf.tu-dresden.de> References: <20030624162919.09a45dc0.us15@os.inf.tu-dresden.de> Organization: Fiasco Core Team X-GPG-Key: 1024D/233B9D29 (wwwkeys.pgp.net) X-GPG-Fingerprint: CE1F 5FDD 3C01 BE51 2106 292E 9E14 735D 233B 9D29 X-Fiasco-Rulez: Yes X-Mailer: X-Mailer 5.0 Gold Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="pgp-sha1"; boundary="=.G8:xgohh/k,Sf/" X-archive-position: 3476 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: us15@os.inf.tu-dresden.de Precedence: bulk X-list: netdev --=.G8:xgohh/k,Sf/ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Tue, 24 Jun 2003 16:29:19 +0200 Udo A. Steinberg (UAS) wrote: UAS> e100: selftest OK. UAS> irq 9: nobody cared! UAS> Call Trace: [...] UAS> handlers: UAS> [] (acpi_irq+0x0/0x16) UAS> Disabling IRQ #9 UAS> e100: eth0: Intel(R) PRO/100 Network Connection UAS> Hardware receive checksums enabled Andrew Grover from the ACPI team confirmed that this bug is very likely an ACPI issue, and not related to the e100 driver. So don't worry about it :) Regards, -Udo. 
--=.G8:xgohh/k,Sf/ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.3.1 (GNU/Linux) iD8DBQE++LtanhRzXSM7nSkRAnX6AJ9lA4E2K4Umzh9Wp65U9SDNfHmSYwCfTtmB /kvjKAodTNefC3s3K/3Cb84= =QIh+ -----END PGP SIGNATURE----- --=.G8:xgohh/k,Sf/-- From davem@redhat.com Tue Jun 24 14:46:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 14:47:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OLkq2x002067 for ; Tue, 24 Jun 2003 14:46:52 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA29703; Tue, 24 Jun 2003 14:41:03 -0700 Date: Tue, 24 Jun 2003 14:41:02 -0700 (PDT) Message-Id: <20030624.144102.08348088.davem@redhat.com> To: jmorris@intercode.com.au Cc: willy@debian.org, netdev@oss.sgi.com Subject: Re: [PATCH] more CONFIG_NET removals From: "David S. Miller" In-Reply-To: References: <20030620112504.GM24357@parcelfarce.linux.theplanet.co.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3477 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: James Morris Date: Tue, 24 Jun 2003 22:01:25 +1000 (EST) Some of the net/compat.c functions (e.g. compat_sys_setsockopt) are still needed to allow the kernel to build. Perhaps use cond_syscall() for these? Agreed. You can't get rid of net/compat.o so easily. 
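For readers unfamiliar with the cond_syscall() suggestion quoted above: as far as I understand it, cond_syscall() makes a syscall symbol optional by emitting a weak alias to the "not implemented" handler, so the kernel still links when the real implementation is configured out. The following is only a userspace sketch of that idea, assuming GCC or Clang on an ELF target. The names demo_ni_syscall, demo_cond_syscall and demo_compat_setsockopt are made up for illustration and are not kernel symbols.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's "not implemented" handler. */
long demo_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Hypothetical userspace analogue of cond_syscall(): declare the symbol
 * as a weak alias of the stub. A strong definition linked in from
 * another object file would take precedence over this alias.
 */
#define demo_cond_syscall(name) \
	long name(void) __attribute__((weak, alias("demo_ni_syscall")))

/* No strong definition exists here, so the alias is what gets called. */
demo_cond_syscall(demo_compat_setsockopt);

/* A caller that can be linked whether or not the real function is built. */
long call_compat_setsockopt(void)
{
	return demo_compat_setsockopt();
}
```

With nothing else linked in, call_compat_setsockopt() falls through to the stub and returns -ENOSYS. Linking an object file that defines demo_compat_setsockopt normally would override the weak alias without touching the caller.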
From davem@redhat.com Tue Jun 24 14:57:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 14:57:43 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5OLvb2x002431 for ; Tue, 24 Jun 2003 14:57:38 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA29741; Tue, 24 Jun 2003 14:51:51 -0700 Date: Tue, 24 Jun 2003 14:51:51 -0700 (PDT) Message-Id: <20030624.145151.28812624.davem@redhat.com> To: jmorris@intercode.com.au Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: Fix large packet length check From: "David S. Miller" In-Reply-To: References: <20030624.124030.123761150.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3478 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: James Morris Date: Tue, 24 Jun 2003 18:47:26 +1000 (EST) On Tue, 24 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > This patch fixes the problem. > Patch against [PATCH] IPV6: use macro for maximum payload length patch. I've applied both of these to bk://kernel.bkbits.net/jmorris/net-2.5 Thanks James, I've pulled them. 
From davem@redhat.com Tue Jun 24 16:37:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 16:38:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ONbr2x018603 for ; Tue, 24 Jun 2003 16:37:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA30104; Tue, 24 Jun 2003 16:31:56 -0700 Date: Tue, 24 Jun 2003 16:31:56 -0700 (PDT) Message-Id: <20030624.163156.35039084.davem@redhat.com> To: chas3@users.sourceforge.net, chas@cmf.nrl.navy.mil Cc: netdev@oss.sgi.com Subject: Re: [rfc] more atm cleanup From: "David S. Miller" In-Reply-To: <200306240353.h5O3r1sG025541@ginger.cmf.nrl.navy.mil> References: <200306240353.h5O3r1sG025541@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3479 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Mon, 23 Jun 2003 23:50:54 -0400 here's a couple of changes, in short, protect/setup br2684 and pppoatm ioctl's with a mutex, get rid of sleep in vcc and just use sock->sleep, replace wake_up() with sk_state_change and sk_data_ready (sk_write_space needs some thinking so we have one wake_up not converted). vcc->callback() also goes away in favor of sk_state_change(). This one looks good to me. 
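The cleanup described in the quoted summary (drop the private wait queue in the vcc and signal through the socket's sk_state_change and sk_data_ready hooks) might look roughly like this when reduced to a userspace mock. This is a sketch under stated assumptions, not the patch itself: all demo_* names are hypothetical, the structs are tiny stand-ins for struct sock and struct atm_vcc, and a counter takes the place of a real wake_up() on sk->sk_sleep.

```c
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for struct sock. */
struct demo_sock {
	int wakeups;	/* counts simulated wake_up() calls */
	void (*sk_state_change)(struct demo_sock *sk);
	void (*sk_data_ready)(struct demo_sock *sk, int bytes);
};

/* Hypothetical stand-in for struct atm_vcc: no wait queue of its own. */
struct demo_vcc {
	struct demo_sock *sk;	/* socket backpointer */
};

/* Default handlers; kernel code would wake sleepers on sk->sk_sleep. */
static void demo_def_wakeup(struct demo_sock *sk)
{
	sk->wakeups++;
}

static void demo_def_readable(struct demo_sock *sk, int bytes)
{
	(void)bytes;
	sk->wakeups++;
}

void demo_sock_init(struct demo_sock *sk)
{
	sk->wakeups = 0;
	sk->sk_state_change = demo_def_wakeup;
	sk->sk_data_ready = demo_def_readable;
}

/* Receive path: data was queued, so notify readers via the socket hook. */
void demo_push(struct demo_vcc *vcc, int len)
{
	vcc->sk->sk_data_ready(vcc->sk, len);
}

/* Asynchronous close: a state change, again routed through the socket. */
void demo_release_async(struct demo_vcc *vcc)
{
	vcc->sk->sk_state_change(vcc->sk);
}
```

The point of the indirection is that callers no longer need to know where the wait queue lives. A protocol that wants different wakeup behaviour only replaces the function pointer.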
From davem@redhat.com Tue Jun 24 16:41:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 16:41:19 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ONfD2x018916 for ; Tue, 24 Jun 2003 16:41:14 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA30125; Tue, 24 Jun 2003 16:35:17 -0700 Date: Tue, 24 Jun 2003 16:35:17 -0700 (PDT) Message-Id: <20030624.163517.15237390.davem@redhat.com> To: chas3@users.sourceforge.net, chas@cmf.nrl.navy.mil Cc: netdev@oss.sgi.com Subject: Re: [rfc] sk_write_space() for atm From: "David S. Miller" In-Reply-To: <200306241735.h5OHZCsG004098@ginger.cmf.nrl.navy.mil> References: <200306241735.h5OHZCsG004098@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3480 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: chas williams
   Date: Tue, 24 Jun 2003 13:33:05 -0400

   i am thinking about the following for the atm protocol. the writable for atm has always been when you have enough space to send the next pdu. i suppose this should be preserved to be completely compat, but it might not be the best choice.

   poll is interesting also. it seems to me that vcc->reply should be at least copied, since it could change during the poll function (or so i imagine). it's probably a better idea to just change WAITING to be a bit inside vcc->flags and remove vcc->error in favor of sk->sk_err.

You can achieve what you want with 'reply' via:

	reply = vcc->reply;
	barrier();

The code you have there will merely make gcc go to the stack for 'reply' every time it is used, and that's obviously not what you want; you want a single snapshot of vcc->reply.

This doesn't guarantee anything. If you want to test multiple pieces of state and make a decision based upon a snapshot of them, you must do some more serious locking (such as lock_sock()) in the poll function.

From davem@redhat.com Tue Jun 24 17:46:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 17:46:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5P0ke2x022061 for ; Tue, 24 Jun 2003 17:46:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA30316; Tue, 24 Jun 2003 17:40:42 -0700 Date: Tue, 24 Jun 2003 17:40:42 -0700 (PDT) Message-Id: <20030624.174042.27809957.davem@redhat.com> To: chas3@users.sourceforge.net, chas@cmf.nrl.navy.mil Cc: netdev@oss.sgi.com Subject: Re: [rfc] sk_write_space() for atm From: "David S. Miller" In-Reply-To: <200306250032.h5P0WJsG010081@ginger.cmf.nrl.navy.mil> References: <20030624.163517.15237390.davem@redhat.com> <200306250032.h5P0WJsG010081@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3481 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: chas williams
   Date: Tue, 24 Jun 2003 20:30:12 -0400

   comments?

This looks ok to me, but I am not well versed in this area. For example, if you give a spurious wakeup via poll() what can happen?
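The single-snapshot idea from this thread (read vcc->reply once into a local, then barrier()) can be sketched in userspace as follows. This is a minimal illustration, not the poll function from the patch: demo_vcc_reply, DEMO_WAITING and the DEMO_POLL* values are hypothetical stand-ins, and barrier() is defined here the way I believe the kernel defines it for gcc, as a compiler-only barrier.

```c
/* Compiler-only barrier: tells gcc that memory may have changed. */
#define barrier() __asm__ __volatile__("" : : : "memory")

#define DEMO_WAITING	1	/* hypothetical stand-in for WAITING */
#define DEMO_POLLOUT	0x04	/* hypothetical mask bits */
#define DEMO_POLLERR	0x08

/* Hypothetical stand-in for the one atm_vcc field of interest. */
struct demo_vcc_reply {
	int reply;	/* 0: ok, DEMO_WAITING: pending, other: error */
};

unsigned int demo_poll_mask(struct demo_vcc_reply *vcc, int connecting)
{
	unsigned int mask = 0;
	int reply;

	/*
	 * Take one snapshot. After the barrier the compiler cannot assume
	 * vcc->reply still holds this value, so every test below uses the
	 * same local copy instead of re-reading the shared field.
	 */
	reply = vcc->reply;
	barrier();

	/* exceptional events */
	if (reply && reply != DEMO_WAITING)
		mask |= DEMO_POLLERR;

	/* still connecting and no answer yet: not writable */
	if (connecting && reply == DEMO_WAITING)
		return mask;

	return mask | DEMO_POLLOUT;
}
```

As noted in the thread, this only makes the tests on 'reply' consistent with each other. It says nothing about other state, and the mask can already be stale when the caller acts on it, which is one face of the spurious wakeup question. A caller doing non-blocking I/O would typically just see EAGAIN and poll again.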
From chas@locutus.cmf.nrl.navy.mil Tue Jun 24 18:45:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 18:46:05 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5P1js2x022709 for ; Tue, 24 Jun 2003 18:45:54 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5P0WJsG010081; Tue, 24 Jun 2003 20:32:19 -0400 (EDT) Message-Id: <200306250032.h5P0WJsG010081@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: netdev@oss.sgi.com Reply-To: chas3@users.sourceforge.net Subject: Re: [rfc] sk_write_space() for atm In-reply-to: Your message of "Tue, 24 Jun 2003 16:35:17 PDT." <20030624.163517.15237390.davem@redhat.com> Date: Tue, 24 Jun 2003 20:30:12 -0400 From: chas williams X-Spam-Score: () hits=-2.9 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3482 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev

In message <20030624.163517.15237390.davem@redhat.com>,"David S. Miller" writes:
>You can achieve what you want with 'reply' via:
>	reply = vcc->reply;
>	barrier();

ah.

>This doesn't guarantee anything, if you want to test multiple
>pieces of state and make a decision based upon a snapshot of
>them you must do some more serious locking (such as lock_sock())
>in the poll function.

yeah, i figured the next step up was lock but i would rather just avoid that. the confusion seems to be that vcc->reply is used to hold the socket return code and indicate the waiting state. since this isn't compatible with sk->sk_err (the likely replacement for vcc->reply) i just went ahead and made a flag bit ATM_VF_WAITING and changed vcc->reply to sk->sk_err.
it's not quite complete. a couple of places (recvmsg/sendmsg) should just return EPIPE instead of sk_err, and the occurrences of error = -sk->sk_err should be error = sock_errno(sk) (or so i think). comments?

===== drivers/atm/atmtcp.c 1.12 vs edited ===== --- 1.12/drivers/atm/atmtcp.c Mon Jun 23 10:45:54 2003 +++ edited/drivers/atm/atmtcp.c Tue Jun 24 15:01:21 2003 @@ -90,7 +90,7 @@ vcc->vpi = msg->addr.sap_addr.vpi; vcc->vci = msg->addr.sap_addr.vci; vcc->qos = msg->qos; - vcc->reply = msg->result; + vcc->sk->sk_err = -msg->result; switch (msg->type) { case ATMTCP_CTRL_OPEN: change_bit(ATM_VF_READY,&vcc->flags); @@ -134,7 +134,7 @@ clear_bit(ATM_VF_READY,&vcc->flags); /* just in case ... */ error = atmtcp_send_control(vcc,ATMTCP_CTRL_OPEN,&msg,ATM_VF_READY); if (error) return error; - return vcc->reply; + return -vcc->sk->sk_err; } ===== include/linux/atmdev.h 1.21 vs edited ===== --- 1.21/include/linux/atmdev.h Mon Jun 23 10:45:54 2003 +++ edited/include/linux/atmdev.h Tue Jun 24 20:25:39 2003 @@ -252,6 +252,7 @@ ATM_VF_SESSION, /* VCC is p2mp session control descriptor */ ATM_VF_HASSAP, /* SAP has been set */ ATM_VF_CLOSE, /* asynchronous close - treat like VF_RELEASED*/ + ATM_VF_WAITING, /* waiting for reply from sigd */ }; @@ -296,7 +297,6 @@ short itf; /* interface number */ struct sockaddr_atmsvc local; struct sockaddr_atmsvc remote; - int reply; /* also used by ATMTCP */ /* Multipoint part ------------------------------------------------- */ struct atm_vcc *session; /* session VCC descriptor */ /* Other stuff ----------------------------------------------------- */ ===== net/atm/common.c 1.41 vs edited ===== --- 1.41/net/atm/common.c Mon Jun 23 10:57:01 2003 +++ edited/net/atm/common.c Tue Jun 24 20:25:39 2003 @@ -243,6 +243,29 @@ wake_up(sk->sk_sleep); read_unlock(&sk->sk_callback_lock); } + +static inline int vcc_writable(struct sock *sk) +{ + struct atm_vcc *vcc = atm_sk(sk); + + return (vcc->qos.txtp.max_sdu + + atomic_read(&sk->sk_wmem_alloc)) <=
sk->sk_sndbuf; +} + +static void vcc_write_space(struct sock *sk) +{ + read_lock(&sk->sk_callback_lock); + + if (vcc_writable(sk)) { + if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) + wake_up_interruptible(sk->sk_sleep); + + sk_wake_async(sk, 2, POLL_OUT); + } + + read_unlock(&sk->sk_callback_lock); +} + int vcc_create(struct socket *sock, int protocol, int family) { @@ -257,6 +280,7 @@ return -ENOMEM; sock_init_data(sock, sk); sk->sk_state_change = vcc_def_wakeup; + sk->sk_write_space = vcc_write_space; vcc = atm_sk(sk) = kmalloc(sizeof(*vcc), GFP_KERNEL); if (!vcc) { @@ -326,8 +350,8 @@ void vcc_release_async(struct atm_vcc *vcc, int reply) { set_bit(ATM_VF_CLOSE, &vcc->flags); - vcc->reply = reply; vcc->sk->sk_err = -reply; + clear_bit(ATM_VF_WAITING, &vcc->flags); vcc->sk->sk_state_change(vcc->sk); } @@ -501,7 +525,7 @@ vcc = ATM_SD(sock); if (test_bit(ATM_VF_RELEASED,&vcc->flags) || test_bit(ATM_VF_CLOSE,&vcc->flags)) - return vcc->reply; + return -sk->sk_err; if (!test_bit(ATM_VF_READY, &vcc->flags)) return 0; @@ -558,7 +582,7 @@ vcc = ATM_SD(sock); if (test_bit(ATM_VF_RELEASED, &vcc->flags) || test_bit(ATM_VF_CLOSE, &vcc->flags)) { - error = vcc->reply; + error = -sk->sk_err; goto out; } if (!test_bit(ATM_VF_READY, &vcc->flags)) { @@ -589,7 +613,7 @@ } if (test_bit(ATM_VF_RELEASED,&vcc->flags) || test_bit(ATM_VF_CLOSE,&vcc->flags)) { - error = vcc->reply; + error = -sk->sk_err; break; } if (!test_bit(ATM_VF_READY,&vcc->flags)) { @@ -617,29 +641,38 @@ } -unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait) +unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait) { + struct sock *sk = sock->sk; struct atm_vcc *vcc; unsigned int mask; - vcc = ATM_SD(sock); - poll_wait(file, vcc->sk->sk_sleep, wait); + poll_wait(file, sk->sk_sleep, wait); mask = 0; - if (skb_peek(&vcc->sk->sk_receive_queue)) - mask |= POLLIN | POLLRDNORM; - if (test_bit(ATM_VF_RELEASED,&vcc->flags) || - test_bit(ATM_VF_CLOSE,&vcc->flags)) + + 
vcc = ATM_SD(sock); + + /* exceptional events */ + if (sk->sk_err) + mask = POLLERR; + + if (test_bit(ATM_VF_RELEASED, &vcc->flags) || + test_bit(ATM_VF_CLOSE, &vcc->flags)) mask |= POLLHUP; - if (sock->state != SS_CONNECTING) { - if (vcc->qos.txtp.traffic_class != ATM_NONE && - vcc->qos.txtp.max_sdu + - atomic_read(&vcc->sk->sk_wmem_alloc) <= vcc->sk->sk_sndbuf) - mask |= POLLOUT | POLLWRNORM; - } - else if (vcc->reply != WAITING) { - mask |= POLLOUT | POLLWRNORM; - if (vcc->reply) mask |= POLLERR; - } + + /* readable? */ + if (!skb_queue_empty(&sk->sk_receive_queue)) + mask |= POLLIN | POLLRDNORM; + + /* writable? */ + if (sock->state == SS_CONNECTING && + test_bit(ATM_VF_WAITING, &vcc->flags)) + return mask; + + if (vcc->qos.txtp.traffic_class != ATM_NONE && + vcc_writable(vcc->sk)) + mask |= POLLOUT | POLLWRNORM | POLLWRBAND; + return mask; } ===== net/atm/common.h 1.15 vs edited ===== --- 1.15/net/atm/common.h Mon Jun 23 10:51:10 2003 +++ edited/net/atm/common.h Tue Jun 24 20:25:03 2003 @@ -17,7 +17,7 @@ int size, int flags); int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, int total_len); -unsigned int atm_poll(struct file *file,struct socket *sock,poll_table *wait); +unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait); int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); int vcc_setsockopt(struct socket *sock, int level, int optname, char *optval, int optlen); ===== net/atm/proc.c 1.21 vs edited ===== --- 1.21/net/atm/proc.c Fri Jun 20 17:33:01 2003 +++ edited/net/atm/proc.c Tue Jun 24 20:25:41 2003 @@ -224,7 +224,7 @@ here += sprintf(here, "%3d", vcc->sk->sk_family); } here += sprintf(here," %04lx %5d %7d/%7d %7d/%7d\n",vcc->flags, - vcc->reply, + vcc->sk->sk_err, atomic_read(&vcc->sk->sk_wmem_alloc), vcc->sk->sk_sndbuf, atomic_read(&vcc->sk->sk_rmem_alloc), vcc->sk->sk_rcvbuf); } ===== net/atm/pvc.c 1.17 vs edited ===== --- 1.17/net/atm/pvc.c Fri Jun 20 17:33:02 2003 +++ 
edited/net/atm/pvc.c Tue Jun 24 20:25:04 2003 @@ -111,7 +111,7 @@ .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = pvc_getname, - .poll = atm_poll, + .poll = vcc_poll, .ioctl = vcc_ioctl, .listen = sock_no_listen, .shutdown = pvc_shutdown, ===== net/atm/raw.c 1.6 vs edited ===== --- 1.6/net/atm/raw.c Mon Jun 23 10:57:02 2003 +++ edited/net/atm/raw.c Tue Jun 24 20:25:05 2003 @@ -40,7 +40,7 @@ skb->truesize); atomic_sub(skb->truesize, &vcc->sk->sk_wmem_alloc); dev_kfree_skb_any(skb); - wake_up(vcc->sk->sk_sleep); + vcc->sk->sk_write_space(vcc->sk); } ===== net/atm/signaling.c 1.19 vs edited ===== --- 1.19/net/atm/signaling.c Mon Jun 23 10:57:02 2003 +++ edited/net/atm/signaling.c Tue Jun 24 20:25:42 2003 @@ -105,7 +105,8 @@ vcc = *(struct atm_vcc **) &msg->vcc; switch (msg->type) { case as_okay: - vcc->reply = msg->reply; + vcc->sk->sk_err = -msg->reply; + clear_bit(ATM_VF_WAITING, &vcc->flags); if (!*vcc->local.sas_addr.prv && !*vcc->local.sas_addr.pub) { vcc->local.sas_family = AF_ATMSVC; @@ -125,8 +126,8 @@ case as_error: clear_bit(ATM_VF_REGIS,&vcc->flags); clear_bit(ATM_VF_READY,&vcc->flags); - vcc->reply = msg->reply; vcc->sk->sk_err = -msg->reply; + clear_bit(ATM_VF_WAITING, &vcc->flags); break; case as_indicate: vcc = *(struct atm_vcc **) &msg->listen_vcc; @@ -147,8 +148,8 @@ case as_close: set_bit(ATM_VF_RELEASED,&vcc->flags); clear_bit(ATM_VF_READY,&vcc->flags); - vcc->reply = msg->reply; vcc->sk->sk_err = -msg->reply; + clear_bit(ATM_VF_WAITING, &vcc->flags); break; case as_modify: modify_qos(vcc,msg); @@ -204,8 +205,8 @@ if (vcc->sk->sk_family == PF_ATMSVC && !test_bit(ATM_VF_META,&vcc->flags)) { set_bit(ATM_VF_RELEASED,&vcc->flags); - vcc->reply = -EUNATCH; vcc->sk->sk_err = EUNATCH; + clear_bit(ATM_VF_WAITING, &vcc->flags); vcc->sk->sk_state_change(vcc->sk); } } ===== net/atm/signaling.h 1.1 vs edited ===== --- 1.1/net/atm/signaling.h Tue Feb 5 12:40:00 2002 +++ edited/net/atm/signaling.h Tue Jun 24 14:24:41 2003 @@ -11,9 +11,6 @@ 
#include -#define WAITING 1 /* for reply: 0: no error, < 0: error, ... */ - - extern struct atm_vcc *sigd; /* needed in svc_release */ ===== net/atm/svc.c 1.21 vs edited ===== --- 1.21/net/atm/svc.c Mon Jun 23 10:45:54 2003 +++ edited/net/atm/svc.c Tue Jun 24 20:25:46 2003 @@ -137,10 +137,10 @@ goto out; } vcc->local = *addr; - vcc->reply = WAITING; + set_bit(ATM_VF_WAITING, &vcc->flags); prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_bind,NULL,NULL,&vcc->local); - while (vcc->reply == WAITING && sigd) { + while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) { schedule(); prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } @@ -150,9 +150,9 @@ error = -EUNATCH; goto out; } - if (!vcc->reply) + if (!sk->sk_err) set_bit(ATM_VF_BOUND,&vcc->flags); - error = vcc->reply; + error = -sk->sk_err; out: release_sock(sk); return error; @@ -183,13 +183,13 @@ error = -EISCONN; goto out; case SS_CONNECTING: - if (vcc->reply == WAITING) { + if (test_bit(ATM_VF_WAITING, &vcc->flags)) { error = -EALREADY; goto out; } sock->state = SS_UNCONNECTED; - if (vcc->reply) { - error = vcc->reply; + if (sk->sk_err) { + error = -sk->sk_err; goto out; } break; @@ -218,7 +218,7 @@ goto out; } vcc->remote = *addr; - vcc->reply = WAITING; + set_bit(ATM_VF_WAITING, &vcc->flags); prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); sigd_enq(vcc,as_connect,NULL,NULL,&vcc->remote); if (flags & O_NONBLOCK) { @@ -228,7 +228,7 @@ goto out; } error = 0; - while (vcc->reply == WAITING && sigd) { + while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) { schedule(); if (!signal_pending(current)) { prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); @@ -248,11 +248,11 @@ * Kernel <--close--- Demon */ sigd_enq(vcc,as_close,NULL,NULL,NULL); - while (vcc->reply == WAITING && sigd) { + while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) { prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); schedule(); } - if (!vcc->reply) + if (!sk->sk_err) while 
(!test_bit(ATM_VF_RELEASED,&vcc->flags) && sigd) { prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); @@ -272,8 +272,8 @@ error = -EUNATCH; goto out; } - if (vcc->reply) { - error = vcc->reply; + if (sk->sk_err) { + error = -sk->sk_err; goto out; } } @@ -311,10 +311,10 @@ error = -EINVAL; goto out; } - vcc->reply = WAITING; + set_bit(ATM_VF_WAITING, &vcc->flags); prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(vcc,as_listen,NULL,NULL,&vcc->local); - while (vcc->reply == WAITING && sigd) { + while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) { schedule(); prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } @@ -326,7 +326,7 @@ set_bit(ATM_VF_LISTEN,&vcc->flags); vcc->sk->sk_max_ack_backlog = backlog > 0 ? backlog : ATM_BACKLOG_DEFAULT; - error = vcc->reply; + error = -sk->sk_err; out: release_sock(sk); return error; @@ -359,7 +359,7 @@ sigd) { if (test_bit(ATM_VF_RELEASED,&old_vcc->flags)) break; if (test_bit(ATM_VF_CLOSE,&old_vcc->flags)) { - error = old_vcc->reply; + error = -sk->sk_err; break; } if (flags & O_NONBLOCK) { @@ -399,10 +399,10 @@ goto out; } /* wait should be short, so we ignore the non-blocking flag */ - new_vcc->reply = WAITING; + set_bit(ATM_VF_WAITING, &new_vcc->flags); prepare_to_wait(new_vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); sigd_enq(new_vcc,as_accept,old_vcc,NULL,NULL); - while (new_vcc->reply == WAITING && sigd) { + while (test_bit(ATM_VF_WAITING, &new_vcc->flags) && sigd) { release_sock(sk); schedule(); lock_sock(sk); @@ -413,9 +413,10 @@ error = -EUNATCH; goto out; } - if (!new_vcc->reply) break; - if (new_vcc->reply != -ERESTARTSYS) { - error = new_vcc->reply; + if (!new_vcc->sk->sk_err) + break; + if (new_vcc->sk->sk_err != ERESTARTSYS) { + error = -new_vcc->sk->sk_err; goto out; } } @@ -443,17 +444,17 @@ { DEFINE_WAIT(wait); - vcc->reply = WAITING; + set_bit(ATM_VF_WAITING, &vcc->flags); prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); 
sigd_enq2(vcc,as_modify,NULL,NULL,&vcc->local,qos,0); - while (vcc->reply == WAITING && !test_bit(ATM_VF_RELEASED,&vcc->flags) - && sigd) { + while (test_bit(ATM_VF_WAITING, &vcc->flags) && + !test_bit(ATM_VF_RELEASED, &vcc->flags) && sigd) { schedule(); prepare_to_wait(vcc->sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE); } finish_wait(vcc->sk->sk_sleep, &wait); if (!sigd) return -EUNATCH; - return vcc->reply; + return -vcc->sk->sk_err; } @@ -519,7 +520,7 @@ .socketpair = sock_no_socketpair, .accept = svc_accept, .getname = svc_getname, - .poll = atm_poll, + .poll = vcc_poll, .ioctl = vcc_ioctl, .listen = svc_listen, .shutdown = svc_shutdown, From chas@locutus.cmf.nrl.navy.mil Tue Jun 24 19:34:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 24 Jun 2003 19:34:09 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5P2Xx2x023341 for ; Tue, 24 Jun 2003 19:34:00 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5P2XtsG011090; Tue, 24 Jun 2003 22:33:55 -0400 (EDT) Message-Id: <200306250233.h5P2XtsG011090@ginger.cmf.nrl.navy.mil> To: "David S. Miller" cc: netdev@oss.sgi.com Reply-To: chas3@users.sourceforge.net Subject: Re: [rfc] sk_write_space() for atm In-reply-to: Your message of "Tue, 24 Jun 2003 17:40:42 PDT." <20030624.174042.27809957.davem@redhat.com> Date: Tue, 24 Jun 2003 22:31:49 -0400 From: chas williams X-Spam-Score: () hits=-0.9 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang) X-archive-position: 3483 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chas@cmf.nrl.navy.mil Precedence: bulk X-list: netdev In message <20030624.174042.27809957.davem@redhat.com>,"David S. Miller" writes: >This looks ok to me, but I am not well versed in this >area. 
For example, if you give a spurious wakeup via >poll() what can happen? as far as i can tell this version of poll is fairly close to datagram_poll so therefore it must be correct :) i would say the previous version was racy since it needed to check vcc->reply repeatedly, which could change. now, sk_err is checked once and is the error state of the socket. the only reason i don't use datagram_poll (besides atm implementing a different writable condition) is that atm sockets that are waiting (in connecting) would block when written. this is similar to a tcp socket w/o syn having been sent. From rusty@samba.org Wed Jun 25 01:09:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 01:09:34 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5P89E2x026982 for ; Wed, 25 Jun 2003 01:09:15 -0700 Received: by lists.samba.org (Postfix, from userid 590) id 529AF2C0B9; Wed, 25 Jun 2003 07:26:02 +0000 (GMT) From: Rusty Russell To: davem@redhat.com, paulus@samba.org Cc: netdev@oss.sgi.com Subject: [PATCH, untested] Support for PPPOE on SMP Date: Wed, 25 Jun 2003 17:24:22 +1000 Message-Id: <20030625072602.529AF2C0B9@lists.samba.org> X-archive-position: 3484 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rusty@rustcorp.com.au Precedence: bulk X-list: netdev Paul Mackerras says PPPoE relies on receiving packets in wire order, and he has bug reports caused by packet reordering. This is icky. Example code below: 1) Extract core queuing part of netif_rx into __netif_rx. 2) If the protocol requires serialization, packets are put on a global "serial" queue instead of the local queue. (Which protocols is currently hardcoded.) 3) One cpu (boot cpu as it happens) drains this serial queue, so it stays ordered. 4) Fix bug in cpu_raise_softirq: need to wake softirqd if it's a different cpu. 
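As a rough userspace model of steps 2) and 3) above (not the kernel code in the patch): packets whose protocol is on a hardcoded list go to one shared FIFO, and everything else goes to the receiving CPU's own queue. The names model_netif_rx, struct queue, NCPUS and QLEN are made up for illustration, and the lock the real code would need around the shared queue is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_P_PPP_DISC 0x8863   /* PPPoE discovery */
#define ETH_P_PPP_SES  0x8864   /* PPPoE session (payload) */
#define NCPUS 4                 /* hypothetical CPU count */
#define QLEN  16                /* hypothetical queue depth */

struct pkt {
	unsigned short protocol;
	int id;
};

/* Fixed-size FIFO standing in for a softnet input queue. */
struct queue {
	struct pkt slot[QLEN];
	size_t head, count;
};

static struct queue percpu_queue[NCPUS];
static struct queue serial_queue;   /* shared by all CPUs, drained by one */

static int queue_push(struct queue *q, struct pkt p)
{
	if (q->count == QLEN)
		return -1;              /* full: caller drops the packet */
	q->slot[(q->head + q->count) % QLEN] = p;
	q->count++;
	return 0;
}

static int queue_pop(struct queue *q, struct pkt *out)
{
	if (q->count == 0)
		return -1;
	*out = q->slot[q->head];
	q->head = (q->head + 1) % QLEN;
	q->count--;
	return 0;
}

/* The hardcoded list of protocols that must stay in wire order. */
static int needs_serialization(unsigned short protocol)
{
	return protocol == ETH_P_PPP_DISC || protocol == ETH_P_PPP_SES;
}

/* Returns 1 if queued on the serial queue, 0 if queued per-CPU, -1 if dropped. */
static int model_netif_rx(int cpu, unsigned short protocol, int id)
{
	struct pkt p = { protocol, id };
	struct queue *q;

	assert(cpu >= 0 && cpu < NCPUS);
	q = needs_serialization(protocol) ? &serial_queue : &percpu_queue[cpu];

	/* A real implementation would hold a lock here when q is serial_queue. */
	if (queue_push(q, p) < 0)
		return -1;
	return q == &serial_queue;
}
```

Since only one consumer drains serial_queue in this model, PPPoE packets leave it in the order they were queued, whichever CPU queued them.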
Another option would simply be to stamp a serialization number into the skb if the proto needs serialization, and drop packets if serial number goes backwards. But since this is actually happening to people, that would suck, too. I don't understand the unbalanced dev_put in net_rx_action(), BTW. Cheers, Rusty. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.72-bk2/kernel/softirq.c working-2.5.72-bk2-serial-protocols/kernel/softirq.c --- linux-2.5.72-bk2/kernel/softirq.c 2003-06-25 17:17:19.000000000 +1000 +++ working-2.5.72-bk2-serial-protocols/kernel/softirq.c 2003-06-25 14:55:15.000000000 +1000 @@ -130,7 +130,7 @@ inline void cpu_raise_softirq(unsigned i * Otherwise we wake up ksoftirqd to make sure we * schedule the softirq soon. */ - if (!in_interrupt()) + if (!in_interrupt() || cpu != smp_processor_id()) wakeup_softirqd(cpu); } diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.72-bk2/net/core/dev.c working-2.5.72-bk2-serial-protocols/net/core/dev.c --- linux-2.5.72-bk2/net/core/dev.c 2003-06-20 11:53:36.000000000 +1000 +++ working-2.5.72-bk2-serial-protocols/net/core/dev.c 2003-06-25 17:11:36.000000000 +1000 @@ -1323,42 +1323,11 @@ static void sample_queue(unsigned long d } #endif - -/** - * netif_rx - post buffer to the network code - * @skb: buffer to post - * - * This function receives a packet from a device driver and queues it for - * the upper (protocol) levels to process. It always succeeds. The buffer - * may be dropped during processing for congestion control or by the - * protocol layers. - * - * return values: - * NET_RX_SUCCESS (no congestion) - * NET_RX_CN_LOW (low congestion) - * NET_RX_CN_MOD (moderate congestion) - * NET_RX_CN_HIGH (high congestion) - * NET_RX_DROP (packet was dropped) - * - */ - -int netif_rx(struct sk_buff *skb) +/* Called with IRQs disabled. 
*/ +static inline int __netif_rx(int this_cpu, + struct softnet_data *queue, + struct sk_buff *skb) { - int this_cpu; - struct softnet_data *queue; - unsigned long flags; - - if (!skb->stamp.tv_sec) - do_gettimeofday(&skb->stamp); - - /* - * The code is rearranged so that the path is the most - * short when CPU is congested, but is still operating. - */ - local_irq_save(flags); - this_cpu = smp_processor_id(); - queue = &softnet_data[this_cpu]; - netdev_rx_stat[this_cpu].total++; if (queue->input_pkt_queue.qlen <= netdev_max_backlog) { if (queue->input_pkt_queue.qlen) { @@ -1371,7 +1340,6 @@ enqueue: #ifndef OFFLINE_SAMPLE get_sample_stats(this_cpu); #endif - local_irq_restore(flags); return queue->cng_level; } @@ -1397,12 +1365,116 @@ enqueue: drop: netdev_rx_stat[this_cpu].dropped++; - local_irq_restore(flags); kfree_skb(skb); return NET_RX_DROP; } +#ifdef CONFIG_SMP +/* Queue for serial protocols (eg PPPoe). All handled by one CPU. */ +static spinlock_t serial_queue_lock = SPIN_LOCK_UNLOCKED; +static struct softnet_data serial_queue; + +/* Which cpu does serial queue. 
*/ +static int serial_cpu; + +static inline int net_proto_serialize(struct sk_buff *skb, + int this_cpu, + int *ret) +{ + if (likely(skb->protocol != ETH_P_PPP_DISC + && skb->protocol != ETH_P_PPP_SES)) + return 0; + + spin_lock(&serial_queue_lock); + *ret = __netif_rx(this_cpu, &serial_queue, skb); + spin_unlock(&serial_queue_lock); + if (this_cpu != serial_cpu) + cpu_raise_softirq(serial_cpu, NET_RX_SOFTIRQ); + return 1; +} + +static void init_queue(struct softnet_data *queue); + +static void init_serial(void) +{ + init_queue(&serial_queue); + serial_cpu = smp_processor_id(); +} + +static inline void drain_serial_queue(int this_cpu) +{ + if (this_cpu != serial_cpu) + return; + + spin_lock(&serial_queue_lock); + while (!list_empty(&serial_queue.poll_list)) { + struct net_device *dev; + + dev = list_entry(serial_queue.poll_list.next, + struct net_device, poll_list); + + list_del(&dev->poll_list); + list_add_tail(&dev->poll_list, &serial_queue.poll_list); + } + spin_unlock(&serial_queue_lock); +} +#else +static inline int net_proto_serialize(struct sk_buff *skb, + int this_cpu, + int *ret) +{ + return 0; +} + +static void init_serial(void) +{ +} + +static inline void drain_serial_queue(int this_cpu) +{ +} +#endif /* CONFIG_SMP */ + +/** + * netif_rx - post buffer to the network code + * @skb: buffer to post + * + * This function receives a packet from a device driver and queues it for + * the upper (protocol) levels to process. It always succeeds. The buffer + * may be dropped during processing for congestion control or by the + * protocol layers. 
+ * + * return values: + * NET_RX_SUCCESS (no congestion) + * NET_RX_CN_LOW (low congestion) + * NET_RX_CN_MOD (moderate congestion) + * NET_RX_CN_HIGH (high congestion) + * NET_RX_DROP (packet was dropped) + * + */ + +int netif_rx(struct sk_buff *skb) +{ + int ret, this_cpu; + unsigned long flags; + + if (!skb->stamp.tv_sec) + do_gettimeofday(&skb->stamp); + + /* + * The code is rearranged so that the path is the most + * short when CPU is congested, but is still operating. + */ + local_irq_save(flags); + this_cpu = smp_processor_id(); + + if (!net_proto_serialize(skb, this_cpu, &ret)) + ret = __netif_rx(this_cpu, &softnet_data[this_cpu], skb); + local_irq_restore(flags); + return ret; +} + /* Deliver skb to an old protocol, which is not threaded well or which do not understand shared skbs. */ @@ -1705,6 +1777,8 @@ static void net_rx_action(struct softirq local_irq_disable(); } } + + drain_serial_queue(this_cpu); out: local_irq_enable(); preempt_enable(); @@ -2944,6 +3018,20 @@ int unregister_netdevice(struct net_devi } +static void init_queue(struct softnet_data *queue) +{ + skb_queue_head_init(&queue->input_pkt_queue); + queue->throttle = 0; + queue->cng_level = 0; + queue->avg_blog = 10; /* arbitrary non-zero */ + queue->completion_queue = NULL; + INIT_LIST_HEAD(&queue->poll_list); + set_bit(__LINK_STATE_START, &queue->backlog_dev.state); + queue->backlog_dev.weight = weight_p; + queue->backlog_dev.poll = process_backlog; + atomic_set(&queue->backlog_dev.refcnt, 1); +} + /* * Initialize the DEV module. At boot time this walks the device list and * unhooks any devices that fail to initialise (normally hardware not @@ -2976,21 +3064,9 @@ static int __init net_dev_init(void) * Initialise the packet receive queues. 
*/ - for (i = 0; i < NR_CPUS; i++) { - struct softnet_data *queue; - - queue = &softnet_data[i]; - skb_queue_head_init(&queue->input_pkt_queue); - queue->throttle = 0; - queue->cng_level = 0; - queue->avg_blog = 10; /* arbitrary non-zero */ - queue->completion_queue = NULL; - INIT_LIST_HEAD(&queue->poll_list); - set_bit(__LINK_STATE_START, &queue->backlog_dev.state); - queue->backlog_dev.weight = weight_p; - queue->backlog_dev.poll = process_backlog; - atomic_set(&queue->backlog_dev.refcnt, 1); - } + for (i = 0; i < NR_CPUS; i++) + init_queue(&softnet_data[i]); + init_serial(); #ifdef CONFIG_NET_PROFILE net_profile_init(); -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. From hadi@shell.cyberus.ca Wed Jun 25 04:20:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 04:20:38 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PBKU2x001474 for ; Wed, 25 Jun 2003 04:20:31 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19V8Jf-000Lso-4v; Wed, 25 Jun 2003 07:19:55 -0400 Date: Wed, 25 Jun 2003 07:19:55 -0400 (EDT) From: Jamal Hadi To: Rusty Russell cc: davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <20030625072602.529AF2C0B9@lists.samba.org> Message-ID: <20030625071439.O84062@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3485 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Wed, 25 Jun 2003, Rusty Russell wrote: > Paul Mackerras says PPPoE relies on receiving packets in wire order, > and he has bug reports caused by packet reordering. > I dont know of any ordering dependencies with pppoe. 
Is this a bug in the ppp code? > This is icky. Yes it is ;-> The effects of your patch could be achieved in two ways: a) tie the pppoe related ethernet card to a processor. b) use a NAPI capable ethernet card. Now, if there is a real need to have a serialization queue (i don't see one) you really don't need to tie it to a processor. Just have a single queue shared by all processors; everyone takes a lock on it. cheers, jamal From mostrows@watson.ibm.com Wed Jun 25 06:21:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 06:21:25 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PDLF2x003365 for ; Wed, 25 Jun 2003 06:21:16 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PDKNq90590; Wed, 25 Jun 2003 09:20:23 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PDL28151394; Wed, 25 Jun 2003 09:21:02 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id JAA40378; Wed, 25 Jun 2003 09:21:02 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: Rusty Russell Cc: "David S. Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, "David F. 
Skoll" , James Carlson In-Reply-To: <20030625072602.529AF2C0B9@lists.samba.org> References: <20030625072602.529AF2C0B9@lists.samba.org> Content-Type: text/plain Message-Id: <1056547262.1945.1436.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 25 Jun 2003 09:21:02 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3486 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev First some background for those new to this discussion (I was going to post the original discussion that started this to this list, but the summary here should get everyone up to speed). A user has observed a race condition where the last packet of PPPoE discovery arrives just before the first payload packet. The discovery packet carries the session id and pppd needs to take this session id and create a PPPoE socket which will then pick up all packets matching the given session id. The race is between the arrival of the first payload packet and pppd's creation of the socket that is to receive PPPoE payload. If the packet wins the race, the payload packet is lost. This problem was noticed only because the ISP in this case configured their systems to use a longer, non-standard (but legal) retransmit timeout thus causing noticeable delays in PPP negotiation. About the patch: Do we have any guarantees that no drivers will break this? From the few drivers I've looked at, this will not be a problem since they lock to ensure that we can't have races in submitting packets to netif_rx. My concern here would be that it appears that there is no explicit requirement that this be so; we may be safe in this regard only by accident. (I can think of a device and driver design where this need not be so.) 
> + > +static inline int net_proto_serialize(struct sk_buff *skb, > + int this_cpu, > + int *ret) > +{ > + if (likely(skb->protocol != ETH_P_PPP_DISC > + && skb->protocol != ETH_P_PPP_SES)) > + return 0; I believe there are concerns with other protocols as well (SNA, spanning tree - I'm just the messenger on this). If this is so, then I have two concerns: 1. Some protocols may have no in-kernel implementation, so we'd have to ensure that raw sockets get packets in the right order (perhaps even regardless of what packet type we receive). 2. There are two issues with PPPoE: there's the creation race described above which requires correct ordering of packets of two different packet types (discovery is 0x8863, payload is 0x8864), and payload packets must also be ordered to handle Paul's concerns regarding compression. The patch as is is adequate for 2), but I'm concerned it would get ugly if we need to do 1) (and in the process of doing 1) we may break 2) if we can't synchronize between two different packet types). I think we can fix the race condition I've described up top without such core infrastructure changes (delay dropping unmatched payload packets, give pppd a chance to make the socket). This however doesn't solve the other ordering problems. 
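A minimal userspace sketch of the "delay dropping unmatched payload packets" idea, under stated assumptions: payload packets for a session id with no socket yet are parked for a short grace period, and handed over in arrival order once the socket is bound. Nothing here is taken from the PPPoE driver; HOLD_MAX, HOLD_TICKS, rx_session_pkt, bind_session and expire_held are hypothetical names, time is an abstract tick counter, and an int stands in for the skb.

```c
#include <assert.h>
#include <stddef.h>

#define HOLD_MAX   8   /* hypothetical cap on parked packets */
#define HOLD_TICKS 5   /* hypothetical grace period, in abstract ticks */

struct held_pkt {
	unsigned short sid;      /* PPPoE session id */
	int payload;             /* stands in for the skb */
	unsigned long arrival;   /* tick at which the packet arrived */
	unsigned long seq;       /* arrival order, used when flushing */
	int in_use;
};

static struct held_pkt held[HOLD_MAX];
static unsigned char bound[65536];   /* bound[sid] != 0: socket exists */
static unsigned long next_seq;

/* Returns 1 if delivered to a bound socket, 0 if parked, -1 if dropped. */
static int rx_session_pkt(unsigned short sid, int payload, unsigned long now)
{
	size_t i;

	if (bound[sid])
		return 1;
	for (i = 0; i < HOLD_MAX; i++) {
		if (!held[i].in_use) {
			held[i].sid = sid;
			held[i].payload = payload;
			held[i].arrival = now;
			held[i].seq = next_seq++;
			held[i].in_use = 1;
			return 0;
		}
	}
	return -1;   /* holding area full */
}

/*
 * Called when the daemon has created the socket for sid. Copies up to
 * max parked payloads for that session into out, oldest first, and
 * returns how many were copied.
 */
static size_t bind_session(unsigned short sid, int *out, size_t max)
{
	size_t n = 0;

	bound[sid] = 1;
	while (n < max) {
		struct held_pkt *oldest = NULL;
		size_t i;

		for (i = 0; i < HOLD_MAX; i++) {
			if (held[i].in_use && held[i].sid == sid &&
			    (!oldest || held[i].seq < oldest->seq))
				oldest = &held[i];
		}
		if (!oldest)
			break;
		out[n++] = oldest->payload;
		oldest->in_use = 0;
	}
	return n;
}

/* Drops parked packets older than the grace period; returns the count. */
static size_t expire_held(unsigned long now)
{
	size_t i, dropped = 0;

	for (i = 0; i < HOLD_MAX; i++) {
		if (held[i].in_use && now - held[i].arrival > HOLD_TICKS) {
			held[i].in_use = 0;
			dropped++;
		}
	}
	return dropped;
}
```

This only addresses the socket-creation race; as noted above, it does nothing for reordering among payload packets themselves.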
-- Michal Ostrowski From mostrows@watson.ibm.com Wed Jun 25 06:42:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 06:42:47 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PDga2x003821 for ; Wed, 25 Jun 2003 06:42:37 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PDfjq201088; Wed, 25 Jun 2003 09:41:45 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PDgO895638; Wed, 25 Jun 2003 09:42:24 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id JAA58936; Wed, 25 Jun 2003 09:42:24 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: Rusty Russell Cc: "David S. Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, "David F. Skoll" , James Carlson In-Reply-To: <1056547262.1945.1436.camel@brick.watson.ibm.com> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> Content-Type: text/plain Message-Id: <1056548544.1944.1488.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 25 Jun 2003 09:42:24 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3487 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev Perhaps instead of using a special queue that keeps packets ordered, we add a tag to each skb as it comes off the card and let higher level protocols use this to re-order things themselves? (And add some option for AF_PACKET sockets to optionally enforce this ordering in presenting packets to apps, or not.) 
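One way the receiving side of such a tag could look, as a userspace sketch only: the driver is assumed to stamp each packet with a monotonically increasing sequence number (struct sk_buff has no such field today; the tag, struct reorder, reorder_rx and WINDOW are all hypothetical), and the protocol buffers packets that arrive early until the gap is filled.

```c
#include <assert.h>

#define WINDOW 8   /* hypothetical reorder window; must be a power of two */

struct reorder {
	unsigned int next;      /* next sequence number to deliver */
	int present[WINDOW];    /* slot holds a packet waiting for delivery */
	int payload[WINDOW];    /* stands in for the buffered skb */
};

/*
 * Accepts a packet tagged with seq. Any payloads that are now deliverable
 * are written to out (which must have room for WINDOW entries) in sequence
 * order. Returns the number delivered, or -1 if seq falls outside the
 * window (already delivered, or too far ahead) and the packet is rejected.
 */
static int reorder_rx(struct reorder *r, unsigned int seq, int payload, int *out)
{
	unsigned int off = seq - r->next;   /* unsigned math copes with wraparound */
	int n = 0;

	if (off >= WINDOW)
		return -1;

	r->present[seq % WINDOW] = 1;
	r->payload[seq % WINDOW] = payload;

	/* Deliver the contiguous run starting at r->next. */
	while (r->present[r->next % WINDOW]) {
		out[n++] = r->payload[r->next % WINDOW];
		r->present[r->next % WINDOW] = 0;
		r->next++;
	}
	return n;
}
```

A protocol that does not care about ordering would simply ignore the tag, which is the point of leaving the decision to higher-level code.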
This may require modifying all drivers, but it does provide for an explicit mechanism that can be made mandatory for drivers, avoids special casing, avoids dumping work onto a single CPU and leaves it up to the higher-level code to figure out ordering, if it wants to. -- Michal Ostrowski From hadi@shell.cyberus.ca Wed Jun 25 08:45:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 08:45:40 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PFjY2x005607 for ; Wed, 25 Jun 2003 08:45:35 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19VCSJ-000LzZ-LO; Wed, 25 Jun 2003 11:45:07 -0400 Date: Wed, 25 Jun 2003 11:45:07 -0400 (EDT) From: Jamal Hadi To: Michal Ostrowski cc: Rusty Russell , "David S. Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, "David F. Skoll" , James Carlson Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <1056548544.1944.1488.camel@brick.watson.ibm.com> Message-ID: <20030625114243.F84526@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <1056548544.1944.1488.camel@brick.watson.ibm.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3488 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev Have you tested the case where the ethernet card is tied to only one CPU in SMP? That guarantees ordering. Ordering per protocol should really be that protocol's problem to solve. If you can't solve it you have a bug. cheers, jamal On Wed, 25 Jun 2003, Michal Ostrowski wrote: > > Perhaps instead of using a special queue that keeps packets ordered, we > add a tag to each skb as it comes off the card and let higher level > protocols use this to re-order things themselves? 
(And add some option > for AF_PACKET sockets to optionally enforce this ordering in presenting > packets to apps, or not.) > > This may require modifying all drivers, but it does provide for an > explicit mechanism that can be made mandatory for drivers, avoids > special casing, avoids dumping work onto a single CPU and leaves it up > to the higher-level code to figure out ordering, if it wants to. > > -- > Michal Ostrowski > > > > From shemminger@osdl.org Wed Jun 25 09:15:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 09:16:02 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PGFp2x006302 for ; Wed, 25 Jun 2003 09:15:51 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PGFVq25963; Wed, 25 Jun 2003 09:15:31 -0700 Date: Wed, 25 Jun 2003 09:15:31 -0700 From: Stephen Hemminger To: Michal Ostrowski Cc: rusty@rustcorp.com.au, davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP Message-Id: <20030625091531.5ebed618.shemminger@osdl.org> In-Reply-To: <1056547262.1945.1436.camel@brick.watson.ibm.com> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3489 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On 25 Jun 2003 09:21:02 -0400 Michal 
Ostrowski wrote:

> First some background for those new to this discussion (I was going to post
> the original discussion that started this to this list, but the summary
> here should get everyone up to speed).
>
> A user has observed a race condition where the last packet of PPPoE
> discovery arrives just before the first payload packet. The discovery
> packet carries the session id and pppd needs to take this session id and
> create a PPPoE socket which will then pick up all packets matching the
> given session id. The race is between the arrival of the first payload
> packet and pppd's creation of the socket that is to receive PPPoE
> payload. If the packet wins the race, the payload packet is lost. This
> problem was noticed only because the ISP in this case configured their
> systems to use a longer, non-standard (but legal) retransmit timeout
> thus causing noticeable delays in PPP negotiation.
>

Also, you only need the ordering dependency until the session is set up,
not after it is established. Imagine a large ISP with many PPPoE sessions;
it makes no sense to serialize traffic just for this session establishment
case.

In the long run, the right answer probably is to push the session management
out of the daemon and into the kernel. Today the PPPoE code in the kernel
is only half-brained, it needs pppd to survive.
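
The race summarized in the quoted text above can be modeled in a few lines of
user-space C. This is only a toy sketch, not the pppoe driver: the session
table, bind_session() and deliver_payload() are hypothetical names invented
for illustration, and it assumes the receive path simply drops payload whose
session id has no bound socket yet.

```c
/* Toy user-space model of the PPPoE session setup race; not kernel code.
 * Every name below is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 8

struct session_table {
	uint16_t ids[MAX_SESSIONS];	/* session ids that have a bound socket */
	size_t   count;
	unsigned delivered;		/* payload packets handed to a socket */
	unsigned dropped;		/* payload packets with no matching socket */
};

/* Stands in for pppd creating the PPPoE socket after discovery completes. */
static bool bind_session(struct session_table *t, uint16_t sid)
{
	assert(t != NULL);
	if (t->count == MAX_SESSIONS)
		return false;
	t->ids[t->count++] = sid;
	return true;
}

/* Stands in for the receive path: deliver only if a socket already matches. */
static bool deliver_payload(struct session_table *t, uint16_t sid)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] == sid) {
			t->delivered++;
			return true;
		}
	}
	t->dropped++;
	return false;
}
```

Calling deliver_payload() for a session id before bind_session() has run for
that id counts a drop, which is the "packet wins the race" case; the same call
after bind_session() is delivered. Only the relative order of those two calls
matters, which is why ordering is needed only until the session is set up.
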
From hadi@shell.cyberus.ca Wed Jun 25 09:23:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 09:23:11 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PGN62x006720 for ; Wed, 25 Jun 2003 09:23:07 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19VD2Z-000M0Y-GK; Wed, 25 Jun 2003 12:22:35 -0400 Date: Wed, 25 Jun 2003 12:22:35 -0400 (EDT) From: Jamal Hadi To: Stephen Hemminger cc: Michal Ostrowski , rusty@rustcorp.com.au, davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <20030625091531.5ebed618.shemminger@osdl.org> Message-ID: <20030625122128.V84526@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3490 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Wed, 25 Jun 2003, Stephen Hemminger wrote: > In the long run, the right answer probably is to push the session management > out of the daemon and into the kernel. Today the PPPoE code in the kernel > is only half-brained, it needs pppd to survive. > I would think pppd is the half-brained portion ;-> Placing control protocols in the kernel is plain wrong. 
cheers, jamal From shemminger@osdl.org Wed Jun 25 09:32:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 09:32:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PGWY2x007103 for ; Wed, 25 Jun 2003 09:32:34 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PGWHq30136; Wed, 25 Jun 2003 09:32:17 -0700 Date: Wed, 25 Jun 2003 09:32:16 -0700 From: Stephen Hemminger To: David Stevens , "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.72] Igmp w/o linearize. Message-Id: <20030625093216.57d0f586.shemminger@osdl.org> In-Reply-To: References: Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3491 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Here is the updated igmp patch, thanks to dlstevens for testing this. # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas:
#	           ChangeSet	1.1386  -> 1.1387
#	     net/ipv4/igmp.c	1.26    -> 1.27
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/25	shemminger@osdl.org	1.1387
# IGMP no linearize
# --------------------------------------------
#
diff -Nru a/net/ipv4/igmp.c b/net/ipv4/igmp.c
--- a/net/ipv4/igmp.c	Wed Jun 25 09:27:47 2003
+++ b/net/ipv4/igmp.c	Wed Jun 25 09:27:47 2003
@@ -757,9 +757,10 @@
 	read_unlock(&in_dev->lock);
 }
 
-static void igmp_heard_query(struct in_device *in_dev, struct igmphdr *ih,
+static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 	int len)
 {
+	struct igmphdr 		*ih = skb->h.igmph;
 	struct igmpv3_query *ih3 = (struct igmpv3_query *)ih;
 	struct ip_mc_list	*im;
 	u32			group = ih->group;
@@ -790,6 +791,17 @@
 	} else if (len < 12) {
 		return;	/* ignore bogus packet; freed by caller */
 	} else { /* v3 */
+		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
+			return;
+
+		ih3 = (struct igmpv3_query *) skb->h.raw;
+		if (ih3->nsrcs) {
+			if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)
+					   + ntohs(ih3->nsrcs)*sizeof(__u32)))
+				return;
+			ih3 = (struct igmpv3_query *) skb->h.raw;
+		}
+
 		max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE);
 		if (!max_delay)
 			max_delay = 1;	/* can't mod w/ 0 */
@@ -838,7 +850,7 @@
 int igmp_rcv(struct sk_buff *skb)
 {
 	/* This basically follows the spec line by line -- see RFC1112 */
-	struct igmphdr *ih = skb->h.igmph;
+	struct igmphdr *ih;
 	struct in_device *in_dev = in_dev_get(skb->dev);
 	int len = skb->len;
 
@@ -847,23 +859,17 @@
 		return 0;
 	}
 
-	if (skb_is_nonlinear(skb)) {
-		if (skb_linearize(skb, GFP_ATOMIC) != 0) {
-			kfree_skb(skb);
-			return -ENOMEM;
-		}
-		ih = skb->h.igmph;
-	}
-
-	if (len < sizeof(struct igmphdr) || ip_compute_csum((void *)ih, len)) {
+	if (!pskb_may_pull(skb, sizeof(struct igmphdr)) ||
+	    (u16)csum_fold(skb_checksum(skb, 0, len, 0))) {
 		in_dev_put(in_dev);
 		kfree_skb(skb);
 		return 0;
 	}
 
+	ih = skb->h.igmph;
 	switch (ih->type) {
case IGMP_HOST_MEMBERSHIP_QUERY: - igmp_heard_query(in_dev, ih, len); + igmp_heard_query(in_dev, skb, len); break; case IGMP_HOST_MEMBERSHIP_REPORT: case IGMPV2_HOST_MEMBERSHIP_REPORT: From shemminger@osdl.org Wed Jun 25 09:40:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 09:40:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PGeX2x007452 for ; Wed, 25 Jun 2003 09:40:34 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PGd2q31283; Wed, 25 Jun 2003 09:39:02 -0700 Date: Wed, 25 Jun 2003 09:39:02 -0700 From: Stephen Hemminger To: Jamal Hadi Cc: mostrows@watson.ibm.com, rusty@rustcorp.com.au, davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP Message-Id: <20030625093902.7431efc3.shemminger@osdl.org> In-Reply-To: <20030625122128.V84526@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3492 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Wed, 25 Jun 2003 12:22:35 -0400 (EDT) Jamal Hadi wrote: > > > On Wed, 25 Jun 2003, Stephen Hemminger wrote: > > > In the long run, the right answer probably is to push the session management 
> > out of the daemon and into the kernel. Today the PPPoE code in the kernel > > is only half-brained, it needs pppd to survive. > > > > I would think pppd is the half-brained portion ;-> > > Placing control protocols in the kernel is plain wrong. What about arp, TCP, IP, routing protocols. The problem is that state management needs to be done in one place. From krkumar@us.ibm.com Wed Jun 25 10:03:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 10:03:31 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PH3L2x007999 for ; Wed, 25 Jun 2003 10:03:21 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5PH3Ei8020640; Wed, 25 Jun 2003 13:03:14 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5PH37Lx165080; Wed, 25 Jun 2003 13:03:12 -0400 Message-ID: <3EF9D5C2.5080101@us.ibm.com> Date: Wed, 25 Jun 2003 10:02:58 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: yoshfuji@linux-ipv6.org CC: "David S. Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) References: <3EF37458.3070103@us.ibm.com> <20030621.233634.67057417.yoshfuji@linux-ipv6.org> In-Reply-To: <20030621.233634.67057417.yoshfuji@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3493 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Hi Yoshfuji, > Rename inet6_dump_fib() to __inet6_dump_fib() and introduce OK, done. 
> Hmm, what I expected is to get information via RTA_NEWLINK message.

I have made changes to return per-interface flags; however, I am not very
familiar with netlink and its different interfaces. I wanted to clarify
whether the following code is what you are trying to get done. Otherwise
please let me know what changes need to be done. What I want to happen is:

1. Return entire prefix list on request from user.
2. Return flags for a particular interface on request from user.

What I have not yet done is to broadcast events when a new prefix arrives or
an existing prefix expires, and to broadcast flags when an RA is received.
Currently there is no need for these from DHCP, but they could be added
later on.

>> #devices  #iteration for each dev  plist on IDEV  plist in RTTABLE  %
>> 200       100                      3.95 secs      40.14 secs        916%

> Well, what should we do...

The original code, though faster, needs more code, as Dave said in his
initial mail. I think this operation is not done often enough to be concerned
about performance. Besides, it is taking 40 secs to iterate 20,000 times over
a 4K routing table, effectively about 2ms for getting the prefix list for one
device. And for most systems, this is not an issue since this code will not
run at all.

Thanks,

- KK

diff -ruN linux-2.5.70.org/include/linux/ipv6_route.h linux-2.5.70.new/include/linux/ipv6_route.h
--- linux-2.5.70.org/include/linux/ipv6_route.h	2003-05-26 18:00:25.000000000 -0700
+++ linux-2.5.70.new/include/linux/ipv6_route.h	2003-06-24 04:36:39.000000000 -0700
@@ -44,4 +44,19 @@
 #define RTMSG_NEWROUTE	0x21
 #define RTMSG_DELROUTE	0x22
 
+#ifdef CONFIG_IPV6_PREFIXLIST
+
+/*
+ * Return entire prefix list in array of following structures. Provides the
+ * prefix and prefix length for all devices.
+ */ + +struct in6_prefix_msg +{ + int ifindex; + int prefix_len; + struct in6_addr prefix; +}; +#endif + #endif diff -ruN linux-2.5.70.org/include/linux/rtnetlink.h linux-2.5.70.new/include/linux/rtnetlink.h --- linux-2.5.70.org/include/linux/rtnetlink.h 2003-05-26 18:00:46.000000000 -0700 +++ linux-2.5.70.new/include/linux/rtnetlink.h 2003-06-24 04:39:59.000000000 -0700 @@ -47,7 +47,14 @@ #define RTM_DELTFILTER (RTM_BASE+29) #define RTM_GETTFILTER (RTM_BASE+30) -#define RTM_MAX (RTM_BASE+31) +#define RTM_GETLNKFLAGS (RTM_BASE+34) + +#ifndef CONFIG_IPV6_PREFIXLIST +#define RTM_MAX (RTM_GETLNKFLAGS+1) +#else +#define RTM_GETPLIST (RTM_BASE+38) +#define RTM_MAX (RTM_GETPLIST+1) +#endif /* Generic structure for encapsulation optional route information. @@ -61,6 +68,14 @@ unsigned short rta_type; }; +/* Structure to return per interface device flags */ + +struct ifp_if6info +{ + int ifindex; + int flags; +}; + /* Macros to handle rtattributes */ #define RTA_ALIGNTO 4 @@ -201,9 +216,11 @@ RTA_FLOW, RTA_CACHEINFO, RTA_SESSION, + RTA_LINKFLAGS, + RTA_RA6INFO, /* No support yet, send event on new prefix event */ }; -#define RTA_MAX RTA_SESSION +#define RTA_MAX RTA_RA6INFO #define RTM_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct rtmsg)))) #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg)) diff -ruN linux-2.5.70.org/include/net/if_inet6.h linux-2.5.70.new/include/net/if_inet6.h --- linux-2.5.70.org/include/net/if_inet6.h 2003-05-26 18:00:59.000000000 -0700 +++ linux-2.5.70.new/include/net/if_inet6.h 2003-06-19 05:42:08.000000000 -0700 @@ -17,6 +17,8 @@ #include +#define IF_RA_OTHERCONF 0x80 +#define IF_RA_MANAGED 0x40 #define IF_RA_RCVD 0x20 #define IF_RS_SENT 0x10 diff -ruN linux-2.5.70.org/include/net/ip6_route.h linux-2.5.70.new/include/net/ip6_route.h --- linux-2.5.70.org/include/net/ip6_route.h 2003-05-26 18:00:26.000000000 -0700 +++ linux-2.5.70.new/include/net/ip6_route.h 2003-06-23 02:59:06.000000000 -0700 @@ -87,6 +87,7 @@ struct 
nlmsghdr; struct netlink_callback; extern int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb); +extern int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb); extern int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_getroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); diff -ruN linux-2.5.70.org/net/ipv6/Kconfig linux-2.5.70.new/net/ipv6/Kconfig --- linux-2.5.70.org/net/ipv6/Kconfig 2003-05-26 18:00:40.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/Kconfig 2003-06-19 05:37:11.000000000 -0700 @@ -42,4 +42,13 @@ If unsure, say Y. +config IPV6_PREFIXLIST + bool "IPv6: Prefix List" + depends on IPV6 + ---help--- + For applications needing to retrieve the list of prefixes supported + on the system. Defined in RFC2461. + + If unsure, say Y. + source "net/ipv6/netfilter/Kconfig" diff -ruN linux-2.5.70.org/net/ipv6/addrconf.c linux-2.5.70.new/net/ipv6/addrconf.c --- linux-2.5.70.org/net/ipv6/addrconf.c 2003-05-26 18:00:58.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/addrconf.c 2003-06-24 04:40:05.000000000 -0700 @@ -124,7 +124,7 @@ static int addrconf_ifdown(struct net_device *dev, int how); -static void addrconf_dad_start(struct inet6_ifaddr *ifp); +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags); static void addrconf_dad_timer(unsigned long data); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_rs_timer(unsigned long data); @@ -738,7 +738,7 @@ ift->prefered_lft = tmp_prefered_lft; ift->tstamp = ifp->tstamp; spin_unlock_bh(&ift->lock); - addrconf_dad_start(ift); + addrconf_dad_start(ift, 0); in6_ifa_put(ift); in6_dev_put(idev); out: @@ -1234,7 +1234,7 @@ rtmsg.rtmsg_dst_len = 8; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; rtmsg.rtmsg_ifindex = dev->ifindex; - rtmsg.rtmsg_flags = RTF_UP|RTF_ADDRCONF; + rtmsg.rtmsg_flags = RTF_UP; 
rtmsg.rtmsg_type = RTMSG_NEWROUTE; ip6_route_add(&rtmsg, NULL, NULL); } @@ -1261,7 +1261,7 @@ struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xFE800000), 0, 0, 0); - addrconf_prefix_route(&addr, 64, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&addr, 64, dev, 0, 0); } static struct inet6_dev *addrconf_add_dev(struct net_device *dev) @@ -1401,7 +1401,7 @@ } create = 1; - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, RTF_ADDRCONF); } if (ifp && valid_lft == 0) { @@ -1552,7 +1552,7 @@ ifp = ipv6_add_addr(idev, pfx, plen, scope, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); return 0; } @@ -1727,7 +1727,7 @@ ifp = ipv6_add_addr(idev, addr, 64, IFA_LINK, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); } } @@ -1965,8 +1965,7 @@ memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); rtmsg.rtmsg_type = RTMSG_NEWROUTE; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; - rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | - RTF_DEFAULT | RTF_UP); + rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_DEFAULT | RTF_UP); rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; @@ -1980,7 +1979,7 @@ /* * Duplicate Address Detection */ -static void addrconf_dad_start(struct inet6_ifaddr *ifp) +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags) { struct net_device *dev; unsigned long rand_num; @@ -1990,7 +1989,7 @@ addrconf_join_solict(dev, &ifp->addr); if (ifp->prefix_len != 128 && (ifp->flags&IFA_F_PERMANENT)) - addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, flags); net_srandom(ifp->addr.s6_addr32[3]); rand_num = net_random() % (ifp->idev->cnf.rtr_solicit_delay ? 
: 1); @@ -2389,6 +2388,43 @@ netlink_broadcast(rtnl, skb, 0, RTMGRP_IPV6_IFADDR, GFP_ATOMIC); } +int inet6_dump_linkflags(struct sk_buff *skb, struct netlink_callback *cb) +{ + int ifindex, flags = 0; + struct net_device *dev; + struct inet6_dev *idev; + struct nlmsghdr *nlh; + struct ifp_if6info *ifp = NLMSG_DATA(cb->nlh); + unsigned char *org_tail = skb->tail; + + /* ifindex = cb->args[0]; ? */ + ifindex = ifp->ifindex; + + if ((dev = dev_get_by_index(ifindex)) == NULL) + goto out; + if ((idev = in6_dev_get(dev)) != NULL) { + flags = idev->if_flags; + in6_dev_put(idev); + } + dev_put(dev); + + nlh = NLMSG_PUT(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + RTA_LINKFLAGS, sizeof(*ifp)); + ifp = NLMSG_DATA(nlh); + ifp->flags = flags; + ifp->ifindex = ifindex; /* duplicate information for user to verify */ + + nlh->nlmsg_len = skb->tail - org_tail; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "inet6_dump_linkflags:skb size not enough\n"); + skb_trim(skb, org_tail - skb->data); + +out: + return -1; +} + static struct rtnetlink_link inet6_rtnetlink_table[RTM_MAX - RTM_BASE + 1] = { [RTM_NEWADDR - RTM_BASE] = { .doit = inet6_rtm_newaddr, }, [RTM_DELADDR - RTM_BASE] = { .doit = inet6_rtm_deladdr, }, @@ -2397,6 +2433,10 @@ [RTM_DELROUTE - RTM_BASE] = { .doit = inet6_rtm_delroute, }, [RTM_GETROUTE - RTM_BASE] = { .doit = inet6_rtm_getroute, .dumpit = inet6_dump_fib, }, + [RTM_GETLNKFLAGS - RTM_BASE] = { .dumpit = inet6_dump_linkflags, }, +#ifdef CONFIG_IPV6_PREFIXLIST + [RTM_GETPLIST - RTM_BASE] = { .dumpit = inet6_dump_prefix, }, +#endif }; static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) @@ -2730,7 +2770,7 @@ #ifdef CONFIG_PROC_FS proc_net_create("if_inet6", 0, iface_proc_info); #endif - + addrconf_verify(0); rtnetlink_links[PF_INET6] = inet6_rtnetlink_table; #ifdef CONFIG_SYSCTL diff -ruN linux-2.5.70.org/net/ipv6/ndisc.c linux-2.5.70.new/net/ipv6/ndisc.c --- linux-2.5.70.org/net/ipv6/ndisc.c 2003-05-26 18:00:41.000000000 -0700 +++ 
linux-2.5.70.new/net/ipv6/ndisc.c 2003-06-24 04:09:30.000000000 -0700 @@ -1049,6 +1049,16 @@ */ in6_dev->if_flags |= IF_RA_RCVD; } + /* + * Remember the managed/otherconf flags from most recently + * receieved RA message (RFC 2462) -- yoshfuji + */ + in6_dev->if_flags = (in6_dev->if_flags & ~(IF_RA_MANAGED| + IF_RA_OTHERCONF)) | + (ra_msg->icmph.icmp6_addrconf_managed ? + IF_RA_MANAGED : 0) | + (ra_msg->icmph.icmp6_addrconf_other ? + IF_RA_OTHERCONF : 0); lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); diff -ruN linux-2.5.70.org/net/ipv6/route.c linux-2.5.70.new/net/ipv6/route.c --- linux-2.5.70.org/net/ipv6/route.c 2003-05-26 18:00:45.000000000 -0700 +++ linux-2.5.70.new/net/ipv6/route.c 2003-06-23 02:46:42.000000000 -0700 @@ -1520,6 +1520,68 @@ return 0; } +#ifdef CONFIG_IPV6_PREFIXLIST +static int rt6_fill_prefix(struct sk_buff *skb, struct rt6_info *rt, + int type, u32 pid, u32 seq) +{ + struct in6_prefix_msg *pmsg; + struct nlmsghdr *nlh; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, type, sizeof(*pmsg)); + pmsg = NLMSG_DATA(nlh); + pmsg->ifindex = rt->rt6i_dev->ifindex; + pmsg->prefix_len = rt->rt6i_dst.plen; + ipv6_addr_copy(&pmsg->prefix, &rt->rt6i_dst.addr); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "rt6_fill_prefix:skb size not enough\n"); + skb_trim(skb, b - skb->data); + return -1; +} + +static int rt6_dump_route_prefix(struct rt6_info *rt, void *p_arg) +{ + int addr_type; + struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; + + /* + * Definition of a prefix : + * - Should be autoconfigured + * - No nexthop + * - Not a linklocal, loopback or multicast type. 
+ */ + if (rt->rt6i_nexthop || (rt->rt6i_flags & RTF_ADDRCONF) == 0) + return 0; + addr_type = ipv6_addr_type(&rt->rt6i_dst.addr); + if ((addr_type & (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK | + IPV6_ADDR_MULTICAST)) != 0 || + addr_type == IPV6_ADDR_ANY) + return 0; + return rt6_fill_prefix(arg->skb, rt, RTM_GETPLIST, + NETLINK_CB(arg->cb->skb).pid, arg->cb->nlh->nlmsg_seq); +} + +static int fib6_dump_prefix(struct fib6_walker_t *w) +{ + int res; + struct rt6_info *rt; + + for (rt = w->leaf; rt; rt = rt->u.next) { + res = rt6_dump_route_prefix(rt, w->args); + if (res < 0) { + /* Frame is full, suspend walking */ + w->leaf = rt; + return 1; + } + } + w->leaf = NULL; + return 0; +} +#endif + static void fib6_dump_end(struct netlink_callback *cb) { struct fib6_walker_t *w = (void*)cb->args[0]; @@ -1541,12 +1603,17 @@ return cb->done(cb); } -int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +static int __inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb, + int prefix) { struct rt6_rtnl_dump_arg arg; struct fib6_walker_t *w; int res; +#ifndef CONFIG_IPV6_PREFIXLIST + BUG_TRAP(prefix == 0); +#endif + arg.skb = skb; arg.cb = cb; @@ -1568,7 +1635,12 @@ RT6_TRACE("dump<%p", w); memset(w, 0, sizeof(*w)); w->root = &ip6_routing_table; - w->func = fib6_dump_node; + if (prefix == 0) + w->func = fib6_dump_node; +#ifdef CONFIG_IPV6_PREFIXLIST + else + w->func = fib6_dump_prefix; +#endif w->args = &arg; cb->args[0] = (long)w; read_lock_bh(&rt6_lock); @@ -1595,6 +1667,16 @@ return res; } +int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 0); +} + +int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 1); +} + int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; From hadi@shell.cyberus.ca Wed Jun 25 10:08:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 10:08:26 
-0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PH8L2x008376 for ; Wed, 25 Jun 2003 10:08:21 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19VDkJ-000M1m-0n; Wed, 25 Jun 2003 13:07:47 -0400
Date: Wed, 25 Jun 2003 13:07:46 -0400 (EDT)
From: Jamal Hadi
To: Stephen Hemminger
cc: mostrows@watson.ibm.com, rusty@rustcorp.com.au, davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com
Subject: Re: [PATCH, untested] Support for PPPOE on SMP
In-Reply-To: <20030625093902.7431efc3.shemminger@osdl.org>
Message-ID: <20030625125518.N84526@shell.cyberus.ca>
References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> <20030625093902.7431efc3.shemminger@osdl.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3494
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Wed, 25 Jun 2003, Stephen Hemminger wrote:

> On Wed, 25 Jun 2003 12:22:35 -0400 (EDT)
> Jamal Hadi wrote:
>
> > Placing control protocols in the kernel is plain wrong.
>
> What about arp, TCP, IP, routing protocols.

ARP should really be ripped out of the kernel. I mentioned to you once
the same in regards to STP and iirc you agreed.
I wouldn't call TCP or IP control protocols.

> The problem is that state management needs to be done in one place.

A protocol or implementation which wishes to do state maintenance
properly oughta be able to do the synchronization on its own.
Separation between policy and mechanism has been the strength of unix.
A clean separation between control and a data path is very important.
Control protocols tend to be very rich environments which are
constantly changing. Take STP: there are so many features that could be
added to STP that are much harder to add because it is in the kernel.

Maybe what needs to be looked at is the design of pppoe or ppp.
The patch from Rusty is just a band-aid.

cheers,
jamal

From mostrows@watson.ibm.com Wed Jun 25 10:28:48 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 10:28:54 -0700 (PDT)
Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PHSl2x008747 for ; Wed, 25 Jun 2003 10:28:48 -0700
Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PHRKq160736; Wed, 25 Jun 2003 13:27:20 -0400
Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PHS08154640; Wed, 25 Jun 2003 13:28:00 -0400
Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id NAA81744; Wed, 25 Jun 2003 13:27:59 -0400
Subject: Re: [PATCH, untested] Support for PPPOE on SMP
From: Michal Ostrowski
To: Jamal Hadi
Cc: Rusty Russell , "David S. Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, "David F.
Skoll" , James Carlson
In-Reply-To: <20030625114243.F84526@shell.cyberus.ca>
References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <1056548544.1944.1488.camel@brick.watson.ibm.com> <20030625114243.F84526@shell.cyberus.ca>
Content-Type: text/plain
Message-Id: <1056562079.1944.1961.camel@brick.watson.ibm.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.0
Date: 25 Jun 2003 13:27:59 -0400
Content-Transfer-Encoding: 7bit
X-archive-position: 3495
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mostrows@watson.ibm.com
Precedence: bulk
X-list: netdev

Paul: you made an assertion to me in an earlier e-mail that you were
concerned about packet ordering for the sake of vj and compression.
IIRC the PPPoE spec prohibits compression, probably for this very
reason. Is there any other reason we'd be worried about re-ordering in
the PPP data stream?

On Wed, 2003-06-25 at 11:45, Jamal Hadi wrote:
>
> Have you tested the case where the ethernet card is tied to only
> CPU in SMP? That guarantees ordering.

Agreed, this does guarantee ordering. But there are cases where I don't
have this guarantee and those are the issues Rusty's patch attempts to
solve.

> Ordering per protocol should really be that protocols problem to
> solve. If you cant solve it you have a bug.
>

The session initiation race I described earlier is brought about
independently by several problems:

1. PPPoE negotiation is done in user space and thus there is a window
between completion of this negotiation and the creation of the PPPoE
socket during which a payload packet may arrive and be dropped (SMP and
UP).

2. Re-ordering by softIRQ handling on SMP may cause the same problem.

There's also the question of whether other protocols (perhaps not
implemented in the kernel, but relying on AF_PACKET) may be affected by
this (#2).
We can fix #1 without any patches to core networking code. If the SMP softIRQ re-ordering issues is handled, then we may have some better options for fixing #1. But note that even if #1 is fixed and #2 isn't, then we're not any better off. -- Michal Ostrowski From shemminger@osdl.org Wed Jun 25 10:40:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 10:40:54 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PHem2x009137 for ; Wed, 25 Jun 2003 10:40:49 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PHe1q15137; Wed, 25 Jun 2003 10:40:01 -0700 Date: Wed, 25 Jun 2003 10:40:01 -0700 From: Stephen Hemminger To: Jamal Hadi Cc: mostrows@watson.ibm.com, rusty@rustcorp.com.au, davem@redhat.com, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP Message-Id: <20030625104001.476ee314.shemminger@osdl.org> In-Reply-To: <20030625125518.N84526@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> <20030625093902.7431efc3.shemminger@osdl.org> <20030625125518.N84526@shell.cyberus.ca> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3496 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Wed, 25 Jun 2003 13:07:46 
-0400 (EDT) Jamal Hadi wrote:

>
>
> On Wed, 25 Jun 2003, Stephen Hemminger wrote:
>
> > On Wed, 25 Jun 2003 12:22:35 -0400 (EDT)
> > Jamal Hadi wrote:
> >
> > > Placing control protocols in the kernel is plain wrong.
> >
> > What about arp, TCP, IP, routing protocols.
>
> ARP should really be ripped out of the kernel. I mentioned to you once
> the same in regards to STP and iirc you agreed.
> I wouldn't call TCP or IP control protocols.
>
> > The problem is that state management needs to be done in one place.
>
> A protocol or implementation which wishes to do state maintenance
> properly oughta be able to do the synchronization on its own.
> Separation between policy and mechanism has been the strength of unix.
> A clean separation between control and a data path is very important.
> Control protocols tend to be very rich environments which are
> constantly changing. Take STP: there are so many features that could be
> added to STP that are much harder to add because it is in the kernel.

Rather than take an architectural approach about what is right and wrong,
I take the practical point of view. If the protocol is small, and the policy
can be done in the kernel, fine; if the implementation gets messy and the
right information is not there, then it belongs in user space.

For PPPoE, the session management needs to be in kernel space, with the
policy in user space. What if the kernel initialized the session when it saw
the discovery and notified pppd? The session would not be established until
pppd accepted the connection. This would be more like a socket protocol
without auto-accept like TCP. Any data for the session would then stay queued
until it was accepted or rejected.

Having special non-SMP receive logic is bogus, and probably won't work anyway
with preempt and other races.

There is already work in moving STP out of the kernel, but even that has
shown that the problem is how to have the proper management hooks to do the
job.
That is why it hasn't been a simple slam dunk. From jmorris@intercode.com.au Wed Jun 25 10:41:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 10:41:40 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:z55k/SqaZXK7N52RVvFzYeNiHrO0EHHb@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PHfW2x009314 for ; Wed, 25 Jun 2003 10:41:34 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5PHf2r31173; Thu, 26 Jun 2003 03:41:02 +1000 Date: Thu, 26 Jun 2003 03:41:01 +1000 (EST) From: James Morris To: Stephen Hemminger cc: David Stevens , "David S. Miller" , Subject: Re: [PATCH 2.5.72] Igmp w/o linearize. In-Reply-To: <20030625093216.57d0f586.shemminger@osdl.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3497 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Wed, 25 Jun 2003, Stephen Hemminger wrote: > Here is the updated igmp patch, thanks to dlstevens for testing this. 
Applied to bk://kernel.bkbits.net/jmorris/net-2.5 - James -- James Morris From mostrows@watson.ibm.com Wed Jun 25 11:01:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:01:50 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PI1j2x009936 for ; Wed, 25 Jun 2003 11:01:45 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PI0Fq165256; Wed, 25 Jun 2003 14:00:16 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PI0t8125860; Wed, 25 Jun 2003 14:00:55 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id OAA79692; Wed, 25 Jun 2003 14:00:55 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: Stephen Hemminger Cc: Jamal Hadi , rusty@rustcorp.com.au, "David S. 
Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com In-Reply-To: <20030625104001.476ee314.shemminger@osdl.org> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> <20030625093902.7431efc3.shemminger@osdl.org> <20030625125518.N84526@shell.cyberus.ca> <20030625104001.476ee314.shemminger@osdl.org> Content-Type: text/plain Message-Id: <1056564055.1944.2041.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 25 Jun 2003 14:00:55 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3498 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev

On Wed, 2003-06-25 at 13:40, Stephen Hemminger wrote:

> For PPPoE, the session management needs to be in kernel space, with the policy
> in user space. What if the kernel, initialized the session when it saw
> the discovery and notified the pppd, session would not be established
> until ppd accepted the connection. This would be more like a socket
> protocol without auto-accept like TCP. Any data for the session would
> then stay queued until it was accepted or rejected.
>

Regardless of the solution taken for the session-initiation race, any solution would fall apart if SMP softIRQs can reorder packets (that is, it would result in dropped packets). Only once there is a solution for this reordering problem does it make sense to consider the options for handling this race.

PPPoE also doesn't fit cleanly into the standard bind()/listen()/accept()/connect() mould. I've been convinced that negotiation/discovery belongs in pppd and so would like to avoid adding connection detection logic into the kernel.
Finally, please keep in mind that with PPPoE when we do hit this problem the effect is that PPP session establishment takes a bit longer since we have to wait for an LCP timeout and retransmit. I am much more curious about how other protocols may be affected by packet reordering. -- Michal Ostrowski From shemminger@osdl.org Wed Jun 25 11:41:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:33 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIfT2x011054 for ; Wed, 25 Jun 2003 11:41:29 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIfHq00666; Wed, 25 Jun 2003 11:41:17 -0700 Date: Wed, 25 Jun 2003 11:41:17 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (3/7) ipmr.c - drop/reacquire in error path. Message-Id: <20030625114117.541d53cd.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3502 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev In the failure path, need to drop/re-acquire locking to allow sysfs and hotplug to run. diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:15:05 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:15:05 2003 @@ -158,6 +158,10 @@ return dev; failure: + /* allow the register to be completed before unregistering. 
*/ + rtnl_unlock(); + rtnl_lock(); + unregister_netdevice(dev); return NULL; } @@ -228,6 +232,10 @@ return dev; failure: + /* allow the register to be completed before unregistering. */ + rtnl_unlock(); + rtnl_lock(); + unregister_netdevice(dev); return NULL; } From shemminger@osdl.org Wed Jun 25 11:41:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:42:00 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIfr2x011407 for ; Wed, 25 Jun 2003 11:41:53 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIffq00718; Wed, 25 Jun 2003 11:41:41 -0700 Date: Wed, 25 Jun 2003 11:41:41 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (7/7) Get rid of skb_linearize in ipmr.c Message-Id: <20030625114141.76c11186.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3506 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Use explicit may_pull's instead of forcing skb to be linearized. 
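
As a rough illustration of the idea (pull only the header bytes you are about to read into the linear area, rather than flattening the whole packet), here is a userspace toy model. Everything in it (struct toy_skb, toy_may_pull(), HEAD_CAP) is hypothetical and is not the kernel's sk_buff or pskb_may_pull(); it only mirrors the "nonzero means the bytes are now safe to dereference" convention that the patch relies on.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a paged packet buffer: a small linear head plus one
 * non-linear fragment. Hypothetical model, not the kernel structure. */
#define HEAD_CAP 64

struct toy_skb {
    unsigned char head[HEAD_CAP]; /* linear area, directly addressable */
    size_t head_len;              /* bytes currently in the linear area */
    const unsigned char *frag;    /* remaining non-linear data */
    size_t frag_len;              /* bytes left in the fragment */
};

/* Ensure the first len bytes of the packet sit in the linear area.
 * Returns 1 on success; returns 0 if the packet is shorter than len
 * or the toy head cannot hold that many bytes. Only the missing
 * bytes are copied; the rest of the fragment is left alone. */
static int toy_may_pull(struct toy_skb *skb, size_t len)
{
    size_t need;

    if (len <= skb->head_len)
        return 1; /* already linear, nothing to copy */
    if (len > skb->head_len + skb->frag_len || len > HEAD_CAP)
        return 0; /* truncated packet or no room */

    need = len - skb->head_len;
    memcpy(skb->head + skb->head_len, skb->frag, need);
    skb->head_len += need;
    skb->frag += need;
    skb->frag_len -= need;
    return 1;
}
```

A caller would check the return value first and only then cast the head to a header structure, which is the same ordering the patch below introduces: the pim pointer is assigned after the pull succeeds, not before.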
diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:30:41 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:30:41 2003 @@ -1400,15 +1400,14 @@ int pim_rcv_v1(struct sk_buff * skb) { - struct igmphdr *pim = (struct igmphdr*)skb->h.raw; + struct igmphdr *pim; struct iphdr *encap; struct net_device *reg_dev = NULL; - if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) - goto drop; - pim = (struct igmphdr*)skb->h.raw; - } + if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap))) + goto drop; + + pim = (struct igmphdr*)skb->h.raw; if (!mroute_do_pim || skb->len < sizeof(*pim) + sizeof(*encap) || @@ -1465,21 +1464,18 @@ #ifdef CONFIG_IP_PIMSM_V2 static int pim_rcv(struct sk_buff * skb) { - struct pimreghdr *pim = (struct pimreghdr*)skb->h.raw; + struct pimreghdr *pim; struct iphdr *encap; struct net_device *reg_dev = NULL; - if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) - goto drop; - pim = (struct pimreghdr*)skb->h.raw; - } + if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap))) + goto drop; - if (skb->len < sizeof(*pim) + sizeof(*encap) || - pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) || + pim = (struct pimreghdr*)skb->h.raw; + if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) || (pim->flags&PIM_NULL_REGISTER) || - (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 && - ip_compute_csum((void *)pim, skb->len))) + (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 && + (u16)csum_fold(skb_checksum(skb, 0, skb->len, 0)))) goto drop; /* check if the inner packet is destined to mcast group */ From shemminger@osdl.org Wed Jun 25 11:40:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:40:59 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIer2x010903 for ; Wed, 25 Jun 2003 11:40:54 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id 
h5PIecq00482; Wed, 25 Jun 2003 11:40:38 -0700 Date: Wed, 25 Jun 2003 11:40:38 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (0/7) ipmr fixes Message-Id: <20030625114038.176ee030.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3499 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev These patches fix ip multicast route (ipmr) on 2.5.73. 1 - Trivial C99 initialization 2 - Change functions/variables to static 3 - Drop and reacquire RTNL in error path 4 - Use time_after() 5 - Use alloc_netdev 6 - Fix OOPS on dropped packets 7 - Get rid of skb_linearize Tested on 8-way SMP by bringing up pimd. From shemminger@osdl.org Wed Jun 25 11:41:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIfX2x011091 for ; Wed, 25 Jun 2003 11:41:33 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIfMq00670; Wed, 25 Jun 2003 11:41:22 -0700 Date: Wed, 25 Jun 2003 11:41:21 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (4/7) ipmr.c - convert to alloc_netdev Message-Id: <20030625114121.2597182c.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3503 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Use time_after to avoid potential for jiffies wrap in timer code. diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:30:03 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:30:03 2003 @@ -342,9 +342,8 @@ cp = &mfc_unres_queue; while ((c=*cp) != NULL) { - long interval = c->mfc_un.unres.expires - now; - - if (interval > 0) { + if (time_after(c->mfc_un.unres.expires, now)) { + unsigned long interval = c->mfc_un.unres.expires - now; if (interval < expires) expires = interval; cp = &c->next; @@ -1291,7 +1290,8 @@ large chunk of pimd to kernel. Ough... 
--ANK */ (mroute_do_pim || cache->mfc_un.res.ttls[true_vifi] < 255) && - jiffies - cache->mfc_un.res.last_assert > MFC_ASSERT_THRESH) { + time_after(jiffies, + cache->mfc_un.res.last_assert + MFC_ASSERT_THRESH)) { cache->mfc_un.res.last_assert = jiffies; ipmr_cache_report(skb, true_vifi, IGMPMSG_WRONGVIF); } From shemminger@osdl.org Wed Jun 25 11:40:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:01 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIev2x010904 for ; Wed, 25 Jun 2003 11:40:58 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIegq00498; Wed, 25 Jun 2003 11:40:42 -0700 Date: Wed, 25 Jun 2003 11:40:42 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (1/7) ipmr - C99 initializers Message-Id: <20030625114042.3cee041e.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3500 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change notifier to C99 initialization This one changes an initializer to C99 style. 
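
For readers unfamiliar with the C99 form, the following standalone sketch contrasts positional and designated initialization. The struct here is a simplified stand-in: the member name notifier_call comes from the patch, while next and priority are assumed names for the two members the old initializer set to NULL and 0, and demo_event() is a hypothetical callback.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's notifier block. Only
 * notifier_call is taken from the patch; the other member names
 * are assumptions made for this sketch. */
struct notifier_block {
    int (*notifier_call)(struct notifier_block *self,
                         unsigned long event, void *data);
    struct notifier_block *next;
    int priority;
};

/* Hypothetical callback, present only so there is something to point at. */
static int demo_event(struct notifier_block *self,
                      unsigned long event, void *data)
{
    (void)self;
    (void)data;
    return (int)event;
}

/* Positional form: correct only while the member order never changes. */
static struct notifier_block positional = {
    demo_event,
    NULL,
    0
};

/* C99 designated form: names the member it sets. Members that are not
 * mentioned are zero-initialized, so next is NULL and priority is 0. */
static struct notifier_block designated = {
    .notifier_call = demo_event,
};
```

Both objects end up with identical contents; the designated form simply keeps working if members are later added or reordered.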
diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:14:59 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:14:59 2003 @@ -1078,9 +1078,7 @@ static struct notifier_block ip_mr_notifier={ - ipmr_device_event, - NULL, - 0 + .notifier_call = ipmr_device_event, }; /* From shemminger@osdl.org Wed Jun 25 11:41:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:57 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIfo2x011346 for ; Wed, 25 Jun 2003 11:41:50 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIfbq00701; Wed, 25 Jun 2003 11:41:37 -0700 Date: Wed, 25 Jun 2003 11:41:37 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (6/7) ipmr.c fix dst underflow on dropped packets Message-Id: <20030625114137.4c4c2377.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3505 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev If ipmr rejected a packet, then it would cause a socket buffer to be reprocessed after freed leading to "dst underflow" and slow death. The IP handler expects a negative return value to be the protocol to resubmit to, not an error code. 
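
The failure mode described above can be sketched with a small userspace model. This is not the kernel's delivery code: toy_deliver(), toy_protos[] and the handlers are hypothetical names, and the loop only imitates the stated convention that a negative return value is read as "resubmit to protocol -ret".

```c
#include <stddef.h>

#define TOY_EINVAL    22   /* numeric value of EINVAL on Linux */
#define TOY_PROTO_PIM 103  /* IP protocol number assigned to PIM */
#define TOY_MAX_PROTO 256

/* Toy packet: we only track whether it has been freed. */
struct toy_skb {
    int freed;
};

typedef int (*toy_handler)(struct toy_skb *skb);

/* Hypothetical protocol table indexed by IP protocol number. */
static toy_handler toy_protos[TOY_MAX_PROTO];

/* Model of the delivery loop: a negative handler return is treated as
 * a request to resubmit the same buffer to protocol -ret. Returns how
 * many times a handler was invoked on an already freed buffer. */
static int toy_deliver(struct toy_skb *skb, int protocol)
{
    int use_after_free = 0;

    while (protocol >= 0 && protocol < TOY_MAX_PROTO) {
        toy_handler handler = toy_protos[protocol];
        int ret;

        if (handler == NULL)
            break;
        if (skb->freed)
            use_after_free++; /* buffer is reprocessed after free */
        ret = handler(skb);
        if (ret >= 0)
            break;            /* handler consumed the buffer */
        protocol = -ret;      /* negative: resubmit elsewhere */
    }
    return use_after_free;
}

/* Old behaviour: free the buffer, then return an errno-style value. */
static int toy_buggy_pim(struct toy_skb *skb)
{
    skb->freed = 1;
    return -TOY_EINVAL;       /* misread as "resubmit to protocol 22" */
}

/* Patched behaviour: free the buffer and report it as consumed. */
static int toy_fixed_pim(struct toy_skb *skb)
{
    skb->freed = 1;
    return 0;
}

/* Some unrelated handler that happens to be registered as protocol 22. */
static int toy_other(struct toy_skb *skb)
{
    (void)skb;
    return 0;
}
```

With toy_other registered at slot 22, delivering to toy_buggy_pim hands the freed buffer to a second handler, while toy_fixed_pim does not. That is why every drop path in the patch below funnels into a single label that frees the skb and returns 0.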
diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:30:29 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:30:29 2003 @@ -1405,19 +1405,15 @@ struct net_device *reg_dev = NULL; if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) { - kfree_skb(skb); - return -ENOMEM; - } + if (skb_linearize(skb, GFP_ATOMIC) != 0) + goto drop; pim = (struct igmphdr*)skb->h.raw; } if (!mroute_do_pim || skb->len < sizeof(*pim) + sizeof(*encap) || - pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER) { - kfree_skb(skb); - return -EINVAL; - } + pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER) + goto drop; encap = (struct iphdr*)(skb->h.raw + sizeof(struct igmphdr)); /* @@ -1427,11 +1423,9 @@ c. packet is not truncated */ if (!MULTICAST(encap->daddr) || - ntohs(encap->tot_len) == 0 || - ntohs(encap->tot_len) + sizeof(*pim) > skb->len) { - kfree_skb(skb); - return -EINVAL; - } + encap->tot_len == 0 || + ntohs(encap->tot_len) + sizeof(*pim) > skb->len) + goto drop; read_lock(&mrt_lock); if (reg_vif_num >= 0) @@ -1440,10 +1434,8 @@ dev_hold(reg_dev); read_unlock(&mrt_lock); - if (reg_dev == NULL) { - kfree_skb(skb); - return -EINVAL; - } + if (reg_dev == NULL) + goto drop; skb->mac.raw = skb->nh.raw; skb_pull(skb, (u8*)encap - skb->data); @@ -1464,6 +1456,9 @@ netif_rx(skb); dev_put(reg_dev); return 0; + drop: + kfree_skb(skb); + return 0; } #endif @@ -1475,10 +1470,8 @@ struct net_device *reg_dev = NULL; if (skb_is_nonlinear(skb)) { - if (skb_linearize(skb, GFP_ATOMIC) != 0) { - kfree_skb(skb); - return -ENOMEM; - } + if (skb_linearize(skb, GFP_ATOMIC) != 0) + goto drop; pim = (struct pimreghdr*)skb->h.raw; } @@ -1486,19 +1479,15 @@ pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) || (pim->flags&PIM_NULL_REGISTER) || (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 && - ip_compute_csum((void *)pim, skb->len))) { - kfree_skb(skb); - return -EINVAL; - } + ip_compute_csum((void *)pim, skb->len))) + goto drop; /* check if the inner 
packet is destined to mcast group */ encap = (struct iphdr*)(skb->h.raw + sizeof(struct pimreghdr)); if (!MULTICAST(encap->daddr) || - ntohs(encap->tot_len) == 0 || - ntohs(encap->tot_len) + sizeof(*pim) > skb->len) { - kfree_skb(skb); - return -EINVAL; - } + encap->tot_len == 0 || + ntohs(encap->tot_len) + sizeof(*pim) > skb->len) + goto drop; read_lock(&mrt_lock); if (reg_vif_num >= 0) @@ -1507,10 +1496,8 @@ dev_hold(reg_dev); read_unlock(&mrt_lock); - if (reg_dev == NULL) { - kfree_skb(skb); - return -EINVAL; - } + if (reg_dev == NULL) + goto drop; skb->mac.raw = skb->nh.raw; skb_pull(skb, (u8*)encap - skb->data); @@ -1530,6 +1517,9 @@ #endif netif_rx(skb); dev_put(reg_dev); + return 0; + drop: + kfree_skb(skb); return 0; } #endif From shemminger@osdl.org Wed Jun 25 11:41:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIfa2x011123 for ; Wed, 25 Jun 2003 11:41:37 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIfPq00676; Wed, 25 Jun 2003 11:41:25 -0700 Date: Wed, 25 Jun 2003 11:41:25 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (5/7) ipmr.c - use alloc_netdev Message-Id: <20030625114125.0fef55ca.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3504 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Rather than explicitly calling kmalloc, use the alloc_netdev infrastructure. diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:30:19 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:30:19 2003 @@ -186,34 +186,23 @@ return (struct net_device_stats*)dev->priv; } -static void vif_dev_destructor(struct net_device *dev) +static void reg_vif_setup(struct net_device *dev) { - kfree(dev); -} - -static struct net_device *ipmr_reg_vif(struct vifctl *v) -{ - struct net_device *dev; - struct in_device *in_dev; - int size; - - size = sizeof(*dev) + sizeof(struct net_device_stats); - dev = kmalloc(size, GFP_KERNEL); - if (!dev) - return NULL; - - memset(dev, 0, size); - - dev->priv = dev + 1; - - strcpy(dev->name, "pimreg"); - dev->type = ARPHRD_PIMREG; dev->mtu = 1500 - sizeof(struct iphdr) - 8; dev->flags = IFF_NOARP; dev->hard_start_xmit = reg_vif_xmit; dev->get_stats = reg_vif_get_stats; - dev->destructor = vif_dev_destructor; + dev->destructor = (void (*)(struct net_device *)) kfree; +} + +static struct net_device *ipmr_reg_vif(void) +{ + struct net_device *dev; + struct in_device *in_dev; + + dev = alloc_netdev(sizeof(struct net_device_stats), "pimreg", + reg_vif_setup); if (register_netdevice(dev)) { kfree(dev); @@ -403,7 +392,7 @@ */ if (reg_vif_num >= 0) return 
-EADDRINUSE; - dev = ipmr_reg_vif(vifc); + dev = ipmr_reg_vif(); if (!dev) return -ENOBUFS; break; From shemminger@osdl.org Wed Jun 25 11:41:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 11:41:08 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PIf32x010911 for ; Wed, 25 Jun 2003 11:41:03 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5PIepq00519; Wed, 25 Jun 2003 11:40:51 -0700 Date: Wed, 25 Jun 2003 11:40:51 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] (2/7) ipmr.c - make local stuff static Message-Id: <20030625114051.7a5cc652.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3501 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Make things static where possible in ipmr.c. 
diff -Nru a/include/linux/mroute.h b/include/linux/mroute.h --- a/include/linux/mroute.h Wed Jun 25 11:15:02 2003 +++ b/include/linux/mroute.h Wed Jun 25 11:15:02 2003 @@ -217,7 +217,6 @@ __u32 flags; }; -extern int pim_rcv(struct sk_buff *); extern int pim_rcv_v1(struct sk_buff *); struct rtmsg; diff -Nru a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c Wed Jun 25 11:15:02 2003 +++ b/net/ipv4/ipmr.c Wed Jun 25 11:15:02 2003 @@ -83,13 +83,13 @@ #define VIF_EXISTS(idx) (vif_table[idx].dev != NULL) -int mroute_do_assert; /* Set in PIM assert */ -int mroute_do_pim; +static int mroute_do_assert; /* Set in PIM assert */ +static int mroute_do_pim; static struct mfc_cache *mfc_cache_array[MFC_LINES]; /* Forwarding cache */ static struct mfc_cache *mfc_unres_queue; /* Queue of unresolved entries */ -atomic_t cache_resolve_queue_len; /* Size of unresolved */ +static atomic_t cache_resolve_queue_len; /* Size of unresolved */ /* Special spinlock for queue of unresolved entries */ static spinlock_t mfc_unres_lock = SPIN_LOCK_UNLOCKED; @@ -102,7 +102,7 @@ In this case data path is free of exclusive locks at all. */ -kmem_cache_t *mrt_cachep; +static kmem_cache_t *mrt_cachep; static int ip_mr_forward(struct sk_buff *skb, struct mfc_cache *cache, int local); static int ipmr_cache_report(struct sk_buff *pkt, vifi_t vifi, int assert); @@ -187,8 +187,7 @@ kfree(dev); } -static -struct net_device *ipmr_reg_vif(struct vifctl *v) +static struct net_device *ipmr_reg_vif(struct vifctl *v) { struct net_device *dev; struct in_device *in_dev; @@ -316,7 +315,7 @@ /* Single timer process for all the unresolved queue. 
*/ -void ipmr_expire_process(unsigned long dummy) +static void ipmr_expire_process(unsigned long dummy) { unsigned long now; unsigned long expires; @@ -683,7 +682,7 @@ * MFC cache manipulation by user space mroute daemon */ -int ipmr_mfc_delete(struct mfcctl *mfc) +static int ipmr_mfc_delete(struct mfcctl *mfc) { int line; struct mfc_cache *c, **cp; @@ -704,7 +703,7 @@ return -ENOENT; } -int ipmr_mfc_add(struct mfcctl *mfc, int mrtsock) +static int ipmr_mfc_add(struct mfcctl *mfc, int mrtsock) { int line; struct mfc_cache *uc, *c, **cp; @@ -1232,7 +1231,7 @@ ipmr_forward_finish); } -int ipmr_find_vif(struct net_device *dev) +static int ipmr_find_vif(struct net_device *dev) { int ct; for (ct=maxvif-1; ct>=0; ct--) { @@ -1244,7 +1243,7 @@ /* "local" means that we should preserve one skb (for local delivery) */ -int ip_mr_forward(struct sk_buff *skb, struct mfc_cache *cache, int local) +static int ip_mr_forward(struct sk_buff *skb, struct mfc_cache *cache, int local) { int psend = -1; int vif, ct; @@ -1472,7 +1471,7 @@ #endif #ifdef CONFIG_IP_PIMSM_V2 -int pim_rcv(struct sk_buff * skb) +static int pim_rcv(struct sk_buff * skb) { struct pimreghdr *pim = (struct pimreghdr*)skb->h.raw; struct iphdr *encap; From yoshfuji@linux-ipv6.org Wed Jun 25 12:00:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 12:00:45 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PJ0Z2x013544 for ; Wed, 25 Jun 2003 12:00:36 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5PJ1oBo014257; Thu, 26 Jun 2003 04:01:51 +0900 Date: Thu, 26 Jun 2003 04:01:50 +0900 (JST) Message-Id: <20030626.040150.28947784.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: DAD has to be destined to solicited node mulitcast address From: YOSHIFUJI Hideaki / 
=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3507 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Check if DAD is destined for solicited node multicast address as RFC2461 required. Thanks in advance. Index: linux-2.5/net/ipv6/ndisc.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ndisc.c,v retrieving revision 1.40 diff -u -r1.40 ndisc.c --- linux-2.5/net/ipv6/ndisc.c 21 Jun 2003 16:21:01 -0000 1.40 +++ linux-2.5/net/ipv6/ndisc.c 25 Jun 2003 17:45:08 -0000 @@ -713,6 +713,7 @@ struct net_device *dev = skb->dev; struct inet6_ifaddr *ifp; struct neighbour *neigh; + int addr_type = ipv6_addr_type(saddr); if (ipv6_addr_type(&msg->target)&IPV6_ADDR_MULTICAST) { if (net_ratelimit()) @@ -720,6 +721,20 @@ return; } + /* + * RFC2461 7.1.1: + * DAD has to be destined for solicited node multicast address. 
+ */ + if (addr_type == IPV6_ADDR_ANY && + !(daddr->s6_addr32[0] == htonl(0xff020000) && + daddr->s6_addr32[1] == htonl(0x00000000) && + daddr->s6_addr32[2] == htonl(0x00000001) && + daddr->s6_addr [12] == 0xff )) { + if (net_ratelimit()) + printk(KERN_DEBUG "ICMP6 NS: bad DAD packet (wrong destination\n"); + return; + } + if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) { if (net_ratelimit()) printk(KERN_WARNING "ICMP NS: invalid ND option, ignored.\n"); @@ -743,14 +758,7 @@ * NOTE! Linux kernel < 2.4.4 broke this rule. */ - /* XXX: RFC2461 7.1.1: - * If the IP source address is the unspecified address, the IP - * destination address MUST be a solicited-node multicast address. - */ - if ((ifp = ipv6_get_ifaddr(&msg->target, dev)) != NULL) { - int addr_type = ipv6_addr_type(saddr); - if (ifp->flags & IFA_F_TENTATIVE) { /* Address is tentative. If the source is unspecified address, it is someone @@ -816,7 +824,6 @@ in6_ifa_put(ifp); } else if (ipv6_chk_acast_addr(dev, &msg->target)) { struct inet6_dev *idev = in6_dev_get(dev); - int addr_type = ipv6_addr_type(saddr); /* anycast */ -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Wed Jun 25 12:34:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 12:34:16 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PJY52x014482 for ; Wed, 25 Jun 2003 12:34:06 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5PJZKBo014856; Thu, 26 Jun 2003 04:35:20 +0900 Date: Thu, 26 Jun 2003 04:35:20 +0900 (JST) Message-Id: <20030626.043520.18011582.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: DAD must not have source link-layer option From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: 
USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3508 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Check if DAD does not have source link-layer address option; RFC2461 7.1.1. Patch depends on "[PATCH] IPV6: DAD has to be destined to solicited node mulitcast address" patch. Thanks. --- linux25+patch/net/ipv6/ndisc.c.orig Thu Jun 26 04:07:15 2003 +++ linux25+patch/net/ipv6/ndisc.c Thu Jun 26 04:27:50 2003 @@ -749,15 +749,19 @@ void ndisc_recv_ns(struct sk_buff *skb) printk(KERN_WARNING "ICMP NS: bad lladdr length.\n"); return; } + + /* XXX: RFC2461 7.1.1: + * If the IP source address is the unspecified address, + * there MUST NOT be source link-layer address option + * in the message. + */ + if (addr_type == IPV6_ADDR_ANY) { + if (net_ratelimit()) + printk(KERN_WARNING "ICMP6 NS: bad DAD packet (link-layer address option)\n"); + return; + } } - /* XXX: RFC2461 7.1.1: - * If the IP source address is the unspecified address, there - * MUST NOT be source link-layer address option in the message. - * - * NOTE! Linux kernel < 2.4.4 broke this rule. - */ - if ((ifp = ipv6_get_ifaddr(&msg->target, dev)) != NULL) { if (ifp->flags & IFA_F_TENTATIVE) { /* Address is tentative. 
If the source -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From linux-netdev@gmane.org Wed Jun 25 13:27:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 13:27:40 -0700 (PDT) Received: from main.gmane.org (main.gmane.org [80.91.224.249]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PKRW2x019669 for ; Wed, 25 Jun 2003 13:27:34 -0700 Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19VCkS-0005VH-00 for ; Wed, 25 Jun 2003 18:03:52 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: netdev@oss.sgi.com Received: from news by main.gmane.org with local (Exim 3.35 #1 (Debian)) id 19VChk-0005Gw-00 for ; Wed, 25 Jun 2003 18:01:04 +0200 From: Jason Lunz Subject: Re: [PATCH, untested] Support for PPPOE on SMP Date: Wed, 25 Jun 2003 16:01:04 +0000 (UTC) Organization: PBR Streetgang Lines: 24 Message-ID: References: <20030625072602.529AF2C0B9@lists.samba.org> X-Complaints-To: usenet@main.gmane.org User-Agent: slrn/0.9.7.4 (Linux) X-archive-position: 3509 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lunz@falooley.org Precedence: bulk X-list: netdev rusty@rustcorp.com.au said: > I don't understand the unbalanced dev_put in net_rx_action(), BTW. It's tricky. There are two paths an skb can take into net_rx_action(), napi and non-napi. The non-napi path uses dev_hold/dev_put on both skb->dev and a virtual per-cpu struct net_device, the backlog_dev. In a non-napi skb receive, the driver uses netif_rx() to hand the skb up to the net core. netif_rx does a dev_hold on skb->dev, puts the skb on the current cpu's softnet_data queue, and uses netif_rx_schedule to schedule that softnet-data's ->backlog_dev to be polled. In the process, __netif_rx_schedule does a dev_hold(backlog_dev). 
So the queue of ready net_devices processed by net_rx_action may contain actual struct net_devices (napi) or the virtual ->backlog_dev net_device. In the former case, net_rx_action's dev_put balances the dev_hold done when the driver called __netif_rx_schedule(). In the latter case, net_rx_action's dev_put balances the dev_hold of the backlog_dev done when netif_rx called __netif_rx_schedule(). I hope that makes some kind of sense. It took a while to figure out, but I saved my notes. :) Jason From pekkas@netcore.fi Wed Jun 25 13:43:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 13:43:48 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PKhd2x020117 for ; Wed, 25 Jun 2003 13:43:40 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5PKhGx21246; Wed, 25 Jun 2003 23:43:16 +0300 Date: Wed, 25 Jun 2003 23:43:15 +0300 (EEST) From: Pekka Savola To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: DAD has to be destined to solicited node mulitcast address In-Reply-To: <20030626.040150.28947784.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3510 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Thu, 26 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > + printk(KERN_DEBUG "ICMP6 NS: bad DAD packet (wrong destination\n"); s/destination/destination)/ -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. 
Martin: A Clash of Kings From nf@hipac.org Wed Jun 25 13:48:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 13:49:02 -0700 (PDT) Received: from indyio.rz.uni-saarland.de (indyio.rz.uni-saarland.de [134.96.7.3]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PKms2x020487 for ; Wed, 25 Jun 2003 13:48:55 -0700 Received: from mars.rz.uni-saarland.de (mars.rz.uni-saarland.de [134.96.7.4]) by indyio.rz.uni-saarland.de (8.12.9/8.12.5) with ESMTP id h5PKmlai8862333; Wed, 25 Jun 2003 22:48:47 +0200 (CEST) Received: from e002.stw.stud.uni-saarland.de (e002.stw.stud.uni-saarland.de [134.96.65.17]) by mars.rz.uni-saarland.de (8.9.3p2/8.8.4/8.8.2) with ESMTP id WAA17657326; Wed, 25 Jun 2003 22:48:46 +0200 (CEST) Received: from e002.stw.stud.uni-saarland.de ([134.96.65.138] helo=e123.stw.stud.uni-saarland.de) by e002.stw.stud.uni-saarland.de with esmtp (Exim 3.35 #1 (Debian)) id 19VHCA-0002HF-00; Wed, 25 Jun 2003 22:48:46 +0200 From: Michael Bellion and Thomas Heinz Reply-To: Michael Bellion and Thomas Heinz To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [ANNOUNCE] nf-hipac v0.8 released Date: Wed, 25 Jun 2003 22:48:44 +0200 User-Agent: KMail/1.5.2 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306252248.44224.nf@hipac.org> X-archive-position: 3511 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nf@hipac.org Precedence: bulk X-list: netdev Hi We have released a new version of nf-hipac. We rewrote most of the code and added a bunch of new features. The main enhancements are user-defined chains, generic support for iptables targets and matches and 64 bit atomic counters. For all of you who don't know nf-hipac yet, here is a short overview: nf-hipac is a drop-in replacement for the iptables packet filtering module. 
It implements a novel framework for packet classification which uses an
advanced algorithm to reduce the number of memory lookups per packet.
The module is ideal for environments where large rulesets and/or high
bandwidth networks are involved. Its userspace tool, which is also called
'nf-hipac', is designed to be as compatible as possible to 'iptables -t
filter'.

The official project web page is: http://www.hipac.org
The releases can be downloaded from: http://sourceforge.net/projects/nf-hipac

Features:
 - optimized for high performance packet classification with moderate
   memory usage
 - completely dynamic: data structure isn't rebuilt from scratch when
   inserting or deleting rules, so fast updates are possible
 - very short locking times during rule updates: packet matching is not
   blocked
 - support for 64 bit architectures
 - optimized kernel-user protocol (netlink): improved rule listing speed
 - libnfhipac: netlink library for kernel-user communication
 - native match support for:
     + source/destination ip
     + in/out interface
     + protocol (udp, tcp, icmp)
     + fragments
     + source/destination ports (udp, tcp)
     + tcp flags
     + icmp type
     + connection state
     + ttl
 - match negation (!)
 - iptables compatibility: syntax and semantics of the userspace tool are
   very similar to iptables
 - coexistence of nf-hipac and iptables: both facilities can be used at
   the same time
 - generic support for iptables targets and matches (binary compatibility)
 - integration into the netfilter connection tracking facility
 - user-defined chains support
 - 64 bit atomic counters
 - kernel module autoloading
 - /proc/net/nf-hipac/info:
     + dynamically limit the maximum memory usage
     + change invocation order of nf-hipac and iptables
 - extended statistics via /proc/net/nf-hipac/statistics/*

We are currently working on extending the hipac algorithm to do
classification with several stages. The hipac algorithm will then be capable
of combining several classification problems in one data structure, e.g.
it will be possible to solve routing and firewalling with one hipac lookup. The idea is to shorten the packet forwarding path by combining fib_lookup and iptables filter lookup into one hipac query. To further improve the performance in this scenario the upcoming flow cache could be used to cache recent hipac results. Enjoy, +-----------------------+----------------------+ | Michael Bellion | Thomas Heinz | | | | +-----------------------+----------------------+ From folkert@vanheusden.com Wed Jun 25 14:03:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 14:03:33 -0700 (PDT) Received: from muur.intranet.vanheusden.com (nobody@keetweej.xs4all.nl [213.84.46.114]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PL3E2x022172 for ; Wed, 25 Jun 2003 14:03:26 -0700 Received: from boemboem.intranet.vanheusden.com (folkert@boemboem.intranet.vanheusden.com [192.168.64.54]) by muur.intranet.vanheusden.com (8.12.8/8.9.3) with ESMTP id h5PL3CeO003647; Wed, 25 Jun 2003 23:03:12 +0200 From: Folkert van Heusden Reply-To: folkert@vanheusden.com To: Michael Bellion and Thomas Heinz , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [ANNOUNCE] nf-hipac v0.8 released Date: Wed, 25 Jun 2003 23:03:13 +0200 User-Agent: KMail/1.5.2 References: <200306252248.44224.nf@hipac.org> In-Reply-To: <200306252248.44224.nf@hipac.org> WebSite: http://www.vanheusden.com/ MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306252303.13366.folkert@vanheusden.com> X-archive-position: 3512 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: folkert@vanheusden.com Precedence: bulk X-list: netdev Hi, > nf-hipac is a drop-in replacement for the iptables packet filtering module. 
> It implements a novel framework for packet classification which uses an > advanced algorithm to reduce the number of memory lookups per packet. > The module is ideal for environments where large rulesets and/or high > bandwidth networks are involved. Its userspace tool, which is also called > 'nf-hipac', is designed to be as compatible as possible to 'iptables -t > filter'. Looks great! Any chance on a port to 2.5.x? Greetings, Folkert van Heusden +-> www.vanheusden.com folkert@vanheusden.com +31-6-41278122 <-+ +--------------------------------------------------------------------------+ | UNIX sysop? Then give MultiTail ( http://www.vanheusden.com/multitail/ ) | | a try, it brings monitoring logfiles (and such) to a different level! | +--------------------------------------------------------------------------+ From garzik@gtf.org Wed Jun 25 14:33:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 14:34:01 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PLXr2x024477 for ; Wed, 25 Jun 2003 14:33:54 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 348C86650; Wed, 25 Jun 2003 17:33:48 -0400 (EDT) Date: Wed, 25 Jun 2003 17:33:48 -0400 From: Jeff Garzik To: torvalds@transmeta.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [BK PATCHES] 2.5 net driver merges Message-ID: <20030625213348.GA22088@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 3513 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Linus, please do a bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.5 Others may download the patch ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.73-bk3-netdrvr1.patch.bz2 
This will update the following files:

 drivers/net/Kconfig               |   22
 drivers/net/Makefile              |    3
 drivers/net/Makefile.lib          |    1
 drivers/net/Space.c               |   11
 drivers/net/acenic.c              |    3
 drivers/net/au1000_eth.c          |   70 +--
 drivers/net/au1000_eth.h          |   11
 drivers/net/declance.c            |  453 ++++++++++---------
 drivers/net/e100/e100_main.c      |   15
 drivers/net/e1000/e1000_ethtool.c |   12
 drivers/net/eepro100.c            |   16
 drivers/net/gt96100eth.c          |   17
 drivers/net/ioc3-eth.c            |  107 ----
 drivers/net/irda/Kconfig          |    4
 drivers/net/irda/Makefile         |    1
 drivers/net/irda/au1k_ir.c        |  868 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgb/ixgb_ethtool.c   |    8
 drivers/net/meth.c                |  862 +++++++++++++++++++++++++++++++++++++
 drivers/net/meth.h                |  273 +++++++++++
 drivers/net/pcmcia/3c574_cs.c     |   37 -
 drivers/net/pcmcia/3c589_cs.c     |   40 -
 drivers/net/pcmcia/fmvj18x_cs.c   |   34 -
 drivers/net/pcmcia/nmclan_cs.c    |   33 -
 drivers/net/pcmcia/smc91c92_cs.c  |   75 +--
 drivers/net/sb1250-mac.c          |  731 +++++++++++++++++++++-----------
 drivers/net/sgiseeq.c             |  213 ++++-----
 drivers/net/sgiseeq.h             |    2
 drivers/net/sungem.c              |    3
 drivers/net/sunhme.c              |    3
 drivers/net/tulip/tulip_core.c    |   12
 30 files changed, 3043 insertions(+), 897 deletions(-)

through these ChangeSets:

 (03/06/25 1.1458)
   [netdrvr] misc small mips updates

   Add missing CONFIG_TC35805 entry to Kconfig.
   Update CONFIG_NET_SB1250_MAC Kconfig entry.
   Minor cosmetic updates to gt96100eth.

 (03/06/25 1.1457)
   [netdrvr tulip] add mips cobalt support

 (03/06/25 1.1456)
   [netdrvr] update sb1250-mac

 (03/06/25 1.1455)
   [netdrvr] au1000_eth update

 (03/06/25 1.1454)
   [netdrvr] update declance

 (03/06/25 1.1453)
   [netdrvr] update ioc3_eth

 (03/06/25 1.1452)
   [netdrvr] sgiseeq update

 (03/06/25 1.1451)
   [netdrvr] add driver "meth", for SGI O2 MACE fast eth

 (03/06/25 1.1450)
   [irda] add driver for mips Alchemy Au1000 SIR/FIR

   Submitted by Ralf Baechle

 (03/06/24 1.1449)
   [PATCH] 2.5.70 - eepro100 - use alloc_etherdev

   Ignore earlier patch -- this one locks and frees as appropriate.
   Tested on 2.5.72 with SMP.
Of course, it begs the question why have two (now three) versions of drivers for the same hardware... (03/06/24 1.1448) [PATCH] Remove CAP_NET_ADMIN check for SIOCETHTOOL's dev_ioctl already checks capable(CAP_NET_ADMIN), so no need to do so in drivers. (03/06/24 1.1447) [PATCH] alloc_etherdev for smc91c92_cs net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev. compile tested only since no hardware against 2.5.73-bk -daniel ===== smc91c92_cs.c 1.18 vs edited ===== (03/06/24 1.1446) [PATCH] alloc_etherdev for nmclan_cs net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev. compile tested only since no hardware against 2.5.73-bk -daniel ===== nmclan_cs.c 1.14 vs edited ===== (03/06/24 1.1445) [PATCH] alloc_etherdev for fmvj18x_cs net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev. compile tested only since no hardware against 2.5.73-bk -daniel ===== fmvj18x_cs.c 1.21 vs edited ===== (03/06/24 1.1444) [PATCH] alloc_etherdev for 3c589_cs net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev. compile tested only since no hardware against 2.5.73-bk -daniel ===== drivers/net/pcmcia/3c589_cs.c 1.17 vs edited ===== (03/06/24 1.1443) [PATCH] alloc_etherdev for 3c574_cs net_device is no longer allocated as part of the driver's private structure, instead it's allocated via alloc_netdev. 
compile tested only since no hardware against 2.5.73-bk -daniel ===== drivers/net/pcmcia/3c574_cs.c 1.17 vs edited ===== From davem@redhat.com Wed Jun 25 14:39:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 14:39:32 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PLdS2x024834 for ; Wed, 25 Jun 2003 14:39:28 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA32003; Wed, 25 Jun 2003 14:33:34 -0700 Date: Wed, 25 Jun 2003 14:33:34 -0700 (PDT) Message-Id: <20030625.143334.85380461.davem@redhat.com> To: mostrows@watson.ibm.com Cc: rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: "David S. Miller" In-Reply-To: <1056547262.1945.1436.camel@brick.watson.ibm.com> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3514 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Why don't you just queue the payload packets in a "resolution queue" until the socket is created? Just make the resolution queue packets timeout using a value that will easily exceed any reasonable PPP negotiation time. All this ordered packet arrival shit is just beyond stupid. 
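
The "resolution queue" suggested above can be sketched roughly as follows. This is a user-space toy, not kernel code and not anyone's posted patch: struct resq, resq_park(), resq_expire() and resq_release() are hypothetical names, the int payload stands in for an skb, and the tick-based timeout is an assumption. Payload packets that arrive before their session's socket exists are parked; they are either handed over in arrival order once the socket appears, or dropped when the timeout passes.

```c
#include <assert.h>
#include <stddef.h>

#define RESQ_MAX 8              /* hypothetical cap on parked packets */

struct resq_pkt {
    unsigned session_id;        /* session the payload belongs to */
    int payload;                /* stand-in for the real packet data */
    unsigned long expires;      /* tick at which the packet is dropped */
};

struct resq {
    struct resq_pkt pkt[RESQ_MAX];
    size_t len;
    unsigned long timeout;      /* lifetime of a parked packet, in ticks */
};

static void resq_init(struct resq *q, unsigned long timeout)
{
    assert(timeout > 0);
    q->len = 0;
    q->timeout = timeout;
}

/* Park a packet whose socket does not exist yet. Returns 0, or -1 if full. */
static int resq_park(struct resq *q, unsigned session_id, int payload,
                     unsigned long now)
{
    if (q->len == RESQ_MAX)
        return -1;
    q->pkt[q->len].session_id = session_id;
    q->pkt[q->len].payload = payload;
    q->pkt[q->len].expires = now + q->timeout;
    q->len++;
    return 0;
}

/* Drop every packet whose timeout has passed. Returns how many were dropped.
 * The signed difference keeps the comparison sane if the tick counter wraps. */
static size_t resq_expire(struct resq *q, unsigned long now)
{
    size_t kept = 0, dropped = 0;

    for (size_t i = 0; i < q->len; i++) {
        if ((long)(now - q->pkt[i].expires) >= 0)
            dropped++;
        else
            q->pkt[kept++] = q->pkt[i];
    }
    q->len = kept;
    return dropped;
}

/* The socket for session_id now exists: copy its parked payloads to out[]
 * in arrival order and remove them from the queue. Returns the count. */
static size_t resq_release(struct resq *q, unsigned session_id,
                           int *out, size_t out_max)
{
    size_t kept = 0, n = 0;

    for (size_t i = 0; i < q->len; i++) {
        if (q->pkt[i].session_id == session_id && n < out_max)
            out[n++] = q->pkt[i].payload;
        else
            q->pkt[kept++] = q->pkt[i];
    }
    q->len = kept;
    return n;
}
```

In this sketch the timeout only has to be generous: a parked packet that outlives any reasonable negotiation belongs to a session that is never going to come up, so dropping it costs nothing.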
From mostrows@watson.ibm.com Wed Jun 25 15:07:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 15:07:11 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PM762x026116 for ; Wed, 25 Jun 2003 15:07:07 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PM6Eq122300; Wed, 25 Jun 2003 18:06:14 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PM6s8129694; Wed, 25 Jun 2003 18:06:54 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id SAA37498; Wed, 25 Jun 2003 18:06:54 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: "David S. Miller" Cc: rusty@rustcorp.com.au, Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com In-Reply-To: <20030625.143334.85380461.davem@redhat.com> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625.143334.85380461.davem@redhat.com> Content-Type: text/plain Message-Id: <1056578813.27267.8.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 25 Jun 2003 18:06:54 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3515 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev On Wed, 2003-06-25 at 17:33, David S. Miller wrote: > Why don't you just queue the payload packets in a "resolution queue" > until the socket is created? Just make the resolution queue packets > timeout using a value that will easily exceed any reasonable PPP > negotiation time. 
> > All this ordered packet arrival shit is just beyond stupid. Exactly this mechanism is what I had in mind. The open question remaining is if there are any protocols which can be affected by packets being processed out of order. Some people have suggested that there are. If not, then there's not much to discuss. Can anyone comment on this decisively, either way? -- Michal Ostrowski From paulus@samba.org Wed Jun 25 15:29:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 15:29:47 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PMTc2x031431 for ; Wed, 25 Jun 2003 15:29:39 -0700 Received: by lists.samba.org (Postfix, from userid 580) id 961582C0F5; Wed, 25 Jun 2003 22:29:38 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16122.8374.178895.287907@nanango.paulus.ozlabs.org> Date: Thu, 26 Jun 2003 08:22:46 +1000 (EST) From: Paul Mackerras To: Jamal Hadi Cc: Stephen Hemminger , mostrows@watson.ibm.com, rusty@rustcorp.com.au, davem@redhat.com, netdev@oss.sgi.com, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <20030625125518.N84526@shell.cyberus.ca> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> <20030625093902.7431efc3.shemminger@osdl.org> <20030625125518.N84526@shell.cyberus.ca> X-Mailer: VM 6.75 under Emacs 20.7.2 X-archive-position: 3517 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: paulus@samba.org Precedence: bulk X-list: netdev Jamal Hadi writes: > a protocol or implementation which wishes to do state maintanance > properly oughta be able to do the synchronization on its own. 
> Separation between policy and mechanism has been the strength of unix. > A clean separation between control and a data path is very important. > Control protocols tend to be very rich environments which are > constantly changing. Take STP, there are so many features that could be > added to STP that are much harder to add because it is in the kernel. > > Maybe what needs to be looked at i sthe design of pppoe or ppp. OK, now that we have had our little flight of fancy about what things will be like once we get to heaven, can we talk about this bastard protocol called PPPoE? :) Or are you going to go personally to each ISP in the world and tell them they shouldn't use PPPoE? :) In any case the problem isn't strictly with PPPoE, since ethernet doesn't reorder packets on the wire. The problem is that the lower parts of the Linux network stack lose information. Paul. From paulus@samba.org Wed Jun 25 15:29:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 15:29:45 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PMTc2x031430 for ; Wed, 25 Jun 2003 15:29:39 -0700 Received: by lists.samba.org (Postfix, from userid 580) id 8E1392C0CD; Wed, 25 Jun 2003 22:29:38 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16122.8066.928895.202985@nanango.paulus.ozlabs.org> Date: Thu, 26 Jun 2003 08:17:38 +1000 (EST) From: Paul Mackerras To: Michal Ostrowski Cc: Jamal Hadi , Rusty Russell , "David S. Miller" , netdev@oss.sgi.com, "David F. 
Skoll" , James Carlson Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <1056562079.1944.1961.camel@brick.watson.ibm.com> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <1056548544.1944.1488.camel@brick.watson.ibm.com> <20030625114243.F84526@shell.cyberus.ca> <1056562079.1944.1961.camel@brick.watson.ibm.com> X-Mailer: VM 6.75 under Emacs 20.7.2 X-archive-position: 3516 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: paulus@samba.org Precedence: bulk X-list: netdev Michal Ostrowski writes: > Paul: you made an assertion to me in an eariler e-mail that you were > concerned about packet ordering for the sake of vj and compression. > IIRC the PPPoE spec prohibits compression, probably for this very > reason. Is there any other reason we'd be worried about re-ordering in > the PPP data stream? Reordering would stop you doing multilink, for instance. Generally, PPP protocols assume ordering where it is helpful since most point-to-point links don't reorder packets. IMO the PPPoE protocol itself should have included a sequence number, but we can't change what's deployed. James might be able to comment better than me on what will happen if packets get reordered during the negotiation phase of a PPP connection. I think the worst is that some packets will have to be retransmitted and thus the negotiation will take several seconds longer than it needs to. Paul. 
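
To make the sequence-number remark above concrete, here is a rough sketch of how a receiver could flag a late frame if the encapsulation carried a sequence field. PPPoE as deployed has no such field, so the 16-bit counter, struct seq_rx and seq_rx_accept() are purely hypothetical. The comparison is done with wraparound-safe (serial number) arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seq_rx {
    uint16_t next;      /* sequence number expected next */
    unsigned late;      /* frames that arrived behind a later frame */
};

/* Returns 1 if seq is at or ahead of the expected number, 0 if it is behind
 * (a late, reordered frame). The difference is taken modulo 2^16 and then
 * read as signed, so the test keeps working across counter wraparound. */
static int seq_rx_accept(struct seq_rx *rx, uint16_t seq)
{
    int16_t delta;

    assert(rx != NULL);
    delta = (int16_t)(uint16_t)(seq - rx->next);
    if (delta < 0) {
        rx->late++;
        return 0;
    }
    rx->next = (uint16_t)(seq + 1);
    return 1;
}
```

What to do with a late frame (drop it, or hold a small window and resequence) is a separate policy decision that this sketch leaves open.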
From greearb@candelatech.com Wed Jun 25 15:53:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 15:53:49 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PMra2x001088 for ; Wed, 25 Jun 2003 15:53:37 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5PMrAxA017161; Wed, 25 Jun 2003 15:53:18 -0700 Message-ID: <3EFA27D6.2000007@candelatech.com> Date: Wed, 25 Jun 2003 15:53:10 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Paul Mackerras CC: Jamal Hadi , Stephen Hemminger , mostrows@watson.ibm.com, rusty@rustcorp.com.au, davem@redhat.com, netdev@oss.sgi.com, dfs@roaringpenguin.com, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625091531.5ebed618.shemminger@osdl.org> <20030625122128.V84526@shell.cyberus.ca> <20030625093902.7431efc3.shemminger@osdl.org> <20030625125518.N84526@shell.cyberus.ca> <16122.8374.178895.287907@nanango.paulus.ozlabs.org> In-Reply-To: <16122.8374.178895.287907@nanango.paulus.ozlabs.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3518 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Paul Mackerras wrote: > Jamal Hadi writes: > > >>a protocol or implementation which wishes to do state maintanance >>properly oughta be able to do the synchronization on its own. >>Separation between policy and mechanism has been the strength of unix. >>A clean separation between control and a data path is very important. 
>>Control protocols tend to be very rich environments which are >>constantly changing. Take STP, there are so many features that could be >>added to STP that are much harder to add because it is in the kernel. >> >>Maybe what needs to be looked at i sthe design of pppoe or ppp. > > > OK, now that we have had our little flight of fancy about what things > will be like once we get to heaven, can we talk about this bastard > protocol called PPPoE? :) > > Or are you going to go personally to each ISP in the world and tell > them they shouldn't use PPPoE? :) > > In any case the problem isn't strictly with PPPoE, since ethernet > doesn't reorder packets on the wire. The problem is that the lower > parts of the Linux network stack lose information. > > Paul. Nothing is guaranteed, but you may be right at least most of the time. Btw, if you want a proprietary tool that will emulate an ethernet network that reorders packets, I write such a thing and will give it to you. It could help you with testing perhaps. Also, if you have a PCMCIA Zircom NIC, it seems to reorder packets just for the hell of it (and no, I'm not using a dual-cpu laptop :)) I don't know of any other protocols that can't handle reordering, since most of them seem to be designed to run over the real internet, where reordering/drop/duplication is a part of life. 
Ben > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From mostrows@watson.ibm.com Wed Jun 25 15:57:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 15:57:32 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5PMvR2x001418 for ; Wed, 25 Jun 2003 15:57:28 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5PMu5q122112; Wed, 25 Jun 2003 18:56:05 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5PMuj8125868; Wed, 25 Jun 2003 18:56:45 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id SAA47972; Wed, 25 Jun 2003 18:56:45 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: Paul MacKerras Cc: Jamal Hadi , Rusty Russell , "David S. 
Miller" , netdev@oss.sgi.com, carlson@workingcode.com In-Reply-To: <16122.8066.928895.202985@nanango.paulus.ozlabs.org> References: <20030625072602.529AF2C0B9@lists.samba.org> <1056547262.1945.1436.camel@brick.watson.ibm.com> <1056548544.1944.1488.camel@brick.watson.ibm.com> <20030625114243.F84526@shell.cyberus.ca> <1056562079.1944.1961.camel@brick.watson.ibm.com> <16122.8066.928895.202985@nanango.paulus.ozlabs.org> Content-Type: text/plain Message-Id: <1056581805.27267.14.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 25 Jun 2003 18:56:45 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3519 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev On Wed, 2003-06-25 at 18:17, Paul Mackerras wrote: > James might be able to comment better than me on what will happen if > packets get reordered during the negotiation phase of a PPP > connection. I think the worst is that some packets will have to be > retransmitted and thus the negotiation will take several seconds > longer than it needs to. This is exactly what we're dealing with the current "bug"; the worst case effect is a delay. I don't think heroic measures are called for the sake of this PPPoE issue alone. 
-- Michal Ostrowski From shemminger@osdl.org Wed Jun 25 17:36:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 17:36:05 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q0Zx2x004892 for ; Wed, 25 Jun 2003 17:36:00 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5Q0Znq22661; Wed, 25 Jun 2003 17:35:51 -0700 Date: Wed, 25 Jun 2003 17:35:49 -0700 From: Stephen Hemminger To: Valdis.Kletnieks@vt.edu Cc: netdev@oss.sgi.com Subject: Re: Weird modem behaviour in 2.5.73-mm1 Message-Id: <20030625173549.561cfaec.shemminger@osdl.org> In-Reply-To: <200306251804.h5PI4odA023590@turing-police.cc.vt.edu> References: <200306242102.49356.kde@myrealbox.com> <200306250327.h5P3RwH8001577@turing-police.cc.vt.edu> <200306250418.h5P4IWdA001565@turing-police.cc.vt.edu> <20030625091013.573f2e7b.shemminger@osdl.org> <200306251654.h5PGsUdA022467@turing-police.cc.vt.edu> <20030625102134.2046b04f.shemminger@osdl.org> <200306251804.h5PI4odA023590@turing-police.cc.vt.edu> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3520 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Try this patch, it is more paranoid in some of the code paths. I did get PPP over a null modem cable working between 2.4.18 and 2.5.73 with the PPP patches. 
diff -Nru a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
--- a/drivers/net/ppp_generic.c	Wed Jun 25 17:32:29 2003
+++ b/drivers/net/ppp_generic.c	Wed Jun 25 17:32:29 2003
@@ -1448,7 +1448,7 @@
 		if (ppp->vj == 0 || (ppp->flags & SC_REJ_COMP_TCP))
 			goto err;
 
-		if (skb_tailroom(skb) < 124 || skb_is_nonlinear(skb) ) {
+		if (skb_tailroom(skb) < 124) {
 			/* copy to a new sk_buff with more tailroom */
 			ns = dev_alloc_skb(skb->len + 128);
 			if (ns == 0) {
@@ -1459,7 +1459,9 @@
 			memcpy(skb_put(ns, skb->len), skb->data, skb->len);
 			kfree_skb(skb);
 			skb = ns;
-		}
+		} else if (!pskb_may_pull(skb, skb->len))
+			goto err;
+
 		len = slhc_uncompress(ppp->vj, skb->data + 2, skb->len - 2);
 		if (len <= 0) {
 			printk(KERN_DEBUG "PPP: VJ decompression error\n");
@@ -2033,11 +2035,15 @@
 static void
 ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound)
 {
-	unsigned char *dp = skb->data + 2;
+	unsigned char *dp;
 	int len;
 
-	if (!pskb_may_pull(skb, CCP_HDRLEN + 2)
-	    || skb->len < (len = CCP_LENGTH(dp)) + 2)
+	if (!pskb_may_pull(skb, CCP_HDRLEN + 2))
+		return;
+
+	dp = skb->data + 2;
+	len = CCP_LENGTH(dp);
+	if (skb->len < len +2)
 		return;		/* too short */
 
 	switch (CCP_CODE(dp)) {

From davem@redhat.com Wed Jun 25 18:10:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 18:10:17 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q1A72x005434 for ; Wed, 25 Jun 2003 18:10:08 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA00558; Wed, 25 Jun 2003 18:04:10 -0700
Date: Wed, 25 Jun 2003 18:04:10 -0700 (PDT)
Message-Id: <20030625.180410.74717146.davem@redhat.com>
To: mostrows@watson.ibm.com
Cc: rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com
Subject: Re: [PATCH, untested] Support for PPPOE on SMP
From: "David S.
Miller" In-Reply-To: <1056578813.27267.8.camel@brick.watson.ibm.com> References: <1056547262.1945.1436.camel@brick.watson.ibm.com> <20030625.143334.85380461.davem@redhat.com> <1056578813.27267.8.camel@brick.watson.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3521 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Michal Ostrowski Date: 25 Jun 2003 18:06:54 -0400 Exactly this mechanism is what I had in mind. Great. The open question remaining is if there are any protocols which can be affected by packets being processed out of order. Some people have suggested that there are. If not, then there's not much to discuss. Can anyone comment on this decisively, either way? TCP, as one example, is able to cope very well. It is even able to distinguish reordering from true packet loss. From jmorris@intercode.com.au Wed Jun 25 18:15:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 18:15:11 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:0ykoLFyPAbpFBh3F4xi1J5hSYj3hPh89@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q1F42x005753 for ; Wed, 25 Jun 2003 18:15:06 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5Q1Epr01230; Thu, 26 Jun 2003 11:14:51 +1000 Date: Thu, 26 Jun 2003 11:14:51 +1000 (EST) From: James Morris To: Stephen Hemminger cc: "David S. 
Miller" , Subject: Re: [PATCH] (0/7) ipmr fixes In-Reply-To: <20030625114038.176ee030.shemminger@osdl.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3522 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Wed, 25 Jun 2003, Stephen Hemminger wrote: > These patches fix ip multicast route (ipmr) on 2.5.73. > > 1 - Trivial C99 initialization > 2 - Change functions/variables to static > 3 - Drop and reacquire RTNL in error path > 4 - Use time_after() > 5 - Use alloc_netdev > 6 - Fix OOPS on dropped packets > 7 - Get rid of skb_linearize > > Tested on 8-way SMP by bringing up pimd. Thanks, all applied to my net-patchmonkey tree at bk://kernel.bkbits.net/jmorris/net-2.5 - James -- James Morris From jmorris@intercode.com.au Wed Jun 25 18:24:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 18:24:21 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:vA9bkRCd00TxRdO4FjzLi6dmPF1cALrz@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q1OG2x006095 for ; Wed, 25 Jun 2003 18:24:18 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5Q1O1r01287; Thu, 26 Jun 2003 11:24:02 +1000 Date: Thu, 26 Jun 2003 11:24:00 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: DAD must not have source link-layer option In-Reply-To: <20030626.043520.18011582.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3523 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev 
Both patches applied, thanks. - James -- James Morris From davem@redhat.com Wed Jun 25 21:05:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 21:05:43 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q45b2x007354 for ; Wed, 25 Jun 2003 21:05:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA01002; Wed, 25 Jun 2003 20:59:41 -0700 Date: Wed, 25 Jun 2003 20:59:41 -0700 (PDT) Message-Id: <20030625.205941.41631020.davem@redhat.com> To: rusty@rustcorp.com.au Cc: paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: "David S. Miller" In-Reply-To: <20030626035824.D68B62C147@lists.samba.org> References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3524 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Rusty Russell Date: Thu, 26 Jun 2003 13:57:09 +1000 Frankly, I'm amazed anyone sees reordering in real life... Many paths on the internet are quite reordered, this is the first thing. In fact, I claim that any TCP stack that doesn't do reordering detection is busted performance wise. The second thing is that network cards can and do reorder packets. Some PCMCIA cards do this just for fun. And ethernet _DOES NOT_ guarantee non-reordering. At a minimum, a card can use QoS values to reorder receive of a given packet, it can also use this to reorder transmit. Our packet schedulers do this on a software level. 
If you need ordering, you need sequence numbers in your protocol if you wish to operate over these mediums. The case where SMP causes out-of-order packet delivery is just academic compared to the non-local sources of reordering mentioned above. From rusty@samba.org Wed Jun 25 21:39:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 21:39:34 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q4dE2x007804 for ; Wed, 25 Jun 2003 21:39:15 -0700 Received: by lists.samba.org (Postfix, from userid 590) id D68B62C147; Thu, 26 Jun 2003 03:58:24 +0000 (GMT) From: Rusty Russell To: "David S. Miller" Cc: paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-reply-to: Your message of "Wed, 25 Jun 2003 14:33:34 MST." <20030625.143334.85380461.davem@redhat.com> Date: Thu, 26 Jun 2003 13:57:09 +1000 Message-Id: <20030626035824.D68B62C147@lists.samba.org> X-archive-position: 3525 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rusty@rustcorp.com.au Precedence: bulk X-list: netdev In message <20030625.143334.85380461.davem@redhat.com> you write: > > Why don't you just queue the payload packets in a "resolution queue" > until the socket is created? Just make the resolution queue packets > timeout using a value that will easily exceed any reasonable PPP > negotiation time. Sure, that works in this case, where you know when you get the packet that it's out of order. But I wanted to see how ugly it got to do it generally: a protocol where you can't tell until later that things were in the wrong order can't use this technique. Paul tells me that multilink PPP assumes this (moral: don't do multilink PPPoE). Anyway, my patch is fundamentally flawed: you can't do cpu_raise_softirq() on another CPU, it's racy (*bad* *bad* interface). 
> All this ordered packet arrival shit is just beyond stupid. I want to know how often this is happening (Michal?), because if protocols need ordering and can't tell, it becomes effectively a packet drop somewhere down in the protocol. If it's 1 in a million, OK. If it's 1 in a thousand, that's bad. Frankly, I'm amazed anyone sees reordering in real life... Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. From yoshfuji@linux-ipv6.org Wed Jun 25 22:19:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 22:19:59 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q5Jr2x008330 for ; Wed, 25 Jun 2003 22:19:55 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5Q5L9Bo017794; Thu, 26 Jun 2003 14:21:10 +0900 Date: Thu, 26 Jun 2003 14:21:09 +0900 (JST) Message-Id: <20030626.142109.76202395.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: inappropriate static variable in net/ipv6/ndisc.c From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3526 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. 
It is not appropriate to use static variable here. --yoshfuji Index: linux-2.5/net/ipv6/ndisc.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ndisc.c,v retrieving revision 1.40 diff -u -p -r1.40 ndisc.c --- linux-2.5/net/ipv6/ndisc.c 21 Jun 2003 16:21:01 -0000 1.40 +++ linux-2.5/net/ipv6/ndisc.c 26 Jun 2003 03:58:58 -0000 @@ -423,7 +423,7 @@ static void ndisc_send_na(struct net_dev struct in6_addr *daddr, struct in6_addr *solicited_addr, int router, int solicited, int override, int inc_opt) { - static struct in6_addr tmpaddr; + struct in6_addr tmpaddr; struct inet6_ifaddr *ifp; struct inet6_dev *idev; struct flowi fl; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Wed Jun 25 22:29:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 22:29:44 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q5Td2x008664 for ; Wed, 25 Jun 2003 22:29:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA01295; Wed, 25 Jun 2003 22:23:42 -0700 Date: Wed, 25 Jun 2003 22:23:42 -0700 (PDT) Message-Id: <20030625.222342.48531169.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: inappropriate static variable in net/ipv6/ndisc.c From: "David S. Miller" In-Reply-To: <20030626.142109.76202395.yoshfuji@linux-ipv6.org> References: <20030626.142109.76202395.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3527 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Thu, 26 Jun 2003 14:21:09 +0900 (JST) It is not appropriate to use static variable here. It used to be actually needed when ipv6 flows contained pointers to addresses, and yes that was racy. I'll apply this, thanks. From davem@redhat.com Wed Jun 25 23:48:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 25 Jun 2003 23:49:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q6mn2x009732 for ; Wed, 25 Jun 2003 23:48:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA01630; Wed, 25 Jun 2003 23:42:51 -0700 Date: Wed, 25 Jun 2003 23:42:51 -0700 (PDT) Message-Id: <20030625.234251.116353369.davem@redhat.com> To: krkumar@us.ibm.com Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) From: "David S. Miller" In-Reply-To: <3EF9D5C2.5080101@us.ibm.com> References: <3EF37458.3070103@us.ibm.com> <20030621.233634.67057417.yoshfuji@linux-ipv6.org> <3EF9D5C2.5080101@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3528 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev I don't think it's wise to make RTNETLINK facilities dependent upon ifdef values. 
Please kill CONFIG_IPV6_PREFIXLIST. People don't need to enable funny options to get a fully functional dhcp on ipv4, they should therefore not have to for ipv6 either. From yoshfuji@wide.ad.jp Thu Jun 26 00:21:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 00:22:01 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q7Lp2x010825 for ; Thu, 26 Jun 2003 00:21:52 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5Q7N7Bo000352; Thu, 26 Jun 2003 16:23:07 +0900 Date: Thu, 26 Jun 2003 16:23:07 +0900 (JST) Message-Id: <20030626.162307.85181540.yoshfuji@wide.ad.jp> To: davem@redhat.com CC: netdev@oss.sgi.com Subject: [PATCH] IPV6: make several ndisc private stuff static From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3529 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@wide.ad.jp Precedence: bulk X-list: netdev Hello. Make several ndisc private stuff static. Thanks. 
Index: linux-2.5/include/net/ndisc.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/ndisc.h,v retrieving revision 1.5 diff -u -d -r1.5 ndisc.h --- linux-2.5/include/net/ndisc.h 27 Sep 2002 02:20:35 -0000 1.5 +++ linux-2.5/include/net/ndisc.h 26 Jun 2003 05:28:50 -0000 @@ -56,20 +56,6 @@ __u8 nd_opt_len; } __attribute__((__packed__)); -struct ndisc_options { - struct nd_opt_hdr *nd_opt_array[7]; - struct nd_opt_hdr *nd_opt_piend; -}; - -#define nd_opts_src_lladdr nd_opt_array[ND_OPT_SOURCE_LL_ADDR] -#define nd_opts_tgt_lladdr nd_opt_array[ND_OPT_TARGET_LL_ADDR] -#define nd_opts_pi nd_opt_array[ND_OPT_PREFIX_INFO] -#define nd_opts_pi_end nd_opt_piend -#define nd_opts_rh nd_opt_array[ND_OPT_REDIRECT_HDR] -#define nd_opts_mtu nd_opt_array[ND_OPT_MTU] - -extern struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur, struct nd_opt_hdr *end); -extern struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len, struct ndisc_options *ndopts); extern int ndisc_init(struct net_proto_family *ops); Index: linux-2.5/net/ipv6/ndisc.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/ndisc.c,v retrieving revision 1.40 diff -u -d -r1.40 ndisc.c --- linux-2.5/net/ipv6/ndisc.c 21 Jun 2003 16:21:01 -0000 1.40 +++ linux-2.5/net/ipv6/ndisc.c 26 Jun 2003 05:28:50 -0000 @@ -144,6 +144,19 @@ .gc_thresh3 = 1024, }; +/* ND options */ +struct ndisc_options { + struct nd_opt_hdr *nd_opt_array[7]; + struct nd_opt_hdr *nd_opt_piend; +}; + +#define nd_opts_src_lladdr nd_opt_array[ND_OPT_SOURCE_LL_ADDR] +#define nd_opts_tgt_lladdr nd_opt_array[ND_OPT_TARGET_LL_ADDR] +#define nd_opts_pi nd_opt_array[ND_OPT_PREFIX_INFO] +#define nd_opts_pi_end nd_opt_piend +#define nd_opts_rh nd_opt_array[ND_OPT_REDIRECT_HDR] +#define nd_opts_mtu nd_opt_array[ND_OPT_MTU] + #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7) static u8 *ndisc_fill_option(u8 *opt, int type, void *data, int data_len) 
@@ -160,8 +173,8 @@ return opt + space; } -struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur, - struct nd_opt_hdr *end) +static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur, + struct nd_opt_hdr *end) { int type; if (!cur || !end || cur >= end) @@ -173,8 +186,8 @@ return (cur <= end && cur->nd_opt_type == type ? cur : NULL); } -struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len, - struct ndisc_options *ndopts) +static struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len, + struct ndisc_options *ndopts) { struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)opt; @@ -375,7 +388,7 @@ * Send a Neighbour Advertisement */ -int ndisc_output(struct sk_buff *skb) +static int ndisc_output(struct sk_buff *skb) { if (skb) { struct neighbour *neigh = (skb->dst ? skb->dst->neighbour : NULL); @@ -701,7 +714,7 @@ } } -void ndisc_recv_ns(struct sk_buff *skb) +static void ndisc_recv_ns(struct sk_buff *skb) { struct nd_msg *msg = (struct nd_msg *)skb->h.raw; struct in6_addr *saddr = &skb->nh.ipv6h->saddr; @@ -897,7 +910,7 @@ return; } -void ndisc_recv_na(struct sk_buff *skb) +static void ndisc_recv_na(struct sk_buff *skb) { struct nd_msg *msg = (struct nd_msg *)skb->h.raw; struct in6_addr *saddr = &skb->nh.ipv6h->saddr; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Thu Jun 26 00:54:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 00:54:36 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q7sQ2x012178 for ; Thu, 26 Jun 2003 00:54:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA01803; Thu, 26 Jun 2003 00:48:30 -0700 Date: Thu, 26 Jun 2003 00:48:30 -0700 (PDT) Message-Id: <20030626.004830.02304376.davem@redhat.com> To: yoshfuji@wide.ad.jp Cc: netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: 
make several ndisc private stuff static From: "David S. Miller" In-Reply-To: <20030626.162307.85181540.yoshfuji@wide.ad.jp> References: <20030626.162307.85181540.yoshfuji@wide.ad.jp> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3530 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Thu, 26 Jun 2003 16:23:07 +0900 (JST) Make several ndisc private stuff static. Applied, thanks. From rusty@samba.org Thu Jun 26 01:53:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 01:53:13 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q8r72x013118 for ; Thu, 26 Jun 2003 01:53:08 -0700 Received: by lists.samba.org (Postfix, from userid 590) id 096652C0B4; Thu, 26 Jun 2003 08:53:07 +0000 (GMT) From: Rusty Russell To: "David S. Miller" Cc: paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-reply-to: Your message of "Wed, 25 Jun 2003 20:59:41 MST." <20030625.205941.41631020.davem@redhat.com> Date: Thu, 26 Jun 2003 18:17:45 +1000 Message-Id: <20030626085307.096652C0B4@lists.samba.org> X-archive-position: 3531 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rusty@rustcorp.com.au Precedence: bulk X-list: netdev In message <20030625.205941.41631020.davem@redhat.com> you write: > From: Rusty Russell > Date: Thu, 26 Jun 2003 13:57:09 +1000 > > Frankly, I'm amazed anyone sees reordering in real life... > > Many paths on the internet are quite reordered, this is > the first thing. 
In fact, I claim that any TCP stack that > doesn't do reordering detection is busted performance wise. Sure, but I was assuming that the packets arrived in order and got processed out of order. If the first one isn't happening, this patch is doubly useless 8) Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. From davem@redhat.com Thu Jun 26 02:01:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 02:01:39 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5Q91U2x013536 for ; Thu, 26 Jun 2003 02:01:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA01975; Thu, 26 Jun 2003 01:55:33 -0700 Date: Thu, 26 Jun 2003 01:55:33 -0700 (PDT) Message-Id: <20030626.015533.77059614.davem@redhat.com> To: rusty@rustcorp.com.au Cc: paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: "David S. Miller" In-Reply-To: <20030626085307.096652C0B4@lists.samba.org> References: <20030625.205941.41631020.davem@redhat.com> <20030626085307.096652C0B4@lists.samba.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3532 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Rusty Russell Date: Thu, 26 Jun 2003 18:17:45 +1000 If the first one isn't happening, this patch is doubly useless 8) It does happen, but so does reordering on ethernet itself. 
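For readers following the reordering argument in this thread, here is a minimal userspace sketch of what carrying sequence numbers in a protocol lets a receiver do: tell an in-order arrival from a late one or from one that skipped ahead. It is not taken from any patch in the thread or from the kernel; `struct reorder_state`, `seq_before` and `classify_seq` are hypothetical names, and the 32-bit wraparound comparison is an assumption about the sequence space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receiver-side state for one flow. */
struct reorder_state {
    uint32_t next_expected;   /* sequence number we expect next */
    unsigned long in_order;   /* packets that arrived exactly on time */
    unsigned long late;       /* packets overtaken by a later-numbered one */
    unsigned long gaps;       /* packets that arrived ahead of a missing one */
};

enum seq_verdict { SEQ_IN_ORDER, SEQ_LATE, SEQ_GAP };

/*
 * Wraparound-safe "a comes before b" for 32-bit sequence numbers.
 * Relies on the usual two's-complement conversion to int32_t.
 */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Classify one arriving sequence number and update the counters. */
static enum seq_verdict classify_seq(struct reorder_state *st, uint32_t seq)
{
    if (seq == st->next_expected) {
        st->next_expected = seq + 1;
        st->in_order++;
        return SEQ_IN_ORDER;
    }
    if (seq_before(seq, st->next_expected)) {
        /* Older than expected: it was reordered (or is a duplicate). */
        st->late++;
        return SEQ_LATE;
    }
    /* Newer than expected: something before it is missing or delayed. */
    st->gaps++;
    st->next_expected = seq + 1;
    return SEQ_GAP;
}
```

A receiver that sees a gap followed shortly by the matching late packet has observed reordering rather than loss; a protocol without any such number cannot make that distinction, which is the case Rusty is worried about.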
From carlson@workingcode.com Thu Jun 26 03:47:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 03:47:44 -0700 (PDT) Received: from workingcode.com (h006008986325.ne.client2.attbi.com [24.61.67.218]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QAlW2x017762 for ; Thu, 26 Jun 2003 03:47:32 -0700 Received: from workingcode.com (websterhost [127.0.0.1]) by workingcode.com (8.12.0/8.12.0) with ESMTP id h5QAlRW7019270; Thu, 26 Jun 2003 06:47:27 -0400 Received: (from carlson@localhost) by workingcode.com (8.12.0/8.12.0/Submit) id h5QAlRig018496; Thu, 26 Jun 2003 06:47:27 -0400 From: James Carlson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16122.53054.421234.711859@h006008986325.ne.client2.attbi.com> Date: Thu, 26 Jun 2003 06:47:26 -0400 (EDT) To: "David S. Miller" Cc: rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: David S. Miller's message of 26 June 2003 01:55:33 References: <20030625.205941.41631020.davem@redhat.com> <20030626085307.096652C0B4@lists.samba.org> <20030626.015533.77059614.davem@redhat.com> X-Mailer: VM 6.75 under Emacs 20.6.1 X-archive-position: 3533 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carlson@workingcode.com Precedence: bulk X-list: netdev David S. Miller writes: > From: Rusty Russell > Date: Thu, 26 Jun 2003 18:17:45 +1000 > > If the first one isn't happening, this patch is doubly useless 8) > > It does happen, but so does reordering on ethernet itself. That's nonsense, and it's directly counter to all of the 802 standards. Please explain how reordering on a single wire could ever take place. Reordering *requires* that you have something like a router in the path. It's an issue for a network or transport layer protocol to consider, but it's not a link-layer issue. 
If you have Ethernet interfaces that reorder packets between a given pair of stations, then those interfaces are just simply broken.

From IEEE Std 802.1D, 1998:

  6.3.3 Frame misordering

  The MAC Service does not permit the reordering of frames with a given user priority for a given combination of destination address and source address. MA_UNITDATA.indication service primitives corresponding to MA_UNITDATA.request primitives, with the same requested priority and for the same combination of destination and source addresses, are received in the same order as the request primitives were processed.

Here are some excerpts from IEEE Std 802.3-2002:

  1.4.94 Conversation: A set of MAC frames transmitted from one end station to another, where all of the MAC frames form an ordered sequence, and where the communicating end stations require the ordering to be maintained among the set of MAC frames exchanged. (See IEEE 802.3 Clause 43.)

  43.2.1 Principles of Link Aggregation

  Link Aggregation allows a MAC Client to treat a set of one or more ports as if it were a single port. In doing so, it employs the following principles and concepts:

  [...]

  f) Frame ordering must be maintained for certain sequences of frame exchanges between MAC Clients (known as conversations, see 1.4). The Distributor ensures that all frames of a given conversation are passed to a single port. For any given port, the Collector is required to pass frames to the MAC Client in the order that they are received from that port. The Collector is otherwise free to select frames received from the aggregated ports in any order. Since there are no means for frames to be misordered on a single link, this guarantees that frame ordering is maintained for any conversation.

  [...]
  43.2.3 Frame Collector

  A Frame Collector is responsible for receiving incoming frames (i.e., AggMuxN:MA_DATA.indications) from the set of individual links that form the Link Aggregation Group (through each link's associated Aggregator Parser/Multiplexer) and delivering them to the MAC Client. Frames received from a given port are delivered to the MAC Client in the order that they are received by the Frame Collector. Since the Frame Distributor is responsible for maintaining any frame ordering constraints, there is no requirement for the Frame Collector to perform any reordering of frames received from multiple links.

  [...]

  Annex 43A

  a) Frame duplication is not permitted.

  b) Frame ordering must be preserved in aggregated links. Strictly, the MAC service specification (ISO/IEC 15802-1) states that order must be preserved for frames with a given SA, DA, and priority; however, this is a tighter constraint than is absolutely necessary. There may be multiple, logically independent conversations in progress between a given SA-DA pair at a given priority; the real requirement is to maintain ordering within a conversation, though not necessarily between conversations.
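The Distributor rule quoted above (all frames of one conversation go to a single port) can be illustrated with a small sketch. This is only one possible approach under stated assumptions: the excerpts do not specify a distribution algorithm, so treating the (SA, DA) pair as the conversation and hashing it with FNV-1a is purely illustrative, and `fnv1a`, `pick_port` and `ETH_ADDR_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ADDR_LEN 6

/* FNV-1a over a byte range; chosen only because it is short and well known. */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Choose an aggregated port from the frame's source and destination
 * addresses alone. The result depends on nothing else, so every frame
 * of the same (SA, DA) conversation lands on the same port and cannot
 * be overtaken by a frame sent down a different link.
 * nports must be non-zero.
 */
static unsigned int pick_port(const uint8_t sa[ETH_ADDR_LEN],
                              const uint8_t da[ETH_ADDR_LEN],
                              unsigned int nports)
{
    uint32_t h = 2166136261u;   /* FNV-1a 32-bit offset basis */

    h = fnv1a(h, sa, ETH_ADDR_LEN);
    h = fnv1a(h, da, ETH_ADDR_LEN);
    return h % nports;
}
```

Because the Collector hands frames up in per-port arrival order, pinning a conversation to one port is enough to preserve its ordering without any resequencing on the receive side.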
-- James Carlson From carlson@workingcode.com Thu Jun 26 03:51:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 03:51:39 -0700 (PDT) Received: from workingcode.com (h006008986325.ne.client2.attbi.com [24.61.67.218]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QApX2x018086 for ; Thu, 26 Jun 2003 03:51:34 -0700 Received: from workingcode.com (websterhost [127.0.0.1]) by workingcode.com (8.12.0/8.12.0) with ESMTP id h5QApVW7019594; Thu, 26 Jun 2003 06:51:31 -0400 Received: (from carlson@localhost) by workingcode.com (8.12.0/8.12.0/Submit) id h5QApU9f018308; Thu, 26 Jun 2003 06:51:30 -0400 From: James Carlson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> Date: Thu, 26 Jun 2003 06:51:30 -0400 (EDT) To: "David S. Miller" Cc: rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: David S. Miller's message of 25 June 2003 20:59:41 References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> X-Mailer: VM 6.75 under Emacs 20.6.1 X-archive-position: 3534 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carlson@workingcode.com Precedence: bulk X-list: netdev David S. Miller writes: > From: Rusty Russell > Date: Thu, 26 Jun 2003 13:57:09 +1000 > > Frankly, I'm amazed anyone sees reordering in real life... > > Many paths on the internet are quite reordered, this is > the first thing. In fact, I claim that any TCP stack that > doesn't do reordering detection is busted performance wise. Nobody's disputing that. That's certainly true. However, reordering on a given wire does not happen. > The second thing is that network cards can and do reorder packets. 
> Some PCMCIA cards do this just for fun. If so, then that needs to be taken up with the manufacturer. That's a rather severe design flaw that will prevent such a card from ever being used for anything other than IP -- many other protocols *ASSUME* that packets on a single wire cannot be reordered, including SNA, PPP (!), and link aggregation, among others. > And ethernet _DOES NOT_ > guarentee non-reordering. Please provide references. 802.1 MAC says otherwise. > At a minumum, a card can use QoS values to > reorder receive of a given packet, it can also use this to reorder > transmit. Our packet schedulers do this on a software level. Sure. *If* QoS is present, then reordering between priority levels is permissible. However, reordering L2 frames at a given priority level isn't. > If you need ordering, you need sequence numbers in your > protocol if you wish to operate over these mediums. > > The case where SMP causes out-of-order packet delivery is just > academic compared to the non-local sources of reordering > mentioned above. Not where it affects the correctness of the defined protocols. 
-- James Carlson From mostrows@watson.ibm.com Thu Jun 26 04:37:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 04:37:39 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QBbS2x019139 for ; Thu, 26 Jun 2003 04:37:29 -0700 Received: from sp1n293en1.watson.ibm.com (sp1n293en1.watson.ibm.com [9.2.112.57]) by igw2.watson.ibm.com (8.11.7/8.11.4) with ESMTP id h5QBaZq198820; Thu, 26 Jun 2003 07:36:35 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.224.107]) by sp1n293en1.watson.ibm.com (8.11.7/8.11.7) with ESMTP id h5QBbG8115174; Thu, 26 Jun 2003 07:37:16 -0400 Received: from brick.watson.ibm.com (brick.watson.ibm.com [9.2.216.48]) by kitch0.watson.ibm.com (AIX4.3/8.9.3p2/8.9.3/09-18-2002) with ESMTP id HAA53150; Thu, 26 Jun 2003 07:37:16 -0400 Subject: Re: [PATCH, untested] Support for PPPOE on SMP From: Michal Ostrowski To: Rusty Russell Cc: "David S. Miller" , Paul MacKerras , netdev@oss.sgi.com, fcusack@samba.org, carlson@workingcode.com In-Reply-To: <20030626035824.D68B62C147@lists.samba.org> References: <20030626035824.D68B62C147@lists.samba.org> Content-Type: text/plain Message-Id: <1056627436.27295.42.camel@brick.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.0 Date: 26 Jun 2003 07:37:16 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 3535 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mostrows@watson.ibm.com Precedence: bulk X-list: netdev On Wed, 2003-06-25 at 23:57, Rusty Russell wrote: > In message <20030625.143334.85380461.davem@redhat.com> you write: > > > > Why don't you just queue the payload packets in a "resolution queue" > > until the socket is created? Just make the resolution queue packets > > timeout using a value that will easily exceed any reasonable PPP > > negotiation time. 
> > Sure, that works in this case, where you know when you get the packet > that it's out of order. But I wanted to see how ugly it got to do it > generally: a protocol where you can't tell until later that things > were in the wrong order can't use this technique. Paul tells me that > multilink PPP assumes this (moral: don't do multilink PPPoE). > > Anyway, my patch is fundamentally flawed: you can't do > cpu_raise_softirq() on another CPU, it's racy (*bad* *bad* interface). > > > All this ordered packet arrival shit is just beyond stupid. > > I want to know how often this is happening (Michal?), because if > protocols need ordering and can't tell, it becomes effectively a > packet drop somewhere down in the protocol. If it's 1 in a million, > OK. If it's 1 in a thousand, that's bad. > > Frankly, I'm amazed anyone sees reordering in real life... I have observed (very, very rarely) a situation where interrupt sequences for two CPUs allowed this to happen (but not that it did necessarily happen). When these races do occur, it probably hits TCP traffic which deals with it, otherwise any hiccups it causes are probably lost in the noise. For PPPoE (non multilink) the worst case scenario would appear to be a packet drop with a retransmit delay imposed on or by higher-level protocols. That being said, I don't think PPPoE provides any justification for any modifications to the core networking code to deal with this. Continuing on with PPPoE, I would like to get people's opinions on whether or not mechanisms should be put in (as outlined in David's suggestion above) to handle races between payload packets and socket creation. These races are, I think, quite rare and at worst may impose a delay of a couple of seconds on session creation. I'm not entirely comfortable with the idea of saving incoming packets that I can't match to existing sessions in case a matching session comes into existence in the near future (DOS), especially if not handling this case is non-fatal. 
I'd like to get a consensus on this "policy" issue. -- Michal Ostrowski From yoshfuji@linux-ipv6.org Thu Jun 26 04:54:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 04:54:32 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QBsO2x020387 for ; Thu, 26 Jun 2003 04:54:25 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5QBteBo003653; Thu, 26 Jun 2003 20:55:40 +0900 Date: Thu, 26 Jun 2003 20:55:40 +0900 (JST) Message-Id: <20030626.205540.83592923.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, miyazawa@linux-ipv6.org Subject: [PATCH] IPV6: Fixed fragment check in ip6_output.c:ip6_fragment() From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3536 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. MTU / alignment check in ip6_fragment() was wrong; first_len was not correct. Thanks. 
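
To see why a first_len of zero defeats the MTU / alignment check, it helps
to run the two arithmetic tests on their own. The sketch below is a
standalone illustration, not kernel code: the function name is
hypothetical, the skb_cloned() test is left out, and the numbers in the
usage note (a 1500-byte path MTU, a 40-byte header with no extension
headers, an 8-byte fragment header) are assumptions chosen for the example.

```c
#include <stdbool.h>

/*
 * Mirrors the two arithmetic tests of the fast-path condition:
 * the first fragment must fit in `mtu` and its payload must be a
 * multiple of 8 bytes. Returns true when the slow path is required.
 */
static bool first_frag_needs_slow_path(int first_len, int hlen, int mtu)
{
	int payload = first_len - hlen;

	return payload > mtu || (payload & 7) != 0;
}
```

With mtu = 1500 - 40 - 8 = 1452 and first_len fixed at 0, payload is -40:
it is never greater than mtu, and on a two's complement machine -40 & 7 is
0 because 40 is a multiple of 8, so the function reports that the fast path
is fine no matter how large the first buffer really is. Passing the real
length changes the answer: 2040 is too big, 1492 fits but is not 8-byte
aligned, and 1488 passes both tests.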
Index: linux-2.5/net/ipv6/ip6_output.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/ipv6/ip6_output.c,v
retrieving revision 1.29
diff -u -p -r1.29 ip6_output.c
--- linux-2.5/net/ipv6/ip6_output.c	21 Jun 2003 16:20:41 -0000	1.29
+++ linux-2.5/net/ipv6/ip6_output.c	26 Jun 2003 10:28:17 -0000
@@ -939,7 +939,7 @@ static int ip6_fragment(struct sk_buff *
 	mtu = dst_pmtu(&rt->u.dst) - hlen - sizeof(struct frag_hdr);
 
 	if (skb_shinfo(skb)->frag_list) {
-		int first_len = 0;
+		int first_len = skb_pagelen(skb);
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From degger@fhm.edu Thu Jun 26 06:49:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 06:49:22 -0700 (PDT)
Received: from nicole.de.interearth.com (B5bf6.pppool.de [213.7.91.246]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QDn42x023698 for ; Thu, 26 Jun 2003 06:49:05 -0700
Received: from sonja (sonja [192.168.11.7]) by nicole.de.interearth.com (Postfix) with ESMTP id 5B3293EEB; Thu, 26 Jun 2003 15:39:27 +0200 (CEST)
Subject: Re: [ANNOUNCE] nf-hipac v0.8 released
From: Daniel Egger
To: Michael Bellion and Thomas Heinz
Cc: Linux Kernel Mailinglist , netdev@oss.sgi.com
In-Reply-To: <200306252248.44224.nf@hipac.org>
References: <200306252248.44224.nf@hipac.org>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-W3mKwm6MRxqAiREzWcJx"
Message-Id: <1056634720.5423.83.camel@sonja>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.0
Date: 26 Jun 2003 15:38:41 +0200
X-archive-position: 3537
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: degger@fhm.edu
Precedence: bulk
X-list: netdev

--=-W3mKwm6MRxqAiREzWcJx
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Wed, 2003-06-25 at 22:48, Michael Bellion and Thomas Heinz wrote:
> - libnfhipac: netlink library for kernel-user communication

Is this library actually usable for applications which need to control
the firewall or is it equally braindead to libiptables?

-- 
Servus,
      Daniel

--=-W3mKwm6MRxqAiREzWcJx
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQA++vdfchlzsq9KoIYRAqL5AJ9bIwghJoehKAENHLZ+oZeTfo9JuACgiC9w
PwTOwvadXnrvQ7ULVypqw9g=
=mr4I
-----END PGP SIGNATURE-----

--=-W3mKwm6MRxqAiREzWcJx--

From nf@hipac.org Thu Jun 26 07:21:41 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 07:21:51 -0700 (PDT)
Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [62.67.200.160] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QELd2x030232 for ; Thu, 26 Jun 2003 07:21:40 -0700
Received: (qmail 2122 invoked from network); 26 Jun 2003 14:21:36 -0000
Received: from unknown (HELO portal.lan) (134300@[80.138.218.190]) (envelope-sender ) by smtprelay01.ispgateway.de (qmail-ldap-1.03) with SMTP for ; 26 Jun 2003 14:21:36 -0000
Received: from hipac.org (tmobile.lan [192.168.0.6]) by portal.lan (Postfix) with ESMTP id 87D7D4B060; Thu, 26 Jun 2003 14:56:03 +0200 (CEST)
Message-ID: <3EFB0143.7000606@hipac.org>
Date: Thu, 26 Jun 2003 16:20:51 +0200
From: Michael Bellion and Thomas Heinz
Reply-To: Michael Bellion and Thomas Heinz
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1
X-Accept-Language: de, en
MIME-Version: 1.0
To: Daniel Egger
Cc: Linux Kernel Mailinglist , netdev@oss.sgi.com
Subject: Re: [ANNOUNCE] nf-hipac v0.8 released
References: <200306252248.44224.nf@hipac.org> <1056634720.5423.83.camel@sonja>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3538
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: nf@hipac.org
Precedence: bulk
X-list: netdev

Hi Daniel

You wrote:
>> - libnfhipac: netlink library for kernel-user communication
>
> Is this library actually usable for applications which need to control
> the firewall or is it equally braindead to libiptables?

The library _is_ intended to be used by applications other than the
nf-hipac userspace tool, too. It hides the netlink communication from
the user, who is only required to construct the command data structure
sent to the kernel, which contains at most one nf-hipac rule. This is
very straightforward and the kernel returns detailed errors if the
packet is misconstructed. Taking a look at nfhp_com.h and possibly
nf-hipac.c gives you some clue on how to build valid command packets.

Regards,

+-----------------------+----------------------+
|    Michael Bellion    |     Thomas Heinz     |
|                       |                      |
+-----------------------+----------------------+

From degger@fhm.edu Thu Jun 26 07:46:25 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 07:46:31 -0700 (PDT)
Received: from nicole.de.interearth.com (B55b8.pppool.de [213.7.85.184]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QEkH2x030753 for ; Thu, 26 Jun 2003 07:46:22 -0700
Received: from sonja (sonja [192.168.11.7]) by nicole.de.interearth.com (Postfix) with ESMTP id 8912E3EDE; Thu, 26 Jun 2003 16:46:16 +0200 (CEST)
Subject: Re: [ANNOUNCE] nf-hipac v0.8 released
From: Daniel Egger
To: Michael Bellion and Thomas Heinz
Cc: Linux Kernel Mailinglist , netdev@oss.sgi.com
In-Reply-To: <3EFB0143.7000606@hipac.org>
References: <200306252248.44224.nf@hipac.org> <1056634720.5423.83.camel@sonja> <3EFB0143.7000606@hipac.org>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-aOY37NhVanfwwvqe0Iu0"
Message-Id: <1056638729.4962.86.camel@sonja>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.0
Date: 26 Jun 2003 16:45:30 +0200
X-archive-position: 3539
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: degger@fhm.edu
Precedence: bulk
X-list: netdev

--=-aOY37NhVanfwwvqe0Iu0
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Thu, 2003-06-26 at 16:20, Michael Bellion and Thomas Heinz wrote:

> Taking a look at nfhp_com.h and possibly nf-hipac.c gives you some clue
> on how to build valid command packets.

Thanks. Your reply made me somewhat curious and I'll definitely have a
look, hoping the interface is much better than libiptables, which is
merely a bunch of convenience functions for use by the iptables utility
but unusable for real-world applications which need to deal with
firewall rules.

-- 
Servus,
      Daniel

--=-aOY37NhVanfwwvqe0Iu0
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQA++wcJchlzsq9KoIYRAi03AKDfHluFMhoZiFpxSQpw+i6XYPj1DQCfaf29
X50vfIvp0Zg4OIgFx/ZhkQE=
=QoaY
-----END PGP SIGNATURE-----

--=-aOY37NhVanfwwvqe0Iu0--

From creatix@hipac.org Thu Jun 26 07:50:19 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 07:50:25 -0700 (PDT)
Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [62.67.200.160] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QEoI2x031128 for ; Thu, 26 Jun 2003 07:50:19 -0700
Received: (qmail 4473 invoked from network); 25 Jun 2003 23:53:14 -0000
Received: from unknown (HELO portal.lan) (134300@[80.138.231.107]) (envelope-sender ) by smtprelay01.ispgateway.de (qmail-ldap-1.03) with SMTP for ; 25 Jun 2003 23:53:14 -0000
Received: from hipac.org (tmobile.lan [192.168.0.6]) by portal.lan (Postfix) with ESMTP id CF2B64B060; Thu, 26 Jun 2003 00:28:04 +0200 (CEST)
Message-ID: <3EFA35D3.3020408@hipac.org>
Date: Thu, 26 Jun 2003 01:52:51 +0200
From: Thomas Heinz
Reply-To: Michael Bellion and Thomas Heinz
User-Agent: Mozilla/5.0 (X11; U;
Linux i686; de-AT; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1 X-Accept-Language: de, en MIME-Version: 1.0 To: folkert@vanheusden.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [ANNOUNCE] nf-hipac v0.8 released References: <200306252248.44224.nf@hipac.org> <200306252303.13366.folkert@vanheusden.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3540 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: creatix@hipac.org Precedence: bulk X-list: netdev Hi Folkert You wrote: > Looks great! > Any chance on a port to 2.5.x? It should not be that hard to port nf-hipac to 2.5 since most of the code (the whole hipac core) is not "kernel specific". But since we are busy planning the next hipac extension we don't have the time to do this ourselves. Maybe some volunteer is willing to implement the port. Thomas From shmulik.hen@intel.com Thu Jun 26 07:58:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 07:58:17 -0700 (PDT) Received: from hermes.jf.intel.com (fmr05.intel.com [134.134.136.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QEw32x031781 for ; Thu, 26 Jun 2003 07:58:04 -0700 Received: from talaria.jf.intel.com (talaria.jf.intel.com [10.7.209.7]) by hermes.jf.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5QEtsb21506 for ; Thu, 26 Jun 2003 14:55:54 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by talaria.jf.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5QEOsG14972 for ; Thu, 26 Jun 2003 14:24:54 GMT Received: from jrslxjul1.npdj.intel.com ([10.12.254.186]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003062608085100134 ; Thu, 26 Jun 2003 08:08:54 -0700 Date: Thu, 26 Jun 2003 17:57:52 +0300 (IDT) From: Shmulik Hen X-X-Sender: 
Reply-To: Shmulik Hen
To: bond-devel , linux-net , linux-netdev
cc: Amir Noam , "Chad N. Tindel" , Jay Vosburgh , Jeff Garzik , Noam Marom , Shmulik Hen , Tsippy Mendelson
Subject: [bonding][patch] Fix load balance problem with high UDP Tx stress
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3541
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shmulik.hen@intel.com
Precedence: bulk
X-list: netdev

Hi,

This patch fixes a problem detected by our QA group. Under very high UDP
Tx stress traffic on 10/100 adapters, load sharing would collapse to
only one slave after a very short time. The bug is due to signed/unsigned
calculation errors: the gap between a slave's speed and its load is
computed in unsigned arithmetic, so it goes wrong when outgoing traffic
"exceeds" the adapter's actual capability.

Since we still don't use bitkeeper, this patch should be applied on top
of Marcelo's 2.4.22-pre1 patch plus Jeff Garzik's 2.4 net driver updates
from June 20th (2.4.22-pre1-netdrvr1).

-- 
| Shmulik Hen                             |
| Israel Design Center (Jerusalem)        |
| LAN Access Division                     |
| Intel Communications Group, Intel corp. |

diff -Nuarp linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_alb.c linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_alb.c
--- linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_alb.c	Wed Jun 25 16:33:19 2003
+++ linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_alb.c	Wed Jun 25 16:33:20 2003
@@ -17,6 +17,13 @@
  *
  * The full GNU General Public License is included in this distribution in the
  * file called LICENSE.
+ *
+ *
+ * Changes:
+ *
+ * 2003/06/25 - Shmulik Hen
+ *	- Fixed signed/unsigned calculation errors that caused load sharing
+ *	  to collapse to one slave under very heavy UDP Tx stress.
*/ #include @@ -246,7 +253,7 @@ tlb_get_least_loaded_slave(struct bondin { struct slave *slave; struct slave *least_loaded; - u32 curr_gap, max_gap; + s64 curr_gap, max_gap; /* Find the first enabled slave */ slave = bond_get_first_slave(bond); @@ -262,15 +269,15 @@ tlb_get_least_loaded_slave(struct bondin } least_loaded = slave; - max_gap = (slave->speed * 1000000) - - (SLAVE_TLB_INFO(slave).load * 8); + max_gap = (s64)(slave->speed * 1000000) - + (s64)(SLAVE_TLB_INFO(slave).load * 8); /* Find the slave with the largest gap */ slave = bond_get_next_slave(bond, slave); while (slave) { if (SLAVE_IS_OK(slave)) { - curr_gap = (slave->speed * 1000000) - - (SLAVE_TLB_INFO(slave).load * 8); + curr_gap = (s64)(slave->speed * 1000000) - + (s64)(SLAVE_TLB_INFO(slave).load * 8); if (max_gap < curr_gap) { least_loaded = slave; max_gap = curr_gap; diff -Nuarp linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_main.c linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_main.c --- linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_main.c Wed Jun 25 16:33:19 2003 +++ linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_main.c Wed Jun 25 16:33:20 2003 @@ -429,8 +429,8 @@ #include "bond_3ad.h" #include "bond_alb.h" -#define DRV_VERSION "2.2.11" -#define DRV_RELDATE "May 29, 2003" +#define DRV_VERSION "2.2.12" +#define DRV_RELDATE "June 25, 2003" #define DRV_NAME "bonding" #define DRV_DESCRIPTION "Ethernet Channel Bonding Driver" From shmulik.hen@intel.com Thu Jun 26 08:04:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 08:04:53 -0700 (PDT) Received: from hermes.jf.intel.com (fmr05.intel.com [134.134.136.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QF4k2x032153 for ; Thu, 26 Jun 2003 08:04:47 -0700 Received: from talaria.jf.intel.com (talaria.jf.intel.com [10.7.209.7]) by hermes.jf.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5QF2bb26158 for ; Thu, 26 Jun 2003 15:02:37 GMT Received: from 
orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by talaria.jf.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5QEVaG21638 for ; Thu, 26 Jun 2003 14:31:37 GMT
Received: from jrslxjul1.npdj.intel.com ([10.12.254.186]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003062608153411595 ; Thu, 26 Jun 2003 08:15:37 -0700
Date: Thu, 26 Jun 2003 18:04:34 +0300 (IDT)
From: Shmulik Hen
X-X-Sender:
Reply-To: Shmulik Hen
To: bond-devel , linux-net , linux-netdev
cc: Amir Noam , "Chad N. Tindel" , Jay Vosburgh , Jeff Garzik , Noam Marom , Shmulik Hen , Tsippy Mendelson
Subject: [bonding][patch] Fix 802.3ad long fail over with high UDP Tx stress
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by oss.sgi.com id h5QF4k2x032153
X-archive-position: 3542
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shmulik.hen@intel.com
Precedence: bulk
X-list: netdev

Hi,

This patch fixes a problem detected by our QA group. Under very high
bi-directional stress traffic, removing the last slave of the active
aggregator results in a long failover time to another aggregator (up to
90 seconds). The fix is to send LACPDU packets with the highest priority
(TC_PRIO_CONTROL), to overcome the possibility of packets being dropped
from the adapter's queue.

This further fixes the original long failover problem reported by Jay
Vosburgh on April 3rd and fixed by us on May 20th. We verified that it
fixes the problem for 1000Mbps adapters, but it may still not entirely
fix it for 10/100 adapters since they simply can't handle the load. In
the latter case, the failover may have to wait out the entire timeout.
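
One way to picture why the priority matters is a banded transmit queue.
The sketch below is a userspace model under stated assumptions, not the
bonding code: it assumes the slave's queue behaves like the default
three-band pfifo_fast qdisc, the priority-to-band table is modelled on
that qdisc's map, the two priority values follow the TC_PRIO_* numbering
in linux/pkt_sched.h, and struct txq, txq_enqueue() and BAND_LIMIT are
hypothetical names with a deliberately tiny limit.

```c
#include <stddef.h>

/* Priority values following the TC_PRIO_* numbering. */
enum { PRIO_BESTEFFORT = 0, PRIO_CONTROL = 7, PRIO_MAX = 15 };

/* Priority -> band map modelled on pfifo_fast; band 0 is dequeued first. */
static const unsigned char prio2band[PRIO_MAX + 1] =
	{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 };

#define BANDS      3
#define BAND_LIMIT 4	/* hypothetical per-band depth, kept small on purpose */

struct txq {
	size_t len[BANDS];	/* packets currently queued in each band */
};

/* Returns 1 if the packet was queued, 0 if it was dropped. */
static int txq_enqueue(struct txq *q, unsigned int prio)
{
	unsigned int band = prio2band[prio & PRIO_MAX];

	if (q->len[band] >= BAND_LIMIT)
		return 0;	/* this band is full: tail drop */
	q->len[band]++;
	return 1;
}
```

In this model, once best-effort UDP traffic has filled band 1, a further
best-effort packet is dropped, while a packet marked PRIO_CONTROL still
finds room in band 0 and is sent ahead of the backlog. That matches the
behaviour described for 1000Mbps adapters; it does nothing for an adapter
that cannot drain its queue at all.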

Since we still don't use bitkeeper, this patch should be applied on top
of Marcelo's 2.4.22-pre1 patch plus Jeff Garzik's 2.4 net driver updates
from June 20th (2.4.22-pre1-netdrvr1).

There is also a small fix for a non-printable character that somehow
snuck into bond_3ad.h.

-- 
| Shmulik Hen                             |
| Israel Design Center (Jerusalem)        |
| LAN Access Division                     |
| Intel Communications Group, Intel corp. |

diff -Nuarp linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_3ad.c linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_3ad.c
--- linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_3ad.c	Wed Jun 25 16:33:22 2003
+++ linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_3ad.c	Wed Jun 25 16:33:23 2003
@@ -37,6 +37,16 @@
  * 2003/05/01 - Shmulik Hen
  *	- Renamed bond_3ad_link_status_changed() to
  *	  bond_3ad_handle_link_change() for compatibility with TLB.
+ *
+ * 2003/05/20 - Amir Noam
+ *	- Fix long fail over time when releasing last slave of an active
+ *	  aggregator - send LACPDU on unbind of slave to tell partner this
+ *	  port is no longer aggregatable.
+ *
+ * 2003/06/25 - Tsippy Mendelson
+ *	- Send LACPDU as highest priority packet to further fix the above
+ *	  problem on very high Tx traffic load where packets may get dropped
+ *	  by the slave.
*/ #include @@ -45,6 +55,7 @@ #include #include #include +#include #include "bonding.h" #include "bond_3ad.h" @@ -905,6 +916,7 @@ static int ad_lacpdu_send(struct port *p skb->mac.raw = skb->data; skb->nh.raw = skb->data + ETH_HLEN; skb->protocol = PKT_TYPE_LACPDU; + skb->priority = TC_PRIO_CONTROL; lacpdu_header = (struct lacpdu_header *)skb_put(skb, length); diff -Nuarp linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_3ad.h linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_3ad.h --- linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_3ad.h Wed Jun 25 16:33:22 2003 +++ linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_3ad.h Wed Jun 25 16:33:23 2003 @@ -165,7 +165,7 @@ typedef struct marker { // = 0x02 (marker response information) u8 marker_length; // = 0x16 u16 requester_port; // The number assigned to the port by the requester - struct mac_addr requester_system; // The requester’s system id + struct mac_addr requester_system; // The requester's system id u32 requester_transaction_id; // The transaction id allocated by the requester, u16 pad; // = 0 u8 tlv_type_terminator; // = 0x00 diff -Nuarp linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_main.c linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_main.c --- linux-2.4.22-pre1-netdrvr1/drivers/net/bonding/bond_main.c Wed Jun 25 16:33:22 2003 +++ linux-2.4.22-pre1-netdrvr1-devel/drivers/net/bonding/bond_main.c Wed Jun 25 16:33:23 2003 @@ -429,7 +429,7 @@ #include "bond_3ad.h" #include "bond_alb.h" -#define DRV_VERSION "2.2.12" +#define DRV_VERSION "2.2.13" #define DRV_RELDATE "June 25, 2003" #define DRV_NAME "bonding" #define DRV_DESCRIPTION "Ethernet Channel Bonding Driver" From krkumar@us.ibm.com Thu Jun 26 09:32:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 09:33:12 -0700 (PDT) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QGWp2x000892 for ; Thu, 26 Jun 2003 09:32:58 -0700 Received: from 
northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e2.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5QGWi9X127468; Thu, 26 Jun 2003 12:32:44 -0400
Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5QGWf7A062178; Thu, 26 Jun 2003 12:32:42 -0400
Message-ID: <3EFB2017.5030202@us.ibm.com>
Date: Thu, 26 Jun 2003 09:32:23 -0700
From: Krishna Kumar
Organization: IBM
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S. Miller"
CC: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done)
References: <3EF37458.3070103@us.ibm.com> <20030621.233634.67057417.yoshfuji@linux-ipv6.org> <3EF9D5C2.5080101@us.ibm.com> <20030625.234251.116353369.davem@redhat.com>
In-Reply-To: <20030625.234251.116353369.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 3543
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: krkumar@us.ibm.com
Precedence: bulk
X-list: netdev

Hi Dave,

> I don't think it's wise to make RTNETLINK facilities dependent upon ifdef

Following is the patch without the PREFIXLIST config option. I have
remade it against 2.5.73. We would like to have this functionality in
the 2.4 kernel, so can I go ahead and send this patch against 2.4.21
too?
Thanks, - KK -------------------------------------------------------------------------------- diff -ruN linux-2.5.73.org/include/linux/ipv6_route.h linux-2.5.73.new/include/linux/ipv6_route.h --- linux-2.5.73.org/include/linux/ipv6_route.h 2003-06-22 11:32:36.000000000 -0700 +++ linux-2.5.73.new/include/linux/ipv6_route.h 2003-06-26 09:05:01.000000000 -0700 @@ -44,4 +44,16 @@ #define RTMSG_NEWROUTE 0x21 #define RTMSG_DELROUTE 0x22 +/* + * Return entire prefix list in array of following structures. Provides the + * prefix and prefix length for all devices. + */ + +struct in6_prefix_msg +{ + int ifindex; + int prefix_len; + struct in6_addr prefix; +}; + #endif diff -ruN linux-2.5.73.org/include/linux/rtnetlink.h linux-2.5.73.new/include/linux/rtnetlink.h --- linux-2.5.73.org/include/linux/rtnetlink.h 2003-06-22 11:33:07.000000000 -0700 +++ linux-2.5.73.new/include/linux/rtnetlink.h 2003-06-26 09:05:01.000000000 -0700 @@ -47,7 +47,11 @@ #define RTM_DELTFILTER (RTM_BASE+29) #define RTM_GETTFILTER (RTM_BASE+30) -#define RTM_MAX (RTM_BASE+31) +#define RTM_GETLNKFLAGS (RTM_BASE+34) + +#define RTM_GETPLIST (RTM_BASE+38) + +#define RTM_MAX (RTM_GETPLIST+1) /* Generic structure for encapsulation of optional route information. 
@@ -61,6 +65,14 @@ unsigned short rta_type; }; +/* Structure to return per interface device flags */ + +struct ifp_if6info +{ + int ifindex; + int flags; +}; + /* Macros to handle rtattributes */ #define RTA_ALIGNTO 4 @@ -201,9 +213,11 @@ RTA_FLOW, RTA_CACHEINFO, RTA_SESSION, + RTA_LINKFLAGS, + RTA_RA6INFO, /* No support yet, send event on new prefix event */ }; -#define RTA_MAX RTA_SESSION +#define RTA_MAX RTA_RA6INFO #define RTM_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct rtmsg)))) #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg)) diff -ruN linux-2.5.73.org/include/net/if_inet6.h linux-2.5.73.new/include/net/if_inet6.h --- linux-2.5.73.org/include/net/if_inet6.h 2003-06-22 11:33:32.000000000 -0700 +++ linux-2.5.73.new/include/net/if_inet6.h 2003-06-26 09:05:01.000000000 -0700 @@ -17,6 +17,8 @@ #include +#define IF_RA_OTHERCONF 0x80 +#define IF_RA_MANAGED 0x40 #define IF_RA_RCVD 0x20 #define IF_RS_SENT 0x10 diff -ruN linux-2.5.73.org/include/net/ip6_route.h linux-2.5.73.new/include/net/ip6_route.h --- linux-2.5.73.org/include/net/ip6_route.h 2003-06-22 11:32:37.000000000 -0700 +++ linux-2.5.73.new/include/net/ip6_route.h 2003-06-26 09:05:01.000000000 -0700 @@ -87,6 +87,7 @@ struct nlmsghdr; struct netlink_callback; extern int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb); +extern int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb); extern int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_getroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); diff -ruN linux-2.5.73.org/net/ipv6/addrconf.c linux-2.5.73.new/net/ipv6/addrconf.c --- linux-2.5.73.org/net/ipv6/addrconf.c 2003-06-22 11:33:17.000000000 -0700 +++ linux-2.5.73.new/net/ipv6/addrconf.c 2003-06-26 09:05:01.000000000 -0700 @@ -129,7 +129,7 @@ static int addrconf_ifdown(struct net_device *dev, int 
how); -static void addrconf_dad_start(struct inet6_ifaddr *ifp); +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags); static void addrconf_dad_timer(unsigned long data); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_rs_timer(unsigned long data); @@ -715,7 +715,7 @@ ift->prefered_lft = tmp_prefered_lft; ift->tstamp = ifp->tstamp; spin_unlock_bh(&ift->lock); - addrconf_dad_start(ift); + addrconf_dad_start(ift, 0); in6_ifa_put(ift); in6_dev_put(idev); out: @@ -1211,7 +1211,7 @@ rtmsg.rtmsg_dst_len = 8; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; rtmsg.rtmsg_ifindex = dev->ifindex; - rtmsg.rtmsg_flags = RTF_UP|RTF_ADDRCONF; + rtmsg.rtmsg_flags = RTF_UP; rtmsg.rtmsg_type = RTMSG_NEWROUTE; ip6_route_add(&rtmsg, NULL, NULL); } @@ -1238,7 +1238,7 @@ struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xFE800000), 0, 0, 0); - addrconf_prefix_route(&addr, 64, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&addr, 64, dev, 0, 0); } static struct inet6_dev *addrconf_add_dev(struct net_device *dev) @@ -1378,7 +1378,7 @@ } create = 1; - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, RTF_ADDRCONF); } if (ifp && valid_lft == 0) { @@ -1529,7 +1529,7 @@ ifp = ipv6_add_addr(idev, pfx, plen, scope, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); return 0; } @@ -1704,7 +1704,7 @@ ifp = ipv6_add_addr(idev, addr, 64, IFA_LINK, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); } } @@ -1943,8 +1943,7 @@ memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); rtmsg.rtmsg_type = RTMSG_NEWROUTE; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; - rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | - RTF_DEFAULT | RTF_UP); + rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_DEFAULT | RTF_UP); rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; @@ -1958,7 +1957,7 @@ /* * Duplicate Address Detection */ -static void addrconf_dad_start(struct 
inet6_ifaddr *ifp) +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags) { struct net_device *dev; unsigned long rand_num; @@ -1968,7 +1967,7 @@ addrconf_join_solict(dev, &ifp->addr); if (ifp->prefix_len != 128 && (ifp->flags&IFA_F_PERMANENT)) - addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, flags); net_srandom(ifp->addr.s6_addr32[3]); rand_num = net_random() % (ifp->idev->cnf.rtr_solicit_delay ? : 1); @@ -2451,6 +2450,42 @@ netlink_broadcast(rtnl, skb, 0, RTMGRP_IPV6_IFADDR, GFP_ATOMIC); } +int inet6_dump_linkflags(struct sk_buff *skb, struct netlink_callback *cb) +{ + int ifindex, flags = 0; + struct net_device *dev; + struct inet6_dev *idev; + struct nlmsghdr *nlh; + struct ifp_if6info *ifp = NLMSG_DATA(cb->nlh); + unsigned char *org_tail = skb->tail; + + ifindex = ifp->ifindex; + + if ((dev = dev_get_by_index(ifindex)) == NULL) + goto out; + if ((idev = in6_dev_get(dev)) != NULL) { + flags = idev->if_flags; + in6_dev_put(idev); + } + dev_put(dev); + + nlh = NLMSG_PUT(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + RTA_LINKFLAGS, sizeof(*ifp)); + ifp = NLMSG_DATA(nlh); + ifp->flags = flags; + ifp->ifindex = ifindex; /* duplicate information for user to verify */ + + nlh->nlmsg_len = skb->tail - org_tail; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "inet6_dump_linkflags:skb size not enough\n"); + skb_trim(skb, org_tail - skb->data); + +out: + return -1; +} + static struct rtnetlink_link inet6_rtnetlink_table[RTM_MAX - RTM_BASE + 1] = { [RTM_NEWADDR - RTM_BASE] = { .doit = inet6_rtm_newaddr, }, [RTM_DELADDR - RTM_BASE] = { .doit = inet6_rtm_deladdr, }, @@ -2459,6 +2494,8 @@ [RTM_DELROUTE - RTM_BASE] = { .doit = inet6_rtm_delroute, }, [RTM_GETROUTE - RTM_BASE] = { .doit = inet6_rtm_getroute, .dumpit = inet6_dump_fib, }, + [RTM_GETLNKFLAGS - RTM_BASE] = { .dumpit = inet6_dump_linkflags, }, + [RTM_GETPLIST - RTM_BASE] = { .dumpit = inet6_dump_prefix, 
}, }; static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) diff -ruN linux-2.5.73.org/net/ipv6/ndisc.c linux-2.5.73.new/net/ipv6/ndisc.c --- linux-2.5.73.org/net/ipv6/ndisc.c 2003-06-22 11:32:56.000000000 -0700 +++ linux-2.5.73.new/net/ipv6/ndisc.c 2003-06-26 09:05:01.000000000 -0700 @@ -1036,6 +1036,16 @@ */ in6_dev->if_flags |= IF_RA_RCVD; } + /* + * Remember the managed/otherconf flags from most recently + * receieved RA message (RFC 2462) -- yoshfuji + */ + in6_dev->if_flags = (in6_dev->if_flags & ~(IF_RA_MANAGED| + IF_RA_OTHERCONF)) | + (ra_msg->icmph.icmp6_addrconf_managed ? + IF_RA_MANAGED : 0) | + (ra_msg->icmph.icmp6_addrconf_other ? + IF_RA_OTHERCONF : 0); lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); diff -ruN linux-2.5.73.org/net/ipv6/route.c linux-2.5.73.new/net/ipv6/route.c --- linux-2.5.73.org/net/ipv6/route.c 2003-06-22 11:33:05.000000000 -0700 +++ linux-2.5.73.new/net/ipv6/route.c 2003-06-26 09:05:01.000000000 -0700 @@ -1511,6 +1511,66 @@ return 0; } +static int rt6_fill_prefix(struct sk_buff *skb, struct rt6_info *rt, + int type, u32 pid, u32 seq) +{ + struct in6_prefix_msg *pmsg; + struct nlmsghdr *nlh; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, type, sizeof(*pmsg)); + pmsg = NLMSG_DATA(nlh); + pmsg->ifindex = rt->rt6i_dev->ifindex; + pmsg->prefix_len = rt->rt6i_dst.plen; + ipv6_addr_copy(&pmsg->prefix, &rt->rt6i_dst.addr); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "rt6_fill_prefix:skb size not enough\n"); + skb_trim(skb, b - skb->data); + return -1; +} + +static int rt6_dump_route_prefix(struct rt6_info *rt, void *p_arg) +{ + int addr_type; + struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; + + /* + * Definition of a prefix : + * - Should be autoconfigured + * - No nexthop + * - Not a linklocal, loopback or multicast type. 
+ */ + if (rt->rt6i_nexthop || (rt->rt6i_flags & RTF_ADDRCONF) == 0) + return 0; + addr_type = ipv6_addr_type(&rt->rt6i_dst.addr); + if ((addr_type & (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK | + IPV6_ADDR_MULTICAST)) != 0 || + addr_type == IPV6_ADDR_ANY) + return 0; + return rt6_fill_prefix(arg->skb, rt, RTM_GETPLIST, + NETLINK_CB(arg->cb->skb).pid, arg->cb->nlh->nlmsg_seq); +} + +static int fib6_dump_prefix(struct fib6_walker_t *w) +{ + int res; + struct rt6_info *rt; + + for (rt = w->leaf; rt; rt = rt->u.next) { + res = rt6_dump_route_prefix(rt, w->args); + if (res < 0) { + /* Frame is full, suspend walking */ + w->leaf = rt; + return 1; + } + } + w->leaf = NULL; + return 0; +} + static void fib6_dump_end(struct netlink_callback *cb) { struct fib6_walker_t *w = (void*)cb->args[0]; @@ -1532,7 +1592,8 @@ return cb->done(cb); } -int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +static int __inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb, + int prefix) { struct rt6_rtnl_dump_arg arg; struct fib6_walker_t *w; @@ -1559,7 +1620,10 @@ RT6_TRACE("dump<%p", w); memset(w, 0, sizeof(*w)); w->root = &ip6_routing_table; - w->func = fib6_dump_node; + if (prefix) + w->func = fib6_dump_prefix; + else + w->func = fib6_dump_node; w->args = &arg; cb->args[0] = (long)w; read_lock_bh(&rt6_lock); @@ -1586,6 +1650,16 @@ return res; } +int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 0); +} + +int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 1); +} + int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; From krkumar@us.ibm.com Thu Jun 26 15:40:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 15:41:05 -0700 (PDT) Received: from e6.ny.us.ibm.com (e6.ny.us.ibm.com [32.97.182.106]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QMeh2x004614 for ; Thu, 26 Jun 2003 
15:40:50 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e6.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5QMeaxr168518; Thu, 26 Jun 2003 18:40:37 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5QMeT7A204994; Thu, 26 Jun 2003 18:40:30 -0400 Message-ID: <3EFB7648.3080702@us.ibm.com> Date: Thu, 26 Jun 2003 15:40:08 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com, linux-net@vger.kernel.org, yoshfuji@linux-ipv6.org Subject: [PATCH] Prefix List against 2.4.21 References: <3EF37458.3070103@us.ibm.com> <20030621.233634.67057417.yoshfuji@linux-ipv6.org> <3EF9D5C2.5080101@us.ibm.com> <20030625.234251.116353369.davem@redhat.com> In-Reply-To: <20030625.234251.116353369.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3545 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 11819 Lines: 419 Hi dave, This is the same patch against 2.4.21. Minor changes in initialization of inet6_rtnetlink_table, otherwise the rest is the same. I have tested it, and it works fine. thanks, - KK -------------------------------------------------------------------------------- diff -ruN linux-2.4.21.org/include/linux/ipv6_route.h linux-2.4.21/include/linux/ipv6_route.h --- linux-2.4.21.org/include/linux/ipv6_route.h 1998-08-27 19:33:08.000000000 -0700 +++ linux-2.4.21/include/linux/ipv6_route.h 2003-06-26 09:35:05.000000000 -0700 @@ -53,4 +53,16 @@ #define RTMSG_NEWROUTE 0x21 #define RTMSG_DELROUTE 0x22 +/* + * Return entire prefix list in array of following structures. 
Provides the + * prefix and prefix length for all devices. + */ + +struct in6_prefix_msg +{ + int ifindex; + int prefix_len; + struct in6_addr prefix; +}; + #endif diff -ruN linux-2.4.21.org/include/linux/rtnetlink.h linux-2.4.21/include/linux/rtnetlink.h --- linux-2.4.21.org/include/linux/rtnetlink.h 2002-11-28 15:53:15.000000000 -0800 +++ linux-2.4.21/include/linux/rtnetlink.h 2003-06-26 12:50:46.000000000 -0700 @@ -46,7 +46,11 @@ #define RTM_DELTFILTER (RTM_BASE+29) #define RTM_GETTFILTER (RTM_BASE+30) -#define RTM_MAX (RTM_BASE+31) +#define RTM_GETLNKFLAGS (RTM_BASE+34) + +#define RTM_GETPLIST (RTM_BASE+38) + +#define RTM_MAX (RTM_GETPLIST+1) /* Generic structure for encapsulation optional route information. @@ -60,6 +64,14 @@ unsigned short rta_type; }; +/* Structure to return per interface device flags */ + +struct ifp_if6info +{ + int ifindex; + int flags; +}; + /* Macros to handle rtattributes */ #define RTA_ALIGNTO 4 @@ -198,10 +210,12 @@ RTA_MULTIPATH, RTA_PROTOINFO, RTA_FLOW, - RTA_CACHEINFO + RTA_CACHEINFO, + RTA_LINKFLAGS, + RTA_RA6INFO, /* No support yet, send event on new prefix event */ }; -#define RTA_MAX RTA_CACHEINFO +#define RTA_MAX RTA_RA6INFO #define RTM_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct rtmsg)))) #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg)) diff -ruN linux-2.4.21.org/include/net/if_inet6.h linux-2.4.21/include/net/if_inet6.h --- linux-2.4.21.org/include/net/if_inet6.h 2003-06-13 07:51:39.000000000 -0700 +++ linux-2.4.21/include/net/if_inet6.h 2003-06-26 09:35:05.000000000 -0700 @@ -15,6 +15,8 @@ #ifndef _NET_IF_INET6_H #define _NET_IF_INET6_H +#define IF_RA_OTHERCONF 0x80 +#define IF_RA_MANAGED 0x40 #define IF_RA_RCVD 0x20 #define IF_RS_SENT 0x10 diff -ruN linux-2.4.21.org/include/net/ip6_route.h linux-2.4.21/include/net/ip6_route.h --- linux-2.4.21.org/include/net/ip6_route.h 2003-06-13 07:51:39.000000000 -0700 +++ linux-2.4.21/include/net/ip6_route.h 2003-06-26 13:51:22.000000000 -0700 @@ 
-84,6 +84,7 @@ struct nlmsghdr; struct netlink_callback; extern int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb); +extern int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb); extern int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet6_rtm_getroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); diff -ruN linux-2.4.21.org/net/ipv6/addrconf.c linux-2.4.21/net/ipv6/addrconf.c --- linux-2.4.21.org/net/ipv6/addrconf.c 2003-06-13 07:51:39.000000000 -0700 +++ linux-2.4.21/net/ipv6/addrconf.c 2003-06-26 15:05:27.000000000 -0700 @@ -101,7 +101,7 @@ static int addrconf_ifdown(struct net_device *dev, int how); -static void addrconf_dad_start(struct inet6_ifaddr *ifp); +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags); static void addrconf_dad_timer(unsigned long data); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_rs_timer(unsigned long data); @@ -889,7 +889,7 @@ rtmsg.rtmsg_dst_len = 8; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; rtmsg.rtmsg_ifindex = dev->ifindex; - rtmsg.rtmsg_flags = RTF_UP|RTF_ADDRCONF; + rtmsg.rtmsg_flags = RTF_UP; rtmsg.rtmsg_type = RTMSG_NEWROUTE; ip6_route_add(&rtmsg, NULL); } @@ -916,7 +916,7 @@ struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xFE800000), 0, 0, 0); - addrconf_prefix_route(&addr, 64, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&addr, 64, dev, 0, 0); } static struct inet6_dev *addrconf_add_dev(struct net_device *dev) @@ -1054,7 +1054,7 @@ return; } - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, RTF_ADDRCONF); } if (ifp && valid_lft == 0) { @@ -1166,7 +1166,7 @@ ifp = ipv6_add_addr(idev, pfx, plen, scope, IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); return 0; } @@ -1341,7 +1341,7 @@ ifp = ipv6_add_addr(idev, addr, 64, IFA_LINK, 
IFA_F_PERMANENT); if (!IS_ERR(ifp)) { - addrconf_dad_start(ifp); + addrconf_dad_start(ifp, 0); in6_ifa_put(ifp); } } @@ -1578,8 +1578,7 @@ memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); rtmsg.rtmsg_type = RTMSG_NEWROUTE; rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; - rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | - RTF_DEFAULT | RTF_UP); + rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_DEFAULT | RTF_UP); rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; @@ -1593,7 +1592,7 @@ /* * Duplicate Address Detection */ -static void addrconf_dad_start(struct inet6_ifaddr *ifp) +static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags) { struct net_device *dev; unsigned long rand_num; @@ -1603,7 +1602,7 @@ addrconf_join_solict(dev, &ifp->addr); if (ifp->prefix_len != 128 && (ifp->flags&IFA_F_PERMANENT)) - addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, RTF_ADDRCONF); + addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev, 0, flags); net_srandom(ifp->addr.s6_addr32[3]); rand_num = net_random() % (ifp->idev->cnf.rtr_solicit_delay ? 
: 1); @@ -1971,6 +1970,42 @@ netlink_broadcast(rtnl, skb, 0, RTMGRP_IPV6_IFADDR, GFP_ATOMIC); } +int inet6_dump_linkflags(struct sk_buff *skb, struct netlink_callback *cb) +{ + int ifindex, flags = 0; + struct net_device *dev; + struct inet6_dev *idev; + struct nlmsghdr *nlh; + struct ifp_if6info *ifp = NLMSG_DATA(cb->nlh); + unsigned char *org_tail = skb->tail; + + ifindex = ifp->ifindex; + + if ((dev = dev_get_by_index(ifindex)) == NULL) + goto out; + if ((idev = in6_dev_get(dev)) != NULL) { + flags = idev->if_flags; + in6_dev_put(idev); + } + dev_put(dev); + + nlh = NLMSG_PUT(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + RTA_LINKFLAGS, sizeof(*ifp)); + ifp = NLMSG_DATA(nlh); + ifp->flags = flags; + ifp->ifindex = ifindex; /* duplicate information for user to verify */ + + nlh->nlmsg_len = skb->tail - org_tail; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "inet6_dump_linkflags:skb size not enough\n"); + skb_trim(skb, org_tail - skb->data); + +out: + return -1; +} + static struct rtnetlink_link inet6_rtnetlink_table[RTM_MAX-RTM_BASE+1] = { { NULL, NULL, }, @@ -1987,6 +2022,41 @@ { inet6_rtm_delroute, NULL, }, { inet6_rtm_getroute, inet6_dump_fib, }, { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, inet6_dump_linkflags }, + { NULL, NULL, }, + + { NULL, NULL, }, + { NULL, NULL, }, + { NULL, inet6_dump_prefix }, + { NULL, NULL, }, }; static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) @@ -2200,7 +2270,7 @@ #ifdef CONFIG_PROC_FS proc_net_create("if_inet6", 0, iface_proc_info); #endif - + addrconf_verify(0); rtnetlink_links[PF_INET6] = 
inet6_rtnetlink_table; #ifdef CONFIG_SYSCTL diff -ruN linux-2.4.21.org/net/ipv6/ndisc.c linux-2.4.21/net/ipv6/ndisc.c --- linux-2.4.21.org/net/ipv6/ndisc.c 2003-06-13 07:51:39.000000000 -0700 +++ linux-2.4.21/net/ipv6/ndisc.c 2003-06-26 09:35:05.000000000 -0700 @@ -940,6 +940,16 @@ */ in6_dev->if_flags |= IF_RA_RCVD; } + /* + * Remember the managed/otherconf flags from most recently + * receieved RA message (RFC 2462) -- yoshfuji + */ + in6_dev->if_flags = (in6_dev->if_flags & ~(IF_RA_MANAGED| + IF_RA_OTHERCONF)) | + (ra_msg->icmph.icmp6_addrconf_managed ? + IF_RA_MANAGED : 0) | + (ra_msg->icmph.icmp6_addrconf_other ? + IF_RA_OTHERCONF : 0); lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); diff -ruN linux-2.4.21.org/net/ipv6/route.c linux-2.4.21/net/ipv6/route.c --- linux-2.4.21.org/net/ipv6/route.c 2003-06-13 07:51:39.000000000 -0700 +++ linux-2.4.21/net/ipv6/route.c 2003-06-26 09:35:05.000000000 -0700 @@ -1627,6 +1627,66 @@ return 0; } +static int rt6_fill_prefix(struct sk_buff *skb, struct rt6_info *rt, + int type, u32 pid, u32 seq) +{ + struct in6_prefix_msg *pmsg; + struct nlmsghdr *nlh; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, type, sizeof(*pmsg)); + pmsg = NLMSG_DATA(nlh); + pmsg->ifindex = rt->rt6i_dev->ifindex; + pmsg->prefix_len = rt->rt6i_dst.plen; + ipv6_addr_copy(&pmsg->prefix, &rt->rt6i_dst.addr); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: + printk(KERN_INFO "rt6_fill_prefix:skb size not enough\n"); + skb_trim(skb, b - skb->data); + return -1; +} + +static int rt6_dump_route_prefix(struct rt6_info *rt, void *p_arg) +{ + int addr_type; + struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; + + /* + * Definition of a prefix : + * - Should be autoconfigured + * - No nexthop + * - Not a linklocal, loopback or multicast type. 
+ */ + if (rt->rt6i_nexthop || (rt->rt6i_flags & RTF_ADDRCONF) == 0) + return 0; + addr_type = ipv6_addr_type(&rt->rt6i_dst.addr); + if ((addr_type & (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK | + IPV6_ADDR_MULTICAST)) != 0 || + addr_type == IPV6_ADDR_ANY) + return 0; + return rt6_fill_prefix(arg->skb, rt, RTM_GETPLIST, + NETLINK_CB(arg->cb->skb).pid, arg->cb->nlh->nlmsg_seq); +} + +static int fib6_dump_prefix(struct fib6_walker_t *w) +{ + int res; + struct rt6_info *rt; + + for (rt = w->leaf; rt; rt = rt->u.next) { + res = rt6_dump_route_prefix(rt, w->args); + if (res < 0) { + /* Frame is full, suspend walking */ + w->leaf = rt; + return 1; + } + } + w->leaf = NULL; + return 0; +} + static void fib6_dump_end(struct netlink_callback *cb) { struct fib6_walker_t *w = (void*)cb->args[0]; @@ -1648,7 +1708,8 @@ return cb->done(cb); } -int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +static int __inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb, + int prefix) { struct rt6_rtnl_dump_arg arg; struct fib6_walker_t *w; @@ -1675,7 +1736,10 @@ RT6_TRACE("dump<%p", w); memset(w, 0, sizeof(*w)); w->root = &ip6_routing_table; - w->func = fib6_dump_node; + if (prefix) + w->func = fib6_dump_prefix; + else + w->func = fib6_dump_node; w->args = &arg; cb->args[0] = (long)w; read_lock_bh(&rt6_lock); @@ -1702,6 +1766,16 @@ return res; } +int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 0); +} + +int inet6_dump_prefix(struct sk_buff *skb, struct netlink_callback *cb) +{ + return __inet6_dump_fib(skb, cb, 1); +} + int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; From hadi@shell.cyberus.ca Thu Jun 26 16:18:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 16:19:06 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5QNIw2x005163 for ; Thu, 26 Jun 
2003 16:18:59 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Vg0S-000MoK-Rt; Thu, 26 Jun 2003 19:18:20 -0400
Date: Thu, 26 Jun 2003 19:18:20 -0400 (EDT)
From: Jamal Hadi
To: James Carlson
cc: "David S. Miller" , rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org
Subject: Re: [PATCH, untested] Support for PPPOE on SMP
In-Reply-To: <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com>
Message-ID: <20030626190407.S87648@shell.cyberus.ca>
References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3546
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Thu, 26 Jun 2003, James Carlson wrote:

> David S. Miller writes:
> If so, then that needs to be taken up with the manufacturer. That's a
> rather severe design flaw that will prevent such a card from ever
> being used for anything other than IP -- many other protocols *ASSUME*
> that packets on a single wire cannot be reordered, including SNA, PPP
> (!), and link aggregation, among others.
>

So what about packets being lost? Wouldn't that ensure reordering?
And there is no such thing as a lossless wire.

cheers,
jamal

PS: Paulus, I wasn't preaching getting rid of ppp/pppoe, although it's a
nice thought.
More like fix linux pppd and pppoe ;->

From jmorris@intercode.com.au Thu Jun 26 18:22:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 18:22:16 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:rZIvpsaFENTSqiWY+bKwl+FUxHuBCpH7@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R1M52x010069 for ; Thu, 26 Jun 2003 18:22:07 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5R1Ljr07902; Fri, 27 Jun 2003 11:21:46 +1000
Date: Fri, 27 Jun 2003 11:21:43 +1000 (EST)
From: James Morris
To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
cc: davem@redhat.com, ,
Subject: Re: [PATCH] IPV6: Fixed fragment check in ip6_output.c:ip6_fragment()
In-Reply-To: <20030626.205540.83592923.yoshfuji@linux-ipv6.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
X-archive-position: 3547
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Thu, 26 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> Hello.
>
> MTU / alignment check in ip6_fragment() was wrong;
> first_len was not correct.
Applied to bk://kernel.bkbits.net/jmorris/net-2.5

- James
--
James Morris

From dave@thedillows.org Thu Jun 26 21:52:51 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 21:53:07 -0700 (PDT)
Received: from ori.thedillows.org (pcp03710388pcs.westk01.tn.comcast.net [68.34.200.110]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R4qn2x017462 for ; Thu, 26 Jun 2003 21:52:50 -0700
Received: from ori.thedillows.org (localhost.thedillows.org [127.0.0.1]) by ori.thedillows.org (8.12.8/8.12.8) with ESMTP id h5R4qiYh020857; Fri, 27 Jun 2003 00:52:44 -0400
Received: (from il1@localhost) by ori.thedillows.org (8.12.8/8.12.8/Submit) id h5R4qhTY020855; Fri, 27 Jun 2003 00:52:43 -0400
X-Authentication-Warning: ori.thedillows.org: il1 set sender to dave@thedillows.org using -f
Subject: [BK] Typhoon net driver fixes for 2.4
From: David Dillow
To: Jeff Garzik
Cc: Netdev
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1056689562.8679.36.camel@ori.thedillows.org>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5)
Date: 27 Jun 2003 00:52:43 -0400
X-archive-position: 3550
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dave@thedillows.org
Precedence: bulk
X-list: netdev
Content-Length: 345
Lines: 17

Jeff, please do a

	bk pull http://typhoon.bkbits.net/typhoon-2.4

This will update the following files:

 drivers/net/typhoon.c | 8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

through these ChangeSets:

 (03/06/27 1.1029)
   Fix misreporting of card type and spurious "already scheduled" messages.
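[Archive note] The "misreporting of card type" part of this changeset appears to correspond to the patch hunk that changes TYPHOON_FIBER from 5 to 8. The sketch below is only an illustration of why 5 is a problematic value, assuming the capability values are OR-ed together and tested with `&` (the neighbouring values 1, 2 and 4 suggest this, but the test itself is not shown in this thread). The CAP_* names and has_cap() are hypothetical and are not driver code.

```c
#include <assert.h>

/* Values as they appear in the patch context for drivers/net/typhoon.c. */
#define CAP_CRYPTO_DES       1
#define CAP_CRYPTO_3DES      2
#define CAP_CRYPTO_VARIABLE  4
#define CAP_FIBER_OLD        5   /* before the fix: same bits as DES | VARIABLE */
#define CAP_FIBER_NEW        8   /* after the fix: a bit of its own */

/* Hypothetical helper: non-zero if any bit of `flag` is set in `caps`. */
int has_cap(unsigned int caps, unsigned int flag)
{
	return (caps & flag) != 0;
}

/* Small self-check that demonstrates the aliasing. */
void capability_self_check(void)
{
	unsigned int copper_des = CAP_CRYPTO_DES;   /* copper card with DES only */

	/* Old value: the DES bit alone already matches the "fiber" mask. */
	assert(has_cap(copper_des, CAP_FIBER_OLD));
	/* New value: the fiber bit no longer overlaps the crypto bits. */
	assert(!has_cap(copper_des, CAP_FIBER_NEW));
}
```

Under that assumption, any card advertising DES or variable-key crypto would also match the old fiber value, which would explain a wrong card type being reported.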
From dave@thedillows.org Thu Jun 26 21:52:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 21:53:08 -0700 (PDT)
Received: from ori.thedillows.org (pcp03710388pcs.westk01.tn.comcast.net [68.34.200.110]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R4qu2x017468 for ; Thu, 26 Jun 2003 21:52:57 -0700
Received: from ori.thedillows.org (localhost.thedillows.org [127.0.0.1]) by ori.thedillows.org (8.12.8/8.12.8) with ESMTP id h5R4qpYh020869; Fri, 27 Jun 2003 00:52:51 -0400
Received: (from il1@localhost) by ori.thedillows.org (8.12.8/8.12.8/Submit) id h5R4qpi7020867; Fri, 27 Jun 2003 00:52:51 -0400
X-Authentication-Warning: ori.thedillows.org: il1 set sender to dave@thedillows.org using -f
Subject: [BK] Typhoon net driver fixes for 2.5
From: David Dillow
To: Jeff Garzik
Cc: Netdev
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1056689571.8679.42.camel@ori.thedillows.org>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5)
Date: 27 Jun 2003 00:52:51 -0400
X-archive-position: 3551
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dave@thedillows.org
Precedence: bulk
X-list: netdev
Content-Length: 345
Lines: 17

Jeff, please do a

	bk pull http://typhoon.bkbits.net/typhoon-2.5

This will update the following files:

 drivers/net/typhoon.c | 8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

through these ChangeSets:

 (03/06/27 1.1387)
   Fix misreporting of card type and spurious "already scheduled" messages.
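[Archive note] The spurious "already scheduled" messages appear to correspond to the interrupt-status test in the accompanying patch, which now checks the TYPHOON_INTR_HOST_INT bit instead of any non-zero status. A minimal user-space sketch of the difference follows. HOST_INT_MASK is a hypothetical stand-in with an assumed value of 0x1 (the real definition is not shown in this thread), and handles_old() and handles_new() are illustrative names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for TYPHOON_INTR_HOST_INT; the value is assumed. */
#define HOST_INT_MASK 0x00000001u

/* Old test: any set status bit is treated as work for the handler. */
int handles_old(uint32_t intr_status)
{
	return intr_status != 0;
}

/* New test: only the host-interrupt bit is treated as work. */
int handles_new(uint32_t intr_status)
{
	return (intr_status & HOST_INT_MASK) != 0;
}

/* Self-check: a status word with only an unrelated bit set. */
void intr_self_check(void)
{
	uint32_t unrelated_only = 0x00000002u;

	assert(handles_old(unrelated_only));    /* old code would proceed */
	assert(!handles_new(unrelated_only));   /* new code returns early */
}
```

With the masked test, a status word that carries only unrelated bits no longer reaches the scheduling path, which is presumably where the "already scheduled" message came from.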
From dave@thedillows.org Thu Jun 26 21:52:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 21:53:09 -0700 (PDT)
Received: from ori.thedillows.org (pcp03710388pcs.westk01.tn.comcast.net [68.34.200.110]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R4qp2x017463 for ; Thu, 26 Jun 2003 21:52:51 -0700
Received: from ori.thedillows.org (localhost.thedillows.org [127.0.0.1]) by ori.thedillows.org (8.12.8/8.12.8) with ESMTP id h5R4qjYh020863; Fri, 27 Jun 2003 00:52:45 -0400
Received: (from il1@localhost) by ori.thedillows.org (8.12.8/8.12.8/Submit) id h5R4qjIB020861; Fri, 27 Jun 2003 00:52:45 -0400
X-Authentication-Warning: ori.thedillows.org: il1 set sender to dave@thedillows.org using -f
Subject: [PATCH] Typhoon net driver fixes for 2.4
From: David Dillow
To: Jeff Garzik
Cc: Netdev
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1056689565.8692.38.camel@ori.thedillows.org>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5)
Date: 27 Jun 2003 00:52:45 -0400
X-archive-position: 3552
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dave@thedillows.org
Precedence: bulk
X-list: netdev
Content-Length: 1521
Lines: 47

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1028 -> 1.1029
# drivers/net/typhoon.c 1.2 -> 1.5
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/27 dave@thedillows.org 1.1029
# Fix misreporting of card type and spurious "already scheduled" messages.
# --------------------------------------------
#
diff -Nru a/drivers/net/typhoon.c b/drivers/net/typhoon.c
--- a/drivers/net/typhoon.c Fri Jun 27 00:37:07 2003
+++ b/drivers/net/typhoon.c Fri Jun 27 00:37:07 2003
@@ -85,8 +85,8 @@
 #define PKT_BUF_SZ 1536
 
 #define DRV_MODULE_NAME "typhoon"
-#define DRV_MODULE_VERSION "1.0"
-#define DRV_MODULE_RELDATE "03/02/14"
+#define DRV_MODULE_VERSION "1.4.1"
+#define DRV_MODULE_RELDATE "03/06/26"
 #define PFX DRV_MODULE_NAME ": "
 #define ERR_PFX KERN_ERR PFX
 
@@ -150,7 +150,7 @@
 #define TYPHOON_CRYPTO_DES 1
 #define TYPHOON_CRYPTO_3DES 2
 #define TYPHOON_CRYPTO_VARIABLE 4
-#define TYPHOON_FIBER 5
+#define TYPHOON_FIBER 8
 
 enum typhoon_cards {
 	TYPHOON_TX = 0, TYPHOON_TX95, TYPHOON_TX97, TYPHOON_SVR,
@@ -1798,7 +1798,7 @@
 	u32 intr_status;
 
 	intr_status = readl(ioaddr + TYPHOON_REG_INTR_STATUS);
-	if(!intr_status)
+	if(!(intr_status & TYPHOON_INTR_HOST_INT))
 		return;
 
 	writel(intr_status, ioaddr + TYPHOON_REG_INTR_STATUS);

From dave@thedillows.org Thu Jun 26 21:52:59 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 21:53:05 -0700 (PDT)
Received: from ori.thedillows.org (pcp03710388pcs.westk01.tn.comcast.net [68.34.200.110]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R4qw2x017474 for ; Thu, 26 Jun 2003 21:52:58 -0700
Received: from ori.thedillows.org (localhost.thedillows.org [127.0.0.1]) by ori.thedillows.org (8.12.8/8.12.8) with ESMTP id h5R4qqYh020875; Fri, 27 Jun 2003 00:52:52 -0400
Received: (from il1@localhost) by ori.thedillows.org (8.12.8/8.12.8/Submit) id h5R4qqR1020873; Fri, 27 Jun 2003 00:52:52 -0400
X-Authentication-Warning: ori.thedillows.org: il1 set sender to dave@thedillows.org using -f
Subject: [PATCH] Typhoon net driver fixes for 2.5
From: David Dillow
To: Jeff Garzik
Cc: Netdev
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1056689572.8692.44.camel@ori.thedillows.org>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5)
Date: 27 Jun 2003 00:52:52 -0400
X-archive-position: 3549
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dave@thedillows.org
Precedence: bulk
X-list: netdev
Content-Length: 1530
Lines: 47

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1386 -> 1.1387
# drivers/net/typhoon.c 1.5 -> 1.8
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/27 dave@thedillows.org 1.1387
# Fix misreporting of card type and spurious "already scheduled" messages.
# --------------------------------------------
#
diff -Nru a/drivers/net/typhoon.c b/drivers/net/typhoon.c
--- a/drivers/net/typhoon.c Fri Jun 27 00:38:02 2003
+++ b/drivers/net/typhoon.c Fri Jun 27 00:38:02 2003
@@ -85,8 +85,8 @@
 #define PKT_BUF_SZ 1536
 
 #define DRV_MODULE_NAME "typhoon"
-#define DRV_MODULE_VERSION "1.0"
-#define DRV_MODULE_RELDATE "03/02/14"
+#define DRV_MODULE_VERSION "1.5.1"
+#define DRV_MODULE_RELDATE "03/06/26"
 #define PFX DRV_MODULE_NAME ": "
 #define ERR_PFX KERN_ERR PFX
 
@@ -150,7 +150,7 @@
 #define TYPHOON_CRYPTO_DES 1
 #define TYPHOON_CRYPTO_3DES 2
 #define TYPHOON_CRYPTO_VARIABLE 4
-#define TYPHOON_FIBER 5
+#define TYPHOON_FIBER 8
 
 enum typhoon_cards {
 	TYPHOON_TX = 0, TYPHOON_TX95, TYPHOON_TX97, TYPHOON_SVR,
@@ -1798,7 +1798,7 @@
 	u32 intr_status;
 
 	intr_status = readl(ioaddr + TYPHOON_REG_INTR_STATUS);
-	if(!intr_status)
+	if(!(intr_status & TYPHOON_INTR_HOST_INT))
 		return IRQ_NONE;
 
 	writel(intr_status, ioaddr + TYPHOON_REG_INTR_STATUS);

From davem@redhat.com Thu Jun 26 22:36:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 22:36:14 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R5a82x018965 for ; Thu, 26 Jun 2003 22:36:08
-0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA04996; Thu, 26 Jun 2003 22:30:02 -0700
Date: Thu, 26 Jun 2003 22:30:02 -0700 (PDT)
Message-Id: <20030626.223002.21926109.davem@redhat.com>
To: linux-kernel@vger.kernel.org
CC: linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: networking bugs and bugme.osdl.org
From: "David S. Miller"
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3553
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 486
Lines: 13

I would like to ask everyone NOT to use bugme.osdl.org
for networking bug reporting any more.

It's absolutely the wrong model. When a bug gets filed that way
it sort of goes into a black hole that _I_ am forced to process,
forward, etc. the bug around and I don't want to be forced to do
mindless work like that when it is totally unnecessary.

I want people to post the bug to linux-net and netdev and discuss
the problem there. And that solves all of the problems.

Thanks a lot.
From mbligh@aracnet.com Thu Jun 26 22:46:26 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 22:46:31 -0700 (PDT)
Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R5kP2x019314 for ; Thu, 26 Jun 2003 22:46:25 -0700
Received: from groan (216-99-194-169.dial.spiritone.com [216.99.194.169]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5R5k11r014635; Thu, 26 Jun 2003 22:46:02 -0700
Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5R5k9fg018754; Thu, 26 Jun 2003 22:46:14 -0700
Date: Thu, 26 Jun 2003 22:46:10 -0700
From: "Martin J. Bligh"
To: "David S. Miller" , linux-kernel@vger.kernel.org
cc: linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: networking bugs and bugme.osdl.org
Message-ID: <18330000.1056692768@[10.10.2.4]>
In-Reply-To: <20030626.223002.21926109.davem@redhat.com>
References: <20030626.223002.21926109.davem@redhat.com>
X-Mailer: Mulberry/2.2.1 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-archive-position: 3554
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mbligh@aracnet.com
Precedence: bulk
X-list: netdev
Content-Length: 755
Lines: 22

> I would like to ask everyone NOT to use bugme.osdl.org
> for networking bug reporting any more.
>
> It's absolutely the wrong model. When a bug gets filed that way
> it sort of goes into a black hole that _I_ am forced to process,
> forward, etc. the bug around and I don't want to be forced to do
> mindless work like that when it is totally unnecessary.
>
> I want people to post the bug to linux-net and netdev and discuss
> the problem there. And that solves all of the problems.
>
> Thanks a lot.

I'll take you off the maintainers list, and find someone else to do it
for networking.
If you want net bugs reported to mailing lists, that's fine.
If people choose to file bugs in bugzilla as well, they'll still be
processed by someone.

M.

From davem@redhat.com Thu Jun 26 22:53:47 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 22:53:51 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R5rl2x019644 for ; Thu, 26 Jun 2003 22:53:47 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA05070; Thu, 26 Jun 2003 22:47:39 -0700
Date: Thu, 26 Jun 2003 22:47:39 -0700 (PDT)
Message-Id: <20030626.224739.88478624.davem@redhat.com>
To: mbligh@aracnet.com
Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: networking bugs and bugme.osdl.org
From: "David S. Miller"
In-Reply-To: <18330000.1056692768@[10.10.2.4]>
References: <20030626.223002.21926109.davem@redhat.com> <18330000.1056692768@[10.10.2.4]>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3555
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 992
Lines: 25

   From: "Martin J. Bligh"
   Date: Thu, 26 Jun 2003 22:46:10 -0700

   If people choose to file bugs in bugzilla as well, they'll still be
   processed by someone.

Just so that someone can post them to the lists? That sounds like a
completely silly way to operate. I'd rather they get posted to the
lists _ONLY_. This way not that "someone", but "everyone" on the
lists can participate and contribute to responding to the bug.

The only way you can make things scale is if you throw a group of
people into the collective of folks able to respond to a problem.
If it all gets filtered through by one guy, THAT DOES NOT WORK.
That one guy limits what can be done, and when he's busy one day
or he goes away on vacation for a while, the whole assembly line
stops.

Therefore, please eliminate the networking category on bugme.osdl.org
and we'll process bug reports on the lists so that not _ONE_ but
the whole community of networking developers can look at the bug.

From pekkas@netcore.fi Thu Jun 26 23:06:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 23:06:46 -0700 (PDT)
Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R66b2x020123 for ; Thu, 26 Jun 2003 23:06:38 -0700
Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5R66Sa03629; Fri, 27 Jun 2003 09:06:28 +0300
Date: Fri, 27 Jun 2003 09:06:27 +0300 (EEST)
From: Pekka Savola
To: Michael Bellion and Thomas Heinz
cc: linux-kernel@vger.kernel.org,
Subject: Re: [ANNOUNCE] nf-hipac v0.8 released
In-Reply-To: <200306252248.44224.nf@hipac.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3556
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: pekkas@netcore.fi
Precedence: bulk
X-list: netdev
Content-Length: 4023
Lines: 97

Hi,

Looks interesting. Is there any experience with this in bridging
firewall scenarios? (With or without external patchsets like
http://ebtables.sourceforge.net/)

Further, you mention the performance reasons for this approach. I would
be very interested to see some figures.

(As it happens, we've done some testing with different iptables rules
ourselves, and noticed significant problems especially when you go down
from IP addresses to UDP/TCP ports, for example.)

On Wed, 25 Jun 2003, Michael Bellion and Thomas Heinz wrote:

> We have released a new version of nf-hipac. We rewrote most of the code
> and added a bunch of new features.
The main enhancements are > user-defined chains, generic support for iptables targets and matches > and 64 bit atomic counters. > > > For all of you who don't know nf-hipac yet, here is a short overview: > > nf-hipac is a drop-in replacement for the iptables packet filtering module. > It implements a novel framework for packet classification which uses an > advanced algorithm to reduce the number of memory lookups per packet. > The module is ideal for environments where large rulesets and/or high > bandwidth networks are involved. Its userspace tool, which is also called > 'nf-hipac', is designed to be as compatible as possible to 'iptables -t > filter'. > > The official project web page is: http://www.hipac.org > The releases can be downloaded from: http://sourceforge.net/projects/nf-hipac > > Features: > - optimized for high performance packet classification with moderate > memory usage > - completely dynamic: data structure isn't rebuild from scratch when > inserting or deleting rules, so fast updates are possible > - very short locking times during rule updates: packet matching is > not blocked > - support for 64 bit architectures > - optimized kernel-user protocol (netlink): improved rule listing > speed > - libnfhipac: netlink library for kernel-user communication > - native match support for: > + source/destination ip > + in/out interface > + protocol (udp, tcp, icmp) > + fragments > + source/destination ports (udp, tcp) > + tcp flags > + icmp type > + connection state > + ttl > - match negation (!) 
> - iptables compatibility: syntax and semantics of the userspace tool > are very similar to iptables > - coexistence of nf-hipac and iptables: both facilities can be used > at the same time > - generic support for iptables targets and matches (binary > compatibility) > - integration into the netfilter connection tracking facility > - user-defined chains support > - 64 bit atomic counters > - kernel module autoloading > - /proc/net/nf-hipac/info: > + dynamically limit the maximum memory usage > + change invokation order of nf-hipac and iptables > - extended statistics via /proc/net/nf-hipac/statistics/* > > > We are currently working on extending the hipac algorithm to do classification > with several stages. The hipac algorithm will then be capable of combining > several classification problems in one data structure, e.g. it will be > possible to solve routing and firewalling with one hipac lookup. The idea is > to shorten the packet forwarding path by combining fib_lookup and iptables > filter lookup into one hipac query. To further improve the performance in > this scenario the upcoming flow cache could be used to cache recent hipac > results. > > > > Enjoy, > > +-----------------------+----------------------+ > | Michael Bellion | Thomas Heinz | > | | | > +-----------------------+----------------------+ > > -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. 
Martin: A Clash of Kings From davem@redhat.com Thu Jun 26 23:13:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 26 Jun 2003 23:13:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R6Da2x020488 for ; Thu, 26 Jun 2003 23:13:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA05147; Thu, 26 Jun 2003 23:07:28 -0700 Date: Thu, 26 Jun 2003 23:07:27 -0700 (PDT) Message-Id: <20030626.230727.35666164.davem@redhat.com> To: krkumar@us.ibm.com Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) From: "David S. Miller" In-Reply-To: <3EFB2017.5030202@us.ibm.com> References: <3EF9D5C2.5080101@us.ibm.com> <20030625.234251.116353369.davem@redhat.com> <3EFB2017.5030202@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3557 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 411 Lines: 15 From: Krishna Kumar Date: Thu, 26 Jun 2003 09:32:23 -0700 I still have problems with this patch. -#define RTM_MAX (RTM_BASE+31) +#define RTM_GETLNKFLAGS (RTM_BASE+34) + +#define RTM_GETPLIST (RTM_BASE+38) Please allocate contiguous numbers to the new messages, don't skip around like this. Thanks. 
(this of course means you have to redo your 2.4.x patch as well) From matti.aarnio@zmailer.org Fri Jun 27 00:59:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 00:59:26 -0700 (PDT) Received: from mail.zmailer.org (mail.zmailer.org [62.240.94.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R7xH2x022371 for ; Fri, 27 Jun 2003 00:59:18 -0700 Received: (mea@mea-ext) by mail.zmailer.org id S266860AbTF0H7O (ORCPT ); Fri, 27 Jun 2003 10:59:14 +0300 Date: Fri, 27 Jun 2003 10:59:14 +0300 From: Matti Aarnio To: "David S. Miller" Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030627075914.GO28900@mea-ext.zmailer.org> References: <20030626.223002.21926109.davem@redhat.com> <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030626.224739.88478624.davem@redhat.com> X-archive-position: 3558 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: matti.aarnio@zmailer.org Precedence: bulk X-list: netdev Content-Length: 2686 Lines: 66 On Thu, Jun 26, 2003 at 10:47:39PM -0700, David S. Miller wrote: > From: "Martin J. Bligh" > Date: Thu, 26 Jun 2003 22:46:10 -0700 > > If people choose to file bugs in bugzilla as well, they'll still be > processed by someone. > > Just so that someone can post them to the lists? > That sounds like a completely silly way to operate. > > I'd rather they get posted to the lists _ONLY_. I have recently pondered usage of Request Tracker for this kind of tasks. The problem with "post to the list" is that sometimes things slip thru without anybody catching them. Integrating linux-kernel and RT ... urgh.. result would be quite ugly. (Flame wars and out-of-topic threads going on as requests...) 
> This way not that "someone", but "everyone" on the lists > can participate and contribute to responding to the bug. That merely needs the message to arrive at the list. Ok, responding so that the response appears also at the bug db is another story. > The only way you can make things scale is if you throw a group > of people into the collective of folks able to respond to a problem. > > If it all gets filtered through by one guy, THAT DOES NOT WORK. > That one guy limits what can be done, and when he's busy one day > or he goes away on vacation for a while, the whole assembly > line stops. Bugzilla could be adapted to this use: - Bugs are to be assigned to, e.g. linux-net/netdev list - Everybody can comment on them at bugme (after signing on) - Only some meta-admin (and original bug creator) can alter status (e.g. mark as RESOLVED) Having plenty of bugme group admins (half a dozen or so) to do the initial bugzilla assignment work, those people taking the task seriously, and every one of them going en masse to assign arrived things. That way people can have time off - as long as they coordinate among themselves. The minus (and plus) is, of course, that the entire discussion flowing at the list doesn't go to the bug database, but that doesn't invalidate the mechanism's existence as a way to avoid slipping things thru the cracks. > Therefore, please eliminate the networking category on bugme.osdl.org > and we'll process bug reports on the lists so that not _ONE_ but the > whole community of networking developers can look at the bug. I thought you don't need to login to see things in bugzilla ? .. and proved it by looking into bugme.. http://bugme.osdl.org/show_bug.cgi?id=853 In addition to assigning an OWNER to the bug, there should be automatic assignment of linux-net or netdev as Cc, IMO... That will handle the "publish widely" issue that DaveM is complaining about.
/Matti Aarnio From davem@redhat.com Fri Jun 27 01:07:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 01:07:26 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5R87A2x022783 for ; Fri, 27 Jun 2003 01:07:10 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA05582; Fri, 27 Jun 2003 01:00:59 -0700 Date: Fri, 27 Jun 2003 01:00:59 -0700 (PDT) Message-Id: <20030627.010059.68039169.davem@redhat.com> To: matti.aarnio@zmailer.org Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <20030627075914.GO28900@mea-ext.zmailer.org> References: <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> <20030627075914.GO28900@mea-ext.zmailer.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3559 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 993 Lines: 25 From: Matti Aarnio Date: Fri, 27 Jun 2003 10:59:14 +0300 The problem with "post to the list" is that sometimes things slip thru without anybody catching them. It is not a problem, it is a feature. What will happen is the same thing that happens if Linus drops a patch. It'll get retransmitted if the reporter cares about the bug. If he doesn't, then one of two things: 1) the bug actually isn't that important 2) it is important, and someone else will report the bug too Therefore important issues tend to keep showing up, even if they are not attended to the first time around.
This repeated reporting and patch sending may seem like useless work, but this is not true, it is actually a form of validation. I thought you don't need to login to see things in bugzilla ? That's not the issue. Asking people who want to help to read a list or two, isn't much to ask. Getting them to click around some web site every day adds to the overhead. From carlson@workingcode.com Fri Jun 27 04:39:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 04:39:44 -0700 (PDT) Received: from workingcode.com (h006008986325.ne.client2.attbi.com [24.61.67.218]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RBdW2x000794 for ; Fri, 27 Jun 2003 04:39:33 -0700 Received: from workingcode.com (websterhost [127.0.0.1]) by workingcode.com (8.12.0/8.12.0) with ESMTP id h5RBdLW7008054; Fri, 27 Jun 2003 07:39:22 -0400 Received: (from carlson@localhost) by workingcode.com (8.12.0/8.12.0/Submit) id h5RBdKid008048; Fri, 27 Jun 2003 07:39:20 -0400 From: James Carlson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> Date: Fri, 27 Jun 2003 07:39:19 -0400 (EDT) To: Jamal Hadi Cc: "David S. Miller" , rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: Jamal Hadi's message of 26 June 2003 19:18:20 References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> X-Mailer: VM 6.75 under Emacs 20.6.1 X-archive-position: 3560 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carlson@workingcode.com Precedence: bulk X-list: netdev Jamal Hadi writes: > So what about packet being loss? 
Wouldnt that ensure reordering? Please explain. What pattern of loss possibly results in one packet being inserted in the stream ahead of another? Here's loss: 1 2 4 5 6 Here's reordering: 1 2 4 3 5 6 Loss preserves ordering. To get misordering, you have to intentionally hold onto a message and reinsert it later. What I've been pointing out is that the 802 MAC layer *does not* permit misordering (or duplication, for that matter). Loss, reordering, and duplication are all separate errors. > And there is no such thing as a lossless wire. True, but not relevant. When you put packets onto a wire, you must do so in a particular order -- it's not possible to put more than one packet at a given time on a single wire. It's also not possible for the receiver to get them in a different order than you sent them. They're essentially "single file" on that wire. PPP relies on this fact (albeit for serial wires) as part of its protocol definition (RFC 1661): 1. Introduction The Point-to-Point Protocol is designed for simple links which transport packets between two peers. These links provide full-duplex simultaneous bi-directional operation, and are assumed to deliver packets in order. It is intended that PPP provide a common solution [...] In addition, the 802 MAC layer cannot reorder packets, so there is no conflict here. Although there are many design mistakes in PPPoE, this just is not one of them. There is a design problem here, but it's not PPPoE's. > PS:- Paulus i wasnt preaching getting rid of ppp/pppoe although its > a nice thouhgt. More fix linux pppd and pppoe ;-> Believe me, the IETF working group didn't want PPPoE, either. It dropped from outer space. The only reason it was published as "Informational" is that it had already been deployed (before anyone bothered to talk to the folks who are responsible for the PPP standards), and thus somebody might want to know about it. If we could have killed it, we would have. 
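[Editorial illustration, not part of the original message.] The distinction drawn above between loss and reordering can be checked mechanically. What follows is a minimal sketch in C, assuming each packet carries a hypothetical per-stream sequence number like the 1 2 4 5 6 example; the function names order_preserved() and missing_count() are invented for illustration and do not come from any PPP or PPPoE implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the arrivals are strictly increasing, i.e. no packet
 * arrived after a higher-numbered one and none was repeated.
 * Gaps (lost packets) do not affect the result.
 */
static int order_preserved(const unsigned *seq, size_t n)
{
    assert(seq != NULL);
    for (size_t i = 1; i < n; i++) {
        if (seq[i] <= seq[i - 1])
            return 0;   /* step backwards (reordering) or repeat (duplication) */
    }
    return 1;
}

/*
 * Counts the sequence numbers skipped between consecutive arrivals.
 * Only meaningful when order_preserved() returned 1 for the same input.
 */
static unsigned missing_count(const unsigned *seq, size_t n)
{
    unsigned missing = 0;

    assert(seq != NULL);
    for (size_t i = 1; i < n; i++) {
        if (seq[i] > seq[i - 1] + 1)
            missing += seq[i] - seq[i - 1] - 1;
    }
    return missing;
}
```

Fed the arrivals 1 2 4 5 6, order_preserved() reports that order holds and missing_count() reports one lost packet; fed 1 2 4 3 5 6, order_preserved() fails, which is the case the message says a single wire cannot produce on its own.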
-- James Carlson From paulus@samba.org Fri Jun 27 05:54:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 05:54:31 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RCsG2x003086 for ; Fri, 27 Jun 2003 05:54:16 -0700 Received: by lists.samba.org (Postfix, from userid 580) id D21432C003; Fri, 27 Jun 2003 12:12:33 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16124.13469.944716.441016@nanango.paulus.ozlabs.org> Date: Fri, 27 Jun 2003 22:12:13 +1000 (EST) From: Paul Mackerras To: James Carlson Cc: Jamal Hadi , "David S. Miller" , rusty@rustcorp.com.au, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> X-Mailer: VM 6.75 under Emacs 20.7.2 X-archive-position: 3561 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: paulus@samba.org Precedence: bulk X-list: netdev James Carlson writes: > Jamal Hadi writes: > > So what about packet being loss? Wouldnt that ensure reordering? > > Please explain. What pattern of loss possibly results in one packet > being inserted in the stream ahead of another? Rusty asked me today what protocols there were that coped with packet loss but couldn't cope with reordering. I couldn't think of any. Do you know of any examples? Regards, Paul. 
From zwane@linuxpower.ca Fri Jun 27 05:56:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 05:56:12 -0700 (PDT) Received: from hemi.commfireservices.com ([66.212.224.118]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RCu52x003364 for ; Fri, 27 Jun 2003 05:56:06 -0700 Received: from montezuma.mastecende.com (cuda.commfireservices.com [24.203.207.204]) by hemi.commfireservices.com (Postfix) with ESMTP id 44B72BC52 for ; Fri, 27 Jun 2003 08:46:04 -0400 (EDT) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by montezuma.mastecende.com (8.12.8/8.12.8) with ESMTP id h5RCiTtS022511 for ; Fri, 27 Jun 2003 08:44:30 -0400 Date: Fri, 27 Jun 2003 08:44:29 -0400 (EDT) From: Zwane Mwaikambo X-X-Sender: zwane@montezuma.mastecende.com To: netdev@oss.sgi.com Subject: e1000 lockup with port io type reset Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3562 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zwane@linuxpower.ca Precedence: bulk X-list: netdev (forwarded due to missing Cc) Hi Scott, The following card causes a hard lockup when we do a controller reset using port io instead of mmio. Switching to mmio controller reset causes it to function as expected (The patch illustrates this). Side note, is it possible to get access to errata sheets for these cards? Thank you, Zwane 00:08.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (Copper) (rev 02) Subsystem: Intel Corp. PRO/1000 T Desktop Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable+ DSel=0 DScale=1 PME- Capabilities: [e4] PCI-X non-bridge device. 
Command: DPERE- ERO+ RBC=0 OST=0 Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM- Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Address: 0000000000000000 Data: 0000 Index: linux-2.5/drivers/net/e1000/e1000_hw.c =================================================================== RCS file: /home/cvs/linux-2.5/drivers/net/e1000/e1000_hw.c,v retrieving revision 1.16 diff -u -p -B -r1.16 e1000_hw.c --- linux-2.5/drivers/net/e1000/e1000_hw.c 26 May 2003 00:31:31 -0000 1.16 +++ linux-2.5/drivers/net/e1000/e1000_hw.c 27 Jun 2003 06:25:01 -0000 @@ -259,7 +259,7 @@ e1000_reset_hw(struct e1000_hw *hw) msec_delay(5); } - if(hw->mac_type > e1000_82543) + if(hw->mac_type > e1000_82544) E1000_WRITE_REG_IO(hw, CTRL, (ctrl | E1000_CTRL_RST)); else E1000_WRITE_REG(hw, CTRL, (ctrl | E1000_CTRL_RST)); -- function.linuxpower.ca From carlson@workingcode.com Fri Jun 27 06:19:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 06:19:45 -0700 (PDT) Received: from workingcode.com (h006008986325.ne.client2.attbi.com [24.61.67.218]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RDJW2x003890 for ; Fri, 27 Jun 2003 06:19:32 -0700 Received: from workingcode.com (websterhost [127.0.0.1]) by workingcode.com (8.12.0/8.12.0) with ESMTP id h5RDJRW7008600; Fri, 27 Jun 2003 09:19:27 -0400 Received: (from carlson@localhost) by workingcode.com (8.12.0/8.12.0/Submit) id h5RDJQ1a010898; Fri, 27 Jun 2003 09:19:26 -0400 From: James Carlson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16124.17502.166892.539182@h006008986325.ne.client2.attbi.com> Date: Fri, 27 Jun 2003 09:19:26 -0400 (EDT) To: Paul Mackerras Cc: Jamal Hadi , "David S. 
Miller" , rusty@rustcorp.com.au, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: Paul Mackerras's message of 27 June 2003 22:12:13 References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> <16124.13469.944716.441016@nanango.paulus.ozlabs.org> X-Mailer: VM 6.75 under Emacs 20.6.1 X-archive-position: 3563 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carlson@workingcode.com Precedence: bulk X-list: netdev Paul Mackerras writes: > Rusty asked me today what protocols there were that coped with packet > loss but couldn't cope with reordering. I couldn't think of any. Do > you know of any examples? Sure. We've already pointed out MP (RFC 1990) and VJ (RFC 1144) -- both of which handle loss just fine, but fail miserably if packets are misordered -- MP locks up and VJ will produce silent data corruption. Regular PPP negotiation itself can have trouble with misordering. Consider this example: Peer A Peer B Req-1 --> <-- Req-a Req-2 --> <-- Ack-2 <-- Ack-1 <-- Req-a Ack-a --> In this case, Peer A sent Configure-Request ID 1 with some set of options (we'll call this set "1"). Peer B then sent Configure-Request ID a with its own set of options. Based on that Configure-Request, peer A decided to start over (e.g., peer A originally offered ACCM 0 and then saw ACCM 0xa0000 from the peer and decided that, since the peer may well be an idiot, changing peer A's ACCM to 0xa0000 would be prudent) and it sends Configure-Request ID 2. Because of reordering, Peer B sees Configure-Request ID 2 first. It responds. Peer A sees the Ack and goes on to AckRcvd state. Peer B sees ID 1 next. 
It discards the options it saw in ID 2 and keeps the options from ID 1, and sends an Ack for that. Peer A discards this bogus Ack -- it doesn't match the current ID number. Peer A finally gets Req-a and sends an Ack. Now we're in a very bad state. Peer A believes it has negotiated its option set "2" with Peer B, and Peer B believes it has agreed to option set "1." Oops. Some others that are known to be sensitive to ordering (cited in 802.1w) are LAT, LLC2, and NETBEUI. Another is SNA. Reordering SNA packets will cause the link to reinitialize and cause a fault at the application level. Doing this causes everyone to have a bad day. Still another is GARP. This isn't a big deal if you're worrying about bridging -- GARP doesn't get forwarded -- but it does matter if the Ethernet driver itself mangles packet order. If you can't maintain order inside your own host, then GARP is dead. Yet another is EAP. There are probably others that haven't occurred to me. The L2-wire-preserves-order assumption turns out to be very easy to build into a protocol. All you have to do is pretend (as PPP does) that there's only one outstanding request at a time, and treat all others as invalid. That builds in the ordering dependency -- and it's how many lock-step protocols are designed. In order to be tolerant of misordering (and duplication), a protocol has to define some sort of ID numbering window (as is done, for example, in the L2TP control connection) with a logical sequence of ID numbers, so that the peer can determine which received ID numbers are "before" or "after" a given number. Then finally there's ISO/IEC 15802-1, Clause 9.2 (MAC) that permits only a "negligible" amount of reordering. (The rate is exactly zero on normal networks but could be nonzero for a "magically healed" bridge connection -- if you don't know what that is, don't sweat it. It requires accidents to occur.)
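[Editorial illustration, not part of the original message.] The "ID numbering window" idea above can be sketched with wrapping sequence-number comparison. This is a minimal sketch in C, assuming a 16-bit ID space and a half-space window of 32767 values, in the spirit of the L2TP control connection; the names id_before(), classify_id() and the rx_verdict enum are hypothetical and are not taken from any real implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if ID a is logically "before" ID b in a 16-bit space that
 * wraps around. a is before b when b is 1..32767 steps ahead of a.
 * At a distance of exactly 32768 neither ID is before the other.
 */
static int id_before(uint16_t a, uint16_t b)
{
    uint16_t ahead = (uint16_t)(b - a);   /* distance from a to b, mod 65536 */

    return ahead != 0 && ahead < 0x8000;
}

enum rx_verdict {
    RX_IN_ORDER,   /* exactly the ID the receiver expects next       */
    RX_STALE,      /* earlier than expected: late arrival or duplicate */
    RX_EARLY       /* later than expected: something was lost or delayed */
};

/* Classifies a received ID against the next ID the receiver expects. */
static enum rx_verdict classify_id(uint16_t expected, uint16_t got)
{
    assert(id_before(expected, got) + id_before(got, expected) <= 1);
    if (got == expected)
        return RX_IN_ORDER;
    return id_before(got, expected) ? RX_STALE : RX_EARLY;
}
```

With this ordering a receiver can tell a stale Configure-Request from a current one even across wraparound, which a single outstanding-ID check (the lock-step design described above) cannot do.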
-- James Carlson From Valdis.Kletnieks@vt.edu Fri Jun 27 07:27:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 07:27:17 -0700 (PDT) Received: from turing-police.cc.vt.edu (h80ad2662.async.vt.edu [128.173.38.98]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RER22x005213 for ; Fri, 27 Jun 2003 07:27:06 -0700 Received: from turing-police.cc.vt.edu (localhost [127.0.0.1]) by turing-police.cc.vt.edu (8.12.10.Beta0/8.12.10.Beta0) with ESMTP id h5REQvjK006106; Fri, 27 Jun 2003 10:26:57 -0400 Message-Id: <200306271426.h5REQvjK006106@turing-police.cc.vt.edu> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4+dev To: Stephen Hemminger Cc: netdev@oss.sgi.com Subject: Re: Weird modem behaviour in 2.5.73-mm1 In-Reply-To: Your message of "Wed, 25 Jun 2003 17:35:49 PDT." <20030625173549.561cfaec.shemminger@osdl.org> From: Valdis.Kletnieks@vt.edu References: <200306242102.49356.kde@myrealbox.com> <200306250327.h5P3RwH8001577@turing-police.cc.vt.edu> <200306250418.h5P4IWdA001565@turing-police.cc.vt.edu> <20030625091013.573f2e7b.shemminger@osdl.org> <200306251654.h5PGsUdA022467@turing-police.cc.vt.edu> <20030625102134.2046b04f.shemminger@osdl.org> <200306251804.h5PI4odA023590@turing-police.cc.vt.edu> <20030625173549.561cfaec.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_-1444366336P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Fri, 27 Jun 2003 10:26:56 -0400 X-archive-position: 3564 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Valdis.Kletnieks@vt.edu Precedence: bulk X-list: netdev --==_Exmh_-1444366336P Content-Type: text/plain; charset=us-ascii On Wed, 25 Jun 2003 17:35:49 PDT, Stephen Hemminger said: > Try this patch, it is more paranoid in some of the code paths. > I did get PPP over a null modem cable working between 2.4.18 and 2.5.73 with the PPP patches. 
I wasn't able to get this patch to apply cleanly to either the replacement .c you sent or to any of the .c's I already had - which version was this based from? Also, the entire-replacement .c showed the same symptoms of a quick death as the problem cset from bitkeeper. Here's some more info: In the *working* version, the first few lines logged by 'pppd -d' are: Jun 27 10:02:28 turing-police pppd[999]: Using interface ppp0 Jun 27 10:02:28 turing-police pppd[999]: Connect: ppp0 <--> /dev/ttyS14 Jun 27 10:02:28 turing-police /etc/hotplug/net.agent: NET add event not supported Jun 27 10:02:29 turing-police pppd[999]: sent [LCP ConfReq id=0x1 ] Jun 27 10:02:32 turing-police pppd[999]: sent [LCP ConfReq id=0x1 ] (broken version dies here) Jun 27 10:02:32 turing-police pppd[999]: rcvd [LCP ConfReq id=0xa3 ] Jun 27 10:02:32 turing-police pppd[999]: sent [LCP ConfAck id=0xa3 ] Jun 27 10:02:32 turing-police pppd[999]: rcvd [LCP ConfAck id=0x1 ] so I'm thinking there's some breakage decoding that next rcvd packet... My next step is to go through each separate change in the cset that's causing the problem and apply it, and see which one breaks it. 
--==_Exmh_-1444366336P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQE+/FQwcC3lWbTT17ARAgffAJ9hVhsbaNs9Ky6MYf9FjqKZ7oKnlwCfbsw/ A7Xuhcqh5+w8Ag4wU1XvyYc= =GjNy -----END PGP SIGNATURE----- --==_Exmh_-1444366336P-- From mbligh@aracnet.com Fri Jun 27 07:34:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 07:35:01 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5REYo2x005550 for ; Fri, 27 Jun 2003 07:34:51 -0700 Received: from groan (216-99-194-169.dial.spiritone.com [216.99.194.169]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5REYE1r008750; Fri, 27 Jun 2003 07:34:22 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5REYJfg020784; Fri, 27 Jun 2003 07:34:24 -0700 Date: Fri, 27 Jun 2003 07:34:19 -0700 From: "Martin J. Bligh" To: "David S. Miller" cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <21740000.1056724453@[10.10.2.4]> In-Reply-To: <20030626.224739.88478624.davem@redhat.com> References: <20030626.223002.21926109.davem@redhat.com><18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3565 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev > If people choose to file bugs in bugzilla as well, they'll still be > processed by someone. > > Just so that someone can post them to the lists? > That sounds like a completely silly way to operate. 
> > I'd rather they get posted to the lists _ONLY_. > > This way not that "someone", but "everyone" on the lists > can participate and contribute to responding to the bug. > > The only way you can make things scale is if you throw a group > of people into the collective of folks able to respond to a problem. We can do that. The owner of a category can be a mailing list (eg the bugme-janitors list for some of the categories). > If it all gets filtered through by one guy, THAT DOES NOT WORK. > That one guy limits what can be done, and when he's busy one day > or he goes away on vacation for a while, the whole assembly > line stops. The idea is to spread it across categories (one person for each (or a few) categories), but if you want to spread it around within a category that's possible too. > Therefore, please eliminate the networking category on bugme.osdl.org > and we'll process bug reports on the lists so that not _ONE_ but the > whole community of networking developers can look at the bug. No. If you don't want to participate, that's fine, but I'm not going to prevent other people from doing so. If you want me to forward the bugs to any given list, I'll do that. If you want to just tell people to file them to a list, that's fine too. But I won't destroy the generic model just because you don't like it. M. From davidel@xmailserver.org Fri Jun 27 07:57:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 07:58:00 -0700 (PDT) Received: from x35.xmailserver.org (x35.xmailserver.org [208.129.208.51]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5REvr2x006455 for ; Fri, 27 Jun 2003 07:57:54 -0700 X-AuthUser: davidel@xmailserver.org Received: from bigblue.dev.mcafeelabs.com by xmailserver.org with [XMail 1.16 (Linux/Ix86) ESMTP Server] id for from ; Fri, 27 Jun 2003 08:03:12 -0700 Date: Fri, 27 Jun 2003 07:56:16 -0700 (PDT) From: Davide Libenzi X-X-Sender: davide@bigblue.dev.mcafeelabs.com To: "Martin J. Bligh" cc: "David S. 
Miller" , Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org In-Reply-To: <21740000.1056724453@[10.10.2.4]> Message-ID: References: <20030626.223002.21926109.davem@redhat.com><18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3566 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidel@xmailserver.org Precedence: bulk X-list: netdev On Fri, 27 Jun 2003, Martin J. Bligh wrote: > No. If you don't want to participate, that's fine, but I'm not going > to prevent other people from doing so. > > If you want me to forward the bugs to any given list, I'll do that. > If you want to just tell people to file them to a list, that's fine too. > But I won't destroy the generic model just because you don't like it. A bug tracking system sticks you on a bug and makes all this look like real work, that's why maybe David does not like it :) Kidding ;) The good of a bug tracking system against the mailing list is that bugs do survive in a bug tracking system, while on a list they usually vanish like any overlooked post. Many ppl posting bugs are not members of the mailing list and they are not usually setting up a repost timer if the bug does not get answered. I believe that it should be both ways. Posts on the mailing list help main maintainers to lower the load by allowing others to take on bugs, while the bug tracking helps unresolved bugs to stick.
- Davide From mbligh@aracnet.com Fri Jun 27 08:00:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:00:53 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RF0l2x006834 for ; Fri, 27 Jun 2003 08:00:47 -0700 Received: from groan (216-99-194-169.dial.spiritone.com [216.99.194.169]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5RF0M1r005200; Fri, 27 Jun 2003 08:00:23 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5RF0Ufg020902; Fri, 27 Jun 2003 08:00:34 -0700 Date: Fri, 27 Jun 2003 08:00:30 -0700 From: "Martin J. Bligh" To: Matti Aarnio , "David S. Miller" cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <25450000.1056726028@[10.10.2.4]> In-Reply-To: <20030627075914.GO28900@mea-ext.zmailer.org> References: <20030626.223002.21926109.davem@redhat.com> <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> <20030627075914.GO28900@mea-ext.zmailer.org> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3568 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev > I have recently pondered usage of Request Tracker for this > kind of tasks. The problem with "post to the list" is that > sometimes things slip thru without anybody catching them. > > Integrating linux-kernel and RT ... urgh.. result would > be quite ugly. (Flame wars and out-of-topic threads going on > as requests...) Yeah, that is tricky ... see below. >> This way not that "someone", but "everyone" on the lists >> can participate and contribute to responding to the bug. 
> > That needs merely message arriving to the list. That's easy. I actually already hand filter the bugs, and forward to linux-kernel those that seem to have enough information in them to be useful to people, and aren't already fixed. There's also a mailing list for seeing every new bug that people can sign up for if they want (send me private email). > Ok, responding so that the response appears also > at the bug db is another story. That is possible to do - there's patches to Bugzilla that implement an email interface, but it has some problems like the one you pointed out above. One possibility is to make people manually do something to the email for each reply, but that's rather ugly. Hopefully we can discuss this more at OLS this year, and get a plan going forward that people are happy with. I'm well aware that Bugzilla is not the perfect tool, but I think it's better than what we had before (yeah, I know some of us disagree), and is easy to change. I'd rather start with something simple, and evolve it to the needs of the community than try dumping something complex onto people up front. > Bugzilla could be adapted to this use: > - Bugs are to be assigned to, e.g. linux-net/netdev list > - Everybody can comment on them at bugme (after signing on) > - Only some meta-admin (and original bug creator) can > alter status (e.g. mark as RESOLVED) > > Having plenty of bugme group admins (half a dozen or so) to do > the initial bugzilla assignment work, those people taking the task > seriously, and everybody of them going en masse to assign arrived > things. That way people can have time off - as long as they > coordinate among themselves. Yup, that's easy to set up if you like. Or we can do it as a new list if you prefer. > In addition to assigning an OWNER to the bug, there should be > automatic assignment of linux-net or netdev as Cc, IMO... > That will handle the "publish widely" issue that DaveM is > complaining about.
There's a QA field we can hack into doing that easily, but I want to ensure people are happy auto-cc'ing lists before I do it. Or I can forward the relevant ones by hand if you prefer. If it's going to piss people off more than it makes them happy, it's not worth it though. Moreover, the bugme default owner doesn't have to be the code maintainer, so if Dave wants someone else to do the "bug shuffling" stuff, that's another way to go. M. From shemminger@osdl.org Fri Jun 27 08:00:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:00:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RF0e2x006814 for ; Fri, 27 Jun 2003 08:00:40 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5RExkq22859; Fri, 27 Jun 2003 07:59:46 -0700 Date: Fri, 27 Jun 2003 07:59:46 -0700 From: Stephen Hemminger To: Paul Mackerras Cc: carlson@workingcode.com, hadi@shell.cyberus.ca, davem@redhat.com, rusty@rustcorp.com.au, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP Message-Id: <20030627075946.7ab6f591.shemminger@osdl.org> In-Reply-To: <16124.13469.944716.441016@nanango.paulus.ozlabs.org> References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> <16124.13469.944716.441016@nanango.paulus.ozlabs.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII 
Content-Transfer-Encoding: 7bit X-archive-position: 3567 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Fri, 27 Jun 2003 22:12:13 +1000 (EST) Paul Mackerras wrote: > James Carlson writes: > > Jamal Hadi writes: > > > So what about packet being loss? Wouldnt that ensure reordering? > > > > Please explain. What pattern of loss possibly results in one packet > > being inserted in the stream ahead of another? > > Rusty asked me today what protocols there were that coped with packet > loss but couldn't cope with reordering. I couldn't think of any. Do > you know of any examples? > Does LLC allow for re-ordering? From john@grabjohn.com Fri Jun 27 08:16:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:16:53 -0700 (PDT) Received: from 81-2-122-30.bradfords.org.uk (81-2-122-30.bradfords.org.uk [81.2.122.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RFGl2x007579 for ; Fri, 27 Jun 2003 08:16:49 -0700 Received: from 81-2-122-30.bradfords.org.uk (localhost [127.0.0.1]) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9) with ESMTP id h5RFP5BJ001777; Fri, 27 Jun 2003 16:25:05 +0100 Received: (from john@localhost) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9/Submit) id h5RFP5K1001776; Fri, 27 Jun 2003 16:25:05 +0100 Date: Fri, 27 Jun 2003 16:25:05 +0100 From: John Bradford Message-Id: <200306271525.h5RFP5K1001776@81-2-122-30.bradfords.org.uk> To: davem@redhat.com, matti.aarnio@zmailer.org, mbligh@aracnet.com Subject: Re: networking bugs and bugme.osdl.org Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com X-archive-position: 3569 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john@grabjohn.com Precedence: bulk X-list: netdev > > Ok, responding so that the response appears also > > at the bug db is another story. 
> > That is possible to do - there's patches to Bugzilla that implement an > email interface, but it has some problems like the one you pointed out > above. One possibility is to make people manually do something to the > email for each reply, but that's rather ugly. > > Hopefully we can discuss this more at OLS this year, and get a plan > going forward that people are happy with. I'm well aware that Bugzilla > is not the perfect tool, but I think it's better than what we had > before (yeah, I know some of us disagree), and is easy to change. > I'd rather start with something simple, and evolve it to the needs of > the community than try dumping something complex onto people up front. I did make the effort to make a dedicated bug database for kernel development in December last year. Do people actively hate it, or are they just not aware of it? :-). I got some very favourable comments early on, and a review in a Linux magazine, but haven't had much feedback recently about it. I was specifically trying to address the kind of problems we're seeing with Bugzilla... http://grabjohn.com/kernelbugdatabase John.
From carlson@workingcode.com Fri Jun 27 08:27:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:27:56 -0700 (PDT) Received: from workingcode.com (h006008986325.ne.client2.attbi.com [24.61.67.218]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RFRk2x007980 for ; Fri, 27 Jun 2003 08:27:47 -0700 Received: from workingcode.com (websterhost [127.0.0.1]) by workingcode.com (8.12.0/8.12.0) with ESMTP id h5RFRfW7019304; Fri, 27 Jun 2003 11:27:41 -0400 Received: (from carlson@localhost) by workingcode.com (8.12.0/8.12.0/Submit) id h5RFReZl005986; Fri, 27 Jun 2003 11:27:40 -0400 From: James Carlson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16124.25196.268015.288718@h006008986325.ne.client2.attbi.com> Date: Fri, 27 Jun 2003 11:27:40 -0400 (EDT) To: Stephen Hemminger Cc: Paul Mackerras , hadi@shell.cyberus.ca, davem@redhat.com, rusty@rustcorp.com.au, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: Stephen Hemminger's message of 27 June 2003 07:59:46 References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> <16124.13469.944716.441016@nanango.paulus.ozlabs.org> <20030627075946.7ab6f591.shemminger@osdl.org> X-Mailer: VM 6.75 under Emacs 20.6.1 X-archive-position: 3570 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: carlson@workingcode.com Precedence: bulk X-list: netdev Stephen Hemminger writes: > Does LLC allow for re-ordering? ANSI/IEEE Std 802.2, 1998, section 8.5.2.2 describes a common LLC type 3 simplification that relies on MAC ordering. This is used on media (such as Ethernet) that don't reorder. 
I believe that LLC type 2 ought to be able to handle misordering and duplication, at least that's the intent of I-mode frames. I don't know if this actually works in all implementations (after all, Ethernet doesn't reorder, so it's not as if anyone's really had to test it), but I can check one or two if someone cares. I'm not sure about LLC type 1. It appears to expose the client to the ordering guarantees of the underlying MAC layer, and thus it's very likely the case that LLC type 1 clients make assumptions about known MAC types. -- James Carlson From krkumar@us.ibm.com Fri Jun 27 08:45:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:45:06 -0700 (PDT) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RFir2x008364 for ; Fri, 27 Jun 2003 08:45:00 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e1.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5RFilx2131568; Fri, 27 Jun 2003 11:44:47 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5RFiiK6171376; Fri, 27 Jun 2003 11:44:45 -0400 Message-ID: <3EFC668F.9010004@us.ibm.com> Date: Fri, 27 Jun 2003 08:45:19 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" CC: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) References: <3EF9D5C2.5080101@us.ibm.com> <20030625.234251.116353369.davem@redhat.com> <3EFB2017.5030202@us.ibm.com> <20030626.230727.35666164.davem@redhat.com> In-Reply-To: <20030626.230727.35666164.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3571 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev rtnetlink_rcv_msg() calls dumpit() (via netlink_dump_start) only for those messages for which the last two bits are binary '10'. So I had to use these values. All the other *GET* macros use the same semantics. thanks, - KK David S. Miller wrote: > From: Krishna Kumar > Date: Thu, 26 Jun 2003 09:32:23 -0700 > > I still have problems with this patch. > > -#define RTM_MAX (RTM_BASE+31) > +#define RTM_GETLNKFLAGS (RTM_BASE+34) > + > +#define RTM_GETPLIST (RTM_BASE+38) > > Please allocate contiguous numbers to the new messages, don't skip > around like this. > > Thanks. 
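The numbering rule described above can be sketched in plain C. This is a minimal illustration of the constraint as stated in this message, not the kernel's own code: it assumes RTM_BASE is 16, as in the rtnetlink headers of this era, and rtm_is_get_kind() and rtm_next_get_offset() are hypothetical helper names introduced here for the example.

```c
#include <stdbool.h>

/* Assumption: RTM_BASE is 16 (0x10), as in rtnetlink headers of this era. */
#define RTM_BASE 16

/*
 * Hypothetical helper. Per the description above, a message type is only
 * eligible for the dump path when, relative to RTM_BASE, its last two bits
 * are binary '10' (that is, (type - RTM_BASE) & 3 == 2).
 */
static bool rtm_is_get_kind(unsigned int nlmsg_type)
{
    if (nlmsg_type < RTM_BASE)
        return false;
    return ((nlmsg_type - RTM_BASE) & 3) == 2;
}

/*
 * Hypothetical helper. Returns the first offset from RTM_BASE that is
 * strictly greater than last_used_offset and satisfies the GET rule.
 */
static unsigned int rtm_next_get_offset(unsigned int last_used_offset)
{
    unsigned int off = last_used_offset + 1;

    while ((off & 3) != 2)
        off++;
    return off;
}
```

Under this rule, the first two dump-capable offsets past the old RTM_MAX of RTM_BASE+31 are 34 and 38, which matches the values chosen in the patch and is why the new GET messages cannot simply be numbered consecutively.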
(this of course means you have to redo your 2.4.x patch > as well) > From scott.feldman@intel.com Fri Jun 27 08:57:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 08:58:04 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RFvs2x008729 for ; Fri, 27 Jun 2003 08:57:55 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by hermes.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h5RFr5123868 for ; Fri, 27 Jun 2003 15:53:06 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by talaria.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h5RFwKk08174 for ; Fri, 27 Jun 2003 15:58:20 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003062708575222411 ; Fri, 27 Jun 2003 08:57:52 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 27 Jun 2003 08:57:52 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: e1000 lockup with port io type reset Date: Fri, 27 Jun 2003 08:57:52 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: e1000 lockup with port io type reset Thread-Index: AcM8q9IqcMDTaUmyTpSwWXQzYHGNiQAF++NA From: "Feldman, Scott" To: "Zwane Mwaikambo" , X-OriginalArrivalTime: 27 Jun 2003 15:57:52.0531 (UTC) FILETIME=[DD954630:01C33CC4] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5RFvs2x008729 X-archive-position: 3572 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
scott.feldman@intel.com Precedence: bulk X-list: netdev > The following card causes a hard lockup when we do a controller > reset using port io instead of mmio. Switching to mmio > controller reset causes it to function as expected Please send me a description of system (model, kernel, DMI decode, etc), plus dumps from lspci -vv -xxx and ethtool -d ethX. If we can repro this, we should be able to get a bus trace. -scott From greearb@candelatech.com Fri Jun 27 11:50:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 11:51:01 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RIou2x010693 for ; Fri, 27 Jun 2003 11:50:56 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5RIohxA026128; Fri, 27 Jun 2003 11:50:44 -0700 Message-ID: <3EFC9203.3090508@candelatech.com> Date: Fri, 27 Jun 2003 11:50:43 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org References: <20030626.223002.21926109.davem@redhat.com> <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> In-Reply-To: <20030626.224739.88478624.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3573 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: "Martin J. Bligh" > Date: Thu, 26 Jun 2003 22:46:10 -0700 > > If people choose to file bugs in bugzilla as well, they'll still be > processed by someone. 
> > Just so that someone can post them to the lists? > That sounds like a completely silly way to operate. > > I'd rather they get posted to the lists _ONLY_. I'm sure bugz could be set up to send a report to netdev everytime a bug was entered. And, we would also have a good record of bugs that people could search. It would also keep bugs from falling through the cracks: If 'everyone' is responsible, that can often mean that no one takes responsibility. Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From davem@redhat.com Fri Jun 27 14:44:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 14:44:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RLhx2x013964 for ; Fri, 27 Jun 2003 14:44:00 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA07726; Fri, 27 Jun 2003 14:37:38 -0700 Date: Fri, 27 Jun 2003 14:37:38 -0700 (PDT) Message-Id: <20030627.143738.41641928.davem@redhat.com> To: davidel@xmailserver.org Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3574 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Davide Libenzi Date: Fri, 27 Jun 2003 07:56:16 -0700 (PDT) The good of a bug tracking system against the mailing list is that bugs do survive in a bug tracking system, No, this is the _BAD_ part, shit accumulates equally with useful reports. Useful reports in non-bugtracking system environments get retransmitted and eventually looked at. From davem@redhat.com Fri Jun 27 14:50:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 14:50:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RLoj2x014297 for ; Fri, 27 Jun 2003 14:50:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA07754; Fri, 27 Jun 2003 14:44:26 -0700 Date: Fri, 27 Jun 2003 14:44:26 -0700 (PDT) Message-Id: <20030627.144426.71096593.davem@redhat.com> To: greearb@candelatech.com Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <3EFC9203.3090508@candelatech.com> References: <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> <3EFC9203.3090508@candelatech.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3575 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Fri, 27 Jun 2003 11:50:43 -0700 It would also keep bugs from falling through the cracks: People DON'T understand. I _WANT_ them to be able to fall through the cracks. If it's important, people will retransmit. This works for kernel patches, and has done so for over 5 years. So what makes anyone think it doesn't work for bug reporting? From davem@redhat.com Fri Jun 27 14:54:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 14:54:12 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RLs72x014614 for ; Fri, 27 Jun 2003 14:54:08 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA07792; Fri, 27 Jun 2003 14:47:52 -0700 Date: Fri, 27 Jun 2003 14:47:52 -0700 (PDT) Message-Id: <20030627.144752.78715628.davem@redhat.com> To: krkumar@us.ibm.com Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) From: "David S. Miller" In-Reply-To: <3EFC668F.9010004@us.ibm.com> References: <3EFB2017.5030202@us.ibm.com> <20030626.230727.35666164.davem@redhat.com> <3EFC668F.9010004@us.ibm.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3576 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Krishna Kumar Date: Fri, 27 Jun 2003 08:45:19 -0700 rtnetlink_rcv_msg() calls dumpit() (via netlink_dump_start) only for those messages for which the last two bits are binary '10'. So I had to use these values. All the other *GET* macros use the same semantics. Ok, please retransmit your two patches (2.4.x and 2.5.x) to me under separate cover. I don't keep a copy around of patches I've decided not to apply. From greearb@candelatech.com Fri Jun 27 14:54:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 14:54:41 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RLsa2x014717 for ; Fri, 27 Jun 2003 14:54:37 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5RLsRxA016874; Fri, 27 Jun 2003 14:54:27 -0700 Message-ID: <3EFCBD12.3070101@candelatech.com> Date: Fri, 27 Jun 2003 14:54:26 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S.
Miller" CC: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> In-Reply-To: <20030627.143738.41641928.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3577 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Davide Libenzi > Date: Fri, 27 Jun 2003 07:56:16 -0700 (PDT) > > The good of a bug tracking system against the mailing list is that > bugs do survive in a bug tracking system, > > No, this is the _BAD_ part, shit accumulates equally with > useful reports. > > Useful reports in non-bugtracking system environments get > retransmitted and eventually looked at. I think you are putting too much work on the bug reporter(s). If you want to ignore bug reports that only happen once, feel free, but give the rest of us a way to easily keep a history and list of bug reports. For instance, where is the list of open networking bugs for 2.4 now? 
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From davem@redhat.com Fri Jun 27 15:01:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:01:16 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RM1C2x015298 for ; Fri, 27 Jun 2003 15:01:12 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA07850; Fri, 27 Jun 2003 14:54:57 -0700 Date: Fri, 27 Jun 2003 14:54:56 -0700 (PDT) Message-Id: <20030627.145456.115915594.davem@redhat.com> To: greearb@candelatech.com Cc: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <3EFCBD12.3070101@candelatech.com> References: <20030627.143738.41641928.davem@redhat.com> <3EFCBD12.3070101@candelatech.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3578 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Fri, 27 Jun 2003 14:54:26 -0700 I think you are putting too much work on the bug reporter(s). Don't even talk to me about too much work. Someone wants me to spend hours groveling through some pieces of code to track down some tricky bug, for free, and all I ask is that they retransmit the bug every once in a while if they don't see any response? Give me a frigging break. If they're not willing to do this, they DON'T care about the bug. 
Just like if people aren't willing to retransmit patches they want installed, they DON'T care about the patch. And just like I don't want to apply patches people don't care about, I don't want any of my contributors looking at bugs that the bug reporter doesn't care about. Just like with patches, I want to know that people are going to stick around and be responsive if I need to get information from them when a bug is reported. If they're not willing to retransmit the report every once in a while, why should I believe they will? Ben, you absolutely don't understand how all of this development works and what it relies upon to function properly. From davidel@xmailserver.org Fri Jun 27 15:03:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:03:38 -0700 (PDT) Received: from x35.xmailserver.org (x35.xmailserver.org [208.129.208.51]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RM3W2x015630 for ; Fri, 27 Jun 2003 15:03:32 -0700 X-AuthUser: davidel@xmailserver.org Received: from bigblue.dev.mcafeelabs.com by xmailserver.org with [XMail 1.16 (Linux/Ix86) ESMTP Server] id for from ; Fri, 27 Jun 2003 15:08:52 -0700 Date: Fri, 27 Jun 2003 15:02:00 -0700 (PDT) From: Davide Libenzi X-X-Sender: davide@bigblue.dev.mcafeelabs.com To: "David S. Miller" cc: mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org In-Reply-To: <20030627.143738.41641928.davem@redhat.com> Message-ID: References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3579 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidel@xmailserver.org Precedence: bulk X-list: netdev On Fri, 27 Jun 2003, David S.
Miller wrote: > No, this is the _BAD_ part, shit accumulates equally with > useful reports. > > Useful reports in non-bugtracking system environments get > retransmitted and eventually looked at. David, your method is the dream of every software developer. Having Q/A repeatedly pushing the same issue. Having a track is good, and flagging a report as not-a-bug or need-more-info takes almost the same time (if the system is sanely designed) as it takes you to flag a message as shit. In this way, though, you do not lose meaningful things that you overlooked at first sight. And this comes from someone who wanted to quit his job when he was first forced to use a tracking system ;) - Davide From davem@redhat.com Fri Jun 27 15:09:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:09:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RM922x015997 for ; Fri, 27 Jun 2003 15:09:03 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA07920; Fri, 27 Jun 2003 15:02:48 -0700 Date: Fri, 27 Jun 2003 15:02:48 -0700 (PDT) Message-Id: <20030627.150248.08328103.davem@redhat.com> To: davidel@xmailserver.org Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: References: <20030627.143738.41641928.davem@redhat.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3580 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Davide Libenzi Date: Fri, 27 Jun 2003 15:02:00 -0700 (PDT) David, your method is the dream of every software developer. It is not a dream, it works perfectly fine and has done so for 5+ years of Linux maintenance. To make these things scale you MUST push the work out to other people, you absolutely cannot centralize. And here we're pushing it out to the bug reporters, just like we push the work of patch maintenance to the patch submitters. If they don't care about the bug and won't retransmit when their stuff isn't being looked at, their bug isn't worth being looked at. I know that's a hard pill to swallow, but over years of work I can tell you this is the only scalable mechanism. Nobody likes this because it's not tracked somewhere and they can't show some pretty list of bugs to their management at the end of each week, TOO FUCKING BAD. Pay someone to work on your bugs if you want a pretty list and people being REQUIRED to look at and fix bugs. None of this crap is my problem. From davidel@xmailserver.org Fri Jun 27 15:13:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:13:35 -0700 (PDT) Received: from x35.xmailserver.org (x35.xmailserver.org [208.129.208.51]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMDP2x016317 for ; Fri, 27 Jun 2003 15:13:26 -0700 X-AuthUser: davidel@xmailserver.org Received: from bigblue.dev.mcafeelabs.com by xmailserver.org with [XMail 1.16 (Linux/Ix86) ESMTP Server] id for from ; Fri, 27 Jun 2003 15:18:46 -0700 Date: Fri, 27 Jun 2003 15:11:53 -0700 (PDT) From: Davide Libenzi X-X-Sender: davide@bigblue.dev.mcafeelabs.com To: "David S.
Miller" cc: mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org In-Reply-To: <20030627.150248.08328103.davem@redhat.com> Message-ID: References: <20030627.143738.41641928.davem@redhat.com> <20030627.150248.08328103.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3581 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidel@xmailserver.org Precedence: bulk X-list: netdev On Fri, 27 Jun 2003, David S. Miller wrote: > From: Davide Libenzi > Date: Fri, 27 Jun 2003 15:02:00 -0700 (PDT) > > David, your method is the dream of every software developer. > > It is not a dream, it works perfectly fine and has done so > for 5+ years of Linux maintenance. > > To make these things scale you MUST push the work out to other people, > you absolutely cannot centralize. And here we're pushing it out to > the bug reporters, just like we push the work of patch maintenance to > the patch submitters. > > If they don't care about the bug and won't retransmit when their > stuff isn't being looked at, their bug isn't worth being looked > at. David, I'm not willing to waste both our precious time arguing on this but I will leave you a question to think about. Is a bug report more useful for the user of a "system" or for the "system" itself ?
- Davide From greearb@candelatech.com Fri Jun 27 15:15:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:15:27 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMFL2x016627 for ; Fri, 27 Jun 2003 15:15:22 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5RMF7xA019540; Fri, 27 Jun 2003 15:15:08 -0700 Message-ID: <3EFCC1EB.2070904@candelatech.com> Date: Fri, 27 Jun 2003 15:15:07 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org References: <20030627.143738.41641928.davem@redhat.com> <3EFCBD12.3070101@candelatech.com> <20030627.145456.115915594.davem@redhat.com> In-Reply-To: <20030627.145456.115915594.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3582 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Ben Greear > Date: Fri, 27 Jun 2003 14:54:26 -0700 > > I think you are putting too much work on the bug reporter(s). > > Don't even talk to me about too much work. > > Someone wants me to spend hours groveling through some pieces of code > to track down some tricky bug, for free, and all I ask is that they > retransmit the bug every once in a while if they don't see any > response? I don't care if you completely ignore the bugzilla, just let the rest of us lesser mortals use it. 
There's always the chance we will find something we can fix and actually lessen your load. > If they're not willing to do this, they DON'T care about the bug. > Just like if people aren't willing to retransmit patches they want > installed, they DON'T care about the patch. And just like I don't > want to apply patches people don't care about, I don't want any of my > contributors looking at bugs that the bug reporter doesn't care about. Forcing people to continue to retransmit the same report just pisses people off, and in the end will get you less useful reports than if you had flagged the report as 'please-gimme-more-info'. And, most people, especially the savvy ones, will find some sort of work-around and keep going. That didn't fix the problem, it just made it invisible again until the next person hits it. > Ben, you absolutely don't understand how all of this development works > and what it relies upon to function properly. Perhaps, but it's also possible that you are being a stubborn SOB because you fear change :) Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From davem@redhat.com Fri Jun 27 15:19:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:19:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMJt2x016951 for ; Fri, 27 Jun 2003 15:19:56 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA07984; Fri, 27 Jun 2003 15:13:39 -0700 Date: Fri, 27 Jun 2003 15:13:38 -0700 (PDT) Message-Id: <20030627.151338.21924648.davem@redhat.com> To: davidel@xmailserver.org Cc: mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S.
Miller" In-Reply-To: References: <20030627.150248.08328103.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3583 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Davide Libenzi Date: Fri, 27 Jun 2003 15:11:53 -0700 (PDT) Is a bug report more useful for the user of a "system" or for the "system" itself ? You can ask the same question about a patch, and my answer is the same, "it depends upon the bug/patch and whether the people it affects actually care about it". What's truly "useful" for the "system" are things that allow the people that maintain it to SCALE. Lossless bug and patch databases that force me to look at each and every item do not scale. What you don't understand is that I, and most of the people who help me, do this because I and they want to. So as soon as things get introduced that make us less "want" to do this you can be sure contributions will slump slowly but surely to nothing. 
From davem@redhat.com Fri Jun 27 15:25:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:25:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMPO2x017275 for ; Fri, 27 Jun 2003 15:25:25 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA08028; Fri, 27 Jun 2003 15:19:06 -0700 Date: Fri, 27 Jun 2003 15:19:06 -0700 (PDT) Message-Id: <20030627.151906.102571486.davem@redhat.com> To: greearb@candelatech.com Cc: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <3EFCC1EB.2070904@candelatech.com> References: <3EFCBD12.3070101@candelatech.com> <20030627.145456.115915594.davem@redhat.com> <3EFCC1EB.2070904@candelatech.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3584 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Fri, 27 Jun 2003 15:15:07 -0700 Forcing people to continue to retransmit the same report just pisses people off, and in the end will get you less useful reports than if you had flagged the report as 'please-gimme-more-info'. And this is different from patch submission in what way? Perhaps, but it's also possible that you are being a stubborn SOB because you fear change :) Absolutely not, in fact I'm daily looking for ways to change how I work with people who help me so that I scale better. 
And I know for sure that a bug database with shit that accumulates in it that _REQUIRES_ me to do something about it to make it go away does not help me scale. Bugme was an absolute burden for me. For something to scale, it must continue to operate just as efficiently if I were to go away for a few weeks. The lists have that quality, the bug database with owner does not. From greearb@candelatech.com Fri Jun 27 15:36:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:36:52 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMal2x017614 for ; Fri, 27 Jun 2003 15:36:47 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5RMaUxA022245; Fri, 27 Jun 2003 15:36:39 -0700 Message-ID: <3EFCC6EE.3020106@candelatech.com> Date: Fri, 27 Jun 2003 15:36:30 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org References: <3EFCBD12.3070101@candelatech.com> <20030627.145456.115915594.davem@redhat.com> <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> In-Reply-To: <20030627.151906.102571486.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3585 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S.
Miller wrote: > From: Ben Greear > Date: Fri, 27 Jun 2003 15:15:07 -0700 > > Forcing people to continue to retransmit the same report just pisses > people off, and in the end will get you less useful reports than if > you had flagged the report as 'please-gimme-more-info'. > > And this is different from patch submission in what way? It wouldn't bother me to have a list of all patches submitted either, it would keep folks from re-implementing the same thing from time to time. However, the main difference is that having to carry patches forward is a constant drag on the person with the patch that was not accepted, so they are constantly aware of how nice it would be to get it included, thus they may keep trying. A user with a PCMCIA NIC that reorders packets can get another NIC, so that bug will never be retransmitted, and it will never get fixed. What is worse, new users of that busted NIC will have to re-discover that all over for themselves, because there is no bug database to search. > > Perhaps, but it's also possible that you are being a stubborn SOB > because you fear change :) > > Absolutely not, in fact I'm daily looking for ways to change how > I work with people who help me so that I scale better. And I know > for sure that a bug database with shit that accumulates in it > that _REQUIRES_ me to do something about it to make it go away > does not help me scale. > > Bugme was an absolute burden for me. > > For something to scale, it must continue to operate just as > efficiently if I were to go away for a few weeks. The lists have that > quality, the bug database with owner does not. So, you'd be happy so long as bugz sent mail to the netdev mailing lists instead of to you?
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From bmc@phunnypharm.org Fri Jun 27 15:44:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:44:28 -0700 (PDT) Received: from bristol.phunnypharm.org (bristol.phunnypharm.org [65.207.35.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMiF2x017957 for ; Fri, 27 Jun 2003 15:44:16 -0700 Received: from hopper.phunnypharm.org ([65.207.35.143] helo=localhost ident=mail) by bristol.phunnypharm.org with esmtp (Exim 4.20) id 19W0tq-0007cD-PC; Fri, 27 Jun 2003 17:36:54 -0400 Received: from bmc by localhost with local (Exim 3.36 #1 (Debian)) id 19W0oz-0004nV-00; Fri, 27 Jun 2003 17:31:53 -0400 Date: Fri, 27 Jun 2003 17:31:53 -0400 From: Ben Collins To: Davide Libenzi Cc: "David S. Miller" , mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030627213153.GR501@phunnypharm.org> References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3586 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bcollins@debian.org Precedence: bulk X-list: netdev On Fri, Jun 27, 2003 at 03:02:00PM -0700, Davide Libenzi wrote: > On Fri, 27 Jun 2003, David S. Miller wrote: > > > No, this is the _BAD_ part, shit accumulates equally with > > useful reports. > > > > Useful reports in non-bugtracking system environments get > > retransmitted and eventually looked at. > > David, your method is the dream of every software developer. Having Q/A > repeatedly pushing the same issue. 
Having a track is good and flagging a > report as not-a-bug or need-more-info takes almost the same time (if the > system is sanely designed) as it takes you to flag your message as shit. In > this way though you do not lose meaningful things that you overlooked at > first sight. And this comes from someone that wanted to quit his job when > they forced him for the first time to use a tracking system ;) As a bug reporter, and as someone who receives bug reports, I can say that on both ends I find it easier to send emails, and get emails than to fiddle with any bug tracking tool. I'm with Dave on this one. Scrap the nifty tools, and just use good sense. Emails let each developer handle bug reports in their own way. I'm sure you could make a nice local tool for yourself to manage your own bug reports. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ Deqo - http://www.deqo.com/ From mbligh@aracnet.com Fri Jun 27 15:47:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:47:43 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMlc2x018281 for ; Fri, 27 Jun 2003 15:47:38 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5RMlCga026297; Fri, 27 Jun 2003 15:47:13 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5RMlMfg022233; Fri, 27 Jun 2003 15:47:26 -0700 Date: Fri, 27 Jun 2003 15:47:22 -0700 From: "Martin J. Bligh" To: "David S.
Miller" , greearb@candelatech.com cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <1230000.1056754041@[10.10.2.4]> In-Reply-To: <20030627.144426.71096593.davem@redhat.com> References: <18330000.1056692768@[10.10.2.4]><20030626.224739.88478624.davem@redhat.com><3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3587 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --"David S. Miller" wrote (on Friday, June 27, 2003 14:44:26 -0700): > From: Ben Greear > Date: Fri, 27 Jun 2003 11:50:43 -0700 > > It would also keep bugs from falling through the cracks: > > People DON'T understand. I _WANT_ them to be able to > fall through the cracks. I fail to see your point here. If that's what you want, then just don't look at the bugme data. However it's still available for other people that do want to see it. I can easily arrange for them to be transmitted when first filed by email, in fact we already do that. M. From lm@bitmover.com Fri Jun 27 15:53:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 15:53:28 -0700 (PDT) Received: from smtp.bitmover.com (smtp.bitmover.com [192.132.92.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RMrJ2x018623 for ; Fri, 27 Jun 2003 15:53:19 -0700 Received: from work.bitmover.com (ipcop.bitmover.com [192.132.92.15]) by smtp.bitmover.com (8.12.9/8.12.9) with ESMTP id h5S6sVm7008312; Fri, 27 Jun 2003 23:54:31 -0700 Received: (from lm@localhost) by work.bitmover.com (8.11.6/8.11.6) id h5RMr5L14344; Fri, 27 Jun 2003 15:53:05 -0700 Date: Fri, 27 Jun 2003 15:53:05 -0700 From: Larry McVoy To: "Martin J. 
Bligh" Cc: "David S. Miller" , greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030627225305.GA13785@work.bitmover.com> Mail-Followup-To: Larry McVoy , "Martin J. Bligh" , "David S. Miller" , greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com References: <20030627.144426.71096593.davem@redhat.com> <1230000.1056754041@[10.10.2.4]> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1230000.1056754041@[10.10.2.4]> User-Agent: Mutt/1.4i X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-MailScanner-SpamCheck: not spam (whitelisted), SpamAssassin (score=0.5, required 7, AWL, DATE_IN_PAST_06_12) X-archive-position: 3588 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lm@bitmover.com Precedence: bulk X-list: netdev On Fri, Jun 27, 2003 at 03:47:22PM -0700, Martin J. Bligh wrote: > --"David S. Miller" wrote (on Friday, June 27, 2003 14:44:26 -0700): > > > From: Ben Greear > > Date: Fri, 27 Jun 2003 11:50:43 -0700 > > > > It would also keep bugs from falling through the cracks: > > > > People DON'T understand. I _WANT_ them to be able to > > fall through the cracks. > > I fail to see your point here. This might help. Or not. Brain dump on the bug tracking problem from the Kernel Summit discussions [SCCS/s.BUGS vers 1.3 2001/04/05 13:10:10] Outline Problems Problem details Past experiences Requirements Problems - getting quality bug reports - not losing any bugs - sorting low signal vs high signal into a smaller high signal pile - simplified, preferably NNTP, access to the bug database (Linus would use this; he's unlikely to use anything else) Problem details Bug report quality There was lots of discussion on this. 
The main agreement was that we wanted the bug reporting system to dig out as much info as possible and prefill that. There was a lot of discussion about possible tools that would dig out the /proc/pci info; there was discussion about Andre's tools which can tell you if you can write your disk; someone else had something similar. But the main thing was to extract all the info we could automatically. One thing was the machine config (hardware and at least kernel version). The other thing was extract any oops messages and get a stack traceback. The other main thing was to define some sort of structure to the bug report and try and get the user to categorize if they could. In an ideal world, we would use the maintainers file and the stack traceback to cc the bug to the maintainer. I think we want to explore this a bit. I'm not sure that the maintainer file is the way to go, what if we divided it up into much broader chunks like "fs", "vm", "network drivers", and had a mail forwarder for each area. That could fan out to the maintainers. Not losing bugs While there was much discussion about how to get rid of bad, incorrect, and/or duplicate bug reports, several people - Alan in particular - made the point that having a complete collection of all bug reports was important. You can do data mining across all/part of them and look for patterns. The point was that there is some useful signal amongst all the noise so we do not want to lose that signal. Signal/noise We had a lot of discussion about how to deal with signal/noise. The bugzilla proponents thought we could do this with some additional hacking to bugzilla. I, given the BitKeeper background, thought that we could do this by having two databases, one with all the crud in it and another with just the screened bugs in it. No matter how it is done, there needs to be some way to both keep a full list, which will likely be used only for data mining, and another, much smaller list of screened bugs.
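A minimal sketch of the two-list idea described above, in Python. It is only an illustration: the names BugReport, BugStore, screen and mine are hypothetical, and nothing here reflects how bugzilla or BitKeeper actually store bugs. Every report stays in the full archive for data mining, and a much smaller screened set marks the ones a person has accepted.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """One submitted report; `tags` is a hypothetical free-form label set."""
    bug_id: int
    synopsis: str
    tags: set = field(default_factory=set)


class BugStore:
    """Full archive of every report plus a small set of screened ids."""

    def __init__(self):
        self.archive = {}      # bug_id -> BugReport, never pruned
        self.screened = set()  # ids a human has accepted as real bugs

    def submit(self, report):
        # Every report lands in the archive, junk included.
        self.archive[report.bug_id] = report

    def screen(self, bug_id):
        # Promote an archived report to the screened list.
        if bug_id not in self.archive:
            raise KeyError(bug_id)
        self.screened.add(bug_id)

    def screened_reports(self):
        return [self.archive[i] for i in sorted(self.screened)]

    def mine(self, tag):
        # Data mining runs over the whole archive, not just screened bugs.
        return [r for r in self.archive.values() if tag in r.tags]
```

Searching the archive by a tag such as "smp" is one way the pattern-spotting mentioned under "Not losing bugs" could work even for reports nobody ever screened.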
Jens wants there to be a queue of new bugs and a mechanism where people can come in the morning, pull a pile of bugs off of the queue, sort them, sending some to the real database. This idea has a lot of merit, it needs some pondering as DaveM would say, to get to the point that we have a workable mechanism which works in a distributed fashion. The other key point seemed to be that if nobody picked up a bug and nobody said that this bug should be picked up, then the bug expires out of the pending queue. It gets stashed in the bug archive for mining purposes and it can be resurrected if it later becomes a real bug, but the key point seems to be that it _automatically_ disappears out of the pending queue. I personally am very supportive of this model. We need some way to just let junk stay junk. If junk has to be pruned out of the system by humans, the system sucks. The system, not humans, needs to autoprune. Simplified access: browsing and updating Linus made the point that mailing lists suck. He isn't on any and refuses to join any. He reads lists with a news reader. I think people should sit up and listen to that - it's a key point. If your mailing list isn't gatewayed to a newsgroup, he isn't reading it and a lot of other people aren't either. There was a fair bit of discussion about how to get the bug database connected to news. There doesn't seem to be any reason that the bug system couldn't be a news server/gateway. You should be able to browse bitbucket.kernel.bugs - all the unscreened crud screened.kernel.bugs - all bugs which have been screened fs.kernel.bugs - screened bugs in the "fs" category ext2.kernel.bugs - screened bugs in the "ext2" category eepro.kernel.bugs - screened bugs in the "eepro" category etc. Furthermore, the bugs should be structured once they are screened, i.e., they have a set of fields like (this is a strawman): Synopsis - one line man-page like summary of the bug Severity - how critical is this bug? 
Priority - how soon does it need to be fixed? Category - subsystem in which the bug occurs Description - details on the bug, oops, stack trace, etc. Hardware - hardware info Software - kernel version, glibc version, etc. Suggested fix - any suggestion on how to fix it Interest list - set of email addresses and/or newsgroups for updates It ought to work that if someone posts a followup to the bug then if the followup changes any of the fields that gets propagated to the underlying bug database. If this is done properly the news reader will be the only interface that most people use. Past experiences This is a catch all for sound bites that we don't want to forget... - Sorting bugs by hand is a pain in the ass (Ted burned out on it and Alan refuses to say that it is the joy of his life to do it) - bug systems tend to "get in the way". Unless they are really trivial to submit, search, update then people get tired of using them and go back to the old way - one key observation: let bugs "expire" much like news expires. If nobody has been whining enough that it gets into the high signal bug db then it probably isn't real. We really want a way where no activity means let it expire. - Alan pointed out that having all of the bugs someplace is useful, you can search through the 200 similar bugs and notice that SMP is the common feature. Requirements This section is mostly empty, it's here as a catch all for people's bullet items. - it would be very nice to be able to cross reference bugs to bug fixes in the source management system, as well as the other way around.
- mail based interface -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm From greearb@candelatech.com Fri Jun 27 16:00:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:00:53 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RN0h2x018974 for ; Fri, 27 Jun 2003 16:00:44 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5RN0bxA025294 for ; Fri, 27 Jun 2003 16:00:37 -0700 Message-ID: <3EFCCC95.3000709@candelatech.com> Date: Fri, 27 Jun 2003 16:00:37 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "'netdev@oss.sgi.com'" Subject: routing bug report for 2.4 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3589 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev (This has been discussed with Alexey, but sending to the list for general consumption). Here is how to reproduce this: ifconfig eth1 192.1.1.2 netmask 255.255.255.0 ifconfig eth2 192.1.2.2 netmask 255.255.255.0 Set up policy based routing with the 'ip' tool to make packets with source-address of each interface to use the gateway for that interface. Set gateway for eth1 to be 192.1.1.1 Set gateway for eth2 to be 192.1.2.1 Now, use ping to try to send pkts from one interface to the other: ping -I 192.1.1.2 192.1.2.2 You will see arps on eth1 for 192.1.2.2, whereas you should see packets being sent to the default gateway for eth1. If you modify the ping source to BINDTODEVICE eth1, then it will send correctly. 
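The report does not list the exact 'ip' commands used. A minimal sketch, in Python, of what the policy routing setup described above might look like: one rule per source address selecting a routing table, and one default route per table via that interface's gateway. The helper name policy_routing_commands and the table numbers (100, 101) are assumptions for illustration; the commands are only generated as strings, not executed.

```python
def policy_routing_commands(interfaces, first_table=100):
    """Build iproute2 commands for source-address based routing.

    `interfaces` is a list of (device, address, gateway) tuples.
    Table numbers start at `first_table`; the value is arbitrary.
    """
    cmds = []
    for table, (dev, addr, gateway) in enumerate(interfaces, start=first_table):
        # Packets whose source is this interface's address use their own table.
        cmds.append(f"ip rule add from {addr} table {table}")
        # That table holds a single default route via the interface's gateway.
        cmds.append(f"ip route add default via {gateway} dev {dev} table {table}")
    return cmds


if __name__ == "__main__":
    setup = [
        ("eth1", "192.1.1.2", "192.1.1.1"),
        ("eth2", "192.1.2.2", "192.1.2.1"),
    ]
    for cmd in policy_routing_commands(setup):
        print(cmd)
```

Running the commands it prints needs root, and whether this matches the reporter's actual configuration is not stated in the message.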
I am under the impression that you should not have to specifically BINDTODEVICE in this case since the policy based routing should take care of routing things correctly. Or, maybe, the real bug is in ping in that it did not BINDTODEVICE? Also, ping -I eth1 192.1.2.2 will fail to route externally. That may just be a feature of ping: I'm unsure what the subtle difference is *supposed* to be between using -I eth1 and -I 1.2.3.4 Thanks, Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From alan@lxorguk.ukuu.org.uk Fri Jun 27 16:07:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:07:44 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RN7Z2x019359 for ; Fri, 27 Jun 2003 16:07:36 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5RN4XKd005553; Sat, 28 Jun 2003 00:04:33 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5RN4VFN005551; Sat, 28 Jun 2003 00:04:31 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S. 
Miller" Cc: greearb@candelatech.com, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030627.144426.71096593.davem@redhat.com> References: <18330000.1056692768@[10.10.2.4]> <20030626.224739.88478624.davem@redhat.com> <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jun 2003 00:04:30 +0100 X-archive-position: 3590 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Gwe, 2003-06-27 at 22:44, David S. Miller wrote: > This works for kernel patches, and has so for over 5 years. > So what makes anyone think it doesn't work for bug reporting? It works badly for kernel patches, stuff does get lost forever, missed etc. Having it all archived somewhere is really valuable because it means you can spot patterns, trends and also when someone who isn't a hacker hits the bug *they* can find the patch you missed and send it on or remind you. You are assuming there is a relationship in bug severity/commonness and number of *developers* who hit it.
That isn't true; developer and end user hardware patterns are radically different in some areas. From alan@lxorguk.ukuu.org.uk Fri Jun 27 16:12:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:12:07 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RNC22x019712 for ; Fri, 27 Jun 2003 16:12:02 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5RN8xKd005569; Sat, 28 Jun 2003 00:09:00 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5RN8vkG005567; Sat, 28 Jun 2003 00:08:57 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S. Miller" Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030627.151906.102571486.davem@redhat.com> References: <3EFCBD12.3070101@candelatech.com> <20030627.145456.115915594.davem@redhat.com> <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jun 2003 00:08:56 +0100 X-archive-position: 3591 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Gwe, 2003-06-27 at 23:19, David S.
Miller wrote: > Forcing people to continue to retransmit the same report just pisses > people off, and in the end will get you less useful reports than if > you had flagged the report as 'please-gimme-more-info'. > > And this is different from patch submission in what way? Tried doing an SQL query or text analysis for similarities on random messages lurking in private mailboxes or mixed up in list archives. It's really hard. Now try doing that with bugzilla and it's really easy. Nobody is saying "Dave shall use bugzilla", maybe you can find an underling to care, maybe the only time you want to use it is to say "that's really freaky, who else is seeing it and what hardware". From Red Hat bugzilla I've done statistical analysis of IDE failure patterns, I've also dug up year old mislaid patches that would have been lost forever otherwise because the one person who fixed it was missed in the noise, even though lots of the noise was people hitting that same bug. From mbligh@aracnet.com Fri Jun 27 16:13:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:13:13 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RND72x019999 for ; Fri, 27 Jun 2003 16:13:07 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5RNCega024469; Fri, 27 Jun 2003 16:12:41 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5RNCmfg022323; Fri, 27 Jun 2003 16:12:52 -0700 Date: Fri, 27 Jun 2003 16:12:48 -0700 From: "Martin J. Bligh" To: "David S.
Miller" , greearb@candelatech.com cc: davidel@xmailserver.org, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <29290000.1056755566@[10.10.2.4]> In-Reply-To: <20030627.151906.102571486.davem@redhat.com> References: <3EFCBD12.3070101@candelatech.com><20030627.145456.115915594.davem@redhat.com><3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3592 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --"David S. Miller" wrote (on Friday, June 27, 2003 15:19:06 -0700): > From: Ben Greear > Date: Fri, 27 Jun 2003 15:15:07 -0700 > > Forcing people to continue to retransmit the same report just pisses > people off, and in the end will get you less useful reports than if > you had flagged the report as 'please-gimme-more-info'. > > And this is different from patch submission in what way? > > Perhaps, but it's also possible that you are being a stubborn SOB > because you fear change :) > > Absolutely not, in fact I'm daily looking for ways to change how > I work with people who help me so that I scale better. And I know > for sure that a bug database with shit that accumulates in it > that _REQUIRES_ me to do something about it to make it go away > does not help me scale. > > Bugme was an absolute burden for me. > > For something to scale, it must continue to operate just as > efficiently if I were to go away for a few weeks. The lists have that > quality, the bug database with owner does not. OK, both the scaling thing and the vacation issue are valid complaints, but they're both easy to fix.
I realise you're busy, but we can easily just set this up to go to a list of people instead, either including or not including yourself, as you wish. If you don't wish to see bugs after the first time, and just "let them fall through the cracks", then feel free to do so. > I know that's a hard pill to swallow, but over years of work I can > tell you this is the only scalable mechanism. Nobody likes this > because it's not tracked somewhere and they can't show some pretty > list of bugs to their management at the end of each week, TOO FUCKING > BAD. Pay someone to work on your bugs if you want a pretty list and > people being REQUIRED to look at and fix bugs. None of this crap is > my problem. I am not some corporate process weenie - I hate process, and overhead; ask anyone at IBM ;-) The whole thing is structured to be pretty lightweight, and easy to use, and ditches a whole bunch of stuff that's just aimed at tracking and management stats, because I really don't think that's something the Linux community gives a shit about, and neither do I. It's there to help ensure bugs get fixed, and to help both developers and users of Linux - that's really the only goal. Yes, there's a little more work for developers to do up front, but I think it pays off long term (one trivial example is that users will learn to look for a problem already there before sending a duplicate report). If you don't agree, it's easy for you to opt out of being the default owner for bugs, (we did that this morning as soon as you asked) but others might still find it useful. M. 
From shemminger@osdl.org Fri Jun 27 16:28:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:28:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RNST2x021723 for ; Fri, 27 Jun 2003 16:28:30 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5RNSIq15649; Fri, 27 Jun 2003 16:28:18 -0700 Date: Fri, 27 Jun 2003 16:28:18 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] Fix PPP async regression Message-Id: <20030627162818.606ac706.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5RNST2x021723 X-archive-position: 3593 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Please apply this patch, it fixes the PPP over async regression that the PPPoE changes caused. Basically, PPP puts a zero length skbuff in the receive queue as an error token, and the last change caused that to get flushed as bad data. Thanks to Diego Calleja García , Matthew Harrell for validating this. 
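The receive-side rule described above can be sketched in user space. This is only an illustration of the length check, not the kernel code: struct frame, struct rx_stats, the error_indications counter and classify_frame() are hypothetical names, and the real path works on struct sk_buff and calls ppp_receive_error().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: only the length matters here. */
struct frame {
    size_t len;
};

/* Hypothetical counters; the kernel keeps these in the ppp stats. */
struct rx_stats {
    unsigned long delivered;          /* frames handed to protocol handling */
    unsigned long rx_length_errors;   /* frames too short for a PPP header */
    unsigned long error_indications;  /* times the error path was taken */
};

enum rx_verdict { RX_DELIVERED, RX_ERROR_TOKEN, RX_TOO_SHORT };

/*
 * Classify one queued frame:
 *   len >= 2  normal frame, it carries the 2-byte protocol field
 *   len == 0  error token queued by the channel, not bad data
 *   len == 1  truncated frame, counted as a length error
 * Both of the short cases end in the error indication.
 */
enum rx_verdict classify_frame(const struct frame *f, struct rx_stats *st)
{
    assert(f != NULL && st != NULL);

    if (f->len >= 2) {
        st->delivered++;
        return RX_DELIVERED;
    }
    if (f->len > 0)
        st->rx_length_errors++;
    st->error_indications++;
    return f->len == 0 ? RX_ERROR_TOKEN : RX_TOO_SHORT;
}
```

The point of the sketch is that a zero-length frame reaches the error path without being counted as a length error, which is the distinction the regression lost.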
diff -Nru a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c --- a/drivers/net/ppp_generic.c Fri Jun 27 16:13:28 2003 +++ b/drivers/net/ppp_generic.c Fri Jun 27 16:13:28 2003 @@ -1348,16 +1348,9 @@ struct channel *pch = chan->ppp; int proto; - if (pch == 0) - goto drop; - - /* need to have PPP header */ - if (!pskb_may_pull(skb, 2)) { - if (pch->ppp) { - ++pch->ppp->stats.rx_length_errors; - ppp_receive_error(pch->ppp); - } - goto drop; + if (pch == 0 || skb->len == 0) { + kfree_skb(skb); + return; } proto = PPP_PROTO(skb); @@ -1374,10 +1367,6 @@ ppp_do_recv(pch->ppp, skb, pch); } read_unlock_bh(&pch->upl); - return; - drop: - kfree_skb(skb); - return; } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1409,13 +1398,23 @@ static void ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch) { + if (skb->len >= 2) { #ifdef CONFIG_PPP_MULTILINK - /* XXX do channel-level decompression here */ - if (PPP_PROTO(skb) == PPP_MP) - ppp_receive_mp_frame(ppp, skb, pch); - else + /* XXX do channel-level decompression here */ + if (PPP_PROTO(skb) == PPP_MP) + ppp_receive_mp_frame(ppp, skb, pch); + else #endif /* CONFIG_PPP_MULTILINK */ - ppp_receive_nonmp_frame(ppp, skb); + ppp_receive_nonmp_frame(ppp, skb); + return; + } + + if (skb->len > 0) + /* note: a 0-length skb is used as an error indication */ + ++ppp->stats.rx_length_errors; + + kfree_skb(skb); + ppp_receive_error(ppp); } static void @@ -1448,7 +1447,7 @@ if (ppp->vj == 0 || (ppp->flags & SC_REJ_COMP_TCP)) goto err; - if (skb_tailroom(skb) < 124 || skb_is_nonlinear(skb) ) { + if (skb_tailroom(skb) < 124) { /* copy to a new sk_buff with more tailroom */ ns = dev_alloc_skb(skb->len + 128); if (ns == 0) { @@ -1460,6 +1459,9 @@ kfree_skb(skb); skb = ns; } + else if (!pskb_may_pull(skb, skb->len)) + goto err; + len = slhc_uncompress(ppp->vj, skb->data + 2, skb->len - 2); if (len <= 0) { printk(KERN_DEBUG "PPP: VJ decompression error\n"); @@ -2033,12 +2035,12 @@ static void 
ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound) { - unsigned char *dp = skb->data + 2; + unsigned char *dp; int len; - if (!pskb_may_pull(skb, CCP_HDRLEN + 2) - || skb->len < (len = CCP_LENGTH(dp)) + 2) - return; /* too short */ + if (!pskb_may_pull(skb, CCP_HDRLEN + 2)) + return; /* no header */ + dp = skb->data + 2; switch (CCP_CODE(dp)) { case CCP_CONFREQ: @@ -2071,10 +2073,8 @@ case CCP_CONFACK: if ((ppp->flags & (SC_CCP_OPEN | SC_CCP_UP)) != SC_CCP_OPEN) break; - - if (!pskb_may_pull(skb, len)) - break; - + if (!pskb_may_pull(skb, (len = CCP_LENGTH(dp)) + 2)) + return; /* too short */ dp += CCP_HDRLEN; len -= CCP_HDRLEN; if (len < CCP_OPT_MINLEN || len < CCP_OPT_LENGTH(dp)) From shemminger@osdl.org Fri Jun 27 16:35:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:35:42 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RNZb2x022062 for ; Fri, 27 Jun 2003 16:35:37 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h5RNZOq18025; Fri, 27 Jun 2003 16:35:24 -0700 Date: Fri, 27 Jun 2003 16:35:24 -0700 From: Stephen Hemminger To: "David S.
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] PPP handling fragmented skbuff's Message-Id: <20030627163524.347b2c8e.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3594 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev I don't think this ever happens today, but if PPP ever gets a fragmented skbuff and decides to copy it, then bad things will happen. The following replaces the memcpy() calls in those places with skb_copy_bits(). Please review carefully before applying; it builds and runs, but I can't really force these code paths to occur with normal systems and devices.
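Why a plain memcpy() is not enough can be shown with a small user-space model. In a fragmented buffer the total length covers the linear head plus the fragments, so copying that many bytes from the head pointer alone reads past the head. The sketch below is an assumption-laden illustration: struct fragbuf, struct frag, MAX_FRAGS and fragbuf_copy_bits() are hypothetical and much simpler than struct sk_buff and skb_copy_bits().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

/* Hypothetical fragment: a pointer and a length. */
struct frag {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical fragmented buffer: linear head followed by fragments. */
struct fragbuf {
    const unsigned char *head;    /* linear part, like skb->data */
    size_t head_len;
    struct frag frags[MAX_FRAGS];
    size_t nr_frags;
    size_t len;                   /* head_len plus all fragment lengths */
};

/*
 * Copy len bytes starting at offset into dst, walking the head and then
 * each fragment. Returns 0 on success, -1 if the range is out of bounds.
 */
int fragbuf_copy_bits(const struct fragbuf *b, size_t offset,
                      void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t i, n;

    assert(b != NULL && dst != NULL);
    if (offset > b->len || len > b->len - offset)
        return -1;

    if (offset < b->head_len) {
        /* The range starts inside the linear head. */
        n = b->head_len - offset;
        if (n > len)
            n = len;
        memcpy(out, b->head + offset, n);
        out += n;
        len -= n;
        offset = 0;
    } else {
        /* Make the offset relative to the first fragment. */
        offset -= b->head_len;
    }

    for (i = 0; i < b->nr_frags && len > 0; i++) {
        const struct frag *f = &b->frags[i];

        if (offset >= f->len) {
            offset -= f->len;     /* range starts after this fragment */
            continue;
        }
        n = f->len - offset;
        if (n > len)
            n = len;
        memcpy(out, f->data + offset, n);
        out += n;
        len -= n;
        offset = 0;
    }
    return len == 0 ? 0 : -1;
}
```

With a 2-byte head and two fragments, copying the full length through this helper yields all the bytes in order, where a single memcpy() from the head would have run off the end of the 2-byte head.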
diff -Nru a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c --- a/drivers/net/ppp_generic.c Fri Jun 27 16:13:38 2003 +++ b/drivers/net/ppp_generic.c Fri Jun 27 16:13:38 2003 @@ -844,7 +844,7 @@ if (ns == 0) goto outf; skb_reserve(ns, dev->hard_header_len); - memcpy(skb_put(ns, skb->len), skb->data, skb->len); + skb_copy_bits(skb, 0, skb_put(ns, skb->len), skb->len); kfree_skb(skb); skb = ns; } @@ -1455,7 +1455,7 @@ goto err; } skb_reserve(ns, 2); - memcpy(skb_put(ns, skb->len), skb->data, skb->len); + skb_copy_bits(skb, 0, skb_put(ns, skb->len), skb->len); kfree_skb(skb); skb = ns; } @@ -1826,7 +1826,7 @@ if (head != tail) /* copy to a single skb */ for (p = head; p != tail->next; p = p->next) - memcpy(skb_put(skb, p->len), p->data, p->len); + skb_copy_bits(p, 0, skb_put(skb, p->len), p->len); ppp->nextseq = tail->sequence + 1; head = tail->next; } From bmc@phunnypharm.org Fri Jun 27 16:42:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 16:42:49 -0700 (PDT) Received: from bristol.phunnypharm.org (bristol.phunnypharm.org [65.207.35.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5RNgj2x022395 for ; Fri, 27 Jun 2003 16:42:45 -0700 Received: from hopper.phunnypharm.org ([65.207.35.143] helo=localhost ident=mail) by bristol.phunnypharm.org with esmtp (Exim 4.20) id 19W1oU-0007qC-Ng; Fri, 27 Jun 2003 18:35:26 -0400 Received: from bmc by localhost with local (Exim 3.36 #1 (Debian)) id 19W1jc-0004u2-00; Fri, 27 Jun 2003 18:30:24 -0400 Date: Fri, 27 Jun 2003 18:30:24 -0400 From: Ben Collins To: Andrew Morton Cc: davidel@xmailserver.org, davem@redhat.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030627223024.GT501@phunnypharm.org> References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> <20030627213153.GR501@phunnypharm.org> 
<20030627162527.714091ce.akpm@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030627162527.714091ce.akpm@digeo.com> User-Agent: Mutt/1.5.4i X-archive-position: 3595 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bcollins@debian.org Precedence: bulk X-list: netdev > - The bugs which are affecting people the most get reported the most. Not to mention the "breeding" affect. A bug that many people have seen only once, but can never pinpoint because they can't reproduce it. One of those people reports the problem to the mailing list, and suddenly half a dozen respond with "me too, but here's some extra info that I saw". You can't get that with a bug database. -- Debian - http://www.debian.org/ Linux 1394 - http://www.linux1394.org/ Subversion - http://subversion.tigris.org/ Deqo - http://www.deqo.com/ From davem@redhat.com Fri Jun 27 17:06:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:06:48 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S06g2x023112 for ; Fri, 27 Jun 2003 17:06:43 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08326; Fri, 27 Jun 2003 17:00:23 -0700 Date: Fri, 27 Jun 2003 17:00:22 -0700 (PDT) Message-Id: <20030627.170022.74744550.davem@redhat.com> To: greearb@candelatech.com Cc: davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <3EFCC6EE.3020106@candelatech.com> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <3EFCC6EE.3020106@candelatech.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3596 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ben Greear Date: Fri, 27 Jun 2003 15:36:30 -0700 So, you'd be happy so long as bugz sent mail to the netdev mailing lists instead of to you? The best power I have to scale is the delete key in my email reader, when I delete an email it's gone and that's it. bugme bugs don't have this attribute, they are like emails that persist forever until someone does something about them, and this is the big problem I have with it. From davem@redhat.com Fri Jun 27 17:15:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:15:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0FN2x023503 for ; Fri, 27 Jun 2003 17:15:23 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08352; Fri, 27 Jun 2003 17:09:07 -0700 Date: Fri, 27 Jun 2003 17:09:07 -0700 (PDT) Message-Id: <20030627.170907.71096768.davem@redhat.com> To: mbligh@aracnet.com Cc: greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1230000.1056754041@[10.10.2.4]> References: <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> <1230000.1056754041@[10.10.2.4]> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3597 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Martin J. Bligh" Date: Fri, 27 Jun 2003 15:47:22 -0700 --"David S. Miller" wrote (on Friday, June 27, 2003 14:44:26 -0700): > People DON'T understand. I _WANT_ them to be able to > fall through the cracks. I fail to see your point here. If that's what you want, then just don't look at the bugme data. bugme bugs persist; when I delete an email it doesn't get deleted from the bugme database (at least when I go and view it). Let me draw a diagram for you. Say we have 3 contributors A, B and C. They watch the mailing lists, analyze bugs, and work on new features. They work on what they want to, by the very nature of open-source development. When a bug hits a mailing list the following might happen: A is overloaded, he deletes the email. B has a look, realizes he is not competent in this area and deletes the email. C analyzes and fixes the bug. I want A and B never to have to deal with this bug report again. There is zero point in having the capability to "delete" the email if it persists in some database somewhere; it's not deleted, it's still in the backlog. If nobody need fear that their report gets deleted because the developers are overloaded, nobody need do anything but be lazy. And that system does not work; the contribution must be mutual for this system to work. This means that when developers are overloaded they can delete your report and you'll resend it later.
I don't understand why people have no problem understanding that this system works when it is in the context of lossy networking protocols (IPv4) and the things that sit on top to ensure reliable data delivery via retransmit (TCP), but when this idea is proposed for things involving people and software development they fall to fear and doubt. From lm@bitmover.com Fri Jun 27 17:20:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:20:20 -0700 (PDT) Received: from smtp.bitmover.com (smtp.bitmover.com [192.132.92.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0KE2x023824 for ; Fri, 27 Jun 2003 17:20:14 -0700 Received: from work.bitmover.com (ipcop.bitmover.com [192.132.92.15]) by smtp.bitmover.com (8.12.9/8.12.9) with ESMTP id h5S8LKm7010360; Sat, 28 Jun 2003 01:21:20 -0700 Received: (from lm@localhost) by work.bitmover.com (8.11.6/8.11.6) id h5S0Jsv22269; Fri, 27 Jun 2003 17:19:54 -0700 Date: Fri, 27 Jun 2003 17:19:54 -0700 From: Larry McVoy To: "David S. Miller" Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030628001954.GD18676@work.bitmover.com> Mail-Followup-To: Larry McVoy , "David S.
Miller" , greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030627.170022.74744550.davem@redhat.com> User-Agent: Mutt/1.4i X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-MailScanner-SpamCheck: not spam (whitelisted), SpamAssassin (score=0.5, required 7, AWL, DATE_IN_PAST_06_12) X-archive-position: 3598 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lm@bitmover.com Precedence: bulk X-list: netdev On Fri, Jun 27, 2003 at 05:00:22PM -0700, David S. Miller wrote: > From: Ben Greear > Date: Fri, 27 Jun 2003 15:36:30 -0700 > > So, you'd be happy so long as bugz sent mail to the netdev mailing > lists instead of to you? > > The best power I have to scale is the delete key in my email > reader, when I delete an email it's gone and that's it. > > bugme bugs don't have this attribute, they are like emails that > persist forever until someone does something about them, and this is > the big problem I have with it. I've proposed this before and nobody listened but maybe this time... I think what you want is a bug database which distinguishes between filed bugs and reviewed bugs. You want to capture all bug reports, as Alan says (he's right, there is no question about it, you need to capture the data). You also want an *automatic* way for bugs to just rot. Anyone can file a bug but unless someone with expertise in the area reviews the bug and agrees to do something about it, the bug rots. It's level 1 (capture) and level 2 (we really need to do something about this some day). 
Level 1 will have zillions of duplicates and tons of other noise. Level 2 should be a small list, no duplicates, carefully managed. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm From davem@redhat.com Fri Jun 27 17:25:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:25:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0Pr2x024150 for ; Fri, 27 Jun 2003 17:25:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08403; Fri, 27 Jun 2003 17:19:33 -0700 Date: Fri, 27 Jun 2003 17:19:33 -0700 (PDT) Message-Id: <20030627.171933.104040753.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> References: <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3599 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 28 Jun 2003 00:04:30 +0100 it means you can spot patterns, trends I already spot patterns and trends when people retransmit the bug/patch/whatever. As do other people. Frankly, people who aren't willing to maintain their patches and retransmit them to me, do not matter as far as I am concerned. If you don't want to put forth the effort, I do not want to interact with you. 
I feel the same way about bugs. Linus has been saying this and doing it for years, and I've had to learn the hard way that he's absolutely right in this regard. If you try to track everything, you accomplish nothing. You will, however, get overloaded and frustrated. To scale one must reserve the right to hit the delete key and it's _GONE_, not accumulating somewhere else. We need social engineering. If someone never gets their bug looked at because they post absolute crap bug reports, that's a feature. If people spend all this effort making sense of such reports and fix them _ANYWAYS_ the reporter will never learn to produce high quality bug reports that are more useful to us. That means the scarcest resource we have is being used inefficiently. The same goes for patches, and I've watched over time how this works. This is another reason that I hate when people privately email me stuff, because I _WILL_ delete it and I _WILL_ lose it. If you post it to the lists, it gets accumulated somewhere but it doesn't clog my mailbox and it doesn't create a backlog for me. It also means that if I'm sipping Mai Tais in Hawaii other people will see and can react to the report. From davem@redhat.com Fri Jun 27 17:27:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:27:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0Rg2x024457 for ; Fri, 27 Jun 2003 17:27:42 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08415; Fri, 27 Jun 2003 17:21:24 -0700 Date: Fri, 27 Jun 2003 17:21:23 -0700 (PDT) Message-Id: <20030627.172123.78713883.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S.
Miller" In-Reply-To: <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3600 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 28 Jun 2003 00:08:56 +0100 Tried doing an SQL query or text analysis for similarities on random messages lurking in private mailboxes I respond to private reports with "please send this to the lists, what if I were on vacation for the next month?" I never actually process or analyze such reports. From lm@bitmover.com Fri Jun 27 17:32:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:32:37 -0700 (PDT) Received: from smtp.bitmover.com (smtp.bitmover.com [192.132.92.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0WV2x024779 for ; Fri, 27 Jun 2003 17:32:32 -0700 Received: from work.bitmover.com (ipcop.bitmover.com [192.132.92.15]) by smtp.bitmover.com (8.12.9/8.12.9) with ESMTP id h5S8Xim7010524; Sat, 28 Jun 2003 01:33:44 -0700 Received: (from lm@localhost) by work.bitmover.com (8.11.6/8.11.6) id h5S0WIu22508; Fri, 27 Jun 2003 17:32:18 -0700 Date: Fri, 27 Jun 2003 17:32:18 -0700 From: Larry McVoy To: Ben Collins Cc: Andrew Morton , davidel@xmailserver.org, davem@redhat.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030628003218.GE18676@work.bitmover.com> Mail-Followup-To: Larry McVoy , Ben Collins , Andrew Morton , davidel@xmailserver.org, davem@redhat.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, 
linux-net@vger.kernel.org, netdev@oss.sgi.com References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> <20030627213153.GR501@phunnypharm.org> <20030627162527.714091ce.akpm@digeo.com> <20030627223024.GT501@phunnypharm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030627223024.GT501@phunnypharm.org> User-Agent: Mutt/1.4i X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-MailScanner-SpamCheck: not spam (whitelisted), SpamAssassin (score=0.5, required 7, AWL, DATE_IN_PAST_06_12) X-archive-position: 3601 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lm@bitmover.com Precedence: bulk X-list: netdev On Fri, Jun 27, 2003 at 06:30:24PM -0400, Ben Collins wrote: > > - The bugs which are affecting people the most get reported the most. > > Not to mention the "breeding" affect. A bug that many people have seen > only once, but can never pinpoint because they can't reproduce it. One > of those people reports the problem to the mailing list, and suddenly > half a dozen respond with "me too, but here's some extra info that I > saw". You can't get that with a bug database. I can't believe that I'm dumb enough to ask this given the BK experience. We've built BugDB technology and we're quite interested in trying to make a system that works for engineers as well as managers. All that DB crud is great for managers who want metrics but engineers want an easy way to deal with the bugs. For example, an email interface. Our bugdb already has that, the emails include a URL so you can go look at that and do stuff to it but you can also reply to the email and do everything through the email interface. An NNTP interface is in the works. 
Is there any interest in having us mirror the bugzilla DB and work on making an interface that works for people with different needs? I had already assumed that I'd get hissed out of the room if I proposed this so feel free to say no if that's what you want. On the other hand, this one is maybe easier to swallow than BK because the interfaces are standard protocols (SMTP, HTTP, NNTP and maybe IMAP or POP some day) so you don't have to put your fingers on any evil BitMover software to get at it. If you do want us to look at this then I'd suggest that you elect someone to come up with a proposal that the community finds acceptable, i.e., if you use it then we have to do some stuff like - free access for everyone - data exported in CSV form so other people can get at it - ??? If you say you want it then we have to figure out some way that the community is happy up front. I'd suggest that Alan define the relationship, he has credibility, he doesn't like BK, he's smart enough to not get talked into something unreasonable, etc. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm From mbligh@aracnet.com Fri Jun 27 17:35:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:35:31 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0ZQ2x025101 for ; Fri, 27 Jun 2003 17:35:26 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5S0Ywgd020477; Fri, 27 Jun 2003 17:35:01 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5S0R9fg022551; Fri, 27 Jun 2003 17:27:13 -0700 Date: Fri, 27 Jun 2003 17:27:09 -0700 From: "Martin J. Bligh" To: Larry McVoy , "David S. 
Miller" cc: greearb@candelatech.com, davidel@xmailserver.org, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <34700000.1056760028@[10.10.2.4]> In-Reply-To: <20030628001954.GD18676@work.bitmover.com> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> <20030628001954.GD18676@work.bitmover.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3602 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --Larry McVoy wrote (on Friday, June 27, 2003 17:19:54 -0700): > On Fri, Jun 27, 2003 at 05:00:22PM -0700, David S. Miller wrote: >> From: Ben Greear >> Date: Fri, 27 Jun 2003 15:36:30 -0700 >> >> So, you'd be happy so long as bugz sent mail to the netdev mailing >> lists instead of to you? >> >> The best power I have to scale is the delete key in my email >> reader, when I delete an email it's gone and that's it. >> >> bugme bugs don't have this attribute, they are like emails that >> persist forever until someone does something about them, and this is >> the big problem I have with it. > > I've proposed this before and nobody listened but maybe this time... > > I think what you want is a bug database which distinguishes between > filed bugs and reviewed bugs. You want to capture all bug reports, > as Alan says (he's right, there is no question about it, you need to > capture the data). You also want an *automatic* way for bugs to just > rot. Anyone can file a bug but unless someone with expertise in the > area reviews the bug and agrees to do something about it, the bug rots. 
> > It's level 1 (capture) and level 2 (we really need to do something about > this some day). Level 1 will have zillions of duplicates and tons of > other noise. Level 2 should be a small list, no duplicates, carefully > managed. That's a trivial change to make if you want it. we just add a "reviewed" / "certified" state between "new" and "assigned". Yes, might be a good idea. I'm not actually that convinced that "assigned" is overly useful in the context of open-source, but that's a separate discussion. I'm hoping to get a discussion going at Kernel Summit / OLS on how people want this to evolve, I'll add this one to the list ... thanks. M. From mbligh@aracnet.com Fri Jun 27 17:35:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:35:34 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0ZT2x025106 for ; Fri, 27 Jun 2003 17:35:29 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5S0Ywgh020477; Fri, 27 Jun 2003 17:35:04 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5S0Fofg022522; Fri, 27 Jun 2003 17:15:55 -0700 Date: Fri, 27 Jun 2003 17:15:50 -0700 From: "Martin J. Bligh" To: "David S. 
Miller" , greearb@candelatech.com cc: davidel@xmailserver.org, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <34180000.1056759349@[10.10.2.4]> In-Reply-To: <20030627.170022.74744550.davem@redhat.com> References: <3EFCC1EB.2070904@candelatech.com><20030627.151906.102571486.davem@redhat.com><3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3603 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --"David S. Miller" wrote (on Friday, June 27, 2003 17:00:22 -0700): > From: Ben Greear > Date: Fri, 27 Jun 2003 15:36:30 -0700 > > So, you'd be happy so long as bugz sent mail to the netdev mailing > lists instead of to you? > > The best power I have to scale is the delete key in my email > reader, when I delete an email it's gone and that's it. > > bugme bugs don't have this attribute, they are like emails that > persist forever until someone does something about them, and this is > the big problem I have with it. Right ... but if bugs were sent to netdev or whatever, you'd get something similar to what you have today, as long as *you* don't go looking in bugme (which you've made it clear you won't). Other people seem to find this useful, and they can still go use it if they like. So presumably that'd work out OK? M. 
From davem@redhat.com Fri Jun 27 17:39:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:39:12 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0d82x025723 for ; Fri, 27 Jun 2003 17:39:08 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08501; Fri, 27 Jun 2003 17:32:51 -0700 Date: Fri, 27 Jun 2003 17:32:50 -0700 (PDT) Message-Id: <20030627.173250.85405799.davem@redhat.com> To: mbligh@aracnet.com Cc: greearb@candelatech.com, davidel@xmailserver.org, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <34180000.1056759349@[10.10.2.4]> References: <3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> <34180000.1056759349@[10.10.2.4]> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3604 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Martin J. Bligh" Date: Fri, 27 Jun 2003 17:15:50 -0700 So presumably that'd work out OK? Yes, people can go live in their own bugme world if they want to, I can't force people not to use it. But a bug database that the actual maintainers refuse to use seems quite pointless to me. 
From mbligh@aracnet.com Fri Jun 27 17:39:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:39:15 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0d92x025728 for ; Fri, 27 Jun 2003 17:39:10 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5S0cdga024951; Fri, 27 Jun 2003 17:38:40 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5S0cjfg022579; Fri, 27 Jun 2003 17:38:49 -0700 Date: Fri, 27 Jun 2003 17:38:45 -0700 From: "Martin J. Bligh" To: Andrew Morton , Ben Collins cc: davidel@xmailserver.org, davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <35240000.1056760723@[10.10.2.4]> In-Reply-To: <20030627162527.714091ce.akpm@digeo.com> References: <20030626.224739.88478624.davem@redhat.com><21740000.1056724453@[10.10.2.4]><20030627.143738.41641928.davem@redhat.com><20030627213153.GR501@phunnypharm.org> <20030627162527.714091ce.akpm@digeo.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3605 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev > I also. The bug database tries to convert the traditional many<->many > debugging process into a one<->one process. This surely results in a > lower cleanup rate. I think your suggestion of sending new bugs out to LKML has made a big dent in the one<->one problem already. 
Replacing all the default owner fields with mailing lists (either existing ones or new ones) instead of individuals would be another step in that direction, though there may be a few hurdles to deal with on the way to that. Yes, we probably also need an "email back in" interface as we've discussed before to take it up to many-many. > It is nice to have a record. But bugzilla is not a comfortable or > productive environment within which to drill down into and fix problems. OK ... But I'd rather try to fix it than to throw the baby out with the bath water. I don't believe it's "unfixable" - the concept of tracking bugs / problems and making sure they're closed out still seems sound to me. For instance, I've already seen several cases where I've pestered people about bugs that already had patches attached to them, which resulted in "oh, yeah, I forgot to actually submit that", and it's got fixes back into mainline. I find it somewhat hard to believe that just about every other big project (including open source ones) uses some form of bug tracking system, and yet Linux is somehow magically different ;-) M. From davem@redhat.com Fri Jun 27 17:50:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 17:51:03 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S0ov2x026385 for ; Fri, 27 Jun 2003 17:50:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id RAA08545; Fri, 27 Jun 2003 17:44:41 -0700 Date: Fri, 27 Jun 2003 17:44:40 -0700 (PDT) Message-Id: <20030627.174440.59660428.davem@redhat.com> To: lm@bitmover.com Cc: mbligh@aracnet.com, greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. 
Miller" In-Reply-To: <20030627225305.GA13785@work.bitmover.com> References: <20030627.144426.71096593.davem@redhat.com> <1230000.1056754041@[10.10.2.4]> <20030627225305.GA13785@work.bitmover.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3606 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Larry McVoy Date: Fri, 27 Jun 2003 15:53:05 -0700 - one key observation: let bugs "expire" much like news expires. If nobody has been whining enough that it gets into the high signal bug db then it probably isn't real. We really want a way where no activity means let it expire. I want more than time based expiry, I want expiry for me that is controlled by me. When I delete the notification email in my mailbox, I never want to see that bug again unless I want to. This effectively degrades into list posting based bug reports and my current email inbox, which is what I'm advocating to use :-) When I see the "me too, heres some more info" response to the list posting, then I'm interested and I'll reread the list thread to digest all the information to see what I can make of it. When this happens bugs basically fix themselves, and this occurs only because of the acts taken on by the reporters of the bug not me. 
From jgarzik@pobox.com Fri Jun 27 18:08:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 18:08:49 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S18d2x026816 for ; Fri, 27 Jun 2003 18:08:40 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19W4Ci-0005yF-Vw; Sat, 28 Jun 2003 02:08:37 +0100 Message-ID: <3EFCEA8A.5010806@pobox.com> Date: Fri, 27 Jun 2003 21:08:26 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: David Dillow CC: Netdev Subject: Re: [BK] Typhoon net driver fixes for 2.5 References: <1056689571.8679.42.camel@ori.thedillows.org> In-Reply-To: <1056689571.8679.42.camel@ori.thedillows.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3607 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev pulled for 2.4 and 2.5 From mbligh@aracnet.com Fri Jun 27 19:13:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 19:13:57 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S2Dl2x027660 for ; Fri, 27 Jun 2003 19:13:48 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5S2DJga021600; Fri, 27 Jun 2003 19:13:19 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5S2DOfg022851; Fri, 27 Jun 2003 19:13:29 -0700 Date: Fri, 27 Jun 2003 19:13:24 -0700 From: "Martin J. 
Bligh" To: Andrew Morton cc: bcollins@debian.org, davidel@xmailserver.org, davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <36630000.1056766403@[10.10.2.4]> In-Reply-To: <20030627181432.61bf6f3a.akpm@digeo.com> References: <20030626.224739.88478624.davem@redhat.com><21740000.1056724453@[10.10.2.4]><20030627.143738.41641928.davem@redhat.com><20030627213153.GR501@phunnypharm.org><20030627162527.714091ce.akpm@digeo.com><35240000.1056760723@[10.10.2.4]> <20030627181432.61bf6f3a.akpm@digeo.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3608 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --Andrew Morton wrote (on Friday, June 27, 2003 18:14:32 -0700): > "Martin J. Bligh" wrote: >> >> I think your suggestion of sending new bugs out to LKML has made a big >> dent in the one<->one problem already. Replacing all the default owner >> fields with mailing lists (either existing ones or new ones) instead of >> individuals would be another step in that direction, though there may >> be a few hurdles to deal with on the way to that. >> >> Yes, we probably also need an "email back in" interface as we've >> discussed before to take it up to many-many. > > Both these things would help heaps - the tracking system then > becomes invisible, basically. The best of both. Can we make it so? The answer to both is yes, but one's harder than the other ;-) 1. default owners -> lists: Setting default owners to existing lists is somewhat invasive, and might provoke riots ;-) Not only do you get the new bug notification, but also any updates, which may become irritating. 
There's probably some vaguely happy medium to be found between: a) sending newly logged bugs to existing lists, b) sending updates to some new list. Maybe if we just create a new list for each category, and let people subscribe at will to those ... and I keep sending newly logged bugs to linux-kernel? I can cc netdev / linux-scsi / whatever on those new ones if that helps? That seems reasonably helpful and non-invasive to people who don't want to see it to me. People who like the mailing lists will see the new bug reports, and can just delete and forget them (as now). I'll go with the consensus of opinion (ha!) on this ... I'd like to make it useful without getting lynched ;-) Using new lists makes it less intrusive. Any way we go here is fairly easy to set up. 2. email back in. Email back in is harder, and needs more thought as to how to make it easy to use, whilst avoiding logging crap (eg. ensuing flamewars that derive from the bug reports, etc). My intuition is to log replies by default, and hack off certain threads by hand by keeping track of replies-to headers or something. Not desperately enamoured with that, but it's the best I can think of, off the top of my head. Open to other ideas ... Anyway, that bit is definitely a longer term project (ie not going to happen next week, but maybe in a few weeks). M. From hadi@shell.cyberus.ca Fri Jun 27 19:21:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 19:22:03 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S2Lt2x028463 for ; Fri, 27 Jun 2003 19:21:55 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19W5L8-000NYW-0F; Fri, 27 Jun 2003 22:21:22 -0400 Date: Fri, 27 Jun 2003 22:21:21 -0400 (EDT) From: Jamal Hadi To: James Carlson cc: "David S. 
Miller" , rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP In-Reply-To: <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> Message-ID: <20030627213846.V90398@shell.cyberus.ca> References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3609 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Fri, 27 Jun 2003, James Carlson wrote: > Jamal Hadi writes: > > So what about packets being lost? Wouldn't that ensure reordering? > > Please explain. What pattern of loss possibly results in one packet > being inserted in the stream ahead of another? > > Here's loss: 1 2 4 5 6 > > Here's reordering: 1 2 4 3 5 6 > > Loss preserves ordering. To get misordering, you have to > intentionally hold onto a message and reinsert it later. What I've And that's what i was implying. In your above example: 1 2 4 5 6 If the entity above the wire cared about packet 3, there will be a retransmit, so it becomes: 1 2 4 5 6 3 I suppose you can ensure ordering with a retransmit by having a window of size 1 clocked by ACKs. 
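A minimal sketch of the ordering argument above, assuming a toy model in which the wire drops one packet exactly once. The function names (`deliver_windowed`, `deliver_stop_and_wait`, `in_order`) are hypothetical and are not taken from the PPPoE code under discussion; they only illustrate why a late retransmit reorders the stream while a window of size 1 clocked by ACKs does not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): the sender emits packets
 * 1..n, the wire drops packet `lost` exactly once, and the receiver
 * records the arrival order in `out`. Both functions return the number
 * of packets recorded; `out` must have room for n entries.
 */

/* Large window: everything is sent first, the lost packet is resent last. */
size_t deliver_windowed(int n, int lost, int *out)
{
    size_t k = 0;

    assert(out != NULL);
    for (int seq = 1; seq <= n; seq++)
        if (seq != lost)
            out[k++] = seq;      /* arrives in order, minus the loss */
    if (lost >= 1 && lost <= n)
        out[k++] = lost;         /* retransmit lands after later packets */
    return k;
}

/* Window of 1 clocked by ACKs: nothing new is sent until the ACK arrives. */
size_t deliver_stop_and_wait(int n, int lost, int *out)
{
    size_t k = 0;
    int dropped_once = 0;

    assert(out != NULL);
    for (int seq = 1; seq <= n; ) {
        if (seq == lost && !dropped_once) {
            dropped_once = 1;    /* no ACK comes back: resend same packet */
            continue;
        }
        out[k++] = seq;          /* delivered and ACKed */
        seq++;
    }
    return k;
}

/* Returns 1 if the sequence is strictly increasing, 0 otherwise. */
int in_order(const int *seq, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (seq[i] <= seq[i - 1])
            return 0;
    return 1;
}
```

With n = 6 and packet 3 lost, the windowed sender yields the 1 2 4 5 6 3 pattern from the message, while the stop-and-wait sender keeps 1 through 6 in order at the cost of sending only one packet per round trip.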
cheers, jamal From hadi@shell.cyberus.ca Fri Jun 27 20:28:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 20:28:22 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S3SG2x029217 for ; Fri, 27 Jun 2003 20:28:17 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19W6NO-000Na9-9b; Fri, 27 Jun 2003 23:27:46 -0400 Date: Fri, 27 Jun 2003 23:27:46 -0400 (EDT) From: Jamal Hadi To: "Martin J. Bligh" cc: Andrew Morton , bcollins@debian.org, davidel@xmailserver.org, davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org In-Reply-To: <36630000.1056766403@[10.10.2.4]> Message-ID: <20030627224649.E90398@shell.cyberus.ca> References: <20030626.224739.88478624.davem@redhat.com><21740000.1056724453@[10.10.2.4]><20030627.143738.41641928.davem@redhat.com><20030627213153.GR501@phunnypharm.org><20030627162527.714091ce.akpm@digeo.com><35240000.1056760723@[10.10.2.4]> <20030627181432.61bf6f3a.akpm@digeo.com> <36630000.1056766403@[10.10.2.4]> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3610 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev I think what you need to ensure is a "push" operation with retransmits. Obviously the "pull"ing of Dave to bugzilla hasn't worked (otherwise this discussion wouldn't be happening). Bug trackers have never worked for me either - in my current day job i am now passed notifications of every bugzilla opened. I actually asked for this because i hate checking bugzilla. Over time here's what happened: month 0-3: Read the whole thing and called the owner. 
month 4-5: Spent about 30 seconds glancing at each one and maybe emailed somebody month 6-9: Spent about 30 secs and archived them in a separate mailbox month 9-12: fuck this shit. procmail the whole thing. That's what procmail is for! Maybe someday over a fine cup of Tim Hortons French Vanilla cappuccino and a donut i'll go over that list and read them all - if only we had Krispy Kreme donuts to go with Tim Hortons coffee then i am sure i would read them ;-> The truth is i drink Tim Hortons cappuccino but still don't read the damn bugzilla mailbox. But i have it just in case i need it ... I can almost swear this is what will happen when you start ccing Dave on bugzillas. If you think of Dave as a server then the most reliable protocol is to retransmit. Under resource constraint he dumps packets (that del key). Add another server - Alexey - and broadcast to both via netdev and you start to scale. I don't think retransmission by a robot would work well either since it misses that human touch. So you have a challenging task. 
cheers, jamal From yoshfuji@linux-ipv6.org Fri Jun 27 21:04:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 21:04:58 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S44l2x029707 for ; Fri, 27 Jun 2003 21:04:48 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5S462Bo017214; Sat, 28 Jun 2003 13:06:03 +0900 Date: Sat, 28 Jun 2003 13:06:02 +0900 (JST) Message-Id: <20030628.130602.63704890.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: krkumar@us.ibm.com, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030627.144752.78715628.davem@redhat.com> References: <20030626.230727.35666164.davem@redhat.com> <3EFC668F.9010004@us.ibm.com> <20030627.144752.78715628.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3611 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030627.144752.78715628.davem@redhat.com> (at Fri, 27 Jun 2003 14:47:52 -0700 (PDT)), "David S. 
Miller" says: > From: Krishna Kumar > Date: Fri, 27 Jun 2003 08:45:19 -0700 > > rtnetlink_rcv_msg() calls dumpit() (via netlink_dump_start) only > for those messages for which the last two bits are binary '10'. So > I had to use these values. All the other *GET* macros use the same > semantics. > > Ok, please retransmit your two patches (2.4.x and 2.5.x) to me > under separate cover. I don't keep a copy around of patches > I've decided not to apply. Well... 1. is it okay to have another hook for grabbing the prefix list? Userspace applications can get such information via - routing table - interface flag 2. are the "managed" flags etc., which are per-interface variables, really NEWROUTE information? It is NOT an L2 thing, but it is per-link information. I think it is a NEWLINK thing. What I'm thinking is: - fix "ADDRCONF" flag in route information - manage / other flags via NEWLINK message (- No new interface to get prefix itself.) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From zwane@linuxpower.ca Fri Jun 27 22:58:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 22:59:00 -0700 (PDT) Received: from hemi.commfireservices.com ([66.212.224.118]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S5wt2x030704 for ; Fri, 27 Jun 2003 22:58:55 -0700 Received: from montezuma.mastecende.com (cuda.commfireservices.com [24.203.207.204]) by hemi.commfireservices.com (Postfix) with ESMTP id A5875BC51; Sat, 28 Jun 2003 01:48:52 -0400 (EDT) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by montezuma.mastecende.com (8.12.8/8.12.8) with ESMTP id h5S5lL4c017468; Sat, 28 Jun 2003 01:47:22 -0400 Date: Sat, 28 Jun 2003 01:47:21 -0400 (EDT) From: Zwane Mwaikambo X-X-Sender: zwane@montezuma.mastecende.com To: "Feldman, Scott" Cc: netdev@oss.sgi.com Subject: RE: e1000 lockup with port io type reset In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII 
X-archive-position: 3612 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zwane@linuxpower.ca Precedence: bulk X-list: netdev On Fri, 27 Jun 2003, Feldman, Scott wrote: > > The following card causes a hard lockup when we do a controller > > reset using port io instead of mmio. Switching to mmio > > controller reset causes it to function as expected > > Please send me a description of system (model, kernel, DMI decode, etc), > plus dumps from lspci -vv -xxx and ethtool -d ethX. If we can repro > this, we should be able to get a bus trace. Thanks, i just have to wait on the dmi decode. I'll have it to you ASAP. Zwane -- function.linuxpower.ca From mbligh@aracnet.com Fri Jun 27 23:08:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 27 Jun 2003 23:08:40 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S68Y2x031128 for ; Fri, 27 Jun 2003 23:08:34 -0700 Received: from groan (216-99-192-35.dial.spiritone.com [216.99.192.35]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5S686ga001343; Fri, 27 Jun 2003 23:08:06 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5S68Cfg023454; Fri, 27 Jun 2003 23:08:17 -0700 Date: Fri, 27 Jun 2003 23:08:12 -0700 From: "Martin J. 
Bligh" To: Andrew Morton cc: bcollins@debian.org, davidel@xmailserver.org, davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <37590000.1056780491@[10.10.2.4]> In-Reply-To: <20030627193521.25040f3e.akpm@digeo.com> References: <20030626.224739.88478624.davem@redhat.com><21740000.1056724453@[10.10.2.4]><20030627.143738.41641928.davem@redhat.com><20030627213153.GR501@phunnypharm.org><20030627162527.714091ce.akpm@digeo.com><35240000.1056760723@[10.10.2.4]><20030627181432.61bf6f3a.akpm@digeo.com><36630000.1056766403@[10.10.2.4]> <20030627193521.25040f3e.akpm@digeo.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3613 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev > If some low-value stuff leaks through then ho-hum, at least it was > on-topic. It is not as if we are unused to low-value content... ;-) > It would be good if pure administrata such as changing the status were > filtered. That should be easy enough. > In fact, there is probably no point in sending anything bugzilla->list apart > from the initial report. If the bug is then pursued via bugzilla then OK. > If it is pursued via email then bugzilla just captures the discussion. OK, but I'm pretty much doing that already. I try to filter out some of the "bugs with no content". So it sounds like the issue is more the loop from email back in. Will see what I can get done - have to schedule some time from the admins. >> 2. email back in. >> >> Email back in is harder, and needs more thought as to how to make it >> easy to use, whilst avoiding logging crap (eg. ensuing flamewars that >> derive from the bug reports, etc). 
> > Well hopefully people will have the sense to cut the bugzilla address off > the Cc line if it drifts off-topic. Fairy nuff. >> My intuition is to log replies by >> default, and hack off certain threads by hand > > Nah. Just log everything and hack off the crap by larting people. Heh. need to get a good "remote slap protocol" implemented. Perhaps the net guys can write us an RFC for it ;-) M. From davem@redhat.com Sat Jun 28 00:27:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 00:27:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S7RC2x000328 for ; Sat, 28 Jun 2003 00:27:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA09498; Sat, 28 Jun 2003 00:20:55 -0700 Date: Sat, 28 Jun 2003 00:20:55 -0700 (PDT) Message-Id: <20030628.002055.38707658.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Fix PPP async regression From: "David S. Miller" In-Reply-To: <20030627162818.606ac706.shemminger@osdl.org> References: <20030627162818.606ac706.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3614 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 27 Jun 2003 16:28:18 -0700 Please apply this patch, it fixes the PPP over async regression that the PPPoE changes caused. Thanks a lot Stephen, applied. 
From davem@redhat.com Sat Jun 28 00:27:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 00:27:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S7Rj2x000401 for ; Sat, 28 Jun 2003 00:27:45 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA09510; Sat, 28 Jun 2003 00:21:29 -0700 Date: Sat, 28 Jun 2003 00:21:29 -0700 (PDT) Message-Id: <20030628.002129.35022526.davem@redhat.com> To: shemminger@osdl.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH] PPP handling fragmented skbuff's From: "David S. Miller" In-Reply-To: <20030627163524.347b2c8e.shemminger@osdl.org> References: <20030627163524.347b2c8e.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3615 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 27 Jun 2003 16:35:24 -0700 I don't think this ever happens today, but if PPP ever gets a fragmented skbuff and decides to copy it, then bad things will happen. The following replaces the memcpy() calls with skb_copy_bits(). Please review carefully before applying; it builds and runs, but I can't really force these code paths to occur under normal systems and devices. It looks ok. But I'll let this one sit over the weekend before applying so others can test it out. 
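To make the memcpy() concern above concrete, here is a user-space sketch, assuming a simplified buffer made of a linear head plus a few separate fragments. `struct frag_buf`, `struct chunk` and `frag_buf_copy_bits()` are hypothetical stand-ins invented for illustration; this is not the kernel's sk_buff layout and not the implementation of skb_copy_bits(). A single memcpy() from the head pointer would read past the linear part, whereas a walker like this one gathers the bytes from every fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

/* Hypothetical user-space stand-in for one contiguous piece of data. */
struct chunk {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical stand-in for a fragmented packet buffer. */
struct frag_buf {
    struct chunk head;               /* linear part */
    struct chunk frags[MAX_FRAGS];   /* additional fragments */
    size_t nr_frags;
};

/*
 * Copy `len` bytes starting at `offset` into `to`, walking the linear
 * part and then each fragment in turn. Returns 0 on success, or -1 if
 * the request runs past the end of the data.
 */
int frag_buf_copy_bits(const struct frag_buf *fb, size_t offset,
                       void *to, size_t len)
{
    unsigned char *dst = to;

    assert(fb != NULL && to != NULL);
    assert(fb->nr_frags <= MAX_FRAGS);

    for (size_t i = 0; i <= fb->nr_frags && len > 0; i++) {
        const struct chunk *c = (i == 0) ? &fb->head : &fb->frags[i - 1];
        size_t n;

        if (offset >= c->len) {      /* chunk lies entirely before offset */
            offset -= c->len;
            continue;
        }
        n = c->len - offset;         /* bytes available in this chunk */
        if (n > len)
            n = len;
        memcpy(dst, c->data + offset, n);
        dst += n;
        len -= n;
        offset = 0;                  /* later chunks are read from their start */
    }
    return len == 0 ? 0 : -1;
}
```

The design point is that the caller never needs to know where the fragment boundaries are: it asks for a byte range and gets a contiguous copy, or an error if the range does not exist.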
From john@grabjohn.com Sat Jun 28 00:52:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 00:52:43 -0700 (PDT) Received: from 81-2-122-30.bradfords.org.uk (81-2-122-30.bradfords.org.uk [81.2.122.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S7qY2x001075 for ; Sat, 28 Jun 2003 00:52:35 -0700 Received: from 81-2-122-30.bradfords.org.uk (localhost [127.0.0.1]) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9) with ESMTP id h5S80dUo000477; Sat, 28 Jun 2003 09:00:39 +0100 Received: (from john@localhost) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9/Submit) id h5S80dab000476; Sat, 28 Jun 2003 09:00:39 +0100 Date: Sat, 28 Jun 2003 09:00:39 +0100 From: John Bradford Message-Id: <200306280800.h5S80dab000476@81-2-122-30.bradfords.org.uk> To: lm@bitmover.com, mbligh@aracnet.com Subject: Re: networking bugs and bugme.osdl.org Cc: davem@redhat.com, greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com X-archive-position: 3616 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john@grabjohn.com Precedence: bulk X-list: netdev > > > It would also keep bugs from falling through the cracks: > > > > > > People DON'T understand. I _WANT_ them to be able to > > > fall through the cracks. > > > > I fail to see your point here. > > This might help. Or not. 
> > Brain dump on the bug tracking problem from the Kernel Summit discussions I implemented the vast majority of this months ago, in my bug database: http://www.grabjohn.com/kernelbugdatabase/ > [SCCS/s.BUGS vers 1.3 2001/04/05 13:10:10] > > Outline > Problems > Problem details > Past experiences > Requirements > > Problems > - getting quality bug reports > - not losing any bugs > - sorting low signal vs high signal into a smaller high signal pile > - simplified, preferably NNTP, access to the bug database (Linus > would use this; he's unlikely to use anything else) > > Problem details > Bug report quality > There was lots of discussion on this. The main agreement was that we > wanted the bug reporting system to dig out as much info as possible > and prefill that. There was a lot of discussion about possible tools > that would dig out the /proc/pci info; there was discussion about > Andre's tools which can tell you if you can write your disk; someone > else had something similar. This is controversial, due to the potential for unwanted information disclosure. I purposely didn't implement it. If a large proportion of users want it implemented, just let me know. > But the main thing was to extract all the info we could > automatically. One thing was the machine config (hardware and > at least kernel version). The other thing was extract any oops > messages and get a stack traceback. The (fairly complex) way kernel tree version numbers are implemented is very well handled. Different trees can be added to the database, using an admin utility, (which is not currently publicly accessible), and they are categorised. Currently we have 2.4 and 2.5 mainline, 2.4 and 2.5 -ac, and 2.5 -dj trees in the database. All version numbers are sorted properly with -pre and -rc coming before the release. > The other main thing was to define some sort of structure to the > bug report and try and get the user to categorize if they could. 
> In an ideal world, we would use the maintainers file Did that since version 1.0. > and the > stack traceback to cc the bug to the maintainer. I think we want > to explore this a bit. I'm not sure that the maintainer file is > the way to go, what if we divided it up into much broader chunks > like "fs", "vm", "network drivers", and had a mail forwarder > for each area. That could fan out to the maintainers. No problem. The admin utility can scan any file which is in the same format as the current maintainers file. Just prepare and upload one. > Not losing bugs > While there was much discussion about how to get rid of bad, > incorrect, and/or duplicate bug reports, several people - Alan > in particular - made the point that having a complete collection > of all bug reports was important. You can do data mining across > all/part of them and look for patterns. The point was that there > is some useful signal amongst all the noise so we do not want to > lose that signal. Done since version 2.0. We have bug reports, and confirmed bugs. Bug reports are archived after 2 weeks of inactivity, (or should be, I introduced a bug recently which stopped that working, but I'll fix that at the earliest opportunity). Anybody can add a bug report, and they are all archived. Confirmed bugs can only be added by admins, and collect together bug reports. > Signal/noise > We had a lot of discussion about how to deal with signal/noise. > The bugzilla proponents thought we could do this with some additional > hacking to bugzilla. I, given the BitKeeper background, thought > that we could do this by having two databases, one with all the > crud in it and another with just the screened bugs in it. See above - done since version 2.0. > No matter > how it is done, there needs to be some way to both keep a full list, > which will likely be used only for data mining, and another, much > smaller list of screened bugs. Confirmed bugs VS bug reports. 
> Jens wants there to be a queue of > new bugs and a mechanism where people can come in the morning, pull > a pile of bugs off of the queue, sort them, sending some to the real > database. No problem - just deselect 'include bug reports', and 'include archived entries', and click 'All entries'. Then, (you need to have an admin account for this), select 'open a new confirmed bug'. Add the bug reports to this confirmed bug. > This idea has a lot of merit, it needs some pondering as > DaveM would say, to get to the point that we have a workable mechanism > which works in a distributed fashion. > > The other key point seemed to be that if nobody picked up a bug and > nobody said that this bug should be picked up, then the bug expires > out of the pending queue. It gets stashed in the bug archive for > mining purposes and it can be resurrected if it later becomes a real > bug, but the key point seems to be that it _automatically_ disappears > out of the pending queue. I personally am very supportive of this > model. We need some way to just let junk stay junk. If junk has to > be pruned out of the system by humans, the system sucks. The system, > not humans, needs to autoprune. It does autoprune. (OK, there is currently a bug which is preventing it from working, but as I said above, I'll fix that as soon as I get chance to work on it). Bug reports over two weeks old become archived. > Simplified access: browsing and updating > Linus made the point that mailing lists suck. He isn't on any and > refuses to join any. He reads lists with a news reader. I think > people should sit up and listen to that - it's a key point. If your > mailing list isn't gatewayed to a newsgroup, he isn't reading it and > a lot of other people aren't either. > > There was a fair bit of discussion about how to get the bug database > connected to news. There doesn't seem to be any reason that the > bug system couldn't be a news server/gateway. 
You should be able to
> browse
> bitbucket.kernel.bugs - all the unscreened crud
> screened.kernel.bugs - all bugs which have been screened
> fs.kernel.bugs - screened bugs in the "fs" category
> ext2.kernel.bugs - screened bugs in the "ext2" category
> eepro.kernel.bugs - screened bugs in the "eepro" category
> etc.

Not yet implemented. Let me know more specifically what you want, and
I'll implement it.

Note - there _was_ a perfectly good command line/E-Mail interface, but
hardly anybody used it, so I removed it.

> Furthermore, the bugs should be structured once they are screened,
> i.e., they have a set of fields like (this is a strawman):
>
> Synopsis - one line man-page like summary of the bug

Implemented.

> Severity - how critical is this bug?
> Priority - how soon does it need to be fixed?

Not implemented, but trivial to implement if people care.

> Category - subsystem in which the bug occurs

Implemented via Maintainers file.

> Description - details on the bug, oops, stack trace, etc.

Implemented.

> Hardware - hardware info
> Software - kernel version, glibc version, etc.
> Suggested fix - any suggestion on how to fix it
> Interest list - set of email addresses and/or newsgroups for updates

Not implemented, but trivial to implement if people care.

> It ought to work that if someone posts a followup to the bug then if
> the followup changes any of the fields that gets propagated to the
> underlying bug database. If this is done properly the news reader will
> be the only interface that most people use.

OK, my idea is a bit different - each possibly widely differing bug
report is a completely separate entity. You can only add to a bug
report if you are the submitter or an admin. Anybody else should add a
separate bug report, and have an admin find and connect it to the
existing confirmed bug.

> Past experiences
> This is a catch all for sound bytes that we don't want to forget...
> > - Sorting bugs by hand is a pain in the ass (Ted burned out on it and > Alan refuses to say that it is the joy of his life to do it) Bug reports are sorted via the categories in the maintainers file. Anybody can 'watch' any particular subsystem and get notified of all the bug reports that are submitted as included in that subsystem. A bug report can go in to more than one subsystem or none at all. > - bug systems tend to "get in the way". Unless they are really trivial > to submit, search, update then people get tired of using them and go > back to the old way Isn't this exactly what we're seeing with a lot of bugs in the kernel Bugzilla being forwarded to LKML? We shouldn't need that if the bug database is operating satisfactorily. > - one key observation: let bugs "expire" much like news expires. If > nobody has been whining enough that it gets into the high signal > bug db then it probably isn't real. We really want a way where no > activity means let it expire. Done. > - Alan pointed out that having all of the bugs someplace is useful, > you can search through the 200 similar bugs and notice that SMP > is the common feature. Done - just make sure 'include archived entries' is selected. > Requirements > This section is mostly empty, it's here as a catch all for people's > bullet items. > > - it would be very nice to be able to cross reference bugs to bug fixes > in the source management system, as well as the other way around. Larry, if you can provide pointers as to the best way to link to stuff in BK, I'm happy to put that in. > - mail based interface Removed, because nobody used it. You can still see screenshots of it at: http://www.grabjohn.com/kernelbugdatabase/screenshots/ You can find the Kernel Bug Database at: http://www.grabjohn.com/kernelbugdatabase/ or alternatively, the database and documentation are linked to from: http://www.grabjohn.com/ If there are problems with it, or reasons why people hate it, please _let me know_. John. 
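
As a rough illustration of the version ordering mentioned in the message
above (-pre and -rc releases sorting before the final release), here is
a minimal C sketch. It is not the Kernel Bug Database's code: the names
kver, kver_parse, kver_cmp and kver_before are hypothetical, and
suffixes from other trees such as -ac or -dj are deliberately left
unhandled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Release stage, ordered so that -pre < -rc < final release. */
enum kver_stage { KVER_PRE = 0, KVER_RC = 1, KVER_FINAL = 2 };

struct kver {
	int major, minor, patch;
	enum kver_stage stage;
	int stage_num;		/* N in -preN / -rcN, 0 for a final release */
};

/* Parse "2.4.21", "2.4.21-pre3" or "2.4.21-rc1". Returns 0 on success. */
static int kver_parse(const char *s, struct kver *v)
{
	int n = 0;

	assert(s != NULL && v != NULL);
	if (sscanf(s, "%d.%d.%d%n", &v->major, &v->minor, &v->patch, &n) != 3)
		return -1;
	s += n;
	if (*s == '\0') {
		v->stage = KVER_FINAL;
		v->stage_num = 0;
	} else if (strncmp(s, "-pre", 4) == 0) {
		v->stage = KVER_PRE;
		v->stage_num = atoi(s + 4);
	} else if (strncmp(s, "-rc", 3) == 0) {
		v->stage = KVER_RC;
		v->stage_num = atoi(s + 3);
	} else {
		return -1;	/* -ac, -dj, ... are out of scope for this sketch */
	}
	return 0;
}

/* Three-way comparison: negative if a sorts before b, 0 if equal. */
static int kver_cmp(const struct kver *a, const struct kver *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->patch != b->patch)
		return a->patch < b->patch ? -1 : 1;
	if (a->stage != b->stage)
		return a->stage < b->stage ? -1 : 1;
	if (a->stage_num != b->stage_num)
		return a->stage_num < b->stage_num ? -1 : 1;
	return 0;
}

/* 1 if version string a sorts strictly before b, 0 otherwise
 * (including when either string cannot be parsed). */
static int kver_before(const char *a, const char *b)
{
	struct kver va, vb;

	if (kver_parse(a, &va) != 0 || kver_parse(b, &vb) != 0)
		return 0;
	return kver_cmp(&va, &vb) < 0;
}
```

With this ordering, kver_before("2.4.21-pre3", "2.4.21-rc1") and
kver_before("2.4.21-rc1", "2.4.21") both hold, while a final release
still sorts before the next release's first -pre.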
From john@grabjohn.com Sat Jun 28 01:02:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 01:02:41 -0700 (PDT) Received: from 81-2-122-30.bradfords.org.uk (81-2-122-30.bradfords.org.uk [81.2.122.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S82V2x001463 for ; Sat, 28 Jun 2003 01:02:32 -0700 Received: from 81-2-122-30.bradfords.org.uk (localhost [127.0.0.1]) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9) with ESMTP id h5S8AWUo000514; Sat, 28 Jun 2003 09:10:32 +0100 Received: (from john@localhost) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9/Submit) id h5S8AWqi000513; Sat, 28 Jun 2003 09:10:32 +0100 Date: Sat, 28 Jun 2003 09:10:32 +0100 From: John Bradford Message-Id: <200306280810.h5S8AWqi000513@81-2-122-30.bradfords.org.uk> To: john@grabjohn.com, lm@bitmover.com, mbligh@aracnet.com Subject: Re: networking bugs and bugme.osdl.org Cc: davem@redhat.com, greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com X-archive-position: 3617 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john@grabjohn.com Precedence: bulk X-list: netdev > > This might help. Or not. > > > > Brain dump on the bug tracking problem from the Kernel Summit discussions > > I implemented the vast majority of this months ago, in my bug database: > > http://www.grabjohn.com/kernelbugdatabase/ > [snip] > > Problem details > > Bug report quality > > There was lots of discussion on this. The main agreement was that we > > wanted the bug reporting system to dig out as much info as possible > > and prefill that. There was a lot of discussion about possible tools > > that would dig out the /proc/pci info; there was discussion about > > Andre's tools which can tell you if you can write your disk; someone > > else had something similar. > > This is controversial, due to the potential for unwanted information > disclosure. I purposely didn't implement it. 
If a large proportion
> of users want it implemented, just let me know.

Having said that, I've had a .config uploading and analysing facility
since version 1.0. In fact, the reason I forgot to mention it in my
first E-Mail is that it is the core around which the whole Kernel Bug
Database operates.

The user uploads their .config, and the database finds bugs that might
be the one they are experiencing. If so, they add a separate bug
report, an admin collects both bug reports into one confirmed bug, and
picks out which config options he wants to flag as causing the
confirmed bug.

John.

From ja@ssi.bg Sat Jun 28 01:59:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 01:59:26 -0700 (PDT)
Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5S8x52x002037
	for ; Sat, 28 Jun 2003 01:59:11 -0700
Received: from localhost (IDENT:ja@localhost [127.0.0.1])
	by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5S92Uk01777;
	Sat, 28 Jun 2003 12:02:33 +0300
Date: Sat, 28 Jun 2003 12:02:29 +0300 (EEST)
From: Julian Anastasov
X-X-Sender: ja@u.domain.uli
To: Ben Greear
cc: "'netdev@oss.sgi.com'"
Subject: Re: routing bug report for 2.4
In-Reply-To: <3EFCCC95.3000709@candelatech.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3618
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ja@ssi.bg
Precedence: bulk
X-list: netdev

	Hello,

	I'll try to reply to some of your posts...

On Fri, 27 Jun 2003, Ben Greear wrote:

> (This has been discussed with Alexey, but sending to the list for
> general consumption).

	I remember your previous posts. I assume they were skipped
because you do not use the routing system properly: you have to use
preferred sources in your routes. Now I'm not sure if the case is the
same.

> Here is how to reproduce this:
> ifconfig eth1 192.1.1.2 netmask 255.255.255.0
> ifconfig eth2 192.1.2.2 netmask 255.255.255.0
>
>
> Set up policy based routing with the 'ip' tool to make packets with
> source-address of each interface to use the gateway for that interface.
> Set gateway for eth1 to be 192.1.1.1
> Set gateway for eth2 to be 192.1.2.1

	But not all packets; maybe you have to place the source-based
rules after table main. This is the recommended way.

> Now, use ping to try to send pkts from one interface to the other:
>
> ping -I 192.1.1.2 192.1.2.2

	Your report is damn wrong, why do you ping a local IP? Or maybe
that is your test? Trying ping from ip-utils... sorry, not
reproducible here (I hope it is the expected result).

> You will see arps on eth1 for 192.1.2.2, whereas you should see packets
> being sent to the default gateway for eth1.

	Why? 192.1.2.2 is a local IP and the local table has first
priority. We should not see any ARP packets for local targets, right?

> If you modify the ping source to BINDTODEVICE eth1, then it will send
> correctly. I am under the impression that you should not have to specifically
> BINDTODEVICE in this case since the policy based routing should take care of
> routing things correctly. Or, maybe, the real bug is in ping in that it did
> not BINDTODEVICE?

	Do you really ping a local IP?

> Also, ping -I eth1 192.1.2.2 will fail to route externally. That may
> just be a feature of ping: I'm unsure what the subtle difference is *supposed*
> to be between using -I eth1 and -I 1.2.3.4

	I think the root of your problems is that you specify
'ping -I device' and the routing is forced to construct a result from
an unknown route by using source address autoselection. From previous
post:

> # The other interface on the router machine (same machine as I just pinged above)
> [root@localhost root]# ping -I eth1 10.3.2.1
> PING 10.3.2.1 (10.3.2.1) from 10.3.1.4 eth1: 56(84) bytes of data.
> From 10.3.1.4 icmp_seq=1 Destination Host Unreachable
> From 10.3.1.4 icmp_seq=3 Destination Host Unreachable
>
> # It is NOT using the default gateway for this traffic, but is instead
> # just trying to ARP.
> [root@localhost root]# tcpdump -n -i eth1
> tcpdump: listening on eth1
> 11:56:19.788336 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:20.788134 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:21.788149 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:22.788379 arp who-has 10.3.2.1 tell 10.3.1.4

	'-I eth1 10.3.2.1' requests the route "from 0.0.0.0 to 10.3.2.1
oif eth1". You do not have such routes. I assume the result is
(quoting route.c):

	"Apparently, routing tables are wrong."
	"Assume, that the destination is on link."

	For your setup I would say "The request is wrong". You see
that the kernel does not even check whether eth1 is UP. You are lucky.
Then the kernel autoselects 10.3.1.4 as src for the forced eth1
device. Thus, you see this ARP probe. Later, it seems 10.3.2.1 does
not want to reply to 10.3.1.4; I assume this is a known problem?

	As for ping from iputils: you can specify a device or a saddr,
not both, so the only valid test for source based routing can be
'-I IP'. Do you really need '-I eth1'?

Regards -- Julian Anastasov From yoshfuji@linux-ipv6.org Sat Jun 28 03:09:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 03:09:10 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SA922x006476 for ; Sat, 28 Jun 2003 03:09:03 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5SAAKBo018728; Sat, 28 Jun 2003 19:10:21 +0900 Date: Sat, 28 Jun 2003 19:10:13 +0900 (JST) Message-Id: <20030628.191013.105714999.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com Subject: [PATCH] IPV6: Fixed M-Flag in last fragment From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3619 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. M-Flag was set on last fragment. 
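
For readers unfamiliar with the field in question, here is a minimal
C sketch of how the 16-bit fragment offset field of the IPv6 Fragment
header is packed: the upper 13 bits hold the offset in 8-octet units,
two bits are reserved, and the lowest bit is the M ("more fragments")
flag, which must be clear on the last fragment. The macro name IP6_MF
matches the one proposed later in this thread; the helper names are
hypothetical and this is not the kernel code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define IP6_MF 0x0001	/* M flag: more fragments follow */

/*
 * Build the on-wire (network byte order) fragment offset field.
 * 'offset' is the byte offset of this fragment and must be a multiple
 * of 8, so its low three bits are free for the reserved bits and M.
 */
static uint16_t frag_off_field(uint16_t offset, int more_fragments)
{
	uint16_t v = offset;

	assert((offset & 0x7) == 0);
	if (more_fragments)
		v |= IP6_MF;
	return htons(v);
}

/* Non-zero if the on-wire field marks the final fragment (M clear). */
static int frag_is_last(uint16_t frag_off_be)
{
	return !(frag_off_be & htons(IP6_MF));
}

/* Byte offset encoded in the on-wire field, with flag bits masked off. */
static uint16_t frag_offset_bytes(uint16_t frag_off_be)
{
	return ntohs(frag_off_be) & 0xFFF8;
}
```

Since byte swapping only moves bits around, ORing the flag in before
htons() or ORing htons(IP6_MF) in afterwards gives the same field.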
From yoshfuji@linux-ipv6.org Sat Jun 28 03:10:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 03:10:59 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SAAr2x006714
	for ; Sat, 28 Jun 2003 03:10:54 -0700
Received: from localhost (localhost [127.0.0.1])
	by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5SACCBo018756;
	Sat, 28 Jun 2003 19:12:12 +0900
Date: Sat, 28 Jun 2003 19:12:12 +0900 (JST)
Message-Id: <20030628.191212.43739794.yoshfuji@linux-ipv6.org>
To: davem@redhat.com
Cc: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org
Subject: Re: [PATCH] IPV6: Fixed M-Flag in last fragment
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To: <20030628.191013.105714999.yoshfuji@linux-ipv6.org>
References: <20030628.191013.105714999.yoshfuji@linux-ipv6.org>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU]
 $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 3620
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

In article <20030628.191013.105714999.yoshfuji@linux-ipv6.org> (at Sat, 28 Jun 2003 19:10:13 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says:

> Hello.
>
> M-Flag was set on last fragment.

Oops, here's the patch.

Thanks.

Index: linux-2.5/net/ipv6/ip6_output.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/ipv6/ip6_output.c,v
retrieving revision 1.30
diff -u -r1.30 ip6_output.c
--- linux-2.5/net/ipv6/ip6_output.c	24 Jun 2003 21:56:18 -0000	1.30
+++ linux-2.5/net/ipv6/ip6_output.c	28 Jun 2003 08:52:47 -0000
@@ -1004,9 +1004,7 @@
 			offset += skb->len - hlen - sizeof(struct frag_hdr);
 			fh->nexthdr = nexthdr;
 			fh->reserved = 0;
-			if (frag->next != NULL)
-				offset |= 0x0001;
-			fh->frag_off = htons(offset);
+			fh->frag_off = htons(offset | (frag->next != NULL ? 0x0001 : 0));
 			fh->identification = frag_id;
 			frag->nh.ipv6h->payload_len = htons(frag->len - sizeof(struct ipv6hdr));
 			ip6_copy_metadata(frag, skb);

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From Andrew.Morton@digeo.com Sat Jun 28 03:45:17 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 03:45:28 -0700 (PDT)
Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SAjH2x007310
	for ; Sat, 28 Jun 2003 03:45:17 -0700
Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	Fri, 27 Jun 2003 19:35:10 -0700
Date: Fri, 27 Jun 2003 19:35:21 -0700
From: Andrew Morton
To: "Martin J. Bligh"
Cc: bcollins@debian.org, davidel@xmailserver.org, davem@redhat.com,
	linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
	netdev@oss.sgi.com
Subject: Re: networking bugs and bugme.osdl.org
Message-Id: <20030627193521.25040f3e.akpm@digeo.com>
In-Reply-To: <36630000.1056766403@[10.10.2.4]>
References: <20030626.224739.88478624.davem@redhat.com>
	<21740000.1056724453@[10.10.2.4]>
	<20030627.143738.41641928.davem@redhat.com>
	<20030627213153.GR501@phunnypharm.org>
	<20030627162527.714091ce.akpm@digeo.com>
	<35240000.1056760723@[10.10.2.4]>
	<20030627181432.61bf6f3a.akpm@digeo.com>
	<36630000.1056766403@[10.10.2.4]>
X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 28 Jun 2003 02:35:10.0646 (UTC) FILETIME=[E5469560:01C33D1D]
X-archive-position: 3621
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: akpm@digeo.com
Precedence: bulk
X-list: netdev

"Martin J. Bligh" wrote:
>
> 1. default owners -> lists:
>
> Setting default owners to existing lists is somewhat invasive, and
> might provoke riots ;-) Not only do you get the new bug notification,
> but also any updates, which may become irritating.

That's OK. It is a matter of people being aware that the updates will
be echoed to a mailing list and acting appropriately. If some low-value
stuff leaks through then ho-hum, at least it was on-topic. It is not as
if we are unused to low-value content...

It would be good if pure administrata such as changing the status were
filtered.

In fact, there is probably no point in sending anything bugzilla->list
apart from the initial report. If the bug is then pursued via bugzilla
then OK. If it is pursued via email then bugzilla just captures the
discussion.

> There's probably > some vaguely happy medium to be found between: > a) sending newly logged bugs to existing lists, > b) sending updates to some new list. > Maybe if we just create a new list for each category, and let > people subscribe at will to those ... and I keep sending newly logged > bugs to linux-kernel? I can cc netdev / linux-scsi / whatever on those > new ones if that helps? I think sending the initial report to the relevant lists and then capturing incoming email would suffice. > 2. email back in. > > Email back in is harder, and needs more thought as to how to make it > easy to use, whilst avoiding logging crap (eg. ensuing flamewars that > derive from the bug reports, etc). Well hopefully people will have the sense to cut the bugzilla address off the Cc line if it drifts off-topic. > My intuition is to log replies by > default, and hack off certain threads by hand Nah. Just log everything and hack off the crap by larting people. From yoshfuji@linux-ipv6.org Sat Jun 28 04:28:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 04:29:00 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SBSr2x008439 for ; Sat, 28 Jun 2003 04:28:54 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5SBUBBo020281; Sat, 28 Jun 2003 20:30:12 +0900 Date: Sat, 28 Jun 2003 20:30:11 +0900 (JST) Message-Id: <20030628.203011.05536524.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: use macro for M-Flag and clean-up From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: 
"5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3622 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Use macro IP6_MF for the M-Flag. Clean-up for readability (commented by ). Patch against "[PATCH] IPV6: Fixed M-Flag in last fragment" patch. Thanks. --yoshfuji Index: linux-2.5/include/net/ipv6.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/ipv6.h,v retrieving revision 1.20 diff -u -r1.20 ipv6.h --- linux-2.5/include/net/ipv6.h 24 Jun 2003 21:56:18 -0000 1.20 +++ linux-2.5/include/net/ipv6.h 28 Jun 2003 09:56:10 -0000 @@ -101,6 +101,8 @@ __u32 identification; }; +#define IP6_MF 0x0001 + #ifdef __KERNEL__ #include Index: linux-2.5/net/ipv6/reassembly.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/reassembly.c,v retrieving revision 1.17 diff -u -r1.17 reassembly.c --- linux-2.5/net/ipv6/reassembly.c 24 Jun 2003 21:56:18 -0000 1.17 +++ linux-2.5/net/ipv6/reassembly.c 28 Jun 2003 09:56:10 -0000 @@ -435,7 +435,7 @@ csum_partial(skb->nh.raw, (u8*)(fhdr+1)-skb->nh.raw, 0)); /* Is this the final fragment? */ - if (!(fhdr->frag_off & htons(0x0001))) { + if (!(fhdr->frag_off & htons(IP6_MF))) { /* If we already have some bits beyond end * or have different end, the segment is corrupted. 
 		 */
--- linux-2.5/net/ipv6/ip6_output.c.orig	Sat Jun 28 20:10:18 2003
+++ linux-2.5/net/ipv6/ip6_output.c	Sat Jun 28 20:16:02 2003
@@ -984,7 +984,7 @@
 		ipv6_select_ident(skb, fh);
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
-		fh->frag_off = htons(0x0001);
+		fh->frag_off = htons(IP6_MF);
 		frag_id = fh->identification;
 
 		first_len = skb_pagelen(skb);
@@ -1004,7 +1004,9 @@
 			offset += skb->len - hlen - sizeof(struct frag_hdr);
 			fh->nexthdr = nexthdr;
 			fh->reserved = 0;
-			fh->frag_off = htons(offset | (frag->next != NULL ? 0x0001 : 0));
+			fh->frag_off = htons(offset);
+			if (frag->next != NULL)
+				fh->frag_off |= htons(IP6_MF);
 			fh->identification = frag_id;
 			frag->nh.ipv6h->payload_len = htons(frag->len - sizeof(struct ipv6hdr));
 			ip6_copy_metadata(frag, skb);
@@ -1111,7 +1113,9 @@
 			BUG();
 		left -= len;
 
-		fh->frag_off = htons( left > 0 ? (offset | 0x0001) : offset);
+		fh->frag_off = htons(offset);
+		if (left > 0)
+			fh->frag_off |= htons(IP6_MF);
 		frag->nh.ipv6h->payload_len = htons(frag->len - sizeof(struct ipv6hdr));
 
 		ptr += len;

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From Andrew.Morton@digeo.com Sat Jun 28 06:21:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 06:21:46 -0700 (PDT)
Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SDLc2x011607
	for ; Sat, 28 Jun 2003 06:21:38 -0700
Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	Fri, 27 Jun 2003 18:14:21 -0700
Date: Fri, 27 Jun 2003 18:14:32 -0700
From: Andrew Morton
To: "Martin J.
Bligh" Cc: bcollins@debian.org, davidel@xmailserver.org, davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-Id: <20030627181432.61bf6f3a.akpm@digeo.com> In-Reply-To: <35240000.1056760723@[10.10.2.4]> References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> <20030627213153.GR501@phunnypharm.org> <20030627162527.714091ce.akpm@digeo.com> <35240000.1056760723@[10.10.2.4]> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 28 Jun 2003 01:14:22.0035 (UTC) FILETIME=[9B474230:01C33D12] X-archive-position: 3623 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev "Martin J. Bligh" wrote: > > I think your suggestion of sending new bugs out to LKML has made a big > dent in the one<->one problem already. Replacing all the default owner > fields with mailing lists (either existing ones or new ones) instead of > individuals would be another step in that direction, though there may > be a few hurdles to deal with on the way to that. > > Yes, we probably also need an "email back in" interface as we've > discussed before to take it up to many-many. Both these things would help heaps - the tracking system then becomes invisible, basically. The best of both. Can we make it so? 
From Andrew.Morton@digeo.com Sat Jun 28 06:23:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 06:23:15 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SDNB2x011735 for ; Sat, 28 Jun 2003 06:23:11 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 27 Jun 2003 16:25:18 -0700 Date: Fri, 27 Jun 2003 16:25:27 -0700 From: Andrew Morton To: Ben Collins Cc: davidel@xmailserver.org, davem@redhat.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-Id: <20030627162527.714091ce.akpm@digeo.com> In-Reply-To: <20030627213153.GR501@phunnypharm.org> References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> <20030627213153.GR501@phunnypharm.org> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 27 Jun 2003 23:25:18.0212 (UTC) FILETIME=[5EDB1C40:01C33D03] X-archive-position: 3624 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev Ben Collins wrote: > > I'm with Dave on this one. I also. The bug database tries to convert the traditional many<->many debugging process into a one<->one process. This surely results in a lower cleanup rate. It's irritating replying to a bugzilla entry when you _know_ that you're cutting other interested parties out of the loop. And mailing lists tend to be self-correcting: - The once-off bugs due to broken hardware get filtered away. - The bugs which simply get magically fixed when someone repaired some unrelated part of the kernel get filtered out. 
- The bugs which are affecting people the most get reported the most. - Lots of other people can chip in with potentially useful info. It is nice to have a record. But bugzilla is not a comfortable or productive environment within which to drill down into and fix problems. From yoshfuji@linux-ipv6.org Sat Jun 28 09:56:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 09:56:16 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SGu52x013536 for ; Sat, 28 Jun 2003 09:56:06 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5SGvOBo028572; Sun, 29 Jun 2003 01:57:24 +0900 Date: Sun, 29 Jun 2003 01:57:23 +0900 (JST) Message-Id: <20030629.015723.75141662.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: convert /proc/net/ip6_flowlabel to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3625 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This converts /proc/net/ip6_flowlabel to seq_file{}. Thanks. 
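
As background for the patch below, here is a rough user-space model of
the bucket-by-bucket walk that a seq_file iterator over a chained hash
table has to perform: find the first non-empty bucket, follow the chain,
then advance to the next non-empty bucket until the table is exhausted.
All names (demo_node, demo_iter, demo_ht, DEMO_HASH_MASK and the
functions) are hypothetical, the table is tiny, and locking and the
seq_file callbacks themselves are left out. It is a sketch of the
technique, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HASH_MASK 3	/* hypothetical: a 4-bucket table */

struct demo_node {
	int label;
	struct demo_node *next;
};

struct demo_iter {
	int bucket;		/* bucket currently being walked */
};

static struct demo_node *demo_ht[DEMO_HASH_MASK + 1];

/* Return the head of the first non-empty bucket, or NULL if all are empty. */
static struct demo_node *demo_get_first(struct demo_iter *it)
{
	for (it->bucket = 0; it->bucket <= DEMO_HASH_MASK; ++it->bucket)
		if (demo_ht[it->bucket])
			return demo_ht[it->bucket];
	return NULL;
}

/*
 * Return the entry after n: the rest of the current chain first, then
 * the head of the next non-empty bucket. The loop stops once the
 * bucket index passes the mask, so the walk ends with NULL.
 */
static struct demo_node *demo_get_next(struct demo_iter *it,
				       struct demo_node *n)
{
	assert(n != NULL);
	n = n->next;
	while (!n && ++it->bucket <= DEMO_HASH_MASK)
		n = demo_ht[it->bucket];
	return n;
}

/* Fill the table with three entries in two buckets and count them. */
static int demo_walk(void)
{
	static struct demo_node a, b, c;
	struct demo_iter it;
	struct demo_node *n;
	int i, count = 0;

	for (i = 0; i <= DEMO_HASH_MASK; i++)
		demo_ht[i] = NULL;
	a.label = 1; a.next = &b;
	b.label = 2; b.next = NULL;
	c.label = 3; c.next = NULL;
	demo_ht[0] = &a;	/* bucket 0: a -> b */
	demo_ht[3] = &c;	/* bucket 3: c; buckets 1 and 2 stay empty */

	for (n = demo_get_first(&it); n; n = demo_get_next(&it, n))
		count++;
	return count;
}
```

The explicit bound on the bucket index in demo_get_next is the part
worth checking in any real iterator of this shape: without an exit
when the last bucket has been passed, the walk would never return NULL.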
===== net/ipv6/ip6_flowlabel.c 1.2 vs edited ===== --- 1.2/net/ipv6/ip6_flowlabel.c Mon Feb 11 16:06:54 2002 +++ edited/net/ipv6/ip6_flowlabel.c Sun Jun 29 01:44:40 2003 @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -554,66 +555,132 @@ #ifdef CONFIG_PROC_FS +struct ip6fl_iter_state { + int bucket; +}; -static int ip6_fl_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0; - off_t begin=0; - int len=0; - int i, k; - struct ip6_flowlabel *fl; +#define ip6fl_seq_private(seq) ((struct ip6fl_iter_state *)&(seq)->private) - len+= sprintf(buffer,"Label S Owner Users Linger Expires " - "Dst Opt\n"); +static struct ip6_flowlabel *ip6fl_get_first(struct seq_file *seq) +{ + struct ip6_flowlabel *fl = NULL; + struct ip6fl_iter_state *state = ip6fl_seq_private(seq); - read_lock_bh(&ip6_fl_lock); - for (i=0; i<=FL_HASH_MASK; i++) { - for (fl = fl_ht[i]; fl; fl = fl->next) { - len+=sprintf(buffer+len,"%05X %-1d %-6d %-6d %-6d %-8ld ", - (unsigned)ntohl(fl->label), - fl->share, - (unsigned)fl->owner, - atomic_read(&fl->users), - fl->linger/HZ, - (long)(fl->expires - jiffies)/HZ); - - for (k=0; k<16; k++) - len+=sprintf(buffer+len, "%02x", fl->dst.s6_addr[k]); - buffer[len++]=' '; - len+=sprintf(buffer+len, "%-4d", fl->opt ? 
fl->opt->opt_nflen : 0); - buffer[len++]='\n'; - - pos=begin+len; - if(posoffset+length) - goto done; + for (state->bucket = 0; state->bucket <= FL_HASH_MASK; ++state->bucket) { + if (fl_ht[state->bucket]) { + fl = fl_ht[state->bucket]; + break; } } - *eof = 1; + return fl; +} -done: +static struct ip6_flowlabel *ip6fl_get_next(struct seq_file *seq, struct ip6_flowlabel *fl) +{ + struct ip6fl_iter_state *state = ip6fl_seq_private(seq); + + fl = fl->next; + while (!fl) { + if (++state->bucket <= FL_HASH_MASK) + fl = fl_ht[state->bucket]; + } + return fl; +} + +static struct ip6_flowlabel *ip6fl_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip6_flowlabel *fl = ip6fl_get_first(seq); + if (fl) + while (pos && (fl = ip6fl_get_next(seq, fl)) != NULL) + --pos; + return pos ? NULL : fl; +} + +static void *ip6fl_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock_bh(&ip6_fl_lock); + return *pos ? ip6fl_get_idx(seq, *pos) : (void *)1; +} + +static void *ip6fl_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip6_flowlabel *fl; + + if (v == (void *)1) + fl = ip6fl_get_first(seq); + else + fl = ip6fl_get_next(seq, v); + ++*pos; + return fl; +} + +static void ip6fl_seq_stop(struct seq_file *seq, void *v) +{ read_unlock_bh(&ip6_fl_lock); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if(len<0) - len=0; - return len; } + +static void ip6fl_fl_seq_show(struct seq_file *seq, struct ip6_flowlabel *fl) +{ + while(fl) { + seq_printf(seq, + "%05X %-1d %-6d %-6d %-6d %-8ld " + "%02x%02x%02x%02x%02x%02x%02x%02x " + "%-4d\n", + (unsigned)ntohl(fl->label), + fl->share, + (unsigned)fl->owner, + atomic_read(&fl->users), + fl->linger/HZ, + (long)(fl->expires - jiffies)/HZ, + NIP6(fl->dst), + fl->opt ? 
fl->opt->opt_nflen : 0); + fl = fl->next; + } +} + +static int ip6fl_seq_show(struct seq_file *seq, void *v) +{ + if (v == (void *)1) + seq_printf(seq, "Label S Owner Users Linger Expires " + "Dst Opt\n"); + else + ip6fl_fl_seq_show(seq, v); + return 0; +} + +static struct seq_operations ip6fl_seq_ops = { + .start = ip6fl_seq_start, + .next = ip6fl_seq_next, + .stop = ip6fl_seq_stop, + .show = ip6fl_seq_show, +}; + +static int ip6fl_seq_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &ip6fl_seq_ops); +} + +static struct file_operations ip6fl_seq_fops = { + .owner = THIS_MODULE, + .open = ip6fl_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; #endif void ip6_flowlabel_init() { +#ifdef CONFIG_PROC_FS + struct proc_dir_entry *p; +#endif init_timer(&ip6_fl_gc_timer); ip6_fl_gc_timer.function = ip6_fl_gc; #ifdef CONFIG_PROC_FS - create_proc_read_entry("net/ip6_flowlabel", 0, 0, ip6_fl_read_proc, NULL); + p = create_proc_entry("ip6_flowlabel", S_IRUGO, proc_net); + if (p) + p->proc_fops = &ip6fl_seq_fops; #endif } @@ -621,6 +688,6 @@ { del_timer(&ip6_fl_gc_timer); #ifdef CONFIG_PROC_FS - remove_proc_entry("net/ip6_flowlabel", 0); + proc_net_remove("ip6_flowlabel"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From greearb@candelatech.com Sat Jun 28 11:41:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 11:41:17 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SIf42x014468 for ; Sat, 28 Jun 2003 11:41:05 -0700 Received: from candelatech.com (evrtwa1-ar2-4-35-051-050.evrtwa1.dsl-verizon.net [4.35.51.50]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5SIetJa010690; Sat, 28 Jun 2003 11:40:56 -0700 Message-ID: <3EFDE0BC.8040803@candelatech.com> Date: Sat, 28 Jun 2003 11:38:52 -0700 From: Ben Greear Organization: Candela Technologies Inc User-Agent: Mozilla/5.0 
(X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Anastasov , netdev@oss.sgi.com Subject: Re: routing bug report for 2.4 References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3626 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Julian Anastasov wrote: > Hello, > > I'll try to reply to some of your posts... > > On Fri, 27 Jun 2003, Ben Greear wrote: > > >>(This has been discussed with Alexey, but sending to the list for >>general consumption). > > > I remember your previous posts, I assume they were skipped > because you do not use properly the routing system, you have to > use preferred sources in your routes. Now I'm not sure if the case > is same. > > >>Here is how to reproduce this: >>ifconfig eth1 192.1.1.2 netmask 255.255.255.0 >>ifconfig eth2 192.1.2.2 netmask 255.255.255.0 >> >> >>Set up policy based routing with the 'ip' tool to make packets with >>source-address of each interface to use the gateway for that interface. >> Set gateway for eth1 to be 192.1.1.1 >> Set gateway for eth2 to be 192.1.2.1 > > > But not all packets, may be you have to place the source-based > rules after table main. This is the recommeneded way. My test works if I ping the 192.1.2.1 router from the eth1 interface, the issue is that the localness of eth2 over-rides the policy based routing. Also, note that it does work when I BINDTODEVICE on eth1. I had assumed that because I was setting the source IP, and had a specific routing table for that case, then it would use that routing table. In the error case, it is at least partially ignoring that routing table, though not entirely: It is trying to communicate on eth1, but it is arping instead of routing. 
> > >>Now, use ping to try to send pkts from one interface to the other: >> >>ping -I 192.1.1.2 192.1.2.2 > > > Your report is damn wrong, why do you ping local IP? > Or may be that is your test? Trying ping from ip-utils... sorry, > not reproducible here (I hope it is the expected result). What results do you get? And did you set up policy based routing? I tried ping with RH8, RH9, and downloaded the latest ip-utils I could find. Only when I hacked the ping source to bind to the local IP AND bind specifically to the device did it work. I am trying to ping a local IP but over the external network. It is not something most people try to do now, I am aware. As well as my twisted reasons, it would be good for determining path failures in an HA setup, so it's not completely useless :) > > >>You will see arps on eth1 for 192.1.2.2, whereas you should see packets >>being sent to the default gateway for eth1. > > > Why? 192.1.2.2 is local IP and the local table is first > priority. We should not see any ARP packets for local targets, right? Local table is not used in my case because I specifically bind to the sending IP and have a table specifically for that case. > > >>If you modify the ping source to BINDTODEVICE eth1, then it will send >>correctly. I am under the impression that you should not have to specifically >>BINDTODEVICE in this case since the policy based routing should take care of >>routing things correctly. Or, maybe, the real bug is in ping in that it did >>not BINDTODEVICE? > > > Do you really ping local IP? Yes. > > >>Also, ping -I eth1 192.1.2.2 will fail to route externally. That may >>just be a feature of ping: I'm unsure what the subtle difference is *supposed* >>to be between using -I eth1 and -I 1.2.3.4 > > > I think, the root of your problems is that you specify > 'ping -I device' and the routing is forced to construct result from > unknown route by using source address autoselection. 
I am open to suggestions as to other ways to make this work: I want to ping from eth1 to eth2, and have at least the echo-request go out over eth1 and be routed back to eth2. > > From previous post: > > >># The other interface on the router machine (same machine as I just pinged above) >>[root@localhost root]# ping -I eth1 10.3.2.1 >>PING 10.3.2.1 (10.3.2.1) from 10.3.1.4 eth1: 56(84) bytes of data. >> From 10.3.1.4 icmp_seq=1 Destination Host Unreachable >> From 10.3.1.4 icmp_seq=3 Destination Host Unreachable >> >># It is NOT using the default gateway for this traffic, but is instead >># just trying to ARP. >>[root@localhost root]# tcpdump -n -i eth1 >>tcpdump: listening on eth1 >>11:56:19.788336 arp who-has 10.3.2.1 tell 10.3.1.4 >>11:56:20.788134 arp who-has 10.3.2.1 tell 10.3.1.4 >>11:56:21.788149 arp who-has 10.3.2.1 tell 10.3.1.4 >>11:56:22.788379 arp who-has 10.3.2.1 tell 10.3.1.4 > > > '-I eth1 10.3.2.1' requests route > "from 0.0.0.0 to 10.3.2.1 oif eth1". You do not have such routes. > I assume the result is (quoting route.c): > "Apparently, routing tables are wrong." > "Assume, that the destination is on link." > > For your setup I would say "The request is wrong". You see that > the kernel even do not check whether eth1 is UP. You are lucky. > > Then the kernel autoselects 10.3.1.4 as src for the forced eth1 device. > Thus, you see this ARP probe. Later, it seems 10.3.2.1 does not > want to reply to 10.3.1.4, I assume this is a known problem? > > As for ping from iputils: you can specify device or saddr, > not the both, so the only valid test for source based routing can > be '-I IP'. Do you really need '-I eth1' ? Actually, from the code I looked at, you can use two -I flags, but what appears to be a bug actually keeps it from working completely (I could find no combo of arguments to make it make the BINDTODEVICE call.) During some of my earlier testing, I had various things wrong. 
For instance, I noticed that if I had policy-based routing on my router, it would not work correctly. I have not debugged that issue in depth, as it does not really hinder the functionality that I require. If it still doesn't work in 2.6 I'll open a bug ;) One final note, I am running a kernel with a patch that allows external comm over two interfaces on the same machine on the same subnet (with policy based routing). The normal ping works in this case, btw. So, it may be that even if you change ping, it may still not work for you (my patch mostly deals with getting local ARPs to answer correctly, so I am not sure it comes into play in the routed case.) Ben > > Regards > > -- > Julian Anastasov > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From alan@lxorguk.ukuu.org.uk Sat Jun 28 12:22:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 12:22:46 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SJMa2x015129 for ; Sat, 28 Jun 2003 12:22:37 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5SJJZKd006517; Sat, 28 Jun 2003 20:19:36 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5SJJXho006515; Sat, 28 Jun 2003 20:19:33 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S. 
Miller" Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030627.172123.78713883.davem@redhat.com> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> <20030627.172123.78713883.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jun 2003 20:19:32 +0100 X-archive-position: 3627 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sad, 2003-06-28 at 01:21, David S. Miller wrote: > From: Alan Cox > Date: 28 Jun 2003 00:08:56 +0100 > > Tried doing an SQL query or text analysis for similarities on random > messages lurking in private mailboxes > > I respond to private reports with "please send this to the lists, > what if I were on vacation for the next month?" I never actually > process or analyze such reports. Which means you miss stuff. Here is an example my tools found yesterday 18 months ago someone with a specific printer reported doing network printing to it crashed the kernel. Lost in the noise, filed in bugzilla, categorised mentally at the time as "weird". Not long ago a second identical report popped up. Different setup, same network printing, similar "it reboots" report. So now I've gone chasing tcpdumps from these. 
It's a *different* thing to the kind of patch management you are doing, but it's only possible because of tools like bugzilla From alan@lxorguk.ukuu.org.uk Sat Jun 28 12:24:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 12:24:03 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SJNw2x015329 for ; Sat, 28 Jun 2003 12:23:59 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5SJKtKd006541; Sat, 28 Jun 2003 20:20:56 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5SJKrEM006539; Sat, 28 Jun 2003 20:20:53 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "Martin J. Bligh" Cc: Larry McVoy , "David S. Miller" , greearb@candelatech.com, davidel@xmailserver.org, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <34700000.1056760028@[10.10.2.4]> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> <20030628001954.GD18676@work.bitmover.com> <34700000.1056760028@[10.10.2.4]> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056828052.6295.31.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jun 2003 20:20:53 +0100 X-archive-position: 3628 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sad, 2003-06-28 at 01:27, Martin J. Bligh wrote: > That's a trivial change to make if you want it. 
we just add a "reviewed" > / "certified" state between "new" and "assigned". Yes, might be a good > idea. I'm not actually that convinced that "assigned" is overly useful > in the context of open-source, but that's a separate discussion. Most bugzilla's seem to use VERIFIED for this, and it means people who have better things to do can just pull bugs that are verified and/or tagged with "patch" in the attachments From alan@lxorguk.ukuu.org.uk Sat Jun 28 12:30:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 12:30:11 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SJTw2x015763 for ; Sat, 28 Jun 2003 12:29:59 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5SJQrKd006586; Sat, 28 Jun 2003 20:26:53 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5SJQoMj006584; Sat, 28 Jun 2003 20:26:50 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: Larry McVoy Cc: Ben Collins , Andrew Morton , davidel@xmailserver.org, davem@redhat.com, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030628003218.GE18676@work.bitmover.com> References: <20030626.224739.88478624.davem@redhat.com> <21740000.1056724453@[10.10.2.4]> <20030627.143738.41641928.davem@redhat.com> <20030627213153.GR501@phunnypharm.org> <20030627162527.714091ce.akpm@digeo.com> <20030627223024.GT501@phunnypharm.org> <20030628003218.GE18676@work.bitmover.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056828409.6289.39.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jun 2003 
20:26:50 +0100 X-archive-position: 3629 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sad, 2003-06-28 at 01:32, Larry McVoy wrote: > Is there any interest in having us mirror the bugzilla DB and work on > making an interface that works for people with different needs? I had > already assumed that I'd get hissed out of the room if I proposed this > so feel free to say no if that's what you want. I already pull chunks of bugzilla data into plain text for processing, the more formats and tools the better. You can do a lot of great things with large sets of bugzilla data when you throw it at a text indexing engine for example. One of my todo list items is to learn enough emacs to play with feeding bugzilla data into the remembrance agent and seeing what happens From nf@hipac.org Sat Jun 28 13:05:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 13:05:21 -0700 (PDT) Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de [62.67.200.161] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SK5D2x016326 for ; Sat, 28 Jun 2003 13:05:14 -0700 Received: (qmail 4721 invoked from network); 28 Jun 2003 20:05:12 -0000 Received: from unknown (HELO portal.lan) (134300@[80.138.232.235]) (envelope-sender ) by smtprelay02.ispgateway.de (qmail-ldap-1.03) with SMTP for ; 28 Jun 2003 20:05:12 -0000 Received: from hipac.org (tmobile.lan [192.168.0.6]) by portal.lan (Postfix) with ESMTP id 33B9A4B060; Sat, 28 Jun 2003 20:39:32 +0200 (CEST) Message-ID: <3EFDF4DA.80201@hipac.org> Date: Sat, 28 Jun 2003 22:04:42 +0200 From: Michael Bellion and Thomas Heinz User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1 X-Accept-Language: de, en MIME-Version: 1.0 To: Pekka Savola Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [ANNOUNCE] nf-hipac v0.8 released 
References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3630 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nf@hipac.org Precedence: bulk X-list: netdev Hi Pekka You wrote: > Looks interesting. Is there experience about this in bridging firewall > scenarios? (With or without external patchset's like > http://ebtables.sourceforge.net/) Sorry for this answer being so late but we wanted to check whether nf-hipac works with the ebtables patch first in order to give you a definite answer. We tried on a sparc64 which was a bad decision because the ebtables patch does not work on sparc64 systems. We are going to test the stuff tomorrow on an i386 and tell you the results afterwards. In principle, nf-hipac should work properly with the bridge patch. We expect it to work just like iptables apart from the fact that you cannot match on bridge ports. The iptables in/out interface match in 2.4 matches if either the in/out dev _or_ the in/out physdev matches. The nf-hipac in/out interface match matches solely on in/out dev. > Further, you mention the performance reasons for this approach. I would > be very interested to see some figures. We have done some performance tests with an older release of nf-hipac. The results are available on http://www.hipac.org/ Apart from that Roberto Nibali did some preliminary testing on nf-hipac. You can find his posting to linux-kernel here: http://marc.theaimsgroup.com/?l=linux-kernel&m=103358029605079&w=2 Since there are currently no performance tests available for the new release we want to encourage people interested in firewall performance evaluation to include nf-hipac in their tests. 
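The difference between the two interface matches described above can be sketched as two predicates. This is an illustrative model only, not code taken from iptables or nf-hipac: `struct pkt_ifaces`, `name_eq`, `iptables_style_in_match` and `hipac_style_in_match` are hypothetical names, and only the input side is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the interface names a bridged packet arrived on. */
struct pkt_ifaces {
    const char *in_dev;      /* logical device, e.g. the bridge "br0" */
    const char *in_physdev;  /* bridge port, e.g. "eth0"; NULL if not bridged */
};

/* Compare two interface names, treating NULL as "no such interface". */
static int name_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Behaviour described for 2.4 iptables with the bridge patch:
 * the rule matches if either the device or the physical device matches. */
static int iptables_style_in_match(const struct pkt_ifaces *p, const char *rule_if)
{
    assert(p != NULL);
    return name_eq(p->in_dev, rule_if) || name_eq(p->in_physdev, rule_if);
}

/* Behaviour described for nf-hipac: only the device is compared,
 * so a rule naming a bridge port never matches. */
static int hipac_style_in_match(const struct pkt_ifaces *p, const char *rule_if)
{
    assert(p != NULL);
    return name_eq(p->in_dev, rule_if);
}
```

Under this model a rule written against a bridge port such as "eth0" matches in the first predicate but not in the second, which is the "cannot match on bridge ports" limitation mentioned above.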
Regards, +-----------------------+----------------------+ | Michael Bellion | Thomas Heinz | | | | +-----------------------+----------------------+ From ja@ssi.bg Sat Jun 28 13:16:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 13:16:49 -0700 (PDT) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SKGK2x016807 for ; Sat, 28 Jun 2003 13:16:27 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5SKClk03376; Sat, 28 Jun 2003 23:12:47 +0300 Date: Sat, 28 Jun 2003 23:12:47 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Ben Greear cc: netdev@oss.sgi.com Subject: Re: routing bug report for 2.4 In-Reply-To: <3EFDE0BC.8040803@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3631 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Sat, 28 Jun 2003, Ben Greear wrote: > My test works if I ping the 192.1.2.1 router from the eth1 interface, the issue > is that the localness of eth2 over-rides the policy based routing. ok but be ready for problems if rp_filter is used > Also, note that it does work when I BINDTODEVICE on eth1. I had assumed that > because I was setting the source IP, and had a specific routing table for > that case, then it would use that routing table. In the error case, it is > at least partially ignoring that routing table, though not entirely: It is > trying to communicate on eth1, but it is arping instead of routing. It is arping because '-I device' does not hit your 'from local_IP => 0/0 via remote_GW' route, the kernel can not find route "from 0 to remote_IP oif dev". If you specify '-I local_IP' then it will hit the 'from local_IP' rule that points to your table. 
See, "Assume, that the destination is on link", it is not gatewayed as you expect. Thus, the ARP probe is resolving target, not the GW. BINDTODEVICE translated to routing request is "oif XXX". As ping can do -I device (and can not specify saddr at the same time) the result is that the device is used (unless target is local), saddr is autoselected (there is no provided saddr) starting from the -I device, there is no GW (the target becomes gw, route is forced onlink), the packet reaches the neighbouring code where ARP sends probe to target (not to GW). > >>Now, use ping to try to send pkts from one interface to the other: > >> > >>ping -I 192.1.1.2 192.1.2.2 > > > > > > Your report is damn wrong, why do you ping local IP? > > Or may be that is your test? Trying ping from ip-utils... sorry, > > not reproducible here (I hope it is the expected result). > > What results do you get? And did you set up policy based routing? Yes, I have tried to simulate your rules and routes but not exactly. In any case, I can not generate ARP traffic when pinging local IP no matter what device I use. The kernel normally overrides the -I option if you talk to local IP, lo is used. It is expected with the plain kernel. > I tried ping with RH8, RH9, and downloaded the latest ip-utils I could > find. Only when I hacked the ping source to bind to the local IP AND bind > specifically to the device did it work. Yes, that will hit the ip rule and will avoid the "lo" cancellation for your patched kernel. > I am trying to ping a local IP but over the external network. It is not something > most people try to do now, I am aware. As well as my twisted reasons, it would > be good for determining path failures in an HA setup, so it's not completely > useless :) I now see that you have patched kernel and this is the reason I can not fully understand your previous postings. The normal kernel can not generate such strange results (I mean the ARP requests when resolving local IP). 
All your problems do not show kernel bug yet, it seems the problem is hidden in your strategy to support remote local IPs. Or may be you do not have problems with your tests but the plain kernel is suspect for ping insanity? > > Why? 192.1.2.2 is local IP and the local table is first > > priority. We should not see any ARP packets for local targets, right? > > Local table is not used in my case because I specifically bind to the sending IP > and have a table specifically for that case. Not true with the normal kernel, may be your patches avoid selecting dev lo for traffic to local IPs if oif is specified? > > I think, the root of your problems is that you specify > > 'ping -I device' and the routing is forced to construct result from > > unknown route by using source address autoselection. > > I am open to suggestions as to other ways to make this work: I want to ping from eth1 > to eth2, and have at least the echo-request go out over eth1 and be routed back to eth2. I see, this is another problem because you do not mention in your posts that you have patched kernel. > > As for ping from iputils: you can specify device or saddr, > > not the both, so the only valid test for source based routing can > > be '-I IP'. Do you really need '-I eth1' ? > > Actually, from the code I looked at, you can use two -I flags, but what appears > to be a bug actually keeps it from working completely (I could find no combo of arguments > to make it make the BINDTODEVICE call.) I do not see such -I behaviour in ping. I understand that the only way to really avoid the "lo" cancellation and to send traffic with daddr=local_IP is to patch the routing to keep the original device and always to BINDTODEVICE for this reason (-I dev). > During some of my earlier testing, I had various things wrong. For instance, I > noticed that if I had policy-based routing on my router, it would not work correctly. missing preferred sources in routes? 
> I have not debugged that issue in depth, as it does not really hinder the functionality > that I require. If it still doesn't work in 2.6 I'll open a bug ;) > > One final note, I am running a kernel with a patch that allows external comm over > two interfaces on the same machine on the same subnet (with policy based routing). > The normal ping works in this case, btw. So, it may be that even if > you change ping, it may still not work for you (my patch mostly deals with getting > local ARPs to answer correctly, so I am not sure it comes into play in the routed case.) If you still suspect the kernel may be you can show me fresh link for this patch because I'm not sure it is valid or at least does not break the things. But adding 'I local_IP' together with "-I device" should avoid the wrong ARP probe "where is TARGET", it should be changed to "where is GW". So, IMO, you need to make sure in your tests that: - you have patched ping to support -I device and -I local_IP together - you have preferred source in all your routes Do you still suspect the kernel? 
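The lookup behaviour explained in this message (which address ends up as the ARP target) can be modelled with a toy sketch. It is a deliberately simplified model under loud assumptions, not the kernel's routing code: `struct rule`, `struct request`, `arp_target` and the `IP4` macro are hypothetical, each policy rule is reduced to "from <addr>, default via <gw>", and the local and main tables are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a dotted quad into a host-order integer. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical policy rule: "from <from> lookup a table whose
 * default route goes via <gw>". */
struct rule {
    uint32_t from;
    uint32_t gw;
};

/* Hypothetical routing request; saddr 0 means "not specified",
 * oif 0 means "no forced output device". */
struct request {
    uint32_t saddr;
    uint32_t daddr;
    int oif;
};

/* Return the address the neighbour code would try to resolve,
 * or 0 if the model finds no route at all. */
static uint32_t arp_target(const struct rule *rules, size_t n,
                           const struct request *rq)
{
    size_t i;

    assert(rq != NULL);

    /* A "from <addr>" rule can only be hit when a source is given. */
    for (i = 0; i < n; i++)
        if (rq->saddr != 0 && rules[i].from == rq->saddr)
            return rules[i].gw;      /* gatewayed: resolve the GW */

    /* No rule hit but a device was forced: assume the destination
     * is on link, so the target itself gets resolved. */
    if (rq->oif != 0)
        return rq->daddr;

    return 0;
}
```

In this model a request of the '-I device' kind (no source, forced oif) falls through to the on-link assumption and resolves the target, while a request of the '-I local_IP' kind hits the 'from' rule and resolves the gateway instead.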
> Ben Regards -- Julian Anastasov From ja@ssi.bg Sat Jun 28 13:34:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 13:35:00 -0700 (PDT) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SKYj2x017177 for ; Sat, 28 Jun 2003 13:34:52 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5SKcBk03478; Sat, 28 Jun 2003 23:38:11 +0300 Date: Sat, 28 Jun 2003 23:38:11 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Ben Greear cc: netdev@oss.sgi.com Subject: Re: routing bug report for 2.4 In-Reply-To: <3EFDE0BC.8040803@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3632 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Sat, 28 Jun 2003, Ben Greear wrote: > What results do you get? And did you set up policy based routing? I now see, the kernel sends "who-has local_IP" when you use 'ping -I device local_IP'. If this is considered bad we can extend the checks when fib_lookup fails: - check for UP state (is it needed? return ENETDOWN?) 
- check if target IP is local and select "lo" instead of oif Regards -- Julian Anastasov From davem@redhat.com Sat Jun 28 15:09:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 15:10:05 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SM9v2x018451 for ; Sat, 28 Jun 2003 15:09:57 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA10749; Sat, 28 Jun 2003 15:03:28 -0700 Date: Sat, 28 Jun 2003 15:03:28 -0700 (PDT) Message-Id: <20030628.150328.74739742.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> References: <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> <20030627.172123.78713883.davem@redhat.com> <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3633 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 28 Jun 2003 20:19:32 +0100 On Sad, 2003-06-28 at 01:21, David S. Miller wrote: > I respond to private reports with "please send this to the lists, > what if I were on vacation for the next month?" I never actually > process or analyze such reports. Which means you miss stuff. Not my problem Alan. If the user gives a crap about their report mattering, they'll do what I ask them to do. 
If users send their report to the wrong place, it will get lost, just like if they cat their report into /dev/null. I have no reason to feel bad about the information getting lost. If it's too much for them to do as I ask, it's too much for me to consider their report. Bug reporting, just like patch submission, is a 2 way street. From greearb@candelatech.com Sat Jun 28 15:16:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 15:16:12 -0700 (PDT) Received: from grok.yi.org (dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SMG42x018805 for ; Sat, 28 Jun 2003 15:16:05 -0700 Received: from candelatech.com (evrtwa1-ar2-4-35-051-050.evrtwa1.dsl-verizon.net [4.35.51.50]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5SMFtJa005351; Sat, 28 Jun 2003 15:15:57 -0700 Message-ID: <3EFE131E.1080807@candelatech.com> Date: Sat, 28 Jun 2003 15:13:50 -0700 From: Ben Greear Organization: Candela Technologies Inc User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Anastasov CC: netdev@oss.sgi.com Subject: Re: routing bug report for 2.4 References: In-Reply-To: Content-Type: multipart/mixed; boundary="------------010105070505030601090008" X-archive-position: 3634 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------010105070505030601090008 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Julian Anastasov wrote: > Hello, > > On Sat, 28 Jun 2003, Ben Greear wrote: > > >>What results do you get? And did you set up policy based routing? > > > I now see, the kernel sends "who-has local_IP" when you > use 'ping -I device local_IP'.
If this is considered bad we can extend
> the checks when fib_lookup fails:
>
> - check for UP state (is it needed? return ENETDOWN?)
> - check if target IP is local and select "lo" instead of oif

Well, why should it try to route locally in this case (I'm assuming that by using 'lo' it will not try to send on the external link)

Why not instead make it send to the router for that source-ip, if it is configured. If it is not configured, then I think arping is the best that can be expected, as the behaviour becomes quite undefined and we really have 'no route to host'.

My send-to-self patch that I have been using is attached. I also have some other patches for mac-vlans and packet-gen applied, but I don't believe these will have any impact on the behaviour we have been discussing.

There is example code on how to use it (and an original, more crufty patch) here: http://lwn.net/Articles/9897/

Thanks,
Ben

--
Ben Greear
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

--------------010105070505030601090008
Content-Type: text/plain; name="sts.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="sts.diff"

--- linux-2.4.20/include/linux/sockios.h 2001-11-07 14:39:36.000000000 -0800
+++ linux-2.4.20.c3/include/linux/sockios.h 2003-03-18 14:32:53.000000000 -0800
@@ -65,6 +65,8 @@
 #define SIOCDIFADDR 0x8936 /* delete PA address */
 #define SIOCSIFHWBROADCAST 0x8937 /* set hardware broadcast addr */
 #define SIOCGIFCOUNT 0x8938 /* get number of devices */
+#define SIOCGIFWEIGHT 0x8939 /* get weight of device, in stones */
+#define SIOCSIFWEIGHT 0x893a /* set weight of device, in stones */
 
 #define SIOCGIFBR 0x8940 /* Bridging support */
 #define SIOCSIFBR 0x8941 /* Set bridging options */
@@ -92,6 +94,10 @@
 #define SIOCGRARP 0x8961 /* get RARP table entry */
 #define SIOCSRARP 0x8962 /* set RARP table entry */
 
+/* MAC address based VLAN control calls */
+#define SIOCGIFMACVLAN 0x8965 /* Mac address multiplex/demultiplex support */
+#define SIOCSIFMACVLAN 0x8966 /* Set macvlan options */
+
 /* Driver configuration calls */
 
 #define SIOCGIFMAP 0x8970 /* Get device parameters */
@@ -114,6 +120,16 @@
 #define SIOCBONDINFOQUERY 0x8994 /* rtn info about bond state */
 #define SIOCBONDCHANGEACTIVE 0x8995 /* update to a new active slave */
 
+
+/* Ben's little hack land */
+#define SIOCSACCEPTLOCALADDRS 0x89a0 /* Allow interfaces to accept pkts from
+                                      * local interfaces...use with SO_BINDTODEVICE
+                                      */
+#define SIOCGACCEPTLOCALADDRS 0x89a1 /* Allow interfaces to accept pkts from
+                                      * local interfaces...use with SO_BINDTODEVICE
+                                      */
+
+
 /* Device private ioctl calls */
 
 /*
--- linux-2.4.20/net/Config.in 2002-08-02 17:39:46.000000000 -0700
+++ linux-2.4.20.c3/net/Config.in 2003-03-18 14:32:53.000000000 -0800
@@ -48,6 +48,7 @@
          bool ' Per-VC IP filter kludge' CONFIG_ATM_BR2684_IPFILTER
       fi
    fi
+   tristate 'MAC address based VLANs (EXPERIMENTAL)' CONFIG_MACVLAN
 fi
 
 tristate '802.1Q VLAN Support' CONFIG_VLAN_8021Q
--- linux-2.4.20/net/ipv4/arp.c 2002-11-28 15:53:15.000000000 -0800
+++ linux-2.4.20.c3/net/ipv4/arp.c 2003-03-18 14:32:53.000000000 -0800
@@ -1,4 +1,4 @@
-/* linux/net/inet/arp.c
+/* linux/net/inet/arp.c -*-linux-c-*-
  *
  * Version: $Id: arp.c,v 1.99 2001/08/30 22:55:42 davem Exp $
  *
@@ -351,12 +351,22 @@
         int flag = 0;
         /*unsigned long now; */
 
-        if (ip_route_output(&rt, sip, tip, 0, 0) < 0)
+        if (ip_route_output(&rt, sip, tip, 0, 0) < 0)
                 return 1;
-        if (rt->u.dst.dev != dev) {
-                NET_INC_STATS_BH(ArpFilter);
-                flag = 1;
-        }
+
+        if (rt->u.dst.dev != dev) {
+                if ((dev->priv_flags & IFF_ACCEPT_LOCAL_ADDRS) &&
+                    (rt->u.dst.dev == &loopback_dev)) {
+                        /* OK, we'll let this special case slide, so that we can arp from one
+                         * local interface to another. This seems to work, but could use some
+                         * review. --Ben
+                         */
+                }
+                else {
+                        NET_INC_STATS_BH(ArpFilter);
+                        flag = 1;
+                }
+        }
         ip_rt_put(rt);
         return flag;
 }
--- linux-2.4.20/net/ipv4/fib_frontend.c 2002-08-02 17:39:46.000000000 -0700
+++ linux-2.4.20.c3/net/ipv4/fib_frontend.c 2003-03-18 14:32:53.000000000 -0800
@@ -233,8 +233,17 @@
 
         if (fib_lookup(&key, &res))
                 goto last_resort;
-        if (res.type != RTN_UNICAST)
-                goto e_inval_res;
+
+        if (res.type != RTN_UNICAST) {
+                if ((res.type == RTN_LOCAL) &&
+                    (dev->priv_flags & IFF_ACCEPT_LOCAL_ADDRS)) {
+                        /* All is OK */
+                }
+                else {
+                        goto e_inval_res;
+                }
+        }
+
         *spec_dst = FIB_RES_PREFSRC(res);
         fib_combine_itag(itag, &res);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
--- linux-2.4.20/net/ipv4/tcp_ipv4.c 2002-11-28 15:53:15.000000000 -0800
+++ linux-2.4.20.c3/net/ipv4/tcp_ipv4.c 2003-03-18 14:32:53.000000000 -0800
@@ -1394,7 +1394,7 @@
 #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
 #endif
 
-        /* Never answer to SYNs send to broadcast or multicast */
+        /* Never answer to SYNs sent to broadcast or multicast */
         if (((struct rtable *)skb->dst)->rt_flags &
             (RTCF_BROADCAST|RTCF_MULTICAST))
                 goto drop;

--------------010105070505030601090008--

From john@grabjohn.com Sat Jun 28 15:29:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 15:29:16 -0700 (PDT) Received: from 81-2-122-30.bradfords.org.uk (81-2-122-30.bradfords.org.uk [81.2.122.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SMT92x019147 for ; Sat, 28 Jun 2003 15:29:13 -0700 Received: from 81-2-122-30.bradfords.org.uk (localhost [127.0.0.1]) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9) with ESMTP id h5SMbNQT000444; Sat, 28 Jun 2003 23:37:23 +0100 Received: (from john@localhost) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9/Submit) id h5SMbMEm000443; Sat, 28 Jun 2003 23:37:22 +0100 Date: Sat, 28 Jun 2003 23:37:22 +0100 From: John Bradford Message-Id: <200306282237.h5SMbMEm000443@81-2-122-30.bradfords.org.uk> To: alan@lxorguk.ukuu.org.uk, davem@redhat.com Subject: Re: networking bugs and
bugme.osdl.org Cc: davidel@xmailserver.org, greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, mbligh@aracnet.com, netdev@oss.sgi.com X-archive-position: 3635 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john@grabjohn.com Precedence: bulk X-list: netdev

> If users send their report to the wrong place, it will get lost,
> just like if they cat their report into /dev/null. I have no reason
> to feel bad about the information getting lost.

Also, remember that we sometimes get no response when something is fixed, which is especially true when the fix happens by itself.

E.G.

2.5.foo released
Bug reported to LKML, and nobody responds.
2.5.bar released
Bug re-reported to LKML, still nobody answers, maybe it's not a very detailed bug report, or everybody is too busy.
2.5.baz released
No bug report.

We have so far been assuming in this discussion that 2.5.baz won't have fixed the bug. It's not entirely impossible that 2.5.baz _will_ have fixed the bug - maybe a subsystem was being overhauled anyway, and it was generally known on the list that the bug existed.

By not letting bug reports expire, we'd have a lot of unclosed bugs that were really fixed.

There is an analogy with TCP:

Compare:

SYN -->
<-- ACK
DATA -->
FIN -->

and

SYN -->
<-- ACK
DATA -->

with:

Bug report -->
Bug report -->
<-- Please test this patch
Follow up bug report -->
<-- Please test this patch
Follow up bug report -->
<-- Please test this patch
OK, thanks, it works -->
<-- Glad it worked

and

Bug report -->
Bug report -->
<-- Please test this patch
Follow up bug report -->
<-- Please test this patch
<-- Please test this patch
<-- Please test this patch

> If it's too much for them to do as I ask, it's too much for
> me to consider their report.
>
> Bug reporting, just like patch submission, is a 2 way street.
It's not even a case of effort, more that you need 2 way communication to successfully fix a bug. You need to know that the fix worked initially, continues to work, and that it doesn't break anything else, otherwise you might be adding more bugs. John. From frank@google.com Sat Jun 28 15:51:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 15:51:35 -0700 (PDT) Received: from 216-239-45-4.google.com (216-239-45-4.google.com [216.239.45.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SMpS2x019555 for ; Sat, 28 Jun 2003 15:51:29 -0700 Received: from moma.corp.google.com (moma.corp.google.com [10.3.0.12]) by 216-239-45-4.google.com (8.12.9/8.12.6) with ESMTP id h5SMpNXM016741; Sat, 28 Jun 2003 15:51:23 -0700 Received: from vger.corp.google.com (vger.corp.google.com [10.32.60.132]) by moma.corp.google.com (8.12.9/8.12.3) with ESMTP id h5SMpNff031331; Sat, 28 Jun 2003 15:51:23 -0700 Received: (from frank@localhost) by vger.corp.google.com (8.10.2/8.10.2) id h5SMpJu15498; Sat, 28 Jun 2003 15:51:19 -0700 Date: Sat, 28 Jun 2003 15:51:19 -0700 From: Frank Cusack To: Jamal Hadi Cc: James Carlson , "David S. 
Miller" , rusty@rustcorp.com.au, paulus@samba.org, netdev@oss.sgi.com, fcusack@samba.org Subject: Re: [PATCH, untested] Support for PPPOE on SMP Message-ID: <20030628155119.A15491@google.com> References: <20030625.143334.85380461.davem@redhat.com> <20030626035824.D68B62C147@lists.samba.org> <20030625.205941.41631020.davem@redhat.com> <16122.53298.150512.793074@h006008986325.ne.client2.attbi.com> <20030626190407.S87648@shell.cyberus.ca> <16124.11495.374998.153330@h006008986325.ne.client2.attbi.com> <20030627213846.V90398@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030627213846.V90398@shell.cyberus.ca>; from hadi@shell.cyberus.ca on Fri, Jun 27, 2003 at 10:21:21PM -0400 X-archive-position: 3636 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fcusack@fcusack.com Precedence: bulk X-list: netdev On Fri, Jun 27, 2003 at 10:21:21PM -0400, Jamal Hadi wrote: > On Fri, 27 Jun 2003, James Carlson wrote: > > > > Loss preserves ordering. To get misordering, you have to > > intentionally hold onto a message and reinsert it later. What I've > > And thats what i was implying. > In your above example: > > 1 2 4 5 6 > If the entity above the wire cared about packet 3 there will be a > retransmit. so it becomes: > > 1 2 4 5 6 3 Higher layer entities doing retransmits is not reordering. To the lower layer, it's just the next message. 
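A minimal sketch of the distinction drawn above, assuming each layer stamps its own sequence numbers on what it transmits. The names `seq_verdict` and `classify_arrivals` are hypothetical and are not taken from any PPP or PPPoE code, and sequence wraparound is deliberately ignored. The helper looks only at the order in which one layer's own numbers arrive: a gap means loss, a number that goes backwards means reordering.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a receiver can conclude from the arrival order of the sequence
 * numbers assigned by its own layer. Illustrative only. */
enum seq_verdict { SEQ_IN_ORDER, SEQ_LOSS_ONLY, SEQ_REORDERED };

/* seq[] holds the sequence numbers in order of arrival.
 * Assumption: numbers start small and never wrap. */
enum seq_verdict classify_arrivals(const unsigned *seq, size_t n)
{
    enum seq_verdict verdict = SEQ_IN_ORDER;

    for (size_t i = 1; i < n; i++) {
        if (seq[i] <= seq[i - 1])
            return SEQ_REORDERED;      /* an older frame arrived late */
        if (seq[i] != seq[i - 1] + 1)
            verdict = SEQ_LOSS_ONLY;   /* a gap: something was dropped */
    }
    return verdict;
}
```

Under that assumption, a retransmission of payload 3 by a higher layer travels as a fresh lower-layer frame (say number 7), so the lower layer sees arrivals 1 2 4 5 6 7 and the helper reports loss only. Only the higher layer's own numbering, 1 2 4 5 6 3, looks out of order, and that layer asked for it.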
/fc From sba@lodur.srl.caltech.edu Sat Jun 28 16:17:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 16:18:02 -0700 (PDT) Received: from lodur.srl.caltech.edu (lodur.srl.caltech.edu [131.215.120.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SNHv2x020018 for ; Sat, 28 Jun 2003 16:17:57 -0700 Received: from jelly.caltech.edu (jelly.ligo.caltech.edu [131.215.115.246]) by lodur.srl.caltech.edu (8.12.9/8.12.9) with ESMTP id h5SNHu41002449; Sat, 28 Jun 2003 16:17:56 -0700 (PDT) Received: (from sba@localhost) by jelly.caltech.edu (8.8.8+Sun/8.8.8) id QAA05803; Sat, 28 Jun 2003 16:17:56 -0700 (PDT) From: Stuart Anderson Message-Id: <200306282317.QAA05803@jelly.caltech.edu> Subject: tg3 lockup under load on SMP kernel To: linux-net@vger.kernel.org, netdev@oss.sgi.com Date: Sat, 28 Jun 2003 16:17:56 -0700 (PDT) X-Mailer: ELM [version 2.4ME+ PL89 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII X-archive-position: 3637 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sba@srl.caltech.edu Precedence: bulk X-list: netdev I am having a problem with the tg3 driver locking up dual-xeon machines when put under heavy load if and only if I am running an SMP kernel. No problems with tg3+UP or bcm5700+SMP or with a SysKonnect card under load with either UP or SMP kernel. The machines are: dual 2.2GHz P4 Xeon/2GB RAM/Super Micro P4DL6 MOBO running RedHat7.3 plus whatever kernel is needed. Besides the on-board Broadcom BCM5701 10/100/1000 copper GigE there is one SysKonnect9843 fiber GigE PCI card and 2 3Ware IDE-RAID cards per machine. I first ran into this with RH2.4.18, and reproduced the problem with vanilla 2.4.20 and 2.4.20 plus tg3 1.4c patch at that time. It is now a more important issue and I am seeing it with RH smp-2.4.20-13.7 but not the uni-processor 2.4.20-13.7--both of which use the tg3 1.5 driver.
Any suggestions on how to get tg3 running stably under SMP kernel on a BCM5701 interface would be greatly appreciated. Here is the output of lspci -v. # lspci -v 00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13) Flags: fast devsel 00:00.1 Host bridge: ServerWorks: Unknown device 0012 Flags: fast devsel 00:00.2 Host bridge: ServerWorks: Unknown device 0000 Flags: fast devsel 00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA]) Subsystem: ATI Technologies Inc Rage XL Flags: bus master, stepping, medium devsel, latency 64, IRQ 11 Memory at fb000000 (32-bit, non-prefetchable) [size=16M] I/O ports at a800 [size=256] Memory at fc7ff000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at fc7c0000 [disabled] [size=128K] Capabilities: [5c] Power Management version 2 00:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d) Subsystem: Intel Corp. EtherExpress PRO/100 S Server Adapter Flags: bus master, medium devsel, latency 64, IRQ 9 Memory at fc7fd000 (32-bit, non-prefetchable) [size=4K] I/O ports at af00 [size=64] Memory at fc7a0000 (32-bit, non-prefetchable) [size=128K] Expansion ROM at fc7e0000 [disabled] [size=64K] Capabilities: [dc] Power Management version 2 00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93) Subsystem: Unknown device d915:5538 Flags: bus master, medium devsel, latency 64 00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 8a [Master SecP PriP]) Subsystem: ServerWorks CSB5 IDE Controller Flags: bus master, medium devsel, latency 64 I/O ports at I/O ports at I/O ports at I/O ports at I/O ports at ffa0 [size=16] 00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05) (prog-if 10 [OHCI]) Subsystem: ServerWorks OSB4/CSB5 USB Controller Flags: bus master, medium devsel, latency 64, IRQ 10 Memory at fc7fe000 (32-bit, non-prefetchable) [size=4K] 00:0f.3 Host bridge: ServerWorks: Unknown device 0225 Subsystem: Unknown device d915:5538 Flags: 
bus master, medium devsel, latency 0 00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03) Flags: 66Mhz, medium devsel Capabilities: [60] PCI-X non-bridge device. 00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03) Flags: 66Mhz, medium devsel Capabilities: [60] PCI-X non-bridge device. 00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03) Flags: 66Mhz, medium devsel Capabilities: [60] PCI-X non-bridge device. 00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03) Flags: 66Mhz, medium devsel Capabilities: [60] PCI-X non-bridge device. 01:02.0 RAID bus controller: 3ware Inc 3ware 7000-series ATA-RAID (rev 01) Subsystem: 3ware Inc 3ware 7000-series ATA-RAID Flags: bus master, medium devsel, latency 64, IRQ 10 I/O ports at bfa0 [size=16] Memory at fd8ffc00 (32-bit, non-prefetchable) [size=16] Memory at fd000000 (32-bit, non-prefetchable) [size=8M] Expansion ROM at fd8e0000 [disabled] [size=64K] Capabilities: [40] Power Management version 1 01:03.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) Subsystem: BROADCOM Corporation: Unknown device 1644 Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 5 Memory at fd8d0000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at fd8c0000 [disabled] [size=64K] Capabilities: [40] PCI-X non-bridge device. 
Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- 02:02.0 RAID bus controller: 3ware Inc 3ware 7000-series ATA-RAID (rev 01) Subsystem: 3ware Inc 3ware 7000-series ATA-RAID Flags: bus master, medium devsel, latency 64, IRQ 7 I/O ports at cfa0 [size=16] Memory at fe9ffc00 (32-bit, non-prefetchable) [size=16] Memory at fe000000 (32-bit, non-prefetchable) [size=8M] Expansion ROM at fe9e0000 [disabled] [size=64K] Capabilities: [40] Power Management version 1 03:02.0 Ethernet controller: Syskonnect (Schneider & Koch) Gigabit Ethernet (rev 12) Subsystem: Syskonnect (Schneider & Koch) SK-9843 (1000Base-SX single link) Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 11 Memory at feafc000 (32-bit, non-prefetchable) [size=16K] I/O ports at d800 [size=256] Expansion ROM at feac0000 [disabled] [size=128K] Capabilities: [48] Power Management version 1 Capabilities: [50] Vital Product Data -- Stuart Anderson sba@srl.caltech.edu http://www.srl.caltech.edu/personnel/sba From alan@lxorguk.ukuu.org.uk Sat Jun 28 16:18:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 16:18:53 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SNIk2x020102 for ; Sat, 28 Jun 2003 16:18:47 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5SNFjKd006957; Sun, 29 Jun 2003 00:15:46 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5SNFd3G006955; Sun, 29 Jun 2003 00:15:39 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S. 
Miller" Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030628.150328.74739742.davem@redhat.com> References: <1056755336.5459.16.camel@dhcp22.swansea.linux.org.uk> <20030627.172123.78713883.davem@redhat.com> <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> <20030628.150328.74739742.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056842138.6753.16.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 29 Jun 2003 00:15:38 +0100 X-archive-position: 3638 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sad, 2003-06-28 at 23:03, David S. Miller wrote: > Not my problem Alan. If the user gives a crap about their report > mattering, they'll do what I ask them to do. If users send their > report to the wrong place, it will get lost, just like if their > cat their report into /dev/null. I have no reason to feel bad about > the information getting lost. You might not care but some of us do. Capturing the data matters for lots of things. That you don't have the time to be the filter for that info for networking is also fine. 
From davem@redhat.com Sat Jun 28 16:26:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 16:26:35 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SNQV2x020665 for ; Sat, 28 Jun 2003 16:26:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA11050; Sat, 28 Jun 2003 16:20:02 -0700 Date: Sat, 28 Jun 2003 16:20:02 -0700 (PDT) Message-Id: <20030628.162002.48522400.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1056842138.6753.16.camel@dhcp22.swansea.linux.org.uk> References: <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> <20030628.150328.74739742.davem@redhat.com> <1056842138.6753.16.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3639 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 29 Jun 2003 00:15:38 +0100 You might not care but some of us do. Capturing the data matters for lots of things. That you don't have the time to be the filter for that info for networking is also fine. Alan, you really stretch yourself thin doing this stuff all the time. Now imagine if you invested this effort in educating people and getting them to be better bug reporters, sending the right info to the right place? 
I think that, along with some actual development (which I miss seeing you be able to do), is a much better allocation of your time and talents. You're grovelling at the bottom of the barrel; it's time to start skimming from the top instead. Things that matter will come back; they don't disappear. From alan@lxorguk.ukuu.org.uk Sat Jun 28 16:50:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 16:50:05 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5SNo02x021069 for ; Sat, 28 Jun 2003 16:50:01 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5SNkuKd007041; Sun, 29 Jun 2003 00:46:57 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5SNksH2007039; Sun, 29 Jun 2003 00:46:54 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S.
Miller" Cc: greearb@candelatech.com, davidel@xmailserver.org, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030628.162002.48522400.davem@redhat.com> References: <1056827972.6295.28.camel@dhcp22.swansea.linux.org.uk> <20030628.150328.74739742.davem@redhat.com> <1056842138.6753.16.camel@dhcp22.swansea.linux.org.uk> <20030628.162002.48522400.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056844013.6778.42.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 29 Jun 2003 00:46:54 +0100 X-archive-position: 3640 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sul, 2003-06-29 at 00:20, David S. Miller wrote: > Now imagine if you invested this effort in educating people > and getting them to be better bug reporters, sending the > right info to the right place? For IDE the statistical value of being able to go digging through old data has been more than worth the effort. Similarly writing tools to do the grovelling has a clear value. > You're grovelling at the bottom of the barrel, it's time to start > skimming from the top instead. Things that matters will come back, it > doesn't disappear. I'm trying to turn grovelling into barrels into an automated process. 
That's much more useful and also entertaining (things like oops matching and gdb trace matching turn out to be interesting little problems of their own). Alan From mbligh@aracnet.com Sat Jun 28 17:45:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 17:45:21 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T0jC2x021687 for ; Sat, 28 Jun 2003 17:45:13 -0700 Received: from groan (216-99-192-224.dial.spiritone.com [216.99.192.224]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5T0ifga020354; Sat, 28 Jun 2003 17:44:42 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5T0imfg027065; Sat, 28 Jun 2003 17:44:53 -0700 Date: Sat, 28 Jun 2003 17:44:48 -0700 From: "Martin J. Bligh" To: Alan Cox cc: Larry McVoy , "David S. Miller" , greearb@candelatech.com, davidel@xmailserver.org, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <1340000.1056847487@[10.10.2.4]> In-Reply-To: <1056828052.6295.31.camel@dhcp22.swansea.linux.org.uk> References: <3EFCC1EB.2070904@candelatech.com> <20030627.151906.102571486.davem@redhat.com> <3EFCC6EE.3020106@candelatech.com> <20030627.170022.74744550.davem@redhat.com> <20030628001954.GD18676@work.bitmover.com> <34700000.1056760028@[10.10.2.4]> <1056828052.6295.31.camel@dhcp22.swansea.linux.org.uk> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3641 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --Alan Cox wrote (on Saturday, June 28, 2003 20:20:53 +0100): > On Sad, 2003-06-28 at 01:27, Martin J.
Bligh wrote: >> That's a trivial change to make if you want it. We just add a "reviewed" >> / "certified" state between "new" and "assigned". Yes, might be a good >> idea. I'm not actually that convinced that "assigned" is overly useful >> in the context of open-source, but that's a separate discussion. > > Most bugzillas seem to use VERIFIED for this, and it means people who > have better things to do can just pull bugs that are verified and/or > tagged with "patch" in the attachments Hmmm. We have VERIFIED set up to mean that the proposed fix has been verified to work. Could reshuffle it, or we could find a different word I guess - reusing the same one might cause confusion (on the other hand ...using the same word for different things in different bugzillas is confusing too ...) M. From jmorris@intercode.com.au Sat Jun 28 18:14:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 18:15:06 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:Wxy0O86E0YauIaXf0MJqZlP9jKobYndA@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T1Eu2x022168 for ; Sat, 28 Jun 2003 18:14:58 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5T1EZr25808; Sun, 29 Jun 2003 11:14:36 +1000 Date: Sun, 29 Jun 2003 11:14:35 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / 吉藤英明 cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: use macro for M-Flag and clean-up In-Reply-To: <20030628.203011.05536524.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3642 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Sat, 28 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote: > - fh->frag_off
= htons(0x0001); > + fh->frag_off = htons(IP6_MF); Please use __constant_htons(). - James -- James Morris From yoshfuji@linux-ipv6.org Sat Jun 28 19:24:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 19:24:53 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T2Oh2x022972 for ; Sat, 28 Jun 2003 19:24:44 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5T2PiBo000881; Sun, 29 Jun 2003 11:25:47 +0900 Date: Sun, 29 Jun 2003 11:25:44 +0900 (JST) Message-Id: <20030629.112544.42878077.yoshfuji@linux-ipv6.org> To: jmorris@intercode.com.au Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: use macro for M-Flag and clean-up From: YOSHIFUJI Hideaki / 吉藤英明 In-Reply-To: References: <20030628.203011.05536524.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3643 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Sun, 29 Jun 2003 11:14:35 +1000 (EST)), James Morris says: > On Sat, 28 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote: > > > - fh->frag_off = htons(0x0001); > > + fh->frag_off = htons(IP6_MF); > > Please use __constant_htons(). No.
We have not used __constant_hton{s,l} in runtime code since October 2002. We use hton{s,l} when we can. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From pekkas@netcore.fi Sat Jun 28 23:27:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 28 Jun 2003 23:27:16 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T6R52x025718 for ; Sat, 28 Jun 2003 23:27:06 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5T6Qu728907; Sun, 29 Jun 2003 09:26:56 +0300 Date: Sun, 29 Jun 2003 09:26:55 +0300 (EEST) From: Pekka Savola To: Michael Bellion and Thomas Heinz cc: linux-kernel@vger.kernel.org, Subject: Re: [ANNOUNCE] nf-hipac v0.8 released In-Reply-To: <3EFDF4DA.80201@hipac.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3644 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev Hi, On Sat, 28 Jun 2003, Michael Bellion and Thomas Heinz wrote: > You wrote: > > Looks interesting. Is there experience about this in bridging firewall > > scenarios? (With or without external patchsets like > > http://ebtables.sourceforge.net/) > > Sorry for this answer being so late but we wanted to check whether > nf-hipac works with the ebtables patch first in order to give you > a definite answer. We tried on a sparc64 which was a bad decision > because the ebtables patch does not work on sparc64 systems. > We are going to test the stuff tomorrow on an i386 and tell you > the results afterwards. > > In principle, nf-hipac should work properly with the bridge patch. > We expect it to work just like iptables apart from the fact that > you cannot match on bridge ports.
The iptables' in/out interface > match in 2.4 works the way that it matches if either in/out dev > _or_ in/out physdev. The nf-hipac in/out interface match matches > solely on in/out dev. Thanks for this information. > > Further, you mention the performance reasons for this approach. I would > > be very interested to see some figures. > > We have done some performance tests with an older release of nf-hipac. > The results are available on http://www.hipac.org/ > > Apart from that Roberto Nibali did some preliminary testing on nf-hipac. > You can find his posting to linux-kernel here: > http://marc.theaimsgroup.com/?l=linux-kernel&m=103358029605079&w=2 > > Since there are currently no performance tests available for the > new release we want to encourage people interested in firewall > performance evaluation to include nf-hipac in their tests. Yes, I had missed this when I quickly looked at the web page using lynx. Thanks. One obvious thing that's missing in your performance and Roberto's figures is what *exactly* are the non-matching rules. Ie. do they only match IP address, a TCP port, or what? (TCP port matching is about a degree of complexity more expensive with iptables, I recall.) -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. 
Martin: A Clash of Kings From ja@ssi.bg Sun Jun 29 00:25:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 00:25:26 -0700 (PDT) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T7Of2x026326 for ; Sun, 29 Jun 2003 00:24:49 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5T7SIo08422; Sun, 29 Jun 2003 10:28:18 +0300 Date: Sun, 29 Jun 2003 10:28:18 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Ben Greear cc: netdev@oss.sgi.com Subject: Re: routing bug report for 2.4 In-Reply-To: <3EFE131E.1080807@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3645 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Sat, 28 Jun 2003, Ben Greear wrote: > > - check for UP state (is it needed? return ENETDOWN?) > > - check if target IP is local and select "lo" instead of oif First, here is what I mean (not compiled): - ignore matching of the oif key for local destinations - return ENETDOWN when the specified out_dev is down Dave, Alexey, can you judge on these issues because they are not fatal corner cases and can be ignored. 
--- v2.4.21/linux/net/ipv4/fib_semantics.c.orig	Sat Jun 14 08:42:55 2003
+++ v2.4.21/linux/net/ipv4/fib_semantics.c	Sun Jun 29 09:28:10 2003
@@ -603,7 +603,9 @@
 	for_nexthops(fi) {
 		if (nh->nh_flags&RTNH_F_DEAD)
 			continue;
-		if (!key->oif || key->oif == nh->nh_oif)
+		if (!key->oif ||
+		    key->oif == nh->nh_oif ||
+		    nh->nh_scope == RT_SCOPE_NOWHERE)
 			break;
 	}
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
--- v2.4.21/linux/net/ipv4/route.c.orig	Sat Jun 14 08:42:55 2003
+++ v2.4.21/linux/net/ipv4/route.c	Sun Jun 29 09:16:03 2003
@@ -1793,6 +1793,9 @@
 		dev_put(dev_out);
 		goto out;	/* Wrong error code */
 	}
+	err = -ENETDOWN;
+	if (!(dev_out->flags&IFF_UP))
+		goto out;
 	if (LOCAL_MCAST(oldkey->dst) || oldkey->dst == 0xFFFFFFFF) {
 		if (!key.src)

> Well, why should it try to route locally in this case (I'm assuming that
> by using 'lo' it will not try to send on the external link)

No, it does not use "lo", "lo" replaces "dev" only if we
get RTN_LOCAL result. But "to local_IP dev different_device" can
escape from our host because we can not find route and thus
we can not override out_dev with lo.

> Why not instead make it send to the router for that source-ip, if it is
> configured. If it is not configured, then I think arping is the best that

What we have is that app uses BINDTODEVICE to send packet
with saddr=some_IP daddr=any_valid_local_IP. This is confusing but
I do not see any harm. But I think route request "to local_IP"
deserves "lo" result no matter the oif key.

> can be expected, as the behaviour becomes quite undefined and we really
> have 'no route to host'.

The only reason can be to avoid confusions and to make
it symmetric with the source validation check. And yes, this patch
breaks your tests.

> My send-to-self patch that I have been using is attached. I also have some other
> patches for mac-vlans and packet-gen applied, but I don't believe these will have any
> impact on the behaviour we have been discussing.

I don't see anything in your patch that can disturb these tests.
The kernel is helpful enough to send your ARP probe for local_IP on the LAN :) When I tested the first time, you claimed -I local_IP1 local_IP2 causes the problem but as we see, it is caused from -I dev > Thanks, > Ben Regards -- Julian Anastasov From ratz@drugphish.ch Sun Jun 29 00:46:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 00:46:27 -0700 (PDT) Received: from mailphish.drugphish.ch (adsl-196-233.cybernet.ch [212.90.196.233]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T7kI2x026858 for ; Sun, 29 Jun 2003 00:46:19 -0700 Received: from drugphish.ch (unknown [172.23.2.31]) by mailphish.drugphish.ch (drugphish mail transportation agency) with ESMTP id 9D68D2EC2; Sun, 29 Jun 2003 07:28:49 +0000 (/etc/localtime) Message-ID: <3EFE9921.5010902@drugphish.ch> Date: Sun, 29 Jun 2003 09:45:37 +0200 From: Roberto Nibali User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030611 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Pekka Savola Cc: Michael Bellion and Thomas Heinz , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [ANNOUNCE] nf-hipac v0.8 released References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3646 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ratz@drugphish.ch Precedence: bulk X-list: netdev Hello, >>Apart from that Roberto Nibali did some preliminary testing on nf-hipac. >>You can find his posting to linux-kernel here: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=103358029605079&w=2 >> >>Since there are currently no performance tests available for the >>new release we want to encourage people interested in firewall >>performance evaluation to include nf-hipac in their tests. > > Yes, I had missed this when I quickly looked at the web page using lynx. > Thanks. 
> > One obvious thing that's missing in your performance and Roberto's figures > is what *exactly* are the non-matching rules. Ie. do they only match IP > address, a TCP port, or what? (TCP port matching is about a degree of > complexity more expensive with iptables, I recall.) When I did the tests I used a variant of following simple script [1]. There you can see that I only used a src port range. In an original paper I wrote for my company (announced here [2]) I did create rules that only matched IP addresses, the results were bad enough ;). Meanwhile I should revise the paper as quite a few things have been addressed since then: For example the performance issues with OpenBSD packet filtering have mostly been squashed. I didn't continue on that matter because I fell severely ill last autumn and first had to take care of that. [1] http://www.drugphish.ch/~ratz/genrules.sh [2] http://www.ussg.iu.edu/hypermail/linux/kernel/0203.3/0847.html HTH and Best regards, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc From ja@ssi.bg Sun Jun 29 02:39:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 02:40:04 -0700 (PDT) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5T9dj2x029214 for ; Sun, 29 Jun 2003 02:39:48 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5T9hQo08719; Sun, 29 Jun 2003 12:43:26 +0300 Date: Sun, 29 Jun 2003 12:43:26 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Ben Greear cc: netdev@oss.sgi.com Subject: send-to-self (was Re: routing bug report for 2.4) In-Reply-To: <3EFE131E.1080807@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3647 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk 
X-list: netdev

Hello,

On Sat, 28 Jun 2003, Ben Greear wrote:

> My send-to-self patch that I have been using is attached. I also have some other
> patches for mac-vlans and packet-gen applied, but I don't believe these will have any
> impact on the behaviour we have been discussing.

Ben, let's define new behaviour for your feature:

1. we mark ethX with /proc/sys/net/ipv4/conf/ethX/loop=1. That means
this is a loop device (my site contains a lot of device flags, you can
see what it costs to create a sysctl var):

http://www.ssi.bg/~ja/

just hit some of the links, recommended example:

http://www.ssi.bg/~ja/forward_shared-2.4.19-2.diff

there are 2 variants:

- loop can be 0(no loop) / 1(loop inout)

or

- 0(no loop), 1(loop in only), 2(loop out only), 3(loop inout)
where "loop in only" means "accept only" and "loop out only" is
"send only" interface but as all traffic is inout I think
"loop inout" will always be used

2. arp_filter accepts traffic on ethX (as in your patch) if
"loop in" is allowed for indev and "loop out" for the out_dev in
routing result

3. rp_filter (source validation) accepts traffic on ethX (as in
your patch) if "loop in" is allowed

4. get unicast output route for local IPs ethY->ethX if "loop in"
is allowed for ethX and "loop out" is allowed for ethY. ARP will add
cache entries for local IPs.

Goal 1. Can we just skip the BINDTODEVICE thing and replace it
with bind to src IP? We can avoid binding to src IP for our tests if
we replace the preferred source IP in the desired local routes but
this is a hack. Using BINDTODEVICE will not add any benefits but will
be supported (it is ignored).

Then we define it this way: If ethX has
"/proc/sys/net/ipv4/conf/ethX/loop" set to !0 then all output routes
"from local_ip_on_ethY to local_ip_on_ethX" will not receive "lo"
result but "ethY" with RTN_UNICAST type if local_ip_on_ethY is
configured on ethY (ethY has loop enabled too), no matter the
key->oif value.
Sort of:

fib_lookup for "from IP1 to IP2 oif XXX"

if (RTN_LOCAL) {
	if dev_out is loop_in and key->src != 0 {
		src = key->src? : FIB_RES_PREFSRC(res);
		dev_in = ip_dev_find(src);
		if (dev_in is loop_out) {
			use dev_in as dev_out
			goto make_route;
		}
	}
	// else use "lo"
}

- this code is slow but it is guarded by the loop check for
out_dev so I do not see performance impact (the output routing to
localhost is not used often). The result is cached (you can set long
routing cache expiration value during the tests).

- we assume my patch from previous posting is applied and we
match any local IP no matter the key oif.

Goal 2. Can we skip all TCP/UDP changes?

- we rely on the fact the routing results allow traffic in both
directions (incoming is accepted with RTN_LOCAL, output gets
RTN_UNICAST). As for IPv6 I can not comment, we define
ipv4/conf/XXX/loop flag, though. But I prefer that we keep the changes
only at routing level. For TCP and UDP these talks should look as if
"lo" is used.

- what I'm not sure about is whether any socket hash problems
exist, and this is the only thing that can prevent this patch from
looking nice and fast. I wonder whether there are such issues, as the
talks on "lo" should work, but we have to check that.

The usage:

- mark eth0 as loop_out and eth1 as loop_in device and start
the test in eth0->eth1 direction or use loop inout for both
directions.

If you think that we can change only the routing then I can
prepare a patch for testing, I'm not sure I have a test setup for this
feature right now.
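The decision sketched above can be modelled in userspace C roughly as follows. This is only an illustration, not compiled against any kernel tree: `model_dev`, `route_result` and `resolve_local_route` are hypothetical names, not kernel APIs, and the loop values assume the four-value variant (0 no loop, 1 loop in only, 2 loop out only, 3 loop inout) so that they can be tested as bits.

```c
#include <stddef.h>

/* Hypothetical values for the proposed conf/<dev>/loop flag. */
enum loop_mode { LOOP_NONE = 0, LOOP_IN = 1, LOOP_OUT = 2, LOOP_INOUT = 3 };

/* Hypothetical stand-in for a network device; not struct net_device. */
struct model_dev {
	const char *name;
	unsigned int loop;	/* one of enum loop_mode */
};

enum route_type { ROUTE_UNICAST, ROUTE_LOCAL };

struct route_result {
	enum route_type type;
	const struct model_dev *dev_out;
};

static struct model_dev lo_dev = { "lo", LOOP_NONE };

/*
 * Model of an output route lookup whose destination is a local IP.
 *
 * dst_dev: device that owns the local destination address (ethX)
 * src_dev: device that owns the source address (ethY), or NULL when
 *          the caller did not bind to a source address
 *
 * The default answer is "lo" with a local route. Only when the
 * destination device allows "loop in" and the source device allows
 * "loop out" does the packet leave via the source device as unicast.
 */
struct route_result resolve_local_route(const struct model_dev *dst_dev,
					const struct model_dev *src_dev)
{
	struct route_result res = { ROUTE_LOCAL, &lo_dev };

	if ((dst_dev->loop & LOOP_IN) &&
	    src_dev != NULL &&
	    (src_dev->loop & LOOP_OUT)) {
		res.type = ROUTE_UNICAST;
		res.dev_out = src_dev;
	}
	return res;
}
```

With both eth0 and eth1 set to "loop inout", a lookup from an eth0 address to an eth1 address yields a unicast route via eth0 in this model; with the flag unset on the destination device, or with no source address, the result stays on "lo".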
Regards -- Julian Anastasov From jmorris@intercode.com.au Sun Jun 29 03:50:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 03:50:36 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:S0c1hTIoTkGB4q7BHkstCl5mfVC8jCKt@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TAoO2x031395 for ; Sun, 29 Jun 2003 03:50:26 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5TAo8r27250; Sun, 29 Jun 2003 20:50:09 +1000 Date: Sun, 29 Jun 2003 20:50:08 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: use macro for M-Flag and clean-up In-Reply-To: <20030629.112544.42878077.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3648 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Sun, 29 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > No. We don't use __constant_hton{s,l} in runtime code > since October, 2002. We use hton{s,l} when we can. Ok. 
All three patches applied to bk://kernel.bkbits.net/jmorris/net-2.5 - James -- James Morris From yoshfuji@linux-ipv6.org Sun Jun 29 06:33:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 06:33:55 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TDXh2x007471 for ; Sun, 29 Jun 2003 06:33:44 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5TDZ2Bo003282; Sun, 29 Jun 2003 22:35:02 +0900 Date: Sun, 29 Jun 2003 22:35:00 +0900 (JST) Message-Id: <20030629.223500.58352746.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com Subject: [PATCH] XFRM: typo From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3649 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev This fixes a typo. 
Index: linux-2.5/net/xfrm/xfrm_input.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/xfrm/xfrm_input.c,v
retrieving revision 1.4
diff -u -r1.4 xfrm_input.c
--- linux-2.5/net/xfrm/xfrm_input.c	16 May 2003 21:02:52 -0000	1.4
+++ linux-2.5/net/xfrm/xfrm_input.c	29 Jun 2003 12:11:51 -0000
@@ -18,7 +18,7 @@
 	kmem_cache_free(sp->pool, sp);
 }
-/* Fetch spi and seq frpm ipsec header */
+/* Fetch spi and seq from ipsec header */
 int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq)
 {

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From jmorris@intercode.com.au Sun Jun 29 08:58:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 08:58:28 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:woD/tqCbHEIlK+XGbzMzTbrn0Kazx4UF@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TFwJ2x009013 for ; Sun, 29 Jun 2003 08:58:21 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5TFw4r28076; Mon, 30 Jun 2003 01:58:06 +1000
Date: Mon, 30 Jun 2003 01:58:03 +1000 (EST)
From: James Morris
To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
cc: davem@redhat.com,
Subject: Re: [PATCH] XFRM: typo
In-Reply-To: <20030629.223500.58352746.yoshfuji@linux-ipv6.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
X-archive-position: 3650
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Sun, 29 Jun 2003, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> This fixes a typo.

Applied, thanks.
- James -- James Morris From nf@hipac.org Sun Jun 29 09:27:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 09:27:32 -0700 (PDT) Received: from indyio.rz.uni-saarland.de (indyio.rz.uni-saarland.de [134.96.7.3]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TGRN2x009498 for ; Sun, 29 Jun 2003 09:27:25 -0700 Received: from mars.rz.uni-saarland.de (mars.rz.uni-saarland.de [134.96.7.4]) by indyio.rz.uni-saarland.de (8.12.9/8.12.5) with ESMTP id h5TGRFai9543604; Sun, 29 Jun 2003 18:27:15 +0200 (CEST) Received: from e002.stw.stud.uni-saarland.de (e002.stw.stud.uni-saarland.de [134.96.65.17]) by mars.rz.uni-saarland.de (8.9.3p2/8.8.4/8.8.2) with ESMTP id SAA20928074; Sun, 29 Jun 2003 18:27:14 +0200 (CEST) Received: from e226.stw.stud.uni-saarland.de ([134.96.65.241] helo=hipac.org) by e002.stw.stud.uni-saarland.de with esmtp (Exim 3.35 #1 (Debian)) id 19Wf1G-0001LG-00; Sun, 29 Jun 2003 18:27:14 +0200 Message-ID: <3EFF1349.6020802@hipac.org> Date: Sun, 29 Jun 2003 18:26:49 +0200 From: Michael Bellion and Thomas Heinz User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-AT; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1 X-Accept-Language: de, en MIME-Version: 1.0 To: Pekka Savola CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [ANNOUNCE] nf-hipac v0.8 released References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3651 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nf@hipac.org Precedence: bulk X-list: netdev Hi Pekka You wrote: >>We are going to test the stuff tomorrow on an i386 and tell you >>the results afterwards. Well, nf-hipac works fine together with the ebtables patch for 2.4.21 on an i386 machine. We expect it to work with other patches too. >>In principle, nf-hipac should work properly whith the bridge patch. 
>>We expect it to work just like iptables apart from the fact that >>you cannot match on bridge ports. Well, this statement holds for the native nf-hipac in/out interface match but of course you can match on bridge ports with nf-hipac using the iptables physdev match. So everything should be fine :) > One obvious thing that's missing in your performance and Roberto's figures > is what *exactly* are the non-matching rules. Ie. do they only match IP > address, a TCP port, or what? (TCP port matching is about a degree of > complexity more expensive with iptables, I recall.) [answered in private e-mail] Regards, +-----------------------+----------------------+ | Michael Bellion | Thomas Heinz | | | | +-----------------------+----------------------+ From bunk@fs.tum.de Sun Jun 29 12:55:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 12:55:17 -0700 (PDT) Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TJt72x011401 for ; Sun, 29 Jun 2003 12:55:09 -0700 Received: (qmail 10793 invoked from network); 29 Jun 2003 19:55:02 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 29 Jun 2003 19:55:02 -0000 Resent-Message-ID: <20030629195459.13390.qmail@mimas> Received: (qmail 9228 invoked from network); 29 Jun 2003 19:08:54 -0000 Received: from mailrelay1.lrz-muenchen.de (129.187.254.106) by hermes.fachschaften.tu-muenchen.de with SMTP; 29 Jun 2003 19:08:54 -0000 Received: from vger.kernel.org by mailrelay1.lrz-muenchen.de with ESMTP for bunk@fs.tum.de; Sun, 29 Jun 2003 21:08:52 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262073AbTF2Su1 (ORCPT ); Sun, 29 Jun 2003 14:50:27 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262116AbTF2Su1 (ORCPT ); Sun, 29 Jun 2003 14:50:27 -0400 Received: from hermes.fachschaften.tu-muenchen.de 
([129.187.202.12]:31471 "HELO hermes.fachschaften.tu-muenchen.de") by vger.kernel.org with SMTP id S262073AbTF2SuU (ORCPT ); Sun, 29 Jun 2003 14:50:20 -0400 Received: (qmail 8954 invoked from network); 29 Jun 2003 19:04:34 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 29 Jun 2003 19:04:34 -0000 Date: Sun, 29 Jun 2003 21:04:32 +0200 From: Adrian Bunk To: Andrew Morton , ralf@linux-mips.org Cc: linux-kernel@vger.kernel.org, trivial@rustcorp.com.au Subject: [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927 Message-Id: <20030629190431.GB282@fs.tum.de> References: <20030627202130.066c183b.akpm@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030627202130.066c183b.akpm@digeo.com> User-Agent: Mutt/1.4.1i Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org Resent-From: bunk@fs.tum.de Resent-Date: Sun, 29 Jun 2003 21:54:59 +0200 Resent-To: netdev@oss.sgi.com X-archive-position: 3652 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@fs.tum.de Precedence: bulk X-list: netdev The following problem seems to come from Linus' tree: I got an error at the final linking with CONFIG_TC35815 enabled since the variables tc_readl and tc_writel are not available. The only place where they are defined is arch/mips/pci/ops-jmr3927.c, so I assume the following was intended: --- linux-2.5.73-mm2/drivers/net/Kconfig.old 2003-06-28 11:14:16.000000000 +0200 +++ linux-2.5.73-mm2/drivers/net/Kconfig 2003-06-29 20:55:16.000000000 +0200 @@ -1397,7 +1397,7 @@ config TC35815 tristate "TOSHIBA TC35815 Ethernet support" - depends on NET_PCI && PCI + depends on NET_PCI && PCI && TOSHIBA_JMR3927 config DGRS tristate "Digi Intl. RightSwitch SE-X support" cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. 
There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From ja@ssi.bg Sun Jun 29 13:16:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 13:16:06 -0700 (PDT) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TKFm2x011949 for ; Sun, 29 Jun 2003 13:15:59 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.6/8.11.6) with ESMTP id h5TKIpo16693; Sun, 29 Jun 2003 23:19:05 +0300 Date: Sun, 29 Jun 2003 23:18:50 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Ben Greear cc: netdev@oss.sgi.com Subject: Re: send-to-self (was Re: routing bug report for 2.4) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3653 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, Ben, I have something for comments and testing (compiled only): http://www.ssi.bg/~ja/send-to-self-2.4.21-1.diff The usage should be: eth0/loop=1 eth1/loop=1 bind to src IP from eth0 and connect to local IP on eth1 Be ready, there can be something totally wrong. I'm avoiding the arp_filter changes. The setup uses asymmetric routing so better use arp_filter=0 or other ARP filtering tools that can restrict our ARP replies only via the desired device. 
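As a rough userspace illustration of the "bind to src IP ... and connect to local IP" step above (not part of the posted diff), a test program could do something like the following. `connect_from` is a hypothetical helper name; it uses only the standard BSD socket calls and does not touch the proposed `loop` sysctl, which would have to be set separately.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an IPv4 socket of the given type, bind it
 * to src_ip with an ephemeral port, then connect it to dst_ip:port.
 * Returns the socket descriptor, or -1 on any failure.
 */
int connect_from(const char *src_ip, const char *dst_ip,
		 unsigned short port, int socktype)
{
	struct sockaddr_in src, dst;
	int fd;

	memset(&src, 0, sizeof(src));
	memset(&dst, 0, sizeof(dst));
	src.sin_family = AF_INET;
	src.sin_port = htons(0);	/* let the kernel pick a port */
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);

	/* Reject anything that is not a dotted-quad IPv4 address. */
	if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
	    inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, socktype, 0);
	if (fd < 0)
		return -1;

	/* Binding fixes the source address before the route lookup. */
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A test would call it with an address configured on eth0 as `src_ip` and an address configured on eth1 as `dst_ip`, for example `connect_from(ip_on_eth0, ip_on_eth1, port, SOCK_STREAM)`, where the addresses and port are whatever the test setup uses.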
Regards -- Julian Anastasov From davem@redhat.com Sun Jun 29 14:22:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 14:22:19 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TLM62x013186 for ; Sun, 29 Jun 2003 14:22:09 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA18835; Sun, 29 Jun 2003 14:15:28 -0700 Date: Sun, 29 Jun 2003 14:15:28 -0700 (PDT) Message-Id: <20030629.141528.74734144.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> References: <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3654 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 28 Jun 2003 00:04:30 +0100 You are assuming there is a relationship in bug severity/commonness and number of *developers* who hit it. Not true, the assumption I make is that a bug report that a bug reporter cares about, and a patch that a patch submitter cares about, will all get resent if they get dropped. If the reporter/submitter doesn't care, neither do I. You keep saying that lost information is bad and serves no positive purpose, and I totally disagree. Drops are litmus tests for the patch/report, they also serve to educate the submitters. 
And to repeat, this process is a two way street Alan. If you try to
make it anything else, you will wear yourself thin. Once you enforce
the work to be distributed to the people who report to you as much as
to the people taking the reports, things will go much more smoothly.
:-)

From aebr@win.tue.nl Sun Jun 29 14:46:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 14:46:19 -0700 (PDT)
Received: from kweetal.tue.nl (kweetal.tue.nl [131.155.3.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TLk82x013571 for ; Sun, 29 Jun 2003 14:46:10 -0700
Received: from kweetal.tue.nl ([127.0.0.1]) by localhost (kweetal.tue.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 05253-08; Sun, 29 Jun 2003 23:46:02 +0200 (CEST)
Received: from wsdw15.win.tue.nl (wsdw15.win.tue.nl [131.155.69.229]) by kweetal.tue.nl (Postfix) with ESMTP id 6194A13C93B; Sun, 29 Jun 2003 23:46:02 +0200 (CEST)
Received: (from aebr@localhost) by wsdw15.win.tue.nl (8.12.6/8.12.3) id h5TLjwVA015094; Sun, 29 Jun 2003 23:45:58 +0200 (MET DST)
Date: Sun, 29 Jun 2003 23:45:58 +0200
From: Andries Brouwer
To: "David S. Miller"
Cc: alan@lxorguk.ukuu.org.uk, greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: networking bugs and bugme.osdl.org
Message-ID: <20030629214558.GA15089@win.tue.nl>
References: <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030629.141528.74734144.davem@redhat.com>
User-Agent: Mutt/1.3.25i
X-archive-position: 3655
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: aebr@win.tue.nl
Precedence: bulk
X-list: netdev

On Sun, Jun 29, 2003 at 02:15:28PM -0700, David S.
Miller wrote: > From: Alan Cox > Date: 28 Jun 2003 00:04:30 +0100 > > You are assuming there is a relationship in bug severity/commonness > and number of *developers* who hit it. > > Not true, the assumption I make is that a bug report that > a bug reporter cares about, and a patch that a patch submitter > cares about, will all get resent if they get dropped. > > If the reporter/submitter doesn't care, neither do I. You are right, but only in the part where you say that this is the best, indeed the only, way you can work. Alan is right, information is important, and a lot of it is submitted only once. (And a lot of it is submitted three times and ignored three times.) Suppose you find a gcc bug, construct a small example that is mistranslated and send it off to the gcc list. Maybe even include a fix. Will you babysit them, check whether later snapshots correct this flaw, resubmit your report every month if not? Maybe you will. I certainly don't - send the report and that's it. Andries From davem@redhat.com Sun Jun 29 14:57:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 14:58:00 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TLvt2x013932 for ; Sun, 29 Jun 2003 14:57:55 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA19043; Sun, 29 Jun 2003 14:51:15 -0700 Date: Sun, 29 Jun 2003 14:51:14 -0700 (PDT) Message-Id: <20030629.145114.115923819.davem@redhat.com> To: aebr@win.tue.nl Cc: alan@lxorguk.ukuu.org.uk, greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. 
Miller"
In-Reply-To: <20030629214558.GA15089@win.tue.nl>
References: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com> <20030629214558.GA15089@win.tue.nl>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3656
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Andries Brouwer
   Date: Sun, 29 Jun 2003 23:45:58 +0200

   Will you babysit them, check whether later snapshots correct this
   flaw, resubmit your report every month if not? Maybe you will.
   I certainly don't - send the report and that's it.

And like evolution and many other processes in life, the things that
put forth the effort will tend to get further and accomplish more.

As a result, your bug reports will tend to not be tended to, whilst
those of the persistent people will.

People who care about their bug reports learn to become persistent or
accept that their bugs tend to not get looked at. People who expect
that, for free, one bug submission will get their bug fixed and this
is somehow ensured are truly living in a dream world.

Let's see, what makes more sense from my perspective. Should I reward
and put forth effort for the people who fart a bug report onto the
lists and expect everyone to stop what they're doing and fix the bug,
or should I reward and put forth effort for the guy who spent the time
to put together a stellar bug report and also doesn't mind
retransmitting it from time to time whilst everyone is busy?
From alan@lxorguk.ukuu.org.uk Sun Jun 29 15:10:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 15:10:17 -0700 (PDT) Received: from lxorguk.ukuu.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TMAA2x014474 for ; Sun, 29 Jun 2003 15:10:11 -0700 Received: from dhcp22.swansea.linux.org.uk (dhcp22.swansea.linux.org.uk [127.0.0.1]) by lxorguk.ukuu.org.uk (8.12.8/8.12.5) with ESMTP id h5TM79Kd016890; Sun, 29 Jun 2003 23:07:09 +0100 Received: (from alan@localhost) by dhcp22.swansea.linux.org.uk (8.12.8/8.12.8/Submit) id h5TM779d016888; Sun, 29 Jun 2003 23:07:07 +0100 X-Authentication-Warning: dhcp22.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f Subject: Re: networking bugs and bugme.osdl.org From: Alan Cox To: "David S. Miller" Cc: greearb@candelatech.com, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030629.141528.74734144.davem@redhat.com> References: <3EFC9203.3090508@candelatech.com> <20030627.144426.71096593.davem@redhat.com> <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1056924426.16255.24.camel@dhcp22.swansea.linux.org.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 29 Jun 2003 23:07:07 +0100 X-archive-position: 3657 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alan@lxorguk.ukuu.org.uk Precedence: bulk X-list: netdev On Sul, 2003-06-29 at 22:15, David S. Miller wrote: > You keep saying that lost information is bad and serves no > positive purpose, and I totally disagree. Drops are litmus tests > for the patch/report, they also serve to educate the submitters. What you don't get is that like you I'm distributing work. 
I'm getting end users to spot bug correlations - and that's why I want better tools Report bug should get Your bug looks like #1131 #4151 or #11719 (Resolved), please check From davem@redhat.com Sun Jun 29 15:19:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 15:19:47 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TMJg2x014800 for ; Sun, 29 Jun 2003 15:19:43 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA19159; Sun, 29 Jun 2003 15:13:02 -0700 Date: Sun, 29 Jun 2003 15:13:02 -0700 (PDT) Message-Id: <20030629.151302.28804993.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org From: "David S. Miller" In-Reply-To: <1056924426.16255.24.camel@dhcp22.swansea.linux.org.uk> References: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com> <1056924426.16255.24.camel@dhcp22.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3658 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 29 Jun 2003 23:07:07 +0100 What you don't get is that like you I'm distributing work. I'm getting end users to spot bug correlations - and that's why I want better tools I understand this part, it's great sounding in theory. But all the examples I've seen are you sifting through bugzilla making these correlations. I've seen no evidence of community participation in this activity. 
The greatest tools in the world aren't useful if people don't want to use them. Nobody wants to use tools unless it melds easily into their existing daily routine. This means it must be email based and it must somehow work via the existing mailing lists. It sounds a lot like what I'm advocating except that there's some robot monitoring the list postings. But then who monitors and maintains the entries? That's the big problem and I haven't heard a good solution yet. Going to a web site and clicking buttons is not a solution. That's a waste of time. From john@grabjohn.com Sun Jun 29 15:19:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 15:19:59 -0700 (PDT) Received: from 81-2-122-30.bradfords.org.uk (81-2-122-30.bradfords.org.uk [81.2.122.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TMJs2x014807 for ; Sun, 29 Jun 2003 15:19:56 -0700 Received: from 81-2-122-30.bradfords.org.uk (localhost [127.0.0.1]) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9) with ESMTP id h5TMSDFm000510; Sun, 29 Jun 2003 23:28:13 +0100 Received: (from john@localhost) by 81-2-122-30.bradfords.org.uk (8.12.9/8.12.9/Submit) id h5TMSDAk000509; Sun, 29 Jun 2003 23:28:13 +0100 Date: Sun, 29 Jun 2003 23:28:13 +0100 From: John Bradford Message-Id: <200306292228.h5TMSDAk000509@81-2-122-30.bradfords.org.uk> To: alan@lxorguk.ukuu.org.uk, davem@redhat.com Subject: Re: networking bugs and bugme.osdl.org Cc: greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, mbligh@aracnet.com, netdev@oss.sgi.com X-archive-position: 3659 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john@grabjohn.com Precedence: bulk X-list: netdev > Report bug should get > Your bug looks like #1131 #4151 or #11719 (Resolved), please check Note that my bug database actually does that, and has done for months, when somebody uploads their .config. John. 
From jmorris@intercode.com.au Sun Jun 29 16:01:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 16:01:05 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:G/lfa0xSmgGyQiMumDzngQR/rVk/fsKv@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TN0w2x016460 for ; Sun, 29 Jun 2003 16:01:00 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5TN0or29748; Mon, 30 Jun 2003 09:00:51 +1000 Date: Mon, 30 Jun 2003 09:00:49 +1000 (EST) From: James Morris To: "David S. Miller" cc: netdev@oss.sgi.com Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3660 Subject: (no subject) X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev Hi Dave, Please pull from bk://kernel.bkbits.net/jmorris/net-2.5 for the following changesets: ChangeSet@1.1525.1.1, 2003-06-29 20:37:48+10:00, yoshfuji@linux-ipv6.org [IPV6] Don't set M flag in last fragment. ChangeSet@1.1525.1.2, 2003-06-29 20:38:55+10:00, yoshfuji@linux-ipv6.org [IPV6] Use macro for M-Flag and clean-up. ChangeSet@1.1525.1.3, 2003-06-29 20:40:01+10:00, yoshfuji@linux-ipv6.org [IPV6] Convert /proc/net/ip6_flowlabel to seq_file. ChangeSet@1.1528, 2003-06-30 01:53:54+10:00, yoshfuji@linux-ipv6.org [XFRM] Fix typo. 
ChangeSet@1.1529, 2003-06-30 02:17:41+10:00, herbert@gondor.apana.org.au [XFRM] Set SA saddr correctly ChangeSet@1.1525.2.1, 2003-06-30 02:28:59+10:00, herbert@gondor.apana.org.au [IPSEC] split xfrm_state_replace + fixes - James -- James Morris From davidel@xmailserver.org Sun Jun 29 16:27:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 16:27:07 -0700 (PDT) Received: from x35.xmailserver.org (x35.xmailserver.org [208.129.208.51]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TNR12x016911 for ; Sun, 29 Jun 2003 16:27:02 -0700 X-AuthUser: davidel@xmailserver.org Received: from bigblue.dev.mcafeelabs.com by xmailserver.org with [XMail 1.16 (Linux/Ix86) ESMTP Server] id for from ; Sun, 29 Jun 2003 16:32:31 -0700 Date: Sun, 29 Jun 2003 16:21:05 -0700 (PDT) From: Davide Libenzi X-X-Sender: davide@bigblue.dev.mcafeelabs.com To: Andries Brouwer cc: "David S. Miller" , Alan Cox , greearb@candelatech.com, mbligh@aracnet.com, Linux Kernel Mailing List , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org In-Reply-To: <20030629224934.GA15108@win.tue.nl> Message-ID: References: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com> <20030629214558.GA15089@win.tue.nl> <20030629.145114.115923819.davem@redhat.com> <20030629224934.GA15108@win.tue.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3661 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidel@xmailserver.org Precedence: bulk X-list: netdev On Mon, 30 Jun 2003, Andries Brouwer wrote: > See, you think you are doing the submitter a favour. > I prefer the point of view that the submitter does us a favour. You the winner ! 
You answered correctly to my previous question ;) - Davide From aebr@win.tue.nl Sun Jun 29 16:30:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 16:30:32 -0700 (PDT) Received: from mailhost.tue.nl (mailhost.tue.nl [131.155.2.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5TNUB2x017220 for ; Sun, 29 Jun 2003 16:30:12 -0700 Received: from mailhost.tue.nl ([127.0.0.1]) by localhost (mailhost.tue.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 32723-07; Mon, 30 Jun 2003 00:49:35 +0200 (CEST) Received: from wsdw15.win.tue.nl (wsdw15.win.tue.nl [131.155.69.229]) by mailhost.tue.nl (Postfix) with ESMTP id A207314C380; Mon, 30 Jun 2003 00:49:35 +0200 (CEST) Received: (from aebr@localhost) by wsdw15.win.tue.nl (8.12.6/8.12.3) id h5TMnYti015278; Mon, 30 Jun 2003 00:49:34 +0200 (MET DST) Date: Mon, 30 Jun 2003 00:49:34 +0200 From: Andries Brouwer To: "David S. Miller" Cc: alan@lxorguk.ukuu.org.uk, greearb@candelatech.com, mbligh@aracnet.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <20030629224934.GA15108@win.tue.nl> References: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk> <20030629.141528.74734144.davem@redhat.com> <20030629214558.GA15089@win.tue.nl> <20030629.145114.115923819.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030629.145114.115923819.davem@redhat.com> User-Agent: Mutt/1.3.25i X-archive-position: 3662 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aebr@win.tue.nl Precedence: bulk X-list: netdev On Sun, Jun 29, 2003 at 02:51:14PM -0700, David S. Miller wrote: > From: Andries Brouwer > Date: Sun, 29 Jun 2003 23:45:58 +0200 > > Will you babysit them, check whether later snapshots correct this > flaw, resubmit your report every month if not? Maybe you will. 
I > certainly don't - send the report and that's it. > > As a result, your bug reports will tend to not be tended to, Maybe that is the wrong answer. It seems they have a bug tracking system. > People who expect that, for free, one bug submission will get > their bug fixed See, you think you are doing the submitter a favour. I prefer the point of view that the submitter does us a favour. Something is wrong, and people work around it. But they tell us about it. If we want we can try to fix. Or we can store the information for later examination. Very often it is needed later - fortunately Google helps. A more focused data base might help even better. From davem@redhat.com Sun Jun 29 19:27:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 19:27:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5U2RN2x018907 for ; Sun, 29 Jun 2003 19:27:24 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA19615; Sun, 29 Jun 2003 19:20:43 -0700 Date: Sun, 29 Jun 2003 19:20:42 -0700 (PDT) Message-Id: <20030629.192042.23028020.davem@redhat.com> To: jmorris@intercode.com.au Cc: netdev@oss.sgi.com From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3663 Subject: (no subject) X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: James Morris Date: Mon, 30 Jun 2003 09:00:49 +1000 (EST) Please pull from bk://kernel.bkbits.net/jmorris/net-2.5 for the following changesets: Pulled, thanks James. 
From mbligh@aracnet.com Sun Jun 29 19:36:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 29 Jun 2003 19:36:16 -0700 (PDT) Received: from franka.aracnet.com (franka.aracnet.com [216.99.193.44]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5U2aA2x019513 for ; Sun, 29 Jun 2003 19:36:11 -0700 Received: from groan (216-99-192-216.dial.spiritone.com [216.99.192.216]) by franka.aracnet.com (8.12.9/8.12.9) with ESMTP id h5U2ZWga006636; Sun, 29 Jun 2003 19:35:33 -0700 Received: from [10.10.2.4] (fletch@titus.gormenghast [10.10.2.4]) by groan (8.12.3/8.12.3/Debian -4) with ESMTP id h5U2Zhfg032203; Sun, 29 Jun 2003 19:35:47 -0700 Date: Sun, 29 Jun 2003 19:35:44 -0700 From: "Martin J. Bligh" To: "David S. Miller" , alan@lxorguk.ukuu.org.uk cc: greearb@candelatech.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: networking bugs and bugme.osdl.org Message-ID: <17280000.1056940541@[10.10.2.4]> In-Reply-To: <20030629.151302.28804993.davem@redhat.com> References: <1056755070.5463.12.camel@dhcp22.swansea.linux.org.uk><20030629.141528.74734144.davem@redhat.com><1056924426.16255.24.camel@dhcp22.swansea.linux.org.uk> <20030629.151302.28804993.davem@redhat.com> X-Mailer: Mulberry/2.2.1 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 3664 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbligh@aracnet.com Precedence: bulk X-list: netdev --"David S. Miller" wrote (on Sunday, June 29, 2003 15:13:02 -0700): > From: Alan Cox > Date: 29 Jun 2003 23:07:07 +0100 > > What you don't get is that like you I'm distributing work. I'm > getting end users to spot bug correlations - and that's why I want > better tools > > DaveM: > I understand this part, it's great sounding in theory. 
>
> But all the examples I've seen are you sifting through bugzilla making
> these correlations. I've seen no evidence of community participation
> in this activity.

People have been. Maybe not with the networking bugs, but I've seen people
sift through other stuff, and mark off duplicates, and point out
similarities. People go chase attached patches back into the tree, and
people test fixes they didn't submit the bug for, and report status. I'd
like more of that, but it happens already.

> The greatest tools in the world aren't useful if people don't want
> to use them.
>
> Nobody wants to use tools unless it melds easily into their existing
> daily routine. This means it must be email based and it must somehow
> work via the existing mailing lists. It sounds a lot like what I'm
> advocating except that there's some robot monitoring the list
> postings.

Agreed, the interface could be better - we're working on it. It won't be
totally change free, but it could be better integrated. Feedback is very
useful, though it helps a lot if you can pinpoint what the underlying
issue is rather than "this is crap". Better email integration is top of
the list, starting with sending stuff out to multiple people when filed,
not a single bottleneck point.

> But then who monitors and maintains the entries? That's the big
> problem and I haven't heard a good solution yet. Going to a web site
> and clicking buttons is not a solution. That's a waste of time.

There is an army of elves out there, quite capable and willing. Like most
change, it takes a little time, but it's started already.

> AEB:
>
> See, you think you are doing the submitter a favour.
> I prefer the point of view that the submitter does us a favour.

Absolutely. Personally, I think testing & communication with users is
more what we're lacking as a community than coding power. In Dave's case,
it sounds like he's so swamped, it's not an issue for him.
However, finding and fixing stuff earlier on will actually reduce the workload, IMHO. It's a damned sight easier to find a bug you wrote yesterday than one you wrote last year. I *love* things like nightly regression testing that reaches out and larts me with a bug report in < 24 hrs of me screwing things up. Lastly, I'd rather ditch bug reports based on crap content, or overall impact, than whether I happened to be busy at the moment they came in. M. From yoshfuji@linux-ipv6.org Mon Jun 30 00:34:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 00:34:15 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5U7Xx2x001710 for ; Mon, 30 Jun 2003 00:34:01 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5U7ZKBo010088; Mon, 30 Jun 2003 16:35:20 +0900 Date: Mon, 30 Jun 2003 16:35:17 +0900 (JST) Message-Id: <20030630.163517.41915691.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com Subject: [PATCH] IPV6: put ipv6_rcv_saddr_equal() common place From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3665 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. Put ipv6_rcv_saddr_equal() common place as comment says. Thanks. 
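[Editor's sketch, not part of the original message.] The helper being moved decides whether the receive addresses of two sockets overlap for bind-conflict purposes. A minimal userspace model of the core rule follows, assuming both sockets are AF_INET6, neither address is IPv4-mapped, and IPV6_V6ONLY is not in play. The name rcv_saddr_conflict is hypothetical, and the kernel function in the patch below handles more cases than this does.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userspace model, NOT the kernel function: decide whether two IPv6
 * receive addresses would clash when bound to the same port.
 * Assumptions: both sockets are AF_INET6, neither address is
 * IPv4-mapped, and IPV6_V6ONLY is ignored.
 */
static bool rcv_saddr_conflict(const struct in6_addr *a,
                               const struct in6_addr *b)
{
	/* A wildcard (::) bind overlaps every other address. */
	if (IN6_IS_ADDR_UNSPECIFIED(a) || IN6_IS_ADDR_UNSPECIFIED(b))
		return true;

	/* Two specific addresses clash only when they are identical. */
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Keeping one shared definition of this check means the TCP and UDP bind paths cannot drift apart, which is the point of the move.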
Index: linux-2.5/include/net/addrconf.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/addrconf.h,v retrieving revision 1.10 diff -u -r1.10 addrconf.h --- linux-2.5/include/net/addrconf.h 17 Apr 2003 00:35:02 -0000 1.10 +++ linux-2.5/include/net/addrconf.h 30 Jun 2003 06:12:24 -0000 @@ -68,6 +68,8 @@ struct in6_addr *saddr, int onlink); extern int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *); +extern int ipv6_rcv_saddr_equal(const struct sock *sk, + const struct sock *sk2); extern void addrconf_join_solict(struct net_device *dev, struct in6_addr *addr); extern void addrconf_leave_solict(struct net_device *dev, Index: linux-2.5/net/ipv6/addrconf.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/addrconf.c,v retrieving revision 1.43 diff -u -r1.43 addrconf.c --- linux-2.5/net/ipv6/addrconf.c 21 Jun 2003 16:16:59 -0000 1.43 +++ linux-2.5/net/ipv6/addrconf.c 30 Jun 2003 06:12:25 -0000 @@ -66,6 +66,7 @@ #include #include #include +#include #include #include #include @@ -967,6 +968,43 @@ read_unlock_bh(&addrconf_hash_lock); return ifp; +} + +int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2) +{ + struct ipv6_pinfo *np = inet6_sk(sk); + int addr_type = ipv6_addr_type(&np->rcv_saddr); + + if (!inet_sk(sk2)->rcv_saddr && !ipv6_only_sock(sk)) + return 1; + + if (sk2->sk_family == AF_INET6 && + ipv6_addr_any(&inet6_sk(sk2)->rcv_saddr) && + !(ipv6_only_sock(sk2) && addr_type == IPV6_ADDR_MAPPED)) + return 1; + + if (addr_type == IPV6_ADDR_ANY && + (!ipv6_only_sock(sk) || + !(sk2->sk_family == AF_INET6 ? + (ipv6_addr_type(&inet6_sk(sk2)->rcv_saddr) == IPV6_ADDR_MAPPED) : + 1))) + return 1; + + if (sk2->sk_family == AF_INET6 && + !ipv6_addr_cmp(&np->rcv_saddr, + (sk2->sk_state != TCP_TIME_WAIT ? 
+ &inet6_sk(sk2)->rcv_saddr : + &tcptw_sk(sk)->tw_v6_rcv_saddr))) + return 1; + + if (addr_type == IPV6_ADDR_MAPPED && + !ipv6_only_sock(sk2) && + (!inet_sk(sk2)->rcv_saddr || + !inet_sk(sk)->rcv_saddr || + inet_sk(sk)->rcv_saddr == inet_sk(sk2)->rcv_saddr)) + return 1; + + return 0; } /* Gets referenced address, destroys ifaddr */ Index: linux-2.5/net/ipv6/tcp_ipv6.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/tcp_ipv6.c,v retrieving revision 1.51 diff -u -r1.51 tcp_ipv6.c --- linux-2.5/net/ipv6/tcp_ipv6.c 21 Jun 2003 16:18:47 -0000 1.51 +++ linux-2.5/net/ipv6/tcp_ipv6.c 30 Jun 2003 06:12:25 -0000 @@ -93,43 +93,6 @@ return tcp_v6_hashfn(laddr, lport, faddr, fport); } -static inline int ipv6_rcv_saddr_equal(struct sock *sk, struct sock *sk2) -{ - struct ipv6_pinfo *np = inet6_sk(sk); - int addr_type = ipv6_addr_type(&np->rcv_saddr); - - if (!inet_sk(sk2)->rcv_saddr && !ipv6_only_sock(sk)) - return 1; - - if (sk2->sk_family == AF_INET6 && - ipv6_addr_any(&inet6_sk(sk2)->rcv_saddr) && - !(ipv6_only_sock(sk2) && addr_type == IPV6_ADDR_MAPPED)) - return 1; - - if (addr_type == IPV6_ADDR_ANY && - (!ipv6_only_sock(sk) || - !(sk2->sk_family == AF_INET6 ? - (ipv6_addr_type(&inet6_sk(sk2)->rcv_saddr) == IPV6_ADDR_MAPPED) : - 1))) - return 1; - - if (sk2->sk_family == AF_INET6 && - !ipv6_addr_cmp(&np->rcv_saddr, - (sk2->sk_state != TCP_TIME_WAIT ? 
- &inet6_sk(sk2)->rcv_saddr : - &((struct tcp_tw_bucket *)sk)->tw_v6_rcv_saddr))) - return 1; - - if (addr_type == IPV6_ADDR_MAPPED && - !ipv6_only_sock(sk2) && - (!inet_sk(sk2)->rcv_saddr || - !inet_sk(sk)->rcv_saddr || - inet_sk(sk)->rcv_saddr == inet_sk(sk2)->rcv_saddr)) - return 1; - - return 0; -} - static inline int tcp_v6_bind_conflict(struct sock *sk, struct tcp_bind_bucket *tb) { Index: linux-2.5/net/ipv6/udp.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/udp.c,v retrieving revision 1.38 diff -u -r1.38 udp.c --- linux-2.5/net/ipv6/udp.c 21 Jun 2003 16:20:28 -0000 1.38 +++ linux-2.5/net/ipv6/udp.c 30 Jun 2003 06:12:25 -0000 @@ -59,43 +59,6 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_stats_in6); -/* XXX This is identical to tcp_ipv6.c:ipv6_rcv_saddr_equal, put - * XXX it somewhere common. -DaveM - */ -static __inline__ int udv6_rcv_saddr_equal(struct sock *sk, struct sock *sk2) -{ - struct ipv6_pinfo *np = inet6_sk(sk); - int addr_type = ipv6_addr_type(&np->rcv_saddr); - - if (!inet_sk(sk2)->rcv_saddr && !ipv6_only_sock(sk)) - return 1; - - if (sk2->sk_family == AF_INET6 && - ipv6_addr_any(&inet6_sk(sk2)->rcv_saddr) && - !(ipv6_only_sock(sk2) && addr_type == IPV6_ADDR_MAPPED)) - return 1; - - if (addr_type == IPV6_ADDR_ANY && - (!ipv6_only_sock(sk) || - !(sk2->sk_family == AF_INET6 ? - (ipv6_addr_type(&inet6_sk(sk2)->rcv_saddr) == IPV6_ADDR_MAPPED) : 1))) - return 1; - - if (sk2->sk_family == AF_INET6 && - !ipv6_addr_cmp(&inet6_sk(sk)->rcv_saddr, - &inet6_sk(sk2)->rcv_saddr)) - return 1; - - if (addr_type == IPV6_ADDR_MAPPED && - !ipv6_only_sock(sk2) && - (!inet_sk(sk2)->rcv_saddr || - !inet_sk(sk)->rcv_saddr || - inet_sk(sk)->rcv_saddr == inet_sk(sk2)->rcv_saddr)) - return 1; - - return 0; -} - /* Grrr, addr_type already calculated by caller, but I don't want * to add some silly "cookie" argument to this method just for that. 
*/ @@ -151,7 +114,7 @@ sk2 != sk && sk2->sk_bound_dev_if == sk->sk_bound_dev_if && (!sk2->sk_reuse || !sk->sk_reuse) && - udv6_rcv_saddr_equal(sk, sk2)) + ipv6_rcv_saddr_equal(sk, sk2)) goto fail; } } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From greearb@candelatech.com Mon Jun 30 00:59:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 00:59:43 -0700 (PDT) Received: from grok.yi.org (evrtwa1-ar2-4-33-045-074.evrtwa1.dsl-verizon.net [4.33.45.74]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5U7xa2x002201 for ; Mon, 30 Jun 2003 00:59:37 -0700 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id h5U7xJKk002358; Mon, 30 Jun 2003 00:59:30 -0700 Message-ID: <3EFFEDD7.5020205@candelatech.com> Date: Mon, 30 Jun 2003 00:59:19 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Anastasov CC: netdev@oss.sgi.com Subject: Re: send-to-self (was Re: routing bug report for 2.4) References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3666 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Julian Anastasov wrote: > Hello, > > Ben, I have something for comments and testing (compiled > only): > > http://www.ssi.bg/~ja/send-to-self-2.4.21-1.diff Just moved to my new home..will be a few days before I can take a detailed look at this..and your long description confused my tired mind for tonight... I'll look in detail soon. > > The usage should be: > eth0/loop=1 > eth1/loop=1 > bind to src IP from eth0 and connect to local IP on eth1 > > Be ready, there can be something totally wrong. 
>
> I'm avoiding the arp_filter changes. The setup uses
> asymmetric routing so better use arp_filter=0 or other

arp_filter=1, right?

> ARP filtering tools that can restrict our ARP replies
> only via the desired device.

I want to avoid strange(r) routing configurations, as I'm already using
lots of routing tricks, and don't want to confuse matters more. I also
turn on arp filtering to ensure the arps go out the right interface
currently.

You should be able to easily test most of the changes in your code if you
have a machine with two ethernet interfaces and a loopback cable...

My requirements are:

1) Both ethernet ports communicate over the external link, UDP & IP
   traffic. Third-party programs if possible, thus I set the flag on the
   interface in my patch, not on an individual socket, though I do have
   to BINDTODEVICE and policy-based route to get things working right...

1b) Allow both same-subnet comm (eth1 & eth2 are on same subnet), and
    also routed traffic (eth1 & eth2 have their own default router,
    similar to the previously discussed routing setup)

2) Allow normal non-looped communication on the ports, including
   policy-based routing based on source addr.
Thanks, Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From ja@ssi.bg Mon Jun 30 03:43:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 03:44:03 -0700 (PDT) Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UAhs2x008972 for ; Mon, 30 Jun 2003 03:43:56 -0700 Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.11.6/8.9.3) with ESMTP id h5UAhX407670; Mon, 30 Jun 2003 13:43:33 +0300 Date: Mon, 30 Jun 2003 13:43:33 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@l To: Ben Greear cc: netdev@oss.sgi.com Subject: Re: send-to-self (was Re: routing bug report for 2.4) In-Reply-To: <3EFFEDD7.5020205@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3667 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Mon, 30 Jun 2003, Ben Greear wrote: > > I'm avoiding the arp_filter changes. The setup uses > > asymmetric routing so better use arp_filter=0 or other > > arp_filter=1, right? Right, my mistake, the routing is symmetric, arp_filter=1 is even recommended. Only rp_filter=1 cannot be used as ARP filter but you can use rp_filter=1 for IP filtering. > I want to avoid strange(r) routing configurations, as I'm already > using lots of routing tricks, and don't want to confuse matters > more. I also turn on arp filtering to ensure the arps go out the > right interface currently. Right, you just need to bind to the src IP (maybe you can avoid even that if you replace the prefsrc in your local routes). > You should be able to easily test most of the changes in your code > if you have a machine with two ethernet interfaces and a loopback > cable... 
That is the problem, no 2.4 host with 2 NICs. Not soon. Regards -- Julian Anastasov From g0202512@nus.edu.sg Mon Jun 30 06:21:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 06:21:14 -0700 (PDT) Received: from leonis.nus.edu.sg (leonis.nus.edu.sg [137.132.1.18]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UDL22x015203 for ; Mon, 30 Jun 2003 06:21:04 -0700 Received: from nusnet-165-146.dynip.nus.edu.sg (nusnet-165-146.dynip.nus.edu.sg [137.132.165.146]) by leonis.nus.edu.sg (8.12.9/8.12.9) with ESMTP id h5UDMKJY029685 for ; Mon, 30 Jun 2003 21:22:20 +0800 (SGT) Subject: Re: SIOCGIFCONF and ifr_bandwidth From: Eng Se-Hsieng To: netdev@oss.sgi.com In-Reply-To: <1057037598.1597.16.camel@nusnet-165-146.dynip.nus.edu.sg> References: <1057037447.1564.14.camel@nusnet-165-146.dynip.nus.edu.sg> <1057037598.1597.16.camel@nusnet-165-146.dynip.nus.edu.sg> Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 30 Jun 2003 21:34:45 -0800 Message-Id: <1057037686.1564.19.camel@nusnet-165-146.dynip.nus.edu.sg> Mime-Version: 1.0 X-archive-position: 3668 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: g0202512@nus.edu.sg Precedence: bulk X-list: netdev > Dear all, > > Could someone please tell me why when I use > > ioctl(fd, SIOCGIFCONF, (char *)cf); > > All the addresses returned have a link bandwidth > (ifrequest->ifr_bandwidth) of 2? > > What does this 2 mean and why is the same for all the interfaces (eth, > ppp, lo)? > > Thank you. 
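[Editor's note, not part of the original message.] A likely explanation, offered as a sketch rather than a confirmed answer: in struct ifreq, ifr_bandwidth and ifr_addr are names for members of the same union, and SIOCGIFCONF fills in ifr_addr. Reading ifr_bandwidth afterwards therefore reinterprets the first bytes of the returned sockaddr, which begin with sa_family. For AF_INET entries that value is 2 on Linux, whatever the interface type. The helper names below are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fill `out` with up to `max` entries from SIOCGIFCONF.
 * Returns the number of entries, or -1 on error.
 */
static int list_ifaces(struct ifreq *out, int max)
{
	struct ifconf ifc;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&ifc, 0, sizeof(ifc));
	ifc.ifc_len = max * (int)sizeof(struct ifreq);
	ifc.ifc_req = out;

	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return ifc.ifc_len / (int)sizeof(struct ifreq);
}

/*
 * ifr_bandwidth and ifr_addr share storage inside struct ifreq, so the
 * "bandwidth" read after SIOCGIFCONF is just the leading bytes of the
 * returned sockaddr (starting with sa_family). Returns 1 if they match.
 */
static int bandwidth_aliases_addr(const struct ifreq *ifr)
{
	int first_bytes;

	memcpy(&first_bytes, &ifr->ifr_addr, sizeof(first_bytes));
	return ifr->ifr_bandwidth == first_bytes;
}
```

If that reading is right, SIOCGIFCONF does not report link speed at all, and the constant 2 is the address family rather than a bandwidth.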
> > Regards, > > Se-Hsieng > > > > From yoshfuji@linux-ipv6.org Mon Jun 30 06:59:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 07:00:06 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UDxt2x024745 for ; Mon, 30 Jun 2003 06:59:56 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UE1FBo014094; Mon, 30 Jun 2003 23:01:15 +0900 Date: Mon, 30 Jun 2003 23:01:15 +0900 (JST) Message-Id: <20030630.230115.84925092.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: [PATCH] IPV6: convert /proc/net/ip6_flowlabel to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030629.015723.75141662.yoshfuji@linux-ipv6.org> References: <20030629.015723.75141662.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3669 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030629.015723.75141662.yoshfuji@linux-ipv6.org> (at Sun, 29 Jun 2003 01:57:23 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > This converts /proc/net/ip6_flowlabel to seq_file{}. > Thanks. Oops, this was buggy. Here's fixed one... 
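[Editor's sketch, not part of the original message.] The patch below gives the /proc reader a cursor that walks a hash table one entry at a time, remembering which bucket it is in between calls. A minimal userspace model of that traversal follows, assuming a tiny table of singly linked chains. All names here (HASH_MASK, struct node, iter_first, iter_next, iter_count) are hypothetical and only mirror the shape of the first/next helpers.

```c
#include <stddef.h>

#define HASH_MASK 3	/* hypothetical table: HASH_MASK + 1 buckets */

struct node {
	int value;
	struct node *next;
};

struct iter_state {
	int bucket;	/* bucket the cursor currently sits in */
};

/* Return the first entry in the table, or NULL if every bucket is empty. */
static struct node *iter_first(struct node *table[], struct iter_state *st)
{
	for (st->bucket = 0; st->bucket <= HASH_MASK; ++st->bucket)
		if (table[st->bucket])
			return table[st->bucket];
	return NULL;
}

/* Step to the entry after cur, moving on to later buckets as needed. */
static struct node *iter_next(struct node *table[], struct iter_state *st,
                              struct node *cur)
{
	cur = cur->next;
	/*
	 * Stop once the bucket index runs past the table: without this
	 * bound the loop would never exit after the last entry.
	 */
	while (!cur && ++st->bucket <= HASH_MASK)
		cur = table[st->bucket];
	return cur;
}

/* Count entries by walking the table one step at a time. */
static int iter_count(struct node *table[])
{
	struct iter_state st;
	int n = 0;

	for (struct node *p = iter_first(table, &st); p;
	     p = iter_next(table, &st, p))
		n++;
	return n;
}
```

The design point is that the only state carried between steps is the current entry plus the bucket index, which is what lets a reader resume after a partial read.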
Index: linux25/net/ipv6/ip6_flowlabel.c =================================================================== RCS file: /cvsroot/usagi/usagi/kernel/linux25/net/ipv6/ip6_flowlabel.c,v retrieving revision 1.1.1.2 retrieving revision 1.4 diff -u -r1.1.1.2 -r1.4 --- linux25/net/ipv6/ip6_flowlabel.c 24 Feb 2002 03:45:41 -0000 1.1.1.2 +++ linux25/net/ipv6/ip6_flowlabel.c 30 Jun 2003 13:48:40 -0000 1.4 @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -554,66 +555,150 @@ #ifdef CONFIG_PROC_FS +struct ip6fl_iter_state { + int bucket; +}; -static int ip6_fl_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0; - off_t begin=0; - int len=0; - int i, k; - struct ip6_flowlabel *fl; +#define ip6fl_seq_private(seq) ((struct ip6fl_iter_state *)&(seq)->private) - len+= sprintf(buffer,"Label S Owner Users Linger Expires " - "Dst Opt\n"); +static struct ip6_flowlabel *ip6fl_get_first(struct seq_file *seq) +{ + struct ip6_flowlabel *fl = NULL; + struct ip6fl_iter_state *state = ip6fl_seq_private(seq); - read_lock_bh(&ip6_fl_lock); - for (i=0; i<=FL_HASH_MASK; i++) { - for (fl = fl_ht[i]; fl; fl = fl->next) { - len+=sprintf(buffer+len,"%05X %-1d %-6d %-6d %-6d %-8ld ", - (unsigned)ntohl(fl->label), - fl->share, - (unsigned)fl->owner, - atomic_read(&fl->users), - fl->linger/HZ, - (long)(fl->expires - jiffies)/HZ); - - for (k=0; k<16; k++) - len+=sprintf(buffer+len, "%02x", fl->dst.s6_addr[k]); - buffer[len++]=' '; - len+=sprintf(buffer+len, "%-4d", fl->opt ? 
fl->opt->opt_nflen : 0); - buffer[len++]='\n'; - - pos=begin+len; - if(pos<offset) { - len=0; - begin=pos; - } - if(pos>offset+length) - goto done; + for (state->bucket = 0; state->bucket <= FL_HASH_MASK; ++state->bucket) { + if (fl_ht[state->bucket]) { + fl = fl_ht[state->bucket]; + break; } } - *eof = 1; + return fl; +} -done: +static struct ip6_flowlabel *ip6fl_get_next(struct seq_file *seq, struct ip6_flowlabel *fl) +{ + struct ip6fl_iter_state *state = ip6fl_seq_private(seq); + + fl = fl->next; + while (!fl) { + if (++state->bucket <= FL_HASH_MASK) + fl = fl_ht[state->bucket]; + } + return fl; +} + +static struct ip6_flowlabel *ip6fl_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip6_flowlabel *fl = ip6fl_get_first(seq); + if (fl) + while (pos && (fl = ip6fl_get_next(seq, fl)) != NULL) + --pos; + return pos ? NULL : fl; +} + +static void *ip6fl_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock_bh(&ip6_fl_lock); + return *pos ? ip6fl_get_idx(seq, *pos) : (void *)1; +} + +static void *ip6fl_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip6_flowlabel *fl; + + if (v == (void *)1) + fl = ip6fl_get_first(seq); + else + fl = ip6fl_get_next(seq, v); + ++*pos; + return fl; +} + +static void ip6fl_seq_stop(struct seq_file *seq, void *v) +{ read_unlock_bh(&ip6_fl_lock); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if(len<0) - len=0; - return len; } + +static void ip6fl_fl_seq_show(struct seq_file *seq, struct ip6_flowlabel *fl) +{ + while(fl) { + seq_printf(seq, + "%05X %-1d %-6d %-6d %-6d %-8ld " + "%02x%02x%02x%02x%02x%02x%02x%02x " + "%-4d\n", + (unsigned)ntohl(fl->label), + fl->share, + (unsigned)fl->owner, + atomic_read(&fl->users), + fl->linger/HZ, + (long)(fl->expires - jiffies)/HZ, + NIP6(fl->dst), + fl->opt ?
fl->opt->opt_nflen : 0); + fl = fl->next; + } +} + +static int ip6fl_seq_show(struct seq_file *seq, void *v) +{ + if (v == (void *)1) + seq_printf(seq, "Label S Owner Users Linger Expires " + "Dst Opt\n"); + else + ip6fl_fl_seq_show(seq, v); + return 0; +} + +static struct seq_operations ip6fl_seq_ops = { + .start = ip6fl_seq_start, + .next = ip6fl_seq_next, + .stop = ip6fl_seq_stop, + .show = ip6fl_seq_show, +}; + +static int ip6fl_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct ip6fl_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &ip6fl_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; +} + +static struct file_operations ip6fl_seq_fops = { + .owner = THIS_MODULE, + .open = ip6fl_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; #endif void ip6_flowlabel_init() { +#ifdef CONFIG_PROC_FS + struct proc_dir_entry *p; +#endif init_timer(&ip6_fl_gc_timer); ip6_fl_gc_timer.function = ip6_fl_gc; #ifdef CONFIG_PROC_FS - create_proc_read_entry("net/ip6_flowlabel", 0, 0, ip6_fl_read_proc, NULL); + p = create_proc_entry("ip6_flowlabel", S_IRUGO, proc_net); + if (p) + p->proc_fops = &ip6fl_seq_fops; #endif } @@ -621,6 +706,6 @@ { del_timer(&ip6_fl_gc_timer); #ifdef CONFIG_PROC_FS - remove_proc_entry("net/ip6_flowlabel", 0); + proc_net_remove("ip6_flowlabel"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 07:08:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 07:08:43 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UE8Y2x025970 for ; Mon, 30 Jun 2003 07:08:35 -0700 Received: from localhost (localhost 
[127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UE9tBo014734; Mon, 30 Jun 2003 23:09:56 +0900 Date: Mon, 30 Jun 2003 23:09:55 +0900 (JST) Message-Id: <20030630.230955.65202856.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH] IPV{4,6}: fixed /proc/net/raw{,6} seq_file support From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3670 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello, There were bugs in /proc/net/raw{,6} seq_file support. Sorry, my fault. This patch fixes the problem. Thanks in advance. 
Index: linux-2.5/net/ipv4/raw.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/raw.c,v retrieving revision 1.31 diff -u -r1.31 raw.c --- linux-2.5/net/ipv4/raw.c 21 Jun 2003 16:20:28 -0000 1.31 +++ linux-2.5/net/ipv4/raw.c 30 Jun 2003 12:48:00 -0000 @@ -801,7 +801,24 @@ static int raw_seq_open(struct inode *inode, struct file *file) { - return seq_open(file, &raw_seq_ops); + struct seq_file *seq; + int rc = -ENOMEM; + struct raw_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + rc = seq_open(file, &raw_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } static struct file_operations raw_seq_fops = { @@ -809,7 +826,7 @@ .open = raw_seq_open, .read = seq_read, .llseek = seq_lseek, - .release = seq_release, + .release = seq_release_private, }; int __init raw_proc_init(void) Index: linux-2.5/net/ipv6/raw.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/raw.c,v retrieving revision 1.30 diff -u -r1.30 raw.c --- linux-2.5/net/ipv6/raw.c 21 Jun 2003 16:20:28 -0000 1.30 +++ linux-2.5/net/ipv6/raw.c 30 Jun 2003 12:48:00 -0000 @@ -1029,7 +1029,22 @@ static int raw6_seq_open(struct inode *inode, struct file *file) { - return seq_open(file, &raw6_seq_ops); + struct seq_file *seq; + int rc = -ENOMEM; + struct raw6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + if (!s) + goto out; + rc = seq_open(file, &raw6_seq_ops); + if (rc) + goto out_kfree; + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } static struct file_operations raw6_seq_fops = { @@ -1037,7 +1052,7 @@ .open = raw6_seq_open, .read = seq_read, .llseek = seq_lseek, - .release = seq_release, + .release = seq_release_private, }; int __init raw6_proc_init(void) -- Hideaki 
YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From jmorris@intercode.com.au Mon Jun 30 07:50:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 07:50:19 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:VVlu4GTyuh2Zj57uMrXEQZDP8ej4zGSB@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UEo82x026530 for ; Mon, 30 Jun 2003 07:50:10 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5UEnrr00830; Tue, 1 Jul 2003 00:49:53 +1000 Date: Tue, 1 Jul 2003 00:49:52 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: convert /proc/net/ip6_flowlabel to seq_file In-Reply-To: <20030630.230115.84925092.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3671 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 30 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > In article <20030629.015723.75141662.yoshfuji@linux-ipv6.org> (at Sun, 29 Jun 2003 01:57:23 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > > > This converts /proc/net/ip6_flowlabel to seq_file{}. > > Thanks. > > Oops, this was buggy. Here's fixed one... > This has already been pushed to Dave's tree. Could you please generate a diff against bk://kernel.bkbits.net/jmorris/net-2.5? (Not sure what state Dave's tree is in, but it may work against that). 
- James -- James Morris From yoshfuji@linux-ipv6.org Mon Jun 30 08:10:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 08:11:08 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UFAv2x029515 for ; Mon, 30 Jun 2003 08:10:58 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UFCCBo015073; Tue, 1 Jul 2003 00:12:12 +0900 Date: Tue, 01 Jul 2003 00:12:12 +0900 (JST) Message-Id: <20030701.001212.27447753.yoshfuji@linux-ipv6.org> To: jmorris@intercode.com.au Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: convert /proc/net/ip6_flowlabel to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030630.230115.84925092.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3672 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Tue, 1 Jul 2003 00:49:52 +1000 (EST)), James Morris says: > On Mon, 30 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > > > In article <20030629.015723.75141662.yoshfuji@linux-ipv6.org> (at Sun, 29 Jun 2003 01:57:23 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > > > > > This converts /proc/net/ip6_flowlabel to seq_file{}. > > > Thanks. 
> > > > Oops, this was buggy. Here's fixed one... > > > > This has already been pushed to Dave's tree. Could you please generate a > diff against bk://kernel.bkbits.net/jmorris/net-2.5? Here it is. ===== net/ipv6/ip6_flowlabel.c 1.3 vs edited ===== --- 1.3/net/ipv6/ip6_flowlabel.c Sun Jun 29 19:39:57 2003 +++ edited/net/ipv6/ip6_flowlabel.c Tue Jul 1 00:00:57 2003 @@ -657,7 +657,25 @@ static int ip6fl_seq_open(struct inode *inode, struct file *file) { - return seq_open(file, &ip6fl_seq_ops); + struct seq_file *seq; + int rc = -ENOMEM; + struct ip6fl_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &ip6fl_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } static struct file_operations ip6fl_seq_fops = { @@ -665,7 +683,7 @@ .open = ip6fl_seq_open, .read = seq_read, .llseek = seq_lseek, - .release = seq_release, + .release = seq_release_private, }; #endif -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From jmorris@intercode.com.au Mon Jun 30 09:08:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:09:00 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:Xogoc2sAE45qv1vCYevjd+4Q1oJsWsvu@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UG8l2x030293 for ; Mon, 30 Jun 2003 09:08:49 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5UG8Wr01220; Tue, 1 Jul 2003 02:08:33 +1000 Date: Tue, 1 Jul 2003 02:08:32 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: put ipv6_rcv_saddr_equal() common place In-Reply-To: <20030630.163517.41915691.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3673 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 30 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Put ipv6_rcv_saddr_equal() common place as comment says. Applied to bk://kernel.bkbits.net/jmorris/net-2.5 -- James Morris From jmorris@intercode.com.au Mon Jun 30 09:09:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:09:05 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:owkNZAx66VEersCR3mBDqVMMASetcIse@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UG8x2x030298 for ; Mon, 30 Jun 2003 09:09:01 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5UG8mr01228; Tue, 1 Jul 2003 02:08:49 +1000 Date: Tue, 1 Jul 2003 02:08:48 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV{4,6}: fixed /proc/net/raw{,6} seq_file support In-Reply-To: <20030630.230955.65202856.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3674 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 30 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Hello, > > There were bugs in /proc/net/raw{,6} seq_file support. > Sorry, my fault. This patch fixes the problem. Also applied! 
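For readers following the raw{,6} fix acknowledged here: the patch allocates a private iterator state in the open handler, frees it again if seq_open() fails, and switches .release to seq_release_private so the state is freed on close. The following is only a userspace sketch of that pairing, not kernel code. Every toy_* name, the `fail` parameter and the `toy_live_states` counter are hypothetical stand-ins invented for illustration; malloc/free stand in for kmalloc/kfree.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct seq_file and struct file. */
struct toy_seq { void *private; };
struct toy_file { struct toy_seq *private_data; };

/* Per-open iterator state, like the *_iter_state structs in the patches. */
struct toy_iter_state { int bucket; };

/* Number of iterator states currently allocated (for leak checking). */
static int toy_live_states;

/* Stand-in for seq_open(): fails with -ENOMEM when 'fail' is non-zero. */
static int toy_seq_open(struct toy_file *file, int fail)
{
    struct toy_seq *seq;

    if (fail)
        return -ENOMEM;
    seq = calloc(1, sizeof(*seq));
    if (!seq)
        return -ENOMEM;
    file->private_data = seq;
    return 0;
}

/* Same control flow as the open handlers in the patch. */
static int toy_open(struct toy_file *file, int fail)
{
    int rc = -ENOMEM;
    struct toy_iter_state *s = malloc(sizeof(*s));

    assert(file != NULL);
    if (!s)
        goto out;
    toy_live_states++;

    rc = toy_seq_open(file, fail);
    if (rc)
        goto out_free;

    memset(s, 0, sizeof(*s));
    file->private_data->private = s;
out:
    return rc;
out_free:
    /* The seq object was never set up, so only the state needs freeing. */
    free(s);
    toy_live_states--;
    goto out;
}

/* Stand-in for seq_release_private(): free the state, then the seq object. */
static int toy_release_private(struct toy_file *file)
{
    struct toy_seq *seq = file->private_data;

    free(seq->private);
    toy_live_states--;
    free(seq);
    file->private_data = NULL;
    return 0;
}
```

A caller would pair toy_open() with toy_release_private() the way the VFS pairs .open with .release. If the release step freed only the seq object, the state allocated in open would leak on every close, which is presumably why the patch changes .release along with .open.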
-- James Morris From jmorris@intercode.com.au Mon Jun 30 09:09:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:09:36 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:A5/ATvtIxs/6jRyVRDLnSPAVkPSbcY8R@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UG9V2x030374 for ; Mon, 30 Jun 2003 09:09:32 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h5UG9Gr01239; Tue, 1 Jul 2003 02:09:16 +1000 Date: Tue, 1 Jul 2003 02:09:14 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: convert /proc/net/ip6_flowlabel to seq_file In-Reply-To: <20030701.001212.27447753.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3675 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > > > Oops, this was buggy. Here's fixed one... > > > > This has already been pushed to Dave's tree. Could you please generate a > > diff against bk://kernel.bkbits.net/jmorris/net-2.5? > > Here it is. Thanks, this is applied as well. 
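As an aside on the iterator shape used by the ip6_flowlabel conversion (get_first / get_next / get_idx over a hash table, with the current bucket kept in per-open state), here is a small userspace sketch. It is an illustration under assumptions, not the kernel code: the toy_* names, the four-bucket table and toy_count() are hypothetical. The sketch leaves the loop explicitly once `bucket` passes the mask, so an exhausted table ends the walk.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HASH_MASK 3                 /* four buckets, indices 0..3 */

struct toy_node {
    int label;
    struct toy_node *next;              /* chain within one bucket */
};

struct toy_iter_state { int bucket; };  /* bucket currently being walked */

static struct toy_node *toy_ht[TOY_HASH_MASK + 1];

/* Find the first entry of the first non-empty bucket, or NULL. */
static struct toy_node *toy_get_first(struct toy_iter_state *state)
{
    struct toy_node *n = NULL;

    for (state->bucket = 0; state->bucket <= TOY_HASH_MASK; ++state->bucket) {
        if (toy_ht[state->bucket]) {
            n = toy_ht[state->bucket];
            break;
        }
    }
    return n;
}

/* Advance within the chain, then across buckets; NULL at end of table. */
static struct toy_node *toy_get_next(struct toy_iter_state *state,
                                     struct toy_node *n)
{
    assert(n != NULL);
    n = n->next;
    while (!n) {
        if (++state->bucket <= TOY_HASH_MASK)
            n = toy_ht[state->bucket];
        else
            break;                      /* past the last bucket: stop */
    }
    return n;
}

/* Re-walk from the start to reach position 'pos' (0-based), or NULL. */
static struct toy_node *toy_get_idx(struct toy_iter_state *state, long pos)
{
    struct toy_node *n = toy_get_first(state);

    if (n)
        while (pos && (n = toy_get_next(state, n)) != NULL)
            --pos;
    return pos ? NULL : n;
}

/* Count all entries by driving the iterator from first to last. */
static int toy_count(void)
{
    struct toy_iter_state st;
    struct toy_node *n;
    int count = 0;

    for (n = toy_get_first(&st); n; n = toy_get_next(&st, n))
        count++;
    return count;
}
```

toy_get_idx() restarts from the first bucket on every call, so resuming at position N costs O(N). That is the trade-off of keeping only a bucket index in the private state.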
- James -- James Morris From yoshfuji@linux-ipv6.org Mon Jun 30 09:56:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:56:45 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UGuX2x031959 for ; Mon, 30 Jun 2003 09:56:34 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UGvtBo026866; Tue, 1 Jul 2003 01:57:55 +0900 Date: Tue, 01 Jul 2003 01:57:55 +0900 (JST) Message-Id: <20030701.015755.80809794.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3678 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This converts /proc/net/igmp6 to seq_file. Thanks. 
Index: linux-2.5/net/ipv6/mcast.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/mcast.c,v retrieving revision 1.24 diff -u -r1.24 mcast.c --- linux-2.5/net/ipv6/mcast.c 24 Jun 2003 19:45:24 -0000 1.24 +++ linux-2.5/net/ipv6/mcast.c 30 Jun 2003 15:28:16 -0000 @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -2039,64 +2040,145 @@ } #ifdef CONFIG_PROC_FS -static int igmp6_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0, begin=0; - struct ifmcaddr6 *im; - int len=0; +struct igmp6_mc_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - struct inet6_dev *idev; + struct inet6_dev *idev; +}; - if ((idev = in6_dev_get(dev)) == NULL) - continue; +#define igmp6_mc_seq_private(seq) ((struct igmp6_mc_iter_state *)&seq->private) - read_lock_bh(&idev->lock); - for (im = idev->mc_list; im; im = im->next) { - int i; +static inline struct ifmcaddr6 *igmp6_mc_get_first(struct seq_file *seq) +{ + struct ifmcaddr6 *im = NULL; + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); - len += sprintf(buffer+len,"%-4d %-15s ", dev->ifindex, dev->name); + for (state->dev = dev_base, state->idev = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (!idev) + continue; + read_lock_bh(&idev->lock); + im = idev->mc_list; + if (im) { + state->idev = idev; + break; + } + read_unlock_bh(&idev->lock); + } + return im; +} - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", im->mca_addr.s6_addr[i]); +static struct ifmcaddr6 *igmp6_mc_get_next(struct seq_file *seq, struct ifmcaddr6 *im) +{ + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); - len+=sprintf(buffer+len, - " %5d %08X %ld\n", - im->mca_users, - im->mca_flags, - (im->mca_flags&MAF_TIMER_RUNNING) ? 
im->mca_timer.expires-jiffies : 0); - - pos=begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = im->next; + while (!im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); } - read_unlock_bh(&idev->lock); - in6_dev_put(idev); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + break; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + im = state->idev->mc_list; } - *eof = 1; + return im; +} -done: +static struct ifmcaddr6 *igmp6_mc_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ifmcaddr6 *im = igmp6_mc_get_first(seq); + if (im) + while (pos && (im = igmp6_mc_get_next(seq, im)) != NULL) + --pos; + return pos ? NULL : im; +} + +static void *igmp6_mc_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp6_mc_get_idx(seq, *pos) : igmp6_mc_get_first(seq); +} + +static void *igmp6_mc_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ifmcaddr6 *im; + im = igmp6_mc_get_next(seq, v); + ++*pos; + return im; +} + +static void igmp6_mc_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; +static int igmp6_mc_seq_show(struct seq_file *seq, void *v) +{ + struct ifmcaddr6 *im = (struct ifmcaddr6 *)v; + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); + + seq_printf(seq, + "%-4d %-15s %04x%04x%04x%04x%04x%04x%04x%04x %5d %08X %ld\n", + state->dev->ifindex, state->dev->name, + NIP6(im->mca_addr), + im->mca_users, im->mca_flags, + (im->mca_flags&MAF_TIMER_RUNNING) ?
im->mca_timer.expires-jiffies : 0); + return 0; } +static struct seq_operations igmp6_mc_seq_ops = { + .start = igmp6_mc_seq_start, + .next = igmp6_mc_seq_next, + .stop = igmp6_mc_seq_stop, + .show = igmp6_mc_seq_show, +}; + +static int igmp6_mc_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp6_mc_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &igmp6_mc_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; +} + +static struct file_operations igmp6_mc_seq_fops = { + .owner = THIS_MODULE, + .open = igmp6_mc_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + static int ip6_mcf_read_proc(char *buffer, char **start, off_t offset, int length, int *eof, void *data) { @@ -2178,6 +2260,9 @@ struct ipv6_pinfo *np; struct sock *sk; int err; +#ifdef CONFIG_PROC_FS + struct proc_dir_entry *p; +#endif err = sock_create(PF_INET6, SOCK_RAW, IPPROTO_ICMPV6, &igmp6_socket); if (err < 0) { @@ -2194,8 +2279,11 @@ np = inet6_sk(sk); np->hop_limit = 1; + #ifdef CONFIG_PROC_FS - create_proc_read_entry("net/igmp6", 0, 0, igmp6_read_proc, NULL); + p = create_proc_entry("igmp6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp6_mc_seq_fops; create_proc_read_entry("net/mcfilter6", 0, 0, ip6_mcf_read_proc, NULL); #endif @@ -2207,6 +2295,6 @@ sock_release(igmp6_socket); igmp6_socket = NULL; /* for safety */ #ifdef CONFIG_PROC_FS - remove_proc_entry("net/igmp6", 0); + proc_net_remove("igmp6"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 09:56:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:56:47 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com 
(8.12.9/8.12.9) with SMTP id h5UGub2x031964 for ; Mon, 30 Jun 2003 09:56:37 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UGvxBo026872; Tue, 1 Jul 2003 01:57:59 +0900 Date: Tue, 01 Jul 2003 01:57:58 +0900 (JST) Message-Id: <20030701.015758.91005834.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: convert /proc/net/mcfilter6 to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3679 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This patch, which depends on the "[PATCH] IPV6: convert /proc/net/igmp6 to seq_file" patch, converts /proc/net/mcfilter6 to seq_file. Thanks.
--- linux-2.5/net/ipv6/mcast.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv6/mcast.c Tue Jul 1 01:46:12 2003 @@ -2179,80 +2179,178 @@ .release = seq_release_private, }; -static int ip6_mcf_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0, begin=0; - int len=0; - int first=1; +struct igmp6_mcf_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev=dev_base; dev; dev=dev->next) { - struct inet6_dev *idev = in6_dev_get(dev); - struct ifmcaddr6 *imc; + struct inet6_dev *idev; + struct ifmcaddr6 *im; +}; - if (idev == NULL) - continue; +#define igmp6_mcf_seq_private(seq) ((struct igmp6_mcf_iter_state *)&seq->private) +static inline struct ip6_sf_list *igmp6_mcf_get_first(struct seq_file *seq) +{ + struct ip6_sf_list *psf = NULL; + struct ifmcaddr6 *im = NULL; + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + for (state->dev = dev_base, state->idev = NULL, state->im = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (unlikely(idev == NULL)) + continue; read_lock_bh(&idev->lock); - - for (imc=idev->mc_list; imc; imc=imc->next) { - struct ip6_sf_list *psf; - unsigned long i; - - spin_lock_bh(&imc->mca_lock); - for (psf=imc->mca_sources; psf; psf=psf->sf_next) { - if (first) { - len += sprintf(buffer+len, "%3s %6s " - "%32s %32s %6s %6s\n", "Idx", - "Device", "Multicast Address", - "Source Address", "INC", "EXC"); - first = 0; - } - len += sprintf(buffer+len,"%3d %6.6s ", - dev->ifindex, dev->name); - - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", - imc->mca_addr.s6_addr[i]); - buffer[len++] = ' '; - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", - psf->sf_addr.s6_addr[i]); - len += sprintf(buffer+len, " %6lu %6lu\n", - psf->sf_count[MCAST_INCLUDE], - psf->sf_count[MCAST_EXCLUDE]); - pos = begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { 
- spin_unlock_bh(&imc->mca_lock); - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = idev->mc_list; + if (likely(im != NULL)) { + spin_lock_bh(&im->mca_lock); + psf = im->mca_sources; + if (likely(psf != NULL)) { + state->im = im; + state->idev = idev; + break; } - spin_unlock_bh(&imc->mca_lock); + spin_unlock_bh(&im->mca_lock); } read_unlock_bh(&idev->lock); - in6_dev_put(idev); } - *eof = 1; + return psf; +} + +static struct ip6_sf_list *igmp6_mcf_get_next(struct seq_file *seq, struct ip6_sf_list *psf) +{ + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + psf = psf->sf_next; + while (!psf) { + spin_unlock_bh(&state->im->mca_lock); + state->im = state->im->next; + while (!state->im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + goto out; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + state->im = state->idev->mc_list; + } + if (!state->im) + break; + spin_lock_bh(&state->im->mca_lock); + psf = state->im->mca_sources; + } +out: + return psf; +} + +static struct ip6_sf_list *igmp6_mcf_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip6_sf_list *psf = igmp6_mcf_get_first(seq); + if (psf) + while (pos && (psf = igmp6_mcf_get_next(seq, psf)) != NULL) + --pos; + return pos ? NULL : psf; +} + +static void *igmp6_mcf_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp6_mcf_get_idx(seq, *pos) : (void *)1; +} + +static void *igmp6_mcf_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip6_sf_list *psf; + if (v == (void *)1) + psf = igmp6_mcf_get_first(seq); + else + psf = igmp6_mcf_get_next(seq, v); + ++*pos; + return psf; +} -done: +static void igmp6_mcf_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + if (likely(state->im != NULL)) + spin_unlock_bh(&state->im->mca_lock); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} + +static int igmp6_mcf_seq_show(struct seq_file *seq, void *v) +{ + struct ip6_sf_list *psf = (struct ip6_sf_list *)v; + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + if (v == (void *)1) { + seq_printf(seq, + "%3s %6s " + "%32s %32s %6s %6s\n", "Idx", + "Device", "Multicast Address", + "Source Address", "INC", "EXC"); + } else { + seq_printf(seq, + "%3d %6.6s " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%6lu %6lu\n", + state->dev->ifindex, state->dev->name, + NIP6(state->im->mca_addr), + NIP6(psf->sf_addr), + psf->sf_count[MCAST_INCLUDE], + psf->sf_count[MCAST_EXCLUDE]); + } + return 0; +} + +static struct seq_operations igmp6_mcf_seq_ops = { + .start = igmp6_mcf_seq_start, + .next = igmp6_mcf_seq_next, + .stop = igmp6_mcf_seq_stop, + .show = igmp6_mcf_seq_show, +}; + +static int igmp6_mcf_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp6_mcf_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &igmp6_mcf_seq_ops); + if (rc) + goto out_kfree; - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + 
goto out; } + +static struct file_operations igmp6_mcf_seq_fops = { + .owner = THIS_MODULE, + .open = igmp6_mcf_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; #endif int __init igmp6_init(struct net_proto_family *ops) @@ -2284,7 +2382,9 @@ p = create_proc_entry("igmp6", S_IRUGO, proc_net); if (p) p->proc_fops = &igmp6_mc_seq_fops; - create_proc_read_entry("net/mcfilter6", 0, 0, ip6_mcf_read_proc, NULL); + p = create_proc_entry("mcfilter6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp6_mcf_seq_fops; #endif return 0; @@ -2295,6 +2395,7 @@ sock_release(igmp6_socket); igmp6_socket = NULL; /* for safety */ #ifdef CONFIG_PROC_FS + proc_net_remove("mcfilter6"); proc_net_remove("igmp6"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 09:56:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:56:39 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UGuP2x031947 for ; Mon, 30 Jun 2003 09:56:26 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UGvkBo026854; Tue, 1 Jul 2003 01:57:47 +0900 Date: Tue, 01 Jul 2003 01:57:46 +0900 (JST) Message-Id: <20030701.015746.56567539.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV4: convert /proc/net/igmp to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 
on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3676 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This converts /proc/net/igmp to seq_file. Thanks. Index: linux-2.5/include/net/ip.h =================================================================== RCS file: /home/cvs/linux-2.5/include/net/ip.h,v retrieving revision 1.20 diff -u -r1.20 ip.h --- linux-2.5/include/net/ip.h 7 Jun 2003 00:22:34 -0000 1.20 +++ linux-2.5/include/net/ip.h 30 Jun 2003 15:28:13 -0000 @@ -79,7 +79,7 @@ extern void ip_mc_dropsocket(struct sock *); extern void ip_mc_dropdevice(struct net_device *dev); -extern int ip_mc_procinfo(char *, char **, off_t, int); +extern int igmp_mc_proc_init(void); extern int ip_mcf_procinfo(char *, char **, off_t, int); /* Index: linux-2.5/net/ipv4/igmp.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/igmp.c,v retrieving revision 1.28 diff -u -r1.28 igmp.c --- linux-2.5/net/ipv4/igmp.c 26 Jun 2003 03:45:59 -0000 1.28 +++ linux-2.5/net/ipv4/igmp.c 30 Jun 2003 15:28:13 -0000 @@ -99,7 +99,10 @@ #ifdef CONFIG_IP_MROUTE #include #endif - +#ifdef CONFIG_PROC_FS +#include +#include +#endif #define IP_MAX_MEMBERSHIPS 20 @@ -2090,65 +2093,162 @@ return rv; } - -int ip_mc_procinfo(char *buffer, char **start, off_t offset, int length) -{ - off_t pos=0, begin=0; - struct ip_mc_list *im; - int len=0; +#if defined(CONFIG_PROC_FS) +struct igmp_mc_iter_state { struct net_device *dev; + struct in_device *in_dev; +}; - len=sprintf(buffer,"Idx\tDevice : Count Querier\tGroup Users Timer\tReporter\n"); +#define igmp_mc_seq_private(seq) ((struct igmp_mc_iter_state *)&seq->private) - read_lock(&dev_base_lock); - for(dev = dev_base; dev; dev = dev->next) { - struct in_device *in_dev = 
in_dev_get(dev); - char *querier = "NONE"; +static inline struct ip_mc_list *igmp_mc_get_first(struct seq_file *seq) +{ + struct ip_mc_list *im = NULL; + struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq); - if (in_dev == NULL) + for (state->dev = dev_base, state->in_dev = NULL; + state->dev; + state->dev = state->dev->next) { + struct in_device *in_dev; + in_dev = in_dev_get(state->dev); + if (!in_dev) continue; - -#ifdef CONFIG_IP_MULTICAST - querier = IGMP_V1_SEEN(in_dev) ? "V1" : "V2"; -#endif - - len+=sprintf(buffer+len,"%d\t%-10s: %5d %7s\n", - dev->ifindex, dev->name, dev->mc_count, querier); - read_lock(&in_dev->lock); - for (im = in_dev->mc_list; im; im = im->next) { - len+=sprintf(buffer+len, - "\t\t\t\t%08lX %5d %d:%08lX\t\t%d\n", - im->multiaddr, im->users, - im->tm_running, im->timer.expires-jiffies, im->reporter); - - pos=begin+len; - if(pos<offset) { - len=0; - begin=pos; - } - if(pos>offset+length) { - read_unlock(&in_dev->lock); - in_dev_put(in_dev); - goto done; - } + im = in_dev->mc_list; + if (im) { + state->in_dev = in_dev; + break; } read_unlock(&in_dev->lock); - in_dev_put(in_dev); } -done: + return im; +} + +static struct ip_mc_list *igmp_mc_get_next(struct seq_file *seq, struct ip_mc_list *im) +{ + struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq); + im = im->next; + while (!im) { + if (likely(state->in_dev != NULL)) { + read_unlock(&state->in_dev->lock); + in_dev_put(state->in_dev); + } + state->dev = state->dev->next; + if (!state->dev) { + state->in_dev = NULL; + break; + } + state->in_dev = in_dev_get(state->dev); + if (!state->in_dev) + continue; + read_lock(&state->in_dev->lock); + im = state->in_dev->mc_list; + } + return im; +} + +static struct ip_mc_list *igmp_mc_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip_mc_list *im = igmp_mc_get_first(seq); + if (im) + while (pos && (im = igmp_mc_get_next(seq, im)) != NULL) + --pos; + return pos ? 
NULL : im; +} + +static void *igmp_mc_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? igmp_mc_get_idx(seq, *pos) : (void *)1; +} + +static void *igmp_mc_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip_mc_list *im; + if (v == (void *)1) + im = igmp_mc_get_first(seq); + else + im = igmp_mc_get_next(seq, v); + ++*pos; + return im; +} + +static void igmp_mc_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq); + if (likely(state->in_dev != NULL)) { + read_unlock(&state->in_dev->lock); + in_dev_put(state->in_dev); + } read_unlock(&dev_base_lock); +} - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if(len<0) - len=0; - return len; +static int igmp_mc_seq_show(struct seq_file *seq, void *v) +{ + if (v == (void *)1) + seq_printf(seq, + "Idx\tDevice : Count Querier\tGroup Users Timer\tReporter\n"); + else { + struct ip_mc_list *im = (struct ip_mc_list *)v; + struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq); + char *querier; +#ifdef CONFIG_IP_MULTICAST + querier = IGMP_V1_SEEN(state->in_dev) ? 
"V1" : "V2"; +#else + querier = "NONE"; +#endif + + if (state->in_dev->mc_list == im) { + seq_printf(seq, "%d\t%-10s: %5d %7s\n", + state->dev->ifindex, state->dev->name, state->dev->mc_count, querier); + } + + seq_printf(seq, + "\t\t\t\t%08lX %5d %d:%08lX\t\t%d\n", + im->multiaddr, im->users, + im->tm_running, im->timer.expires-jiffies, im->reporter); + } + return 0; +} + +static struct seq_operations igmp_mc_seq_ops = { + .start = igmp_mc_seq_start, + .next = igmp_mc_seq_next, + .stop = igmp_mc_seq_stop, + .show = igmp_mc_seq_show, +}; + +static int igmp_mc_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp_mc_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + rc = seq_open(file, &igmp_mc_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } +static struct file_operations igmp_mc_seq_fops = { + .owner = THIS_MODULE, + .open = igmp_mc_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; +#endif + int ip_mcf_procinfo(char *buffer, char **start, off_t offset, int length) { off_t pos=0, begin=0; @@ -2213,4 +2313,16 @@ len=0; return len; } + +#ifdef CONFIG_PROC_FS +int __init igmp_mc_proc_init(void) +{ + struct proc_dir_entry *p; + + p = create_proc_entry("igmp", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp_mc_seq_fops; + return 0; +} +#endif Index: linux-2.5/net/ipv4/ip_output.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv4/ip_output.c,v retrieving revision 1.34 diff -u -r1.34 ip_output.c --- linux-2.5/net/ipv4/ip_output.c 21 Jun 2003 16:20:41 -0000 1.34 +++ linux-2.5/net/ipv4/ip_output.c 30 Jun 2003 15:28:13 -0000 @@ -1314,7 +1314,7 @@ inet_initpeers(); #ifdef CONFIG_IP_MULTICAST - proc_net_create("igmp", 0, ip_mc_procinfo); + igmp_mc_proc_init(); #endif 
proc_net_create("mcfilter", 0, ip_mcf_procinfo); } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 09:56:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 09:56:42 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UGuU2x031954 for ; Mon, 30 Jun 2003 09:56:31 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UGvqBo026860; Tue, 1 Jul 2003 01:57:52 +0900 Date: Tue, 01 Jul 2003 01:57:52 +0900 (JST) Message-Id: <20030701.015752.117028166.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV4: convert /proc/net/mcfilter to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3677 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This patch, which depends on the "[PATCH] IPV4: convert /proc/net/igmp to seq_file" patch, converts /proc/net/mcfilter to seq_file. Thanks. 
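For readers less familiar with the interface, the start/next/stop/show iteration contract that this conversion relies on can be sketched in plain userspace C roughly as follows. This is only an illustration: the `demo_*` names, the integer table and the `demo_locked` flag are made up for the example and are not kernel APIs; the real callbacks are the `igmp_mcf_seq_*` functions in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel list: a fixed table of values. */
static const int demo_table[] = { 10, 20, 30 };
#define DEMO_COUNT (sizeof(demo_table) / sizeof(demo_table[0]))

/* Hypothetical ops table, modelled on the four seq_file callbacks. */
struct demo_seq_ops {
    const int *(*start)(size_t *pos);
    const int *(*next)(const int *v, size_t *pos);
    void (*stop)(const int *v);
    int (*show)(char *buf, size_t len, const int *v);
};

/* Models the lock that start() takes and stop() drops. */
static int demo_locked;

static const int *demo_start(size_t *pos)
{
    demo_locked = 1;
    return *pos < DEMO_COUNT ? &demo_table[*pos] : NULL;
}

static const int *demo_next(const int *v, size_t *pos)
{
    (void)v;
    assert(demo_locked); /* next() only runs between start() and stop() */
    ++*pos;
    return *pos < DEMO_COUNT ? &demo_table[*pos] : NULL;
}

static void demo_stop(const int *v)
{
    (void)v;
    demo_locked = 0;
}

static int demo_show(char *buf, size_t len, const int *v)
{
    return snprintf(buf, len, "%d\n", *v);
}

static const struct demo_seq_ops demo_ops = {
    .start = demo_start,
    .next  = demo_next,
    .stop  = demo_stop,
    .show  = demo_show,
};

/* Drive the iterator from *pos, appending records to out.
 * Returns the number of bytes written (excluding the final NUL). */
static size_t demo_read(const struct demo_seq_ops *ops, size_t *pos,
                        char *out, size_t len)
{
    size_t used = 0;
    const int *v = ops->start(pos);

    while (v && used < len) {
        int n = ops->show(out + used, len - used, v);
        if (n < 0 || (size_t)n >= len - used)
            break; /* record did not fit; stop here */
        used += (size_t)n;
        v = ops->next(v, pos);
    }
    ops->stop(v); /* stop() always runs, even at end of data */
    return used;
}
```

The point of the sketch is that all cursor state lives in the position and the iterator's private data, so the reader can stop and resume at any record, and that whatever start() acquires is always released in stop().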
--- linux-2.5/include/net/ip.h.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/include/net/ip.h Tue Jul 1 01:46:12 2003 @@ -80,7 +80,6 @@ extern void ip_mc_dropsocket(struct sock *); extern void ip_mc_dropdevice(struct net_device *dev); extern int igmp_mc_proc_init(void); -extern int ip_mcf_procinfo(char *, char **, off_t, int); /* * Functions provided by ip.c --- linux-2.5/net/ipv4/igmp.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv4/igmp.c Tue Jul 1 01:46:12 2003 @@ -2247,74 +2247,177 @@ .llseek = seq_lseek, .release = seq_release_private, }; -#endif -int ip_mcf_procinfo(char *buffer, char **start, off_t offset, int length) -{ - off_t pos=0, begin=0; - int len=0; - int first = 1; +struct igmp_mcf_iter_state { struct net_device *dev; + struct in_device *idev; + struct ip_mc_list *im; +}; - read_lock(&dev_base_lock); - for(dev=dev_base; dev; dev=dev->next) { - struct in_device *in_dev = in_dev_get(dev); - struct ip_mc_list *imc; +#define igmp_mcf_seq_private(seq) ((struct igmp_mcf_iter_state *)&seq->private) - if (in_dev == NULL) +static inline struct ip_sf_list *igmp_mcf_get_first(struct seq_file *seq) +{ + struct ip_sf_list *psf = NULL; + struct ip_mc_list *im = NULL; + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); + + for (state->dev = dev_base, state->idev = NULL, state->im = NULL; + state->dev; + state->dev = state->dev->next) { + struct in_device *idev; + idev = in_dev_get(state->dev); + if (unlikely(idev == NULL)) continue; + read_lock_bh(&idev->lock); + im = idev->mc_list; + if (likely(im != NULL)) { + spin_lock_bh(&im->lock); + psf = im->sources; + if (likely(psf != NULL)) { + state->im = im; + state->idev = idev; + break; + } + spin_unlock_bh(&im->lock); + } + read_unlock_bh(&idev->lock); + } + return psf; +} - read_lock(&in_dev->lock); - - for (imc=in_dev->mc_list; imc; imc=imc->next) { - struct ip_sf_list *psf; +static struct ip_sf_list *igmp_mcf_get_next(struct seq_file *seq, struct ip_sf_list *psf) +{ + struct igmp_mcf_iter_state 
*state = igmp_mcf_seq_private(seq); - spin_lock_bh(&imc->lock); - for (psf=imc->sources; psf; psf=psf->sf_next) { - if (first) { - len += sprintf(buffer+len, "%3s %6s " - "%10s %10s %6s %6s\n", "Idx", - "Device", "MCA", "SRC", "INC", - "EXC"); - first = 0; - } - len += sprintf(buffer+len, "%3d %6.6s 0x%08x " - "0x%08x %6lu %6lu\n", dev->ifindex, - dev->name, ntohl(imc->multiaddr), - ntohl(psf->sf_inaddr), - psf->sf_count[MCAST_INCLUDE], - psf->sf_count[MCAST_EXCLUDE]); - pos=begin+len; - if(pos<offset) { - len=0; - begin=pos; - } - if(pos>offset+length) { - spin_unlock_bh(&imc->lock); - read_unlock(&in_dev->lock); - in_dev_put(in_dev); - goto done; - } + psf = psf->sf_next; + while (!psf) { + spin_unlock_bh(&state->im->lock); + state->im = state->im->next; + while (!state->im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in_dev_put(state->idev); } - spin_unlock_bh(&imc->lock); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + goto out; + } + state->idev = in_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + state->im = state->idev->mc_list; } - read_unlock(&in_dev->lock); - in_dev_put(in_dev); + if (!state->im) + break; + spin_lock_bh(&state->im->lock); + psf = state->im->sources; + } +out: + return psf; +} + +static struct ip_sf_list *igmp_mcf_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip_sf_list *psf = igmp_mcf_get_first(seq); + if (psf) + while (pos && (psf = igmp_mcf_get_next(seq, psf)) != NULL) + --pos; + return pos ? NULL : psf; +} + +static void *igmp_mcf_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp_mcf_get_idx(seq, *pos) : (void *)1; +} + +static void *igmp_mcf_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip_sf_list *psf; + if (v == (void *)1) + psf = igmp_mcf_get_first(seq); + else + psf = igmp_mcf_get_next(seq, v); + ++*pos; + return psf; +} + +static void igmp_mcf_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); + if (likely(state->im != NULL)) + spin_unlock_bh(&state->im->lock); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in_dev_put(state->idev); } -done: read_unlock(&dev_base_lock); +} + +static int igmp_mcf_seq_show(struct seq_file *seq, void *v) +{ + struct ip_sf_list *psf = (struct ip_sf_list *)v; + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if(len<0) - len=0; - return len; + if (v == (void *)1) { + seq_printf(seq, + "%3s %6s " + "%10s %10s %6s %6s\n", "Idx", + "Device", "MCA", + "SRC", "INC", "EXC"); + } else { + seq_printf(seq, + "%3d %6.6s 0x%08x " + "0x%08x %6lu %6lu\n", + state->dev->ifindex, state->dev->name, + ntohl(state->im->multiaddr), + ntohl(psf->sf_inaddr), + psf->sf_count[MCAST_INCLUDE], + psf->sf_count[MCAST_EXCLUDE]); + } + return 0; +} + +static struct seq_operations igmp_mcf_seq_ops = { + .start = igmp_mcf_seq_start, + .next = igmp_mcf_seq_next, + .stop = igmp_mcf_seq_stop, + .show = igmp_mcf_seq_show, +}; + +static int igmp_mcf_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp_mcf_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + rc = seq_open(file, &igmp_mcf_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } -#ifdef CONFIG_PROC_FS +static struct file_operations igmp_mcf_seq_fops = { + .owner = THIS_MODULE, + 
.open = igmp_mcf_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + int __init igmp_mc_proc_init(void) { struct proc_dir_entry *p; @@ -2322,6 +2425,10 @@ p = create_proc_entry("igmp", S_IRUGO, proc_net); if (p) p->proc_fops = &igmp_mc_seq_fops; + + p = create_proc_entry("mcfilter", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp_mcf_seq_fops; return 0; } #endif --- linux-2.5/net/ipv4/ip_output.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv4/ip_output.c Tue Jul 1 01:46:12 2003 @@ -1316,5 +1316,4 @@ #ifdef CONFIG_IP_MULTICAST igmp_mc_proc_init(); #endif - proc_net_create("mcfilter", 0, ip_mcf_procinfo); } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 10:58:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 10:58:31 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UHwK2x002267 for ; Mon, 30 Jun 2003 10:58:22 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5UHxgBo027275; Tue, 1 Jul 2003 02:59:42 +0900 Date: Tue, 01 Jul 2003 02:59:41 +0900 (JST) Message-Id: <20030701.025941.105202597.yoshfuji@linux-ipv6.org> To: davem@redhat.com CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] IPV6: convert /proc/net/anycast6 to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; 
charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3680 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This converts /proc/net/anycast6 to seq_file. Thanks. Index: linux-2.5/net/ipv6/af_inet6.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/af_inet6.c,v retrieving revision 1.46 diff -u -r1.46 af_inet6.c --- linux-2.5/net/ipv6/af_inet6.c 9 Jun 2003 17:26:52 -0000 1.46 +++ linux-2.5/net/ipv6/af_inet6.c 30 Jun 2003 16:36:55 -0000 @@ -85,7 +85,8 @@ extern void udp6_proc_exit(void); extern int ipv6_misc_proc_init(void); extern void ipv6_misc_proc_exit(void); -extern int anycast6_get_info(char *, char **, off_t, int); +extern int ac6_proc_init(void); +extern void ac6_proc_exit(void); extern int if6_proc_init(void); extern void if6_proc_exit(void); #endif @@ -799,7 +800,7 @@ if (ipv6_misc_proc_init()) goto proc_misc6_fail; - if (!proc_net_create("anycast6", 0, anycast6_get_info)) + if (ac6_proc_init()) goto proc_anycast6_fail; if (if6_proc_init()) goto proc_if6_fail; @@ -825,7 +826,7 @@ #ifdef CONFIG_PROC_FS proc_if6_fail: - proc_net_remove("anycast6"); + ac6_proc_exit(); proc_anycast6_fail: ipv6_misc_proc_exit(); proc_misc6_fail: @@ -863,7 +864,7 @@ sock_unregister(PF_INET6); #ifdef CONFIG_PROC_FS if6_proc_exit(); - proc_net_remove("anycast6"); + ac6_proc_exit(); ipv6_misc_proc_exit(); udp6_proc_exit(); tcp6_proc_exit(); Index: linux-2.5/net/ipv6/anycast.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/anycast.c,v retrieving revision 1.3 diff -u -r1.3 anycast.c --- linux-2.5/net/ipv6/anycast.c 22 May 2003 07:38:17 -0000 1.3 +++ linux-2.5/net/ipv6/anycast.c 30 Jun 2003 16:36:55 -0000 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -435,56 +436,159 @@ #ifdef CONFIG_PROC_FS 
-int anycast6_get_info(char *buffer, char **start, off_t offset, int length) -{ - off_t pos=0, begin=0; - struct ifacaddr6 *im; - int len=0; +struct ac6_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - struct inet6_dev *idev; + struct inet6_dev *idev; +}; - if ((idev = in6_dev_get(dev)) == NULL) - continue; +#define ac6_seq_private(seq) ((struct ac6_iter_state *)&seq->private) - read_lock_bh(&idev->lock); - for (im = idev->ac_list; im; im = im->aca_next) { - int i; +static inline struct ifacaddr6 *ac6_get_first(struct seq_file *seq) +{ + struct ifacaddr6 *im = NULL; + struct ac6_iter_state *state = ac6_seq_private(seq); - len += sprintf(buffer+len,"%-4d %-15s ", dev->ifindex, dev->name); + for (state->dev = dev_base, state->idev = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (!idev) + continue; + read_lock_bh(&idev->lock); + im = idev->ac_list; + if (im) { + state->idev = idev; + break; + } + read_unlock_bh(&idev->lock); + } + return im; +} - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", im->aca_addr.s6_addr[i]); +static struct ifacaddr6 *ac6_get_next(struct seq_file *seq, struct ifacaddr6 *im) +{ + struct ac6_iter_state *state = ac6_seq_private(seq); - len += sprintf(buffer+len, " %5d\n", im->aca_users); - - pos=begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = im->aca_next; + while (!im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); } - read_unlock_bh(&idev->lock); - in6_dev_put(idev); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + break; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + im = state->idev->ac_list; } + return im; +} -done: +static 
struct ifacaddr6 *ac6_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ifacaddr6 *im = ac6_get_first(seq); + if (im) + while (pos && (im = ac6_get_next(seq, im)) != NULL) + --pos; + return pos ? NULL : im; +} + +static void *ac6_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? ac6_get_idx(seq, *pos) : ac6_get_first(seq); +} + +static void *ac6_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ifacaddr6 *im; + im = ac6_get_next(seq, v); + ++*pos; + return im; +} + +static void ac6_seq_stop(struct seq_file *seq, void *v) +{ + struct ac6_iter_state *state = ac6_seq_private(seq); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} + +static int ac6_seq_show(struct seq_file *seq, void *v) +{ + struct ifacaddr6 *im = (struct ifacaddr6 *)v; + struct ac6_iter_state *state = ac6_seq_private(seq); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; + seq_printf(seq, + "%-4d %-15s " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%5d\n", + state->dev->ifindex, state->dev->name, + NIP6(im->aca_addr), + im->aca_users); + return 0; } +static struct seq_operations ac6_seq_ops = { + .start = ac6_seq_start, + .next = ac6_seq_next, + .stop = ac6_seq_stop, + .show = ac6_seq_show, +}; + +static int ac6_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct ac6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &ac6_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; +} + +static struct file_operations ac6_seq_fops = { + .owner = THIS_MODULE, + .open = ac6_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +int __init 
ac6_proc_init(void) +{ + struct proc_dir_entry *p; + + p = create_proc_entry("anycast6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &ac6_seq_fops; + return 0; +} + +void ac6_proc_exit(void) +{ + proc_net_remove("anycast6"); +} #endif + -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From jleu@nero.doit.wisc.edu Mon Jun 30 11:37:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 11:37:47 -0700 (PDT) Received: from nero.doit.wisc.edu (nero.doit.wisc.edu [128.104.17.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UIbc2x004450 for ; Mon, 30 Jun 2003 11:37:39 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.11.6/8.11.6) id h5UKMlk23252; Mon, 30 Jun 2003 15:22:47 -0500 Date: Mon, 30 Jun 2003 15:22:46 -0500 From: "James R. Leu" To: Julian Anastasov Cc: Ben Greear , netdev@oss.sgi.com Subject: Re: send-to-self (was Re: routing bug report for 2.4) Message-ID: <20030630152246.B22997@mindspring.com> Reply-To: jleu@mindspring.com References: <3EFE131E.1080807@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from ja@ssi.bg on Sun, Jun 29, 2003 at 12:43:26PM +0300 Organization: none X-archive-position: 3681 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jleu@mindspring.com Precedence: bulk X-list: netdev I have done some work on a related subject, 'virtual routing and forwarding' for Linux. One of the applications of this is 'self-to-self' routing. I have mentioned my work on this list before and have been flamed (but no one provided me with ideas on how to do it better). If you would like to take a look at what I have done, head over to: http://linux-vrf.sf.net/ I'm open to suggestions on how to implement this better. 
On Sun, Jun 29, 2003 at 12:43:26PM +0300, Julian Anastasov wrote: > > Hello, > > On Sat, 28 Jun 2003, Ben Greear wrote: > > > My send-to-self patch that I have been using is attached. I also have some other > > patches for mac-vlans and packet-gen applied, but I don't believe these will have any > > impact on the behaviour we have been discussing. > > Ben, lets define new behaviour for your feature: > > 1. we mark ethX with /proc/sys/net/ipv4/conf/ethX/loop=1. That means > this is a loop device (my site contains lot of device flags, you > can see what costs creating a sysctl var): > http://www.ssi.bg/~ja/ > just hit some of the links, recommended example: > http://www.ssi.bg/~ja/forward_shared-2.4.19-2.diff > > there are 2 variants: > > - loop can be 0(no loop) / 1(loop inout) or > > - 0(no loop), 1(loop in only), 2(loop out only), 3(loop inout) > > where "loop in only" means "accept only" and "loop out only" > is "send only" interface > > but as all traffics are inout I think "loop inout" will > be always used > > 2. arp_filter accepts traffic on ethX (as in your patch) > if "loop in" is allowed for indev and "loop out" for the > out_dev in routing result > > 3. rp_filter (source validation) accepts traffic on ethX (as in your > patch) if "loop in" is allowed > > 4. get unicast output route for local IPs ethY->ethX if "loop in" is > allowed for ethX and "loop out" is allowed for "ethY. ARP > will add cache entries for local IPs. > > > Goal 1. Can we just skip the BINDTODEVICE thing and to replace it > with bind to src IP. We can avoid binding to src IP for our > tests if we replace the preferred source IP in the desired local > routes but this is a hack. Using BINDTODEVICE will not add > any benefits but will be supported (it is ignored). 
> > Then to define it in this way: > > If ethX has "/proc/sys/net/ipv4/conf/ethX/loop" set to !0 then > all output routes "from local_ip_on_ethY to local_ip_on_ethX" will > not receive "lo" result but "ethY" with RTN_UNICAST type > if local_ip_on_ethY is configured on ethY (ethY has loop enabled too), > no matter the key->oif value. Sort of: > > fib_lookup for "from IP1 to IP2 oif XXX" > if (RTN_LOCAL) > { > if dev_out is loop_in and key->src != 0 > { > src = key->src? : FIB_RES_PREFSRC(res); > dev_in = ip_dev_find(src); > if (dev_in is loop_out) > { > use dev_in as dev_out > goto make_route; > } > } > // else > use "lo" > } > > - this code is slow but it is guarded from loop check for out_dev > so I do not see performance impact (the output routing to localhost > is not used often). The result is cached (you can set long > routing cache expiration value during the tests). > > - we assume my patch from previous posting is applied > and we match any local IP no matter the key oif. > > Goal 2. Can we skip all TCP/UDP changes? > > - we rely on the fact the routing results allow traffic in > both directions (incoming is accepted with RTN_LOCAL, output > gets RTN_UNICAST). As for IPv6 I can not comment, we define > ipv4/conf/XXX/loop flag, though. But I prefer we to keep the > changes only at routing level. For TCP and UDP these talks > should look as if "lo" is used. > > - what I'm not sure is whether any socket hash problems exists > and this is the only thing that can prevent this patch to look > nice and fast. But I'm wondering there are such issues as > the talks on "lo" should work but we have to check that. > > The usage: > > - mark eth0 as loop_out and eth1 as loop_in device and start the test > in eth0->eth1 direction or use loop inout for both directions. > > If you think that we can change only the routing then > I can prepare patch for testing, I'm not sure I have a test setup > for this feature right now. > > Regards > > -- > Julian Anastasov > -- James R. 
Leu From krkumar@us.ibm.com Mon Jun 30 11:54:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 11:54:56 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UIsh2x004947 for ; Mon, 30 Jun 2003 11:54:50 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5UIsbi8161602; Mon, 30 Jun 2003 14:54:37 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5UIsYlv207296; Mon, 30 Jun 2003 14:54:35 -0400 Message-ID: <3F008771.5030206@us.ibm.com> Date: Mon, 30 Jun 2003 11:54:41 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "YOSHIFUJI Hideaki" CC: "David S. Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List against 2.5.70 (re-done) References: <20030626.230727.35666164.davem@redhat.com> <3EFC668F.9010004@us.ibm.com> <20030627.144752.78715628.davem@redhat.com> <20030628.130602.63704890.yoshfuji@linux-ipv6.org> In-Reply-To: <20030628.130602.63704890.yoshfuji@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3682 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev > 1. is it okay to have another hook for garbbig prefix list? > Userspace application can get such information via > - routing table > - interface flag > > 2. is the "managed" flags etc, which is per interface variable, > really NEWROUTE information? > It is NOT L2 thing, but it is per-link information. > I think it is NEWLINK thing. 
>
> What I'm thinking is:
>
> - fix "ADDRCONF" flag in route information
> - manage / other flags via NEWLINK message
> (- No new interface to get prefix itself.)

Well, there are two reasons that I can see not to do so (the ADDRCONF flag was already fixed in an earlier patch):

- With the latest submission, the actual code to get the prefix list itself is very small: the top-level inet6_dump_fib uses either dump_node or dump_prefix, the latter being the new function added. This is the whole user interface, 50-odd lines of code with comments.

- If I understood your point about using the interface flag and the routing table, you are suggesting that the user can look at the routing table and get the prefix entries by making checks (this is non-trivial, e.g. the address should not be LL or MC, there should be no nexthop, it should have been added via an RA, etc.). However, having a user interface makes it easier to get the prefix list without significant bloat to the kernel, and the user doesn't have to make a lot of checks to get the system prefixes. I don't see much gain from this approach.

About your point about the managed flag: I think it is a per-interface flag that gets returned when a request for getting flags on that interface is made. That's why I have made it per interface, as part of a GETLNKFLAGS operation. I don't understand why you think it is a NEWLINK thing (I'm not sure what you mean by that), since it is flag information on your existing device that an RA is advertising. I want to get this information not on receipt of an RA, but when a request is made.
Thanks, - KK From latten@austin.ibm.com Mon Jun 30 12:03:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 12:03:15 -0700 (PDT) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UJ3A2x005377 for ; Mon, 30 Jun 2003 12:03:11 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e32.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5UJ2V8w092868; Mon, 30 Jun 2003 15:02:31 -0400 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5UJ2Q9f176250; Mon, 30 Jun 2003 13:02:27 -0600 Received: from faith.austin.ibm.com (faith.austin.ibm.com [9.41.94.16]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h5UJ2PJN026760; Mon, 30 Jun 2003 14:02:25 -0500 Received: from faith.austin.ibm.com (localhost.localdomain [127.0.0.1]) by faith.austin.ibm.com (8.12.5/8.12.8) with ESMTP id h5UJ6xIY000539; Mon, 30 Jun 2003 14:06:59 -0500 Received: (from jml@localhost) by faith.austin.ibm.com (8.12.5/8.12.5/Submit) id h5UJ6wwV000537; Mon, 30 Jun 2003 14:06:58 -0500 Date: Mon, 30 Jun 2003 14:06:58 -0500 From: latten@austin.ibm.com Message-Id: <200306301906.h5UJ6wwV000537@faith.austin.ibm.com> To: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: PATCH: IPSecv6 in tunnel won't work with ext hdrs X-archive-position: 3683 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: latten@austin.ibm.com Precedence: bulk X-list: netdev I noticed that using extension headers along with IPsecv6 tunnel mode did not work in 2.5.73 + patch-2.5.73-bk3. The following patch checks "nexthdr" instead of "iph->nexthdr", which could be an extension header. I tested this in tunnel mode and transport mode, with and without extension headers, and it worked OK. Let me know if it is OK.
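To make the distinction concrete, here is a small userspace sketch of why the next-header field of the fixed IPv6 header can differ from the protocol that finally follows the extension headers. It is not the kernel code: the `demo_` names and the sample bytes are made up for illustration, and the patch itself does no such walk but simply uses the `nexthdr` value already available in xfrm6_input.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers used in this sketch. */
enum {
    DEMO_NEXTHDR_HOP     = 0,  /* hop-by-hop options header */
    DEMO_NEXTHDR_IPV6    = 41, /* IPv6-in-IPv6, i.e. a tunnelled packet */
    DEMO_NEXTHDR_ROUTING = 43, /* routing header */
    DEMO_NEXTHDR_DEST    = 60  /* destination options header */
};

/* Only the option-style extension headers are handled here. */
static int demo_is_ext_hdr(uint8_t nh)
{
    return nh == DEMO_NEXTHDR_HOP ||
           nh == DEMO_NEXTHDR_ROUTING ||
           nh == DEMO_NEXTHDR_DEST;
}

/*
 * Walk the extension headers that follow the fixed IPv6 header.
 *   first: next-header value taken from the fixed header
 *   p/len: the bytes that follow the fixed header
 * Returns the protocol found after the last extension header,
 * or -1 if the buffer is truncated.
 */
static int demo_final_nexthdr(uint8_t first, const uint8_t *p, size_t len)
{
    uint8_t nh = first;
    size_t off = 0;

    assert(p != NULL || len == 0);

    while (demo_is_ext_hdr(nh)) {
        size_t hdrlen;

        if (len - off < 8)
            return -1;
        /* Byte 1 holds the length in 8-octet units, not counting the first 8. */
        hdrlen = ((size_t)p[off + 1] + 1) * 8;
        if (hdrlen > len - off)
            return -1;
        nh = p[off]; /* byte 0 names the header that comes next */
        off += hdrlen;
    }
    return nh;
}
```

With a destination options header in front of the inner packet, the fixed header carries 60 while the walk ends at 41 (IPPROTO_IPV6), so a test against the fixed header's field alone would drop a valid tunnelled packet.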
Joy Latten ----------------------------------------------------------------------- --- xfrm6_input.c.orig 2003-06-30 11:04:31.000000000 -0500 +++ xfrm6_input.c 2003-06-30 11:09:27.000000000 -0500 @@ -67,10 +67,8 @@ xfrm_vec[xfrm_nr++].xvec = x; - iph = skb->nh.ipv6h; - if (x->props.mode) { /* XXX */ - if (iph->nexthdr != IPPROTO_IPV6) + if (nexthdr != IPPROTO_IPV6) goto drop; skb->nh.raw = skb->data; iph = skb->nh.ipv6h; From niz@vencraft.com Mon Jun 30 14:06:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 14:07:10 -0700 (PDT) Received: from cortez.tablus.com (covad-tablus.meer.net [209.157.140.126]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5UL6u2x008524 for ; Mon, 30 Jun 2003 14:06:57 -0700 Received: from NIZRUNNER (nizrunner.tablus.com [10.11.0.151] (may be forged)) by cortez.tablus.com (8.12.8/8.12.8) with ESMTP id h5UL6pgh018992; Mon, 30 Jun 2003 14:06:54 -0700 From: "Jim Nisbet" To: Cc: Subject: kernel bug fix: entered/exited promiscuous mode flip/flop error Date: Mon, 30 Jun 2003 14:06:40 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4510 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h5UL6u2x008524 X-archive-position: 3684 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niz@vencraft.com Precedence: bulk X-list: netdev Re: bug in entering/exiting promiscuous read mode A number of people have mentioned that they get a weird situation where when they *start* a program that does promiscuous network reads (with, say, ‘tcpdump –i eth0’). 
They then get the kernel message “left promiscuous mode” when the program starts and “entered promiscuous mode” when it exits, the exact opposite of what should happen.

I’ve tracked this problem down and fixed it in the 2.4.20 kernel sources. I have also verified that the problem still exists in the latest dev sources (2.5.73). The patch file is at the end of this email.

The problem arises when the interface is “downed”: the dev->promiscuity count is incorrectly decremented twice. dev_change_flags calls dev_mc_upload, which decrements the count, and the count is decremented a second time when dev_change_flags calls dev_close, which calls notifier_call_chain->packet_notifier->packet_dev_mclist->packet_dev_mc. The logic in dev_set_promiscuity only tests whether dev->promiscuity == 0, but the count is now -1. When a new caller increments the count by 1 to enter promisc mode, the count becomes 0, which means “exit promisc”; when that caller exits, the count is decremented back to -1, which means “re-enter promisc mode” (since the count is not 0).

There is a similar problem when the interface is enabled while there is already a promisc reader: the count is incremented twice.

I’ve fixed the problem and verified the fix by running tcpdump and then doing various combinations of ifconfig up/down. I also added one new diagnostic message and changed the code so that it will not invert the logic if the count should ever be wrong again in the future.

My solution involves minor changes to net/core/dev.c and net/packet/af_packet.c. The patch file is attached to this email. The changes are:

1.  In the packet_dev_mc routine, don’t increment/decrement the promisc count if the interface is not up.
    Note: the request to be in promisc mode is still stored on the multicast (mc) list, so when the interface is enabled it will be put back in promisc mode by dev_mc_upload, as it should be.

2.  In packet_notifier, call packet_dev_mclist to remove multicast/promisc mode when the interface is GOING_DOWN rather than DOWN, because once the interface is “DOWN” the request would now be ignored (see change 1).

3.  Change the dev_set_promiscuity routine to check for a count <= 0, and in that case set the count to 0 and unset promisc mode. This means that if the count is ever wrong, the interface will go out of promisc mode “too early”, but a new program could still be started and it would enter promisc mode again. The logic will never be inverted.

4.  Write a kernel info message whenever the use count is incremented/decremented, instead of *just* when promisc is turned on/off. If a 2nd caller uses setsockopt to put the interface in promisc mode, this message would be generated:

        Jun 29 16:57:08 tabby kernel: device eth1 still in promiscuous mode (use count now 2)

    and when one of the existing callers closes its socket connection the message would be:

        Jun 29 16:57:20 tabby kernel: device eth1 still in promiscuous mode (use count now 1)

I hope these changes, or something like them, can be incorporated in some future release. For those who don’t want to patch the kernel, I’d say stay away from ifconfig up/down if you have a promisc read operation going.
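The counting problem described above can be replayed in user space. This is only a minimal sketch: struct demo_dev and the demo_* functions are hypothetical stand-ins that copy just the count test from dev_set_promiscuity (the == 0 check before the patch, the <= 0 clamp after it), and the spurious second decrement is injected by hand rather than through the real notifier chain.

```c
#include <assert.h>

/* Toy stand-in for the two net_device fields involved. */
struct demo_dev {
	int promiscuity;	/* use count */
	int promisc_flag;	/* models IFF_PROMISC */
};

/* Count test as described for unpatched 2.4.20: only exactly 0 clears the flag. */
static void demo_set_promisc_old(struct demo_dev *d, int inc)
{
	d->promisc_flag = 1;
	if ((d->promiscuity += inc) == 0)
		d->promisc_flag = 0;
}

/* Count test after the patch: anything <= 0 is clamped to 0. */
static void demo_set_promisc_new(struct demo_dev *d, int inc)
{
	d->promisc_flag = 1;
	if ((d->promiscuity += inc) <= 0) {
		d->promiscuity = 0;
		d->promisc_flag = 0;
	}
}

/*
 * Replay the reported sequence and return the flag state right after
 * a new reader has started (1 = promiscuous, 0 = not promiscuous).
 */
static int demo_replay(void (*set)(struct demo_dev *, int))
{
	struct demo_dev d = { 0, 0 };

	set(&d, +1);	/* first reader enters promisc mode */
	set(&d, -1);	/* interface goes down: expected decrement */
	set(&d, -1);	/* interface goes down: spurious second decrement */
	set(&d, +1);	/* a new reader starts */
	return d.promisc_flag;
}
```

With the old test the replay ends with the flag cleared while a reader is running, which matches the “left promiscuous mode” message on start; with the clamp the flag is set as expected. The clamp hides a bad count rather than preventing it, which is why the patch also changes packet_dev_mc and packet_notifier.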
Best, /j ==================== here is the diff -u for the changes ======================== diff -ruN linux-2.4.20-18.8/net/core/dev.c linux-2.4.20-18.8-tablus/net/core/dev.c --- linux-2.4.20-18.8/net/core/dev.c 2003-05-29 03:46:18.000000000 -0700 +++ linux-2.4.20-18.8-tablus/net/core/dev.c 2003-06-29 16:59:45.000000000 -0700 @@ -1949,8 +1949,10 @@ unsigned short old_flags = dev->flags; dev->flags |= IFF_PROMISC; - if ((dev->promiscuity += inc) == 0) + if ((dev->promiscuity += inc) <= 0) { + dev->promiscuity = 0; dev->flags &= ~IFF_PROMISC; + } if (dev->flags^old_flags) { #ifdef CONFIG_NET_FASTROUTE if (dev->flags&IFF_PROMISC) { @@ -1963,6 +1965,12 @@ printk(KERN_INFO "device %s %s promiscuous mode\n", dev->name, (dev->flags&IFF_PROMISC) ? "entered" : "left"); } + else { + if (dev->flags&IFF_PROMISC) + printk(KERN_INFO "device %s still in promiscuous mode" + " (use count now %d)\n", + dev->name, dev->promiscuity); + } } /** diff -ruN linux-2.4.20-18.8/net/packet/af_packet.c linux-2.4.20-18.8-tablus/net/packet/af_packet.c --- linux-2.4.20-18.8/net/packet/af_packet.c 2003-05-29 03:47:00.000000000 -0700 +++ linux-2.4.20-18.8-tablus/net/packet/af_packet.c 2003-06-29 15:02:22.000000000 -0700 @@ -1146,6 +1146,9 @@ #ifdef CONFIG_PACKET_MULTICAST static void packet_dev_mc(struct net_device *dev, struct packet_mclist *i, int what) { + if (!(dev->flags & IFF_UP)) + return; + switch (i->type) { case PACKET_MR_MULTICAST: if (what > 0) @@ -1396,6 +1399,8 @@ } spin_unlock(&po->bind_lock); } + break; + case NETDEV_GOING_DOWN: #ifdef CONFIG_PACKET_MULTICAST if (po->mclist) packet_dev_mclist(dev, po->mclist, -1); From jmorris@intercode.com.au Mon Jun 30 17:13:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 17:13:08 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:p+rCv9eUbDFbou7L9SnUGyhdRK1rW411@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h610Cx2x024419 for ; Mon, 30 Jun 2003 17:13:00 -0700 Received: from 
excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h610Cnr03360; Tue, 1 Jul 2003 10:12:49 +1000 Date: Tue, 1 Jul 2003 10:12:48 +1000 (EST) From: James Morris To: Jim Nisbet cc: netdev@oss.sgi.com, Subject: Re: kernel bug fix: entered/exited promiscuous mode flip/flop error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-MIME-Autoconverted: from 8bit to quoted-printable by blackbird.intercode.com.au id h610Cnr03360 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h610Cx2x024419 X-archive-position: 3685 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 30 Jun 2003, Jim Nisbet wrote: > I hope these changes or something like it can get incorporated in some > future release.  For those who don’t want to patch the kernel, I’d say stay > away from ifconfig up/down if you have a promisc read operation going. Would it be possible for you to resubmit the patch so that spaces are not replacing tabs for some of the indentation? 
- James -- James Morris From jmorris@intercode.com.au Mon Jun 30 17:23:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 17:23:33 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:k2nIroYgmp+utzVPfw7M7njOeNSO+E4V@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h610NM2x024751 for ; Mon, 30 Jun 2003 17:23:24 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h610N8r03395; Tue, 1 Jul 2003 10:23:09 +1000 Date: Tue, 1 Jul 2003 10:23:08 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file In-Reply-To: <20030701.015755.80809794.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3686 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > This converts /proc/net/igmp6 to seq_file. This does not compile. 
- James -- James Morris From jmorris@intercode.com.au Mon Jun 30 17:29:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 17:29:56 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:i0i1OVqq5FZoJd9bKeiVV2Ny4FlXUQWm@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h610To2x025068 for ; Mon, 30 Jun 2003 17:29:51 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h610Tcr03432; Tue, 1 Jul 2003 10:29:38 +1000 Date: Tue, 1 Jul 2003 10:29:37 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV4: convert /proc/net/mcfilter to seq_file In-Reply-To: <20030701.015752.117028166.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3687 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Hello. > > This patch, depends on "[PATCH] IPV4: convert /proc/net/igmp to > seq_file patch," converts /proc/net/mcfilter to seq_file. None of these either compile or applies cleanly without the previous patches. 
- James -- James Morris From niz@vencraft.com Mon Jun 30 18:35:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 18:35:50 -0700 (PDT) Received: from cortez.tablus.com (covad-tablus.meer.net [209.157.140.126]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h611Zf2x025925 for ; Mon, 30 Jun 2003 18:35:41 -0700 Received: from NIZRUNNER (nizrunner.tablus.com [10.11.0.151] (may be forged)) by cortez.tablus.com (8.12.8/8.12.8) with ESMTP id h611ZQgh019369; Mon, 30 Jun 2003 18:35:27 -0700 From: "Jim Nisbet" To: Cc: Subject: [PATCH 2.4.20] entered/exited promiscuous mode flip/flop error (2nd submit attempt--white space error in added source lines corrected) Date: Mon, 30 Jun 2003 18:35:14 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4510 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h611Zf2x025925 X-archive-position: 3688 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niz@vencraft.com Precedence: bulk X-list: netdev Re: bug in entering/exiting promiscuous read mode A number of people have mentioned that they get a weird situation where when they *start* a program that does promiscuous network reads (with, say, ‘tcpdump –i eth0’). They then get a kernel message “left promiscuous mode” when the program starts and the message “entered promiscuous mode” when it exits – the exact opposite of what should happen.  I’ve tracked this problem down and fixed it in the 2.4.20 kernel sources.  I have also verified that the problem still exists in the latest dev sources (2.5.73).  The patch file is at the end of this email.   The problem comes when the interface is “downed” the dev->promiscuity count gets incorrectly decremented by 2.  
This happens because dev_change_flags calls dev_mc_upload, which decrements the count, and the count is decremented a second time when dev_change_flags calls dev_close, which calls notifier_call_chain->packet_notifier->packet_dev_mclist->packet_dev_mc. The logic in dev_set_promiscuity only tests whether dev->promiscuity == 0, but the count is now -1. When a new caller increments the count by 1 to enter promisc mode, the count becomes 0, which means “exit promisc”; when that caller exits, the count is decremented back to -1, which means “re-enter promisc mode” (since the count is not 0).

There is a similar problem when the interface is enabled while there is already a promisc reader: the count is incremented twice.

I’ve fixed the problem and verified the fix by running tcpdump and then doing various combinations of ifconfig up/down. I also added one new diagnostic message and changed the code so that it will not invert the logic if the count should ever be wrong again in the future.

My solution involves minor changes to net/core/dev.c and net/packet/af_packet.c. The patch file is attached to this email. The changes are:

1.  In the packet_dev_mc routine, don’t increment/decrement the promisc count if the interface is not up.

    Note: the request to be in promisc mode is still stored on the multicast (mc) list, so when the interface is enabled it will be put back in promisc mode by dev_mc_upload, as it should be.

2.  In packet_notifier, call packet_dev_mclist to remove multicast/promisc mode when the interface is GOING_DOWN rather than DOWN, because once the interface is “DOWN” the request would now be ignored (see change 1).

3.  Change the dev_set_promiscuity routine to check for a count <= 0, and in that case set the count to 0 and unset promisc mode. This means that if the count is ever wrong, the interface will go out of promisc mode “too early”.
But a new program could still be started and it would enter promisc mode again.  It will never invert the logic.   4.  Write a kernel info message whenever the use count is incremented/decremented instead of *just* when promisc is turned on/off.  If a 2nd caller uses setsockopt to put the interface in promisc mode, a message would be generated:                Jun 29 16:57:08 tabby kernel: device eth1 still in promiscuous mode (use count now 2)   and when one of the existing callers closes the socket connection then the message would be:                Jun 29 16:57:20 tabby kernel: device eth1 still in promiscuous mode (use count now 1)     For those who don’t want to patch the kernel, I’d say stay away from ifconfig up/down if you have a promisc read operation going. ================= diff -u patch follows =================== diff -ru linux-2.4.20-18.8/net/core/dev.c linux-2.4.20-18.8-tablus/net/core/dev.c --- linux-2.4.20-18.8/net/core/dev.c 2003-05-29 03:46:18.000000000 -0700 +++ linux-2.4.20-18.8-tablus/net/core/dev.c 2003-06-30 18:16:59.000000000 -0700 @@ -1949,8 +1949,10 @@ unsigned short old_flags = dev->flags; dev->flags |= IFF_PROMISC; - if ((dev->promiscuity += inc) == 0) + if ((dev->promiscuity += inc) <= 0) { + dev->promiscuity = 0; dev->flags &= ~IFF_PROMISC; + } if (dev->flags^old_flags) { #ifdef CONFIG_NET_FASTROUTE if (dev->flags&IFF_PROMISC) { @@ -1963,6 +1965,12 @@ printk(KERN_INFO "device %s %s promiscuous mode\n", dev->name, (dev->flags&IFF_PROMISC) ? 
"entered" : "left"); } + else { + if (dev->flags&IFF_PROMISC) + printk(KERN_INFO "device %s still in promiscuous mode" + " (use count now %d)\n", + dev->name, dev->promiscuity); + } } /** diff -ru linux-2.4.20-18.8/net/packet/af_packet.c linux-2.4.20-18.8-tablus/net/packet/af_packet.c --- linux-2.4.20-18.8/net/packet/af_packet.c 2003-05-29 03:47:00.000000000 -0700 +++ linux-2.4.20-18.8-tablus/net/packet/af_packet.c 2003-06-30 18:19:47.000000000 -0700 @@ -1146,6 +1146,9 @@ #ifdef CONFIG_PACKET_MULTICAST static void packet_dev_mc(struct net_device *dev, struct packet_mclist *i, int what) { + if (!(dev->flags & IFF_UP)) + return; + switch (i->type) { case PACKET_MR_MULTICAST: if (what > 0) @@ -1396,6 +1399,8 @@ } spin_unlock(&po->bind_lock); } + break; + case NETDEV_GOING_DOWN: #ifdef CONFIG_PACKET_MULTICAST if (po->mclist) packet_dev_mclist(dev, po->mclist, -1); From jmorris@intercode.com.au Mon Jun 30 18:41:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 18:41:48 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:xkhSO0H2RNtGrzbuvyscFjA0MzlQsCAf@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h611fb2x026275 for ; Mon, 30 Jun 2003 18:41:43 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h611efr03715; Tue, 1 Jul 2003 11:40:42 +1000 Date: Tue, 1 Jul 2003 11:40:40 +1000 (EST) From: James Morris To: latten@austin.ibm.com cc: davem@redhat.com, , Subject: Re: PATCH: IPSecv6 in tunnel won't work with ext hdrs In-Reply-To: <200306301906.h5UJ6wwV000537@faith.austin.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3689 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Mon, 30 Jun 2003 latten@austin.ibm.com wrote: > 
Let me know if it is ok. Looks correct to me, applied to bk://kernel.bkbits.net/jmorris/net-2.5 - James -- James Morris From yoshfuji@linux-ipv6.org Mon Jun 30 21:14:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 21:14:49 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h614Ec2x027936 for ; Mon, 30 Jun 2003 21:14:39 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h614FqBo000635; Tue, 1 Jul 2003 13:15:53 +0900 Date: Tue, 01 Jul 2003 13:15:52 +0900 (JST) Message-Id: <20030701.131552.10422805.yoshfuji@linux-ipv6.org> To: jmorris@intercode.com.au Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] IPV4: convert /proc/net/mcfilter to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030701.015752.117028166.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3690 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Tue, 1 Jul 2003 10:29:37 +1000 (EST)), James Morris says: > > This patch, depends on "[PATCH] IPV4: convert /proc/net/igmp to > > seq_file patch," converts /proc/net/mcfilter to seq_file. > > None of these either compile or applies cleanly without the previous > patches. 
What do you mean by "previous patches?" /proc/net/mcfilter patch depends on /proc/net/igmp patch as I wrote. Thanks. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 21:22:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 21:22:05 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h614Lw2x028273 for ; Mon, 30 Jun 2003 21:21:59 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h614NFBo000690; Tue, 1 Jul 2003 13:23:15 +0900 Date: Tue, 01 Jul 2003 13:23:15 +0900 (JST) Message-Id: <20030701.132315.11621908.yoshfuji@linux-ipv6.org> To: jmorris@intercode.com.au Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030701.015755.80809794.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3691 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Tue, 1 Jul 2003 10:23:08 +1000 (EST)), James Morris says: > On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > > > This converts /proc/net/igmp6 to seq_file. 
> > This does not compile. Really???? I had compiled it before I sent the patch and I can compile it now. How did you fail to compile it? -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From jmorris@intercode.com.au Mon Jun 30 21:41:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 21:41:43 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:FvJq5zYhQHPVdO2jLO9Zh/hdzctXNaJB@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h614fX2x028700 for ; Mon, 30 Jun 2003 21:41:35 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h614fKr04524; Tue, 1 Jul 2003 14:41:20 +1000 Date: Tue, 1 Jul 2003 14:41:19 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file In-Reply-To: <20030701.132315.11621908.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3692 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > Really???? > I had compiled it before I sent the patch and I can compile it now. > How did you fail to compile it? net/ipv6/mcast.c: In function `igmp6_mc_seq_stop': net/ipv6/mcast.c:2124: invalid type argument of `->' make[2]: *** [net/ipv6/mcast.o] Error 1 make[1]: *** [net/ipv6] Error 2 make: *** [net] Error 2 read_unlock_bh(state->idev->lock) should be read_unlock_bh(&state->idev->lock) At least one of the other patches has a similar problem. 
- James -- James Morris From jmorris@intercode.com.au Mon Jun 30 21:43:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 21:43:48 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:UOvLDc0clEERHZl6qzfcVi7+/bSkZUN7@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h614hg2x029013 for ; Mon, 30 Jun 2003 21:43:43 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h614hVr04532; Tue, 1 Jul 2003 14:43:31 +1000 Date: Tue, 1 Jul 2003 14:43:30 +1000 (EST) From: James Morris To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, Subject: Re: [PATCH] IPV4: convert /proc/net/mcfilter to seq_file In-Reply-To: <20030701.131552.10422805.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 3693 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote: > What do you mean by "previous patches?" The group of patches which started with Subject: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file > /proc/net/mcfilter patch depends on /proc/net/igmp patch > as I wrote. Yes, and the /proc/net/igmp patch does not compile for me, and I cannot thus apply and test it. 
- James -- James Morris From davem@redhat.com Mon Jun 30 22:34:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 22:34:11 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h615Y52x029638 for ; Mon, 30 Jun 2003 22:34:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA29771; Mon, 30 Jun 2003 22:27:09 -0700 Date: Mon, 30 Jun 2003 22:27:08 -0700 (PDT) Message-Id: <20030630.222708.48505367.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: jmorris@intercode.com.au, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file From: "David S. Miller" In-Reply-To: <20030701.142947.75158018.yoshfuji@linux-ipv6.org> References: <20030701.132315.11621908.yoshfuji@linux-ipv6.org> <20030701.142947.75158018.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3694 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / $B5HF#1QL@(B Date: Tue, 01 Jul 2003 14:29:47 +0900 (JST) Hmm..., I don't understand why I could compile it... 
CONFIG_SMP=n From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:17 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w3B030220 for ; Mon, 30 Jun 2003 23:07:07 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615bIBo005979; Tue, 1 Jul 2003 14:37:18 +0900 Date: Tue, 01 Jul 2003 14:37:18 +0900 (JST) Message-Id: <20030701.143718.82262791.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: jmorris@intercode.com.au, netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030630.222708.48505367.davem@redhat.com> References: <20030701.142947.75158018.yoshfuji@linux-ipv6.org> <20030630.222708.48505367.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 3696 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030630.222708.48505367.davem@redhat.com> (at Mon, 30 Jun 2003 22:27:08 -0700 (PDT)), "David S. Miller" says: > From: YOSHIFUJI Hideaki / $B5HF#1QL@(B > Date: Tue, 01 Jul 2003 14:29:47 +0900 (JST) > > Hmm..., I don't understand why I could compile it... > > CONFIG_SMP=n Ah,... I understood. 
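One plausible reading of the CONFIG_SMP=n hint, as a toy model only: assume the uniprocessor unlock macro never dereferences its argument while the SMP one does. Under that assumption a lock passed by value instead of by address goes unnoticed on a UP build and only breaks on SMP. The demo_* types and macros below are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>

/* Toy stand-in for rwlock_t; not the kernel type. */
typedef struct {
	int readers;
} demo_rwlock_t;

struct demo_idev {
	demo_rwlock_t lock;
};

/* SMP-like flavour: dereferences its argument, so it needs a pointer. */
#define demo_read_unlock_smp(l)	do { (l)->readers--; } while (0)

/* UP-like flavour: evaluates the argument but never dereferences it. */
#define demo_read_unlock_up(l)	do { (void)(l); } while (0)

/* Returns the reader count left after the calls below. */
static int demo_run(void)
{
	struct demo_idev idev = { { 1 } };
	struct demo_idev *state = &idev;

	demo_read_unlock_up(state->lock);	/* wrong type, still compiles */
	demo_read_unlock_up(&state->lock);	/* right type, also compiles */

	/*
	 * demo_read_unlock_smp(state->lock);
	 * would not compile: invalid type argument of '->'
	 */
	demo_read_unlock_smp(&state->lock);	/* the corrected form */

	return idev.lock.readers;
}
```

Building a patch with both settings before posting would catch this class of mistake, since each configuration type-checks a different expansion of the same call.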
:-p -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:24 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w31030220 for ; Mon, 30 Jun 2003 23:07:00 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615mpBo013063; Tue, 1 Jul 2003 14:48:51 +0900 Date: Tue, 01 Jul 2003 14:48:50 +0900 (JST) Message-Id: <20030701.144850.120945416.yoshfuji@linux-ipv6.org> To: davem@redhat.com, jmorris@intercode.com.au CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] seq_file conversion 2/5: /proc/net/igmp6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3701 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev 2/5: convert /proc/net/igmp6 to seq_file. 
Index: linux-2.5/net/ipv6/mcast.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/mcast.c,v retrieving revision 1.24 diff -u -r1.24 mcast.c --- linux-2.5/net/ipv6/mcast.c 24 Jun 2003 19:45:24 -0000 1.24 +++ linux-2.5/net/ipv6/mcast.c 30 Jun 2003 15:28:16 -0000 @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -2039,64 +2040,145 @@ } #ifdef CONFIG_PROC_FS -static int igmp6_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0, begin=0; - struct ifmcaddr6 *im; - int len=0; +struct igmp6_mc_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - struct inet6_dev *idev; + struct inet6_dev *idev; +}; - if ((idev = in6_dev_get(dev)) == NULL) - continue; +#define igmp6_mc_seq_private(seq) ((struct igmp6_mc_iter_state *)&seq->private) - read_lock_bh(&idev->lock); - for (im = idev->mc_list; im; im = im->next) { - int i; +static inline struct ifmcaddr6 *igmp6_mc_get_first(struct seq_file *seq) +{ + struct ifmcaddr6 *im = NULL; + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); - len += sprintf(buffer+len,"%-4d %-15s ", dev->ifindex, dev->name); + for (state->dev = dev_base, state->idev = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (!idev) + continue; + read_lock_bh(&idev->lock); + im = idev->mc_list; + if (im) { + state->idev = idev; + break; + } + read_unlock_bh(&idev->lock); + } + return im; +} - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", im->mca_addr.s6_addr[i]); +static struct ifmcaddr6 *igmp6_mc_get_next(struct seq_file *seq, struct ifmcaddr6 *im) +{ + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); - len+=sprintf(buffer+len, - " %5d %08X %ld\n", - im->mca_users, - im->mca_flags, - (im->mca_flags&MAF_TIMER_RUNNING) ? 
im->mca_timer.expires-jiffies : 0); - - pos=begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = im->next; + while (!im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); } - read_unlock_bh(&idev->lock); - in6_dev_put(idev); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + break; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + im = state->idev->mc_list; } - *eof = 1; + return im; +} -done: +static struct ifmcaddr6 *igmp6_mc_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ifmcaddr6 *im = igmp6_mc_get_first(seq); + if (im) + while (pos && (im = igmp6_mc_get_next(seq, im)) != NULL) + --pos; + return pos ? NULL : im; +} + +static void *igmp6_mc_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp6_mc_get_idx(seq, *pos) : igmp6_mc_get_first(seq); +} + +static void *igmp6_mc_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ifmcaddr6 *im; + im = igmp6_mc_get_next(seq, v); + ++*pos; + return im; +} + +static void igmp6_mc_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; +static int igmp6_mc_seq_show(struct seq_file *seq, void *v) +{ + struct ifmcaddr6 *im = (struct ifmcaddr6 *)v; + struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq); + + seq_printf(seq, + "%-4d %-15s %04x%04x%04x%04x%04x%04x%04x%04x %5d %08X %ld\n", + state->dev->ifindex, state->dev->name, + NIP6(im->mca_addr), + im->mca_users, im->mca_flags, + (im->mca_flags&MAF_TIMER_RUNNING) ? 
im->mca_timer.expires-jiffies : 0); + return 0; } +static struct seq_operations igmp6_mc_seq_ops = { + .start = igmp6_mc_seq_start, + .next = igmp6_mc_seq_next, + .stop = igmp6_mc_seq_stop, + .show = igmp6_mc_seq_show, +}; + +static int igmp6_mc_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp6_mc_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &igmp6_mc_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; +} + +static struct file_operations igmp6_mc_seq_fops = { + .owner = THIS_MODULE, + .open = igmp6_mc_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + static int ip6_mcf_read_proc(char *buffer, char **start, off_t offset, int length, int *eof, void *data) { @@ -2178,6 +2260,9 @@ struct ipv6_pinfo *np; struct sock *sk; int err; +#ifdef CONFIG_PROC_FS + struct proc_dir_entry *p; +#endif err = sock_create(PF_INET6, SOCK_RAW, IPPROTO_ICMPV6, &igmp6_socket); if (err < 0) { @@ -2194,8 +2279,11 @@ np = inet6_sk(sk); np->hop_limit = 1; + #ifdef CONFIG_PROC_FS - create_proc_read_entry("net/igmp6", 0, 0, igmp6_read_proc, NULL); + p = create_proc_entry("igmp6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp6_mc_seq_fops; create_proc_read_entry("net/mcfilter6", 0, 0, ip6_mcf_read_proc, NULL); #endif @@ -2207,6 +2295,6 @@ sock_release(igmp6_socket); igmp6_socket = NULL; /* for safety */ #ifdef CONFIG_PROC_FS - remove_proc_entry("net/igmp6", 0); + proc_net_remove("igmp6"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:19 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com 
(8.12.9/8.12.9) with SMTP id h6166w37030220 for ; Mon, 30 Jun 2003 23:07:05 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615mvBo013078; Tue, 1 Jul 2003 14:48:57 +0900 Date: Tue, 01 Jul 2003 14:48:57 +0900 (JST) Message-Id: <20030701.144857.99268305.yoshfuji@linux-ipv6.org> To: davem@redhat.com, jmorris@intercode.com.au CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] seq_file conversion 5/5: /proc/net/anycast6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3697 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev 5/5: convert /proc/net/anycast6 to seq_file. 
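The iterator state in this series is allocated in the ->open handler, stored in seq->private, and freed by seq_release_private(). A minimal userspace sketch of that ownership is below. struct fake_seq, fake_open(), fake_release() and demo_private() are stand-ins, not kernel API. One assumption worth stating: the accessor here casts the pointer stored in the private field, not the address of the field, so the state lives in the allocated block and not on top of the seq_file structure itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct seq_file: a private pointer plus
 * one neighbouring field that must never be clobbered. */
struct fake_seq {
	void *priv;	/* plays the role of seq->private */
	int other;
};

/* Same shape as the two-pointer *_iter_state structs. */
struct iter_state {
	void *dev;
	void *idev;
};

/* Accessor: use the stored pointer, not the address of the field. */
#define seq_state(seq) ((struct iter_state *)(seq)->priv)

/* Like the ->open handlers: allocate, zero, attach. */
static int fake_open(struct fake_seq *seq)
{
	struct iter_state *s = malloc(sizeof(*s));

	if (!s)
		return -1;
	memset(s, 0, sizeof(*s));
	seq->priv = s;
	return 0;
}

/* Like seq_release_private(): free the attached state. */
static void fake_release(struct fake_seq *seq)
{
	free(seq->priv);
	seq->priv = NULL;
}

/* Returns 1 if writing the state leaves the seq structure untouched. */
static int demo_private(void)
{
	struct fake_seq seq = { NULL, 42 };
	int dummy = 0;
	void *saved;
	int ok;

	if (fake_open(&seq))
		return 0;
	saved = seq.priv;
	seq_state(&seq)->dev = &dummy;
	seq_state(&seq)->idev = &dummy;
	ok = seq.priv == saved && seq.other == 42 &&
	     seq_state(&seq)->dev == (void *)&dummy;
	fake_release(&seq);
	return ok;
}
```

demo_private() checks that filling in both state fields leaves the seq structure's own members as they were, which is the property the accessor has to guarantee.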
Index: linux-2.5/net/ipv6/af_inet6.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/af_inet6.c,v retrieving revision 1.46 diff -u -r1.46 af_inet6.c --- linux-2.5/net/ipv6/af_inet6.c 9 Jun 2003 17:26:52 -0000 1.46 +++ linux-2.5/net/ipv6/af_inet6.c 30 Jun 2003 16:36:55 -0000 @@ -85,7 +85,8 @@ extern void udp6_proc_exit(void); extern int ipv6_misc_proc_init(void); extern void ipv6_misc_proc_exit(void); -extern int anycast6_get_info(char *, char **, off_t, int); +extern int ac6_proc_init(void); +extern void ac6_proc_exit(void); extern int if6_proc_init(void); extern void if6_proc_exit(void); #endif @@ -799,7 +800,7 @@ if (ipv6_misc_proc_init()) goto proc_misc6_fail; - if (!proc_net_create("anycast6", 0, anycast6_get_info)) + if (ac6_proc_init()) goto proc_anycast6_fail; if (if6_proc_init()) goto proc_if6_fail; @@ -825,7 +826,7 @@ #ifdef CONFIG_PROC_FS proc_if6_fail: - proc_net_remove("anycast6"); + ac6_proc_exit(); proc_anycast6_fail: ipv6_misc_proc_exit(); proc_misc6_fail: @@ -863,7 +864,7 @@ sock_unregister(PF_INET6); #ifdef CONFIG_PROC_FS if6_proc_exit(); - proc_net_remove("anycast6"); + ac6_proc_exit(); ipv6_misc_proc_exit(); udp6_proc_exit(); tcp6_proc_exit(); Index: linux-2.5/net/ipv6/anycast.c =================================================================== RCS file: /home/cvs/linux-2.5/net/ipv6/anycast.c,v retrieving revision 1.3 diff -u -r1.3 anycast.c --- linux-2.5/net/ipv6/anycast.c 22 May 2003 07:38:17 -0000 1.3 +++ linux-2.5/net/ipv6/anycast.c 30 Jun 2003 16:36:55 -0000 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -435,56 +436,159 @@ #ifdef CONFIG_PROC_FS -int anycast6_get_info(char *buffer, char **start, off_t offset, int length) -{ - off_t pos=0, begin=0; - struct ifacaddr6 *im; - int len=0; +struct ac6_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - struct inet6_dev *idev; + struct inet6_dev 
*idev; +}; - if ((idev = in6_dev_get(dev)) == NULL) - continue; +#define ac6_seq_private(seq) ((struct ac6_iter_state *)&seq->private) - read_lock_bh(&idev->lock); - for (im = idev->ac_list; im; im = im->aca_next) { - int i; +static inline struct ifacaddr6 *ac6_get_first(struct seq_file *seq) +{ + struct ifacaddr6 *im = NULL; + struct ac6_iter_state *state = ac6_seq_private(seq); - len += sprintf(buffer+len,"%-4d %-15s ", dev->ifindex, dev->name); + for (state->dev = dev_base, state->idev = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (!idev) + continue; + read_lock_bh(&idev->lock); + im = idev->ac_list; + if (im) { + state->idev = idev; + break; + } + read_unlock_bh(&idev->lock); + } + return im; +} - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", im->aca_addr.s6_addr[i]); +static struct ifacaddr6 *ac6_get_next(struct seq_file *seq, struct ifacaddr6 *im) +{ + struct ac6_iter_state *state = ac6_seq_private(seq); - len += sprintf(buffer+len, " %5d\n", im->aca_users); - - pos=begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = im->aca_next; + while (!im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); } - read_unlock_bh(&idev->lock); - in6_dev_put(idev); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + break; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + im = state->idev->ac_list; } + return im; +} -done: +static struct ifacaddr6 *ac6_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ifacaddr6 *im = ac6_get_first(seq); + if (im) + while (pos && (im = ac6_get_next(seq, im)) != NULL) + --pos; + return pos ? 
NULL : im; +} + +static void *ac6_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? ac6_get_idx(seq, *pos) : ac6_get_first(seq); +} + +static void *ac6_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ifacaddr6 *im; + im = ac6_get_next(seq, v); + ++*pos; + return im; +} + +static void ac6_seq_stop(struct seq_file *seq, void *v) +{ + struct ac6_iter_state *state = ac6_seq_private(seq); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} + +static int ac6_seq_show(struct seq_file *seq, void *v) +{ + struct ifacaddr6 *im = (struct ifacaddr6 *)v; + struct ac6_iter_state *state = ac6_seq_private(seq); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; + seq_printf(seq, + "%-4d %-15s " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%5d\n", + state->dev->ifindex, state->dev->name, + NIP6(im->aca_addr), + im->aca_users); + return 0; } +static struct seq_operations ac6_seq_ops = { + .start = ac6_seq_start, + .next = ac6_seq_next, + .stop = ac6_seq_stop, + .show = ac6_seq_show, +}; + +static int ac6_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct ac6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &ac6_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; +} + +static struct file_operations ac6_seq_fops = { + .owner = THIS_MODULE, + .open = ac6_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +int __init ac6_proc_init(void) +{ + struct proc_dir_entry *p; + + p = create_proc_entry("anycast6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &ac6_seq_fops; + return 0; +} + +void ac6_proc_exit(void) +{ + 
proc_net_remove("anycast6"); +} #endif + -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:21 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w33030220 for ; Mon, 30 Jun 2003 23:07:02 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615mrBo013066; Tue, 1 Jul 2003 14:48:53 +0900 Date: Tue, 01 Jul 2003 14:48:53 +0900 (JST) Message-Id: <20030701.144853.89770574.yoshfuji@linux-ipv6.org> To: davem@redhat.com, jmorris@intercode.com.au CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] seq_file conversion 3/5: /proc/net/mfilter From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3698 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev 3/5: convert /proc/net/mfilter to seq_file. 
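The header line in this patch is produced by returning the magic value (void *)1 from ->start() at position 0 and special-casing it in ->show() and ->next(). A small userspace sketch of that idea follows. START_TOKEN, rows[] and the tok_* helpers are made-up names. Note that the sketch maps position n > 0 to record n-1, since the header occupies position 0; that mapping is my assumption about the intended behaviour, not something taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name; the patch spells this value (void *)1 inline. */
#define START_TOKEN ((void *)1)

/* Fake records standing in for the per-source filter entries. */
static const char *rows[] = { "eth0 0xe0000001", "eth0 0xe0000002" };
#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* start: position 0 is the header, position n > 0 is record n-1. */
static void *tok_start(long *pos)
{
	if (*pos == 0)
		return START_TOKEN;
	return (size_t)(*pos - 1) < NROWS ? (void *)&rows[*pos - 1] : NULL;
}

/* next: after the header comes the first record, then the rest. */
static void *tok_next(void *v, long *pos)
{
	const char **r = (v == START_TOKEN) ? rows : (const char **)v + 1;

	++*pos;
	return r < rows + NROWS ? (void *)r : NULL;
}

/* show: the token prints the header, anything else prints a record. */
static int tok_show(void *v, char *buf, size_t len)
{
	if (v == START_TOKEN)
		return snprintf(buf, len, "Device MCA\n");
	return snprintf(buf, len, "%s\n", *(const char **)v);
}

/* Emulate one read pass beginning at pos. */
static size_t tok_dump(long pos, char *buf, size_t len)
{
	size_t used = 0;
	void *v;

	assert(len > 0);
	buf[0] = '\0';
	for (v = tok_start(&pos); v; v = tok_next(v, &pos)) {
		int n = tok_show(v, buf + used, len - used);

		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the truncated line */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

tok_dump(0, ...) emits the header followed by both records, while tok_dump(1, ...) emits the records only. The point of the token is that the header costs no extra state: it is just one more position in the sequence.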
--- linux-2.5/include/net/ip.h.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/include/net/ip.h Tue Jul 1 01:46:12 2003 @@ -80,7 +80,6 @@ extern void ip_mc_dropsocket(struct sock *); extern void ip_mc_dropdevice(struct net_device *dev); extern int igmp_mc_proc_init(void); -extern int ip_mcf_procinfo(char *, char **, off_t, int); /* * Functions provided by ip.c --- linux-2.5/net/ipv4/igmp.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv4/igmp.c Tue Jul 1 01:46:12 2003 @@ -2247,74 +2247,177 @@ .llseek = seq_lseek, .release = seq_release_private, }; -#endif -int ip_mcf_procinfo(char *buffer, char **start, off_t offset, int length) -{ - off_t pos=0, begin=0; - int len=0; - int first = 1; +struct igmp_mcf_iter_state { struct net_device *dev; + struct in_device *idev; + struct ip_mc_list *im; +}; - read_lock(&dev_base_lock); - for(dev=dev_base; dev; dev=dev->next) { - struct in_device *in_dev = in_dev_get(dev); - struct ip_mc_list *imc; +#define igmp_mcf_seq_private(seq) ((struct igmp_mcf_iter_state *)&seq->private) - if (in_dev == NULL) +static inline struct ip_sf_list *igmp_mcf_get_first(struct seq_file *seq) +{ + struct ip_sf_list *psf = NULL; + struct ip_mc_list *im = NULL; + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); + + for (state->dev = dev_base, state->idev = NULL, state->im = NULL; + state->dev; + state->dev = state->dev->next) { + struct in_device *idev; + idev = in_dev_get(state->dev); + if (unlikely(idev == NULL)) continue; + read_lock_bh(&idev->lock); + im = idev->mc_list; + if (likely(im != NULL)) { + spin_lock_bh(&im->lock); + psf = im->sources; + if (likely(psf != NULL)) { + state->im = im; + state->idev = idev; + break; + } + spin_unlock_bh(&im->lock); + } + read_unlock_bh(&idev->lock); + } + return psf; +} - read_lock(&in_dev->lock); - - for (imc=in_dev->mc_list; imc; imc=imc->next) { - struct ip_sf_list *psf; +static struct ip_sf_list *igmp_mcf_get_next(struct seq_file *seq, struct ip_sf_list *psf) +{ + struct igmp_mcf_iter_state 
*state = igmp_mcf_seq_private(seq); - spin_lock_bh(&imc->lock); - for (psf=imc->sources; psf; psf=psf->sf_next) { - if (first) { - len += sprintf(buffer+len, "%3s %6s " - "%10s %10s %6s %6s\n", "Idx", - "Device", "MCA", "SRC", "INC", - "EXC"); - first = 0; - } - len += sprintf(buffer+len, "%3d %6.6s 0x%08x " - "0x%08x %6lu %6lu\n", dev->ifindex, - dev->name, ntohl(imc->multiaddr), - ntohl(psf->sf_inaddr), - psf->sf_count[MCAST_INCLUDE], - psf->sf_count[MCAST_EXCLUDE]); - pos=begin+len; - if(posoffset+length) { - spin_unlock_bh(&imc->lock); - read_unlock(&in_dev->lock); - in_dev_put(in_dev); - goto done; - } + psf = psf->sf_next; + while (!psf) { + spin_unlock_bh(&state->im->lock); + state->im = state->im->next; + while (!state->im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in_dev_put(state->idev); } - spin_unlock_bh(&imc->lock); + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + goto out; + } + state->idev = in_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + state->im = state->idev->mc_list; } - read_unlock(&in_dev->lock); - in_dev_put(in_dev); + if (!state->im) + break; + spin_lock_bh(&state->im->lock); + psf = state->im->sources; + } +out: + return psf; +} + +static struct ip_sf_list *igmp_mcf_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip_sf_list *psf = igmp_mcf_get_first(seq); + if (psf) + while (pos && (psf = igmp_mcf_get_next(seq, psf)) != NULL) + --pos; + return pos ? NULL : psf; +} + +static void *igmp_mcf_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp_mcf_get_idx(seq, *pos) : (void *)1; +} + +static void *igmp_mcf_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip_sf_list *psf; + if (v == (void *)1) + psf = igmp_mcf_get_first(seq); + else + psf = igmp_mcf_get_next(seq, v); + ++*pos; + return psf; +} + +static void igmp_mcf_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); + if (likely(state->im != NULL)) + spin_unlock_bh(&state->im->lock); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in_dev_put(state->idev); } -done: read_unlock(&dev_base_lock); +} + +static int igmp_mcf_seq_show(struct seq_file *seq, void *v) +{ + struct ip_sf_list *psf = (struct ip_sf_list *)v; + struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq); - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if(len<0) - len=0; - return len; + if (v == (void *)1) { + seq_printf(seq, + "%3s %6s " + "%10s %10s %6s %6s\n", "Idx", + "Device", "MCA", + "SRC", "INC", "EXC"); + } else { + seq_printf(seq, + "%3d %6.6s 0x%08x " + "0x%08x %6lu %6lu\n", + state->dev->ifindex, state->dev->name, + ntohl(state->im->multiaddr), + ntohl(psf->sf_inaddr), + psf->sf_count[MCAST_INCLUDE], + psf->sf_count[MCAST_EXCLUDE]); + } + return 0; +} + +static struct seq_operations igmp_mcf_seq_ops = { + .start = igmp_mcf_seq_start, + .next = igmp_mcf_seq_next, + .stop = igmp_mcf_seq_stop, + .show = igmp_mcf_seq_show, +}; + +static int igmp_mcf_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp_mcf_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + rc = seq_open(file, &igmp_mcf_seq_ops); + if (rc) + goto out_kfree; + + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + goto out; } -#ifdef CONFIG_PROC_FS +static struct file_operations igmp_mcf_seq_fops = { + .owner = THIS_MODULE, + 
.open = igmp_mcf_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + int __init igmp_mc_proc_init(void) { struct proc_dir_entry *p; @@ -2322,6 +2425,10 @@ p = create_proc_entry("igmp", S_IRUGO, proc_net); if (p) p->proc_fops = &igmp_mc_seq_fops; + + p = create_proc_entry("mcfilter", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp_mcf_seq_fops; return 0; } #endif --- linux-2.5/net/ipv4/ip_output.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv4/ip_output.c Tue Jul 1 01:46:12 2003 @@ -1316,5 +1316,4 @@ #ifdef CONFIG_IP_MULTICAST igmp_mc_proc_init(); #endif - proc_net_create("mcfilter", 0, ip_mcf_procinfo); } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:22 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w35030220 for ; Mon, 30 Jun 2003 23:07:03 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615mtBo013072; Tue, 1 Jul 2003 14:48:55 +0900 Date: Tue, 01 Jul 2003 14:48:55 +0900 (JST) Message-Id: <20030701.144855.86742367.yoshfuji@linux-ipv6.org> To: davem@redhat.com, jmorris@intercode.com.au CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] seq_file conversion 4/5: /proc/net/mfilter6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 
Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3699 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev 4/5: convert /proc/net/mfilter6 to seq_file. --- linux-2.5/net/ipv6/mcast.c.orig Tue Jul 1 01:45:32 2003 +++ linux-2.5/net/ipv6/mcast.c Tue Jul 1 01:46:12 2003 @@ -2179,80 +2179,178 @@ .release = seq_release_private, }; -static int ip6_mcf_read_proc(char *buffer, char **start, off_t offset, - int length, int *eof, void *data) -{ - off_t pos=0, begin=0; - int len=0; - int first=1; +struct igmp6_mcf_iter_state { struct net_device *dev; - - read_lock(&dev_base_lock); - for (dev=dev_base; dev; dev=dev->next) { - struct inet6_dev *idev = in6_dev_get(dev); - struct ifmcaddr6 *imc; + struct inet6_dev *idev; + struct ifmcaddr6 *im; +}; - if (idev == NULL) - continue; +#define igmp6_mcf_seq_private(seq) ((struct igmp6_mcf_iter_state *)&seq->private) +static inline struct ip6_sf_list *igmp6_mcf_get_first(struct seq_file *seq) +{ + struct ip6_sf_list *psf = NULL; + struct ifmcaddr6 *im = NULL; + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + for (state->dev = dev_base, state->idev = NULL, state->im = NULL; + state->dev; + state->dev = state->dev->next) { + struct inet6_dev *idev; + idev = in6_dev_get(state->dev); + if (unlikely(idev == NULL)) + continue; read_lock_bh(&idev->lock); - - for (imc=idev->mc_list; imc; imc=imc->next) { - struct ip6_sf_list *psf; - unsigned long i; - - spin_lock_bh(&imc->mca_lock); - for (psf=imc->mca_sources; psf; psf=psf->sf_next) { - if (first) { - len += sprintf(buffer+len, "%3s %6s " - "%32s %32s %6s %6s\n", "Idx", - "Device", "Multicast Address", - "Source Address", "INC", "EXC"); - first = 0; - } - len += sprintf(buffer+len,"%3d %6.6s ", - dev->ifindex, dev->name); - - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", - 
imc->mca_addr.s6_addr[i]); - buffer[len++] = ' '; - for (i=0; i<16; i++) - len += sprintf(buffer+len, "%02x", - psf->sf_addr.s6_addr[i]); - len += sprintf(buffer+len, " %6lu %6lu\n", - psf->sf_count[MCAST_INCLUDE], - psf->sf_count[MCAST_EXCLUDE]); - pos = begin+len; - if (pos < offset) { - len=0; - begin=pos; - } - if (pos > offset+length) { - spin_unlock_bh(&imc->mca_lock); - read_unlock_bh(&idev->lock); - in6_dev_put(idev); - goto done; - } + im = idev->mc_list; + if (likely(im != NULL)) { + spin_lock_bh(&im->mca_lock); + psf = im->mca_sources; + if (likely(psf != NULL)) { + state->im = im; + state->idev = idev; + break; } - spin_unlock_bh(&imc->mca_lock); + spin_unlock_bh(&im->mca_lock); } read_unlock_bh(&idev->lock); - in6_dev_put(idev); } - *eof = 1; + return psf; +} + +static struct ip6_sf_list *igmp6_mcf_get_next(struct seq_file *seq, struct ip6_sf_list *psf) +{ + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + psf = psf->sf_next; + while (!psf) { + spin_unlock_bh(&state->im->mca_lock); + state->im = state->im->next; + while (!state->im) { + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } + state->dev = state->dev->next; + if (!state->dev) { + state->idev = NULL; + goto out; + } + state->idev = in6_dev_get(state->dev); + if (!state->idev) + continue; + read_lock_bh(&state->idev->lock); + state->im = state->idev->mc_list; + } + if (!state->im) + break; + spin_lock_bh(&state->im->mca_lock); + psf = state->im->mca_sources; + } +out: + return psf; +} + +static struct ip6_sf_list *igmp6_mcf_get_idx(struct seq_file *seq, loff_t pos) +{ + struct ip6_sf_list *psf = igmp6_mcf_get_first(seq); + if (psf) + while (pos && (psf = igmp6_mcf_get_next(seq, psf)) != NULL) + --pos; + return pos ? NULL : psf; +} + +static void *igmp6_mcf_seq_start(struct seq_file *seq, loff_t *pos) +{ + read_lock(&dev_base_lock); + return *pos ? 
igmp6_mcf_get_idx(seq, *pos) : (void *)1; +} + +static void *igmp6_mcf_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct ip6_sf_list *psf; + if (v == (void *)1) + psf = igmp6_mcf_get_first(seq); + else + psf = igmp6_mcf_get_next(seq, v); + ++*pos; + return psf; +} -done: +static void igmp6_mcf_seq_stop(struct seq_file *seq, void *v) +{ + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + if (likely(state->im != NULL)) + spin_unlock_bh(&state->im->mca_lock); + if (likely(state->idev != NULL)) { + read_unlock_bh(&state->idev->lock); + in6_dev_put(state->idev); + } read_unlock(&dev_base_lock); +} + +static int igmp6_mcf_seq_show(struct seq_file *seq, void *v) +{ + struct ip6_sf_list *psf = (struct ip6_sf_list *)v; + struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq); + + if (v == (void *)1) { + seq_printf(seq, + "%3s %6s " + "%32s %32s %6s %6s\n", "Idx", + "Device", "Multicast Address", + "Source Address", "INC", "EXC"); + } else { + seq_printf(seq, + "%3d %6.6s " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%04x%04x%04x%04x%04x%04x%04x%04x " + "%6lu %6lu\n", + state->dev->ifindex, state->dev->name, + NIP6(state->im->mca_addr), + NIP6(psf->sf_addr), + psf->sf_count[MCAST_INCLUDE], + psf->sf_count[MCAST_EXCLUDE]); + } + return 0; +} + +static struct seq_operations igmp6_mcf_seq_ops = { + .start = igmp6_mcf_seq_start, + .next = igmp6_mcf_seq_next, + .stop = igmp6_mcf_seq_stop, + .show = igmp6_mcf_seq_show, +}; + +static int igmp6_mcf_seq_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int rc = -ENOMEM; + struct igmp6_mcf_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (!s) + goto out; + + rc = seq_open(file, &igmp6_mcf_seq_ops); + if (rc) + goto out_kfree; - *start=buffer+(offset-begin); - len-=(offset-begin); - if(len>length) - len=length; - if (len<0) - len=0; - return len; + seq = file->private_data; + seq->private = s; + memset(s, 0, sizeof(*s)); +out: + return rc; +out_kfree: + kfree(s); + 
goto out; } + +static struct file_operations igmp6_mcf_seq_fops = { + .owner = THIS_MODULE, + .open = igmp6_mcf_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; #endif int __init igmp6_init(struct net_proto_family *ops) @@ -2284,7 +2382,9 @@ p = create_proc_entry("igmp6", S_IRUGO, proc_net); if (p) p->proc_fops = &igmp6_mc_seq_fops; - create_proc_read_entry("net/mcfilter6", 0, 0, ip6_mcf_read_proc, NULL); + p = create_proc_entry("mcfilter6", S_IRUGO, proc_net); + if (p) + p->proc_fops = &igmp6_mcf_seq_fops; #endif return 0; @@ -2295,6 +2395,7 @@ sock_release(igmp6_socket); igmp6_socket = NULL; /* for safety */ #ifdef CONFIG_PROC_FS + proc_net_remove("mcfilter6"); proc_net_remove("igmp6"); #endif } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:17 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w2x030220 for ; Mon, 30 Jun 2003 23:06:59 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615mmBo013037; Tue, 1 Jul 2003 14:48:48 +0900 Date: Tue, 01 Jul 2003 14:48:47 +0900 (JST) Message-Id: <20030701.144847.100402841.yoshfuji@linux-ipv6.org> To: davem@redhat.com, jmorris@intercode.com.au CC: netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: [PATCH] seq_file conversion 1/5: /proc/net/igmp From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP 
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3695
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

1/5: convert /proc/net/igmp to seq_file

Index: linux-2.5/include/net/ip.h
===================================================================
RCS file: /home/cvs/linux-2.5/include/net/ip.h,v
retrieving revision 1.20
diff -u -r1.20 ip.h
--- linux-2.5/include/net/ip.h 7 Jun 2003 00:22:34 -0000 1.20
+++ linux-2.5/include/net/ip.h 30 Jun 2003 15:28:13 -0000
@@ -79,7 +79,7 @@

 extern void ip_mc_dropsocket(struct sock *);
 extern void ip_mc_dropdevice(struct net_device *dev);
-extern int ip_mc_procinfo(char *, char **, off_t, int);
+extern int igmp_mc_proc_init(void);
 extern int ip_mcf_procinfo(char *, char **, off_t, int);

 /*
Index: linux-2.5/net/ipv4/igmp.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/ipv4/igmp.c,v
retrieving revision 1.28
diff -u -r1.28 igmp.c
--- linux-2.5/net/ipv4/igmp.c 26 Jun 2003 03:45:59 -0000 1.28
+++ linux-2.5/net/ipv4/igmp.c 30 Jun 2003 15:28:13 -0000
@@ -99,7 +99,10 @@
 #ifdef CONFIG_IP_MROUTE
 #include
 #endif
-
+#ifdef CONFIG_PROC_FS
+#include
+#include
+#endif

 #define IP_MAX_MEMBERSHIPS 20

@@ -2090,65 +2093,162 @@
 	return rv;
 }

-
-int ip_mc_procinfo(char *buffer, char **start, off_t offset, int length)
-{
-	off_t pos=0, begin=0;
-	struct ip_mc_list *im;
-	int len=0;
+#if defined(CONFIG_PROC_FS)
+struct igmp_mc_iter_state {
 	struct net_device *dev;
+	struct in_device *in_dev;
+};

-	len=sprintf(buffer,"Idx\tDevice : Count Querier\tGroup Users Timer\tReporter\n");
+#define igmp_mc_seq_private(seq) ((struct igmp_mc_iter_state *)&seq->private)

-	read_lock(&dev_base_lock);
-	for(dev = dev_base; dev; dev = dev->next) {
-		struct in_device *in_dev = in_dev_get(dev);
-		char *querier = "NONE";
+static inline struct ip_mc_list *igmp_mc_get_first(struct seq_file *seq)
+{
+	struct ip_mc_list *im = NULL;
+	struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);

-		if (in_dev == NULL)
+	for (state->dev = dev_base, state->in_dev = NULL;
+	     state->dev;
+	     state->dev = state->dev->next) {
+		struct in_device *in_dev;
+		in_dev = in_dev_get(state->dev);
+		if (!in_dev)
 			continue;
-
-#ifdef CONFIG_IP_MULTICAST
-		querier = IGMP_V1_SEEN(in_dev) ? "V1" : "V2";
-#endif
-
-		len+=sprintf(buffer+len,"%d\t%-10s: %5d %7s\n",
-			     dev->ifindex, dev->name, dev->mc_count, querier);
-
 		read_lock(&in_dev->lock);
-		for (im = in_dev->mc_list; im; im = im->next) {
-			len+=sprintf(buffer+len,
-				     "\t\t\t\t%08lX %5d %d:%08lX\t\t%d\n",
-				     im->multiaddr, im->users,
-				     im->tm_running, im->timer.expires-jiffies, im->reporter);
-
-			pos=begin+len;
-			if(posoffset+length) {
-				read_unlock(&in_dev->lock);
-				in_dev_put(in_dev);
-				goto done;
-			}
+		im = in_dev->mc_list;
+		if (im) {
+			state->in_dev = in_dev;
+			break;
 		}
 		read_unlock(&in_dev->lock);
-		in_dev_put(in_dev);
 	}
-done:
+	return im;
+}
+
+static struct ip_mc_list *igmp_mc_get_next(struct seq_file *seq, struct ip_mc_list *im)
+{
+	struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
+	im = im->next;
+	while (!im) {
+		if (likely(state->in_dev != NULL)) {
+			read_unlock(&state->in_dev->lock);
+			in_dev_put(state->in_dev);
+		}
+		state->dev = state->dev->next;
+		if (!state->dev) {
+			state->in_dev = NULL;
+			break;
+		}
+		state->in_dev = in_dev_get(state->dev);
+		if (!state->in_dev)
+			continue;
+		read_lock(&state->in_dev->lock);
+		im = state->in_dev->mc_list;
+	}
+	return im;
+}
+
+static struct ip_mc_list *igmp_mc_get_idx(struct seq_file *seq, loff_t pos)
+{
+	struct ip_mc_list *im = igmp_mc_get_first(seq);
+	if (im)
+		while (pos && (im = igmp_mc_get_next(seq, im)) != NULL)
+			--pos;
+	return pos ? NULL : im;
+}
+
+static void *igmp_mc_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	read_lock(&dev_base_lock);
+	return *pos ? igmp_mc_get_idx(seq, *pos) : (void *)1;
+}
+
+static void *igmp_mc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct ip_mc_list *im;
+	if (v == (void *)1)
+		im = igmp_mc_get_first(seq);
+	else
+		im = igmp_mc_get_next(seq, v);
+	++*pos;
+	return im;
+}
+
+static void igmp_mc_seq_stop(struct seq_file *seq, void *v)
+{
+	struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
+	if (likely(state->in_dev != NULL)) {
+		read_unlock(&state->in_dev->lock);
+		in_dev_put(state->in_dev);
+	}
 	read_unlock(&dev_base_lock);
+}

-	*start=buffer+(offset-begin);
-	len-=(offset-begin);
-	if(len>length)
-		len=length;
-	if(len<0)
-		len=0;
-	return len;
+static int igmp_mc_seq_show(struct seq_file *seq, void *v)
+{
+	if (v == (void *)1)
+		seq_printf(seq,
+			   "Idx\tDevice : Count Querier\tGroup Users Timer\tReporter\n");
+	else {
+		struct ip_mc_list *im = (struct ip_mc_list *)v;
+		struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
+		char *querier;
+#ifdef CONFIG_IP_MULTICAST
+		querier = IGMP_V1_SEEN(state->in_dev) ? "V1" : "V2";
+#else
+		querier = "NONE";
+#endif
+
+		if (state->in_dev->mc_list == im) {
+			seq_printf(seq, "%d\t%-10s: %5d %7s\n",
+				   state->dev->ifindex, state->dev->name, state->dev->mc_count, querier);
+		}
+
+		seq_printf(seq,
+			   "\t\t\t\t%08lX %5d %d:%08lX\t\t%d\n",
+			   im->multiaddr, im->users,
+			   im->tm_running, im->timer.expires-jiffies, im->reporter);
+	}
+	return 0;
+}
+
+static struct seq_operations igmp_mc_seq_ops = {
+	.start = igmp_mc_seq_start,
+	.next = igmp_mc_seq_next,
+	.stop = igmp_mc_seq_stop,
+	.show = igmp_mc_seq_show,
+};
+
+static int igmp_mc_seq_open(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq;
+	int rc = -ENOMEM;
+	struct igmp_mc_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL);
+
+	if (!s)
+		goto out;
+	rc = seq_open(file, &igmp_mc_seq_ops);
+	if (rc)
+		goto out_kfree;
+
+	seq = file->private_data;
+	seq->private = s;
+	memset(s, 0, sizeof(*s));
+out:
+	return rc;
+out_kfree:
+	kfree(s);
+	goto out;
 }

+static struct file_operations igmp_mc_seq_fops = {
+	.owner = THIS_MODULE,
+	.open = igmp_mc_seq_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release_private,
+};
+#endif
+
 int ip_mcf_procinfo(char *buffer, char **start, off_t offset, int length)
 {
 	off_t pos=0, begin=0;
@@ -2213,4 +2313,16 @@
 		len=0;
 	return len;
 }
+
+#ifdef CONFIG_PROC_FS
+int __init igmp_mc_proc_init(void)
+{
+	struct proc_dir_entry *p;
+
+	p = create_proc_entry("igmp", S_IRUGO, proc_net);
+	if (p)
+		p->proc_fops = &igmp_mc_seq_fops;
+	return 0;
+}
+#endif

Index: linux-2.5/net/ipv4/ip_output.c
===================================================================
RCS file: /home/cvs/linux-2.5/net/ipv4/ip_output.c,v
retrieving revision 1.34
diff -u -r1.34 ip_output.c
--- linux-2.5/net/ipv4/ip_output.c 21 Jun 2003 16:20:41 -0000 1.34
+++ linux-2.5/net/ipv4/ip_output.c 30 Jun 2003 15:28:13 -0000
@@ -1314,7 +1314,7 @@
 	inet_initpeers();

 #ifdef CONFIG_IP_MULTICAST
-	proc_net_create("igmp", 0, ip_mc_procinfo);
+	igmp_mc_proc_init();
 #endif
 	proc_net_create("mcfilter", 0, ip_mcf_procinfo);
 }

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From yoshfuji@linux-ipv6.org Mon Jun 30 23:07:07 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:07:23 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6166w39030220
	for ; Mon, 30 Jun 2003 23:07:06 -0700
Received: from localhost (localhost [127.0.0.1])
	by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h615TmBo005917;
	Tue, 1 Jul 2003 14:29:48 +0900
Date: Tue, 01 Jul 2003 14:29:47 +0900 (JST)
Message-Id: <20030701.142947.75158018.yoshfuji@linux-ipv6.org>
To: jmorris@intercode.com.au
Cc: davem@redhat.com, netdev@oss.sgi.com
Subject: Re: [PATCH] IPV6: convert /proc/net/igmp6 to seq_file
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To:
References: <20030701.132315.11621908.yoshfuji@linux-ipv6.org>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU]
 $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3700
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

In article (at Tue, 1 Jul 2003 14:41:19 +1000 (EST)), James Morris says:

> net/ipv6/mcast.c: In function `igmp6_mc_seq_stop':
> net/ipv6/mcast.c:2124: invalid type argument of `->'
> make[2]: *** [net/ipv6/mcast.o] Error 1
> make[1]: *** [net/ipv6] Error 2
> make: *** [net] Error 2
>
>
> read_unlock_bh(state->idev->lock) should be
> read_unlock_bh(&state->idev->lock)
>
>
> At least one of the other patches has a similar problem.

Hmm..., I don't understand why I could compile it...
Okay, I'll resend the patches.

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From jmorris@intercode.com.au Mon Jun 30 23:27:59 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 30 Jun 2003 23:28:06 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:dpSnwoSn70dVI8bvroJEfYlYtGx+mXMW@blackbird.intercode.com.au [203.32.101.10])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h616Rt2x032480
	for ; Mon, 30 Jun 2003 23:27:57 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12])
	by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h616Rfr05288;
	Tue, 1 Jul 2003 16:27:41 +1000
Date: Tue, 1 Jul 2003 16:27:41 +1000 (EST)
From: James Morris
To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
cc: davem@redhat.com,
Subject: Re: [PATCH] seq_file conversion 5/5: /proc/net/anycast6
In-Reply-To: <20030701.144857.99268305.yoshfuji@linux-ipv6.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
X-archive-position: 3702
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Tue, 1 Jul 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote:

> 5/5: convert /proc/net/anycast6 to seq_file.

All five applied.

- James
--
James Morris