From davem@redhat.com Sun Jun 1 01:33:01 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 01 Jun 2003 01:33:16 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h518Wt2x021058 for ; Sun, 1 Jun 2003 01:33:01 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA15160; Sun, 1 Jun 2003 01:30:40 -0700
Date: Sun, 01 Jun 2003 01:30:40 -0700 (PDT)
Message-Id: <20030601.013040.116362760.davem@redhat.com>
To: mk@linux-ipv6.org
Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] xfrm ip6ip6
From: "David S. Miller"
In-Reply-To: <87fzmv5ejc.wl@karaba.org>
References: <87fzmv5ejc.wl@karaba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 2798
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Mitsuru KANDA / 神田 充
   Date: Sun, 01 Jun 2003 00:20:07 +0900

Hello Mitsuru-san!

   + t->id.spi = xfrm6_tunnel_addr_hash((xfrm_address_t *)&x->props.saddr);

You misunderstood what I tried to explain to you.

Consider, how do you guarantee that this t->id.spi value is unique
across all xfrm6_tunnel tunnels using the same t->id.daddr and
t->id.prot?

The answer is that you cannot.

You must generate fake "spi" values; they have no meaning outside of
xfrm6_tunnel.c. They serve only to map a 128-bit IPv6 address to a
32-bit "xfrm6_tunnel" SPI value.

I would suggest the following implementation:

1) Implement something similar to xfrm_alloc_spi(t, 1, ~(u32)0).
   It just needs to allocate unique SPI numbers local to
   xfrm6_tunnel.c. We mark "SPI" value zero as reserved, to indicate
   a failed lookup.
2) Create a hash table; it is keyed by IPv6 address and the hash table
   entries give SPI values.

So on input you would say something like:

	u32 spi;

	spi = spihash_lookup(&iph->saddr);
	if (!spi)
		goto drop;

	x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, spi,
			      IPPROTO_IPV6, AF_INET6);

Is the idea more clear now?

Once you fix this up I'll apply your xfrm6_tunnel.c work.

Thank you.

From davem@redhat.com Sun Jun 1 01:37:04 2003
Date: Sun, 01 Jun 2003 01:34:52 -0700 (PDT)
Message-Id: <20030601.013452.68050592.davem@redhat.com>
To: jmorris@intercode.com.au
Cc: mk@linux-ipv6.org, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] xfrm ip6ip6
From: "David S. Miller"
In-Reply-To:
References: <87fzmv5ejc.wl@karaba.org>

   From: James Morris
   Date: Sun, 1 Jun 2003 02:01:42 +1000 (EST)

   We need to either filter them out or make sure they are displayed
   as ipip. Part of the answer will depend on whether we want to
   expose xfrm-based ipip tunnels for general use, or only use them
   internally for ipcomp.

I think it is an error to extend PF_KEY for our Linux purposes.
Our API here is basically defined to be whatever is in KAME :-)

However, setkey should filter entries it does not understand.

Currently I see no use for exposing these tunnel transforms outside of
the kernel. Mobile IPV6, if it decides to use xfrm6_tunnel, can
configure them itself in the kernel-side support. Or, if the user side
is more appropriate for MIPV6 access, we may allow it to use the xfrm
netlink interface somehow.

From aj@dungeon.inka.de Sun Jun 1 04:48:18 2003
From: Andreas Jellinghaus
To: netdev@oss.sgi.com
Subject: ipsec / pppoe
Date: Sun, 1 Jun 2003 12:33:22 +0200
Cc: howto@lartc.org
Message-Id: <200306011233.22544.aj@dungeon.inka.de>

With pppoe it is usually necessary to clamp the maximum segment size
down to 1452 bytes. This can be done with a netfilter module or with
the "-m 1452" option to pppoe.

With ipsec (esp, tunnel mode), even on a wlan interface before the
ppp connection, I needed to clamp the mss down further, to 1384 bytes.
Now all connections are working fine.
My calculation gave me

  1500 mtu (wlan0) - 20 (ip) - 48 (esp) - 20 (ip) - 20 (tcp) = 1392

or

  1492 (ppp(oe)) - 20 (ip) - 20 (tcp) = 1452,

so the minimum, 1392, should have been the right value.

I don't know why I need to clamp the mss down to 1384, but e.g. http
connections to www.microsoft.com work fine with 1384 and do not
work at all with 1392.

I still don't know why some machines don't respond to icmp "packet
too big" errors with a smaller packet, but do not act on them at all.
Maybe some broken firewall thinks it is some kind of attack? I don't
know what exactly is between me and websites such as www.google.com
or www.microsoft.com, so I can't figure it out.

Sorry to have bothered everyone, and many thanks to James for his help.

cc: to howto@lartc.org, I think this would make a nice
howto entry.

Regards, Andreas

From jmorris@intercode.com.au Sun Jun 1 05:19:29 2003
Date: Sun, 1 Jun 2003 22:18:56 +1000 (EST)
From: James Morris
To: Andreas Jellinghaus
cc: netdev@oss.sgi.com,
Subject: Re: ipsec / pppoe
In-Reply-To: <200306011233.22544.aj@dungeon.inka.de>
Message-ID:

On Sun, 1 Jun 2003, Andreas Jellinghaus wrote:

> cc: to howto@lartc.org, I think this would make a nice
> howto entry.
Actually, there is a bug in the way icmp pmtu messages are being
generated here, which should be fixed soon.

- James
--
James Morris

From gandalf@wlug.westbo.se Sun Jun 1 16:05:02 2003
Subject: [PATCH] fix use after free in e100
From: Martin Josefsson
To: scott.feldman@intel.com
Cc: netdev@oss.sgi.com
Message-Id: <1054508698.24777.17.camel@tux.rsn.bth.se>
Date: 02 Jun 2003 01:04:58 +0200

Hi Scott.

Here's a fix for a use-after-free in the e100 driver.
You can't touch the skb after a call to netif_rx(); it might have been
freed.
Caught with Manfred's unmap-page-debugging patch in -mm.
Applies to both 2.4 and 2.5

--- linux-2.5.69-mm9/drivers/net/e100/e100_main.c.orig	2003-06-02 00:48:13.000000000 +0200
+++ linux-2.5.69-mm9/drivers/net/e100/e100_main.c	2003-06-02 00:50:09.000000000 +0200
@@ -2052,13 +2052,14 @@
 			skb->ip_summed = CHECKSUM_NONE;
 		}
 
+		bdp->drv_stats.net_stats.rx_bytes += skb->len;
+
 		if(bdp->vlgrp && (rfd_status & CB_STATUS_VLAN)) {
 			vlan_hwaccel_rx(skb, bdp->vlgrp, be16_to_cpu(rfd->vlanid));
 		} else {
 			netif_rx(skb);
 		}
 		dev->last_rx = jiffies;
-		bdp->drv_stats.net_stats.rx_bytes += skb->len;
 
 		rfd_cnt++;
 	}			/* end of rfd loop */

--
/Martin

From Robert.Olsson@data.slu.se Mon Jun 2 03:59:13 2003
From: Robert Olsson
Message-ID: <16091.11735.721251.925522@robur.slu.se>
Date: Mon, 2 Jun 2003 12:58:31 +0200
To: Simon Kirby
Cc: "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
In-Reply-To: <20030529205125.GA30058@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com>

Simon Kirby writes:

> Full profile output available here:
>
> http://blue.netnation.com/sim/ref/
> readprofile.full_route_table_hash_fixed_napi.*
>
> Note that if I increase the packet rate and NAPI kicks in, all of the
> handle_IRQ and similar overhead basically disappears because it no longer
> uses IRQs. Pretty spiffy. Here is a profile of that:
> Full profile output available as:

  8896 rt_garbage_collect          9.4237
  8959 ip_route_input_slow         3.8885
 10516 dst_alloc                  73.0278
 10666 kmem_cache_free            66.6625
 15339 tg3_rx                     16.2489
 16553 ipt_do_table               14.9937
 20193 fn_hash_lookup             70.1146
 26833 rt_intern_hash             34.9388
 64803 ip_route_input            150.0069

From a DoS perspective this is a more interesting experiment than the
one where you limited the input rate to have 30% idle CPU.

A new dst is coming all the time: it is first searched for in the hash
(ip_route_input) and not found, so the
ip_route_input_slow/fn_hash_lookup/dst_alloc/rt_intern_hash path is
taken to add a new dst entry...

And later GC has to remove all entries with spin_lock_bh held (no
packet processing runs). I see packet drops exactly when GC runs.
Tuning GC might help but it's something to observe.

I had some idea to rate-limit new flows and try to isolate the device
causing the DoS. Something like (ip_route_input):

[We don't have a hash entry]

	/* DoS check...
	   Rate down but do not stop GC and creation of new hash
	   entries until GC frees resources. We limit per interface so
	   hogger dev(s) will be hit hardest. As a side effect we get
	   dst_overrun per device. */

	entries = atomic_read(&ipv4_dst_ops.entries);

	if (entries > ip_rt_max_size) {
		int drp = 4;

		if( dev->dst_hash_overrun++ % drp ) {
			if (net_ratelimit())
				printk(KERN_WARNING "dst creation throttled\n");

			return -ECONNREFUSED;
		}

		/* Also make sure the slow path gets a chance to
		   create the dst entry */

		if (ipv4_dst_ops.gc && ipv4_dst_ops.gc()) {
			RT_CACHE_STAT_INC(gc_dst_overflow);
			return -ENOBUFS;
		}
	}

[ip_route_input_slow comes here]

But more thinking is needed...

Cheers.
						--ro

From sim@netnation.com Mon Jun 2 08:18:54 2003
Date: Mon, 2 Jun 2003 08:18:52 -0700
From: Simon Kirby
To: Robert Olsson
Cc: "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
Message-ID: <20030602151852.GA6070@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se>
In-Reply-To: <16091.11735.721251.925522@robur.slu.se>

On Mon, Jun 02, 2003 at 12:58:31PM +0200, Robert Olsson wrote:

> New dst is coming all the time first searched in hash (ip_route_input) and not found
> so ip_route_input_slow/fn_hash_lookup/dst_alloc/rt_intern_hash path is taken to add
> a new dst entry...
>
> And later GC have to remove all entries with spin_lock_bh hold (no packet processing
> runs). I see packet drops exactly when GC runs. Tuning GC might help but it's something
> to observe.
>
> I had some idea to rate-limit new flows and try to isolate the device causing the DoS
> Something like (ip_route_input):
...
> 			if (net_ratelimit())
> 				printk(KERN_WARNING "dst creation throttled\n");
>
> 			return -ECONNREFUSED;

This reminds me of the situation we experienced with the dst cache
overflowing in early 2.2 kernels. This was a long time ago, when our
traffic was only about 10 Mbits/second. We had recently upgraded from a
2.0 kernel. The dst cache was overflowing due to a bug in the garbage
collector, and at the time, no messages were printed. It took me a
_long_ time to figure out why connections to a server I hadn't previously
connected to in a while would only work every so often, and not
immediately like they should.
I'm afraid this approach will have a similar effect, albeit
(hopefully) only under an attack.

Is it possible to have a dst LRU or a simpler approximation of such and
recycle dst entries rather than deallocating/reallocating them? This
would relieve a lot of work from the garbage collector and avoid the
periodic large garbage collection latency. It could be tuned to only
occur in an attack (I remember Alexey saying that the deferred garbage
collection was implemented to reduce latency in normal operation).

Would this work? Cross-CPU thrashing issues?

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]

From Robert.Olsson@data.slu.se Mon Jun 2 09:37:19 2003
From: Robert Olsson
Message-ID: <16091.32021.75335.227150@robur.slu.se>
Date: Mon, 2 Jun 2003 18:36:37 +0200
To: Simon Kirby
Cc: Robert Olsson , "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
In-Reply-To: <20030602151852.GA6070@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com>

Simon Kirby writes:

> This reminds me of the situation we experienced with the dst cache
> overflowing in early 2.2 kernels. This was a long time ago, when our
> traffic was only about 10 Mbits/second. We had recently upgraded from a
> 2.0 kernel. The dst cache was overflowing due to a bug in the garbage
> collector, and at the time, no messages were printed. It took me a
> _long_ time to figure out why connections to a server I hadn't previously
> connected to in a while would only work every so often, and not
> immediately like they should. I'm afraid this approach will have a
> similar effect, albeit (hopefully) only under an attack.

We are given more work than we have resources for (max_size); what
else can we do but refuse? But yes, we have invested pretty much work
already.

Also remember we are looking into runs where 100% of incoming traffic
has one new dst for every packet. So how is the situation in "real
life"? In case of multiple devices, at least NAPI gives all devs their
share.

> Is it possible to have a dst LRU or a simpler approximation of such and
> recycle dst entries rather than deallocating/reallocating them? This
> would relieve a lot of work from the garbage collector and avoid the
> periodic large garbage collection latency.
> It could be tuned to only
> occur in an attack (I remember Alexey saying that the deferred garbage
> collection was implemented to reduce latency in normal operation).

I don't see how this can be done. Others may?

Cheers.
						--ro

From rddunlap@osdl.org Mon Jun 2 10:08:30 2003
Date: Mon, 2 Jun 2003 10:07:54 -0700
From: "Randy.Dunlap"
To: Andi Kleen
Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: netlink tester program
Message-Id: <20030602100754.1e3e1ca8.rddunlap@osdl.org>
In-Reply-To: <20030531120940.GB11898@wotan.suse.de>
References: <20030530090015.7c435c9a.rddunlap@osdl.org> <20030530.171111.71099698.davem@redhat.com> <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030531120940.GB11898@wotan.suse.de>

On Sat, 31 May 2003 14:09:40 +0200 Andi Kleen wrote:

| > You really need something like rtnl_talk() or rtnl_dump_filter()
| > from libnetlink to do this properly.
|
| In case it's helpful I wrote manpages for libnetlink some time ago.
|
| I also have some simple example programs using it. Other examples
| can be found in the zebra or bird source code.

Does this man page live somewhere? Do some distros ship it?
Where can I find your example programs that use it?

Couple of corrections below....

| rtnl_listen
|        Receive netlink data after a request and pass it to
|        handler. handler is a callback that gets the mes-
|        sage source address, the message itself, and the
|        jarg cookie as arguments. It will get called for
+                                  It will be called for
         {'get' should usually be avoided when easily done}
|        all received messages. Only one message bundle is
|        received. Unless there is no message pending this
|        function does not block.
|
|
| rta_addattr32
|        Initialize the rtnetlink attribute rta with a __u32
|        data value.
|
|
| rta_addattr32
+ rta_addattr_l
|        Initialize the rtnetlink attribute rta with a vari-
|        able length data value.

--
~Randy

From krkumar@us.ibm.com Mon Jun 2 10:33:59 2003
Message-ID: <3EDB8A41.2080305@us.ibm.com>
Date: Mon, 02 Jun 2003 10:32:49 -0700
From: Krishna Kumar
CC: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: [PATCH] Prefix List patch against 2.5.70
References: <3ED80230.2030508@us.ibm.com> <20030531.110249.12960077.yoshfuji@linux-ipv6.org>
In-Reply-To: <20030531.110249.12960077.yoshfuji@linux-ipv6.org>
Content-Type: text/plain; charset=us-ascii;
format=flowed

Hi Yoshifuji,

Thanks for your comments.

>>+/* prefix list returned to user space in this structure */
>>+struct plist_user_info {
>
>            ^ip6 or ipv6 or so.
>
>>+	char name[IFNAMSIZ]; /* interface name */
>
>       ~~~~~~~~~~~~~~~~~~~duplicate information.
>

Point noted. That can be removed (prefer to have name instead of
ifindex).

>>+	int ifindex; /* interface index */
>>+	int nprefixes; /* number of elements in 'prefix' */
>>+	struct var_plist_user_info { /* multiple elements */
>>+		char flags[3]; /* router advertised flags */
>
>               ~~~~~~~~this is not good interface.

This is my mistake. When I added the original interface, it was using
the proc filesystem and it made sense at that time for a user to cat
/proc/net/ and actually see the flags. While converting to use netlink,
I forgot to change this to real flags. This was not the intended
interface :-)

>>+		int plen; /* prefix length */
>>+		__u32 valid; /* valid lifetime */
>>+		struct in6_addr ra_addr;/* advertising router */
>>+		struct in6_addr prefix; /* prefix */
>>+	} plist_vars[0];
>>+};
>>+
>> extern void addrconf_init(void);
>> extern void addrconf_cleanup(void);
>>
>
>
>                  :
>
> I think we should use 1 fixed-length message per prefix,
> instead of variable length message.

I had got this idea from "struct fib_info", which also has a variable
size structure, but probably it is not worth the extra effort to save
a few bytes.
>>+		ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr,
>>+				&p_el->ra_addr);
>>+		for (i = 0; i < 8; i++)
>>+			pinfo->plist_vars[count].ra_addr.s6_addr16[i] =
>>+				__constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]);
>>+		ipv6_addr_copy(&pinfo->plist_vars[count].prefix,
>>+				&p_el->pinfo.prefix);
>>+		for (i = 0; i < p_el->pinfo.prefix_len/16; i++)
>>+			pinfo->plist_vars[count].prefix.s6_addr16[i] =
>>+				__constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]);
>
>
> Absolutely nasty.
>  - don't use characters to represent flags; use real flags.
>  - use network-byte order.

network-byte order ? User will get prefix in network byte order, is
that correct ?

>>+static int prefix_list_proc_dump(char *buffer, char **start, off_t offset,
>>+				int length)
>>+{
>
>                  :
>
> Please use seq_file.

OK.

> Again, what I proposed was to store prefix information on fib with
> some flags to represent advertised by routers and give user-space
> the RA information using new rtattr (RTA_RA6INFO or something like that).
>
> struct rta_ra6info {
> 	u32 rta_ra6flags;
> };
>

In my mail, I had given problems with doing that in the fib. I can
look to convert to fib, but please let me know which kernel routines I
should look at.

Thanks,

- KK

From sim@netnation.com Mon Jun 2 11:05:38 2003
Date: Mon, 2 Jun 2003 11:05:37 -0700
From: Simon Kirby
To: Robert Olsson
Cc: "David S.
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
Message-ID: <20030602180537.GB30957@netnation.com>
References: <20030522.015815.91322249.davem@redhat.com> <20030522.034058.71558626.davem@redhat.com> <20030522114438.GD2961@netnation.com> <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com> <16091.32021.75335.227150@robur.slu.se>
In-Reply-To: <16091.32021.75335.227150@robur.slu.se>

On Mon, Jun 02, 2003 at 06:36:37PM +0200, Robert Olsson wrote:

> We are given more work than we have resources for (max_size) what else than
> refuse can we do? But yes we have invested pretty much work already.

Well, this is the problem. We do not and cannot know which entries we
really want to remember (legitimate traffic). Adding code to actually
refuse new dst entries is just going to make the DoS effective, which is
NOT what we want.

> Also remember we are looking into runs where 100% of incoming traffic has one
> new dst for every packet. So how is the situation in "real life"?
> In case of multiple devices at least NAPI gives all devs their share.

Right, so, when we are traffic saturated, we want to make sure the whole
route cache and route path is as fast as possible. Recycling dst entries
by simply rewriting and rehashing them, rather than allocating new ones
and eventually freeing them all in the garbage collection cycle, should
reduce allocator overhead. If this is only done when the table is full,
I don't see any downside...if this is in fact doable, that is.
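
The recycling idea described above can be modelled in user space. What
follows is only a rough sketch under stated assumptions, not kernel
code: struct dst_model, struct dst_cache, cache_init() and cache_get()
are hypothetical names invented for illustration, the table is a small
linear array rather than a hash, and locking and per-CPU concerns are
ignored. When the table is full, the least recently used slot is
rewritten in place instead of the new flow being refused or an entry
being freed and reallocated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 4		/* tiny on purpose; stands in for max_size */

struct dst_model {		/* hypothetical stand-in for a dst entry */
	uint32_t daddr;		/* lookup key */
	uint64_t last_used;	/* logical time of the last hit */
	int in_use;
};

struct dst_cache {
	struct dst_model slot[CACHE_SIZE];
	uint64_t clock;		/* advanced on every lookup */
	unsigned int recycled;	/* number of in-place rewrites */
};

static void cache_init(struct dst_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/*
 * Return the slot for daddr. On a miss with a full table, the least
 * recently used slot is rewritten in place (no free/alloc pair).
 */
static struct dst_model *cache_get(struct dst_cache *c, uint32_t daddr)
{
	struct dst_model *victim = NULL;
	int i;

	c->clock++;
	for (i = 0; i < CACHE_SIZE; i++) {
		struct dst_model *d = &c->slot[i];

		if (d->in_use && d->daddr == daddr) {
			d->last_used = c->clock;	/* cache hit */
			return d;
		}
		/* Prefer a free slot; otherwise track the oldest entry. */
		if (!victim || (victim->in_use &&
				(!d->in_use || d->last_used < victim->last_used)))
			victim = d;
	}

	assert(victim != NULL);
	if (victim->in_use)
		c->recycled++;
	victim->daddr = daddr;
	victim->last_used = c->clock;
	victim->in_use = 1;
	return victim;
}
```

In this model a flood of new destinations only ever evicts the oldest
entries, so recently used (presumably legitimate) flows keep their
slots as long as they stay active. Whether the same holds for the real
route cache, with its hash chains and reference counts, is exactly the
open question raised above.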
:)

Simon-

From kumarkr@us.ibm.com Mon Jun 2 12:48:34 2003
Subject: Re: [PATCH] Prefix List patch against 2.5.70
To: davem@redhat.com
Cc: kuznet@ms2.inr.ac.ru, linux-net@vger.kernel.org, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org
Message-ID:
From: Krishna Kumar
Date: Mon, 2 Jun 2003 12:46:55 -0700

Regarding my previous mail :

> > Again, what I proposed was to store prefix information on fib with
> > some flags to represent advertised by routers and give user-space
> > the RA information using new rtattr (RTA_RA6INFO or something like that).
> >
> > struct rta_ra6info {
> > 	u32 rta_ra6flags;
> > };
> In my mail, I had given problems with doing that in the fib. I can look to
> convert to fib, but please let me know which kernel routines I should look at

What I meant is whether you are referring to addrconf_prefix_route()
when you mention storing prefix on fib ?
> > Again, what I proposed was to store prefix information on fib with
> > some flags to represent advertised by routers and give user-space
> > the RA information using new rtattr (RTA_RA6INFO or something like that).
>
> This sounds very reasonable.

Also since you prefer it to be implemented as part of routing table,
would it be OK to return the prefix list via netstat or route command
(this uses rt6_info_route() to print the information). The current user
for prefix list has no problem using a command instead of writing
netlink user code to get the list. I am not sure why rtnetlink is needed
in this case.

thanks,

- KK

From dlstevens@us.ibm.com Mon Jun 2 14:03:09 2003
Subject: Re: [PATCH] Prefix List patch against 2.5.70
To: YOSHIFUJI Hideaki / 吉藤英明
Cc: krkumar@us.ibm.com, davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Message-ID:
From: David Stevens
Date: Mon, 2 Jun 2003 15:02:03 -0600
Errors-to:
netdev-bounce@oss.sgi.com

On the "location of the data" issue, I have some comments.

The users of prefix list data don't know the prefix or the prefix
length; they know the interface index, and need to get the prefixes.
The data is fundamentally per-interface, and the routing table is
per-destination.

So, adding the prefixes to the routing table doesn't seem like the best
choice because everything that currently uses the routing table will
have to skip over these extra entries (which they'll never be
interested in) and the users of the prefix data will have to skip over
all existing routing table entries (which they're never interested in).

Routes and prefixes are independent of each other, so throwing them in
the same table to me seems like it only creates work to skip entries
that aren't related, and because the users of the prefix data don't
have the key needed for a fast look-up in the routing table, prefix
users in particular have to skip through everything currently in the
routing table, linearly, with no benefit at all for being there.

I also see no relation between prefix list data and the FIB; current
users are completely independent from prefix list users, and it appears
to only slow both of them down. The prefix data is always looked up by
interface index, so I think it really belongs in the inet6
per-interface structure, unless I'm missing something. What benefits
are there for lumping this with existing data structures that aren't
per-interface, or keyed per-interface?
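
The per-interface argument above can be illustrated with a small
user-space model. This is a sketch under assumptions only: struct
iface_model, struct prefix_model, iface_add_prefix() and iface_find()
are hypothetical names that do not correspond to any kernel structure
or function, and MAX_PREFIXES is an arbitrary bound. The point it tries
to show is that once the interface is found by its index, the prefixes
are already at hand and no unrelated entries have to be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PREFIXES 8		/* arbitrary bound for the sketch */

struct prefix_model {
	uint8_t prefix[16];	/* IPv6 prefix, network byte order */
	uint8_t plen;		/* prefix length in bits */
};

/* Hypothetical stand-in for a per-interface IPv6 structure. */
struct iface_model {
	int ifindex;
	size_t nprefixes;
	struct prefix_model pfx[MAX_PREFIXES];
};

/* Attach a prefix to an interface; returns 0, or -1 if the list is full. */
static int iface_add_prefix(struct iface_model *ifp,
			    const uint8_t prefix[16], uint8_t plen)
{
	assert(plen <= 128);
	if (ifp->nprefixes >= MAX_PREFIXES)
		return -1;
	memcpy(ifp->pfx[ifp->nprefixes].prefix, prefix, 16);
	ifp->pfx[ifp->nprefixes].plen = plen;
	ifp->nprefixes++;
	return 0;
}

/* Look up an interface by index; its prefixes need no further search. */
static const struct iface_model *iface_find(const struct iface_model *tbl,
					    size_t n, int ifindex)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return &tbl[i];
	return NULL;
}
```

A caller that knows only the interface index calls iface_find() and
then walks pfx[0..nprefixes-1]; the cost depends on the number of
interfaces and of prefixes on that interface, not on the size of any
routing table.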
+-DLS YOSHIFUJI Hideaki / 吉藤英明 @vger.kernel.org on 05/30/2003 07:02:49 PM Sent by: linux-net-owner@vger.kernel.org To: krkumar@us.ltcfwd.linux.ibm.com cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 In article <3ED80230.2030508@us.ibm.com> (at Fri, 30 May 2003 18:15:28 -0700), Krishna Kumar says: > +/* prefix list returned to user space in this structure */ > +struct plist_user_info { ^ip6 or ipv6 or so. > + char name[IFNAMSIZ]; /* interface name */ ~~~~~~~~~~~~~~~~~~~duplicate information. > + int ifindex; /* interface index */ > + int nprefixes; /* number of elements in 'prefix' */ > + struct var_plist_user_info { /* multiple elements */ > + char flags[3]; /* router advertised flags */ ~~~~~~~~this is not good interface. > + int plen; /* prefix length */ > + __u32 valid; /* valid lifetime */ > + struct in6_addr ra_addr;/* advertising router */ > + struct in6_addr prefix; /* prefix */ > + } plist_vars[0]; > +}; > + > extern void addrconf_init(void); > extern void addrconf_cleanup(void); > : I think we should use 1 fixed-length message per prefix, instead of variable length message. 
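One possible shape for a fixed-length, one-record-per-prefix message is sketched below. This is an illustration under stated assumptions, not the interface that was proposed in the patch or adopted later: struct prefix_msg, the PFX_F_* bits, prefix_msg_fill() and prefix_msg_has() are hypothetical names. The flags are a bit mask instead of a string, the interface name is dropped in favour of the index, and addresses are copied as raw bytes without any per-word swapping.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for a prefix record (illustrative values). */
#define PFX_F_MANAGED 0x80u   /* "managed address configuration" was set */
#define PFX_F_OTHER   0x40u   /* "other configuration" was set */

/* Hypothetical fixed-length record: one of these per prefix. */
struct prefix_msg {
    int32_t  ifindex;      /* interface index (no duplicate name field) */
    uint8_t  plen;         /* prefix length in bits */
    uint8_t  flags;        /* PFX_F_* bit mask, not a character string */
    uint16_t pad;          /* explicit padding, always zero */
    uint32_t valid;        /* remaining valid lifetime in seconds */
    uint8_t  router[16];   /* advertising router, raw address bytes */
    uint8_t  prefix[16];   /* prefix, raw address bytes */
};

/* Fill one record.  The whole structure is zeroed first so that no
 * uninitialised padding would be handed to the reader. */
static void
prefix_msg_fill(struct prefix_msg *m, int32_t ifindex, uint8_t plen,
                uint8_t flags, uint32_t valid_secs,
                const uint8_t router[16], const uint8_t prefix[16])
{
    memset(m, 0, sizeof(*m));
    m->ifindex = ifindex;
    m->plen = plen;
    m->flags = flags;
    m->valid = valid_secs;
    memcpy(m->router, router, 16);
    memcpy(m->prefix, prefix, 16);
}

/* Test a flag on the reader side: a mask check, no string compare. */
static int
prefix_msg_has(const struct prefix_msg *m, uint8_t flag)
{
    return (m->flags & flag) != 0;
}
```

Because every record has the same size, a reader can step through a buffer of them without parsing a count or a variable-length tail.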
> +	pinfo->plist_vars[count].plen = p_el->pinfo.prefix_len;
> +	pinfo->plist_vars[count].valid = p_el->pinfo.valid -
> +		(jiffies - p_el->timestamp)/HZ;
> +	if ((p_el->ra_flags & (ND_RA_FLAG_MANAGED |
> +			ND_RA_FLAG_OTHER))
> +			== (ND_RA_FLAG_MANAGED|ND_RA_FLAG_OTHER))
> +		strcpy(pinfo->plist_vars[count].flags, "MO");
> +	else if (p_el->ra_flags & ND_RA_FLAG_MANAGED)
> +		strcpy(pinfo->plist_vars[count].flags, "M");
> +	else if (p_el->ra_flags & ND_RA_FLAG_OTHER)
> +		strcpy(pinfo->plist_vars[count].flags, "O");
> +	else
> +		strcpy(pinfo->plist_vars[count].flags, "-");
> +	ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr,
> +		&p_el->ra_addr);
> +	for (i = 0; i < 8; i++)
> +		pinfo->plist_vars[count].ra_addr.s6_addr16[i] =
> +			__constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]);
> +	ipv6_addr_copy(&pinfo->plist_vars[count].prefix,
> +		&p_el->pinfo.prefix);
> +	for (i = 0; i < p_el->pinfo.prefix_len/16; i++)
> +		pinfo->plist_vars[count].prefix.s6_addr16[i] =
> +			__constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]);

Absolutely nasty.
 - don't use characters to represent flags; use real flags.
 - use network byte order.

> +static int prefix_list_proc_dump(char *buffer, char **start, off_t offset,
> +		int length)
> +{
:

Please use seq_file.

Again, what I proposed was to store prefix information on fib with some flags to represent advertised by routers and give user-space the RA information using new rtattr (RTA_RA6INFO or something like that).
struct rta_ra6info { u32 rta_ra6flags; }; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA - To unsubscribe from this list: send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From rddunlap@osdl.org Mon Jun 2 14:05:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:05:52 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52L5d2x000551 for ; Mon, 2 Jun 2003 14:05:40 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h52L5FX04868; Mon, 2 Jun 2003 14:05:15 -0700 Date: Mon, 2 Jun 2003 14:04:52 -0700 From: "Randy.Dunlap" To: "David S. Miller" Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program Message-Id: <20030602140452.039248de.rddunlap@osdl.org> In-Reply-To: <20030530.234211.102567405.davem@redhat.com> References: <20030530090015.7c435c9a.rddunlap@osdl.org> <20030530.171111.71099698.davem@redhat.com> <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> Organization: OSDL X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2811 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev On Fri, 30 May 2003 23:42:11 -0700 (PDT) "David S. Miller" wrote: | From: "Randy.Dunlap" | Date: Fri, 30 May 2003 20:22:12 -0700 (PDT) | | Oh well, it's at this URL, bugs and all. 
| | http://www.xenotime.net/linux/ipv6/rtnl_test.c | | I know you don't want to use libnetlink from iproute2, but I want to | stress that it takes care of all of the minutiae of netlink socket | usage that you have to duplicate in your little test program and this | duplication leads to bugs. | | Firstly, it needs to be fixed to call recvmsg() multiple times, | you'll get one entry for each recvmsg call in the table you are | querying. Yes, I noticed that I was getting only 1 msg there. | You really need something like rtnl_talk() or rtnl_dump_filter() | from libnetlink to do this properly. Does anyone have documentation (or semantics) for rtnl_talk()? or just some blurb about it? Andi's libnetlink man page missed it somehow. Thanks, -- ~Randy From pb@bieringer.de Mon Jun 2 14:52:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:52:39 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52Lq02x002042 for ; Mon, 2 Jun 2003 14:52:21 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 098561387A; Mon, 2 Jun 2003 23:12:13 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 302D21387D; Mon, 2 Jun 2003 23:12:12 +0200 (CEST) X-AV-Checked: Mon Jun 2 23:12:12 2003 smtp2.aerasec.de Received: from [192.168.1.2] (p50805317.dip.t-dialin.net [80.128.83.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id D80391387A; Mon, 2 Jun 2003 23:12:10 +0200 (CEST) Date: Mon, 02 Jun 2003 23:12:08 +0200 From: Peter Bieringer To: Maillist netdev Cc: Maillist USAGI-users Subject: Is there already a doc available for the new IPsec code? 
Message-ID: <36990000.1054588328@worker.muc.bieringer.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2812 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi, i want to play a little bit with the new IPsec code. Is there already a doc available how to use it (config file of IKE daemon, etc., e.g. compared against the FreeS/WAN code). Thank you very much for input, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From acme@conectiva.com.br Mon Jun 2 14:57:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:57:57 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52LvV2x002424 for ; Mon, 2 Jun 2003 14:57:52 -0700 Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YC5G-0004y9-00; Thu, 03 Jul 2003 18:57:43 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 8850C1966C; Mon, 2 Jun 2003 21:58:16 +0000 (UTC) Date: Mon, 2 Jun 2003 18:58:15 -0300 From: Arnaldo Carvalho de Melo To: Peter Bieringer Cc: Maillist netdev , Maillist USAGI-users Subject: Re: Is there already a doc available for the new IPsec code? Message-ID: <20030602215815.GL9312@conectiva.com.br> References: <36990000.1054588328@worker.muc.bieringer.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <36990000.1054588328@worker.muc.bieringer.de> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. 
User-Agent: Mutt/1.5.4i X-archive-position: 2813 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev Em Mon, Jun 02, 2003 at 11:12:08PM +0200, Peter Bieringer escreveu: > Hi, > > i want to play a little bit with the new IPsec code. > > Is there already a doc available how to use it (config file of IKE daemon, > etc., e.g. compared against the FreeS/WAN code). > > Thank you very much for input, Look at Bert Hubert's LART - Arnaldo From davem@redhat.com Mon Jun 2 14:58:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 14:58:37 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52LwC2x002530 for ; Mon, 2 Jun 2003 14:58:33 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA24872; Mon, 2 Jun 2003 14:56:20 -0700 Date: Mon, 02 Jun 2003 14:56:19 -0700 (PDT) Message-Id: <20030602.145619.71112623.davem@redhat.com> To: rddunlap@osdl.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <20030602140452.039248de.rddunlap@osdl.org> References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2814 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 14:04:52 -0700 Does anyone have documentation (or semantics) for rtnl_talk()? 
or just some blurb about it? I always have to wonder about someone who can't live with just working code to study, and absolutely requires some document describing it. What is better or more accurate description than code itself!?!?! From rddunlap@osdl.org Mon Jun 2 15:00:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 15:00:25 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h52M012x003050 for ; Mon, 2 Jun 2003 15:00:21 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h52LxfX22069; Mon, 2 Jun 2003 14:59:41 -0700 Date: Mon, 2 Jun 2003 14:59:17 -0700 From: "Randy.Dunlap" To: Arnaldo Carvalho de Melo Cc: pb@bieringer.de, netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: Is there already a doc available for the new IPsec code? Message-Id: <20030602145917.33fbd05d.rddunlap@osdl.org> In-Reply-To: <20030602215815.GL9312@conectiva.com.br> References: <36990000.1054588328@worker.muc.bieringer.de> <20030602215815.GL9312@conectiva.com.br> Organization: OSDL X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2815 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev On Mon, 2 Jun 2003 18:58:15 -0300 Arnaldo Carvalho de Melo wrote: | Em Mon, Jun 02, 2003 at 11:12:08PM +0200, Peter Bieringer escreveu: | > Hi, | > | > i want to play a little bit with the new IPsec code. | > | > Is there already a doc available how to use it (config file of IKE daemon, | > etc., e.g. compared against the FreeS/WAN code). | > | > Thank you very much for input, | | Look at Bert Hubert's LART | | - Arnaldo that's www.lartc.org ... 
-- ~Randy From david-b@pacbell.net Mon Jun 2 18:56:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 18:56:11 -0700 (PDT) Received: from mta7.pltn13.pbi.net (mta7.pltn13.pbi.net [64.164.98.8]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h531u62x010771 for ; Mon, 2 Jun 2003 18:56:06 -0700 Received: from pacbell.net (ppp-67-118-246-97.dialup.pltn13.pacbell.net [67.118.246.97]) by mta7.pltn13.pbi.net (8.12.9/8.12.3) with ESMTP id h531trEQ002137; Mon, 2 Jun 2003 18:55:54 -0700 (PDT) Message-ID: <3EDC0047.7030007@pacbell.net> Date: Mon, 02 Jun 2003 18:56:23 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "David S. Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2816 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev David S. Miller wrote: > From: "Randy.Dunlap" > Date: Mon, 2 Jun 2003 14:04:52 -0700 > > Does anyone have documentation (or semantics) for rtnl_talk()? > or just some blurb about it? > > I always have to wonder about someone who can't live with just > working code to study, and absolutely requires some document > describing it. > > What is better or more accurate description than code itself!?!?! Well, the difference between code and its spec is generally a bug that needs to be fixed ... which can be in the code as well as in the spec. And for reasonable design specs, it's more likely in the code. 
But if there's only the code, it gets a lot more troublesome when things don't behave "as expected". People who are in a position to change the code to meet their expectations may not care, but that's rarely a significant chunk of the user community. And in particular, writing tests against the code is generally the wrong way to go. They need to be written against some kind of spec. - Dave From davem@redhat.com Mon Jun 2 19:04:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:04:42 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5324a2x011315 for ; Mon, 2 Jun 2003 19:04:36 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA25475; Mon, 2 Jun 2003 19:02:41 -0700 Date: Mon, 02 Jun 2003 19:02:40 -0700 (PDT) Message-Id: <20030602.190240.74724523.davem@redhat.com> To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC0047.7030007@pacbell.net> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0047.7030007@pacbell.net> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2817 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: David Brownell Date: Mon, 02 Jun 2003 18:56:23 -0700 Well, the difference between code and its spec is generally a bug that needs to be fixed ... See, a document is NOT the spec, the code is the spec. Because where the document is wrong, the code determines the final answer. This is true in all cases. 
I cannot tell you how much time I've seen people waste because they went for documents first, only to find them to be inaccurate for some corner case whilst the code has all of the accurate answers. When I see someone want docs, I interpret this as "I don't want to have to think or have to comprehend something, I'm too lazy to read the code." Well, such laziness leads the person in question only to be susceptible to all of the inaccuracies and disconnect that always will exist between said docs (if they even exist) and the code. It is also the mechanism that leads people to send patches that add arbitrary crap all over the ipv4/ipv6 code, totally missing the point that the routing and/or netlink layer did 99% of what they wanted already. For example, I added a hoplimit route attribute to RTNETLINK. Who documented this? What document can you read that would teach you about this feature? None. And don't tell me this is a doc bug; every time I make a change the documentation will be instantly buggy and I'm not going to be required to document every diff I make to the tree. From jsd@monmouth.com Mon Jun 2 19:34:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:34:31 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h532Y42x012360 for ; Mon, 2 Jun 2003 19:34:25 -0700 Received: (qmail 4037 invoked from network); 3 Jun 2003 02:33:59 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 02:33:59 -0000 Message-ID: <3EDC0915.1080109@monmouth.com> Date: Mon, 02 Jun 2003 22:33:57 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, David Brownell Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> In-Reply-To: <20030602.145619.71112623.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2818 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev On 06/02/2003 05:56 PM, David S. Miller wrote: >> >> I always have to wonder about someone who can't live with just >> working code to study, and absolutely requires some document >> describing it. >> >> What is better or more accurate description than code itself!?!?! I am very grateful for the kindness of those who have written the code in question and made it available to help others. I wish to repay that with help and kindness, not the opposite. Today I can help by speaking the truth, and the truth is that documentation is sorely needed. There's no point in writing code if few people use it. Linux is hanging in there at a few percent market share. That's not going to grow unless there is better documentation. On 06/02/2003 09:56 PM, David Brownell replied: > > Well, the difference between code and its spec is > generally a bug that needs to be fixed ... which > can be in the code as well as in the spec. And for > reasonable design specs, it's more likely in the code. > > But if there's only the code, it gets a lot more > troublesome when things don't behave "as expected". > > People who are in a position to change the code > to meet their expectations may not care, but that's > rarely a significant chunk of the user community. 
>
> And in particular, writing tests against the code
> is generally the wrong way to go. They need to be
> written against some kind of spec.

I have to agree with Mr. Brownell on this one.

>> What is better or more accurate description than code itself!?!?!

There are two ideas mixed up there.
 -- Better documentation.
 -- Accurate description.

1) Yes, code is the most accurate description of the code. But it is not to be confused with good documentation.

2) Documentation should be clear and concise. Code must attend to all the details.

3) Sometimes efficiency requires that the code be tricky. Documentation must not be tricky.

4) In theory, very well-commented code might approximate its own documentation. But I haven't seen any such code lately. Here are _all_ the comments from xfrm_input.c, a 454-line file:

    /* Fetch spi and seq frpm ipsec header */
    /* Allocate new secpath or COW existing one. */
    /* Fetch spi and seq frpm ipsec header */
    iph = skb->nh.ipv6h; /* ??? */
    if (x->props.mode) { /* XXX */
    /* Allocate new secpath or COW existing one. */
    #endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */

If you were teaching a programming course, how would you grade an assignment turned in with that level of commenting?

5) In the engine of a piston-driven airplane, there are two spark plugs in every cylinder. Obviously that costs twice as much and weighs twice as much as having only one. But the redundancy makes the system thousands of times more reliable. Similarly, writing code _and_ documentation is about twice as expensive as writing the code alone. But the redundancy makes it possible to achieve much greater reliability. Also maintainability and extensibility.

6) Adding extra vehemence '!?!?!' does not add clarity to the discussion.

*) et cetera.
From davem@redhat.com Mon Jun 2 19:40:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 19:40:56 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h532en2x012713 for ; Mon, 2 Jun 2003 19:40:50 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA25623; Mon, 2 Jun 2003 19:38:53 -0700 Date: Mon, 02 Jun 2003 19:38:53 -0700 (PDT) Message-Id: <20030602.193853.112598236.davem@redhat.com> To: jsd@monmouth.com Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, david-b@pacbell.net Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC0915.1080109@monmouth.com> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2819 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 22:33:57 -0400 There's no point in writing code if few people use it. People use this "undocumented" area of the kernel every time their machine boots up. 
From jsd@monmouth.com Mon Jun 2 20:21:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:21:10 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533L12x013419 for ; Mon, 2 Jun 2003 20:21:02 -0700 Received: (qmail 4663 invoked from network); 3 Jun 2003 03:20:56 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 03:20:56 -0000 Message-ID: <3EDC1418.6080808@monmouth.com> Date: Mon, 02 Jun 2003 23:20:56 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: netlink tester program References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> In-Reply-To: <20030602.193853.112598236.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2820 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev I wrote in part: > > There's no point in writing code if few people > use it. On 06/02/2003 10:38 PM, David S. Miller wrote: > > People use this "undocumented" area of the kernel every > time their machine boots up. That's not inconsistent with what I was saying. Mr. Miller said people use it. That's true. Some people use it. I said few people use it. That's true. The context of my original statement was: > Linux is hanging in there at a few > percent market share. That's not going to grow > unless there is better documentation. This is supposed to be open-source software, n'est-ce pas? 
Software that is copylefted but not documented is open according to the letter of the law, but lacks the spirit of openness. Mr. Miller is very smart and has spent years getting up to speed in this area. Is the code to be open only to those who are equally smart and willing to invest equally huge amounts of time? When people ask for help in understanding the code, it might mean they need help in understanding the code. From davem@redhat.com Mon Jun 2 20:24:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:24:31 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533OR2x013749 for ; Mon, 2 Jun 2003 20:24:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25722; Mon, 2 Jun 2003 20:22:33 -0700 Date: Mon, 02 Jun 2003 20:22:33 -0700 (PDT) Message-Id: <20030602.202233.39180859.davem@redhat.com> To: jsd@monmouth.com Cc: netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC1418.6080808@monmouth.com> References: <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> <3EDC1418.6080808@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2821 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 23:20:56 -0400 Mr. Miller is very smart and has spent years getting up to speed in this area. Is the code to be open only to those who are equally smart and willing to invest equally huge amounts of time? Are legal rights only available to people who understand the law and have a legal degree? 
No, this is why we hire lawyers if we choose not to study law ourselves. Your logic is heavily flawed. From david-b@pacbell.net Mon Jun 2 20:32:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:32:48 -0700 (PDT) Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533WO2x014086 for ; Mon, 2 Jun 2003 20:32:44 -0700 Received: from pacbell.net (ppp-67-118-247-59.dialup.pltn13.pacbell.net [67.118.247.59]) by mta4.rcsntx.swbell.net (8.12.9/8.12.3) with ESMTP id h533WDhi012216; Mon, 2 Jun 2003 22:32:18 -0500 (CDT) Message-ID: <3EDC173B.80909@pacbell.net> Date: Mon, 02 Jun 2003 20:34:19 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "David S. Miller" CC: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2823 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev > Well, the difference between code and its spec is > generally a bug that needs to be fixed ... > > See, a document is NOT the spec, the code is the spec. That's hardly the only development model. > Because where the document is wrong, the code determines > the final answer. This is true in all cases. Not "all". "Code-as-spec" works well when there's only one code base, but otherwise it's flawed. Even the model of a "reference implementation" is trouble ... since it invariably evolves into "everyone should use this code". 
Of course, bugs that stay unfixed for a long time can force the "spec" to change. It's a great vendor lock-in tool, and it can happen accidentally too. But most folk view such interop problems as bugs, not features.

> I cannot tell you how much time I've seen people waste because they
> went for documents first, only to find them to be inaccurate for some
> corner case whilst the code has all of the accurate answers.

Or where they notice the code is wrong in that corner case, and they can prove that easily since the spec (implemented correctly in several other places) and the code disagree.

Or where this implementation uses this answer, and that one uses that answer ... and the poor user gets caught in the middle of a finger pointing war, which can't be resolved since each implementation's developers claim to be "the spec", and the user eventually gives up saying "a pox on you all!"

You clipped out the text where I pointed out that bugs can be in specs as well as code. They can be fixed there, too.

> When I see someone want docs, I interpret this as "I don't want to
> have to think or have to comprehend something, I'm too lazy to read
> the code." Well, such laziness leads the person in question only
> to be susceptible to all of the inaccuracies and disconnect that
> always will exist between said docs (if they even exist) and the
> code.

That's an *extremely negative interpretation*, and while I've seen people that are that lazy, they happen to be in the minority of people I've known to ask for docs/specs. (Thank the Gods!)

Consider one thing that docs/specs do that code can't: give the "30,000 foot view" rather than the "tree level view". It's not "lazy" to avoid using the tree-level view; sometimes such low-level perspectives can be counterproductive.

People ask for docs for lots of reasons, and most of them have nothing at all to do with laziness.
- Dave From rddunlap@osdl.org Mon Jun 2 20:32:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:32:34 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533WS2x014085 for ; Mon, 2 Jun 2003 20:32:29 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533WMX27576; Mon, 2 Jun 2003 20:32:22 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533WM5C029705; Mon, 2 Jun 2003 20:32:22 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:32:22 -0700 (PDT) Message-ID: <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:32:22 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.145619.71112623.davem@redhat.com> References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , , X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2822 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > From: "Randy.Dunlap" > Date: Mon, 2 Jun 2003 14:04:52 -0700 > > Does anyone have documentation (or semantics) for rtnl_talk()? > or just some blurb about it? > > I always have to wonder about someone who can't live with just > working code to study, and absolutely requires some document > describing it. > > What is better or more accurate description than code itself!?!?! The code is absolute, no doubt about it. It is the authority. 
That doesn't make it right in all cases AFAIK. And it lacks documentation, even in the source files. There are no semantics or meaning associated with that code except by the people who developed it. I'm not one of them, so I'm trying to ask them or others who know. And yes, I'm looking for the way that it should be done (IMHO) instead of the way it is done.

Now, given that I think that the netlink interface is poorly documented, and that I'm trying to add some kernel code that uses it, and that I'm trying to test said kernel code with a userspace test program, I also plan to add such documentation that I think is warranted to make it easy to use, even by non-kernel developers. This documentation might end up living outside of the kernel tree -- that's OK. But in any case, from both private and mailing list emails, I'm not alone in thinking that it's needed.

~Randy

From david-b@pacbell.net Mon Jun 2 20:35:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:35:10 -0700 (PDT) Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533Z62x014700 for ; Mon, 2 Jun 2003 20:35:06 -0700 Received: from pacbell.net (ppp-67-118-247-59.dialup.pltn13.pacbell.net [67.118.247.59]) by mta4.rcsntx.swbell.net (8.12.9/8.12.3) with ESMTP id h533Z1hi017198; Mon, 2 Jun 2003 22:35:01 -0500 (CDT) Message-ID: <3EDC17E4.6070506@pacbell.net> Date: Mon, 02 Jun 2003 20:37:08 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: "John S. Denker" CC: "David S. 
Miller" , rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program References: <32804.4.64.196.31.1054351332.squirrel@www.osdl.org> <20030530.234211.102567405.davem@redhat.com> <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <3EDC0915.1080109@monmouth.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2824 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: david-b@pacbell.net Precedence: bulk X-list: netdev

John S. Denker wrote:

> Similarly, writing code _and_ documentation is about
> twice as expensive as writing the code alone. But
> the redundancy makes it possible to achieve much
> greater reliability. Also maintainability and
> extensibility.

Excellent points. The development process needs to address communities other than the folk writing the code ... like the people who inherit that code, and the ones trying to use it. (Testing being a rather specialized type of "use".)

- Dave

From davem@redhat.com Mon Jun 2 20:37:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:37:07 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533b12x015003 for ; Mon, 2 Jun 2003 20:37:01 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25846; Mon, 2 Jun 2003 20:35:05 -0700 Date: Mon, 02 Jun 2003 20:35:05 -0700 (PDT) Message-Id: <20030602.203505.59678701.davem@redhat.com> To: rddunlap@osdl.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. 
Miller" In-Reply-To: <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> References: <20030602140452.039248de.rddunlap@osdl.org> <20030602.145619.71112623.davem@redhat.com> <33001.4.64.196.31.1054611142.squirrel@www.osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2825 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: "Randy.Dunlap"
   Date: Mon, 2 Jun 2003 20:32:22 -0700 (PDT)

   The code is absolute, no doubt about it. It is the authority.
   That doesn't make it right in all cases AFAIK.

I totally agree.

   Now, given that I think that the netlink interface is poorly
   documented, and that I'm trying to add some kernel code that uses it,
   and that I'm trying to test said kernel code with a userspace test
   program, I also plan to add such documentation that I think is
   warranted to make it easy to use, even by non-kernel developers.

This is exactly how things should work. Where there is a need for X _AND_ someone willing to create X, it will be created.

No arguments from me on this :-)

From jsd@monmouth.com Mon Jun 2 20:41:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:41:48 -0700 (PDT) Received: from av8n.net (pcp03191463pcs.midltn01.nj.comcast.net [68.37.175.11]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533fi2x015402 for ; Mon, 2 Jun 2003 20:41:44 -0700 Received: (qmail 4890 invoked from network); 3 Jun 2003 03:41:39 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 3 Jun 2003 03:41:39 -0000 Message-ID: <3EDC18F2.6090505@monmouth.com> Date: Mon, 02 Jun 2003 23:41:38 -0400 From: "John S. 
Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: netlink tester program References: <3EDC0915.1080109@monmouth.com> <20030602.193853.112598236.davem@redhat.com> <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> In-Reply-To: <20030602.202233.39180859.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2826 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev

On 06/02/2003 11:22 PM, David S. Miller wrote:
>
> Are legal rights only available to people who understand
> the law and have a legal degree?
>
> No, this is why we hire lawyers if we choose not to study
> law ourselves.

If we are taking the legal system as our model of openness, then open-source software has come to a sorry pass indeed.

=========

It is also important to distinguish what's best for *you* and what's best for the project. Maybe *you* don't want to be responsible for doing all the documentation. I can understand that. But the project as a whole would be better off if it had better documentation. Perhaps you could recruit other folks to help with this. But disdaining the whole concept isn't a good way to start the recruiting.
From davem@redhat.com Mon Jun 2 20:44:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:44:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533iH2x015769 for ; Mon, 2 Jun 2003 20:44:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25861; Mon, 2 Jun 2003 20:38:35 -0700 Date: Mon, 02 Jun 2003 20:38:34 -0700 (PDT) Message-Id: <20030602.203834.115933659.davem@redhat.com> To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC173B.80909@pacbell.net> References: <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> <3EDC173B.80909@pacbell.net> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2827 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: David Brownell Date: Mon, 02 Jun 2003 20:34:19 -0700 > See, a document is NOT the spec, the code is the spec. That's hardly the only development model. It's the one that works for _me_ and Alexey and myself, and we're the ones doing all the work. When someone doing the work desires the docs and desires to WRITE it, it will appear. You can expect exactly nothing more in our development model. If you require me to write the docs, you misunderstand how the system works :) You clipped out the text where I pointed out that bugs can be in specs as well as code. They can be fixed there, too. Very true. 
So when Randy writes the more detailed netlink/rtnetlink docs, we'll be happy :-) There is even an official IETF RFC written by Jamal, Alexey, and others documenting netlink btw :-)))))))))))) Did anybody notice this? From davem@redhat.com Mon Jun 2 20:48:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:48:44 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533me2x016214 for ; Mon, 2 Jun 2003 20:48:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25892; Mon, 2 Jun 2003 20:46:45 -0700 Date: Mon, 02 Jun 2003 20:46:45 -0700 (PDT) Message-Id: <20030602.204645.48505284.davem@redhat.com> To: jsd@monmouth.com Cc: netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <3EDC18F2.6090505@monmouth.com> References: <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> <3EDC18F2.6090505@monmouth.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2828 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "John S. Denker" Date: Mon, 02 Jun 2003 23:41:38 -0400 If we are taking the legal system as our model of openness, then open-source software has come to a sorry pass indeed. It does have connections where a "user" wants to do something with FOO but does not wish to do the legwork necessary to be an expert in FOO. They hire an expert. Or, in our case, they make an expert interested in the thing they want to do :-))) It is also important to distinguish what's best for *you* and what's best for the project. 
Maybe *you* don't want to be responsible for doing all the documentation. I'm not even going to attempt to document something that moves as fast as the kernel. I go to bookstores and I see many excellent attempts to document kernel internals, but these books are frozen in time. Specifically they are frozen in the time of the moment the kernel they write for is published. As a consequence they are all obsolete the moment they are published. Some poor student reads these books, written against 2.4.8 or whatever, then they go and try to contribute to 2.5.x and it doesn't work except for certain kinds of drivers where we've kept the APIs more or less the same. But I don't care that people do this, just don't require that I do it. I think this extra fluidity we get from being able to change so fast is a strength not a weakness. From rddunlap@osdl.org Mon Jun 2 20:49:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:49:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533nd2x016498 for ; Mon, 2 Jun 2003 20:49:39 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533nWX32403; Mon, 2 Jun 2003 20:49:32 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533nW5C030705; Mon, 2 Jun 2003 20:49:32 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:49:32 -0700 (PDT) Message-ID: <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:49:32 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.203834.115933659.davem@redhat.com> References: <3EDC0047.7030007@pacbell.net> <20030602.190240.74724523.davem@redhat.com> <3EDC173B.80909@pacbell.net> <20030602.203834.115933659.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , , , 
X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2829 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > From: David Brownell > Date: Mon, 02 Jun 2003 20:34:19 -0700 > > > See, a document is NOT the spec, the code is the spec. > > That's hardly the only development model. > > It's the one that works for _me_ and Alexey and myself, and we're the ones > doing all the work. Do you want it to remain that way? > When someone doing the work desires the docs and desires to > WRITE it, it will appear. > > You can expect exactly nothing more in our development model. > If you require me to write the docs, you misunderstand how the > system works :) > > You clipped out the text where I pointed out that bugs can > be in specs as well as code. They can be fixed there, too. > > Very true. So when Randy writes the more detailed netlink/rtnetlink docs, > we'll be happy :-) > > There is even an official IETF RFC written by Jamal, Alexey, and > others documenting netlink btw :-)))))))))))) > > Did anybody notice this? Yes. 
~Randy From rddunlap@osdl.org Mon Jun 2 20:54:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:54:11 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533s62x016967 for ; Mon, 2 Jun 2003 20:54:07 -0700 Received: from fire-1.osdl.org (air1.pdx.osdl.net [172.20.0.5]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h533s1X00402; Mon, 2 Jun 2003 20:54:01 -0700 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-1.osdl.org (8.12.8/8.11.6) with SMTP id h533s15C030774; Mon, 2 Jun 2003 20:54:01 -0700 Received: from 4.64.196.31 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Mon, 2 Jun 2003 20:54:01 -0700 (PDT) Message-ID: <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> Date: Mon, 2 Jun 2003 20:54:01 -0700 (PDT) Subject: Re: netlink tester program From: "Randy.Dunlap" To: In-Reply-To: <20030602.204645.48505284.davem@redhat.com> References: <3EDC1418.6080808@monmouth.com> <20030602.202233.39180859.davem@redhat.com> <3EDC18F2.6090505@monmouth.com> <20030602.204645.48505284.davem@redhat.com> X-Priority: 3 Importance: Normal Cc: , X-Mailer: SquirrelMail (version 1.2.11) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2831 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > It is also important to distinguish what's best > for *you* and what's best for the project. > Maybe *you* don't want to be responsible for > doing all the documentation. > > I'm not even going to attempt to document something that > moves as fast as the kernel. That point is a real problem... > I go to bookstores and I see many excellent attempts to document > kernel internals, but these books are frozen in time. 
Specifically they are > frozen in the time of the moment the kernel they write for is published. As > a consequence they are all obsolete the moment they are published. No doubt. > Some poor student reads these books, written against 2.4.8 or > whatever, then they go and try to contribute to 2.5.x and it > doesn't work except for certain kinds of drivers where we've > kept the APIs more or less the same. > > But I don't care that people do this, just don't require that I do it. Sure. Are you willing to answer questions about it at least? > I think this extra fluidity we get from being able to change so fast is a > strength not a weakness. If it were only a strength, that would be great. I believe that it's both, however. ~Randy From davem@redhat.com Mon Jun 2 20:53:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:53:59 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533rt2x016943 for ; Mon, 2 Jun 2003 20:53:56 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25928; Mon, 2 Jun 2003 20:51:56 -0700 Date: Mon, 02 Jun 2003 20:51:56 -0700 (PDT) Message-Id: <20030602.205156.08346169.davem@redhat.com> To: rddunlap@osdl.org Cc: david-b@pacbell.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> References: <3EDC173B.80909@pacbell.net> <20030602.203834.115933659.davem@redhat.com> <33060.4.64.196.31.1054612172.squirrel@www.osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2830 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: "Randy.Dunlap"
   Date: Mon, 2 Jun 2003 20:49:32 -0700 (PDT)

   > It's the one that works for _me_ and Alexey and myself, and we're
   > the ones doing all the work.

   Do you want it to remain that way?

Doesn't matter to _us_, _we_ know how these things work and how to use them. If we don't, we'll read the code to learn this.

Others care, and if someone writes the docs for _them_, that is _fine_. What I object to is "hey we have to have docs, why didn't dave and alexey write them". :)

Franks a lot,
David S. Miller
davem@redhat.com

From davem@redhat.com Mon Jun 2 20:56:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 20:56:24 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h533uL2x017606 for ; Mon, 2 Jun 2003 20:56:21 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA25955; Mon, 2 Jun 2003 20:54:26 -0700 Date: Mon, 02 Jun 2003 20:54:25 -0700 (PDT) Message-Id: <20030602.205425.21904841.davem@redhat.com> To: rddunlap@osdl.org Cc: jsd@monmouth.com, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> References: <3EDC18F2.6090505@monmouth.com> <20030602.204645.48505284.davem@redhat.com> <33078.4.64.196.31.1054612441.squirrel@www.osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2832 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Randy.Dunlap" Date: Mon, 2 Jun 2003 20:54:01 -0700 (PDT) > But I don't care that people do this, just don't require that I do it. Sure. Are you willing to answer questions about it at least? As long as others like Alexey and Jamal help field these questions and it's not just me sitting here becoming a Linux development support service :-) From pekkas@netcore.fi Mon Jun 2 21:48:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 21:48:56 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h534mf2x019134 for ; Mon, 2 Jun 2003 21:48:42 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h534mDd09020; Tue, 3 Jun 2003 07:48:13 +0300 Date: Tue, 3 Jun 2003 07:48:13 +0300 (EEST) From: Pekka Savola To: David Stevens cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , Subject: Re: [PATCH] Prefix List patch against 2.5.70 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 2833 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Mon, 2 Jun 2003, David Stevens wrote: > The users of prefix list data don't know the prefix or the prefix > length; they know the interface index, and need to get the prefixes. > > The data is fundamentally per-interface, and the routing table is > per-destination. 
So, adding the prefixes to the routing table doesn't seem > like the best choice because everything that currently uses the routing > table will have to skip over these extra entries (which they'll never be > interested in) Umm.. every prefix should have an interface route, so they're a required subset of the routing table, correct? > and the users of the prefix data will have to skip over all > existing routing table entries (which they're never interested in). ... > Routes and prefixes are independent of each other, so throwing them in the > same table to me seems like it only creates work to skip entries that > aren't related, and because the users of the prefix data don't have the key > needed for a fast look-up in the routing table, prefix users in particular > have to skip through everything currently in the routing table, linearly, > with no benefit at all for being there. > > I also see no relation between prefix list data and the FIB; current users > are completely independent from prefix list users, and it appears to only > slow both of them down. The prefix data is always looked-up by interface > index, so I think it really belongs in the inet6 per-interface structure, > unless I'm missing something. What benefits are there for lumping this with > existing data structures that aren't per-interface, or keyed per-interface? > > +-DLS > > > YOSHIFUJI Hideaki / 吉藤英明 @vger.kernel.org on > 05/30/2003 07:02:49 PM > > Sent by: linux-net-owner@vger.kernel.org > > > To: krkumar@us.ltcfwd.linux.ibm.com > cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, > netdev@oss.sgi.com, linux-net@vger.kernel.org > Subject: Re: [PATCH] Prefix List patch against 2.5.70 > > > > In article <3ED80230.2030508@us.ibm.com> (at Fri, 30 May 2003 18:15:28 > -0700), Krishna Kumar says: > > > +/* prefix list returned to user space in this structure */ > > +struct plist_user_info { > ^ip6 or ipv6 or so. 
> > +       char name[IFNAMSIZ];            /* interface name */
>           ~~~~~~~~~~~~~~~~~~~duplicate information.
> > +       int ifindex;                    /* interface index */
> > +       int nprefixes;                  /* number of elements in 'prefix' */
> > +       struct var_plist_user_info {    /* multiple elements */
> > +               char flags[3];          /* router advertised flags */
>                   ~~~~~~~~this is not a good interface.
> > +               int plen;               /* prefix length */
> > +               __u32 valid;            /* valid lifetime */
> > +               struct in6_addr ra_addr;/* advertising router */
> > +               struct in6_addr prefix; /* prefix */
> > +       } plist_vars[0];
> > +};
> > +
> > extern void addrconf_init(void);
> > extern void addrconf_cleanup(void);
> >
> > :
>
> I think we should use 1 fixed-length message per prefix,
> instead of variable length message.
>
> > +               pinfo->plist_vars[count].plen = p_el->pinfo.prefix_len;
> > +               pinfo->plist_vars[count].valid = p_el->pinfo.valid -
> > +                       (jiffies - p_el->timestamp)/HZ;
> > +               if ((p_el->ra_flags & (ND_RA_FLAG_MANAGED |
> > +                               ND_RA_FLAG_OTHER))
> > +                       == (ND_RA_FLAG_MANAGED|ND_RA_FLAG_OTHER))
> > +                       strcpy(pinfo->plist_vars[count].flags, "MO");
> > +               else if (p_el->ra_flags & ND_RA_FLAG_MANAGED)
> > +                       strcpy(pinfo->plist_vars[count].flags, "M");
> > +               else if (p_el->ra_flags & ND_RA_FLAG_OTHER)
> > +                       strcpy(pinfo->plist_vars[count].flags, "O");
> > +               else
> > +                       strcpy(pinfo->plist_vars[count].flags, "-");
> > +               ipv6_addr_copy(&pinfo->plist_vars[count].ra_addr,
> > +                       &p_el->ra_addr);
> > +               for (i = 0; i < 8; i++)
> > +                       pinfo->plist_vars[count].ra_addr.s6_addr16[i] =
> > +                               __constant_ntohs(pinfo->plist_vars[count].ra_addr.s6_addr16[i]);
> > +               ipv6_addr_copy(&pinfo->plist_vars[count].prefix,
> > +                       &p_el->pinfo.prefix);
> > +               for (i = 0; i < p_el->pinfo.prefix_len/16; i++)
> > +                       pinfo->plist_vars[count].prefix.s6_addr16[i] =
> > +                               __constant_ntohs(pinfo->plist_vars[count].prefix.s6_addr16[i]);
>
> Absolutely nasty.
> - don't use characters to represent flags; use real flags.
> - use network-byte order.
>
> > +static int prefix_list_proc_dump(char *buffer, char **start, off_t offset,
> > +                               int length)
> > +{
> :
>
> Please use seq_file.
>
>
> Again, what I proposed was to store prefix information on fib with
> some flags to represent advertised by routers and give user-space
> the RA information using new rtattr (RTA_RA6INFO or something like that).
>
> struct rta_ra6info {
>         u32 rta_ra6flags;
> };
>
> --
> Hideaki YOSHIFUJI @ USAGI Project
> GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

From davem@redhat.com Mon Jun 2 21:52:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 21:52:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h534qE2x019507 for ; Mon, 2 Jun 2003 21:52:15 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA26095; Mon, 2 Jun 2003 21:49:41 -0700 Date: Mon, 02 Jun 2003 21:49:41 -0700 (PDT) Message-Id: <20030602.214941.102551312.davem@redhat.com> To: pekkas@netcore.fi Cc: dlstevens@us.ibm.com, yoshfuji@linux-ipv6.org, krkumar@us.ibm.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Prefix List patch against 2.5.70 From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2834 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Pekka Savola Date: Tue, 3 Jun 2003 07:48:13 +0300 (EEST) Umm.. every prefix should have an interface route, so they're a required subset of the routing table, correct? That's entirely correct, thanks for noticing this :-) This is why I said that they could add to a global list all routes that meet this criteria. Thus making any querying mechanism simple to implement. From vnuorval@tcs.hut.fi Mon Jun 2 23:41:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 23:42:09 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h536fv2x023287 for ; Mon, 2 Jun 2003 23:41:58 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Tue, 3 Jun 2003 09:35:54 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h536ZqjH019079; Tue, 3 Jun 2003 09:35:52 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h536ZjF1019075; Tue, 3 Jun 2003 09:35:46 +0300 Date: Tue, 3 Jun 2003 09:35:45 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030531.000319.114704530.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2836 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Sat, 31 May 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote: > Let us test the patch. It seemed buggy when USAGI tested before. Any feedback would have been (and still is) of course welcome. The bugs are much easier to locate and fix if people report about them :-) -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From dlstevens@us.ibm.com Mon Jun 2 23:43:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 02 Jun 2003 23:43:29 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h536hP2x023608 for ; Mon, 2 Jun 2003 23:43:25 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h536gXuT188876; Tue, 3 Jun 2003 02:42:33 -0400 Received: from d03nm121.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h536gWba166304; Tue, 3 Jun 2003 00:42:33 -0600 Importance: Normal Sensitivity: Subject: Re: [PATCH] Prefix List patch against 2.5.70 To: Pekka Savola Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , krkumar@us.ibm.com, , , , X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: David Stevens Date: Tue, 3 Jun 2003 00:42:25 -0600 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.0.1 [IBM]|April 28, 2003) at 06/03/2003 00:42:32 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-archive-position: 2837 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev >Umm.. every prefix should have an interface route, so they're a required >subset of the routing table, correct? 
I'm not sure that it has to be, except if you make that the only way to access prefix list information. :-) Administrators and routing daemons are free to mess with the routing table in creative ways (aggregating, creating static routes to enforce some policy, whatever), but when the routing table holds more than just routing information, either those routes can't be messed with, or the prefix information is lost. And that's more relevant when not using address autoconfiguration. The prefix list information that's relevant is the prefix, the prefix length and the M and O bits, as they were in the router advertisement. For routing purposes, it wouldn't be a problem to aggregate interface routes that cover a contiguous portion of the address space, but doing that would lose the prefix information if the routing table is your only source. So, routing daemons would have to check a funky flag and leave prefix-list-relevant routes alone. M and O bits are per-interface; they have no relevance at all in the routing table, but they'd all have to be updated if they changed. There is an example already where routes are installed for per-interface information: local addresses. There are host routes corresponding to local addresses in the routing table now, but there is also a list of local addresses associated with the interface. Is that a bad idea? Certainly, it's possible to flag all of the host routes that are for local addresses (really, just check for interface loopback) and search the entire routing table when trying to answer the question "what addresses are on this interface," but it's much better to have that address list associated directly with the interface (especially for source selection). The consumers of prefix list (DHCPv6 and mobile IPv6) need the entire prefix list, length and M&O bits for a given interface. The prefixes (the key) aren't known for the search, and no other interfaces or destination routes are ever interesting for those consumers. 
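As a rough illustration of the per-interface layout argued for above, here is a minimal userspace C sketch. It is not code from the patch under discussion or from the kernel, and every name in it (`demo_prefix`, `demo_iface`, `DEMO_RA_FLAG_*`, `demo_add_prefix`, `demo_get_prefixes`) is hypothetical. It assumes the M and O bits are kept once per interface as real flag bits, that prefix bytes stay in network byte order, and that consumers look the list up by interface index only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values standing in for the RA "M" and "O" flags. */
#define DEMO_RA_FLAG_MANAGED 0x1u
#define DEMO_RA_FLAG_OTHER   0x2u
#define DEMO_MAX_PREFIXES    8

/* One advertised prefix; the address bytes are kept in network byte order. */
struct demo_prefix {
	uint8_t  prefix[16];   /* IPv6 prefix */
	uint8_t  prefix_len;   /* prefix length in bits */
	uint32_t valid;        /* remaining valid lifetime, in seconds */
};

/* Per-interface state: the prefix list hangs directly off the interface. */
struct demo_iface {
	int      ifindex;
	uint32_t ra_flags;     /* per-interface M and O bits */
	size_t   nprefixes;
	struct demo_prefix prefixes[DEMO_MAX_PREFIXES];
};

/* Append one prefix to an interface; returns 0, or -1 if the list is full. */
int demo_add_prefix(struct demo_iface *ifp, const uint8_t prefix[16],
		    uint8_t prefix_len, uint32_t valid)
{
	struct demo_prefix *p;

	assert(ifp != NULL);
	if (ifp->nprefixes >= DEMO_MAX_PREFIXES)
		return -1;
	p = &ifp->prefixes[ifp->nprefixes++];
	memcpy(p->prefix, prefix, sizeof(p->prefix));
	p->prefix_len = prefix_len;
	p->valid = valid;
	return 0;
}

/*
 * Copy up to 'max' prefixes of the interface with the given index into
 * 'out' and report its M/O bits. The caller only needs the ifindex, never
 * a prefix. Returns the number of entries copied, or -1 if no interface
 * with that index exists.
 */
int demo_get_prefixes(const struct demo_iface *ifaces, size_t nifaces,
		      int ifindex, struct demo_prefix *out, size_t max,
		      uint32_t *ra_flags)
{
	size_t i, n;

	for (i = 0; i < nifaces; i++) {
		if (ifaces[i].ifindex != ifindex)
			continue;
		n = ifaces[i].nprefixes < max ? ifaces[i].nprefixes : max;
		memcpy(out, ifaces[i].prefixes, n * sizeof(*out));
		*ra_flags = ifaces[i].ra_flags;
		return (int)n;
	}
	return -1;
}
```

In this sketch the only per-lookup cost is finding the interface; nothing in it touches or depends on routing entries, which is the separation the argument above is after.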
The interface routes can be deleted, forced to something else, or modified now without losing any information, because they are only relevant for packet routing. If the prefix information is divined from the routing table, the interface routes suddenly contain more than routing information, and should then have special restrictions on them that other routes don't have (they should be immutable). I don't think that's a good idea, when you can hang the prefix list right off the interface and return the full list whenever you need it. The interface routes can be overridden or aggregated without messing at all with the prefix list information. That seems pretty simple to me. +-DLS From etsh_cucu@yahoo.com Tue Jun 3 00:57:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 00:57:46 -0700 (PDT) Received: from web14305.mail.yahoo.com (web14305.mail.yahoo.com [216.136.173.81]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h537vg2x025791 for ; Tue, 3 Jun 2003 00:57:42 -0700 Message-ID: <20030603075742.34434.qmail@web14305.mail.yahoo.com> Received: from [213.158.161.140] by web14305.mail.yahoo.com via HTTP; Tue, 03 Jun 2003 00:57:42 PDT Date: Tue, 3 Jun 2003 00:57:42 -0700 (PDT) From: Hisham Kotry Subject: Re: netlink tester program To: david-b@pacbell.net Cc: rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030602.203834.115933659.davem@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2838 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: etsh_cucu@yahoo.com Precedence: bulk X-list: netdev --- "David S. Miller" wrote: > There is even an official IETF RFC written by Jamal, > Alexey, and > others documenting netlink btw :-)))))))))))) > > Did anybody notice this? 
It was definitely a nice read, but the netlink2 draft is somewhat inconsistent: it mentions reducing the 32-bit length field to 16 bits and equally distributing the remaining 16 bits between the new version and extended flags fields, but the draft makes no further reference to the version field. In fact, the netlink2 message header diagram on page 16, as well as the pseudo message on page 28, show a 16-bit extended flags field with no version field in the header. So this is probably one of those cases in which the specs aren't clear enough, and code usually has the final word in such situations. I mailed Jamal about this a while ago but never got a reply.

BTW, is netlink2 support planned for Linux in the near future?

David, sorry for the private mail; it was unintentional, as I pressed reply instead of reply all by mistake.

Chaow,
kotry

From hch@lst.de Tue Jun 3 01:23:31 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 01:23:43 -0700 (PDT)
Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h538NT2x028700 for ; Tue, 3 Jun 2003 01:23:31 -0700
Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h538NRJT022960 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Tue, 3 Jun 2003 10:23:27 +0200
Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h538NQ0w022958 for netdev@oss.sgi.com; Tue, 3 Jun 2003 10:23:26 +0200
Date: Tue, 3 Jun 2003 10:23:26 +0200
From: Christoph Hellwig
To: netdev@oss.sgi.com
Subject: [PATCH] move dmascc away from setup.c
Message-ID: <20030603082326.GA22946@lst.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.28i
X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT
X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 2839 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Yeah, it's a isa driver but it already was in the setup.c pci probes list. Also use SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT. --- 1.15/drivers/net/setup.c Thu May 29 11:57:13 2003 +++ edited/drivers/net/setup.c Mon Jun 2 12:16:07 2003 @@ -9,8 +9,6 @@ #include #include -extern int dmascc_init(void); - extern int scc_enet_init(void); extern int fec_enet_init(void); @@ -29,10 +27,6 @@ /* * Early setup devices */ - -#if defined(CONFIG_DMASCC) - {dmascc_init, 0}, -#endif #if defined(CONFIG_SCC_ENET) {scc_enet_init, 0}, #endif --- 1.15/drivers/net/hamradio/dmascc.c Fri May 30 08:50:31 2003 +++ edited/drivers/net/hamradio/dmascc.c Mon Jun 2 12:31:48 2003 @@ -250,8 +250,6 @@ /* Function declarations */ - -int dmascc_init(void) __init; static int setup_adapter(int card_base, int type, int n) __init; static void write_scc(struct scc_priv *priv, int reg, int val); @@ -299,23 +297,12 @@ static unsigned long rand; -/* Module functions */ - -#ifdef MODULE - - MODULE_AUTHOR("Klaus Kudielka"); MODULE_DESCRIPTION("Driver for high-speed SCC boards"); MODULE_PARM(io, "1-" __MODULE_STRING(MAX_NUM_DEVS) "i"); MODULE_LICENSE("GPL"); - -int init_module(void) { - return dmascc_init(); -} - - -void cleanup_module(void) { +static void __exit dmascc_exit(void) { int i; struct scc_info *info; @@ -341,24 +328,16 @@ } } - -#else - - +#ifndef MODULE void __init dmascc_setup(char *str, int *ints) { int i; for (i = 0; i < MAX_NUM_DEVS && i < ints[0]; i++) io[i] = ints[i+1]; } - - #endif - -/* Initialization functions */ - -int __init dmascc_init(void) { +static int __init dmascc_init(void) { int h, i, j, n; int base[MAX_NUM_DEVS], tcmd[MAX_NUM_DEVS], t0[MAX_NUM_DEVS], t1[MAX_NUM_DEVS]; @@ -461,6 +440,9 @@ return -EIO; } 
+module_init(dmascc_init); +module_exit(dmascc_exit); + int __init setup_adapter(int card_base, int type, int n) { int i, irq, chip; @@ -580,6 +562,7 @@ if (sizeof(dev->name) == sizeof(char *)) dev->name = priv->name; #endif sprintf(dev->name, "dmascc%i", 2*n+i); + SET_MODULE_OWNER(dev); dev->base_addr = card_base; dev->irq = irq; dev->open = scc_open; @@ -707,12 +690,9 @@ struct scc_info *info = priv->info; int card_base = priv->card_base; - MOD_INC_USE_COUNT; - /* Request IRQ if not already used by other channel */ if (!info->irq_used) { if (request_irq(dev->irq, scc_isr, 0, "dmascc", info)) { - MOD_DEC_USE_COUNT; return -EAGAIN; } } @@ -722,7 +702,6 @@ if (priv->param.dma >= 0) { if (request_dma(priv->param.dma, "dmascc")) { if (--info->irq_used == 0) free_irq(dev->irq, info); - MOD_DEC_USE_COUNT; return -EAGAIN; } else { unsigned long flags = claim_dma_lock(); @@ -866,7 +845,6 @@ } if (--info->irq_used == 0) free_irq(dev->irq, info); - MOD_DEC_USE_COUNT; return 0; } From aj@dungeon.inka.de Tue Jun 3 02:10:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 02:10:14 -0700 (PDT) Received: from mail.inka.de (mail@quechua.inka.de [193.197.184.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h539A72x032064 for ; Tue, 3 Jun 2003 02:10:08 -0700 Received: from dungeon.inka.de (uucp@[127.0.0.1]) by mail.inka.de with uucp (rmailwrap 0.5) id 19N7nx-0002K9-00; Tue, 03 Jun 2003 11:10:05 +0200 Received: from 192.168.1.12 (unknown [192.168.1.12]) by dungeon.inka.de (Postfix) with ESMTP id 9D90820FAD; Tue, 3 Jun 2003 10:11:10 +0200 (CEST) From: Andreas Jellinghaus To: Peter Bieringer , Maillist netdev Subject: Re: Is there already a doc available for the new IPsec code? 
Date: Tue, 3 Jun 2003 10:13:02 +0200 User-Agent: KMail/1.5.2 Cc: Maillist USAGI-users References: <36990000.1054588328@worker.muc.bieringer.de> In-Reply-To: <36990000.1054588328@worker.muc.bieringer.de> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306031013.02982.aj@dungeon.inka.de> X-archive-position: 2840 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aj@dungeon.inka.de Precedence: bulk X-list: netdev I will send him my notes (too large for this list). except for the kernel config, the netbsd ipsec howto is a very good source. Andreas From bunk@fs.tum.de Tue Jun 3 06:03:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 06:03:44 -0700 (PDT) Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53D3H2x009171 for ; Tue, 3 Jun 2003 06:03:39 -0700 Received: (qmail 3223 invoked from network); 3 Jun 2003 13:03:10 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 3 Jun 2003 13:03:10 -0000 Date: Tue, 3 Jun 2003 15:03:08 +0200 From: Adrian Bunk To: Margit Schubert-While , lksctp-developers@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: SCTP config 2.5.70(-bk) Message-ID: <20030603130308.GC27168@fs.tum.de> References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> User-Agent: Mutt/1.4.1i X-archive-position: 2841 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@fs.tum.de Precedence: bulk X-list: netdev On Mon, Jun 02, 2003 at 09:53:04AM +0200, 
Margit Schubert-While wrote:
> CONFIG_IPV6_SCTP__ is always being set to "y" even though
> not selected (CONFIG_IPV6 not set)

First, this doesn't do any harm, since CONFIG_IPV6_SCTP__ alone doesn't result in anything getting compiled.

But besides that, it seems a bit broken.

From net/sctp/Kconfig:

<-- snip -->

...
config IPV6_SCTP__
        tristate
        default y if IPV6=n
        default IPV6 if IPV6

config IP_SCTP
        tristate "The SCTP Protocol (EXPERIMENTAL)"
        depends on IPV6_SCTP__
...

<-- snip -->

Semantically equivalent is the following for IPV6_SCTP__:

config IPV6_SCTP__
        tristate
        default y if IPV6=n || IPV6=y
        default m if IPV6=m

If it was intended to disallow a static IP_SCTP with a modular IPV6, it doesn't work: it's perfectly allowed to set IPV6=n and IP_SCTP=y and later compile and install a modular IPV6 for the same kernel.

Could someone from the SCTP developers comment on the intentions behind IPV6_SCTP__?

> Margit

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S.
Buck - Dragon Seed From gandalf@wlug.westbo.se Tue Jun 3 10:41:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 10:41:29 -0700 (PDT) Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53HfE2x016462 for ; Tue, 3 Jun 2003 10:41:17 -0700 Received: by tux.rsn.bth.se (Postfix, from userid 501) id A13E036FED; Tue, 3 Jun 2003 19:41:11 +0200 (CEST) Subject: Re: fix TCP roundtrip time update code From: Martin Josefsson To: davidm@hpl.hp.com Cc: kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com In-Reply-To: <200306031552.h53FqknC023999@napali.hpl.hp.com> References: <200306031552.h53FqknC023999@napali.hpl.hp.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1054662070.701.6.camel@tux.rsn.bth.se> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.4 Date: 03 Jun 2003 19:41:11 +0200 X-archive-position: 2842 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev (trimmed CC line and added netdev) On Tue, 2003-06-03 at 17:52, David Mosberger wrote: > One of those very-hard-to-track-down, trivial-to-fix kind of problems: > without this patch, TCP roundtrip time measurements will corrupt the > routing cache's RTT estimates under heavy network load (the bug causes > RTAX_RTT to go negative, but since its type is u32, you end up with a > huge positive value...). From there on, later TCP connections quickly > will go south. > > The typo was introduced 8 months ago in v1.29 of the file by the patch > entitled "Cleanup DST metrics and abstrct MSS/PMTU further". I tested this patch and it looks like it has cured my mysterious TCP stalls. without patch: cache mtu 1500 rtt 479411ms rttvar 953813ms cwnd 46 advmss 1460 I see that before and during the stall if not using this patch. 
(rtt is never above 20ms according to ping)

With the patch I see normal rtt and rttvar times. Haven't seen a stall yet (~30 kernel compiles with distcc over a sometimes congested link), will continue testing.

> ===== net/ipv4/tcp_input.c 1.36 vs edited =====
> --- 1.36/net/ipv4/tcp_input.c Mon Apr 28 09:27:57 2003
> +++ edited/net/ipv4/tcp_input.c Tue Jun 3 08:19:36 2003
> @@ -556,8 +556,8 @@
>      if (m >= dst_metric(dst, RTAX_RTTVAR))
>          dst->metrics[RTAX_RTTVAR-1] = m;
>      else
> -        dst->metrics[RTAX_RTT-1] -=
> -            (dst->metrics[RTAX_RTT-1] - m)>>2;
> +        dst->metrics[RTAX_RTTVAR-1] -=
> +            (dst->metrics[RTAX_RTTVAR-1] - m)>>2;
>  }
>
>  if (tp->snd_ssthresh >= 0xFFFF) {

--
/Martin

From garzik@gtf.org Tue Jun 3 10:59:43 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 10:59:48 -0700 (PDT)
Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53HxM2x017048 for ; Tue, 3 Jun 2003 10:59:43 -0700
Received: by havoc.gtf.org (Postfix, from userid 500) id 983EF6641; Tue, 3 Jun 2003 13:59:21 -0400 (EDT)
Date: Tue, 3 Jun 2003 13:59:21 -0400
From: Jeff Garzik
To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Regarding SET_NETDEV_DEV
Message-ID: <20030603175921.GE2079@gtf.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.28i
X-archive-position: 2843
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

For janitors and other developers placing this in net drivers... please don't :)

This can be done in upper layers, accomplishing the same goal without changing the low-level net driver code at all.
Jeff From davidm@napali.hpl.hp.com Tue Jun 3 11:45:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 11:45:51 -0700 (PDT) Received: from palrel12.hp.com (palrel12.hp.com [156.153.255.237]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53IjQ2x018031 for ; Tue, 3 Jun 2003 11:45:47 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel12.hp.com (Postfix) with ESMTP id 434C41C011B1; Tue, 3 Jun 2003 11:45:26 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h53IjOxV004188; Tue, 3 Jun 2003 11:45:25 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h53IjOrK025527; Tue, 3 Jun 2003 11:45:24 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h53IjOlS025523; Tue, 3 Jun 2003 11:45:24 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16092.60612.352739.581639@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 11:45:24 -0700 To: Martin Josefsson Cc: davidm@hpl.hp.com, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <1054662070.701.6.camel@tux.rsn.bth.se> References: <200306031552.h53FqknC023999@napali.hpl.hp.com> <1054662070.701.6.camel@tux.rsn.bth.se> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2844 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev >>>>> On 03 Jun 2003 19:41:11 +0200, Martin Josefsson said: Martin> (trimmed CC line and added netdev) On Tue, 2003-06-03 at Martin> 17:52, David Mosberger wrote: >> One of those 
very-hard-to-track-down, trivial-to-fix kind of >> problems: without this patch, TCP roundtrip time measurements >> will corrupt the routing cache's RTT estimates under heavy >> network load (the bug causes RTAX_RTT to go negative, but since >> its type is u32, you end up with a huge positive value...). From >> there on, later TCP connections quickly will go south. >> The typo was introduced 8 months ago in v1.29 of the file by the >> patch entitled "Cleanup DST metrics and abstrct MSS/PMTU >> further". Martin> I tested this patch and it looks like it has cured my Martin> mysterious TCP stalls. Yes, this sounds reasonable. I wasn't very clear on this point, but "by going south" I meant that TCP is starting to misbehave. In particular, you'll likely end up with the kernel aborting ESTABLISHED TCP connections with extreme prejudice (and in violation of the TCP protocol), because it thought that it had been unable to communicate with the remote end for a _very_ long time. The net effect typically is that you end up with one end having a connection that's in the ESTABLISHED state and the other end having no trace of that connection. 
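The wraparound described in the quoted report can be reproduced with plain unsigned arithmetic. The sketch below is a simplified model, not the kernel code: `struct metrics`, `update_buggy()` and `update_fixed()` are hypothetical stand-ins for the cached RTAX_RTT and RTAX_RTTVAR metrics and for the two versions of the update shown in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cached route metrics (both u32). */
struct metrics {
    uint32_t rtt;
    uint32_t rttvar;
};

/* Pre-patch logic: the decay branch touches rtt instead of rttvar. */
static void update_buggy(struct metrics *c, uint32_t m)
{
    assert(c != NULL);
    if (m >= c->rttvar)
        c->rttvar = m;
    else
        /* If m > rtt, (rtt - m) wraps around in unsigned arithmetic. */
        c->rtt -= (c->rtt - m) >> 2;
}

/* Post-patch logic: the decay branch updates rttvar, as intended. */
static void update_fixed(struct metrics *c, uint32_t m)
{
    assert(c != NULL);
    if (m >= c->rttvar)
        c->rttvar = m;
    else
        /* Here m < rttvar, so the subtraction cannot wrap. */
        c->rttvar -= (c->rttvar - m) >> 2;
}
```

With rtt = 10, rttvar = 100 and a sample m = 50, the buggy version evaluates (10 - 50) >> 2 as 1073741814 in 32-bit unsigned arithmetic and leaves rtt at 3221225492. The fixed version leaves rtt alone and decays rttvar to 88.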
--david From scott.feldman@intel.com Tue Jun 3 13:01:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 13:01:57 -0700 (PDT) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53K1X2x023590 for ; Tue, 3 Jun 2003 13:01:54 -0700 Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by caduceus.fm.intel.com (8.11.6p2/8.11.6/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h53JrfA01302 for ; Tue, 3 Jun 2003 19:53:41 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by petasus.fm.intel.com (8.11.6p2/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h53JssK05729 for ; Tue, 3 Jun 2003 19:54:54 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003060313013015442 ; Tue, 03 Jun 2003 13:01:30 -0700 Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 3 Jun 2003 13:01:30 -0700 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-MimeOLE: Produced By Microsoft Exchange V6.0.6375.0 Subject: RE: [PATCH] fix use after free in e100 Date: Tue, 3 Jun 2003 13:01:29 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [PATCH] fix use after free in e100 Thread-Index: AcMokjx3yixGSrs6TK+86kV3QlMsXwBeFFag From: "Feldman, Scott" To: "Martin Josefsson" Cc: X-OriginalArrivalTime: 03 Jun 2003 20:01:30.0686 (UTC) FILETIME=[ECC4B5E0:01C32A0A] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h53K1X2x023590 X-archive-position: 2845 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > Here's a fix for a 
use-after-free in the e100 driver. > You can't touch the skb after a call to netif_rx(), it might > have been free'd. Caught with Manfred's unmap-page-debugging > patch in -mm. Thanks Martin. We'll pick this patch up in our dev driver and propagate the change from there. -scott From jgrimm2@us.ibm.com Tue Jun 3 14:09:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 14:09:39 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h53L8v2x025658 for ; Tue, 3 Jun 2003 14:09:30 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h53L6AuT258362; Tue, 3 Jun 2003 17:06:10 -0400 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h53L66h7148720; Tue, 3 Jun 2003 15:06:07 -0600 Received: from popmail.austin.ibm.com (popmail.austin.ibm.com [9.41.248.164]) by austin.ibm.com (8.12.9/8.12.9) with ESMTP id h53L65sQ056046; Tue, 3 Jun 2003 16:06:05 -0500 Received: from us.ibm.com (sig-9-65-53-56.mts.ibm.com [9.65.53.56]) by popmail.austin.ibm.com (AIX4.3/8.9.3p2/8.7-client1.01) with ESMTP id QAA20796; Tue, 3 Jun 2003 16:06:04 -0500 Message-ID: <3EDD0DFC.4080806@us.ibm.com> Date: Tue, 03 Jun 2003 16:07:08 -0500 From: Jon Grimm Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Adrian Bunk CC: Margit Schubert-While , lksctp-developers@lists.sourceforge.net, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: SCTP config 2.5.70(-bk) References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> <20030603130308.GC27168@fs.tum.de> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2846 X-ecartis-version: Ecartis 
v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgrimm2@us.ibm.com
Precedence: bulk
X-list: netdev

Hi Adrian,

Sorry for a bit of delay... We are away at an SCTP Interoperability event.

Adrian Bunk wrote:
> On Mon, Jun 02, 2003 at 09:53:04AM +0200, Margit Schubert-While wrote:
>
>>CONFIG_IPV6_SCTP__ is always being set to "y" even though
>>not selected (CONFIG_IPV6 not set)
>
> First, this doesn't do any harm since CONFIG_IPV6_SCTP__ alone doesn't
> result in anything getting compiled.
>
> But besides, it seems a bit broken.
>
> From net/sctp/Kconfig:
>
> <-- snip -->
>
> ...
>
> config IPV6_SCTP__
>         tristate
>         default y if IPV6=n
>         default IPV6 if IPV6
>
> config IP_SCTP
>         tristate "The SCTP Protocol (EXPERIMENTAL)"
>         depends on IPV6_SCTP__
> ...
>
> <-- snip -->
>
> Semantically equivalent is the following for IPV6_SCTP__:
>
> config IPV6_SCTP__
>         tristate
>         default y if IPV6=n || IPV6=y
>         default m if IPV6=m
>
> If it was intended to disallow a static IP_SCTP with a modular IPV6 it
> doesn't work: It's perfectly allowed to set IPV6=n and IP_SCTP=y and
> later compile and install a modular IPV6 for the same kernel.
>

Are you sure? I vaguely remember one of the network structs having #ifdef'd fields for v6. Consequently, if one first compiles without IPv6 but later compiles and loads it as a module... bad things happen, as the kernel has a different concept of what the sock is.

>
> Could someone from the SCTP developers comment on the intentions behind
> IPV6_SCTP__ ?
>

Yes. The intent was to at least discourage a configuration that will segfault.
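The layout concern raised in the reply above can be illustrated in C. This is a sketch under assumptions: `struct sock_without_v6`, `struct sock_with_v6` and `tail_offset()` are invented for illustration and do not correspond to any real kernel structure. They only show how a field that exists under one configuration shifts every member declared after it.

```c
#include <assert.h>
#include <stddef.h>

/* Layout a core kernel built without the option would use (hypothetical). */
struct sock_without_v6 {
    int state;
    int tail;
};

/* Layout a module built with the option would expect (hypothetical). */
struct sock_with_v6 {
    int state;
    unsigned char v6_addr[16];  /* present only when the option is enabled */
    int tail;
};

/* Offset of 'tail' as seen by code built with (1) or without (0) the option. */
static size_t tail_offset(int with_v6)
{
    assert(with_v6 == 0 || with_v6 == 1);
    return with_v6 ? offsetof(struct sock_with_v6, tail)
                   : offsetof(struct sock_without_v6, tail);
}
```

Code compiled against one layout and handed an object allocated with the other would read and write `tail` at the wrong offset, which is the kind of mismatch the reply warns about.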
Thanks, jon > > >>Margit > > > cu > Adrian > From jmorris@intercode.com.au Tue Jun 3 17:26:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:26:46 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:m/3FwucN+ZbRkqvuBNbRehdxIuxqTgbZ@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540QJ2x031482 for ; Tue, 3 Jun 2003 17:26:41 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h540OEr26556; Wed, 4 Jun 2003 10:24:15 +1000 Date: Wed, 4 Jun 2003 10:24:14 +1000 (EST) From: James Morris To: davidm@hpl.hp.com cc: Martin Josefsson , , , , , "David S. Miller" Subject: Re: fix TCP roundtrip time update code In-Reply-To: <16092.60612.352739.581639@napali.hpl.hp.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2847 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 3 Jun 2003, David Mosberger wrote: > Martin> I tested this patch and it looks like it has cured my > Martin> mysterious TCP stalls. > > Yes, this sounds reasonable. I wasn't very clear on this point, but > "by going south" I meant that TCP is starting to misbehave. In > particular, you'll likely end up with the kernel aborting ESTABLISHED > TCP connections with extreme prejudice (and in violation of the TCP > protocol), because it thought that it had been unable to communicate > with the remote end for a _very_ long time. The net effect typically > is that you end up with one end having a connection that's in the > ESTABLISHED state and the other end having no trace of that > connection. David, This might be the solution to one of the 'must-fix' bugs for the networking, which nobody so far was quite able to track down. 
- James
--
James Morris

From yoshfuji@linux-ipv6.org Tue Jun 3 17:39:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:39:23 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540cr2x031928 for ; Tue, 3 Jun 2003 17:39:14 -0700
Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h540diBo003194; Wed, 4 Jun 2003 09:39:44 +0900
Date: Wed, 04 Jun 2003 09:39:44 +0900 (JST)
Message-Id: <20030604.093944.84705841.yoshfuji@linux-ipv6.org>
To: davem@redhat.com
CC: Ville Nuorvala , netdev@oss.sgi.com
Subject: [PATCH] IPV6: Sereral errors on udpv6_connect()
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2848
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

Hello.

The CONFIG_IPV6_SUBTREE patch contains multiple fixes and changes. I'm trying to split them.

This patch fixes multiple errors in udpv6_connect().
 - a pointer into the automatic storage class variable fl was illegally cached using ip6_dst_store().
 - uninitialized saddr was copied to fl.fl6_src.
 - don't cache if ipv6_get_saddr() failed.

Patch is based on the CONFIG_IPV6_SUBTREE patch from Ville Nuorvala .
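The first item in the list above, a cached pointer into an automatic variable, can be sketched as follows. The names (`struct addr6`, `struct sock_sketch`, `cache_dst_addr()`) are hypothetical and this is not the kernel's ip6_dst_store(). The sketch only shows the approach the patch appears to take: cache a pointer to the socket's own long-lived copy of the address when it matches the flow's destination, and NULL otherwise, so that nothing stored in the socket ever points at the caller's stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 128-bit address type. */
struct addr6 {
    unsigned char s6[16];
};

/* Hypothetical socket: 'daddr' lives as long as the socket does. */
struct sock_sketch {
    struct addr6 daddr;
    const struct addr6 *cached;   /* must never point into a caller's stack */
};

/*
 * 'flow_dst' may be an automatic variable in the caller, so its address
 * is only compared, never stored. If it equals the socket's own daddr,
 * the long-lived copy is cached instead; otherwise nothing is cached.
 */
static void cache_dst_addr(struct sock_sketch *sk, const struct addr6 *flow_dst)
{
    assert(sk != NULL && flow_dst != NULL);
    if (memcmp(flow_dst, &sk->daddr, sizeof(*flow_dst)) == 0)
        sk->cached = &sk->daddr;
    else
        sk->cached = NULL;
}
```

Storing `flow_dst` itself would leave `sk->cached` dangling as soon as the caller returned, which is the error the first item describes.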
Index: linux25-LINUS/net/ipv6/udp.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/udp.c,v retrieving revision 1.1.1.18 diff -u -r1.1.1.18 udp.c --- linux25-LINUS/net/ipv6/udp.c 26 May 2003 08:04:11 -0000 1.1.1.18 +++ linux25-LINUS/net/ipv6/udp.c 4 Jun 2003 00:29:32 -0000 @@ -254,7 +254,6 @@ struct inet_opt *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *daddr; - struct in6_addr saddr; struct dst_entry *dst; struct flowi fl; struct ip6_flowlabel *flowlabel = NULL; @@ -355,7 +354,7 @@ fl.proto = IPPROTO_UDP; ipv6_addr_copy(&fl.fl6_dst, &np->daddr); - ipv6_addr_copy(&fl.fl6_src, &saddr); + ipv6_addr_copy(&fl.fl6_src, &np->saddr); fl.oif = sk->bound_dev_if; fl.fl_ip_dport = inet->dport; fl.fl_ip_sport = inet->sport; @@ -381,20 +380,23 @@ return err; } - ip6_dst_store(sk, dst, &fl.fl6_dst); - /* get the source address used in the appropriate device */ - err = ipv6_get_saddr(dst, daddr, &saddr); + err = ipv6_get_saddr(dst, daddr, &fl.fl6_src); if (err == 0) { if (ipv6_addr_any(&np->saddr)) - ipv6_addr_copy(&np->saddr, &saddr); + ipv6_addr_copy(&np->saddr, &fl.fl6_src); if (ipv6_addr_any(&np->rcv_saddr)) { - ipv6_addr_copy(&np->rcv_saddr, &saddr); + ipv6_addr_copy(&np->rcv_saddr, &fl.fl6_src); inet->rcv_saddr = LOOPBACK4_IPV6; } + + ip6_dst_store(sk, dst, + !ipv6_addr_cmp(&fl.fl6_dst, &np->daddr) ? 
+ &np->daddr : NULL); + sk->state = TCP_ESTABLISHED; } fl6_sock_release(flowlabel); -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From kuznet@ms2.inr.ac.ru Tue Jun 3 17:46:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 17:46:20 -0700 (PDT) Received: from dub.inr.ac.ru (dub.inr.ac.ru [193.233.7.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h540jp2x032444 for ; Tue, 3 Jun 2003 17:46:13 -0700 Received: (from kuznet@localhost) by dub.inr.ac.ru (8.6.13/ANK) id EAA24505; Wed, 4 Jun 2003 04:43:22 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200306040043.EAA24505@dub.inr.ac.ru> Subject: Re: fix TCP roundtrip time update code To: jmorris@intercode.com.au (James Morris) Date: Wed, 4 Jun 2003 04:43:22 +0400 (MSD) Cc: davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, davem@redhat.com, akpm@digeo.com In-Reply-To: from "James Morris" at Jun 04, 2003 10:24:14 AM X-Mailer: ELM [version 2.5 PL6] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2849 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kuznet@ms2.inr.ac.ru Precedence: bulk X-list: netdev Hello! > This might be the solution to one of the 'must-fix' bugs for the > networking, which nobody so far was quite able to track down. No doubts. All the symptoms are explained by this. I hope Andrew will confirm that the problem has gone. 
Alexey From niv@us.ibm.com Tue Jun 3 19:08:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 19:09:02 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5428Q2x001531 for ; Tue, 3 Jun 2003 19:08:54 -0700 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5427HuT170244; Tue, 3 Jun 2003 22:07:17 -0400 Received: from us.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5427Eh7083078; Tue, 3 Jun 2003 20:07:15 -0600 Message-ID: <3EDD52F5.8090706@us.ibm.com> Date: Tue, 03 Jun 2003 19:01:25 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: James Morris , davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, davem@redhat.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code References: <200306040043.EAA24505@dub.inr.ac.ru> In-Reply-To: <200306040043.EAA24505@dub.inr.ac.ru> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2850 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev kuznet@ms2.inr.ac.ru wrote: > No doubts. All the symptoms are explained by this. I hope Andrew > will confirm that the problem has gone. Yep, great catch! But, FYI, DaveM and Alexey, we tried reproducing the stalls we (Dave Hansen, Troy Wilson) had seen during SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same config, etc). 
So it's possible our hangs/stalls were some other issue that got silently fixed (or, more likely, it was the same thing, but other changes made us less likely to run into the problem).

thanks,
Nivedita

From jmorris@intercode.com.au Tue Jun 3 19:29:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 19:29:37 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:5XN7XGMIBVp5EAxqx0JpHmruTuzfpsp9@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h542T52x002010 for ; Tue, 3 Jun 2003 19:29:27 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h542Sor27032; Wed, 4 Jun 2003 12:28:51 +1000
Date: Wed, 4 Jun 2003 12:28:50 +1000 (EST)
From: James Morris
To: "David S. Miller"
cc: Andrew Morton ,
Subject: [PATCH] Use new kconfig 'select' for networking crypto
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2851
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

The patch below, against recent bk, uses the new 'select' feature of kconfig to configure crypto features for ipsec and ipv6 privacy extensions.

This should solve a lot of the build problems people have been having, and it also enables the crypto submenu (which previously did not work).

The sctp folk may also want to look at this scheme for their stuff.

- James
-- 
James Morris

diff -urN -X dontdiff bk.pending/crypto/Kconfig bk.w1/crypto/Kconfig
--- bk.pending/crypto/Kconfig	2003-06-04 11:41:26.000000000 +1000
+++ bk.w1/crypto/Kconfig	2003-06-04 12:28:36.234711904 +1000
@@ -6,16 +6,12 @@
 
 config CRYPTO
 	bool "Cryptographic API"
-	default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \
-		     INET6_ESP=y || INET6_ESP=m || INET6_IPCOMP=y || INET6_IPCOMP=m || IPV6_PRIVACY=y
 	help
 	  This option provides the core Cryptographic API.
 
 config CRYPTO_HMAC
 	bool "HMAC support"
 	depends on CRYPTO
-	default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \
-		     INET6_ESP=y || INET6_ESP=m
 	help
 	  HMAC: Keyed-Hashing for Message Authentication (RFC2104).
 	  This is required for IPSec.
@@ -35,16 +31,12 @@
 config CRYPTO_MD5
 	tristate "MD5 digest algorithm"
 	depends on CRYPTO
-	default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \
-		     INET6_ESP=y || INET6_ESP=m || IPV6_PRIVACY=y
 	help
 	  MD5 message digest algorithm (RFC1321).
 
 config CRYPTO_SHA1
 	tristate "SHA1 digest algorithm"
 	depends on CRYPTO
-	default y if INET_AH=y || INET_AH=m || INET_ESP=y || INET_ESP=m || INET6_AH=y || INET6_AH=m || \
-		     INET6_ESP=y || INET6_ESP=m
 	help
 	  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2).
 
@@ -72,7 +64,6 @@
 config CRYPTO_DES
 	tristate "DES and Triple DES EDE cipher algorithms"
 	depends on CRYPTO
-	default y if INET_ESP=y || INET_ESP=m || INET6_ESP=y || INET6_ESP=m
 	help
 	  DES cipher algorithm (FIPS 46-2), and Triple DES EDE (FIPS 46-3).
 
@@ -138,7 +129,6 @@
 config CRYPTO_DEFLATE
 	tristate "Deflate compression algorithm"
 	depends on CRYPTO
-	default y if INET_IPCOMP=y || INET_IPCOMP=m || INET6_IPCOMP=y || INET6_IPCOMP=m
 	help
 	  This is the Deflate algorithm (RFC1951), specified for use in
 	  IPSec with the IPCOMP protocol (RFC3173, RFC2394).
diff -urN -X dontdiff bk.pending/net/ipv4/Kconfig bk.w1/net/ipv4/Kconfig
--- bk.pending/net/ipv4/Kconfig	2003-06-04 11:42:08.000000000 +1000
+++ bk.w1/net/ipv4/Kconfig	2003-06-04 12:24:06.752679400 +1000
@@ -343,6 +343,10 @@
 
 config INET_AH
 	tristate "IP: AH transformation"
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_MD5
+	select CRYPTO_SHA1
 	---help---
 	  Support for IPsec AH.
 
@@ -350,6 +354,11 @@
 
 config INET_ESP
 	tristate "IP: ESP transformation"
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_MD5
+	select CRYPTO_SHA1
+	select CRYPTO_DES
 	---help---
 	  Support for IPsec ESP.
 
@@ -357,6 +366,8 @@
 
 config INET_IPCOMP
 	tristate "IP: IPComp transformation"
+	select CRYPTO
+	select CRYPTO_DEFLATE
 	---help---
 	  Support for IP Paylod Compression (RFC3173),
 	  typically needed for IPsec.
diff -urN -X dontdiff bk.pending/net/ipv6/Kconfig bk.w1/net/ipv6/Kconfig
--- bk.pending/net/ipv6/Kconfig	2003-06-04 11:42:09.000000000 +1000
+++ bk.w1/net/ipv6/Kconfig	2003-06-04 12:24:05.242908920 +1000
@@ -4,6 +4,8 @@
 config IPV6_PRIVACY
 	bool "IPv6: Privacy Extensions (RFC 3041) support"
 	depends on IPV6
+	select CRYPTO
+	select CRYPTO_MD5
 	---help---
 	  Privacy Extensions for Stateless Address Autoconfiguration in IPv6
 	  support. With this option, additional periodically-alter
@@ -20,6 +22,10 @@
 config INET6_AH
 	tristate "IPv6: AH transformation"
 	depends on IPV6
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_MD5
+	select CRYPTO_SHA1
 	---help---
 	  Support for IPsec AH.
 
@@ -28,6 +34,11 @@
 config INET6_ESP
 	tristate "IPv6: ESP transformation"
 	depends on IPV6
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_MD5
+	select CRYPTO_SHA1
+	select CRYPTO_DES
 	---help---
 	  Support for IPsec ESP.
 
@@ -36,6 +47,8 @@
 config INET6_IPCOMP
 	tristate "IPv6: IPComp transformation"
 	depends on IPV6
+	select CRYPTO
+	select CRYPTO_DEFLATE
 	---help---
 	  Support for IP Paylod Compression (RFC3173),
 	  typically needed for IPsec.

From davem@redhat.com Tue Jun 3 20:11:51 2003
Date: Tue, 03 Jun 2003 20:09:44 -0700 (PDT)
Message-Id: <20030603.200944.78736971.davem@redhat.com>
To: jgarzik@pobox.com
Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Regarding SET_NETDEV_DEV
From: "David S. Miller"
In-Reply-To: <20030603175921.GE2079@gtf.org>
References: <20030603175921.GE2079@gtf.org>

   From: Jeff Garzik
   Date: Tue, 3 Jun 2003 13:59:21 -0400

   For janitors and other developers placing this in net drivers...
   please don't :)

   This can be done in upper layers, accomplishing the same goal
   without changing the low-level net driver code at all.

Don't say something can be done without showing exactly how :-)

How does register_netdevice() know that the device is "whatever"
and where to get the generic device struct from?
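
The question above can be illustrated with a small user-space mock. This is only a sketch under stated assumptions: the `demo_` structures, the `DEMO_SET_NETDEV_DEV` macro and both functions are hypothetical stand-ins, not the kernel's definitions. It shows why the association is made in the driver: the probe routine is the one place that holds both the bus device and the net device, while a registration routine that receives only the net device has nothing to derive the parent from unless the driver recorded it first.

```c
#include <stddef.h>

/* Hypothetical stand-in for a generic bus-level device. */
struct demo_device {
	const char *bus_id;
};

/* Hypothetical stand-in for a network device with a parent link. */
struct demo_net_device {
	const char *name;
	struct demo_device *parent;	/* NULL until the driver sets it */
};

/* Mock of a SET_NETDEV_DEV-style macro: record the parent device. */
#define DEMO_SET_NETDEV_DEV(net, pdev) ((net)->parent = (pdev))

/* Mock driver probe: both the bus device and the net device are in hand. */
static void demo_probe(struct demo_device *busdev,
		       struct demo_net_device *ndev)
{
	DEMO_SET_NETDEV_DEV(ndev, busdev);
}

/*
 * Mock upper-layer registration: it sees only the net device, so it can
 * report the parent only if the driver stored it beforehand.
 */
static const char *demo_register(const struct demo_net_device *ndev)
{
	return ndev->parent ? ndev->parent->bus_id : NULL;
}
```

Calling `demo_register()` before `demo_probe()` yields NULL; calling it afterwards yields the bus identifier the driver supplied.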

From davem@redhat.com Tue Jun 3 20:26:18 2003
Date: Tue, 03 Jun 2003 20:23:20 -0700 (PDT)
Message-Id: <20030603.202320.59680883.davem@redhat.com>
To: niv@us.ibm.com
Cc: kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, davidm@hpl.hp.com,
    gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
    linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
From: "David S. Miller"
In-Reply-To: <3EDD52F5.8090706@us.ibm.com>
References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com>

   From: Nivedita Singhvi
   Date: Tue, 03 Jun 2003 19:01:25 -0700

   But, FYI, DaveM and Alexey, we tried reproducing the stalls
   we (Dave Hansen, Troy Wilson) had seen during SpecWeb99
   runs and couldn't reproduce them on 2.5.69. (Same config, etc).
   So it's possible our hang/stalls were some other issue that got
   silently fixed (or more likely, possibly the same thing but
   other changes minimized us running into the problem).

I think this means nothing, and that you can infer nothing from
such results.

My understanding is that the problem case triggers only when a
timeout based retransmit occurs. On LAN this tends to be extremely
rare. Although under enough traffic load it can occur.

So if your old SpecWEB99 lab tended more to trigger timeout based
retransmits on LAN, and your new test network does not, then your
new test network will tend to not reproduce the bug regardless of
whether the bug is present in the kernel or not :-)

From davem@redhat.com Tue Jun 3 20:48:19 2003
Date: Tue, 03 Jun 2003 20:46:12 -0700 (PDT)
Message-Id: <20030603.204612.48501825.davem@redhat.com>
To: shemminger@osdl.org
Cc: jgarzik@pobox.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com,
    linux-net@vger.kernel.org
Subject: Re: Regarding SET_NETDEV_DEV
From: "David S. Miller"
In-Reply-To: <3EDD6B51.9070909@osdl.org>
References: <20030603175921.GE2079@gtf.org> <20030603.200944.78736971.davem@redhat.com> <3EDD6B51.9070909@osdl.org>

   From: Stephen Hemminger
   Date: Tue, 03 Jun 2003 20:45:21 -0700

   There are enough PCI network devices, that something like
   alloc_pci_etherdev might be a good future idea.

What is so special about PCI? :-)

In this light, alloc_device_etherdev() seems much more appropriate.

But we can play this game AD_INFINITUM, for each and every parameter
that is common across a class of ethernet devices.
At what point do you stop? :-)

From davidm@napali.hpl.hp.com Tue Jun 3 21:36:31 2003
From: David Mosberger
Message-ID: <16093.30507.661714.676184@napali.hpl.hp.com>
Date: Tue, 3 Jun 2003 21:35:55 -0700
To: "David S. Miller"
Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au,
    davidm@hpl.hp.com, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
    linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
In-Reply-To: <20030603.202320.59680883.davem@redhat.com>
References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com>
Reply-To: davidm@hpl.hp.com

>>>>> On Tue, 03 Jun 2003 20:23:20 -0700 (PDT), "David S. Miller" said:

  DaveM> From: Nivedita Singhvi Date: Tue, 03 Jun
  DaveM> 2003 19:01:25 -0700

  DaveM> But, FYI, DaveM and Alexey, we tried reproducing the
  DaveM> stalls we (Dave Hansen, Troy Wilson) had seen during
  DaveM> SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same
  DaveM> config, etc). So it's possible our hang/stalls were some other
  DaveM> issue that got silently fixed (or more likely, possibly the
  DaveM> same thing but other changes minimized us running into the
  DaveM> problem).

  DaveM> I think this means nothing, and that you can infer nothing
  DaveM> from such results.

  DaveM> My understanding is that the problem case triggers only when
  DaveM> a timeout based retransmit occurs. On LAN this tends to be
  DaveM> extremely rare. Although under enough traffic load it can
  DaveM> occur.

  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

Is this where I get to plug httperf? It triggered the bug reliably in
less than 10 secs. ;-)

	--david

From davem@redhat.com Tue Jun 3 21:39:02 2003
Date: Tue, 03 Jun 2003 21:34:58 -0700 (PDT)
Message-Id: <20030603.213458.112594590.davem@redhat.com>
To: vnuorval@tcs.hut.fi
Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com,
    ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi,
    jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
From: "David S. Miller"

   From: Ville Nuorvala
   Date: Fri, 30 May 2003 18:00:55 +0300 (EEST)

   The patch is sent as an attachment to this mail, but is also
   available at:

You need to fix some things before I will apply this:

1) Bogus #ifdef CONFIG_IPV6_TUNNEL_MODULE. You do not need
   this test around things like MODULE_AUTHOR() and stuff
   like that, linux/module.h does that for you.

2) Dependency upon subtrees patch, please remove it. There
   is no agreement on that semantic change to how subtrees
   work.

Thanks.
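
Point 1) above can be sketched in user space. The following is a simplified stand-in, assuming hypothetical `DEMO_` names throughout; it is not the code in linux/module.h, which is more involved. It only shows the general pattern the review relies on: the header decides once what the macro means in each build, so the caller invokes it unconditionally.

```c
#include <stddef.h>

/*
 * Header side (hypothetical): the build-type test lives here, once.
 * With DEMO_MODULE defined the macro stores the string; without it the
 * macro expands to a harmless forward declaration that swallows the
 * trailing semicolon.
 */
#ifdef DEMO_MODULE
#define DEMO_MODULE_AUTHOR(name) static const char demo_author[] = name
#define demo_author_string() (demo_author)
#else
#define DEMO_MODULE_AUTHOR(name) struct demo_module_author_unused
#define demo_author_string() ((const char *)NULL)
#endif

/* Caller side: no #ifdef is wrapped around the macro invocation. */
DEMO_MODULE_AUTHOR("Example Author");

/* Returns the stored author string, or NULL when nothing was stored. */
static const char *demo_get_author(void)
{
	return demo_author_string();
}
```

Built without `-DDEMO_MODULE`, `demo_get_author()` returns NULL; built with it, the same caller code returns the author string.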

From davem@redhat.com Tue Jun 3 21:42:32 2003
Date: Tue, 03 Jun 2003 21:38:30 -0700 (PDT)
Message-Id: <20030603.213830.85382657.davem@redhat.com>
To: yoshfuji@linux-ipv6.org
Cc: vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com,
    ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi,
    jagana@us.ibm.com, kumarkr@us.ibm.com
Subject: Re: [patch]: ipv6 tunnel for MIPv6
From: "David S. Miller"
In-Reply-To: <20030531.003858.108351451.yoshfuji@linux-ipv6.org>
References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org>

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Sat, 31 May 2003 00:38:58 +0900 (JST)

   In article (at Fri, 30 May 2003 18:00:55 +0300 (EEST)), Ville Nuorvala says:

   > The tunnels are needed by MIPv6 for encapsulation and decapsulation of
   > tunneled packets between the home agent and mobile node. Some protocols
   > like DHCP are also run over the virtual link between the MN and the home
   > network according to the MIPv6 specification.

   I'm not sure if MIP6 will use this tunnel driver.

Yes, it is an important issue.

I am VERY UPSET that there appears to be NO dialogue between USAGI
and MIPV6 folks to discuss design of MIPV6.
If you do not talk together, how can you guys possibly coordinate
efforts and avoid duplicated work?

And, it is very clear from my perspective that it is the MIPV6
developers who are not communicating. USAGI are making an effort to
discuss the issues, but MIPV6 coders disappear for weeks at a time
not answering queries made to them or comments made about their
patch submissions. That is unacceptable.

And this makes me less likely to apply any patches from MIPV6
project, here is why. If some bug shows in some patch I apply from
MIPV6 project, can I expect them to act similarly and not respond
for weeks at a time? That's intolerable. If you add some bug to the
tree, you are responsible to be responsive and fix the problem in a
reasonable amount of time.

From niv@us.ibm.com Tue Jun 3 21:47:35 2003
Message-ID: <3EDD7832.7010804@us.ibm.com>
Date: Tue, 03 Jun 2003 21:40:18 -0700
From: Nivedita Singhvi
To: davidm@hpl.hp.com
CC: "David S. Miller", kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au,
    gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
    linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com>
In-Reply-To: <16093.30507.661714.676184@napali.hpl.hp.com>

David Mosberger wrote:

>   DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
>   DaveM> based retransmits on LAN, and your new test network does not,
>   DaveM> then your new test network will tend to not reproduce the bug
>   DaveM> regardless of whether the bug is present in the kernel or not
>   DaveM> :-)
>
> Is this where I get to plug httperf? It triggered the bug reliably in
> less than 10 secs. ;-)

Tarnation!! Ran httperf! Didn't hit it! :(. What were your
settings?

I extracted an old debug patch to implement dropping of
packets - have a sysctl that controls the rate at which
I can drop IP packets, so can also generate any kind of
packet loss. So thought I would bang away with netperf
using sendfile()/TCP_CORK. Thought it was in that code path.

Will be running tests tomorrow and the rest of this week on
2.5.70 +- patch. Will see if I can provoke any further
hangs, stalls, wackiness of any flavor...

thanks,
Nivedita

From davem@redhat.com Tue Jun 3 21:51:49 2003
Date: Tue, 03 Jun 2003 21:47:30 -0700 (PDT)
Message-Id: <20030603.214730.08347437.davem@redhat.com>
To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com
Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au,
    gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
    linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
From: "David S. Miller"
In-Reply-To: <16093.30507.661714.676184@napali.hpl.hp.com>
References: <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com>

   From: David Mosberger
   Date: Tue, 3 Jun 2003 21:35:55 -0700

   Is this where I get to plug httperf? It triggered the bug reliably in
   less than 10 secs. ;-)

distcc was a reliable test case too...

From davem@redhat.com Tue Jun 3 22:15:20 2003
Date: Tue, 03 Jun 2003 22:13:13 -0700 (PDT)
Message-Id: <20030603.221313.70195889.davem@redhat.com>
To: hch@lst.de
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH] move dmascc away from setup.c
From: "David S. Miller"
In-Reply-To: <20030603082326.GA22946@lst.de>
References: <20030603082326.GA22946@lst.de>

   From: Christoph Hellwig
   Date: Tue, 3 Jun 2003 10:23:26 +0200

   Yeah, it's an ISA driver but it already was in the setup.c pci probes
   list. Also use SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT.

Applied, thanks.

From davidm@napali.hpl.hp.com Tue Jun 3 22:34:37 2003
From: David Mosberger
Message-ID: <16093.34022.445246.52398@napali.hpl.hp.com>
Date: Tue, 3 Jun 2003 22:34:30 -0700
To: Nivedita Singhvi
Cc: davidm@hpl.hp.com, "David S. Miller", kuznet@ms2.inr.ac.ru,
    jmorris@intercode.com.au, gandalf@wlug.westbo.se,
    linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org,
    netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
In-Reply-To: <3EDD7832.7010804@us.ibm.com>
References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com>
Reply-To: davidm@hpl.hp.com

>>>>> On Tue, 03 Jun 2003 21:40:18 -0700, Nivedita Singhvi said:

  Nivedita> David Mosberger wrote:

  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

  >> Is this where I get to plug httperf? It triggered the bug
  >> reliably in less than 10 secs. ;-)

  Nivedita> Tarnation!! Ran httperf! Didnt hit it! :(. What were your
  Nivedita> settings?

I used:

 $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \
	--uri pathto30KBfile

on 3 clients (for a total of 3000 conns/sec). You can't go higher
than 1000 conn/sec per client (IP address) because otherwise you run
out of port space (due to TIME_WAIT).

This load worked well for a machine with a single GigE card. All
network tunables were on the default setting (in particular, the tx
queue len was 300, which is where the losses came from).

With this load, I saw bad RTT values in the route cache within a
couple of seconds after starting the third httperf generator.
It then took a bit longer (on the order of 1-2 minutes) until the
first TCPAbortFailed errors started to pop up.

	--david

From davem@redhat.com Tue Jun 3 22:50:35 2003
Date: Tue, 03 Jun 2003 22:46:57 -0700 (PDT)
Message-Id: <20030603.224657.116381839.davem@redhat.com>
To: yoshfuji@linux-ipv6.org
Cc: vnuorval@tcs.hut.fi, netdev@oss.sgi.com
Subject: Re: [PATCH] IPV6: Sereral errors on udpv6_connect()
From: "David S. Miller"
In-Reply-To: <20030604.093944.84705841.yoshfuji@linux-ipv6.org>
References: <20030604.093944.84705841.yoshfuji@linux-ipv6.org>

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Wed, 04 Jun 2003 09:39:44 +0900 (JST)

   This patch fixes multiple errors in udpv6_connect().
    - pointer within an automatic storage class variable fl
      was illegally cached using ip6_dst_store().
    - uninitialized saddr was copied to fl.fl6_src.
    - don't cache if ipv6_saddr_get() failed.

Applied.

All these kinds of things need to be done differently once routing
by saddr is supported, more specifically when route6 lookups make
source address selection. Look at ipv4 side to see the kind of
thing I'm talking about.

Yoshfuji-san, remember when Alexey wanted you to change your source
address selection so that it occurred at routing layer? This is
exactly what I'm talking about.

In my view, ipv6 routing is merely a SEVERELY crippled version of
ipv4 routing. Most of ipv6 routing changes needed amount to merely
"porting over" existing ipv4 routing features.

From davem@redhat.com Tue Jun 3 22:52:00 2003
Date: Tue, 03 Jun 2003 22:49:25 -0700 (PDT)
Message-Id: <20030603.224925.68063710.davem@redhat.com>
To: jmorris@intercode.com.au
Cc: akpm@digeo.com, netdev@oss.sgi.com
Subject: Re: [PATCH] Use new kconfig 'select' for networking crypto
From: "David S. Miller"

   From: James Morris
   Date: Wed, 4 Jun 2003 12:28:50 +1000 (EST)

   The patch below against recent bk uses the new 'select' feature of kconfig
   to configure crypto features for ipsec and ipv6 privacy extensions.

   This should solve a lot of the build problems people have been having, and
   it also enables the crypto submenu (which previously did not work).

Applied, thanks a lot James.

From davem@redhat.com Tue Jun 3 22:56:48 2003
Date: Tue, 03 Jun 2003 22:52:45 -0700 (PDT)
Message-Id: <20030603.225245.55753285.davem@redhat.com>
To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com
Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au,
    gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
    linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
From: "David S. Miller"
In-Reply-To: <16093.34022.445246.52398@napali.hpl.hp.com>
References: <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com>

   From: David Mosberger
   Date: Tue, 3 Jun 2003 22:34:30 -0700

   You can't go higher than 1000 conn/sec per client (IP address)
   because otherwise you run out of port space (due to TIME_WAIT).

echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle

It should eliminate this limit. Unfortunately we can't enable this
by default because of NAT :(

From yoshfuji@linux-ipv6.org Tue Jun 3 23:01:28 2003
Date: Wed, 04 Jun 2003 15:02:18 +0900 (JST)
Message-Id: <20030604.150218.69810413.yoshfuji@linux-ipv6.org>
To: davem@redhat.com
CC: netdev@oss.sgi.com, Ville Nuorvala
Subject: [PATCH] IPV6: typo, unrequired #undef and killing warning
From: YOSHIFUJI Hideaki / 吉藤英明
Organization: USAGI Project

Hello.

 - no need to #undef CONFIG_IPV6_SUBTREES
 - use parentheses around "&" and "|".
 - fib_repair_tree() is typo.

Thanks.
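
The second item in the list above comes down to C operator precedence: `&` binds more tightly than `|`. A minimal sketch follows; the `DEMO_` flag values and function names are hypothetical and chosen only for illustration, they are not taken from the kernel headers. The first function is written with explicit parentheses that mirror how a compiler groups the unparenthesized `flags & A | B`, so the example builds without the warning the patch silences.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_RTN_ROOT   0x0002u
#define DEMO_RTN_RTINFO 0x0004u

/*
 * How "!(flags & DEMO_RTN_RTINFO | DEMO_RTN_ROOT)" is grouped: the AND
 * is evaluated first, then the ROOT bit is ORed in unconditionally.
 * The operand of "!" is therefore never zero and the test never fires.
 */
static bool demo_is_orphan_as_parsed(unsigned int flags)
{
	return !((flags & DEMO_RTN_RTINFO) | DEMO_RTN_ROOT);
}

/* Intended meaning: true only when neither flag bit is set. */
static bool demo_is_orphan_intended(unsigned int flags)
{
	return !(flags & (DEMO_RTN_RTINFO | DEMO_RTN_ROOT));
}
```

With no flag bits set, the first function returns false and the second returns true, which is the case where a cleanup call guarded by the first form would be silently skipped.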
Index: linux25-LINUS/net/ipv6/ip6_fib.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_fib.c,v
retrieving revision 1.1.1.12
diff -u -r1.1.1.12 ip6_fib.c
--- linux25-LINUS/net/ipv6/ip6_fib.c	26 May 2003 08:04:11 -0000	1.1.1.12
+++ linux25-LINUS/net/ipv6/ip6_fib.c	4 Jun 2003 05:39:49 -0000
@@ -40,7 +40,6 @@
 #include
 
 #define RT6_DEBUG 2
-#undef CONFIG_IPV6_SUBTREES
 
 #if RT6_DEBUG >= 3
 #define RT6_TRACE(x...) printk(KERN_DEBUG x)
@@ -594,8 +593,8 @@
 	   is orphan. If it is, shoot it.
 	 */
 st_failure:
-	if (fn && !(fn->fn_flags&RTN_RTINFO|RTN_ROOT))
-		fib_repair_tree(fn);
+	if (fn && !(fn->fn_flags&(RTN_RTINFO|RTN_ROOT)))
+		fib6_repair_tree(fn);
 	dst_free(&rt->u.dst);
 	return err;
 #endif

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From hch@infradead.org Tue Jun 3 23:08:08 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:08:12 -0700 (PDT)
Received: from phoenix.infradead.org (phoenix.mvhi.com [195.224.96.167])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546872x017348
	for ; Tue, 3 Jun 2003 23:08:08 -0700
Received: from hch by phoenix.infradead.org with local (Exim 4.10)
	id 19NRRK-000248-00; Wed, 04 Jun 2003 07:08:02 +0100
Date: Wed, 4 Jun 2003 07:08:01 +0100
From: Christoph Hellwig
To: "YOSHIFUJI Hideaki / 吉藤英明"
Cc: davem@redhat.com, netdev@oss.sgi.com, Ville Nuorvala
Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning
Message-ID: <20030604070801.A7938@infradead.org>
References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20030604.150218.69810413.yoshfuji@linux-ipv6.org>; from yoshfuji@linux-ipv6.org on Wed, Jun 04, 2003 at 03:02:18PM +0900
X-archive-position: 2867
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hch@infradead.org
Precedence: bulk
X-list: netdev
Content-Length: 340
Lines: 10

On Wed, Jun 04, 2003 at 03:02:18PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>  st_failure:
> -	if (fn && !(fn->fn_flags&RTN_RTINFO|RTN_ROOT))
> -		fib_repair_tree(fn);
> +	if (fn && !(fn->fn_flags&(RTN_RTINFO|RTN_ROOT)))

This still is not the right codingstyle :) it should be

	if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT)))

From davem@redhat.com Tue Jun 3 23:10:14 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:10:16 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546AD2x017675
	for ; Tue, 3 Jun 2003 23:10:14 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA29792;
	Tue, 3 Jun 2003 23:08:04 -0700
Date: Tue, 03 Jun 2003 23:08:03 -0700 (PDT)
Message-Id: <20030603.230803.10324588.davem@redhat.com>
To: hch@infradead.org
Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, vnuorval@tcs.hut.fi
Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning
From: "David S. Miller"
In-Reply-To: <20030604070801.A7938@infradead.org>
References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org>
	<20030604070801.A7938@infradead.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2868 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 282 Lines: 9 From: Christoph Hellwig Date: Wed, 4 Jun 2003 07:08:01 +0100 This still is not the right codingstyle :) it should be if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) I'll take care of this, Yoshfuji you do not need to make a new patch :) From davidm@napali.hpl.hp.com Tue Jun 3 23:12:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:12:53 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546Cm2x017993 for ; Tue, 3 Jun 2003 23:12:49 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) by palrel10.hp.com (Postfix) with ESMTP id 7429C1C01411; Tue, 3 Jun 2003 23:12:48 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h546ClsV017659; Tue, 3 Jun 2003 23:12:47 -0700 (PDT) Received: from napali.hpl.hp.com (localhost [127.0.0.1]) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h546ClrK030657; Tue, 3 Jun 2003 23:12:47 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h546Cljk030653; Tue, 3 Jun 2003 23:12:47 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16093.36319.412668.87363@napali.hpl.hp.com> Date: Tue, 3 Jun 2003 23:12:47 -0700 To: "David S. 
Miller" Cc: davidm@hpl.hp.com, davidm@napali.hpl.hp.com, niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code In-Reply-To: <20030603.225245.55753285.davem@redhat.com> References: <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> <20030603.225245.55753285.davem@redhat.com> X-Mailer: VM 7.07 under Emacs 21.2.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2869 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev Content-Length: 632 Lines: 18 >>>>> On Tue, 03 Jun 2003 22:52:45 -0700 (PDT), "David S. Miller" said: David> From: David Mosberger Date: David> Tue, 3 Jun 2003 22:34:30 -0700 David> You can't go higher than 1000 conn/sec per client (IP David> address) because otherwise you run out of port space (due to David> TIME_WAIT). DaveM> echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle DaveM> It should eliminate this limit. Unfortunately we can't DaveM> enable this by default because of NAT :( Ah, yes, provided PAWS is enabled, this would give you a time_wait timeout of 3.5*RTO. Nice. 
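The 1000 conn/sec ceiling quoted above is consistent with simple arithmetic: each closed connection holds its local port for the TIME_WAIT interval, so the sustainable rate from one client address to one server address and port is roughly the number of usable ports divided by that interval. A rough sketch follows, assuming a 60 second TIME_WAIT (Linux's TCP_TIMEWAIT_LEN), a usable client port range of 1024-65535, and, for the recycled case, a hypothetical RTO of 200 ms. The function name is made up for illustration.

```c
#include <assert.h>

/* Upper bound on new connections per second from one client address to one
 * server address and port, when each closed connection keeps its local port
 * busy for tw_ms milliseconds. */
unsigned long max_conn_rate(unsigned long first_port, unsigned long last_port,
			    unsigned long tw_ms)
{
	assert(last_port >= first_port);
	assert(tw_ms > 0);
	return (last_port - first_port + 1) * 1000UL / tw_ms;
}
```

With these assumptions, max_conn_rate(1024, 65535, 60000) gives 1075, close to the 1000 conn/sec figure. With a 3.5*RTO timeout and the assumed 200 ms RTO, max_conn_rate(1024, 65535, 700) gives 92160, so port space would no longer be the limiting factor.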
--david From niv@us.ibm.com Tue Jun 3 23:14:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:14:57 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546Eq2x018362 for ; Tue, 3 Jun 2003 23:14:53 -0700 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.195.10]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5469ruT256040; Wed, 4 Jun 2003 02:09:53 -0400 Received: from us.ibm.com (d03av03.boulder.ibm.com [9.17.193.83]) by westrelay01.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5469pNQ224046; Wed, 4 Jun 2003 00:09:51 -0600 Message-ID: <3EDD8BD2.9040008@us.ibm.com> Date: Tue, 03 Jun 2003 23:04:02 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: davidm@hpl.hp.com CC: "David S. Miller" , kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au, gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com Subject: Re: fix TCP roundtrip time update code References: <200306040043.EAA24505@dub.inr.ac.ru> <3EDD52F5.8090706@us.ibm.com> <20030603.202320.59680883.davem@redhat.com> <16093.30507.661714.676184@napali.hpl.hp.com> <3EDD7832.7010804@us.ibm.com> <16093.34022.445246.52398@napali.hpl.hp.com> In-Reply-To: <16093.34022.445246.52398@napali.hpl.hp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2870 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1180 Lines: 34 David Mosberger wrote: > $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \ > --uri pathto30KBfile Hmm, ditto, except I was way down at --rate 300 (was seeing client errors of fd-unavail). 
Have ulimited upwards but am still seeing them..

> on 3 clients (for a total of 3000 conns/sec). You can't go higher
> than 1000 conn/sec per client (IP address) because otherwise you run
> out of port space (due to TIME_WAIT).

You can hike /proc/sys/net/ipv4/tcp_tw_recycle for that.

> This load worked well for a machine with a single GigE card. All
> network tunables were on the default setting (in particular, the tx
> queue len was 300, which is where the losses came from).
>
> With this load, I saw bad RTT values in the route cache within a
> couple of seconds after starting the third httperf generator. It then
> took a bit longer (on the order of 1-2 minutes) until the first
> TCPAbortFailed errors started to pop up

I saw a few AbortOnTimeouts, but no AbortFailed counts.

Those should be TCPAbortOnTimeout counts, rather than
TCPAbortFailed errors, I would expect? Why AbortFailed?
Coming from IP via tcp_transmit_skb()?

thanks,
Nivedita

From davidm@napali.hpl.hp.com Tue Jun 3 23:19:53 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 03 Jun 2003 23:19:56 -0700 (PDT)
Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h546JW2x018731
	for ; Tue, 3 Jun 2003 23:19:52 -0700
Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33])
	by palrel10.hp.com (Postfix) with ESMTP id 4011F1C01522;
	Tue, 3 Jun 2003 23:19:32 -0700 (PDT)
Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123])
	by hplms2.hpl.hp.com (8.12.9/8.12.9/HPL-PA Hub) with ESMTP id h546JVsV018240;
	Tue, 3 Jun 2003 23:19:31 -0700 (PDT)
Received: from napali.hpl.hp.com (localhost [127.0.0.1])
	by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) with ESMTP id h546JVrK030722;
	Tue, 3 Jun 2003 23:19:31 -0700
Received: (from davidm@localhost)
	by napali.hpl.hp.com (8.12.3/8.12.3/Debian-5) id h546JVpQ030718;
	Tue, 3 Jun 2003 23:19:31 -0700
From: David Mosberger
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16093.36723.418623.698303@napali.hpl.hp.com>
Date: Tue, 3 Jun 2003 23:19:31 -0700
To: Nivedita Singhvi
Cc: davidm@hpl.hp.com, "David S. Miller" , kuznet@ms2.inr.ac.ru,
	jmorris@intercode.com.au, gandalf@wlug.westbo.se,
	linux-kernel@vger.kernel.org, linux-ia64@linuxia64.org,
	netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
In-Reply-To: <3EDD8BD2.9040008@us.ibm.com>
References: <200306040043.EAA24505@dub.inr.ac.ru>
	<3EDD52F5.8090706@us.ibm.com>
	<20030603.202320.59680883.davem@redhat.com>
	<16093.30507.661714.676184@napali.hpl.hp.com>
	<3EDD7832.7010804@us.ibm.com>
	<16093.34022.445246.52398@napali.hpl.hp.com>
	<3EDD8BD2.9040008@us.ibm.com>
X-Mailer: VM 7.07 under Emacs 21.2.1
Reply-To: davidm@hpl.hp.com
X-URL: http://www.hpl.hp.com/personal/David_Mosberger/
X-archive-position: 2871
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davidm@napali.hpl.hp.com
Precedence: bulk
X-list: netdev
Content-Length: 505
Lines: 12

>>>>> On Tue, 03 Jun 2003 23:04:02 -0700, Nivedita Singhvi said:

  Nivedita> Those should be TCPAbortOnTimeout counts, rather than
  Nivedita> TCPAbortFailed errors, I would expect? Why AbortFailed?
  Nivedita> Coming from IP via tcp_transmit_skb()?

Yes, the "connection hangs/disappearances" were triggered by
TCPAbortOnTimeout; the TCPAbortFailed errors were indicating that
tcp_transmit_skb() had failed, i.e., the tx queue was overrun (that's
where the losses came from).

--david From davem@redhat.com Wed Jun 4 00:50:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:50:16 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547nk2x021923 for ; Wed, 4 Jun 2003 00:50:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA30035; Wed, 4 Jun 2003 00:47:39 -0700 Date: Wed, 04 Jun 2003 00:47:38 -0700 (PDT) Message-Id: <20030604.004738.26506541.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: netdev@oss.sgi.com, vnuorval@tcs.hut.fi Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning From: "David S. Miller" In-Reply-To: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2872 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 299 Lines: 11 From: YOSHIFUJI Hideaki / 吉藤英明 Date: Wed, 04 Jun 2003 15:02:18 +0900 (JST) - use braces around "&" and "|". 
You mean "parentheses"; braces define basic block scope in the
C language, parentheses group expressions :-)

Patch applied, thank you :-)

From davem@redhat.com Wed Jun 4 00:56:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:56:33 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547tt2x022378
	for ; Wed, 4 Jun 2003 00:56:15 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA30052;
	Wed, 4 Jun 2003 00:51:46 -0700
Date: Wed, 04 Jun 2003 00:51:45 -0700 (PDT)
Message-Id: <20030604.005145.98890243.davem@redhat.com>
To: davidm@hpl.hp.com, davidm@napali.hpl.hp.com
Cc: niv@us.ibm.com, kuznet@ms2.inr.ac.ru, jmorris@intercode.com.au,
	gandalf@wlug.westbo.se, linux-kernel@vger.kernel.org,
	linux-ia64@linuxia64.org, netdev@oss.sgi.com, akpm@digeo.com
Subject: Re: fix TCP roundtrip time update code
From: "David S. Miller"
In-Reply-To: <16093.36723.418623.698303@napali.hpl.hp.com>
References: <16093.34022.445246.52398@napali.hpl.hp.com>
	<3EDD8BD2.9040008@us.ibm.com>
	<16093.36723.418623.698303@napali.hpl.hp.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2873
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 551
Lines: 14

   From: David Mosberger
   Date: Tue, 3 Jun 2003 23:19:31 -0700

   Yes, the "connection hangs/disappearances" were triggered by
   TCPAbortOnTimeout;

This is correct. And it is the reason the connection dies silently:
such write timeouts invoke tcp_done(), which closes the connection
off silently.

This is correct behavior (sans the RTT bug David fixed of course :)) because a host which hasn't responded at all from so many repeated retransmission attempts isn't likely to get any reset we send either :) From jgarzik@pobox.com Wed Jun 4 00:57:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 00:57:29 -0700 (PDT) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h547v32x022552 for ; Wed, 4 Jun 2003 00:57:24 -0700 Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19NOwK-0007q1-1G; Wed, 04 Jun 2003 04:27:52 +0100 Message-ID: <3EDD672C.2000701@pobox.com> Date: Tue, 03 Jun 2003 23:27:40 -0400 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Regarding SET_NETDEV_DEV References: <20030603175921.GE2079@gtf.org> <20030603.200944.78736971.davem@redhat.com> In-Reply-To: <20030603.200944.78736971.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2874 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 629 Lines: 22 David S. Miller wrote: > From: Jeff Garzik > Date: Tue, 3 Jun 2003 13:59:21 -0400 > > For janitors and other developers placing this in net drivers... > please don't :) This can be done in upper layers, accomplishing the > same goal without changing the low-level net driver code at all. 
> > Don't say something can be done without showing exactly > how :-) > > How does register_netdevice() know that the device is "whatever" and > where to get the generic device struct from? Doh! You are totally right -- it can't get the association any other way. Folks, ignore me :) Jeff From yoshfuji@linux-ipv6.org Wed Jun 4 02:19:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 02:19:16 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h549J72x025255 for ; Wed, 4 Jun 2003 02:19:08 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h549JxBo005477; Wed, 4 Jun 2003 18:19:59 +0900 Date: Wed, 04 Jun 2003 18:19:59 +0900 (JST) Message-Id: <20030604.181959.41095926.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: netdev@oss.sgi.com Subject: Re: [PATCH] IPV6: typo, unrequired #undef and killing warning From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030604.004738.26506541.davem@redhat.com> References: <20030604.150218.69810413.yoshfuji@linux-ipv6.org> <20030604.004738.26506541.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2875 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 403 Lines: 10 In article 
<20030604.004738.26506541.davem@redhat.com> (at Wed, 04 Jun 2003 00:47:38 -0700 (PDT)), "David S. Miller" says: > You mean "parentheses", braces define basic block scope in the > C language, parentheses group expressions :-) I'm deeply ashamed... -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From vnuorval@tcs.hut.fi Wed Jun 4 05:44:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 05:45:00 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54CiO2x001635 for ; Wed, 4 Jun 2003 05:44:45 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Wed, 4 Jun 2003 15:40:07 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h54Ce6jH026238; Wed, 4 Jun 2003 15:40:06 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h54Ce2i4026232; Wed, 4 Jun 2003 15:40:02 +0300 Date: Wed, 4 Jun 2003 15:40:02 +0300 (EEST) From: Ville Nuorvala To: "David S. Miller" cc: kuznet@ms2.inr.ac.ru, , , , , , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: <20030603.213458.112594590.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-377318441-1269789112-1054730402=:26066" X-archive-position: 2876 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev Content-Length: 55162 Lines: 922 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. 
---377318441-1269789112-1054730402=:26066 Content-Type: TEXT/PLAIN; charset=US-ASCII On Tue, 3 Jun 2003, David S. Miller wrote: > You need to fix some things before I will apply this: > > 1) Bogus #ifdef CONFIG_IPV6_TUNNEL_MODULE. You need not this test > around things like MODULE_AUTHOR() and stuff like that, > linux/module.h does that for you. > Fixed. > 2) Dependency upon subtrees patch, please remove it. There is no > agreement on that semantic change to how subtrees work. > Done. I'll send a separate patch for the subtrees stuff if needed. The revised version is attached to this mail, but also available at: http://www.mipl.mediapoli.com/patches/ip6-tunnel-r2.patch -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 ---377318441-1269789112-1054730402=:26066 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="ip6-tunnel-r2.patch" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="ip6-tunnel-r2.patch" ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvaW5jbHVkZS9saW51eC9p Zl9hcnAuaCBtZXJnZS0yLjUvaW5jbHVkZS9saW51eC9pZl9hcnAuaA0KLS0t IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lmX2FycC5oCVdlZCBKdW4gIDQg MTM6NDM6MDMgMjAwMw0KKysrIG1lcmdlLTIuNS9pbmNsdWRlL2xpbnV4L2lm X2FycC5oCVdlZCBNYXkgMjggMjE6MTE6NDMgMjAwMw0KQEAgLTYwLDcgKzYw LDcgQEANCiAjZGVmaW5lIEFSUEhSRF9SQVdIRExDCTUxOAkJLyogUmF3IEhE TEMJCQkqLw0KIA0KICNkZWZpbmUgQVJQSFJEX1RVTk5FTAk3NjgJCS8qIElQ SVAgdHVubmVsCQkJKi8NCi0jZGVmaW5lIEFSUEhSRF9UVU5ORUw2CTc2OQkJ LyogSVBJUDYgdHVubmVsCQkJKi8NCisjZGVmaW5lIEFSUEhSRF9UVU5ORUw2 CTc2OQkJLyogSVA2SVA2IHR1bm5lbCAgICAgICAJCSovDQogI2RlZmluZSBB UlBIUkRfRlJBRAk3NzAgICAgICAgICAgICAgLyogRnJhbWUgUmVsYXkgQWNj ZXNzIERldmljZSAgICAqLw0KICNkZWZpbmUgQVJQSFJEX1NLSVAJNzcxCQkv KiBTS0lQIHZpZgkJCSovDQogI2RlZmluZSBBUlBIUkRfTE9PUEJBQ0sJNzcy 
CQkvKiBMb29wYmFjayBkZXZpY2UJCSovDQpkaWZmIC1OdXIgLS1leGNsdWRl PVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0 IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lwNl90dW5uZWwuaCBtZXJnZS0y LjUvaW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgNCi0tLSBsaW51eC0yLjUv aW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDow MCAxOTcwDQorKysgbWVyZ2UtMi41L2luY2x1ZGUvbGludXgvaXA2X3R1bm5l bC5oCVdlZCBNYXkgMjggMjE6MTE6NDMgMjAwMw0KQEAgLTAsMCArMSwzMiBA QA0KKy8qDQorICogJElkJA0KKyAqLw0KKw0KKyNpZm5kZWYgX0lQNl9UVU5O RUxfSA0KKyNkZWZpbmUgX0lQNl9UVU5ORUxfSA0KKw0KKyNkZWZpbmUgSVBW Nl9UTFZfVE5MX0VOQ0FQX0xJTUlUIDQNCisjZGVmaW5lIElQVjZfREVGQVVM VF9UTkxfRU5DQVBfTElNSVQgNA0KKw0KKy8qIGRvbid0IGFkZCBlbmNhcHN1 bGF0aW9uIGxpbWl0IGlmIG9uZSBpc24ndCBwcmVzZW50IGluIGlubmVyIHBh Y2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX0lHTl9FTkNBUF9MSU1JVCAw eDENCisvKiBjb3B5IHRoZSB0cmFmZmljIGNsYXNzIGZpZWxkIGZyb20gdGhl IGlubmVyIHBhY2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX1VTRV9PUklH X1RDTEFTUyAweDINCisvKiBjb3B5IHRoZSBmbG93bGFiZWwgZnJvbSB0aGUg aW5uZXIgcGFja2V0ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfVVNFX09SSUdf RkxPV0xBQkVMIDB4NA0KKy8qIGJlaW5nIHVzZWQgZm9yIE1vYmlsZSBJUHY2 ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfTUlQNl9ERVYgMHg4DQorDQorc3Ry dWN0IGlwNl90bmxfcGFybSB7DQorCWNoYXIgbmFtZVtJRk5BTVNJWl07CS8q IG5hbWUgb2YgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgbGluazsJCS8qIGlm aW5kZXggb2YgdW5kZXJseWluZyBMMiBpbnRlcmZhY2UgKi8NCisJX191OCBw cm90bzsJCS8qIHR1bm5lbCBwcm90b2NvbCAqLw0KKwlfX3U4IGVuY2FwX2xp bWl0OwkvKiBlbmNhcHN1bGF0aW9uIGxpbWl0IGZvciB0dW5uZWwgKi8NCisJ X191OCBob3BfbGltaXQ7CQkvKiBob3AgbGltaXQgZm9yIHR1bm5lbCAqLw0K KwlfX3UzMiBmbG93aW5mbzsJCS8qIHRyYWZmaWMgY2xhc3MgYW5kIGZsb3ds YWJlbCBmb3IgdHVubmVsICovDQorCV9fdTMyIGZsYWdzOwkJLyogdHVubmVs IGZsYWdzICovDQorCXN0cnVjdCBpbjZfYWRkciBsYWRkcjsJLyogbG9jYWwg dHVubmVsIGVuZC1wb2ludCBhZGRyZXNzICovDQorCXN0cnVjdCBpbjZfYWRk ciByYWRkcjsJLyogcmVtb3RlIHR1bm5lbCBlbmQtcG9pbnQgYWRkcmVzcyAq Lw0KK307DQorDQorI2VuZGlmDQpkaWZmIC1OdXIgLS1leGNsdWRlPVNDQ1Mg LS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4 
LTIuNS9pbmNsdWRlL25ldC9pcDZfdHVubmVsLmggbWVyZ2UtMi41L2luY2x1 ZGUvbmV0L2lwNl90dW5uZWwuaA0KLS0tIGxpbnV4LTIuNS9pbmNsdWRlL25l dC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDowMCAxOTcwDQorKysg bWVyZ2UtMi41L2luY2x1ZGUvbmV0L2lwNl90dW5uZWwuaAlXZWQgTWF5IDI4 IDIxOjExOjQzIDIwMDMNCkBAIC0wLDAgKzEsNDQgQEANCisvKg0KKyAqICRJ ZCQNCisgKi8NCisNCisjaWZuZGVmIF9ORVRfSVA2X1RVTk5FTF9IDQorI2Rl ZmluZSBfTkVUX0lQNl9UVU5ORUxfSA0KKw0KKyNpbmNsdWRlIDxsaW51eC9p cHY2Lmg+DQorI2luY2x1ZGUgPGxpbnV4L25ldGRldmljZS5oPg0KKyNpbmNs dWRlIDxsaW51eC9pcDZfdHVubmVsLmg+DQorDQorLyogY2FwYWJsZSBvZiBz ZW5kaW5nIHBhY2tldHMgKi8NCisjZGVmaW5lIElQNl9UTkxfRl9DQVBfWE1J VCAweDEwMDAwDQorLyogY2FwYWJsZSBvZiByZWNlaXZpbmcgcGFja2V0cyAq Lw0KKyNkZWZpbmUgSVA2X1ROTF9GX0NBUF9SQ1YgMHgyMDAwMA0KKw0KKyNk ZWZpbmUgSVA2X1ROTF9NQVggMTI4DQorDQorLyogSVB2NiB0dW5uZWwgKi8N CisNCitzdHJ1Y3QgaXA2X3RubCB7DQorCXN0cnVjdCBpcDZfdG5sICpuZXh0 OwkvKiBuZXh0IHR1bm5lbCBpbiBsaXN0ICovDQorCXN0cnVjdCBuZXRfZGV2 aWNlICpkZXY7CS8qIHZpcnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0 dW5uZWwgKi8NCisJc3RydWN0IG5ldF9kZXZpY2Vfc3RhdHMgc3RhdDsJLyog c3RhdGlzdGljcyBmb3IgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgcmVjdXJz aW9uOwkJLyogZGVwdGggb2YgaGFyZF9zdGFydF94bWl0IHJlY3Vyc2lvbiAq Lw0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJtIHBhcm1zOwkvKiB0dW5uZWwgY29u ZmlndXJhdGlvbiBwYXJhbXRlcnMgKi8NCisJc3RydWN0IGZsb3dpIGZsOwkv KiBmbG93aSB0ZW1wbGF0ZSBmb3IgeG1pdCAqLw0KK307DQorDQorLyogVHVu bmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgZGVzdGluYXRpb24gc3ViLW9wdGlv biAqLw0KKw0KK3N0cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSB7DQorCV9f dTggdHlwZTsJCS8qIHR5cGUtY29kZSBmb3Igb3B0aW9uICAgICAgICAgKi8N CisJX191OCBsZW5ndGg7CQkvKiBvcHRpb24gbGVuZ3RoICAgICAgICAgICAg ICAgICovDQorCV9fdTggZW5jYXBfbGltaXQ7CS8qIHR1bm5lbCBlbmNhcHN1 bGF0aW9uIGxpbWl0ICAgKi8NCit9IF9fYXR0cmlidXRlX18gKChwYWNrZWQp KTsNCisNCisjaWZkZWYgX19LRVJORUxfXw0KKyNpZmRlZiBDT05GSUdfSVBW Nl9UVU5ORUwNCitleHRlcm4gaW50IF9faW5pdCBpcDZfdHVubmVsX2luaXQo dm9pZCk7DQorZXh0ZXJuIHZvaWQgaXA2X3R1bm5lbF9jbGVhbnVwKHZvaWQp Ow0KKyNlbmRpZg0KKyNlbmRpZg0KKyNlbmRpZg0KZGlmZiAtTnVyIC0tZXhj 
bHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5n ZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvS2NvbmZpZyBtZXJnZS0yLjUvbmV0 L2lwdjYvS2NvbmZpZw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9LY29uZmln CVdlZCBKdW4gIDQgMTM6NDM6NDQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQv aXB2Ni9LY29uZmlnCVdlZCBKdW4gIDQgMTI6MjA6MzMgMjAwMw0KQEAgLTQy LDQgKzQyLDEyIEBADQogDQogCSAgSWYgdW5zdXJlLCBzYXkgWS4NCiANCitj b25maWcgSVBWNl9UVU5ORUwNCisJdHJpc3RhdGUgIklQdjY6IElQdjYtaW4t SVB2NiB0dW5uZWwiDQorCWRlcGVuZHMgb24gSVBWNg0KKwktLS1oZWxwLS0t DQorCSAgU3VwcG9ydCBmb3IgSVB2Ni1pbi1JUHY2IHR1bm5lbHMgZGVzY3Jp YmVkIGluIFJGQyAyNDczLg0KKw0KKwkgIElmIHVuc3VyZSwgc2F5IE4uDQor DQogc291cmNlICJuZXQvaXB2Ni9uZXRmaWx0ZXIvS2NvbmZpZyINCmRpZmYg LU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhj bHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L01ha2VmaWxlIG1l cmdlLTIuNS9uZXQvaXB2Ni9NYWtlZmlsZQ0KLS0tIGxpbnV4LTIuNS9uZXQv aXB2Ni9NYWtlZmlsZQlXZWQgSnVuICA0IDEzOjQzOjA2IDIwMDMNCisrKyBt ZXJnZS0yLjUvbmV0L2lwdjYvTWFrZWZpbGUJV2VkIE1heSAyOCAyMToxMTo1 OSAyMDAzDQpAQCAtMTUsMyArMTUsNSBAQA0KIG9iai0kKENPTkZJR19JTkVU Nl9FU1ApICs9IGVzcDYubw0KIG9iai0kKENPTkZJR19JTkVUNl9JUENPTVAp ICs9IGlwY29tcDYubw0KIG9iai0kKENPTkZJR19ORVRGSUxURVIpCSs9IG5l dGZpbHRlci8NCisNCitvYmotJChDT05GSUdfSVBWNl9UVU5ORUwpICs9IGlw Nl90dW5uZWwubw0KZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVk ZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0 L2lwdjYvYWZfaW5ldDYuYyBtZXJnZS0yLjUvbmV0L2lwdjYvYWZfaW5ldDYu Yw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9hZl9pbmV0Ni5jCVdlZCBKdW4g IDQgMTM6NDM6MDcgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9hZl9p bmV0Ni5jCVdlZCBNYXkgMjggMjE6MTM6MDggMjAwMw0KQEAgLTU3LDYgKzU3 LDkgQEANCiAjaW5jbHVkZSA8bmV0L3RyYW5zcF92Ni5oPg0KICNpbmNsdWRl IDxuZXQvaXA2X3JvdXRlLmg+DQogI2luY2x1ZGUgPG5ldC9hZGRyY29uZi5o Pg0KKyNpZiBDT05GSUdfSVBWNl9UVU5ORUwNCisjaW5jbHVkZSA8bmV0L2lw Nl90dW5uZWwuaD4NCisjZW5kaWYNCiANCiAjaW5jbHVkZSA8YXNtL3VhY2Nl c3MuaD4NCiAjaW5jbHVkZSA8YXNtL3N5c3RlbS5oPg0KQEAgLTc4MCw2ICs3 ODMsMTEgQEANCiAJZXJyID0gbmRpc2NfaW5pdCgmaW5ldDZfZmFtaWx5X29w 
cyk7DQogCWlmIChlcnIpDQogCQlnb3RvIG5kaXNjX2ZhaWw7DQorI2lmZGVm IENPTkZJR19JUFY2X1RVTk5FTA0KKwllcnIgPSBpcDZfdHVubmVsX2luaXQo KTsNCisJaWYgKGVycikNCisJCWdvdG8gaXA2X3R1bm5lbF9mYWlsOw0KKyNl bmRpZg0KIAllcnIgPSBpZ21wNl9pbml0KCZpbmV0Nl9mYW1pbHlfb3BzKTsN CiAJaWYgKGVycikNCiAJCWdvdG8gaWdtcF9mYWlsOw0KQEAgLTgzNCw2ICs4 NDIsMTAgQEANCiAJaWdtcDZfY2xlYW51cCgpOw0KICNlbmRpZg0KIGlnbXBf ZmFpbDoNCisjaWZkZWYgQ09ORklHX0lQVjZfVFVOTkVMDQorCWlwNl90dW5u ZWxfY2xlYW51cCgpOw0KK2lwNl90dW5uZWxfZmFpbDoNCisjZW5kaWYNCiAJ bmRpc2NfY2xlYW51cCgpOw0KIG5kaXNjX2ZhaWw6DQogCWljbXB2Nl9jbGVh bnVwKCk7DQpAQCAtODY5LDYgKzg4MSw5IEBADQogCWlwNl9yb3V0ZV9jbGVh bnVwKCk7DQogCWlwdjZfcGFja2V0X2NsZWFudXAoKTsNCiAJaWdtcDZfY2xl YW51cCgpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUwNCisJaXA2X3R1 bm5lbF9jbGVhbnVwKCk7DQorI2VuZGlmDQogCW5kaXNjX2NsZWFudXAoKTsN CiAJaWNtcHY2X2NsZWFudXAoKTsNCiAjaWZkZWYgQ09ORklHX1NZU0NUTA0K ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1 bm5lbC5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVsLmMNCi0tLSBs aW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1bm5lbC5jCVRodSBKYW4gIDEgMDI6 MDA6MDAgMTk3MA0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVs LmMJV2VkIEp1biAgNCAxMzo0NDo1MCAyMDAzDQpAQCAtMCwwICsxLDEyNjEg QEANCisvKg0KKyAqCUlQdjYgb3ZlciBJUHY2IHR1bm5lbCBkZXZpY2UNCisg KglMaW51eCBJTkVUNiBpbXBsZW1lbnRhdGlvbg0KKyAqDQorICoJQXV0aG9y czoNCisgKglWaWxsZSBOdW9ydmFsYQkJPHZudW9ydmFsQHRjcy5odXQuZmk+ CQ0KKyAqDQorICoJJElkJA0KKyAqDQorICogICAgICBCYXNlZCBvbjoNCisg KiAgICAgIGxpbnV4L25ldC9pcHY2L3NpdC5jDQorICoNCisgKiAgICAgIFJG QyAyNDczDQorICoNCisgKglUaGlzIHByb2dyYW0gaXMgZnJlZSBzb2Z0d2Fy ZTsgeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yDQorICogICAgICBt b2RpZnkgaXQgdW5kZXIgdGhlIHRlcm1zIG9mIHRoZSBHTlUgR2VuZXJhbCBQ dWJsaWMgTGljZW5zZQ0KKyAqICAgICAgYXMgcHVibGlzaGVkIGJ5IHRoZSBG cmVlIFNvZnR3YXJlIEZvdW5kYXRpb247IGVpdGhlciB2ZXJzaW9uDQorICog ICAgICAyIG9mIHRoZSBMaWNlbnNlLCBvciAoYXQgeW91ciBvcHRpb24pIGFu eSBsYXRlciB2ZXJzaW9uLg0KKyAqDQorICovDQorDQorI2luY2x1ZGUgPGxp 
bnV4L2NvbmZpZy5oPg0KKyNpbmNsdWRlIDxsaW51eC9tb2R1bGUuaD4NCisj aW5jbHVkZSA8bGludXgvZXJybm8uaD4NCisjaW5jbHVkZSA8bGludXgvdHlw ZXMuaD4NCisjaW5jbHVkZSA8bGludXgvc29ja2V0Lmg+DQorI2luY2x1ZGUg PGxpbnV4L3NvY2tpb3MuaD4NCisjaW5jbHVkZSA8bGludXgvaWYuaD4NCisj aW5jbHVkZSA8bGludXgvaW4uaD4NCisjaW5jbHVkZSA8bGludXgvaXAuaD4N CisjaW5jbHVkZSA8bGludXgvaWZfdHVubmVsLmg+DQorI2luY2x1ZGUgPGxp bnV4L25ldC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pbjYuaD4NCisjaW5jbHVk ZSA8bGludXgvbmV0ZGV2aWNlLmg+DQorI2luY2x1ZGUgPGxpbnV4L2lmX2Fy cC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pY21wdjYuaD4NCisjaW5jbHVkZSA8 bGludXgvaW5pdC5oPg0KKyNpbmNsdWRlIDxsaW51eC9yb3V0ZS5oPg0KKyNp bmNsdWRlIDxsaW51eC9ydG5ldGxpbmsuaD4NCisNCisjaW5jbHVkZSA8YXNt L3VhY2Nlc3MuaD4NCisjaW5jbHVkZSA8YXNtL2F0b21pYy5oPg0KKw0KKyNp bmNsdWRlIDxuZXQvaXAuaD4NCisjaW5jbHVkZSA8bmV0L3NvY2suaD4NCisj aW5jbHVkZSA8bmV0L2lwdjYuaD4NCisjaW5jbHVkZSA8bmV0L3Byb3RvY29s Lmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfcm91dGUuaD4NCisjaW5jbHVkZSA8 bmV0L2FkZHJjb25mLmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfdHVubmVsLmg+ DQorDQorTU9EVUxFX0FVVEhPUigiVmlsbGUgTnVvcnZhbGEiKTsNCitNT0RV TEVfREVTQ1JJUFRJT04oIklQdjYtaW4tSVB2NiB0dW5uZWwiKTsNCitNT0RV TEVfTElDRU5TRSgiR1BMIik7DQorDQorI2RlZmluZSBJUFY2X1RMVl9URUxf RFNUX1NJWkUgOA0KKw0KKyNpZmRlZiBJUDZfVE5MX0RFQlVHDQorI2RlZmlu ZSBJUDZfVE5MX1RSQUNFKHguLi4pIHByaW50ayhLRVJOX0RFQlVHICIlczoi IHggIlxuIiwgX19GVU5DVElPTl9fKQ0KKyNlbHNlDQorI2RlZmluZSBJUDZf VE5MX1RSQUNFKHguLi4pIGRvIHs7fSB3aGlsZSgwKQ0KKyNlbmRpZg0KKw0K KyNkZWZpbmUgSVBWNl9UQ0xBU1NfTUFTSyAoSVBWNl9GTE9XSU5GT19NQVNL ICYgfklQVjZfRkxPV0xBQkVMX01BU0spDQorDQorLyogc29ja2V0KHMpIHVz ZWQgYnkgaXA2aXA2X3RubF94bWl0KCkgZm9yIHJlc2VuZGluZyBwYWNrZXRz ICovDQorc3RhdGljIHN0cnVjdCBzb2NrZXQgKl9faXA2X3NvY2tldFtOUl9D UFVTXTsNCisjZGVmaW5lIGlwNl9zb2NrZXQgX19pcDZfc29ja2V0W3NtcF9w cm9jZXNzb3JfaWQoKV0NCisNCitzdGF0aWMgdm9pZCBpcDZfeG1pdF9sb2Nr KHZvaWQpDQorew0KKwlsb2NhbF9iaF9kaXNhYmxlKCk7DQorCWlmICh1bmxp a2VseSghc3Bpbl90cnlsb2NrKCZpcDZfc29ja2V0LT5zay0+bG9jay5zbG9j aykpKQ0KKwkJQlVHKCk7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlwNl94bWl0 
X3VubG9jayh2b2lkKQ0KK3sNCisJc3Bpbl91bmxvY2tfYmgoJmlwNl9zb2Nr ZXQtPnNrLT5sb2NrLnNsb2NrKTsNCit9DQorDQorI2RlZmluZSBIQVNIX1NJ WkUgIDMyDQorDQorI2RlZmluZSBIQVNIKGFkZHIpICgoKGFkZHIpLT5zNl9h ZGRyMzJbMF0gXiAoYWRkciktPnM2X2FkZHIzMlsxXSBeIFwNCisJICAgICAg ICAgICAgIChhZGRyKS0+czZfYWRkcjMyWzJdIF4gKGFkZHIpLT5zNl9hZGRy MzJbM10pICYgXA0KKyAgICAgICAgICAgICAgICAgICAgKEhBU0hfU0laRSAt IDEpKQ0KKw0KK3N0YXRpYyBpbnQgaXA2aXA2X2ZiX3RubF9kZXZfaW5pdChz dHJ1Y3QgbmV0X2RldmljZSAqZGV2KTsNCitzdGF0aWMgaW50IGlwNmlwNl90 bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldik7DQorDQorLyog dGhlIElQdjYgdHVubmVsIGZhbGxiYWNrIGRldmljZSAqLw0KK3N0YXRpYyBz dHJ1Y3QgbmV0X2RldmljZSBpcDZpcDZfZmJfdG5sX2RldiA9IHsNCisJLm5h bWUgPSAiaXA2dG5sMCIsDQorCS5pbml0ID0gaXA2aXA2X2ZiX3RubF9kZXZf aW5pdA0KK307DQorDQorLyogdGhlIElQdjYgZmFsbGJhY2sgdHVubmVsICov DQorc3RhdGljIHN0cnVjdCBpcDZfdG5sIGlwNmlwNl9mYl90bmwgPSB7DQor CS5kZXYgPSAmaXA2aXA2X2ZiX3RubF9kZXYsDQorCS5wYXJtcyA9ey5uYW1l ID0gImlwNnRubDAiLCAucHJvdG8gPSBJUFBST1RPX0lQVjZ9DQorfTsNCisN CisvKiBsaXN0cyBmb3Igc3RvcmluZyB0dW5uZWxzIGluIHVzZSAqLw0KK3N0 YXRpYyBzdHJ1Y3QgaXA2X3RubCAqdG5sc19yX2xbSEFTSF9TSVpFXTsNCitz dGF0aWMgc3RydWN0IGlwNl90bmwgKnRubHNfd2NbMV07DQorc3RhdGljIHN0 cnVjdCBpcDZfdG5sICoqdG5sc1syXSA9IHsgdG5sc193YywgdG5sc19yX2wg fTsNCisNCisvKiBsb2NrIGZvciB0aGUgdHVubmVsIGxpc3RzICovDQorc3Rh dGljIHJ3bG9ja190IGlwNmlwNl9sb2NrID0gUldfTE9DS19VTkxPQ0tFRDsN CisNCisvKioNCisgKiBpcDZpcDZfdG5sX2xvb2t1cCAtIGZldGNoIHR1bm5l bCBtYXRjaGluZyB0aGUgZW5kLXBvaW50IGFkZHJlc3Nlcw0KKyAqICAgQHJl bW90ZTogdGhlIGFkZHJlc3Mgb2YgdGhlIHR1bm5lbCBleGl0LXBvaW50IA0K KyAqICAgQGxvY2FsOiB0aGUgYWRkcmVzcyBvZiB0aGUgdHVubmVsIGVudHJ5 LXBvaW50IA0KKyAqDQorICogUmV0dXJuOiAgDQorICogICB0dW5uZWwgbWF0 Y2hpbmcgZ2l2ZW4gZW5kLXBvaW50cyBpZiBmb3VuZCwNCisgKiAgIGVsc2Ug ZmFsbGJhY2sgdHVubmVsIGlmIGl0cyBkZXZpY2UgaXMgdXAsIA0KKyAqICAg ZWxzZSAlTlVMTA0KKyAqKi8NCisNCitzdHJ1Y3QgaXA2X3RubCAqDQoraXA2 aXA2X3RubF9sb29rdXAoc3RydWN0IGluNl9hZGRyICpyZW1vdGUsIHN0cnVj dCBpbjZfYWRkciAqbG9jYWwpDQorew0KKwl1bnNpZ25lZCBoMCA9IEhBU0go 
cmVtb3RlKTsNCisJdW5zaWduZWQgaDEgPSBIQVNIKGxvY2FsKTsNCisJc3Ry dWN0IGlwNl90bmwgKnQ7DQorDQorCWZvciAodCA9IHRubHNfcl9sW2gwIF4g aDFdOyB0OyB0ID0gdC0+bmV4dCkgew0KKwkJaWYgKCFpcHY2X2FkZHJfY21w KGxvY2FsLCAmdC0+cGFybXMubGFkZHIpICYmDQorCQkgICAgIWlwdjZfYWRk cl9jbXAocmVtb3RlLCAmdC0+cGFybXMucmFkZHIpICYmDQorCQkgICAgKHQt PmRldi0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJCXJldHVybiB0Ow0KKwl9DQor CWlmICgodCA9IHRubHNfd2NbMF0pICE9IE5VTEwgJiYgKHQtPmRldi0+Zmxh Z3MgJiBJRkZfVVApKQ0KKwkJcmV0dXJuIHQ7DQorDQorCXJldHVybiBOVUxM Ow0KK30NCisNCisvKioNCisgKiBpcDZpcDZfYnVja2V0IC0gZ2V0IGhlYWQg b2YgbGlzdCBtYXRjaGluZyBnaXZlbiB0dW5uZWwgcGFyYW1ldGVycw0KKyAq ICAgQHA6IHBhcmFtZXRlcnMgY29udGFpbmluZyB0dW5uZWwgZW5kLXBvaW50 cyANCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2J1Y2tl dCgpIHJldHVybnMgdGhlIGhlYWQgb2YgdGhlIGxpc3QgbWF0Y2hpbmcgdGhl IA0KKyAqICAgJnN0cnVjdCBpbjZfYWRkciBlbnRyaWVzIGxhZGRyIGFuZCBy YWRkciBpbiBAcC4NCisgKg0KKyAqIFJldHVybjogaGVhZCBvZiBJUHY2IHR1 bm5lbCBsaXN0IA0KKyAqKi8NCisNCitzdGF0aWMgc3RydWN0IGlwNl90bmwg KioNCitpcDZpcDZfYnVja2V0KHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnApDQor ew0KKwlzdHJ1Y3QgaW42X2FkZHIgKnJlbW90ZSA9ICZwLT5yYWRkcjsNCisJ c3RydWN0IGluNl9hZGRyICpsb2NhbCA9ICZwLT5sYWRkcjsNCisJdW5zaWdu ZWQgaCA9IDA7DQorCWludCBwcmlvID0gMDsNCisNCisJaWYgKCFpcHY2X2Fk ZHJfYW55KHJlbW90ZSkgfHwgIWlwdjZfYWRkcl9hbnkobG9jYWwpKSB7DQor CQlwcmlvID0gMTsNCisJCWggPSBIQVNIKHJlbW90ZSkgXiBIQVNIKGxvY2Fs KTsNCisJfQ0KKwlyZXR1cm4gJnRubHNbcHJpb11baF07DQorfQ0KKw0KKy8q Kg0KKyAqIGlwNmlwNl90bmxfbGluayAtIGFkZCB0dW5uZWwgdG8gaGFzaCB0 YWJsZQ0KKyAqICAgQHQ6IHR1bm5lbCB0byBiZSBhZGRlZA0KKyAqKi8NCisN CitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfbGluayhzdHJ1Y3QgaXA2X3Ru bCAqdCkNCit7DQorCXN0cnVjdCBpcDZfdG5sICoqdHAgPSBpcDZpcDZfYnVj a2V0KCZ0LT5wYXJtcyk7DQorDQorCXdyaXRlX2xvY2tfYmgoJmlwNmlwNl9s b2NrKTsNCisJdC0+bmV4dCA9ICp0cDsNCisJd3JpdGVfdW5sb2NrX2JoKCZp cDZpcDZfbG9jayk7DQorCSp0cCA9IHQ7DQorfQ0KKw0KKy8qKg0KKyAqIGlw NmlwNl90bmxfdW5saW5rIC0gcmVtb3ZlIHR1bm5lbCBmcm9tIGhhc2ggdGFi bGUNCisgKiAgIEB0OiB0dW5uZWwgdG8gYmUgcmVtb3ZlZA0KKyAqKi8NCisN 
CitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfdW5saW5rKHN0cnVjdCBpcDZf dG5sICp0KQ0KK3sNCisJc3RydWN0IGlwNl90bmwgKip0cDsNCisNCisJZm9y ICh0cCA9IGlwNmlwNl9idWNrZXQoJnQtPnBhcm1zKTsgKnRwOyB0cCA9ICYo KnRwKS0+bmV4dCkgew0KKwkJaWYgKHQgPT0gKnRwKSB7DQorCQkJd3JpdGVf bG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwkJCSp0cCA9IHQtPm5leHQ7DQor CQkJd3JpdGVfdW5sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQkJYnJlYWs7 DQorCQl9DQorCX0NCit9DQorDQorLyoqDQorICogaXA2X3RubF9jcmVhdGUo KSAtIGNyZWF0ZSBhIG5ldyB0dW5uZWwNCisgKiAgIEBwOiB0dW5uZWwgcGFy YW1ldGVycw0KKyAqICAgQHB0OiBwb2ludGVyIHRvIG5ldyB0dW5uZWwNCisg Kg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgQ3JlYXRlIHR1bm5lbCBtYXRj aGluZyBnaXZlbiBwYXJhbWV0ZXJzLg0KKyAqIA0KKyAqIFJldHVybjogDQor ICogICAwIG9uIHN1Y2Nlc3MNCisgKiovDQorDQorc3RhdGljIGludA0KK2lw Nl90bmxfY3JlYXRlKHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAsIHN0cnVjdCBp cDZfdG5sICoqcHQpDQorew0KKwlzdHJ1Y3QgbmV0X2RldmljZSAqZGV2Ow0K KwlpbnQgZXJyID0gLUVOT0JVRlM7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0K Kw0KKwlkZXYgPSBrbWFsbG9jKHNpemVvZiAoKmRldikgKyBzaXplb2YgKCp0 KSwgR0ZQX0tFUk5FTCk7DQorCWlmICghZGV2KQ0KKwkJcmV0dXJuIGVycjsN CisNCisJbWVtc2V0KGRldiwgMCwgc2l6ZW9mICgqZGV2KSArIHNpemVvZiAo KnQpKTsNCisJZGV2LT5wcml2ID0gKHZvaWQgKikgKGRldiArIDEpOw0KKwl0 ID0gKHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJdC0+ZGV2ID0g ZGV2Ow0KKwlkZXYtPmluaXQgPSBpcDZpcDZfdG5sX2Rldl9pbml0Ow0KKwlt ZW1jcHkoJnQtPnBhcm1zLCBwLCBzaXplb2YgKCpwKSk7DQorCXQtPnBhcm1z Lm5hbWVbSUZOQU1TSVogLSAxXSA9ICdcMCc7DQorCWlmICh0LT5wYXJtcy5o b3BfbGltaXQgPiAyNTUpDQorCQl0LT5wYXJtcy5ob3BfbGltaXQgPSAtMTsN CisJc3RyY3B5KGRldi0+bmFtZSwgdC0+cGFybXMubmFtZSk7DQorCWlmICgh ZGV2LT5uYW1lWzBdKSB7DQorCQlpbnQgaSA9IDA7DQorCQlpbnQgZXhpc3Rz ID0gMDsNCisNCisJCWRvIHsNCisJCQlzcHJpbnRmKGRldi0+bmFtZSwgImlw NnRubCVkIiwgKytpKTsNCisJCQlleGlzdHMgPSAoX19kZXZfZ2V0X2J5X25h bWUoZGV2LT5uYW1lKSAhPSBOVUxMKTsNCisJCX0gd2hpbGUgKGkgPCBJUDZf VE5MX01BWCAmJiBleGlzdHMpOw0KKw0KKwkJaWYgKGkgPT0gSVA2X1ROTF9N QVgpIHsNCisJCQlnb3RvIGZhaWxlZDsNCisJCX0NCisJCW1lbWNweSh0LT5w YXJtcy5uYW1lLCBkZXYtPm5hbWUsIElGTkFNU0laKTsNCisJfQ0KKwlTRVRf 
TU9EVUxFX09XTkVSKGRldik7DQorCWlmICgoZXJyID0gcmVnaXN0ZXJfbmV0 ZGV2aWNlKGRldikpIDwgMCkgew0KKwkJZ290byBmYWlsZWQ7DQorCX0NCisJ aXA2aXA2X3RubF9saW5rKHQpOw0KKwkqcHQgPSB0Ow0KKwlyZXR1cm4gMDsN CitmYWlsZWQ6DQorCWtmcmVlKGRldik7DQorCXJldHVybiBlcnI7DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNl90bmxfZGVzdHJveSgpIC0gZGVzdHJveSBvbGQg dHVubmVsDQorICogICBAdDogdHVubmVsIHRvIGJlIGRlc3Ryb3llZA0KKyAq DQorICogUmV0dXJuOg0KKyAqICAgd2hhdGV2ZXIgdW5yZWdpc3Rlcl9uZXRk ZXZpY2UoKSByZXR1cm5zDQorICoqLw0KKw0KK3N0YXRpYyBpbmxpbmUgaW50 DQoraXA2X3RubF9kZXN0cm95KHN0cnVjdCBpcDZfdG5sICp0KQ0KK3sNCisJ cmV0dXJuIHVucmVnaXN0ZXJfbmV0ZGV2aWNlKHQtPmRldik7DQorfQ0KKw0K Ky8qKg0KKyAqIGlwNmlwNl90bmxfbG9jYXRlIC0gZmluZCBvciBjcmVhdGUg dHVubmVsIG1hdGNoaW5nIGdpdmVuIHBhcmFtZXRlcnMNCisgKiAgIEBwOiB0 dW5uZWwgcGFyYW1ldGVycyANCisgKiAgIEBjcmVhdGU6ICE9IDAgaWYgYWxs b3dlZCB0byBjcmVhdGUgbmV3IHR1bm5lbCBpZiBubyBtYXRjaCBmb3VuZA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBpcDZpcDZfdG5sX2xvY2F0 ZSgpIGZpcnN0IHRyaWVzIHRvIGxvY2F0ZSBhbiBleGlzdGluZyB0dW5uZWwN CisgKiAgIGJhc2VkIG9uIEBwYXJtcy4gSWYgdGhpcyBpcyB1bnN1Y2Nlc3Nm dWwsIGJ1dCBAY3JlYXRlIGlzIHNldCBhIG5ldw0KKyAqICAgdHVubmVsIGRl dmljZSBpcyBjcmVhdGVkIGFuZCByZWdpc3RlcmVkIGZvciB1c2UuDQorICoN CisgKiBSZXR1cm46DQorICogICAwIGlmIHR1bm5lbCBsb2NhdGVkIG9yIGNy ZWF0ZWQsDQorICogICAtRUlOVkFMIGlmIHBhcmFtZXRlcnMgaW5jb3JyZWN0 LA0KKyAqICAgLUVOT0RFViBpZiBubyBtYXRjaGluZyB0dW5uZWwgYXZhaWxh YmxlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZfdG5sX2xvY2F0 ZShzdHJ1Y3QgaXA2X3RubF9wYXJtICpwLCBzdHJ1Y3QgaXA2X3RubCAqKnB0 LCBpbnQgY3JlYXRlKQ0KK3sNCisJc3RydWN0IGluNl9hZGRyICpyZW1vdGUg PSAmcC0+cmFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAqbG9jYWwgPSAmcC0+ bGFkZHI7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKw0KKwlpZiAocC0+cHJv dG8gIT0gSVBQUk9UT19JUFY2KQ0KKwkJcmV0dXJuIC1FSU5WQUw7DQorDQor CWZvciAodCA9ICppcDZpcDZfYnVja2V0KHApOyB0OyB0ID0gdC0+bmV4dCkg ew0KKwkJaWYgKCFpcHY2X2FkZHJfY21wKGxvY2FsLCAmdC0+cGFybXMubGFk ZHIpICYmDQorCQkgICAgIWlwdjZfYWRkcl9jbXAocmVtb3RlLCAmdC0+cGFy bXMucmFkZHIpKSB7DQorCQkJKnB0ID0gdDsNCisJCQlyZXR1cm4gKGNyZWF0 
ZSA/IC1FRVhJU1QgOiAwKTsNCisJCX0NCisJfQ0KKwlpZiAoIWNyZWF0ZSkg ew0KKwkJcmV0dXJuIC1FTk9ERVY7DQorCX0NCisJcmV0dXJuIGlwNl90bmxf Y3JlYXRlKHAsIHB0KTsNCit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9k ZXZfZGVzdHJ1Y3RvciAtIHR1bm5lbCBkZXZpY2UgZGVzdHJ1Y3Rvcg0KKyAq ICAgQGRldjogdGhlIGRldmljZSB0byBiZSBkZXN0cm95ZWQNCisgKiovDQor DQorc3RhdGljIHZvaWQNCitpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yKHN0 cnVjdCBuZXRfZGV2aWNlICpkZXYpDQorew0KKwlrZnJlZShkZXYpOw0KK30N CisNCisvKioNCisgKiBpcDZpcDZfdG5sX2Rldl91bmluaXQgLSB0dW5uZWwg ZGV2aWNlIHVuaW5pdGlhbGl6ZXINCisgKiAgIEBkZXY6IHRoZSBkZXZpY2Ug dG8gYmUgZGVzdHJveWVkDQorICogICANCisgKiBEZXNjcmlwdGlvbjoNCisg KiAgIGlwNmlwNl90bmxfZGV2X3VuaW5pdCgpIHJlbW92ZXMgdHVubmVsIGZy b20gaXRzIGxpc3QNCisgKiovDQorDQorc3RhdGljIHZvaWQNCitpcDZpcDZf dG5sX2Rldl91bmluaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQor CWlmIChkZXYgPT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQl3cml0ZV9s b2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQl0bmxzX3djWzBdID0gTlVMTDsN CisJCXdyaXRlX3VubG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwl9IGVsc2Ug ew0KKwkJc3RydWN0IGlwNl90bmwgKnQgPSAoc3RydWN0IGlwNl90bmwgKikg ZGV2LT5wcml2Ow0KKwkJaXA2aXA2X3RubF91bmxpbmsodCk7DQorCX0NCit9 DQorDQorLyoqDQorICogcGFyc2VfdHZsX3RubF9lbmNfbGltIC0gaGFuZGxl IGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICogICBAc2tiOiByZWNl aXZlZCBzb2NrZXQgYnVmZmVyDQorICoNCisgKiBSZXR1cm46IA0KKyAqICAg MCBpZiBub25lIHdhcyBmb3VuZCwgDQorICogICBlbHNlIGluZGV4IHRvIGVu Y2Fwc3VsYXRpb24gbGltaXQNCisgKiovDQorDQorc3RhdGljIF9fdTE2DQor cGFyc2VfdGx2X3RubF9lbmNfbGltKHN0cnVjdCBza19idWZmICpza2IsIF9f dTggKiByYXcpDQorew0KKwlzdHJ1Y3QgaXB2NmhkciAqaXB2NmggPSAoc3Ry dWN0IGlwdjZoZHIgKikgcmF3Ow0KKwlfX3U4IG5leHRoZHIgPSBpcHY2aC0+ bmV4dGhkcjsNCisJX191MTYgb2ZmID0gc2l6ZW9mICgqaXB2NmgpOw0KKw0K Kwl3aGlsZSAoaXB2Nl9leHRfaGRyKG5leHRoZHIpICYmIG5leHRoZHIgIT0g TkVYVEhEUl9OT05FKSB7DQorCQlfX3UxNiBvcHRsZW4gPSAwOw0KKwkJc3Ry dWN0IGlwdjZfb3B0X2hkciAqaGRyOw0KKwkJaWYgKHJhdyArIG9mZiArIHNp emVvZiAoKmhkcikgPiBza2ItPmRhdGEgJiYNCisJCSAgICAhcHNrYl9tYXlf cHVsbChza2IsIHJhdyAtIHNrYi0+ZGF0YSArIG9mZiArIHNpemVvZiAoKmhk 
cikpKQ0KKwkJCWJyZWFrOw0KKw0KKwkJaGRyID0gKHN0cnVjdCBpcHY2X29w dF9oZHIgKikgKHJhdyArIG9mZik7DQorCQlpZiAobmV4dGhkciA9PSBORVhU SERSX0ZSQUdNRU5UKSB7DQorCQkJc3RydWN0IGZyYWdfaGRyICpmcmFnX2hk ciA9IChzdHJ1Y3QgZnJhZ19oZHIgKikgaGRyOw0KKwkJCWlmIChmcmFnX2hk ci0+ZnJhZ19vZmYpDQorCQkJCWJyZWFrOw0KKwkJCW9wdGxlbiA9IDg7DQor CQl9IGVsc2UgaWYgKG5leHRoZHIgPT0gTkVYVEhEUl9BVVRIKSB7DQorCQkJ b3B0bGVuID0gKGhkci0+aGRybGVuICsgMikgPDwgMjsNCisJCX0gZWxzZSB7 DQorCQkJb3B0bGVuID0gaXB2Nl9vcHRsZW4oaGRyKTsNCisJCX0NCisJCWlm IChuZXh0aGRyID09IE5FWFRIRFJfREVTVCkgew0KKwkJCV9fdTE2IGkgPSBv ZmYgKyAyOw0KKwkJCXdoaWxlICgxKSB7DQorCQkJCXN0cnVjdCBpcHY2X3Rs dl90bmxfZW5jX2xpbSAqdGVsOw0KKw0KKwkJCQkvKiBObyBtb3JlIHJvb20g Zm9yIGVuY2Fwc3VsYXRpb24gbGltaXQgKi8NCisJCQkJaWYgKGkgKyBzaXpl b2YgKCp0ZWwpID4gb2ZmICsgb3B0bGVuKQ0KKwkJCQkJYnJlYWs7DQorDQor CQkJCXRlbCA9IChzdHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0gKikgJnJh d1tpXTsNCisJCQkJLyogcmV0dXJuIGluZGV4IG9mIG9wdGlvbiBpZiBmb3Vu ZCBhbmQgdmFsaWQgKi8NCisJCQkJaWYgKHRlbC0+dHlwZSA9PSBJUFY2X1RM Vl9UTkxfRU5DQVBfTElNSVQgJiYNCisJCQkJICAgIHRlbC0+bGVuZ3RoID09 IDEpDQorCQkJCQlyZXR1cm4gaTsNCisJCQkJLyogZWxzZSBqdW1wIHRvIG5l eHQgb3B0aW9uICovDQorCQkJCWlmICh0ZWwtPnR5cGUpDQorCQkJCQlpICs9 IHRlbC0+bGVuZ3RoICsgMjsNCisJCQkJZWxzZQ0KKwkJCQkJaSsrOw0KKwkJ CX0NCisJCX0NCisJCW5leHRoZHIgPSBoZHItPm5leHRoZHI7DQorCQlvZmYg Kz0gb3B0bGVuOw0KKwl9DQorCXJldHVybiAwOw0KK30NCisNCisvKioNCisg KiBpcDZpcDZfZXJyIC0gdHVubmVsIGVycm9yIGhhbmRsZXINCisgKg0KKyAq IERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2VycigpIHNob3VsZCBoYW5k bGUgZXJyb3JzIGluIHRoZSB0dW5uZWwgYWNjb3JkaW5nDQorICogICB0byB0 aGUgc3BlY2lmaWNhdGlvbnMgaW4gUkZDIDI0NzMuDQorICoqLw0KKw0KK3Zv aWQgaXA2aXA2X2VycihzdHJ1Y3Qgc2tfYnVmZiAqc2tiLCBzdHJ1Y3QgaW5l dDZfc2tiX3Bhcm0gKm9wdCwNCisJCSAgIGludCB0eXBlLCBpbnQgY29kZSwg aW50IG9mZnNldCwgX191MzIgaW5mbykNCit7DQorCXN0cnVjdCBpcHY2aGRy ICppcHY2aCA9IChzdHJ1Y3QgaXB2NmhkciAqKSBza2ItPmRhdGE7DQorCXN0 cnVjdCBpcDZfdG5sICp0Ow0KKwlpbnQgcmVsX21zZyA9IDA7DQorCWludCBy ZWxfdHlwZSA9IElDTVBWNl9ERVNUX1VOUkVBQ0g7DQorCWludCByZWxfY29k 
ZSA9IElDTVBWNl9BRERSX1VOUkVBQ0g7DQorCV9fdTMyIHJlbF9pbmZvID0g MDsNCisJX191MTYgbGVuOw0KKw0KKwkvKiBJZiB0aGUgcGFja2V0IGRvZXNu J3QgY29udGFpbiB0aGUgb3JpZ2luYWwgSVB2NiBoZWFkZXIgd2UgYXJlIA0K KwkgICBpbiB0cm91YmxlIHNpbmNlIHdlIG1pZ2h0IG5lZWQgdGhlIHNvdXJj ZSBhZGRyZXNzIGZvciBmdXJ0ZXIgDQorCSAgIHByb2Nlc3Npbmcgb2YgdGhl IGVycm9yLiAqLw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisJ aWYgKCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5kYWRkciwgJmlw djZoLT5zYWRkcikpID09IE5VTEwpDQorCQlnb3RvIG91dDsNCisNCisJc3dp dGNoICh0eXBlKSB7DQorCQlfX3UzMiB0ZWxpOw0KKwkJc3RydWN0IGlwdjZf dGx2X3RubF9lbmNfbGltICp0ZWw7DQorCQlfX3UzMiBtdHU7DQorCWNhc2Ug SUNNUFY2X0RFU1RfVU5SRUFDSDoNCisJCWlmIChuZXRfcmF0ZWxpbWl0KCkp DQorCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCSAgICAgICAiJXM6IFBh dGggdG8gZGVzdGluYXRpb24gaW52YWxpZCAiDQorCQkJICAgICAgICJvciBp bmFjdGl2ZSFcbiIsIHQtPnBhcm1zLm5hbWUpOw0KKwkJcmVsX21zZyA9IDE7 DQorCQlicmVhazsNCisJY2FzZSBJQ01QVjZfVElNRV9FWENFRUQ6DQorCQlp ZiAoY29kZSA9PSBJQ01QVjZfRVhDX0hPUExJTUlUKSB7DQorCQkJaWYgKG5l dF9yYXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJ CQkgICAgICAgIiVzOiBUb28gc21hbGwgaG9wIGxpbWl0IG9yICINCisJCQkJ ICAgICAgICJyb3V0aW5nIGxvb3AgaW4gdHVubmVsIVxuIiwgDQorCQkJCSAg ICAgICB0LT5wYXJtcy5uYW1lKTsNCisJCQlyZWxfbXNnID0gMTsNCisJCX0N CisJCWJyZWFrOw0KKwljYXNlIElDTVBWNl9QQVJBTVBST0I6DQorCQkvKiBp Z25vcmUgaWYgcGFyYW1ldGVyIHByb2JsZW0gbm90IGNhdXNlZCBieSBhIHR1 bm5lbA0KKwkJICAgZW5jYXBzdWxhdGlvbiBsaW1pdCBzdWItb3B0aW9uICov DQorCQlpZiAoY29kZSAhPSBJQ01QVjZfSERSX0ZJRUxEKSB7DQorCQkJYnJl YWs7DQorCQl9DQorCQl0ZWxpID0gcGFyc2VfdGx2X3RubF9lbmNfbGltKHNr Yiwgc2tiLT5kYXRhKTsNCisNCisJCWlmICh0ZWxpICYmIHRlbGkgPT0gaW5m byAtIDIpIHsNCisJCQl0ZWwgPSAoc3RydWN0IGlwdjZfdGx2X3RubF9lbmNf bGltICopICZza2ItPmRhdGFbdGVsaV07DQorCQkJaWYgKHRlbC0+ZW5jYXBf bGltaXQgPD0gMSkgew0KKwkJCQlpZiAobmV0X3JhdGVsaW1pdCgpKQ0KKwkJ CQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCQkJICAgICAgICIlczogVG9v IHNtYWxsIGVuY2Fwc3VsYXRpb24gIg0KKwkJCQkJICAgICAgICJsaW1pdCBv ciByb3V0aW5nIGxvb3AgaW4gIg0KKwkJCQkJICAgICAgICJ0dW5uZWwhXG4i 
LCB0LT5wYXJtcy5uYW1lKTsNCisJCQkJcmVsX21zZyA9IDE7DQorCQkJfQ0K KwkJfQ0KKwkJYnJlYWs7DQorCWNhc2UgSUNNUFY2X1BLVF9UT09CSUc6DQor CQltdHUgPSBpbmZvIC0gb2Zmc2V0Ow0KKwkJaWYgKG10dSA8PSBJUFY2X01J Tl9NVFUpIHsNCisJCQltdHUgPSBJUFY2X01JTl9NVFU7DQorCQl9DQorCQl0 LT5kZXYtPm10dSA9IG10dTsNCisNCisJCWlmICgobGVuID0gc2l6ZW9mICgq aXB2NmgpICsgaXB2NmgtPnBheWxvYWRfbGVuKSA+IG10dSkgew0KKwkJCXJl bF90eXBlID0gSUNNUFY2X1BLVF9UT09CSUc7DQorCQkJcmVsX2NvZGUgPSAw Ow0KKwkJCXJlbF9pbmZvID0gbXR1Ow0KKwkJCXJlbF9tc2cgPSAxOw0KKwkJ fQ0KKwkJYnJlYWs7DQorCX0NCisJaWYgKHJlbF9tc2cgJiYgIHBza2JfbWF5 X3B1bGwoc2tiLCBvZmZzZXQgKyBzaXplb2YgKCppcHY2aCkpKSB7DQorCQlz dHJ1Y3QgcnQ2X2luZm8gKnJ0Ow0KKwkJc3RydWN0IHNrX2J1ZmYgKnNrYjIg PSBza2JfY2xvbmUoc2tiLCBHRlBfQVRPTUlDKTsNCisJCWlmICghc2tiMikN CisJCQlnb3RvIG91dDsNCisNCisJCWRzdF9yZWxlYXNlKHNrYjItPmRzdCk7 DQorCQlza2IyLT5kc3QgPSBOVUxMOw0KKwkJc2tiX3B1bGwoc2tiMiwgb2Zm c2V0KTsNCisJCXNrYjItPm5oLnJhdyA9IHNrYjItPmRhdGE7DQorDQorCQkv KiBUcnkgdG8gZ3Vlc3MgaW5jb21pbmcgaW50ZXJmYWNlICovDQorCQlydCA9 IHJ0Nl9sb29rdXAoJnNrYjItPm5oLmlwdjZoLT5zYWRkciwgTlVMTCwgMCwg MCk7DQorDQorCQlpZiAocnQgJiYgcnQtPnJ0NmlfZGV2KQ0KKwkJCXNrYjIt PmRldiA9IHJ0LT5ydDZpX2RldjsNCisNCisJCWljbXB2Nl9zZW5kKHNrYjIs IHJlbF90eXBlLCByZWxfY29kZSwgcmVsX2luZm8sIHNrYjItPmRldik7DQor DQorCQlpZiAocnQpDQorCQkJZHN0X2ZyZWUoJnJ0LT51LmRzdCk7DQorDQor CQlrZnJlZV9za2Ioc2tiMik7DQorCX0NCitvdXQ6DQorCXJlYWRfdW5sb2Nr KCZpcDZpcDZfbG9jayk7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9yY3Yg LSBkZWNhcHN1bGF0ZSBJUHY2IHBhY2tldCBhbmQgcmV0cmFuc21pdCBpdCBs b2NhbGx5DQorICogICBAc2tiOiByZWNlaXZlZCBzb2NrZXQgYnVmZmVyDQor ICoNCisgKiBSZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlwNl9yY3Yo c3RydWN0IHNrX2J1ZmYgKipwc2tiLCB1bnNpZ25lZCBpbnQgKm5ob2ZmcCkN Cit7DQorCXN0cnVjdCBza19idWZmICpza2IgPSAqcHNrYjsNCisJc3RydWN0 IGlwdjZoZHIgKmlwdjZoOw0KKwlzdHJ1Y3QgaXA2X3RubCAqdDsNCisNCisJ aWYgKCFwc2tiX21heV9wdWxsKHNrYiwgc2l6ZW9mICgqaXB2NmgpKSkNCisJ CWdvdG8gZGlzY2FyZDsNCisNCisJaXB2NmggPSBza2ItPm5oLmlwdjZoOw0K Kw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisNCisJaWYgKCh0ID0g 
aXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5zYWRkciwgJmlwdjZoLT5kYWRk cikpICE9IE5VTEwpIHsNCisJCWlmICghKHQtPnBhcm1zLmZsYWdzICYgSVA2 X1ROTF9GX0NBUF9SQ1YpKSB7DQorCQkJdC0+c3RhdC5yeF9kcm9wcGVkKys7 DQorCQkJcmVhZF91bmxvY2soJmlwNmlwNl9sb2NrKTsNCisJCQlnb3RvIGRp c2NhcmQ7DQorCQl9DQorCQlza2ItPm1hYy5yYXcgPSBza2ItPm5oLnJhdzsN CisJCXNrYi0+bmgucmF3ID0gc2tiLT5kYXRhOw0KKwkJc2tiLT5wcm90b2Nv bCA9IGh0b25zKEVUSF9QX0lQVjYpOw0KKwkJc2tiLT5wa3RfdHlwZSA9IFBB Q0tFVF9IT1NUOw0KKwkJbWVtc2V0KHNrYi0+Y2IsIDAsIHNpemVvZihzdHJ1 Y3QgaW5ldDZfc2tiX3Bhcm0pKTsNCisJCXNrYi0+ZGV2ID0gdC0+ZGV2Ow0K KwkJZHN0X3JlbGVhc2Uoc2tiLT5kc3QpOw0KKwkJc2tiLT5kc3QgPSBOVUxM Ow0KKwkJdC0+c3RhdC5yeF9wYWNrZXRzKys7DQorCQl0LT5zdGF0LnJ4X2J5 dGVzICs9IHNrYi0+bGVuOw0KKwkJbmV0aWZfcngoc2tiKTsNCisJCXJlYWRf dW5sb2NrKCZpcDZpcDZfbG9jayk7DQorCQlyZXR1cm4gMDsNCisJfQ0KKwly ZWFkX3VubG9jaygmaXA2aXA2X2xvY2spOw0KKwlpY21wdjZfc2VuZChza2Is IElDTVBWNl9ERVNUX1VOUkVBQ0gsIElDTVBWNl9BRERSX1VOUkVBQ0gsIDAs IHNrYi0+ZGV2KTsNCitkaXNjYXJkOg0KKwlrZnJlZV9za2Ioc2tiKTsNCisJ cmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIHR4b3B0X2xlbiAtIGdldCBu ZWNlc3Nhcnkgc2l6ZSBmb3IgbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMN CisgKiAgIEBvcmlnX29wdDogb2xkIG9wdGlvbnMNCisgKg0KKyAqIFJldHVy bjoNCisgKiAgIFNpemUgb2Ygb2xkIG9uZSBwbHVzIHNpemUgb2YgdHVubmVs IGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICoqLw0KKw0KK3N0YXRp YyBpbmxpbmUgaW50DQordHhvcHRfbGVuKHN0cnVjdCBpcHY2X3R4b3B0aW9u cyAqb3JpZ19vcHQpDQorew0KKwlpbnQgbGVuID0gc2l6ZW9mICgqb3JpZ19v cHQpICsgODsNCisNCisJaWYgKG9yaWdfb3B0ICYmIG9yaWdfb3B0LT5kc3Qw b3B0KQ0KKwkJbGVuICs9IGlwdjZfb3B0bGVuKG9yaWdfb3B0LT5kc3Qwb3B0 KTsNCisJcmV0dXJuIGxlbjsNCit9DQorDQorLyoqDQorICogbWVyZ2Vfb3B0 aW9ucyAtIGFkZCBlbmNhcHN1bGF0aW9uIGxpbWl0IHRvIG9yaWdpbmFsIG9w dGlvbnMNCisgKiAgIEBlbmNhcF9saW1pdDogbnVtYmVyIG9mIGFsbG93ZWQg ZW5jYXBzdWxhdGlvbiBsaW1pdHMNCisgKiAgIEBvcmlnX29wdDogb3JpZ2lu YWwgb3B0aW9ucw0KKyAqIA0KKyAqIFJldHVybjoNCisgKiAgIFBvaW50ZXIg dG8gbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgY29udGFpbmluZyB0aGUg dHVubmVsDQorICogICBlbmNhcHN1bGF0aW9uIGxpbWl0DQorICoqLw0KKw0K 
K3N0YXRpYyBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKg0KK21lcmdlX29wdGlv bnMoc3RydWN0IHNvY2sgKnNrLCBfX3U4IGVuY2FwX2xpbWl0LA0KKwkgICAg ICBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKm9yaWdfb3B0KQ0KK3sNCisJc3Ry dWN0IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCXN0cnVjdCBpcHY2 X3R4b3B0aW9ucyAqb3B0Ow0KKwlfX3U4ICpyYXc7DQorCV9fdTggcGFkX3Rv ID0gODsNCisJaW50IG9wdF9sZW4gPSB0eG9wdF9sZW4ob3JpZ19vcHQpOw0K Kw0KKwlpZiAoIShvcHQgPSBzb2NrX2ttYWxsb2Moc2ssIG9wdF9sZW4sIEdG UF9BVE9NSUMpKSkgew0KKwkJcmV0dXJuIE5VTEw7DQorCX0NCisNCisJbWVt c2V0KG9wdCwgMCwgb3B0X2xlbik7DQorCW9wdC0+dG90X2xlbiA9IG9wdF9s ZW47DQorCW9wdC0+ZHN0MG9wdCA9IChzdHJ1Y3QgaXB2Nl9vcHRfaGRyICop IChvcHQgKyAxKTsNCisJb3B0LT5vcHRfbmZsZW4gPSA4Ow0KKw0KKwlyYXcg PSAoX191OCAqKSBvcHQtPmRzdDBvcHQ7DQorDQorCXRlbCA9IChzdHJ1Y3Qg aXB2Nl90bHZfdG5sX2VuY19saW0gKikgKG9wdC0+ZHN0MG9wdCArIDEpOw0K Kwl0ZWwtPnR5cGUgPSBJUFY2X1RMVl9UTkxfRU5DQVBfTElNSVQ7DQorCXRl bC0+bGVuZ3RoID0gMTsNCisJdGVsLT5lbmNhcF9saW1pdCA9IGVuY2FwX2xp bWl0Ow0KKw0KKwlpZiAob3JpZ19vcHQpIHsNCisJCV9fdTggKm9yaWdfcmF3 Ow0KKw0KKwkJb3B0LT5ob3BvcHQgPSBvcmlnX29wdC0+aG9wb3B0Ow0KKw0K KwkJLyogS2VlcCB0aGUgb3JpZ2luYWwgZGVzdGluYXRpb24gb3B0aW9ucyBw cm9wZXJseQ0KKwkJICAgYWxpZ25lZCBhbmQgbWVyZ2UgcG9zc2libGUgb2xk IHBhZGRpbmdzIHRvIHRoZQ0KKwkJICAgbmV3IHBhZGRpbmcgb3B0aW9uICov DQorCQlpZiAoKG9yaWdfcmF3ID0gKF9fdTggKikgb3JpZ19vcHQtPmRzdDBv cHQpICE9IE5VTEwpIHsNCisJCQlfX3U4IHR5cGU7DQorCQkJaW50IGkgPSBz aXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXBhZF90byArPSBz aXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXdoaWxlIChpIDwg aXB2Nl9vcHRsZW4ob3JpZ19vcHQtPmRzdDBvcHQpKSB7DQorCQkJCXR5cGUg PSBvcmlnX3Jhd1tpKytdOw0KKwkJCQlpZiAodHlwZSA9PSBJUFY2X1RMVl9Q QUQwKQ0KKwkJCQkJcGFkX3RvKys7DQorCQkJCWVsc2UgaWYgKHR5cGUgPT0g SVBWNl9UTFZfUEFETikgew0KKwkJCQkJaW50IGxlbiA9IG9yaWdfcmF3W2kr K107DQorCQkJCQlpICs9IGxlbjsNCisJCQkJCXBhZF90byArPSBsZW4gKyAy Ow0KKwkJCQl9IGVsc2Ugew0KKwkJCQkJYnJlYWs7DQorCQkJCX0NCisJCQl9 DQorCQkJb3B0LT5kc3Qwb3B0LT5oZHJsZW4gPSBvcmlnX29wdC0+ZHN0MG9w dC0+aGRybGVuICsgMTsNCisJCQltZW1jcHkocmF3ICsgcGFkX3RvLCBvcmln 
X3JhdyArIHBhZF90byAtIDgsDQorCQkJICAgICAgIG9wdF9sZW4gLSBzaXpl b2YgKCpvcHQpIC0gcGFkX3RvKTsNCisJCX0NCisJCW9wdC0+c3JjcnQgPSBv cmlnX29wdC0+c3JjcnQ7DQorCQlvcHQtPm9wdF9uZmxlbiArPSBvcmlnX29w dC0+b3B0X25mbGVuOw0KKw0KKwkJb3B0LT5kc3Qxb3B0ID0gb3JpZ19vcHQt PmRzdDFvcHQ7DQorCQlvcHQtPmF1dGggPSBvcmlnX29wdC0+YXV0aDsNCisJ CW9wdC0+b3B0X2ZsZW4gPSBvcmlnX29wdC0+b3B0X2ZsZW47DQorCX0NCisJ cmF3WzVdID0gSVBWNl9UTFZfUEFETjsNCisNCisJLyogc3VidHJhY3QgbGVu Z3RocyBvZiBkZXN0aW5hdGlvbiBzdWJvcHRpb24gaGVhZGVyLA0KKwkgICB0 dW5uZWwgZW5jYXBzdWxhdGlvbiBsaW1pdCBhbmQgcGFkIE4gaGVhZGVyICov DQorCXJhd1s2XSA9IHBhZF90byAtIDc7DQorDQorCXJldHVybiBvcHQ7DQor fQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfYWRkcl9jb25mbGljdCAtIGNv bXBhcmUgcGFja2V0IGFkZHJlc3NlcyB0byB0dW5uZWwncyBvd24NCisgKiAg IEB0OiB0aGUgb3V0Z29pbmcgdHVubmVsIGRldmljZQ0KKyAqICAgQGhkcjog SVB2NiBoZWFkZXIgZnJvbSB0aGUgaW5jb21pbmcgcGFja2V0IA0KKyAqDQor ICogRGVzY3JpcHRpb246DQorICogICBBdm9pZCB0cml2aWFsIHR1bm5lbGlu ZyBsb29wIGJ5IGNoZWNraW5nIHRoYXQgdHVubmVsIGV4aXQtcG9pbnQgDQor ICogICBkb2Vzbid0IG1hdGNoIHNvdXJjZSBvZiBpbmNvbWluZyBwYWNrZXQu DQorICoNCisgKiBSZXR1cm46IA0KKyAqICAgMSBpZiBjb25mbGljdCwNCisg KiAgIDAgZWxzZQ0KKyAqKi8NCisNCitzdGF0aWMgaW5saW5lIGludA0KK2lw NmlwNl90bmxfYWRkcl9jb25mbGljdChzdHJ1Y3QgaXA2X3RubCAqdCwgc3Ry dWN0IGlwdjZoZHIgKmhkcikNCit7DQorCXJldHVybiAhaXB2Nl9hZGRyX2Nt cCgmdC0+cGFybXMucmFkZHIsICZoZHItPnNhZGRyKTsNCit9DQorDQorLyoq DQorICogaXA2aXA2X3RubF94bWl0IC0gZW5jYXBzdWxhdGUgcGFja2V0IGFu ZCBzZW5kIA0KKyAqICAgQHNrYjogdGhlIG91dGdvaW5nIHNvY2tldCBidWZm ZXINCisgKiAgIEBkZXY6IHRoZSBvdXRnb2luZyB0dW5uZWwgZGV2aWNlIA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBCdWlsZCBuZXcgaGVhZGVy IGFuZCBkbyBzb21lIHNhbml0eSBjaGVja3Mgb24gdGhlIHBhY2tldCBiZWZv cmUgc2VuZGluZw0KKyAqICAgaXQgdG8gaXA2X2J1aWxkX3htaXQoKS4NCisg Kg0KKyAqIFJldHVybjogDQorICogICAwDQorICoqLw0KKw0KK2ludCBpcDZp cDZfdG5sX3htaXQoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3RydWN0IG5ldF9k ZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0cnVj dCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJc3RydWN0IG5ldF9kZXZpY2Vf 
c3RhdHMgKnN0YXRzID0gJnQtPnN0YXQ7DQorCXN0cnVjdCBpcHY2aGRyICpp cHY2aCA9IHNrYi0+bmguaXB2Nmg7DQorCXN0cnVjdCBpcHY2X3R4b3B0aW9u cyAqb3JpZ19vcHQgPSBOVUxMOw0KKwlzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMg Km9wdCA9IE5VTEw7DQorCV9fdTggZW5jYXBfbGltaXQgPSAwOw0KKwlfX3Ux NiBvZmZzZXQ7DQorCXN0cnVjdCBmbG93aSBmbDsNCisJc3RydWN0IGlwNl9m bG93bGFiZWwgKmZsX2xibCA9IE5VTEw7DQorCWludCBlcnIgPSAwOw0KKwlz dHJ1Y3QgZHN0X2VudHJ5ICpkc3Q7DQorCWludCBsaW5rX2ZhaWx1cmUgPSAw Ow0KKwlzdHJ1Y3Qgc29jayAqc2sgPSBpcDZfc29ja2V0LT5zazsNCisJc3Ry dWN0IGlwdjZfcGluZm8gKm5wID0gaW5ldDZfc2soc2spOw0KKwlpbnQgbXR1 Ow0KKw0KKwlpZiAodC0+cmVjdXJzaW9uKyspIHsNCisJCXN0YXRzLT5jb2xs aXNpb25zKys7DQorCQlnb3RvIHR4X2VycjsNCisJfQ0KKwlpZiAoc2tiLT5w cm90b2NvbCAhPSBodG9ucyhFVEhfUF9JUFY2KSB8fA0KKwkgICAgISh0LT5w YXJtcy5mbGFncyAmIElQNl9UTkxfRl9DQVBfWE1JVCkgfHwNCisJICAgIGlw NmlwNl90bmxfYWRkcl9jb25mbGljdCh0LCBpcHY2aCkpIHsNCisJCWdvdG8g dHhfZXJyOw0KKwl9DQorCWlmICgob2Zmc2V0ID0gcGFyc2VfdGx2X3RubF9l bmNfbGltKHNrYiwgc2tiLT5uaC5yYXcpKSA+IDApIHsNCisJCXN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKwkJdGVsID0gKHN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqKSAmc2tiLT5uaC5yYXdbb2Zmc2V0XTsN CisJCWlmICh0ZWwtPmVuY2FwX2xpbWl0IDw9IDEpIHsNCisJCQlpY21wdjZf c2VuZChza2IsIElDTVBWNl9QQVJBTVBST0IsDQorCQkJCSAgICBJQ01QVjZf SERSX0ZJRUxELCBvZmZzZXQgKyAyLCBza2ItPmRldik7DQorCQkJZ290byB0 eF9lcnI7DQorCQl9DQorCQllbmNhcF9saW1pdCA9IHRlbC0+ZW5jYXBfbGlt aXQgLSAxOw0KKwl9IGVsc2UgaWYgKCEodC0+cGFybXMuZmxhZ3MgJiBJUDZf VE5MX0ZfSUdOX0VOQ0FQX0xJTUlUKSkgew0KKwkJZW5jYXBfbGltaXQgPSB0 LT5wYXJtcy5lbmNhcF9saW1pdDsNCisJfQ0KKwlpcDZfeG1pdF9sb2NrKCk7 DQorDQorCW1lbWNweSgmZmwsICZ0LT5mbCwgc2l6ZW9mIChmbCkpOw0KKw0K KwlpZiAoKHQtPnBhcm1zLmZsYWdzICYgSVA2X1ROTF9GX1VTRV9PUklHX1RD TEFTUykpDQorCQlmbC5mbDZfZmxvd2xhYmVsIHw9ICgqKF9fdTMyICopIGlw djZoICYgSVBWNl9UQ0xBU1NfTUFTSyk7DQorCWlmICgodC0+cGFybXMuZmxh Z3MgJiBJUDZfVE5MX0ZfVVNFX09SSUdfRkxPV0xBQkVMKSkNCisJCWZsLmZs Nl9mbG93bGFiZWwgfD0gKCooX191MzIgKikgaXB2NmggJiBJUFY2X0ZMT1dM QUJFTF9NQVNLKTsNCisNCisJaWYgKGZsLmZsNl9mbG93bGFiZWwpIHsNCisJ 
CWZsX2xibCA9IGZsNl9zb2NrX2xvb2t1cChzaywgZmwuZmw2X2Zsb3dsYWJl bCk7DQorCQlpZiAoZmxfbGJsKQ0KKwkJCW9yaWdfb3B0ID0gZmxfbGJsLT5v cHQ7DQorCX0NCisJaWYgKGVuY2FwX2xpbWl0ID4gMCkgew0KKwkJaWYgKCEo b3B0ID0gbWVyZ2Vfb3B0aW9ucyhzaywgZW5jYXBfbGltaXQsIG9yaWdfb3B0 KSkpIHsNCisJCQlnb3RvIHR4X2Vycl9mcmVlX2ZsX2xibDsNCisJCX0NCisJ fSBlbHNlIHsNCisJCW9wdCA9IG9yaWdfb3B0Ow0KKwl9DQorCWRzdCA9IF9f c2tfZHN0X2NoZWNrKHNrLCBucC0+ZHN0X2Nvb2tpZSk7DQorDQorCWlmIChk c3QpIHsNCisJCWlmIChucC0+ZGFkZHJfY2FjaGUgPT0gTlVMTCB8fA0KKwkJ ICAgIGlwdjZfYWRkcl9jbXAoJmZsLmZsNl9kc3QsIG5wLT5kYWRkcl9jYWNo ZSkgfHwNCisJCSAgICAoZmwub2lmICYmIGZsLm9pZiAhPSBkc3QtPmRldi0+ aWZpbmRleCkpIHsNCisJCQlkc3QgPSBOVUxMOw0KKwkJfQ0KKwl9DQorCWlm IChkc3QgPT0gTlVMTCkgew0KKwkJZHN0ID0gaXA2X3JvdXRlX291dHB1dChz aywgJmZsKTsNCisJCWlmIChkc3QtPmVycm9yKSB7DQorCQkJc3RhdHMtPnR4 X2NhcnJpZXJfZXJyb3JzKys7DQorCQkJbGlua19mYWlsdXJlID0gMTsNCisJ CQlnb3RvIHR4X2Vycl9kc3RfcmVsZWFzZTsNCisJCX0NCisJCS8qIGxvY2Fs IHJvdXRpbmcgbG9vcCAqLw0KKwkJaWYgKGRzdC0+ZGV2ID09IGRldikgew0K KwkJCXN0YXRzLT5jb2xsaXNpb25zKys7DQorCQkJaWYgKG5ldF9yYXRlbGlt aXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORyANCisJCQkJICAgICAg ICIlczogTG9jYWwgcm91dGluZyBsb29wIGRldGVjdGVkIVxuIiwNCisJCQkJ ICAgICAgIHQtPnBhcm1zLm5hbWUpOw0KKwkJCWdvdG8gdHhfZXJyX2RzdF9y ZWxlYXNlOw0KKwkJfQ0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5kYWRkciwg JmZsLmZsNl9kc3QpOw0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5zYWRkciwg JmZsLmZsNl9zcmMpOw0KKwl9DQorCW10dSA9IGRzdF9wbXR1KGRzdCkgLSBz aXplb2YgKCppcHY2aCk7DQorCWlmIChvcHQpIHsNCisJCW10dSAtPSAob3B0 LT5vcHRfbmZsZW4gKyBvcHQtPm9wdF9mbGVuKTsNCisJfQ0KKwlpZiAobXR1 IDwgSVBWNl9NSU5fTVRVKQ0KKwkJbXR1ID0gSVBWNl9NSU5fTVRVOw0KKwlp ZiAoc2tiLT5kc3QgJiYgbXR1IDwgZHN0X3BtdHUoc2tiLT5kc3QpKSB7DQor CQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gKHN0cnVjdCBydDZfaW5mbyAqKSBz a2ItPmRzdDsNCisJCXJ0LT5ydDZpX2ZsYWdzIHw9IFJURl9NT0RJRklFRDsN CisJCXJ0LT51LmRzdC5tZXRyaWNzW1JUQVhfTVRVLTFdID0gbXR1Ow0KKwl9 DQorCWlmIChza2ItPmxlbiA+IG10dSkgew0KKwkJaWNtcHY2X3NlbmQoc2ti LCBJQ01QVjZfUEtUX1RPT0JJRywgMCwgbXR1LCBkZXYpOw0KKwkJZ290byB0 
eF9lcnJfb3B0X3JlbGVhc2U7DQorCX0NCisJZXJyID0gaXA2X2FwcGVuZF9k YXRhKHNrLCBpcF9nZW5lcmljX2dldGZyYWcsIHNrYi0+bmgucmF3LCBza2It PmxlbiwgMCwNCisJCQkgICAgICB0LT5wYXJtcy5ob3BfbGltaXQsIG9wdCwg JmZsLCANCisJCQkgICAgICAoc3RydWN0IHJ0Nl9pbmZvICopZHN0LCBNU0df RE9OVFdBSVQpOw0KKw0KKwlpZiAoZXJyKSB7DQorCQlpcDZfZmx1c2hfcGVu ZGluZ19mcmFtZXMoc2spOw0KKwl9IGVsc2Ugew0KKwkJZXJyID0gaXA2X3B1 c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwkJZXJyID0gKGVyciA8IDAgPyBl cnIgOiAwKTsNCisJfQ0KKwlpZiAoIWVycikgew0KKwkJc3RhdHMtPnR4X2J5 dGVzICs9IHNrYi0+bGVuOw0KKwkJc3RhdHMtPnR4X3BhY2tldHMrKzsNCisJ fSBlbHNlIHsNCisJCXN0YXRzLT50eF9lcnJvcnMrKzsNCisJCXN0YXRzLT50 eF9hYm9ydGVkX2Vycm9ycysrOw0KKwl9DQorCWlmIChvcHQgJiYgb3B0ICE9 IG9yaWdfb3B0KQ0KKwkJc29ja19rZnJlZV9zKHNrLCBvcHQsIG9wdC0+dG90 X2xlbik7DQorDQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2 X2RzdF9zdG9yZShzaywgZHN0LCAmbnAtPmRhZGRyKTsNCisJaXA2X3htaXRf dW5sb2NrKCk7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1cnNpb24t LTsNCisJcmV0dXJuIDA7DQordHhfZXJyX2RzdF9yZWxlYXNlOg0KKwlkc3Rf cmVsZWFzZShkc3QpOw0KK3R4X2Vycl9vcHRfcmVsZWFzZToNCisJaWYgKG9w dCAmJiBvcHQgIT0gb3JpZ19vcHQpDQorCQlzb2NrX2tmcmVlX3Moc2ssIG9w dCwgb3B0LT50b3RfbGVuKTsNCit0eF9lcnJfZnJlZV9mbF9sYmw6DQorCWZs Nl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2X3htaXRfdW5sb2NrKCk7 DQorCWlmIChsaW5rX2ZhaWx1cmUpDQorCQlkc3RfbGlua19mYWlsdXJlKHNr Yik7DQordHhfZXJyOg0KKwlzdGF0cy0+dHhfZXJyb3JzKys7DQorCXN0YXRz LT50eF9kcm9wcGVkKys7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1 cnNpb24tLTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlw Nl90bmxfc2V0X2NhcChzdHJ1Y3QgaXA2X3RubCAqdCkNCit7DQorCXN0cnVj dCBpcDZfdG5sX3Bhcm0gKnAgPSAmdC0+cGFybXM7DQorCXN0cnVjdCBpbjZf YWRkciAqbGFkZHIgPSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAq cmFkZHIgPSAmcC0+cmFkZHI7DQorCWludCBsdHlwZSA9IGlwdjZfYWRkcl90 eXBlKGxhZGRyKTsNCisJaW50IHJ0eXBlID0gaXB2Nl9hZGRyX3R5cGUocmFk ZHIpOw0KKw0KKwlwLT5mbGFncyAmPSB+KElQNl9UTkxfRl9DQVBfWE1JVHxJ UDZfVE5MX0ZfQ0FQX1JDVik7DQorDQorCWlmIChsdHlwZSAhPSBJUFY2X0FE RFJfQU5ZICYmIHJ0eXBlICE9IElQVjZfQUREUl9BTlkgJiYNCisJICAgICgo 
bHR5cGV8cnR5cGUpICYNCisJICAgICAoSVBWNl9BRERSX1VOSUNBU1R8DQor CSAgICAgIElQVjZfQUREUl9MT09QQkFDS3xJUFY2X0FERFJfTElOS0xPQ0FM fA0KKwkgICAgICBJUFY2X0FERFJfTUFQUEVEfElQVjZfQUREUl9SRVNFUlZF RCkpID09IElQVjZfQUREUl9VTklDQVNUKSB7DQorCQlzdHJ1Y3QgbmV0X2Rl dmljZSAqbGRldiA9IE5VTEw7DQorCQlpbnQgbF9vayA9IDE7DQorCQlpbnQg cl9vayA9IDE7DQorDQorCQlpZiAocC0+bGluaykNCisJCQlsZGV2ID0gZGV2 X2dldF9ieV9pbmRleChwLT5saW5rKTsNCisJCQ0KKwkJaWYgKChsdHlwZSZJ UFY2X0FERFJfVU5JQ0FTVCkgJiYgIWlwdjZfY2hrX2FkZHIobGFkZHIsIGxk ZXYpKQ0KKwkJCWxfb2sgPSAwOw0KKwkJDQorCQlpZiAoKHJ0eXBlJklQVjZf QUREUl9VTklDQVNUKSAmJiBpcHY2X2Noa19hZGRyKHJhZGRyLCBOVUxMKSkN CisJCQlyX29rID0gMDsNCisJCQ0KKwkJaWYgKGxfb2sgJiYgcl9vaykgew0K KwkJCWlmIChsdHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+Zmxh Z3MgfD0gSVA2X1ROTF9GX0NBUF9YTUlUOw0KKwkJCWlmIChydHlwZSZJUFY2 X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+ZmxhZ3MgfD0gSVA2X1ROTF9GX0NB UF9SQ1Y7DQorCQl9DQorCQlpZiAobGRldikNCisJCQlkZXZfcHV0KGxkZXYp Ow0KKwl9DQorfQ0KKw0KKw0KK3N0YXRpYyB2b2lkIGlwNmlwNl90bmxfbGlu a19jb25maWcoc3RydWN0IGlwNl90bmwgKnQpDQorew0KKwlzdHJ1Y3QgbmV0 X2RldmljZSAqZGV2ID0gdC0+ZGV2Ow0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJt ICpwID0gJnQtPnBhcm1zOw0KKwlzdHJ1Y3QgZmxvd2kgKmZsOw0KKwkvKiBT ZXQgdXAgZmxvd2kgdGVtcGxhdGUgKi8NCisJZmwgPSAmdC0+Zmw7DQorCWlw djZfYWRkcl9jb3B5KCZmbC0+Zmw2X3NyYywgJnAtPmxhZGRyKTsNCisJaXB2 Nl9hZGRyX2NvcHkoJmZsLT5mbDZfZHN0LCAmcC0+cmFkZHIpOw0KKwlmbC0+ b2lmID0gcC0+bGluazsNCisJZmwtPmZsNl9mbG93bGFiZWwgPSAwOw0KKw0K KwlpZiAoIShwLT5mbGFncyZJUDZfVE5MX0ZfVVNFX09SSUdfVENMQVNTKSkN CisJCWZsLT5mbDZfZmxvd2xhYmVsIHw9IElQVjZfVENMQVNTX01BU0sgJiBo dG9ubChwLT5mbG93aW5mbyk7DQorCWlmICghKHAtPmZsYWdzJklQNl9UTkxf Rl9VU0VfT1JJR19GTE9XTEFCRUwpKQ0KKwkJZmwtPmZsNl9mbG93bGFiZWwg fD0gSVBWNl9GTE9XTEFCRUxfTUFTSyAmIGh0b25sKHAtPmZsb3dpbmZvKTsN CisNCisJaXA2X3RubF9zZXRfY2FwKHQpOw0KKw0KKwlpZiAocC0+ZmxhZ3Mm SVA2X1ROTF9GX0NBUF9YTUlUICYmIHAtPmZsYWdzJklQNl9UTkxfRl9DQVBf UkNWKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7DQorCWVs c2UNCisJCWRldi0+ZmxhZ3MgJj0gfklGRl9QT0lOVE9QT0lOVDsNCisNCisJ 
aWYgKHAtPmZsYWdzICYgSVA2X1ROTF9GX0NBUF9YTUlUKSB7DQorCQlzdHJ1 Y3QgcnQ2X2luZm8gKnJ0ID0gcnQ2X2xvb2t1cCgmcC0+cmFkZHIsICZwLT5s YWRkciwNCisJCQkJCQkgcC0+bGluaywgMCk7DQorCQlpZiAocnQpIHsNCisJ CQlzdHJ1Y3QgbmV0X2RldmljZSAqcnRkZXY7DQorCQkJaWYgKCEocnRkZXYg PSBydC0+cnQ2aV9kZXYpIHx8DQorCQkJICAgIHJ0ZGV2LT50eXBlID09IEFS UEhSRF9UVU5ORUw2KSB7DQorCQkJCS8qIGFzIGxvbmcgYXMgdHVubmVscyB1 c2UgdGhlIHNhbWUgc29ja2V0IA0KKwkJCQkgICBmb3IgdHJhbnNtaXNzaW9u LCBsb2NhbGx5IG5lc3RlZCB0dW5uZWxzIA0KKwkJCQkgICB3b24ndCB3b3Jr ICovDQorCQkJCWRzdF9yZWxlYXNlKCZydC0+dS5kc3QpOw0KKwkJCQlnb3Rv IG5vX2xpbms7DQorCQkJfSBlbHNlIHsNCisJCQkJZGV2LT5pZmxpbmsgPSBy dGRldi0+aWZpbmRleDsNCisJCQkJZGV2LT5oYXJkX2hlYWRlcl9sZW4gPSBy dGRldi0+aGFyZF9oZWFkZXJfbGVuICsNCisJCQkJCXNpemVvZiAoc3RydWN0 IGlwdjZoZHIpOw0KKwkJCQlkZXYtPm10dSA9IHJ0ZGV2LT5tdHUgLSBzaXpl b2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCQkJaWYgKGRldi0+bXR1IDwgSVBW Nl9NSU5fTVRVKQ0KKwkJCQkJZGV2LT5tdHUgPSBJUFY2X01JTl9NVFU7DQor CQkJCQ0KKwkJCQlkc3RfcmVsZWFzZSgmcnQtPnUuZHN0KTsNCisJCQl9DQor CQl9DQorCX0gZWxzZSB7DQorCW5vX2xpbms6DQorCQlkZXYtPmlmbGluayA9 IDA7DQorCQlkZXYtPmhhcmRfaGVhZGVyX2xlbiA9IExMX01BWF9IRUFERVIg KyBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCWRldi0+bXR1ID0gRVRI X0RBVEFfTEVOIC0gc2l6ZW9mIChzdHJ1Y3QgaXB2Nmhkcik7DQorCX0NCit9 DQorDQorLyoqDQorICogaXA2aXA2X3RubF9jaGFuZ2UgLSB1cGRhdGUgdGhl IHR1bm5lbCBwYXJhbWV0ZXJzDQorICogICBAdDogdHVubmVsIHRvIGJlIGNo YW5nZWQNCisgKiAgIEBwOiB0dW5uZWwgY29uZmlndXJhdGlvbiBwYXJhbWV0 ZXJzDQorICogICBAYWN0aXZlOiAhPSAwIGlmIHR1bm5lbCBpcyByZWFkeSBm b3IgdXNlDQorICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIGlwNmlwNl90 bmxfY2hhbmdlKCkgdXBkYXRlcyB0aGUgdHVubmVsIHBhcmFtZXRlcnMNCisg KiovDQorDQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfY2hhbmdlKHN0cnVj dCBpcDZfdG5sICp0LCBzdHJ1Y3QgaXA2X3RubF9wYXJtICpwKQ0KK3sNCisJ aXB2Nl9hZGRyX2NvcHkoJnQtPnBhcm1zLmxhZGRyLCAmcC0+bGFkZHIpOw0K KwlpcHY2X2FkZHJfY29weSgmdC0+cGFybXMucmFkZHIsICZwLT5yYWRkcik7 DQorCXQtPnBhcm1zLmZsYWdzID0gcC0+ZmxhZ3M7DQorCXQtPnBhcm1zLmhv cF9saW1pdCA9IChwLT5ob3BfbGltaXQgPD0gMjU1ID8gcC0+aG9wX2xpbWl0 
IDogLTEpOw0KKwl0LT5wYXJtcy5lbmNhcF9saW1pdCA9IHAtPmVuY2FwX2xp bWl0Ow0KKwl0LT5wYXJtcy5mbG93aW5mbyA9IHAtPmZsb3dpbmZvOw0KKwlp cDZpcDZfdG5sX2xpbmtfY29uZmlnKHQpOw0KKwlyZXR1cm4gMDsNCit9DQor DQorLyoqDQorICogaXA2aXA2X3RubF9pb2N0bCAtIGNvbmZpZ3VyZSBpcHY2 IHR1bm5lbHMgZnJvbSB1c2Vyc3BhY2UgDQorICogICBAZGV2OiB2aXJ0dWFs IGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICogICBAaWZyOiBw YXJhbWV0ZXJzIHBhc3NlZCBmcm9tIHVzZXJzcGFjZQ0KKyAqICAgQGNtZDog Y29tbWFuZCB0byBiZSBwZXJmb3JtZWQNCisgKg0KKyAqIERlc2NyaXB0aW9u Og0KKyAqICAgaXA2aXA2X3RubF9pb2N0bCgpIGlzIHVzZWQgZm9yIG1hbmFn aW5nIElQdjYgdHVubmVscyANCisgKiAgIGZyb20gdXNlcnNwYWNlLiANCisg Kg0KKyAqICAgVGhlIHBvc3NpYmxlIGNvbW1hbmRzIGFyZSB0aGUgZm9sbG93 aW5nOg0KKyAqICAgICAlU0lPQ0dFVFRVTk5FTDogZ2V0IHR1bm5lbCBwYXJh bWV0ZXJzIGZvciBkZXZpY2UNCisgKiAgICAgJVNJT0NBRERUVU5ORUw6IGFk ZCB0dW5uZWwgbWF0Y2hpbmcgZ2l2ZW4gdHVubmVsIHBhcmFtZXRlcnMNCisg KiAgICAgJVNJT0NDSEdUVU5ORUw6IGNoYW5nZSB0dW5uZWwgcGFyYW1ldGVy cyB0byB0aG9zZSBnaXZlbg0KKyAqICAgICAlU0lPQ0RFTFRVTk5FTDogZGVs ZXRlIHR1bm5lbA0KKyAqDQorICogICBUaGUgZmFsbGJhY2sgZGV2aWNlICJp cDZ0bmwwIiwgY3JlYXRlZCBkdXJpbmcgbW9kdWxlIA0KKyAqICAgaW5pdGlh bGl6YXRpb24sIGNhbiBiZSB1c2VkIGZvciBjcmVhdGluZyBvdGhlciB0dW5u ZWwgZGV2aWNlcy4NCisgKg0KKyAqIFJldHVybjoNCisgKiAgIDAgb24gc3Vj Y2VzcywNCisgKiAgICUtRUZBVUxUIGlmIHVuYWJsZSB0byBjb3B5IGRhdGEg dG8gb3IgZnJvbSB1c2Vyc3BhY2UsDQorICogICAlLUVQRVJNIGlmIGN1cnJl bnQgcHJvY2VzcyBoYXNuJ3QgJUNBUF9ORVRfQURNSU4gc2V0DQorICogICAl LUVJTlZBTCBpZiBwYXNzZWQgdHVubmVsIHBhcmFtZXRlcnMgYXJlIGludmFs aWQsDQorICogICAlLUVFWElTVCBpZiBjaGFuZ2luZyBhIHR1bm5lbCdzIHBh cmFtZXRlcnMgd291bGQgY2F1c2UgYSBjb25mbGljdA0KKyAqICAgJS1FTk9E RVYgaWYgYXR0ZW1wdGluZyB0byBjaGFuZ2Ugb3IgZGVsZXRlIGEgbm9uZXhp c3RpbmcgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZf dG5sX2lvY3RsKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYsIHN0cnVjdCBpZnJl cSAqaWZyLCBpbnQgY21kKQ0KK3sNCisJaW50IGVyciA9IDA7DQorCWludCBj cmVhdGU7DQorCXN0cnVjdCBpcDZfdG5sX3Bhcm0gcDsNCisJc3RydWN0IGlw Nl90bmwgKnQgPSBOVUxMOw0KKw0KKwlzd2l0Y2ggKGNtZCkgew0KKwljYXNl 
IFNJT0NHRVRUVU5ORUw6DQorCQlpZiAoZGV2ID09ICZpcDZpcDZfZmJfdG5s X2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21fdXNlcigmcCwNCisJCQkJCSAg IGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLA0KKwkJCQkJICAgc2l6ZW9mIChw KSkpIHsNCisJCQkJZXJyID0gLUVGQVVMVDsNCisJCQkJYnJlYWs7DQorCQkJ fQ0KKwkJCWlmICgoZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAsICZ0LCAw KSkgPT0gLUVOT0RFVikNCisJCQkJdCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBk ZXYtPnByaXY7DQorCQkJZWxzZSBpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJ CX0gZWxzZQ0KKwkJCXQgPSAoc3RydWN0IGlwNl90bmwgKikgZGV2LT5wcml2 Ow0KKw0KKwkJbWVtY3B5KCZwLCAmdC0+cGFybXMsIHNpemVvZiAocCkpOw0K KwkJaWYgKGNvcHlfdG9fdXNlcihpZnItPmlmcl9pZnJ1LmlmcnVfZGF0YSwg JnAsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJCX0N CisJCWJyZWFrOw0KKwljYXNlIFNJT0NBRERUVU5ORUw6DQorCWNhc2UgU0lP Q0NIR1RVTk5FTDoNCisJCWVyciA9IC1FUEVSTTsNCisJCWNyZWF0ZSA9IChj bWQgPT0gU0lPQ0FERFRVTk5FTCk7DQorCQlpZiAoIWNhcGFibGUoQ0FQX05F VF9BRE1JTikpDQorCQkJYnJlYWs7DQorCQlpZiAoY29weV9mcm9tX3VzZXIo JnAsIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLCBzaXplb2YgKHApKSkgew0K KwkJCWVyciA9IC1FRkFVTFQ7DQorCQkJYnJlYWs7DQorCQl9DQorCQlpZiAo IWNyZWF0ZSAmJiBkZXYgIT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQkJ dCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQlp ZiAoIXQgJiYgKGVyciA9IGlwNmlwNl90bmxfbG9jYXRlKCZwLCAmdCwgY3Jl YXRlKSkpIHsNCisJCQlicmVhazsNCisJCX0NCisJCWlmIChjbWQgPT0gU0lP Q0NIR1RVTk5FTCkgew0KKwkJCWlmICh0LT5kZXYgIT0gZGV2KSB7DQorCQkJ CWVyciA9IC1FRVhJU1Q7DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJCQlpcDZp cDZfdG5sX3VubGluayh0KTsNCisJCQllcnIgPSBpcDZpcDZfdG5sX2NoYW5n ZSh0LCAmcCk7DQorCQkJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkJCW5ldGRl dl9zdGF0ZV9jaGFuZ2UoZGV2KTsNCisJCX0NCisJCWlmIChjb3B5X3RvX3Vz ZXIoaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCSAmdC0+cGFybXMs IHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJCX0gZWxz ZSB7DQorCQkJZXJyID0gMDsNCisJCX0NCisJCWJyZWFrOw0KKwljYXNlIFNJ T0NERUxUVU5ORUw6DQorCQllcnIgPSAtRVBFUk07DQorCQlpZiAoIWNhcGFi bGUoQ0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorDQorCQlpZiAoZGV2 ID09ICZpcDZpcDZfZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21f 
dXNlcigmcCwgaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCQkgICBz aXplb2YgKHApKSkgew0KKwkJCQllcnIgPSAtRUZBVUxUOw0KKwkJCQlicmVh azsNCisJCQl9DQorCQkJZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAsICZ0 LCAwKTsNCisJCQlpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJCQlpZiAodCA9 PSAmaXA2aXA2X2ZiX3RubCkgew0KKwkJCQllcnIgPSAtRVBFUk07DQorCQkJ CWJyZWFrOw0KKwkJCX0NCisJCX0gZWxzZSB7DQorCQkJdCA9IChzdHJ1Y3Qg aXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQllcnIgPSBpcDZfdG5s X2Rlc3Ryb3kodCk7DQorCQlicmVhazsNCisJZGVmYXVsdDoNCisJCWVyciA9 IC1FSU5WQUw7DQorCX0NCisJcmV0dXJuIGVycjsNCit9DQorDQorLyoqDQor ICogaXA2aXA2X3RubF9nZXRfc3RhdHMgLSByZXR1cm4gdGhlIHN0YXRzIGZv ciB0dW5uZWwgZGV2aWNlIA0KKyAqICAgQGRldjogdmlydHVhbCBkZXZpY2Ug YXNzb2NpYXRlZCB3aXRoIHR1bm5lbA0KKyAqDQorICogUmV0dXJuOiBzdGF0 cyBmb3IgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBzdHJ1Y3QgbmV0X2Rl dmljZV9zdGF0cyAqDQoraXA2aXA2X3RubF9nZXRfc3RhdHMoc3RydWN0IG5l dF9kZXZpY2UgKmRldikNCit7DQorCXJldHVybiAmKCgoc3RydWN0IGlwNl90 bmwgKikgZGV2LT5wcml2KS0+c3RhdCk7DQorfQ0KKw0KKy8qKg0KKyAqIGlw NmlwNl90bmxfY2hhbmdlX210dSAtIGNoYW5nZSBtdHUgbWFudWFsbHkgZm9y IHR1bm5lbCBkZXZpY2UNCisgKiAgIEBkZXY6IHZpcnR1YWwgZGV2aWNlIGFz c29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiAgIEBuZXdfbXR1OiB0aGUgbmV3 IG10dQ0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgMCBvbiBzdWNjZXNzLA0K KyAqICAgJS1FSU5WQUwgaWYgbXR1IHRvbyBzbWFsbA0KKyAqKi8NCisNCitz dGF0aWMgaW50DQoraXA2aXA2X3RubF9jaGFuZ2VfbXR1KHN0cnVjdCBuZXRf ZGV2aWNlICpkZXYsIGludCBuZXdfbXR1KQ0KK3sNCisJaWYgKG5ld19tdHUg PCBJUFY2X01JTl9NVFUpIHsNCisJCXJldHVybiAtRUlOVkFMOw0KKwl9DQor CWRldi0+bXR1ID0gbmV3X210dTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KKy8q Kg0KKyAqIGlwNmlwNl90bmxfZGV2X2luaXRfZ2VuIC0gZ2VuZXJhbCBpbml0 aWFsaXplciBmb3IgYWxsIHR1bm5lbCBkZXZpY2VzDQorICogICBAZGV2OiB2 aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICoNCisg KiBEZXNjcmlwdGlvbjoNCisgKiAgIFNldCBmdW5jdGlvbiBwb2ludGVycyBh bmQgaW5pdGlhbGl6ZSB0aGUgJnN0cnVjdCBmbG93aSB0ZW1wbGF0ZSB1c2Vk DQorICogICBieSB0aGUgdHVubmVsLg0KKyAqKi8NCisNCitzdGF0aWMgdm9p ZA0KK2lwNmlwNl90bmxfZGV2X2luaXRfZ2VuKHN0cnVjdCBuZXRfZGV2aWNl 
ICpkZXYpDQorew0KKwlzdHJ1Y3QgaXA2X3RubCAqdCA9IChzdHJ1Y3QgaXA2 X3RubCAqKSBkZXYtPnByaXY7DQorCXN0cnVjdCBmbG93aSAqZmwgPSAmdC0+ Zmw7DQorDQorCW1lbXNldChmbCwgMCwgc2l6ZW9mICgqZmwpKTsNCisJZmwt PnByb3RvID0gSVBQUk9UT19JUFY2Ow0KKw0KKwlkZXYtPmRlc3RydWN0b3Ig PSBpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yOw0KKwlkZXYtPnVuaW5pdCA9 IGlwNmlwNl90bmxfZGV2X3VuaW5pdDsNCisJZGV2LT5oYXJkX3N0YXJ0X3ht aXQgPSBpcDZpcDZfdG5sX3htaXQ7DQorCWRldi0+Z2V0X3N0YXRzID0gaXA2 aXA2X3RubF9nZXRfc3RhdHM7DQorCWRldi0+ZG9faW9jdGwgPSBpcDZpcDZf dG5sX2lvY3RsOw0KKwlkZXYtPmNoYW5nZV9tdHUgPSBpcDZpcDZfdG5sX2No YW5nZV9tdHU7DQorCWRldi0+dHlwZSA9IEFSUEhSRF9UVU5ORUw2Ow0KKwlk ZXYtPmZsYWdzIHw9IElGRl9OT0FSUDsNCisJaWYgKGlwdjZfYWRkcl90eXBl KCZ0LT5wYXJtcy5yYWRkcikgJiBJUFY2X0FERFJfVU5JQ0FTVCAmJg0KKwkg ICAgaXB2Nl9hZGRyX3R5cGUoJnQtPnBhcm1zLmxhZGRyKSAmIElQVjZfQURE Ul9VTklDQVNUKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7 DQorCS8qIEhtbS4uLiBNQVhfQUREUl9MRU4gaXMgOCwgc28gdGhlIGlwdjYg YWRkcmVzc2VzIGNhbid0IGJlIA0KKwkgICBjb3BpZWQgdG8gZGV2LT5kZXZf YWRkciBhbmQgZGV2LT5icm9hZGNhc3QsIGxpa2UgdGhlIGlwdjQNCisJICAg YWRkcmVzc2VzIHdlcmUgaW4gaXBpcC5jLCBpcF9ncmUuYyBhbmQgc2l0LmMu ICovDQorCWRldi0+YWRkcl9sZW4gPSAwOw0KK30NCisNCisvKioNCisgKiBp cDZpcDZfdG5sX2Rldl9pbml0IC0gaW5pdGlhbGl6ZXIgZm9yIGFsbCBub24g ZmFsbGJhY2sgdHVubmVsIGRldmljZXMNCisgKiAgIEBkZXY6IHZpcnR1YWwg ZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiovDQorDQorc3Rh dGljIGludA0KK2lwNmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZp Y2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0cnVjdCBp cDZfdG5sICopIGRldi0+cHJpdjsNCisJaXA2aXA2X3RubF9kZXZfaW5pdF9n ZW4oZGV2KTsNCisJaXA2aXA2X3RubF9saW5rX2NvbmZpZyh0KTsNCisJcmV0 dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9mYl90bmxfZGV2X2lu aXQgLSBpbml0aWFsaXplciBmb3IgZmFsbGJhY2sgdHVubmVsIGRldmljZQ0K KyAqICAgQGRldjogZmFsbGJhY2sgZGV2aWNlDQorICoNCisgKiBSZXR1cm46 IDANCisgKiovDQorDQoraW50IGlwNmlwNl9mYl90bmxfZGV2X2luaXQoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCWlwNmlwNl90bmxfZGV2X2lu aXRfZ2VuKGRldik7DQorCXRubHNfd2NbMF0gPSAmaXA2aXA2X2ZiX3RubDsN 
CisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyBzdHJ1Y3QgaW5ldDZfcHJv dG9jb2wgaXA2aXA2X3Byb3RvY29sID0gew0KKwkuaGFuZGxlciA9IGlwNmlw Nl9yY3YsDQorCS5lcnJfaGFuZGxlciA9IGlwNmlwNl9lcnIsDQorCS5mbGFn cyA9IElORVQ2X1BST1RPX0ZJTkFMDQorfTsNCisNCisvKioNCisgKiBpcDZf dHVubmVsX2luaXQgLSByZWdpc3RlciBwcm90b2NvbCBhbmQgcmVzZXJ2ZSBu ZWVkZWQgcmVzb3VyY2VzDQorICoNCisgKiBSZXR1cm46IDAgb24gc3VjY2Vz cw0KKyAqKi8NCisNCitpbnQgX19pbml0IGlwNl90dW5uZWxfaW5pdCh2b2lk KQ0KK3sNCisJaW50IGksIGosIGVycjsNCisJc3RydWN0IHNvY2sgKnNrOw0K KwlzdHJ1Y3QgaXB2Nl9waW5mbyAqbnA7DQorDQorCWlwNmlwNl9mYl90bmxf ZGV2LnByaXYgPSAodm9pZCAqKSAmaXA2aXA2X2ZiX3RubDsNCisNCisJZm9y IChpID0gMDsgaSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFjcHVfcG9z c2libGUoaSkpDQorCQkJY29udGludWU7DQorDQorCQllcnIgPSBzb2NrX2Ny ZWF0ZShQRl9JTkVUNiwgU09DS19SQVcsIElQUFJPVE9fSVBWNiwgDQorCQkJ CSAgJl9faXA2X3NvY2tldFtpXSk7DQorCQlpZiAoZXJyIDwgMCkgew0KKwkJ CXByaW50ayhLRVJOX0VSUiANCisJCQkgICAgICAgIkZhaWxlZCB0byBjcmVh dGUgdGhlIElQdjYgdHVubmVsIHNvY2tldCAiDQorCQkJICAgICAgICIoZXJy ICVkKS5cbiIsIA0KKwkJCSAgICAgICBlcnIpOw0KKwkJCWdvdG8gZmFpbDsN CisJCX0NCisJCXNrID0gX19pcDZfc29ja2V0W2ldLT5zazsNCisJCXNrLT5h bGxvY2F0aW9uID0gR0ZQX0FUT01JQzsNCisNCisJCW5wID0gaW5ldDZfc2so c2spOw0KKwkJbnAtPmhvcF9saW1pdCA9IDI1NTsNCisJCW5wLT5tY19sb29w ID0gMDsNCisNCisJCXNrLT5wcm90LT51bmhhc2goc2spOw0KKwl9DQorCWlm ICgoZXJyID0gaW5ldDZfYWRkX3Byb3RvY29sKCZpcDZpcDZfcHJvdG9jb2ws IElQUFJPVE9fSVBWNikpIDwgMCkgew0KKwkJcHJpbnRrKEtFUk5fRVJSICJG YWlsZWQgdG8gcmVnaXN0ZXIgSVB2NiBwcm90b2NvbFxuIik7DQorCQlnb3Rv IGZhaWw7DQorCX0NCisNCisJU0VUX01PRFVMRV9PV05FUigmaXA2aXA2X2Zi X3RubF9kZXYpOw0KKwlyZWdpc3Rlcl9uZXRkZXYoJmlwNmlwNl9mYl90bmxf ZGV2KTsNCisNCisJcmV0dXJuIDA7DQorZmFpbDoNCisJZm9yIChqID0gMDsg aiA8IGk7IGorKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaikpDQorCQkJ Y29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0W2pdKTsN CisJCV9faXA2X3NvY2tldFtqXSA9IE5VTEw7DQorCX0NCisJcmV0dXJuIGVy cjsNCit9DQorDQorLyoqDQorICogaXA2X3R1bm5lbF9jbGVhbnVwIC0gZnJl ZSByZXNvdXJjZXMgYW5kIHVucmVnaXN0ZXIgcHJvdG9jb2wNCisgKiovDQor 
DQordm9pZCBpcDZfdHVubmVsX2NsZWFudXAodm9pZCkNCit7DQorCWludCBp Ow0KKw0KKwl1bnJlZ2lzdGVyX25ldGRldigmaXA2aXA2X2ZiX3RubF9kZXYp Ow0KKw0KKwlpbmV0Nl9kZWxfcHJvdG9jb2woJmlwNmlwNl9wcm90b2NvbCwg SVBQUk9UT19JUFY2KTsNCisNCisJZm9yIChpID0gMDsgaSA8IE5SX0NQVVM7 IGkrKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaSkpDQorCQkJY29udGlu dWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0W2ldKTsNCisJCV9f aXA2X3NvY2tldFtpXSA9IE5VTEw7DQorCX0NCit9DQorDQorI2lmZGVmIE1P RFVMRQ0KK21vZHVsZV9pbml0KGlwNl90dW5uZWxfaW5pdCk7DQorbW9kdWxl X2V4aXQoaXA2X3R1bm5lbF9jbGVhbnVwKTsNCisjZW5kaWYNCmRpZmYgLU51 ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhjbHVk ZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L2lwdjZfc3ltcy5jIG1l cmdlLTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYw0KLS0tIGxpbnV4LTIuNS9u ZXQvaXB2Ni9pcHY2X3N5bXMuYwlXZWQgSnVuICA0IDEzOjQzOjA5IDIwMDMN CisrKyBtZXJnZS0yLjUvbmV0L2lwdjYvaXB2Nl9zeW1zLmMJV2VkIE1heSAy OCAyMToxMjowMiAyMDAzDQpAQCAtMzgsMyArMzgsMTEgQEANCiBFWFBPUlRf U1lNQk9MKGlwNl9mb3VuZF9uZXh0aGRyKTsNCiBFWFBPUlRfU1lNQk9MKHhm cm02X3Jjdik7DQogRVhQT1JUX1NZTUJPTCh4ZnJtNl9jbGVhcl9tdXRhYmxl X29wdGlvbnMpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUxfTU9EVUxF DQorRVhQT1JUX1NZTUJPTChydDZfbG9va3VwKTsNCitFWFBPUlRfU1lNQk9M KGZsNl9zb2NrX2xvb2t1cCk7DQorRVhQT1JUX1NZTUJPTChpcHY2X2V4dF9o ZHIpOw0KK0VYUE9SVF9TWU1CT0woaXA2X2FwcGVuZF9kYXRhKTsNCitFWFBP UlRfU1lNQk9MKGlwNl9mbHVzaF9wZW5kaW5nX2ZyYW1lcyk7DQorRVhQT1JU X1NZTUJPTChpcDZfcHVzaF9wZW5kaW5nX2ZyYW1lcyk7DQorI2VuZGlmDQpk aWZmIC1OdXIgLS1leGNsdWRlPVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAt LWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4LTIuNS9uZXQvbmV0c3ltcy5jIG1l cmdlLTIuNS9uZXQvbmV0c3ltcy5jDQotLS0gbGludXgtMi41L25ldC9uZXRz eW1zLmMJV2VkIEp1biAgNCAxMzo0MzoxMCAyMDAzDQorKysgbWVyZ2UtMi41 L25ldC9uZXRzeW1zLmMJV2VkIE1heSAyOCAyMToxMjowMiAyMDAzDQpAQCAt NDc3LDggKzQ3NywxMCBAQA0KIEVYUE9SVF9TWU1CT0woc3lzY3RsX21heF9z eW5fYmFja2xvZyk7DQogI2VuZGlmDQogDQotRVhQT1JUX1NZTUJPTChpcF9n ZW5lcmljX2dldGZyYWcpOw0KKyNlbmRpZg0KIA0KKyNpZiBkZWZpbmVkIChD T05GSUdfSVBWNl9NT0RVTEUpIHx8IGRlZmluZWQgKENPTkZJR19JUF9TQ1RQ 
X01PRFVMRSkgfHwgZGVmaW5lZCAoQ09ORklHX0lQVjZfVFVOTkVMX01PRFVM RSkNCitFWFBPUlRfU1lNQk9MKGlwX2dlbmVyaWNfZ2V0ZnJhZyk7DQogI2Vu ZGlmDQogDQogRVhQT1JUX1NZTUJPTCh0Y3BfcmVhZF9zb2NrKTsNCg== ---377318441-1269789112-1054730402=:26066-- From lpetande@tml.hut.fi Wed Jun 4 07:23:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 07:23:58 -0700 (PDT) Received: from smtp-4.hut.fi (root@smtp-4.hut.fi [130.233.228.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54ENM2x003731 for ; Wed, 4 Jun 2003 07:23:43 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-4.hut.fi (8.12.9/8.12.9) with ESMTP id h54EMgDD011293; Wed, 4 Jun 2003 17:22:42 +0300 Message-ID: <3EDE0286.4000304@tml.hut.fi> Date: Wed, 04 Jun 2003 17:30:30 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> <20030603.213830.85382657.davem@redhat.com> In-Reply-To: <20030603.213830.85382657.davem@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-4.hut.fi) X-DCC-HUTCC-Metrics: smtp-4.hut.fi 1165; Body=9 Fuz1=9 Fuz2=9 X-archive-position: 2877 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Content-Length: 3566 Lines: 81 David S. Miller wrote: > I am VERY UPSET that there appears to be NO dialogue between USAGI and > MIPV6 folks to discuss design of MIPV6. 
If you do not talk together, > how can you guys possibly coordinate efforts and not avoid duplicated > work? I am sorry about the long delay in the extension header addition mechanism discussion. The issues involved with interactions between xfrm stacked destinations and mipv6 were at least to me somewhat fuzzy. To understand them better we developed a prototype of the mipv6 extension header addition and processing. This unfortunately took too long, due to other work. In any case, I hope we can all work better together from now on. In hopes of starting a working dialogue, I'll try to summarize the current situation on our side. 1. Tunnel The tunneling code should be ready; Ville just sent a patch without dependencies on source address based routing. All received and future comments about that are highly appreciated. 2. Source address based routing Ville sent the patch. The semantical changes to the original code were in our opinion necessary to get source address based routing working more as IPv4 policy routing. Let's discuss this more. 3. API We've added kernel support for the API to accept routing header type 2 and to do the additional checks necessary. Also home address option API support has been written. In fact, you can add any destination option to the new DO position with this. There is a (sub)group working on MIPv6 extensions to the Advanced Socket API for IPv6. To me it seems pointless to add anything other to the spec than a way to insert a destination option header to the third possible DO position (i.e. between routing header and fragmentation header). This could be done just by adding new type, let's call it IPV6_NOFRAGDSTOPTS. Everything else should be doable with the existing ASA. We would like to hear comments on this. Is a rtnetlink extension enough for adding mobility routes or do we need to support ioctl too? 4.
Source address selection We think adding new home address flag to addresses is the best and easiest way making the source address selection to work with MIPv6. I'm sure USAGI will add the relevant checks to their source address selection code for that. Dave, Antti already brought this up some weeks ago, but got no answer. Is the home address bit OK with you? 5. MIPv6 extension header adding We have also been testing how adding the mipv6 extension headers would work in practice by developing a prototype for the purpose. Based on that work it seems (to me) that the use of xfrm for storing the mipv6 stuff conflicts with its primary use, especially if there are overlapping entries for IPSec and MIPv6. Storing of the mipv6 information would in our opinion be achieved more cleanly by using cached routes which included the mipv6 information (two extra addresses and flags). The routes would contain modified nexthop information and mip6_output as the rt->u.dst.output function. Mip6_output would add the extension headers based on the information stored in the route. The routes would have a stacked dst entry, which would be used for actual output. Our prototype currently works with tcp, tcp + ipsec and raw sockets, but has only a hackish interface through route ioctl for testing. I can send a preview patch of the code for discussion, if the general approach makes sense to you. I would like to hear your opinions on this and also if you (USAGI) have planned something else for storing the mipv6 state. Henrik From peter@bieringer.de Wed Jun 4 08:24:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:24:37 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54FOH2x008182 for ; Wed, 4 Jun 2003 08:24:18 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 30110) id 88F0D1387A; Wed, 4 Jun 2003 16:53:50 +0200 (CEST) From: "Dr.
Peter Bieringer " To: "Maillist netdev" Cc: "Maillist USAGI-users" Subject: Compatibility problems IPsec 2.5.70 against FreeS/WAN 1.99 Date: Wed, 04 Jun 2003 16:53:50 +0200 Mime-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <20030604145350.88F0D1387A@smtp2.aerasec.de> X-archive-position: 2878 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Content-Length: 1275 Lines: 38 Hi, does anyone have working examples of configuration settings for 2.5.70 IPsec (racoon/SAD/SPD) and FreeS/WAN? I had no success between 2 hosts, in either tunnel or transport mode. (The racoon and pluto configs look OK, the IPsec SA was properly established, and both hosts send packets with the related SPI.) In transport mode, Andreas's comment proved true: the ESP packet carries an IP-in-IP tunnel packet (sent from the 2.5.70-ipsec host): 16:42:06.215546 [|ip] 0x0000 45 E 16:42:08.215348 [|ip] 0x0000 4500 0007 0004 40 E.....@ It looks like FreeS/WAN doesn't like this. In tunnel mode, the ipsec0 interface of FreeS/WAN drops all packets received from the 2.5.70-ipsec host (as seen in the ifconfig statistics). On the 2.5.70-ipsec side I currently don't know how to debug this; I only see the ESP packet on the interface, nothing decrypted. All very strange... Are there any hints on how to get FreeS/WAN to communicate with 2.5.70-ipsec? Thank you very much, Peter -- Dr.
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From peter@bieringer.de Wed Jun 4 08:40:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:40:15 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Fe92x008892 for ; Wed, 4 Jun 2003 08:40:10 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 30110) id 5AA111387A; Wed, 4 Jun 2003 17:40:03 +0200 (CEST) From: "Dr. Peter Bieringer " To: "Maillist netdev" Cc: "Maillist USAGI-users" Subject: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes Date: Wed, 04 Jun 2003 17:40:03 +0200 Mime-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <20030604154003.5AA111387A@smtp2.aerasec.de> X-archive-position: 2879 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Content-Length: 6999 Lines: 152 Hi, is this helpful? Happen on playing around with IPsec on 2.5.70, caused by a ping to a destination (1.2.3.4) in IPsec topology. Jun 4 17:41:31 racoonhost kernel: ------------[ cut here ]------------ Jun 4 17:41:31 racoonhost kernel: kernel BUG at net/xfrm/xfrm_policy.c:185! 
Jun 4 17:41:31 racoonhost kernel: invalid operand: 0000 [#1] Jun 4 17:41:31 racoonhost kernel: CPU: 0 Jun 4 17:41:31 racoonhost kernel: EIP: 0060:[] Tainted: P Jun 4 17:41:31 racoonhost kernel: EFLAGS: 00010246 Jun 4 17:41:31 racoonhost kernel: eax: c6f80a01 ebx: c1b45000 ecx: c6f80a80 edx: c1b45000 Jun 4 17:41:31 racoonhost kernel: esi: c1b45000 edi: 00000000 ebp: c6f80a80 esp: c0985d04 Jun 4 17:41:31 racoonhost kernel: ds: 007b es: 007b ss: 0068 Jun 4 17:41:31 racoonhost kernel: Process ping (pid: 23407, threadinfo=c0984000 task=c4e6c6a0) Jun 4 17:41:31 racoonhost kernel: Stack: c0985ddc c022d09d c1b45000 c0985ddc 00000002 0000002e 00000001 c6f80a80 Jun 4 17:41:31 racoonhost kernel: c1b45000 c0a79d80 c016eff7 fd010018 c6d9b900 c027c7e0 1f3e030a 00000000 Jun 4 17:41:31 racoonhost kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 Jun 4 17:41:31 racoonhost kernel: Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Jun 4 17:41:31 racoonhost kernel: Code: 0f 0b b9 00 89 49 25 c0 8b 8b c8 00 00 00 85 c9 74 08 0f 0b Jun 4 17:41:33 racoonhost kernel: ------------[ cut here ]------------ Jun 4 17:41:33 racoonhost kernel: kernel BUG at net/xfrm/xfrm_policy.c:185! 
Jun 4 17:41:33 racoonhost kernel: invalid operand: 0000 [#2] Jun 4 17:41:33 racoonhost kernel: CPU: 0 Jun 4 17:41:33 racoonhost kernel: EIP: 0060:[] Tainted: P Jun 4 17:41:33 racoonhost kernel: EFLAGS: 00010246 Jun 4 17:41:33 racoonhost kernel: eax: c6f80a01 ebx: c1b45000 ecx: c6f80a80 edx: c1b45000 Jun 4 17:41:33 racoonhost kernel: esi: 00000002 edi: c1b45000 ebp: c6f80a80 esp: c094bd04 Jun 4 17:41:33 racoonhost kernel: ds: 007b es: 007b ss: 0068 Jun 4 17:41:33 racoonhost kernel: Process ping (pid: 23408, threadinfo=c094a000 task=c4e6c6a0) Jun 4 17:41:33 racoonhost kernel: Stack: c1b45000 c022d09d c1b45000 c2d38ab0 00000002 0000002e c7ee1f00 c6f80a80 Jun 4 17:41:33 racoonhost kernel: c1b45000 c0a79d80 c016eff7 c016f045 c7ee1f00 c7ee3800 00000000 c7ee1f00 Jun 4 17:41:33 racoonhost kernel: c7eb2100 c7ece494 c01767ee c2d38ab0 c0d11424 c2d38ab0 00000000 00000000 Jun 4 17:41:33 racoonhost kernel: Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Jun 4 17:41:33 racoonhost kernel: Code: 0f 0b b9 00 89 49 25 c0 8b 8b c8 00 00 00 85 c9 74 08 0f 0b Btw: ping segfaults...that is not good because ping is usually with suid bit set installed: # stat `which ping` File: "/bin/ping" Size: 35192 Blocks: 72 IO Block: -4611693715008778240 Regular File Device: 303h/771d Inode: 128458 Links: 1 Access: (4755/-rwsr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: Wed Jun 4 17:43:44 2003 Modify: Thu Apr 18 23:40:02 2002 Change: Tue Nov 5 18:25:31 2002 # strace ping 1.2.3.4 execve("/bin/ping", ["ping", "1.2.3.4"], [/* 29 vars */]) = 0 uname({sys="Linux", node="racoonhost.lab.aerasec.de", ...}) = 0 brk(0) = 0x8063000 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=14186, ...}) = 0 old_mmap(NULL, 14186, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40014000 close(3) = 0 open("/lib/libresolv.so.2", O_RDONLY) = 3 read(3, 
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\'\0"..., 1024) = 1024 fstat64(3, {st_mode=S_IFREG|0755, st_size=68925, ...}) = 0 old_mmap(NULL, 69408, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40018000 mprotect(0x40026000, 12064, PROT_NONE) = 0 old_mmap(0x40026000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe000) = 0x40026000 old_mmap(0x40027000, 7968, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40027000 close(3) = 0 open("/lib/i686/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0Pv\1B4\0"..., 1024) = 1024 fstat64(3, {st_mode=S_IFREG|0755, st_size=1402035, ...}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40029000 old_mmap(0x42000000, 1264960, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x42000000 mprotect(0x4212c000, 36160, PROT_NONE) = 0 old_mmap(0x4212c000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0x4212c000 old_mmap(0x42131000, 15680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x42131000 close(3) = 0 munmap(0x40014000, 14186) = 0 brk(0) = 0x8063000 brk(0x8063030) = 0x8063030 brk(0x8064000) = 0x8064000 socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 getuid32() = 0 setuid32(0) = 0 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 connect(4, {sin_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("1.2.3.4")}}, 16 +++ killed by SIGSEGV +++ # rpm -qf `which ping` iputils-20020124-3 # rpm -qi iputils-20020124-3 Name : iputils Relocations: /usr Version : 20020124 Vendor: Red Hat, Inc. Release : 3 Build Date: Thu 18 Apr 2002 11:40:05 PM CEST Install date: Tue 05 Nov 2002 06:25:31 PM CET Build Host: stripples.devel.redhat.com Group : System Environment/Daemons Source RPM: iputils-20020124-3.src.rpm Size : 188776 License: BSD Packager : Red Hat, Inc. Summary : Network monitoring tools including ping. Description : The iputils package contains basic utilities for monitoring a network, including ping. 
The ping command sends a series of ICMP protocol ECHO_REQUEST packets to a specified network host to discover whether the target machine is alive and receiving network traffic. Hope this helps, Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From yoshfuji@linux-ipv6.org Wed Jun 4 08:49:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 08:49:24 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Fmv2x009504 for ; Wed, 4 Jun 2003 08:49:18 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h54FnXBo007612; Thu, 5 Jun 2003 00:49:33 +0900 Date: Thu, 05 Jun 2003 00:49:32 +0900 (JST) Message-Id: <20030605.004932.00042147.yoshfuji@linux-ipv6.org> To: lpetande@tml.hut.fi Cc: davem@redhat.com, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <3EDE0286.4000304@tml.hut.fi> References: <20030531.003858.108351451.yoshfuji@linux-ipv6.org> <20030603.213830.85382657.davem@redhat.com> <3EDE0286.4000304@tml.hut.fi> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2880 X-ecartis-version: 
Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 3032 Lines: 74 Hello. In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says: > 2. Source address based routing > > Ville sent the patch. The semantical changes to the original code were > in our opinion necessary to get source address based routing working > more as IPv4 policy routing. Let's discuss this more. I'm not sure why you need this (and tunnel) for MIP... Would you clarify for me? (IMHO, I believe we don't need this change if we use XFRM engine.) BTW, source based routing (source address is major, destination is minor) is done by policy routing; it is NOT the task of CONFIG_IPV6_SUBTREE, IMHO. Yes, people want to have policy routing, but it is NOT (only) for MIP6. > 3. API : > There is a (sub)group working on MIPv6 extensions to the Advanced Socket > API for IPv6. To me it seems pointless to add anything other to the > spec than a way to insert a destination option header to the third > possible DO position (i.e. between routing header and fragmentation > header). This could be done just by adding new type, let's call it > IPV6_NOFRAGDSTOPTS. Everything else should be doable with the > existing ASA. We would like to hear comments on this. No, user daemon adds an XFRM policy for adding destination option (and/or routing header). Stackable destination will do the real work. (So, we don't need socket options.) > 4. Source address selection > > We think adding new home address flag to addresses is the best and > easiest way making the source address selection to work with MIPv6. > I'm sure USAGI will add the relevant checks to their source address > selection code for that. Dave, Antti already brought this up some weeks > ago, but got no answer. Is the home address bit OK with you? "Yes," is my answer for now. > 5.
MIPv6 extension header adding : > Storing of the mipv6 information would in our opinion be achieved more > cleanly by using cached routes which included the mipv6 information (two > extra addresses and flags). The routes would contain modified nexthop > information and mip6_output as the rt->u.dst.output function. > Mip6_output would add the extension headers based on the information > stored in the route. The routes would have a stacked dst entry, which > would be used for actual output. I still believe it is very natural to use XFRM to manage stackable destination. > Our prototype currently works with tcp, tcp + ipsec and raw sockets, > but has only a hackish interface through route ioctl for testing. I can > send a preview patch of the code for discussion, if the general approach > makes sense to you. I would like to hear your opinions on this and also > if you (USAGI) have planned something else for storing the mipv6 state. Okay, anyway, please send it to David, Alexey and me (at least). We can learn more from the code than documentation. ;-) Thank you. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From aj@dungeon.inka.de Wed Jun 4 09:16:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 09:16:46 -0700 (PDT) Received: from mail.inka.de (mail@quechua.inka.de [193.197.184.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54GGK2x011546 for ; Wed, 4 Jun 2003 09:16:39 -0700 Received: from dungeon.inka.de (uucp@[127.0.0.1]) by mail.inka.de with uucp (rmailwrap 0.5) id 19Navy-0003AL-00; Wed, 04 Jun 2003 18:16:18 +0200 Received: from 192.168.1.12 (unknown [192.168.1.12]) by dungeon.inka.de (Postfix) with ESMTP id 5742120FC1; Wed, 4 Jun 2003 18:16:15 +0200 (CEST) From: Andreas Jellinghaus To: "Dr.
Peter Bieringer " , "Maillist netdev" Subject: Re: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes Date: Wed, 4 Jun 2003 18:18:22 +0200 User-Agent: KMail/1.5.2 Cc: "Maillist USAGI-users" References: <20030604154003.5AA111387A@smtp2.aerasec.de> In-Reply-To: <20030604154003.5AA111387A@smtp2.aerasec.de> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200306041818.22607.aj@dungeon.inka.de> X-archive-position: 2881 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aj@dungeon.inka.de Precedence: bulk X-list: netdev Content-Length: 174 Lines: 9 Am Mittwoch, 4. Juni 2003 17:40 schrieb Dr. Peter Bieringer: > Hi, > > is this helpful? Happen on playing around with IPsec on 2.5.70, at least -bk5 has the fix. Andreas From lpetande@morphine.tml.hut.fi Wed Jun 4 10:32:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 10:33:03 -0700 (PDT) Received: from tml-gw.tml.hut.fi (tml.hut.fi [130.233.44.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54HWS2x029158 for ; Wed, 4 Jun 2003 10:32:49 -0700 Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id UAA27345 for ; Wed, 4 Jun 2003 20:32:27 +0300 X-Authentication-Warning: tml-gw.tml.hut.fi: smap set sender to using -f Received: from mail.tml.hut.fi(130.233.45.70) by tml-gw.tml.hut.fi via smap (V2.0) id xma027329; Wed, 4 Jun 03 20:32:19 +0300 Received: from localhost (localhost [127.0.0.1]) by mail.tml.hut.fi (Postfix) with ESMTP id B37D018C1AA; Wed, 4 Jun 2003 20:32:18 +0300 (EEST) Received: from mail.tml.hut.fi ([127.0.0.1]) by localhost (mail.tml.hut.fi [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 15696-01; Wed, 4 Jun 2003 20:32:18 +0300 (EEST) Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by mail.tml.hut.fi (Postfix) with ESMTP id 90EC718C1A8; Wed, 4 Jun 2003 20:32:17 +0300 (EEST) 
Received: from tml.hut.fi (localhost [127.0.0.1]) by morphine.tml.hut.fi (8.12.2+Sun/8.12.2) with ESMTP id h54HWHF5008647; Wed, 4 Jun 2003 20:32:17 +0300 (EEST) Received: from localhost (lpetande@localhost) by tml.hut.fi (8.12.2+Sun/8.12.2/Submit) with ESMTP id h54HVrB8008642; Wed, 4 Jun 2003 20:32:09 +0300 (EEST) Date: Wed, 4 Jun 2003 20:31:53 +0300 (EEST) From: Henrik Petander To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@redhat.com, , , "netdev@oss.sgi.com" , , Venkata Jagana , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: <20030605.004932.00042147.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 X-archive-position: 2882 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@morphine.tml.hut.fi Precedence: bulk X-list: netdev Content-Length: 3920 Lines: 95 Hello Yoshifuji, On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote: > In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says: > > > 2. Source address based routing > > I'm not sure why you need this (and tunnel) for MIP... > Would you clarify for me? > (IMHO, I believe we don't need this change if we use XFRM engine.) As far as I remember there are three main reasons for this: (Ville, correct me if I forgot something) 1. The mobile node sends packets to the correspondent node through the tunnel if they have the home address as the IPv6 source address (not in a home address option). MIPv6 signalling packets are sent at the same time with the home address option (i.e. care-of address as the source address in the IPv6 header) to the same destination, but they must not be sent through the tunnel. If the xfrm engine works correctly (for IPSec) and does the lookup using the home address as the source address, the flows appear the same based on the addresses. 2.
The Home Agent delivers packets with its own address as the source address to the mobile node using route optimization, i.e. routing header type 2, but tunnels other traffic to the MN. Again, the behaviour depends on the source address. 3. Multihomed mobile hosts: Mobile nodes can have a cellular and a WLAN interface. Packets with an address from one interface can be sent only through that interface, to avoid RPF dropping the traffic. With routing based primarily on source addresses this is easy to achieve. Actually this is more general than MIPv6, but multihoming is essential for real-life mobility. > > BTW, source based routing (source address is major, destination is minor) > is done by policy routing; it is NOT the task of CONFIG_IPV6_SUBTREE, IMHO. > Yes, people want to have policy routing, but it is NOT (only) for MIP6. IMO source address subtrees were useless as they were, at least for mobility and multihoming, whereas with the source address as the primary selector, routing for mobile and multihomed hosts is straightforward. Of course I may be missing part of the larger picture ;-) But so far I have heard of no one using them for anything else. As long as Linux lacks "real" IPv6 policy routing, source based routing is the best we have got, and it works both for mobility and multihoming. The main problem with it is IMO the source address selection using the route as a selection basis. Just my thoughts, though. > > > 3. API > > No, user daemon adds an XFRM policy for adding destination option > (and/or routing header). Stackable destination will do the real work. > (So, we don't need socket options.) Actually we do... Due to some interesting requirements in the MIPv6 spec, the signalling packets are treated differently from data packets: the home address option is always present in binding update messages, but it can be used with data packets only after sending a binding update. 
Routing header type 2 is sometimes used in negative binding acks with addresses that differ from the ones used with data packets. The signalling packets need IPSec protection with the final addresses as selectors, so the MIPv6 extension headers can't be added to packets created by a raw socket, as the final addresses would be hidden in the extension headers. > > > 5. MIPv6 extension header adding > I still believe it is very natural to use XFRM to manage stackable > destination. If you have a concrete proposal for how to do it, I would be eager to hear it ;-) > Okay, anyway, please send it to David, Alexey and me (at least). > We can learn more from the code than documentation. ;-) Sure, I'll send it tomorrow when I get back to work. Regards, Henrik ---------------------------------- Henrik Petander Helsinki University of Technology, GO/Core Project Henrik.Petander@hut.fi Office: +358 (0)9 451 5846 GSM: +358 (0)40 741 5248 ---------------------------------- From hch@lst.de Wed Jun 4 11:16:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:16:41 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IGU2x030179 for ; Wed, 4 Jun 2003 11:16:32 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h54IGRJT024755 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Wed, 4 Jun 2003 20:16:27 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h54IGReI024753 for netdev@oss.sgi.com; Wed, 4 Jun 2003 20:16:27 +0200 Date: Wed, 4 Jun 2003 20:16:27 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] kill drivers/net/setup.c Message-ID: <20030604181627.GA24733@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . 
roaringpenguin . com / mimedefang) X-archive-position: 2883 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 3932 Lines: 163 The last two drivers are ppc 8xx system devices, and Paul (ppc maintainer) said I should just send this patch along; the 8xx guys will have to deal with any possible breakage when bringing up their port for 2.5/2.6 again. (Not that I expect anything bad to happen..) --- 1.8/arch/ppc/8260_io/enet.c Sun Apr 27 13:56:50 2003 +++ edited/arch/ppc/8260_io/enet.c Tue Jun 3 22:08:59 2003 @@ -608,7 +608,7 @@ /* Initialize the CPM Ethernet on SCC. */ -int __init scc_enet_init(void) +static int __init scc_enet_init(void) { struct net_device *dev; struct scc_enet_private *cep; @@ -860,3 +860,4 @@ return 0; } +module_init(scc_enet_init); --- 1.9/arch/ppc/8260_io/fcc_enet.c Sun Apr 27 13:56:50 2003 +++ edited/arch/ppc/8260_io/fcc_enet.c Tue Jun 3 22:08:59 2003 @@ -1323,7 +1323,7 @@ /* Initialize the CPM Ethernet on FCC. */ -int __init fec_enet_init(void) +static int __init fec_enet_init(void) { struct net_device *dev; struct fcc_enet_private *cep; @@ -1394,6 +1394,7 @@ return 0; } +module_init(fec_enet_init); /* Make sure the device is shut down during initialization. */ --- 1.10/arch/ppc/8xx_io/enet.c Mon Sep 16 06:51:56 2002 +++ edited/arch/ppc/8xx_io/enet.c Tue Jun 3 22:08:59 2003 @@ -639,7 +639,7 @@ * transmit and receive to make sure we don't catch the CPM with some * inconsistent control information. */ -int __init scc_enet_init(void) +static int __init scc_enet_init(void) { struct net_device *dev; struct scc_enet_private *cep; @@ -964,3 +964,5 @@ return 0; } + +module_init(scc_enet_init); --- 1.13/arch/ppc/8xx_io/fec.c Tue Dec 31 22:10:48 2002 +++ edited/arch/ppc/8xx_io/fec.c Tue Jun 3 22:09:00 2003 @@ -1566,7 +1566,7 @@ /* Initialize the FEC Ethernet on 860T. 
*/ -int __init fec_enet_init(void) +static int __init fec_enet_init(void) { struct net_device *dev; struct fec_enet_private *fep; @@ -1782,6 +1782,7 @@ return 0; } +module_init(fec_enet_init); /* This function is called to start or restart the FEC during a link * change. This only happens when switching between half and full --- 1.16/drivers/net/setup.c Wed Jun 4 07:13:57 2003 +++ edited/drivers/net/setup.c Tue Jun 3 22:12:10 2003 @@ -1,54 +0,0 @@ - -/* - * New style setup code for the network devices - */ - -#include -#include -#include -#include -#include - -extern int scc_enet_init(void); -extern int fec_enet_init(void); - -/* - * Devices in this list must do new style probing. That is they must - * allocate their own device objects and do their own bus scans. - */ - -struct net_probe -{ - int (*probe)(void); - int status; /* non-zero if autoprobe has failed */ -}; - -static struct net_probe pci_probes[] __initdata = { - /* - * Early setup devices - */ -#if defined(CONFIG_SCC_ENET) - {scc_enet_init, 0}, -#endif -#if defined(CONFIG_FEC_ENET) - {fec_enet_init, 0}, -#endif - {NULL, 0}, -}; - - -/* - * Run the updated device probes. These do not need a device passed - * into them. - */ - -void __init net_device_init(void) -{ - struct net_probe *p = pci_probes; - - while (p->probe != NULL) - { - p->status = p->probe(); - p++; - } -} --- 1.83/net/core/dev.c Mon May 26 07:16:23 2003 +++ edited/net/core/dev.c Tue Jun 3 22:12:01 2003 @@ -2861,15 +2861,8 @@ * unhooks any devices that fail to initialise (normally hardware not * present) and leaves us with a valid list of present and active devices. * - */ - -extern void net_device_init(void); -extern void ip_auto_config(void); - - -/* - * This is called single threaded during boot, so no need - * to take the rtnl semaphore. + * This is called single threaded during boot, so no need + * to take the rtnl semaphore. 
*/ static int __init net_dev_init(void) { @@ -3003,7 +2996,6 @@ * Initialise network devices */ - net_device_init(); rc = 0; out: return rc; From hch@lst.de Wed Jun 4 11:18:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:18:17 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IIB2x030483 for ; Wed, 4 Jun 2003 11:18:12 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h54II9JT024813 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Wed, 4 Jun 2003 20:18:09 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h54II9ko024811 for netdev@oss.sgi.com; Wed, 4 Jun 2003 20:18:09 +0200 Date: Wed, 4 Jun 2003 20:18:09 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] switch skfp over to initcalls Message-ID: <20030604181809.GA24779@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 2884 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 4257 Lines: 191 This is a PCI driver and has no business in Space.c. 
Also allows to kill all the fddi code in there (and the stale reference to the long gone apfddi driver) --- 1.20/drivers/net/Space.c Wed May 21 03:56:26 2003 +++ edited/drivers/net/Space.c Tue Jun 3 22:17:09 2003 @@ -105,9 +105,6 @@ /* Detachable devices ("pocket adaptors") */ extern int de620_probe(struct net_device *); -/* FDDI adapters */ -extern int skfp_probe(struct net_device *dev); - /* Fibre Channel adapters */ extern int iph5526_probe(struct net_device *dev); @@ -401,29 +398,6 @@ return -ENODEV; } -#ifdef CONFIG_FDDI -static int __init fddiif_probe(struct net_device *dev) -{ - unsigned long base_addr = dev->base_addr; - - if (base_addr == 1) - return 1; /* ENXIO */ - - if (1 -#ifdef CONFIG_APFDDI - && apfddi_init(dev) -#endif -#ifdef CONFIG_SKFP - && skfp_probe(dev) -#endif - && 1 ) { - return 1; /* -ENODEV or -EAGAIN would be more accurate. */ - } - return 0; -} -#endif - - #ifdef CONFIG_NET_FC static int fcif_probe(struct net_device *dev) { @@ -614,52 +588,6 @@ #define NEXT_DEV (&tr0_dev) #endif - -#ifdef CONFIG_FDDI -static struct net_device fddi7_dev = { - .name = "fddi7", - .next = NEXT_DEV, - .init = fddiif_probe -}; -static struct net_device fddi6_dev = { - .name = "fddi6", - .next = &fddi7_dev, - .init = fddiif_probe -}; -static struct net_device fddi5_dev = { - .name = "fddi5", - .next = &fddi6_dev, - .init = fddiif_probe -}; -static struct net_device fddi4_dev = { - .name = "fddi4", - .next = &fddi5_dev, - .init = fddiif_probe -}; -static struct net_device fddi3_dev = { - .name = "fddi3", - .next = &fddi4_dev, - .init = fddiif_probe -}; -static struct net_device fddi2_dev = { - .name = "fddi2", - .next = &fddi3_dev, - .init = fddiif_probe -}; -static struct net_device fddi1_dev = { - .name = "fddi1", - .next = &fddi2_dev, - .init = fddiif_probe -}; -static struct net_device fddi0_dev = { - .name = "fddi0", - .next = &fddi1_dev, - .init = fddiif_probe -}; -#undef NEXT_DEV -#define NEXT_DEV (&fddi0_dev) -#endif - #ifdef CONFIG_NET_FC static struct 
net_device fc1_dev = { --- 1.12/drivers/net/skfp/skfddi.c Fri May 9 02:40:17 2003 +++ edited/drivers/net/skfp/skfddi.c Tue Jun 3 22:19:04 2003 @@ -2539,72 +2539,25 @@ } // drv_reset_indication - -//--------------- functions for use as a module ---------------- - -#ifdef MODULE -/************************ - * - * Note now that module autoprobing is allowed under PCI. The - * IRQ lines will not be auto-detected; instead I'll rely on the BIOSes - * to "do the right thing". - * - ************************/ -#define LP(a) ((struct s_smc*)(a)) static struct net_device *mdev; -/************************ - * - * init_module - * - * If compiled as a module, find - * adapters and initialize them. - * - ************************/ -int init_module(void) +static int __init skfd_init(void) { struct net_device *p; - PRINTK(KERN_INFO "FDDI init module\n"); if ((mdev = insert_device(NULL, skfp_probe)) == NULL) return -ENOMEM; - for (p = mdev; p != NULL; p = LP(p->priv)->os.next_module) { - PRINTK(KERN_INFO "device to register: %s\n", p->name); + for (p = mdev; p != NULL; p = ((struct s_smc *)p->priv)->os.next_module) { if (register_netdev(p) != 0) { printk("skfddi init_module failed\n"); return -EIO; } } - PRINTK(KERN_INFO "+++++ exit with success +++++\n"); return 0; -} // init_module +} -/************************ - * - * cleanup_module - * - * Release all resources claimed by this module. - * - ************************/ -void cleanup_module(void) -{ - PRINTK(KERN_INFO "cleanup_module\n"); - while (mdev != NULL) { - mdev = unlink_modules(mdev); - } - return; -} // cleanup_module - - -/************************ - * - * unlink_modules - * - * Unregister devices and release their memory. 
- * - ************************/ static struct net_device *unlink_modules(struct net_device *p) { struct net_device *next = NULL; @@ -2638,5 +2591,11 @@ return next; } // unlink_modules +static void __exit skfd_exit(void) +{ + while (mdev) + mdev = unlink_modules(mdev); +} -#endif /* MODULE */ +module_init(skfd_init); +module_exit(skfd_exit); From shemminger@osdl.org Wed Jun 4 11:21:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:22:00 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54ILq2x030905 for ; Wed, 4 Jun 2003 11:21:52 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54ILaX24456; Wed, 4 Jun 2003 11:21:36 -0700 Date: Wed, 4 Jun 2003 11:21:36 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] tulip/xircom initialization bug Message-Id: <20030604112136.7b8e2cf4.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2885 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 579 Lines: 16 By inspection of the device initialization code, this driver unregisters the net device in the error path even though register_netdevice never succeeded. It compiles, but I don't have the hardware to test it. 
diff -Nru a/drivers/net/tulip/xircom_tulip_cb.c b/drivers/net/tulip/xircom_tulip_cb.c --- a/drivers/net/tulip/xircom_tulip_cb.c Wed Jun 4 11:18:44 2003 +++ b/drivers/net/tulip/xircom_tulip_cb.c Wed Jun 4 11:18:44 2003 @@ -648,7 +648,6 @@ pci_set_drvdata(pdev, NULL); pci_release_regions(pdev); err_out_free_netdev: - unregister_netdev(dev); kfree(dev); return -ENODEV; } From shemminger@osdl.org Wed Jun 4 11:25:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 11:25:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54IPa2x031542 for ; Wed, 4 Jun 2003 11:25:37 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54IP9X25211; Wed, 4 Jun 2003 11:25:09 -0700 Date: Wed, 4 Jun 2003 11:25:09 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] sb1000 driver bugs Message-Id: <20030604112509.1e0cc260.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2886 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 1811 Lines: 65 Inspecting the sb1000 driver showed some interesting bugs: - net device pointer is used before the device is allocated; gcc does catch this. - unregister is called even though device not registered successfully - net device is not freed on remove. Compiles but don't have hardware to test. Don't know how it ever worked though. 
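The three sb1000 problems listed here (pointer used before allocation, unregister on a device that never registered, no free on remove) and the xircom error path are all instances of one unwinding rule: each error label undoes only the steps that already succeeded. What follows is a rough user-space sketch of that rule, not the driver code. Every name in it (fake_dev, fake_probe, fake_remove, the counters) is hypothetical and merely stands in for the kernel calls named in the patches. Freeing the allocation when registration fails is this sketch's own choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network device; not a kernel structure. */
struct fake_dev { int registered; };

int regions_held;      /* simulated I/O regions currently claimed */
int unregister_calls;  /* how often the simulated unregister ran  */

enum fail_at { FAIL_NONE, FAIL_REGION1, FAIL_ALLOC, FAIL_REGISTER };

/* Simulated request_region(): returns 1 on success, 0 on failure. */
static int fake_request_region(int fail)
{
    if (fail)
        return 0;
    regions_held++;
    return 1;
}

static void fake_release_region(void) { regions_held--; }

/* Simulated register call: 0 on success, negative on failure. */
static int fake_register(struct fake_dev *dev, int fail)
{
    if (fail)
        return -1;
    dev->registered = 1;
    return 0;
}

static void fake_unregister(struct fake_dev *dev)
{
    unregister_calls++;
    dev->registered = 0;
}

int fake_probe(enum fail_at where, struct fake_dev **out)
{
    struct fake_dev *dev;

    if (!fake_request_region(0))
        goto out;
    if (!fake_request_region(where == FAIL_REGION1))
        goto out_release_region0;

    /* Allocate before the first use of dev, never after. */
    dev = (where == FAIL_ALLOC) ? NULL : calloc(1, sizeof(*dev));
    if (!dev)
        goto out_release_regions;

    if (fake_register(dev, where == FAIL_REGISTER))
        goto out_free_dev;  /* never registered, so nothing to unregister */

    *out = dev;
    return 0;

out_free_dev:
    free(dev);
out_release_regions:
    fake_release_region();
out_release_region0:
    fake_release_region();
out:
    return -1;
}

/* Remove path: undo everything probe did, including the allocation. */
void fake_remove(struct fake_dev *dev)
{
    assert(dev != NULL);
    fake_unregister(dev);
    fake_release_region();
    fake_release_region();
    free(dev);
}
```

Calling fake_probe(FAIL_REGISTER, &d) leaves both counters at zero, which is the property the patches restore: a failed registration releases the regions but never reaches the unregister call.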
diff -Nru a/drivers/net/sb1000.c b/drivers/net/sb1000.c --- a/drivers/net/sb1000.c Wed Jun 4 11:17:47 2003 +++ b/drivers/net/sb1000.c Wed Jun 4 11:17:47 2003 @@ -162,10 +162,17 @@ irq = pnp_irq(pdev, 0); - if (!request_region(ioaddr[0], 16, dev->name)) + if (!request_region(ioaddr[0], 16, "sb1000")) goto out_disable; - if (!request_region(ioaddr[1], 16, dev->name)) + if (!request_region(ioaddr[1], 16, "sb1000")) goto out_release_region0; + + dev = alloc_etherdev(sizeof(struct sb1000_private)); + if (!dev) { + error = -ENOMEM; + goto out_release_regions; + } + dev->base_addr = ioaddr[0]; /* mem_start holds the second I/O address */ @@ -177,12 +184,6 @@ "S/N %#8.8x, IRQ %d.\n", dev->name, dev->base_addr, dev->mem_start, serial_number, dev->irq); - dev = alloc_etherdev(sizeof(struct sb1000_private)); - if (!dev) { - error = -ENOMEM; - goto out_release_regions; - } - /* * The SB1000 is an rx-only cable modem device. The uplink is a modem * and we do not want to arp on it. @@ -212,11 +213,9 @@ error = register_netdev(dev); if (error) - goto out_unregister; + goto out_release_regions; return 0; - out_unregister: - unregister_netdev(dev); out_release_regions: release_region(ioaddr[1], 16); out_release_region0: @@ -236,6 +235,7 @@ unregister_netdev(dev); release_region(dev->base_addr, 16); release_region(dev->mem_start, 16); + kfree(dev); } static struct pnp_driver sb1000_driver = { From shemminger@osdl.org Wed Jun 4 15:41:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 15:41:35 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54Mf82x005229 for ; Wed, 4 Jun 2003 15:41:29 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54MeMX12290; Wed, 4 Jun 2003 15:40:22 -0700 Date: Wed, 4 Jun 2003 15:40:22 -0700 From: Stephen Hemminger To: Arnaldo Carvalho de Melo , "David S. 
Miller" , Jeff Garzik Cc: akpm@digeo.com, davem@redhat.com, jjs@tmsusa.com, netdev@oss.sgi.com Subject: [PATCH 2.5.70] Tun device encapsulation Message-Id: <20030604154022.0ef344ff.shemminger@osdl.org> In-Reply-To: <20030604212528.GA24515@conectiva.com.br> References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2887 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 845 Lines: 31 Tun device was encapsulating the net_device in a private structure then doing: unregister_netdev(&tun->dev); kfree(tun); rtnl_unlock(); This breaks with the delayed cleanup now in the network core. Moving the kfree outside of the rtnl_unlock will fix it. Builds, but not sure how to use TUN to test it. As part of later refcounting changes, I do have a more complex change that uses the same encapsulation as ethernet and other devices. Will save it for later. 
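The ordering problem described here can be modelled in user space. The sketch below assumes, as the message implies, that the deferred cleanup still touches the device when the lock is released, so the containing structure has to stay allocated until after the unlock. All names (toy_tun, toy_unregister, toy_unlock, toy_close, TOY_PERSIST) are invented for illustration and are not kernel APIs. Copying the flag into a local before unlocking is this sketch's own choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a private structure that embeds its device. */
struct toy_tun {
    int dev_state;  /* stands in for the embedded net_device */
    int flags;
};

#define TOY_PERSIST 0x1

static struct toy_tun *pending;  /* at most one deferred cleanup in this toy */
int cleanups_run;                /* number of deferred cleanups completed    */

/* Unregistering only queues the cleanup; it does not finish it. */
void toy_unregister(struct toy_tun *tun) { pending = tun; }

void toy_lock(void) { }

/* Dropping the lock is the point where the deferred work runs. */
void toy_unlock(void)
{
    if (pending) {
        pending->dev_state = 0;  /* a use-after-free if tun were already freed */
        pending = NULL;
        cleanups_run++;
    }
}

/* Returns 1 if the object was freed, 0 if it persists. */
int toy_close(struct toy_tun *tun)
{
    int persist;

    assert(tun != NULL);

    toy_lock();
    persist = tun->flags & TOY_PERSIST;
    if (!persist)
        toy_unregister(tun);
    toy_unlock();  /* deferred cleanup runs here, so tun must still be alive */

    if (!persist) {
        free(tun);  /* safe now: nothing queued refers to tun any more */
        return 1;
    }
    return 0;
}
```

With the free placed before toy_unlock(), the write to pending->dev_state would hit freed memory, which is the failure the message attributes to the old ordering.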
diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c --- a/drivers/net/tun.c Wed Jun 4 15:38:44 2003 +++ b/drivers/net/tun.c Wed Jun 4 15:38:44 2003 @@ -551,10 +551,12 @@ if (!(tun->flags & TUN_PERSIST)) { dev_close(&tun->dev); unregister_netdevice(&tun->dev); - kfree(tun); } rtnl_unlock(); + + if (!(tun->flags & TUN_PERSIST)) + kfree(tun); return 0; } From jjs@tmsusa.com Wed Jun 4 15:47:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 15:47:38 -0700 (PDT) Received: from freeside.toyota.com (freeside.toyota.com [63.87.74.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54MlB2x005580 for ; Wed, 4 Jun 2003 15:47:32 -0700 Received: from einsten.tms.toyota.com (einstein.tms.toyota.com [10.49.36.228]) by freeside.toyota.com (8.12.8/8.12.5) with ESMTP id h54MkjdH031490; Wed, 4 Jun 2003 15:46:45 -0700 Received: from tmsusa.com (localhost.localdomain [127.0.0.1]) by einsten.tms.toyota.com (Postfix) with ESMTP id 83BA23DE7; Wed, 4 Jun 2003 15:46:44 -0700 (PDT) Message-ID: <3EDE76D4.8070001@tmsusa.com> Date: Wed, 04 Jun 2003 15:46:44 -0700 From: jjs User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Stephen Hemminger Cc: Arnaldo Carvalho de Melo , "David S. 
Miller" , Jeff Garzik , akpm@digeo.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Tun device encapsulation References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2888 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jjs@tmsusa.com Precedence: bulk X-list: netdev Content-Length: 964 Lines: 44 I'll be happy to test it out later - Thanks, Joe Stephen Hemminger wrote: >Tun device was encapsulating the net_device in a private structure then doing: > unregister_netdev(&tun->dev); > kfree(tun); > rtnl_unlock(); > >This breaks with the delayed cleanup now in the network core. >Moving the kfree outside of the rtnl_unlock will fix it. > >Builds, but not sure how to use TUN to test it. > >As part of later refcounting changes, I do have a more complex change >that uses the same encapsulation as ethernet and other >devices. Will save it for later. 
> >diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c >--- a/drivers/net/tun.c Wed Jun 4 15:38:44 2003 >+++ b/drivers/net/tun.c Wed Jun 4 15:38:44 2003 >@@ -551,10 +551,12 @@ > if (!(tun->flags & TUN_PERSIST)) { > dev_close(&tun->dev); > unregister_netdevice(&tun->dev); >- kfree(tun); > } > > rtnl_unlock(); >+ >+ if (!(tun->flags & TUN_PERSIST)) >+ kfree(tun); > return 0; > } > > > > From shemminger@osdl.org Wed Jun 4 16:14:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:15:16 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NEq2x006337 for ; Wed, 4 Jun 2003 16:14:52 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54NEcX22925; Wed, 4 Jun 2003 16:14:38 -0700 Date: Wed, 4 Jun 2003 16:14:37 -0700 From: Stephen Hemminger To: Jeff Garzik , "David S. Miller" Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: 2.5.70-bk+ broken networking Message-Id: <20030604161437.2b4d3a79.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2889 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 21729 Lines: 583 Test machine running 2.5.70-bk latest can't boot because eth2 won't come up. The same machine and configuration successfully brings up all the devices and runs on 2.5.70. 
Starting ip6tables: [ OK ] Starting iptables: [ OK ] Setting network parameters: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth0: [ OK ] Bringing up interface eth1: [ OK ] Bringing up interface eth2: sender address length == 0 e1000 device does not seem to be present, delaying eth2 initialization. [FAILED] Starting system logger: [ OK ] Starting kernel logger: [ OK ] Starting portmapper: [ OK ] Starting NFS statd: [ OK ] Starting keytable: [ OK ] Initializing random number generator: [ OK ] Starting pcmcia: [ OK ] Mounting other filesystems: [ OK ] Setting NIS domain name osdl: [ OK ] Binding to the NIS domain: [ OK ] Listening for an NIS domain server. (hung) SysRq : Show State free sibling task PC stack pid father child younger older init S 00000001 3414430476 1 0 2 (NOTLB) Call Trace: [] schedule_timeout+0x6a/0xbc [] process_timeout+0x0/0xc [] do_select+0x193/0x2ee [] __pollwait+0x0/0xaa [] sys_select+0x2a6/0x4a8 [] sys_stat64+0x35/0x38 [] syscall_call+0x7/0xb migration/0 S 00000001 4294947312 2 1 3 (L-TLB) Call Trace: [] migration_thread+0x4f3/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/0 S 00000000 4294940388 3 1 4 2 (L-TLB) Call Trace: [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/1 S 00000001 7996 4 1 5 3 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/1 S 00000000 4294960540 5 1 6 4 (L-TLB) Call Trace: [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/2 S 00000001 4294953884 6 1 7 5 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/2 S 00000000 4294947324 7 1 8 6 (L-TLB) Call Trace: [] 
ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/3 S 00000001 4294940668 8 1 9 7 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/3 S 00000001 8044 9 1 10 8 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/4 S 00000001 4294960492 10 1 11 9 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/4 S 00000001 4294953932 11 1 12 10 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/5 S 00000001 4294947276 12 1 13 11 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/5 S 00000001 4294940716 13 1 14 12 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/6 S 00000001 7996 14 1 15 13 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/6 S 00000001 4294960540 15 1 16 14 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc migration/7 S 00000001 4294953884 16 1 17 15 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] migration_thread+0x4f3/0x534 [] ret_from_fork+0x6/0x14 [] migration_thread+0x0/0x534 [] migration_thread+0x0/0x534 [] kernel_thread_helper+0x5/0xc ksoftirqd/7 
S 00000001 4294947324 17 1 18 16 (L-TLB) Call Trace: [] set_cpus_allowed+0x155/0x1d0 [] ksoftirqd+0x95/0xe6 [] ksoftirqd+0x0/0xe6 [] kernel_thread_helper+0x5/0xc events/0 S 00000001 3415176940 18 1 19 17 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] flush_to_ldisc+0x0/0x176 [] preempt_schedule+0x36/0x50 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/1 S 00000001 7620 19 1 20 18 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] blk_unplug_work+0x0/0x16 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/2 S 00000001 4294960436 20 1 21 19 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/3 S 00000001 4294953652 21 1 22 20 (L-TLB) Call Trace: [] worker_thread+0x3a9/0x3ce [] blk_unplug_work+0x0/0x16 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/4 S 00000001 4294947220 22 1 23 21 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/5 S 00000001 4294940612 23 1 24 22 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/6 S 00000001 7940 24 1 25 23 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] 
default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc events/7 S 00000001 4294960436 25 1 26 24 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc kirqd S 00000001 874843020 26 1 27 25 (L-TLB) Call Trace: [] schedule_timeout+0x6a/0xbc [] process_timeout+0x0/0xc [] balanced_irq+0x4f/0x76 [] balanced_irq+0x0/0x76 [] kernel_thread_helper+0x5/0xc pdflush S 00000001 874836508 27 1 29 26 (L-TLB) Call Trace: [] daemonize+0xd1/0xd8 [] __pdflush+0xdc/0x378 [] preempt_schedule+0x36/0x50 [] schedule_tail+0xc0/0xdc [] pdflush+0x0/0x16 [] pdflush+0x11/0x16 [] kernel_thread_helper+0x5/0xc kswapd0 S F7A37EE4 7884 29 1 28 27 (L-TLB) Call Trace: [] reparent_to_init+0x10a/0x1b0 [] daemonize+0xd1/0xd8 [] kswapd+0xe0/0x10c [] preempt_schedule+0x36/0x50 [] autoremove_wake_function+0x0/0x4c [] ret_from_fork+0x6/0x14 [] autoremove_wake_function+0x0/0x4c [] kswapd+0x0/0x10c [] kernel_thread_helper+0x5/0xc pdflush S 00000001 874828660 28 1 30 29 (L-TLB) Call Trace: [] __pdflush+0xdc/0x378 [] preempt_schedule+0x36/0x50 [] schedule_tail+0xc0/0xdc [] pdflush+0x0/0x16 [] pdflush+0x11/0x16 [] kernel_thread_helper+0x5/0xc aio/0 S F7A0DBF8 4294624600 30 1 31 28 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] preempt_schedule+0x36/0x50 [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/1 S 00000001 4294617956 31 1 32 30 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/2 S 00000001 4294611348 32 1 33 31 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] 
worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/3 S 00000001 4294604740 33 1 34 32 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/4 S 00000001 7940 34 1 35 33 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/5 S 00000001 4294960436 35 1 36 34 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/6 S 00000001 4294953828 36 1 37 35 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc aio/7 S 00000001 4294947220 37 1 38 36 (L-TLB) Call Trace: [] do_sigaction+0x28d/0x43a [] worker_thread+0x3a9/0x3ce [] default_wake_function+0x0/0x2e [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x2e [] worker_thread+0x0/0x3ce [] kernel_thread_helper+0x5/0xc kseriod S 00000001 4294383536 38 1 43 37 (L-TLB) Call Trace: [] allow_signal+0x5a/0xd8 [] serio_thread+0xbe/0x190 [] default_wake_function+0x0/0x2e [] serio_thread+0x0/0x190 [] kernel_thread_helper+0x5/0xc scsi_eh_0 S 00000000 8052 43 1 44 38 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] __down_failed_interruptible+0xa/0x10 [] .text.lock.scsi_error+0xad/0xb5 [scsi_mod] [] +0x20fb/0x2d40 [scsi_mod] [] scsi_error_handler+0x0/0x23a 
[scsi_mod] [] kernel_thread_helper+0x5/0xc ahc_dv_0 S F70F6000 4294960340 44 1 45 43 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] ahc_linux_release_simq+0xdb/0x15a [aic7xxx] [] __down_failed_interruptible+0xa/0x10 [] .text.lock.aic7xxx_osm+0x8e/0x1fb [aic7xxx] [] +0x215d/0x2600 [aic7xxx] [] ahc_linux_dv_thread+0x0/0x632 [aic7xxx] [] kernel_thread_helper+0x5/0xc scsi_eh_1 S 00000000 4294359084 45 1 46 44 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] __down_failed_interruptible+0xa/0x10 [] .text.lock.scsi_error+0xad/0xb5 [scsi_mod] [] +0x20fb/0x2d40 [scsi_mod] [] scsi_error_handler+0x0/0x23a [scsi_mod] [] kernel_thread_helper+0x5/0xc ahc_dv_1 S F70D4000 4294349080 46 1 47 45 (L-TLB) Call Trace: [] __down_interruptible+0xe2/0x1fc [] default_wake_function+0x0/0x2e [] ahc_linux_release_simq+0xdb/0x15a [aic7xxx] [] __down_failed_interruptible+0xa/0x10 [] .text.lock.aic7xxx_osm+0x8e/0x1fb [aic7xxx] [] +0x215d/0x2600 [aic7xxx] [] ahc_linux_dv_thread+0x0/0x632 [aic7xxx] [] kernel_thread_helper+0x5/0xc kjournald S 00000001 4294213892 47 1 148 46 (L-TLB) Call Trace: [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 16131824 148 1 149 47 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 192 149 1 150 148 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000000 4287980920 150 1 151 149 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] 
interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald D 00000001 4287973144 151 1 152 150 (L-TLB) Call Trace: [] blk_run_queues+0xcd/0x1ae [] io_schedule+0x26/0x30 [] __wait_on_buffer+0xcf/0xd2 [] autoremove_wake_function+0x0/0x4c [] autoremove_wake_function+0x0/0x4c [] journal_commit_transaction+0x49b/0x1632 [] smp_apic_timer_interrupt+0xd8/0x140 [] schedule+0x218/0x608 [] default_wake_function+0x2a/0x2e [] default_wake_function+0x0/0x2e [] kjournald+0x163/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc kjournald S 00000001 4287966564 152 1 242 151 (L-TLB) Call Trace: [] default_wake_function+0x2a/0x2e [] interruptible_sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] kjournald+0x14f/0x25a [] commit_timeout+0x0/0xc [] kjournald+0x0/0x25a [] kernel_thread_helper+0x5/0xc rc S 00000001 4294206928 242 1 670 434 152 (NOTLB) Call Trace: [] sys_wait4+0x1e6/0x29a [] sys_rt_sigaction+0xd1/0xf4 [] default_wake_function+0x0/0x2e [] sys_rt_sigprocmask+0xce/0x1b0 [] default_wake_function+0x0/0x2e [] syscall_call+0x7/0xb dhclient S 00000001 4291868976 434 1 557 242 (NOTLB) Call Trace: [] common_interrupt+0x18/0x20 [] schedule_timeout+0x6a/0xbc [] __pollwait+0x38/0xaa [] process_timeout+0x0/0xc [] sock_poll+0x26/0x2a [] do_select+0x193/0x2ee [] __pollwait+0x0/0xaa [] sys_select+0x2a6/0x4a8 [] syscall_call+0x7/0xb syslogd D 00000001 4290889300 557 1 561 434 (NOTLB) Call Trace: [] default_wake_function+0x2a/0x2e [] sleep_on+0x8f/0x158 [] default_wake_function+0x0/0x2e [] log_wait_commit+0x70/0x120 [] log_start_commit+0xea/0x114 [] journal_stop+0x193/0x20e [] journal_force_commit+0xd1/0xea [] ext3_force_commit+0x69/0xe6 [] sys_fsync+0xa3/0xce [] syscall_call+0x7/0xb klogd S 00000001 4289963772 561 1 572 557 (NOTLB) Call Trace: [] fprob+0x2b/0x34 [] schedule_timeout+0xb9/0xbc [] kmalloc+0x188/0x1d6 [] 
unix_wait_for_peer+0xde/0xea [] autoremove_wake_function+0x0/0x4c [] memcpy_fromiovec+0x88/0x8e [] autoremove_wake_function+0x0/0x4c [] sock_alloc_send_skb+0x2e/0x32 [] unix_dgram_sendmsg+0x2be/0x68c [] filemap_nopage+0x1e5/0x2ce [] pte_chain_alloc+0x94/0x9c [] sock_aio_write+0xbc/0xd8 [] do_sync_write+0x8a/0xb6 [] handle_mm_fault+0x103/0x1fc [] default_wake_function+0x0/0x2e [] run_timer_softirq+0x196/0x25c [] vfs_write+0xe9/0x11a [] sys_write+0x3f/0x5e [] syscall_call+0x7/0xb portmap S 00000001 4292398496 572 1 561 (NOTLB) Call Trace: [] schedule_timeout+0xb9/0xbc [] sock_poll+0x26/0x2a [] do_pollfd+0x57/0x98 [] do_poll+0xa5/0xc4 [] sys_poll+0x160/0x23a [] __pollwait+0x0/0xaa [] syscall_call+0x7/0xb S27ypbind S 00000001 276376 670 242 687 (NOTLB) Call Trace: [] sys_wait4+0x1e6/0x29a [] sys_rt_sigaction+0xd1/0xf4 [] default_wake_function+0x0/0x2e [] sys_rt_sigprocmask+0xce/0x1b0 [] default_wake_function+0x0/0x2e [] syscall_call+0x7/0xb rpcinfo S 00000001 4287705648 687 670 688 (NOTLB) Call Trace: [] tcp_v4_connect+0x42f/0x68c [] schedule_timeout+0xb9/0xbc [] inet_wait_for_connect+0x119/0x298 [] autoremove_wake_function+0x0/0x4c [] autoremove_wake_function+0x0/0x4c [] inet_stream_connect+0x218/0x340 [] move_addr_to_kernel+0x6b/0x70 [] sys_connect+0x78/0x9a [] do_page_fault+0x27f/0x4bd [] sock_create+0x100/0x2b0 [] sys_socket+0x3a/0x56 [] sys_socketcall+0xb2/0x262 [] sys_munmap+0x58/0x78 [] syscall_call+0x7/0xb grep S 00000001 4293967752 688 670 687 (NOTLB) Call Trace: [] pipe_wait+0x8b/0xc0 [] autoremove_wake_function+0x0/0x4c [] cp_new_stat64+0xe6/0xea [] autoremove_wake_function+0x0/0x4c [] pipe_read+0x158/0x246 [] vfs_read+0xaf/0x11a [] sys_read+0x3f/0x5e [] syscall_call+0x7/0xb From shemminger@osdl.org Wed Jun 4 16:25:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:25:46 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NPg2x006821 for ; Wed, 4 Jun 2003 16:25:42 
-0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h54NPSX26806; Wed, 4 Jun 2003 16:25:28 -0700 Date: Wed, 4 Jun 2003 16:25:28 -0700 From: Stephen Hemminger To: Stephen Hemminger Cc: jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604162528.637ae1ff.shemminger@osdl.org> In-Reply-To: <20030604161437.2b4d3a79.shemminger@osdl.org> References: <20030604161437.2b4d3a79.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2891 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 696 Lines: 21 On Wed, 4 Jun 2003 16:14:37 -0700 Stephen Hemminger wrote: > Test machine running 2.5.70-bk latest can't boot because eth2 won't > come up. The same machine and configuration successfully brings up > all the devices and runs on 2.5.70. > > Starting ip6tables: [ OK ] > Starting iptables: [ OK ] > Setting network parameters: [ OK ] > Bringing up loopback interface: [ OK ] > Bringing up interface eth0: [ OK ] > Bringing up interface eth1: [ OK ] > Bringing up interface eth2: sender address length == 0 > e1000 device does not seem to be present, delaying eth2 initialization. 
>                                                            [FAILED]

One more piece of info:
	eth0 and eth1 are e100
	eth2 is e1000

From Andrew.Morton@digeo.com Wed Jun 4 16:25:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 16:25:42 -0700 (PDT)
Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h54NPc2x006820 for ; Wed, 4 Jun 2003 16:25:38 -0700
Received: from digeo.com ([172.17.140.150]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 16:25:32 -0700
Message-ID: <3EDE7FEB.2C7FAEC7@digeo.com>
Date: Wed, 04 Jun 2003 16:25:31 -0700
From: Andrew Morton
X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.70-mm3 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Stephen Hemminger
CC: Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: 2.5.70-bk+ broken networking
References: <20030604161437.2b4d3a79.shemminger@osdl.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 04 Jun 2003 23:25:32.0670 (UTC) FILETIME=[97F8F9E0:01C32AF0]
X-archive-position: 2890
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: akpm@digeo.com
Precedence: bulk
X-list: netdev
Content-Length: 397
Lines: 11

Stephen Hemminger wrote:
>
> Test machine running 2.5.70-bk latest can't boot because eth2 won't
> come up. The same machine and configuration successfully brings up
> all the devices and runs on 2.5.70.

kjournald is stuck waiting for IO to complete against some buffer
during transaction commit.

I'd be suspecting block layer or device drivers. What device driver
is handling your /var/log?

From patmans@us.ibm.com Wed Jun 4 18:48:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 18:48:13 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h551lt2x010986 for ; Wed, 4 Jun 2003 18:48:05 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150]) by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h551lftd176574; Wed, 4 Jun 2003 21:47:41 -0400 Received: from DYN318139.beaverton.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h551lbcO178264; Wed, 4 Jun 2003 21:47:38 -0400 Received: (from patman@localhost) by DYN318139.beaverton.ibm.com (8.11.6/8.11.6) id h551hfw10353; Wed, 4 Jun 2003 18:43:41 -0700 X-Authentication-Warning: DYN318139.beaverton.ibm.com: patman set sender to patmans@us.ibm.com using -f Date: Wed, 4 Jun 2003 18:43:41 -0700 From: Patrick Mansfield To: Andrew Morton Cc: Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-ID: <20030604184341.A10256@beaverton.ibm.com> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3EDE7FEB.2C7FAEC7@digeo.com>; from akpm@digeo.com on Wed, Jun 04, 2003 at 04:25:31PM -0700 X-archive-position: 2892 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: patmans@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1498 Lines: 47 On Wed, Jun 04, 2003 at 04:25:31PM -0700, Andrew Morton wrote: > Stephen Hemminger wrote: > > > > Test machine running 2.5.70-bk latest can't boot because eth2 won't > > come up. The same machine and configuration successfully brings up > > all the devices and runs on 2.5.70. 
>
> kjournald is stuck waiting for IO to complete against some buffer
> during transaction commit.
>
> I'd be suspecting block layer or device drivers. What device driver
> is handling your /var/log?

I also can't get networking up on current bk, I don't know if this is the
same problem, the system did not hang (I'm not running NIS?).

I also got that "sender address length == 0" message, I have not seen it
before, it seems to be output by the "ip -o link".

During boot:

[ ... ]
Enabling local filesystem quotas: [ OK ]
Enabling swap space: [ OK ]
/bin/cat: /proc/ksyms: No such file or directory
INIT: Entering runlevel: 3
Entering non-interactive startup
Setting network parameters: [ OK ]
Bringing up interface lo: [ OK ]
sender address length == 0
sender address length == 0
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Starting portmapper: [ OK ]
Starting NFS file locking services:
[ ... ]

After logging in:

[root@elm3b79 root]# ifup eth0
sender address length == 0
[root@elm3b79 root]# ip -o link
sender address length == 0
[root@elm3b79 root]# dmesg | grep eth0
eth0: Digital DS21140 Tulip rev 33 at 0xf8800000, 00:00:BC:0F:03:EB, IRQ 36.

-- Patrick Mansfield From acme@conectiva.com.br Wed Jun 4 19:32:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 19:33:06 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h552Wr2x021427 for ; Wed, 4 Jun 2003 19:32:54 -0700 Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YzL2-00009b-00; Sat, 05 Jul 2003 23:33:16 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 568021966C; Thu, 5 Jun 2003 02:33:49 +0000 (UTC) Date: Wed, 4 Jun 2003 23:33:49 -0300 From: Arnaldo Carvalho de Melo To: Andrew Morton Cc: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-ID: <20030605023349.GH24515@conectiva.com.br> Mail-Followup-To: Arnaldo Carvalho de Melo , Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030604185652.31958d1f.akpm@digeo.com> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. User-Agent: Mutt/1.5.4i X-archive-position: 2893 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev Content-Length: 1297 Lines: 33 Em Wed, Jun 04, 2003 at 06:56:52PM -0700, Andrew Morton escreveu: > Andrew Morton wrote: > > > > Stephen Hemminger wrote: > > > > > > Test machine running 2.5.70-bk latest can't boot because eth2 won't > > > come up. The same machine and configuration successfully brings up > > > all the devices and runs on 2.5.70. 
> > > > kjournald is stuck waiting for IO to complete against some buffer > > during transaction commit. > > > > I'd be suspecting block layer or device drivers. What device driver > > is handling your /var/log? > > I take that back. > > Your sysrq-T woke up syslogd which did a synchronous write which poked > kjournald. You happened to catch it in mid-commit. So that's all normal > and sane. > > Something is up with netdevice initialisation. My eth0 (e100) is in a > strange half-there state and won't come up. Reverting the post-2.5.70 e100 > changes does not help. It's something which went into the tree today I > think. Strange as I'm using 2.5.70-latest-bk as of 30 minutes ago, i.e. uptodate with Linus + my network patches. Thing is related to nfs, please nfs loading at boot time and try again, worked for me, don't know what is wrong with nfs loading tho (haven't checked at all, just disabled loading of the nfs server) :-( - Arnaldo From acme@conectiva.com.br Wed Jun 4 19:41:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 19:41:20 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h552fE2x021867 for ; Wed, 4 Jun 2003 19:41:15 -0700 Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19YzTB-0000BI-00; Sat, 05 Jul 2003 23:41:41 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id B7A951966C; Thu, 5 Jun 2003 02:42:13 +0000 (UTC) Date: Wed, 4 Jun 2003 23:42:13 -0300 From: Arnaldo Carvalho de Melo To: Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-ID: <20030605024212.GI24515@conectiva.com.br> Mail-Followup-To: Arnaldo Carvalho de Melo , Andrew Morton , shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, 
linux-kernel@vger.kernel.org References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com> <20030605023349.GH24515@conectiva.com.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030605023349.GH24515@conectiva.com.br> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. User-Agent: Mutt/1.5.4i X-archive-position: 2894 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev Content-Length: 1476 Lines: 35 Em Wed, Jun 04, 2003 at 11:33:49PM -0300, Arnaldo C. Melo escreveu: > Em Wed, Jun 04, 2003 at 06:56:52PM -0700, Andrew Morton escreveu: > > Andrew Morton wrote: > > > > > > Stephen Hemminger wrote: > > > > > > > > Test machine running 2.5.70-bk latest can't boot because eth2 won't > > > > come up. The same machine and configuration successfully brings up > > > > all the devices and runs on 2.5.70. > > > > > > kjournald is stuck waiting for IO to complete against some buffer > > > during transaction commit. > > > > > > I'd be suspecting block layer or device drivers. What device driver > > > is handling your /var/log? > > > > I take that back. > > > > Your sysrq-T woke up syslogd which did a synchronous write which poked > > kjournald. You happened to catch it in mid-commit. So that's all normal > > and sane. > > > > Something is up with netdevice initialisation. My eth0 (e100) is in a > > strange half-there state and won't come up. Reverting the post-2.5.70 e100 > > changes does not help. It's something which went into the tree today I > > think. > > Strange as I'm using 2.5.70-latest-bk as of 30 minutes ago, i.e. uptodate with > Linus + my network patches. Thing is related to nfs, please nfs loading at Ouch, it should have been "please disable nfs loading..." 
> boot time and try again, worked for me, don't know what is wrong with nfs > loading tho (haven't checked at all, just disabled loading of the nfs > server) :-( From jmorris@intercode.com.au Wed Jun 4 20:26:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 20:26:59 -0700 (PDT) Received: from blackbird.intercode.com.au (IDENT:556VwnA7GH4nCqJ3WvCtm1xx63Zyi+XP@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h553Qj2x024044 for ; Wed, 4 Jun 2003 20:26:47 -0700 Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h553Pxr00868; Thu, 5 Jun 2003 13:26:05 +1000 Date: Thu, 5 Jun 2003 13:25:58 +1000 (EST) From: James Morris To: Patrick Mansfield cc: Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , , , Christoph Hellwig Subject: Re: 2.5.70-bk+ broken networking In-Reply-To: <20030604184341.A10256@beaverton.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2895 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev Content-Length: 695 Lines: 31 On Wed, 4 Jun 2003, Patrick Mansfield wrote: > [root@elm3b79 root]# ifup eth0 > sender address length == 0 This is a bug introduced by a coding style cleanup, fix below. 

- James
-- 
James Morris

--- bk.pending/net/core/iovec.c	2003-06-05 11:12:59.000000000 +1000
+++ bk.w1/net/core/iovec.c	2003-06-05 13:30:06.000000000 +1000
@@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc
 						  address);
 			if (err < 0)
 				return err;
-			m->msg_name = address;
-		} else
-			m->msg_name = NULL;
-	}
+		}
+		m->msg_name = address;
+	} else
+		m->msg_name = NULL;
 
 	size = m->msg_iovlen * sizeof(struct iovec);
 	if (copy_from_user(iov, m->msg_iov, size))

From acme@conectiva.com.br Wed Jun 4 20:31:19 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 20:31:23 -0700 (PDT)
Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h553VI2x024366 for ; Wed, 4 Jun 2003 20:31:19 -0700
Received: from [200.181.171.58] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19Z0FW-0000LY-00; Sun, 06 Jul 2003 00:31:38 -0300
Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 06F5C1966C; Thu, 5 Jun 2003 03:32:09 +0000 (UTC)
Date: Thu, 5 Jun 2003 00:32:08 -0300
From: Arnaldo Carvalho de Melo
To: James Morris
Cc: Patrick Mansfield , Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org, Christoph Hellwig
Subject: Re: 2.5.70-bk+ broken networking
Message-ID: <20030605033208.GK24515@conectiva.com.br>
Mail-Followup-To: Arnaldo Carvalho de Melo , James Morris , Patrick Mansfield , Andrew Morton , Stephen Hemminger , Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, linux-kernel@vger.kernel.org, Christoph Hellwig
References: <20030604184341.A10256@beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-Url: http://advogato.org/person/acme
Organization: Conectiva S.A.
User-Agent: Mutt/1.5.4i
X-archive-position: 2896
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: acme@conectiva.com.br
Precedence: bulk
X-list: netdev
Content-Length: 894
Lines: 36

For the curious, it was introduced in changeset 1.1259.9.18

- Arnaldo

On Thu, Jun 05, 2003 at 01:25:58PM +1000, James Morris wrote:
> On Wed, 4 Jun 2003, Patrick Mansfield wrote:
>
> > [root@elm3b79 root]# ifup eth0
> > sender address length == 0
>
> This is a bug introduced by a coding style cleanup, fix below.
>
>
> - James
> -- 
> James Morris
>
>
> --- bk.pending/net/core/iovec.c	2003-06-05 11:12:59.000000000 +1000
> +++ bk.w1/net/core/iovec.c	2003-06-05 13:30:06.000000000 +1000
> @@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc
>  						  address);
>  			if (err < 0)
>  				return err;
> -			m->msg_name = address;
> -		} else
> -			m->msg_name = NULL;
> -	}
> +		}
> +		m->msg_name = address;
> +	} else
> +		m->msg_name = NULL;
>  
>  	size = m->msg_iovlen * sizeof(struct iovec);
>  	if (copy_from_user(iov, m->msg_iov, size))
>

From davem@redhat.com Wed Jun 4 21:10:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 21:10:28 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h554AM2x025408 for ; Wed, 4 Jun 2003 21:10:22 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA00325; Wed, 4 Jun 2003 21:08:03 -0700
Date: Wed, 04 Jun 2003 21:08:02 -0700 (PDT)
Message-Id: <20030604.210802.115939500.davem@redhat.com>
To: pb@bieringer.de
Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org
Subject: Re: Ooops: 2.5.70 kernel BUG at net/xfrm/xfrm_policy.c - ping crashes
From: "David S. Miller"
In-Reply-To: <20030604154003.5AA111387A@smtp2.aerasec.de>
References: <20030604154003.5AA111387A@smtp2.aerasec.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2897 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 217 Lines: 7 From: "Dr. Peter Bieringer " Date: Wed, 04 Jun 2003 17:40:03 +0200 Happen on playing around with IPsec on 2.5.70, 2.5.70 is ancient, many bugs fixed, please sync up to the current tree From davem@redhat.com Wed Jun 4 21:20:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 21:20:14 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h554KA2x025878 for ; Wed, 4 Jun 2003 21:20:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA00375; Wed, 4 Jun 2003 21:17:51 -0700 Date: Wed, 04 Jun 2003 21:17:50 -0700 (PDT) Message-Id: <20030604.211750.28820261.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] tulip/xircom initialization bug From: "David S. Miller" In-Reply-To: <20030604112136.7b8e2cf4.shemminger@osdl.org> References: <20030604112136.7b8e2cf4.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2898 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 502 Lines: 12 From: Stephen Hemminger Date: Wed, 4 Jun 2003 11:21:36 -0700 By inspection of device initialization code, this driver unregister's the net device in the error path even though the register_netdevice never succeeded. This is fully legal, unregister_netdevice() checks for existence of the netdev in the device list and if not found it returns an error. This severely simplifies error path handling while we convert all these drivers away from init_etherdev(). From davem@redhat.com Wed Jun 4 22:06:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 22:06:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5556T2x028773 for ; Wed, 4 Jun 2003 22:06:30 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA00526; Wed, 4 Jun 2003 22:03:24 -0700 Date: Wed, 04 Jun 2003 22:03:24 -0700 (PDT) Message-Id: <20030604.220324.116384963.davem@redhat.com> To: acme@conectiva.com.br Cc: jmorris@intercode.com.au, patmans@us.ibm.com, akpm@digeo.com, shemminger@osdl.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, hch@infradead.org Subject: Re: 2.5.70-bk+ broken networking From: "David S. Miller" In-Reply-To: <20030605033208.GK24515@conectiva.com.br> References: <20030604184341.A10256@beaverton.ibm.com> <20030605033208.GK24515@conectiva.com.br> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2899
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 343
Lines: 11

   From: Arnaldo Carvalho de Melo
   Date: Thu, 5 Jun 2003 00:32:08 -0300

   For the curious, it was introduced in changeset 1.1259.9.18

Christophe, PLEASE be more careful in the future. I value your
changes, very much. However, you really need to get a little more
meticulous when you submit changes.

Thanks.

From joe@tmsusa.com Wed Jun 4 22:33:34 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 22:33:40 -0700 (PDT)
Received: from jyro.mirai.cx (dsl081-085-006.lax1.dsl.speakeasy.net [64.81.85.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h555XY2x029332 for ; Wed, 4 Jun 2003 22:33:34 -0700
Received: from tmsusa.com (neo [192.168.111.123]) by jyro.mirai.cx (Postfix) with ESMTP id 8EC9E17823; Wed, 4 Jun 2003 22:33:33 -0700 (PDT)
Message-ID: <3EDED62D.3020808@tmsusa.com>
Date: Wed, 04 Jun 2003 22:33:33 -0700
From: Joe
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Stephen Hemminger
Cc: Arnaldo Carvalho de Melo , "David S.
Miller" , Jeff Garzik , akpm@digeo.com, jjs@tmsusa.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Tun device encapsulation References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2900 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: joe@tmsusa.com Precedence: bulk X-list: netdev Content-Length: 961 Lines: 43 This fixes the tun problem nicely here - Joe Stephen Hemminger wrote: >Tun device was encapsulating the net_device in a private structure then doing: > unregister_netdev(&tun->dev); > kfree(tun); > rtnl_unlock(); > >This breaks with the delayed cleanup now in the network core. >Moving the kfree outside of the rtnl_unlock will fix it. > >Builds, but not sure how to use TUN to test it. > >As part of later refcounting changes, I do have a more complex change >that uses the same encapsulation as ethernet and other >devices. Will save it for later. 
>
>diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c
>--- a/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
>+++ b/drivers/net/tun.c	Wed Jun 4 15:38:44 2003
>@@ -551,10 +551,12 @@
> 	if (!(tun->flags & TUN_PERSIST)) {
> 		dev_close(&tun->dev);
> 		unregister_netdevice(&tun->dev);
>-		kfree(tun);
> 	}
> 
> 	rtnl_unlock();
>+
>+	if (!(tun->flags & TUN_PERSIST))
>+		kfree(tun);
> 	return 0;
> }
> 
>
>
>
>

From davem@redhat.com Wed Jun 4 23:53:03 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 04 Jun 2003 23:53:08 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h556r22x031893 for ; Wed, 4 Jun 2003 23:53:02 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA00812; Wed, 4 Jun 2003 23:50:05 -0700
Date: Wed, 04 Jun 2003 23:50:05 -0700 (PDT)
Message-Id: <20030604.235005.26995218.davem@redhat.com>
To: shemminger@osdl.org
Cc: acme@conectiva.com.br, jgarzik@pobox.com, akpm@digeo.com, jjs@tmsusa.com, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Tun device encapsulation
From: "David S. Miller"
In-Reply-To: <20030604154022.0ef344ff.shemminger@osdl.org>
References: <20030604115236.309a173d.akpm@digeo.com> <20030604212528.GA24515@conectiva.com.br> <20030604154022.0ef344ff.shemminger@osdl.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2901 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 498 Lines: 16 From: Stephen Hemminger Date: Wed, 4 Jun 2003 15:40:22 -0700 Tun device was encapsulating the net_device in a private structure then doing: unregister_netdev(&tun->dev); kfree(tun); rtnl_unlock(); This breaks with the delayed cleanup now in the network core. Moving the kfree outside of the rtnl_unlock will fix it. Builds, but not sure how to use TUN to test it. This seems to indeed fix the problem for people, applied thanks. From vnuorval@tcs.hut.fi Thu Jun 5 01:44:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 01:44:17 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h558i52x003783 for ; Thu, 5 Jun 2003 01:44:06 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Thu, 5 Jun 2003 11:36:51 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h558apjH030219; Thu, 5 Jun 2003 11:36:51 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h558ala6030215; Thu, 5 Jun 2003 11:36:47 +0300 Date: Thu, 5 Jun 2003 11:36:47 +0300 (EEST) From: Ville Nuorvala To: Henrik Petander cc: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= , , , "netdev@oss.sgi.com" , , Venkata Jagana , Subject: Re: [patch]: ipv6 tunnel for MIPv6 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2903 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Wed, 4 Jun 2003, Henrik Petander wrote: > Hello Yoshifuji, > > On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote: > > In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says: > > > > > 2. Source address based routing > > > > I'm not sure why you need this (and tunnel) for MIP... > > Would you clearify for me? > > (IMHO, I believe we don't need this change if we use XFRM engine.) > > As far as I remember there are three main reasons for this: (Ville > correct me if forgot something) I think you've got all the main reasons. I'll get back to this issue if I suddenly remember something you forgot. :) -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From yoshfuji@linux-ipv6.org Thu Jun 5 03:11:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 03:11:55 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55ABf2x010081 for ; Thu, 5 Jun 2003 03:11:43 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h55ACPBo012800; Thu, 5 Jun 2003 19:12:25 +0900 Date: Thu, 05 Jun 2003 19:12:24 +0900 (JST) Message-Id: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> To: vnuorval@tcs.hut.fi Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, yoshfuji@linux-ipv6.org, nakam@linux-ipv6.org, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: 
<20030531.000319.114704530.yoshfuji@linux-ipv6.org>
References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org>
Organization: USAGI Project
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 2904
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@linux-ipv6.org
Precedence: bulk
X-list: netdev

In article <20030531.000319.114704530.yoshfuji@linux-ipv6.org> (at Sat, 31 May 2003 00:03:19 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says:

> In article (at Fri, 30 May 2003 17:34:40 +0300 (EEST)), Ville Nuorvala says:
>
> > here is a patch that fixes CONFIG_IPV6_SUBTREES and allows overriding
> > normal routes with source address specific ones. This is for example
> > needed in MIPv6 for handling the traffic to and from a mobile node's home
> > address correctly.
>
> Let us test the patch. It seemed buggy when USAGI tested before.

I've re-tested your latest CONFIG_IPV6_SUBTREES patch.
The results of the retesting seem fine.

However, I won't accept your patch as-is for now.
The patch consists of several parts:

 1. fixing bugs in IPv6 code
 2. fixing bugs in CONFIG_IPV6_SUBTREES code
 3. changing the majority of the keys of the routing table.

There are no problems with 1 and 2. However, we need to discuss 3.
As I said in the other thread, policy routing should be done in a
different way. And it is not good to change the semantics of
CONFIG_IPV6_SUBTREES.
In the original, a route is looked up by destination address, and then
by source address; destination takes precedence over source.
Your patch changes this: source address takes precedence over
destination address. For policy routing, both (and other attributes)
should be considered equally, and this is what the IPv4 routing table does.

Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
The reason why I won't hurry is that I still believe it is not
required for MIP6. Nakamura, one of our members, will describe the details.
Introducing generic policy routing takes precedence over "limited"
policy(?) routing.

Anyway, will you split up your patch (into 1-3 above) first, please?

Thanks.

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From pb@bieringer.de Thu Jun 5 04:54:12 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 04:54:18 -0700 (PDT)
Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Bs12x014073 for ; Thu, 5 Jun 2003 04:54:02 -0700
Received: by smtp2.aerasec.de (Postfix, from userid 995) id B90311387A; Thu, 5 Jun 2003 13:21:44 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id A12011387E; Thu, 5 Jun 2003 13:21:43 +0200 (CEST)
X-AV-Checked: Thu Jun 5 13:21:43 2003 smtp2.aerasec.de
Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id B0BDE1387A; Thu, 5 Jun 2003 13:21:42 +0200 (CEST)
Date: Thu, 05 Jun 2003 13:21:40 +0200
From: "Dr.
Peter Bieringer" To: Maillist netdev Cc: usagi-users@linux-ipv6.org Subject: 2.5.70-bk9: no IPsec modules are autoloaded Message-ID: <29980000.1054812100@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2905 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi again, now playing around with 2.5.70-bk9...which still not solves the interoperability problem with FreeS/WAN. Are they talking different ESP? Sure known that autoloading of IPsec modules is broken...is this a bug or by design? The error messages of racoon are not very useful: 2003-06-05 11:34:34: INFO: main.c:174:main(): @(#)racoon 20001216 20001216 sakane@kame.net 2003-06-05 11:34:34: INFO: main.c:175:main(): @(#)This product linked OpenSSL 0.9.6b [engine] 9 Jul 2001 (http://www.openssl.org/) racoon: something error happened while pfkey initializing. 2003-06-05 11:34:34: ERROR: pfkey.c:364:pfkey_init(): libipsec failed pfkey open (Address family not supported by protocol) -> missing module "af_key" 2003-06-05 11:42:07: INFO: isakmp.c:1048:isakmp_ph2begin_r(): respond new phase 2 negotiation: 10.3.62.31[0]<=>10.3.62.35[0] 2003-06-05 11:42:08: ERROR: pfkey.c:209:pfkey_handler(): pfkey UPDATE failed: No buffer space available 2003-06-05 11:42:08: ERROR: pfkey.c:209:pfkey_handler(): pfkey ADD failed: No buffer space available 2003-06-05 11:42:22: ERROR: pfkey.c:740:pfkey_timeover(): *remote* give up to get IPsec-SA due to time up to wait. 
2003-06-05 11:42:37: INFO: pfkey.c:1367:pk_recvexpire(): IPsec-SA expired: ESP/Transport *remote*->*local* spi=256398122(0xf48532a)
2003-06-05 11:43:07: INFO: isakmp.c:1520:isakmp_ph1expire(): ISAKMP-SA expired *local*[500]-*remote*[500] spi:3087159632fe32b6:88a45a3eabd327fd
2003-06-05 11:43:08: INFO: isakmp.c:1568:isakmp_ph1delete(): ISAKMP-SA deleted *remote*[500]-*local*[500] spi:3087159632fe32b6:88a45a3eabd327fd

-> missing module "ah" and "esp" (not so funny, cost me about 15 min to
find the solution for "No buffer space available" - "why it worked
yesterday and not today")

None of the above are automagically loaded, while others (e.g. the
encryption ones) are.

BTW: is this normal? (host is IPv4 only at the moment):

2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): 127.0.0.1[500] used as isakmp port (fd=7)
2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip1*[500] used as isakmp port (fd=8)
2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip2*[500] used as isakmp port (fd=9)
2003-06-05 13:17:03: INFO: isakmp.c:1362:isakmp_open(): *ip3*[500] used as isakmp port (fd=10)
2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use).
2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use).
2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use).
2003-06-05 13:17:03: ERROR: isakmp.c:1354:isakmp_open(): failed to bind (Address already in use).

Peter
-- Dr.
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From davem@redhat.com Thu Jun 5 05:14:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:15:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CEw2x014638 for ; Thu, 5 Jun 2003 05:14:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA01807; Thu, 5 Jun 2003 05:12:38 -0700 Date: Thu, 05 Jun 2003 05:12:38 -0700 (PDT) Message-Id: <20030605.051238.74748591.davem@redhat.com> To: pb@bieringer.de Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: 2.5.70-bk9: no IPsec modules are autoloaded From: "David S. Miller" In-Reply-To: <29980000.1054812100@klopffest.muc.aerasec.de> References: <29980000.1054812100@klopffest.muc.aerasec.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2906 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Dr. Peter Bieringer" Date: Thu, 05 Jun 2003 13:21:40 +0200 Sure known that autoloading of IPsec modules is broken...is this a bug or by design? 
You (or someone) has to add the appropriate entries to /etc/modules.conf From lpetande@tml.hut.fi Thu Jun 5 05:17:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:17:32 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CHQ2x014955 for ; Thu, 5 Jun 2003 05:17:27 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55CHO5b017649 for ; Thu, 5 Jun 2003 15:17:24 +0300 Message-ID: <3EDF36AA.9020403@tml.hut.fi> Date: Thu, 05 Jun 2003 15:25:14 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Bug in ipv6 ipsec in handling of packets with extension headers Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=1 Fuz1=1 Fuz2=1 X-archive-position: 2907 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Hi, There's a bug in get_offset function of ah6 and esp6. The function returns also a pointer, prev_hdr, pointing to the last extension header before the IPSec headers. This pointer points to the skb. The ipsec headers go between the payload and the extension header, making the pointer invalid. However, after this the pointer is used for setting the next header field of the extension header to IPPROTO_ESP or IPPROTO_AH. This corrupts the packet, if any extension headers are present. An easy way to test this is to send a data packet with routing header protected by IPSec. 
A possible fix is to change the pointer into an offset from the start of the packet and use the offset later to set the nexthdr value in the extension header. Thanks, Henrik From davem@redhat.com Thu Jun 5 05:19:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:19:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CJT2x015274 for ; Thu, 5 Jun 2003 05:19:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA01843; Thu, 5 Jun 2003 05:17:10 -0700 Date: Thu, 05 Jun 2003 05:17:09 -0700 (PDT) Message-Id: <20030605.051709.104035049.davem@redhat.com> To: lpetande@tml.hut.fi Cc: netdev@oss.sgi.com Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers From: "David S. Miller" In-Reply-To: <3EDF36AA.9020403@tml.hut.fi> References: <3EDF36AA.9020403@tml.hut.fi> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2908 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Henrik Petander Date: Thu, 05 Jun 2003 15:25:14 +0300 A possible fix is to change the pointer into an offset from the start of the packet and use the offset later to set the nexthdr value in the extension header. Please indicate the version of the sources you are looking at when making reports. Thank you. 
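
The fix proposed in this thread, keeping an offset from the start of the packet instead of a pointer into it, can be illustrated outside the kernel. What follows is a minimal user-space sketch under stated assumptions, not the esp6.c code: `struct toy_pkt`, `toy_get_offset()`, `toy_esp_encap()` and `toy_build_sample()` are hypothetical names invented for the illustration, and the packet is a plain byte array with headroom rather than an sk_buff. It only shows why an offset survives the header move while a saved pointer would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEXTHDR_HOP      0
#define NEXTHDR_TCP      6
#define NEXTHDR_ROUTING 43
#define NEXTHDR_ESP     50
#define NEXTHDR_DEST    60
#define IPV6_HDR_LEN    40
#define TOY_HEADROOM    64

/* Hypothetical toy packet (not an sk_buff): bytes live in buf[start..start+len). */
struct toy_pkt {
	uint8_t buf[256];
	size_t start;
	size_t len;
};

static int toy_is_ext_hdr(uint8_t nh)
{
	return nh == NEXTHDR_HOP || nh == NEXTHDR_ROUTING || nh == NEXTHDR_DEST;
}

/*
 * Walk the extension headers. Returns the combined length of the IPv6
 * header and its extension headers; *nh_off receives the OFFSET (from the
 * packet start) of the last "next header" byte in that chain.
 */
static size_t toy_get_offset(const struct toy_pkt *p, size_t *nh_off)
{
	const uint8_t *d = p->buf + p->start;
	size_t off = IPV6_HDR_LEN;

	*nh_off = 6;	/* nexthdr field of the fixed IPv6 header */
	while (toy_is_ext_hdr(d[*nh_off]) && off + 8 <= p->len) {
		*nh_off = off;	/* byte 0 of an extension header is its nexthdr */
		off += ((size_t)d[off + 1] + 1) * 8;
	}
	return off;
}

/*
 * Open a gap of `room` bytes between the headers and the payload by moving
 * the headers down into the headroom, then patch nexthdr via the offset.
 * A pointer saved before the memmove() would now be `room` bytes off.
 */
static int toy_esp_encap(struct toy_pkt *p, size_t room, uint8_t *orig_nh)
{
	size_t nh_off;
	size_t hdr_len = toy_get_offset(p, &nh_off);
	uint8_t *d;

	if (room > p->start || hdr_len > p->len)
		return -1;
	memmove(p->buf + p->start - room, p->buf + p->start, hdr_len);
	p->start -= room;
	p->len += room;

	d = p->buf + p->start;		/* recompute the base after the move */
	memset(d + hdr_len, 0, room);	/* placeholder for the ESP header */
	*orig_nh = d[nh_off];
	d[nh_off] = NEXTHDR_ESP;
	return 0;
}

/* Sample: IPv6 header, one 8-byte routing header, 4 bytes of payload. */
static void toy_build_sample(struct toy_pkt *p)
{
	uint8_t *d;

	memset(p, 0, sizeof(*p));
	p->start = TOY_HEADROOM;
	p->len = IPV6_HDR_LEN + 8 + 4;
	d = p->buf + p->start;
	d[0] = 0x60;			/* IP version 6 */
	d[6] = NEXTHDR_ROUTING;
	d[IPV6_HDR_LEN] = NEXTHDR_TCP;	/* routing header's nexthdr */
	d[IPV6_HDR_LEN + 1] = 0;	/* hdr ext len 0 => 8 bytes */
	memcpy(d + IPV6_HDR_LEN + 8, "DATA", 4);
}
```

In this sketch the sample packet carries a routing header, which matches the test case suggested in the report. After `toy_esp_encap()` the routing header's next header byte is found again at the same offset from the new packet start, whereas the address it had before the move now lies inside the gap reserved for the ESP header.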
From pb@bieringer.de Thu Jun 5 05:26:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:27:03 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CQw2x015604 for ; Thu, 5 Jun 2003 05:26:59 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 486931387E; Thu, 5 Jun 2003 14:26:52 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 88DE71387A; Thu, 5 Jun 2003 14:26:50 +0200 (CEST) X-AV-Checked: Thu Jun 5 14:26:50 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id C15191387E; Thu, 5 Jun 2003 14:26:49 +0200 (CEST) Date: Thu, 05 Jun 2003 14:26:47 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: 2.5.70-bk9: no IPsec modules are autoloaded Message-ID: <34470000.1054816007@klopffest.muc.aerasec.de> In-Reply-To: <20030605.051238.74748591.davem@redhat.com> References: <29980000.1054812100@klopffest.muc.aerasec.de> <20030605.051238.74748591.davem@redhat.com> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2909 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, June 05, 2003 05:12:38 AM -0700 "David S. Miller" wrote: > From: "Dr. Peter Bieringer" > Date: Thu, 05 Jun 2003 13:21:40 +0200 > > Sure known that autoloading of IPsec modules is broken...is this a bug > or by design? > > You (or someone) has to add the appropriate entries to > /etc/modules.conf Ok, good to know. 
Are there any aliases possible like

alias-something-ike af_key
alias-something-cryptobasic-49 ah
alias-something-cryptobasic-50 esp
alias-something-crypto-modules-0 crypt_null
pre-install esp modprobe ah

BTW: isn't this file now called "modprobe.conf"?

Thanks,
Peter
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From vnuorval@tcs.hut.fi Thu Jun 5 05:52:17 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:52:22 -0700 (PDT)
Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55CqG2x016165 for ; Thu, 5 Jun 2003 05:52:17 -0700
Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Thu, 5 Jun 2003 15:40:59 +0300 (EEST)
Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h55CewjH031145; Thu, 5 Jun 2003 15:40:58 +0300
Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h55Cer6J031141; Thu, 5 Jun 2003 15:40:54 +0300
Date: Thu, 5 Jun 2003 15:40:53 +0300 (EEST)
From: Ville Nuorvala
To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
cc: lpetande@tml.hut.fi, , , , , , ,
Subject: Re: [patch]: ipv6 tunnel for MIPv6
In-Reply-To: <20030605.004932.00042147.yoshfuji@linux-ipv6.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-15
X-archive-position: 2911
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: vnuorval@tcs.hut.fi
Precedence: bulk
X-list: netdev

On Thu, 5 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote:

> In article <3EDE0286.4000304@tml.hut.fi> (at Wed, 04 Jun 2003 17:30:30 +0300), Henrik Petander says:
>
> > 2.
Source address based routing
>
> I'm not sure why you need this (and tunnel) for MIP...
> Would you clarify for me?
> (IMHO, I believe we don't need this change if we use XFRM engine.)
>

A few comments about the tunnels:

Can you run link-scope protocols over an XFRM tunnel? The MIPv6 spec more
or less requires this feature if you are to support protocols like DHCPv6
and MLD. (See e.g. sections 8.5, 10.4.3 and 10.4.4 in the MIPv6 draft 22).
The only way to get them to work is, AFAICS, to have a virtual net_device
associated with every tunnel. Are XFRM tunnels like this? At least they
didn't seem to be, based on the xfrm6_tunnel patch sent to netdev last
week...

If the tunnels aren't separate devices I can straight away think of one
scenario where we run into trouble.

1. MN receives RA with M or O flags set from a router on the foreign link.
2. MN receives an MPA with M or O flags set from HA.

In the first case the DHCP queries should be sent to the current link the
MN is attached to, in the latter to the HA. I don't see any way for the MN
to separate these two cases while sending the DHCP queries, _unless_ they
are sent through different interfaces (i.e. the physical vs the virtual
tunnel interface).

On a more general note, the driver I sent aims to provide a completely
RFC 2473 compliant tunnel interface. :)

Things (at the moment) missing from the xfrm6_tunnel are at least:

- tunnel encapsulation limit destination sub-option support
- forwarding of ICMP errors to the original source of the packet
- transparent fragmentation of packets if MTU minus size of tunnel headers
  is less than IPV6_MIN_MTU
- ability to configure things like traffic class and flowlabel of the
  encapsulating ipv6 header

Perhaps we could make feature complete ip6ip6 tunnels if we combined
xfrm6_tunnel and ip6_tunnel?
:) Regards, Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From lpetande@tml.hut.fi Thu Jun 5 05:51:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 05:51:48 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Cph2x016053 for ; Thu, 5 Jun 2003 05:51:44 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55Cpf5b024962; Thu, 5 Jun 2003 15:51:42 +0300 Message-ID: <3EDF3EB4.8010105@tml.hut.fi> Date: Thu, 05 Jun 2003 15:59:32 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> In-Reply-To: <20030605.051709.104035049.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=2 Fuz1=2 Fuz2=2 X-archive-position: 2910 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev David S. Miller wrote: > From: Henrik Petander > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > A possible fix is to change the pointer into an offset from the start of > the packet and use the offset later to set the nexthdr value in the > extension header. > > Please indicate the version of the sources you are looking > at when making reports. Sure, esp6.c bitkeeper version was 1.16. 
Also a fix to the bug report: the problem is with esp6 and not with ah6, which does not use the get_offset function. Henrik From pb@bieringer.de Thu Jun 5 06:50:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 06:51:00 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Dog2x025545 for ; Thu, 5 Jun 2003 06:50:43 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id 0CDEE1387A; Thu, 5 Jun 2003 15:07:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 228101387E; Thu, 5 Jun 2003 15:07:40 +0200 (CEST) X-AV-Checked: Thu Jun 5 15:07:40 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 5D90D1387A; Thu, 5 Jun 2003 15:07:39 +0200 (CEST) Date: Thu, 05 Jun 2003 15:07:36 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods Message-ID: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2912 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi again, because I got no success, I've tried different encryption methods than 3DES. And *suddenly* it began to work. 
One side : 2.5.70-bk9
Other side: FreeS/WAN 1.99 with algopatches 0.8.1rc2

Result:

AES
---
AES-128: working
AES-192: not working
AES-256: not working

FreeS/WAN:
112 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: initiate
003 "freeswan-racoon-tunnel" #14: ESP transform ESP_AES passed key_len=32 > 16
032 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: internal error

3DES
----
Not working, no message

Blowfish
--------
blowfish-128: working
Other key lengths: not working NO_PROPOSAL_CHOSEN

Other algorithms: not tested at the moment

I'm very wondering why 3DES is incompatible in IPsec-SA modus, while
working in IKE.

Can someone confirm and/or extend this compatibility test?

TIA,
Peter
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From jmorris@intercode.com.au Thu Jun 5 07:16:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:16:21 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:DCeOq5adnwF804cXIr9y5TBswOrhw6Mf@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EGD2x026587 for ; Thu, 5 Jun 2003 07:16:15 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h55EG3r03820; Fri, 6 Jun 2003 00:16:03 +1000
Date: Fri, 6 Jun 2003 00:16:02 +1000 (EST)
From: James Morris
To: "Dr. Peter Bieringer"
cc: netdev@oss.sgi.com,
Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods
In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2913
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Thu, 5 Jun 2003, Dr.
Peter Bieringer wrote: > I'm very wondering why 3DES is incompatible in IPsec-SA modus, while > working in IKE. What happens if you use manual configurations (e.g. setkey with the native ipsec) ? With this, we can first establish whether on the wire stuff is fundamentally working, before looking at negotiated configurations. - James -- James Morris From pb@bieringer.de Thu Jun 5 07:20:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:20:23 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EKH2x026907 for ; Thu, 5 Jun 2003 07:20:18 -0700 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 9F0881387E; Thu, 5 Jun 2003 16:20:11 +0200 (CEST) X-AV-Checked: Thu Jun 5 16:20:11 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id D05C51387A; Thu, 5 Jun 2003 16:20:10 +0200 (CEST) Date: Thu, 05 Jun 2003 16:20:09 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: (usagi-users 02412) IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 Message-ID: <3525719.1054830009@[10.3.62.6]> In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2914 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Ohoh, sorry for confusions, my racoon here was a little bit buggy... 
...be warned, not using RHL's ipsec-tools from rawhide...looks like the
racoon isn't compiled in a proper environment :-( it doesn't support DES
and causes trouble on 3DES *grmml*).

The reported 3DES problem is now solved by using a freshly compiled one.

But the AES one still occurs.

> FreeS/WAN:
> 112 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: initiate
> 003 "freeswan-racoon-tunnel" #14: ESP transform ESP_AES passed key_len=32 > 16
> 032 "freeswan-racoon-tunnel" #14: STATE_QUICK_I1: internal error

Or on 192 bits:

112 "freeswan-racoon-tunnel" #15: STATE_QUICK_I1: initiate
003 "freeswan-racoon-tunnel" #15: ESP transform ESP_AES passed key_len=24 > 16
032 "freeswan-racoon-tunnel" #15: STATE_QUICK_I1: internal error

Strange, looks like racoon always reports AES key length 16*8, although
"aes 192" or "aes 256" was specified in racoon.conf.

Peter, partially happy now
--
Dr. Peter Bieringer http://www.bieringer.de/pb/
GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de
Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/

From jmorris@intercode.com.au Thu Jun 5 07:26:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:26:16 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:HuI/ir4pemYSIE2wCKzZONjohwNV9+Mv@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EQ62x027256 for ; Thu, 5 Jun 2003 07:26:07 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h55EPxr03865; Fri, 6 Jun 2003 00:25:59 +1000
Date: Fri, 6 Jun 2003 00:25:58 +1000 (EST)
From: James Morris
To: "Dr.
Peter Bieringer"
cc: netdev@oss.sgi.com,
Subject: Re: (usagi-users 02412) IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2
In-Reply-To: <3525719.1054830009@[10.3.62.6]>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2915
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Thu, 5 Jun 2003, Dr. Peter Bieringer wrote:

> Ohoh, sorry for confusions, my racoon here was a little bit buggy...
>
> ...be warned, not using RHL's ipsec-tools from rawhide...looks like the
> racoon isn't compiled in a proper environment :-( it doesn't support DES
> and causes trouble on 3DES *grmml*).

Actually, the ABI changed recently, due to renumbering the algorithm ids
in pfkeyv2.h. (This will affect setkey as well).

- James
--
James Morris

From lpetande@tml.hut.fi Thu Jun 5 07:28:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:28:29 -0700 (PDT)
Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55ESG2x027577 for ; Thu, 5 Jun 2003 07:28:17 -0700
Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h55ERY5b012776; Thu, 5 Jun 2003 17:27:34 +0300
Message-ID: <3EDF552D.3060003@tml.hut.fi>
Date: Thu, 05 Jun 2003 17:35:25 +0300
From: Henrik Petander
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Alexey , "David S.
Miller" , yoshfuji@linux-ipv6.org, Venkata Jagana , Krishna Kumar , Antti Tuominen , Ville Nuorvala , netdev@oss.sgi.com Subject: RFC: Mechanism for adding MIPv6 extension headers Content-Type: multipart/mixed; boundary="------------080703090805070306090804" X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 1165; Body=8 Fuz1=8 Fuz2=8 X-archive-position: 2916 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------080703090805070306090804 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Hello all, Attached is a patch which includes the relevant functionality for adding mipv6 extension headers to data packets. It probably does not compile, as I included only the files which are directly involved in the adding mechanism to minimize the size of the patch. If you want to try out the mechanism I can prepare a full patch. The patch is against bk changeset 1.1325 and is meant as a basis for discussion about the extension header addition mechanism. The code is still work in progress and lacks a proper interface. The mechanism has been tested with tcp and raw sockets and with tcp+ipsec. An overview of the system: User inserts the mipv6 information into the kernel. Based on this information ip6_add_miproute adds a new cached route. This cached route contains mip6_output as output function for adding the extension headers, a decreased pmtu and mipv6 binding information. The route also contains a pointer (u.dst->child) to a new route which contains correct forwarding information for mipv6 intermediate hops and the raw pmtu. Adding of extension headers in mip6_output is done as in esp6_output. The mechanism is fairly close to xfrm, except for storing the mipv6 information only in a cached route. 
Thus the state for a mipv6 binding is soft. This is a tradeoff between keeping the overhead of mipv6 small and having persistent state. If routes change, the mipv6 state can be easily reinserted into the kernel, since the userspace daemon needs to keep track of it for signaling purposes anyhow. I will not go more into details here, but I am happy to answer any questions about the design. Your comments are much appreciated. Regards, Henrik --------------080703090805070306090804 Content-Type: text/plain; name="mip6-exthdr.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="mip6-exthdr.patch" --- net/ipv6/mip6.c 1969-12-31 22:00:00.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/mip6.c 2003-06-05 04:57:00.000000000 -0200 @@ -0,0 +1,341 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define IPPROTO_MOBILITY 62 + + +struct mipv6_be +{ + u8 payload; /* Payload Protocol */ + u8 length; /* MH Length */ + u8 type; /* MH Type */ + u8 reserved; /* Reserved */ + u16 checksum; /* Checksum */ + u8 status; /* Error code */ + u8 reserved_2; + struct in6_addr home_addr; +} __attribute__ ((packed)); + +struct socket *mipv6_mh_socket = NULL; + +static int dstopts_getfrag(const void *data, struct in6_addr *addr, + char *buff, unsigned int offset, unsigned int len) +{ + memcpy(buff, data + offset, len); + return 0; +} +static __inline__ void mip6_xmit_lock(void) +{ + + + local_bh_disable(); + if (unlikely(!spin_trylock(&mipv6_mh_socket->sk->lock.slock))) + BUG(); +} + +static __inline__ void mip6_xmit_unlock(void) +{ + spin_unlock_bh(&mipv6_mh_socket->sk->lock.slock); +} + + +void mip6_send_be(struct in6_addr *daddr, + struct in6_addr *saddr, + struct in6_addr *hao_addr) +{ + struct flowi fl; + struct mipv6_be be; + struct sock *sk = mipv6_mh_socket->sk; + + memset(&fl, 0, sizeof(fl)); + fl.proto = IPPROTO_MOBILITY; + ipv6_addr_copy(&fl.fl6_dst, daddr); + ipv6_addr_copy(&fl.fl6_src, saddr); + 
fl.fl6_flowlabel = 0; + fl.oif = sk->bound_dev_if; + + memset(&be, 0, sizeof(be)); + be.payload = NEXTHDR_NONE; + be.length = 2; + be.type = 7; + ipv6_addr_copy(&be.home_addr, hao_addr); + be.status = 1; /* Home address option without binding */ + + mip6_xmit_lock(); + ip6_build_xmit(sk, dstopts_getfrag, &be, &fl, sizeof(be), NULL, 255, + MSG_DONTWAIT); + mip6_xmit_unlock(); +} +/* TODO: Move the home address option / BCE check to tcp/udp/raw + * processing so cached route in socket can be used + * to avoid route lookup + */ +int mip6_hao_check(struct sk_buff *skb, u8 nexthdr) +{ + struct inet6_skb_parm *opt = (struct inet6_skb_parm *) skb->cb; + struct in6_addr *coaddr; + struct rt6_info *rt; + /* Home address option in mobility header messages is checked + by userspace mipv6 daemon */ + + if (!opt || !opt->dst_nofrag || nexthdr == IPPROTO_MOBILITY) + return 0; + if (opt && opt->dst_nofrag) { + rt = rt6_lookup(&skb->nh.ipv6h->saddr, &skb->nh.ipv6h->daddr, 0, 0); + if (rt) { + if (rt->binding.flags & MIPV6_F_BCE) { + dst_release(&rt->u.dst); + return 0; + } + else + dst_release(&rt->u.dst); + } + coaddr = (struct in6_addr *)((u8 *)skb->nh.raw + opt->dst_nofrag); + mip6_send_be(coaddr, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); + return -1; + } +} + +/** + * mipv6_append_rt2hdr - Add Type 2 Routing Header + * @rt: buffer for new routing header + * @addr: intermediate hop address + * + * Adds a Routing Header Type 2 in a packet. Stores newly created + * routing header in buffer @rt. Type 2 RT only carries one address, + * so there is no need to process old routing header. @rt must have + * allocated space for 24 bytes. 
+ **/ +void mipv6_append_rt2hdr(struct rt2_hdr *rt, struct in6_addr *addr) +{ + struct rt2_hdr *rt2 = (struct rt2_hdr *)rt; + + memset(rt2, 0, sizeof(*rt2)); + rt2->rt_hdr.type = 2; + rt2->rt_hdr.hdrlen = 2; + rt2->rt_hdr.segments_left = 1; + ipv6_addr_copy(&rt2->addr, addr); +} + +struct mipv6_padn +{ + __u8 type; + __u8 length; + __u8 data[0]; +} __attribute__ ((packed)); + +/* + * Add Pad1 or PadN option to data + */ +int mipv6_add_pad(u8 *data, int n) +{ + struct mipv6_padn *padn; + + if (n <= 0) return 0; + if (n == 1) { + *data = MIPV6_OPT_PAD1; + return 1; + } + padn = (struct mipv6_padn *)data; + padn->type = MIPV6_OPT_PADN; + padn->length = n - 2; + memset(padn->data, 0, n - 2); + return n; +} + +/** + * mipv6_append_home_addr - Add Home Address Option + * @opt: buffer for Home Address Option + * @offset: offset from beginning of @opt + * @addr: address for HAO + * + * Adds a Home Address Option to a packet. Option is stored in + * @offset from beginning of @opt. The option is created but the + * original source address in IPv6 header is left intact. The source + * address will be changed from home address to CoA after the checksum + * has been calculated in getfrag. Padding is done automatically, and + * @opt must have allocated space for both actual option and pad. + * Returns offset from @opt to end of options. 
+ **/ +int mipv6_append_home_addr(u8 *opt, struct in6_addr *addr) +{ + int pad; + struct ipv6_dstopt_homeaddr *ho; + int offset = sizeof(struct ipv6_opt_hdr); + + pad = (6 - offset) & 7; + mipv6_add_pad(opt + offset, pad); + + ho = (struct ipv6_dstopt_homeaddr *)(opt + offset + pad); + ho->type = IPV6_TLV_HOMEADDR; + ho->length = sizeof(*ho) - 2; + ipv6_addr_copy(&ho->addr, addr); + + return offset + pad + sizeof(*ho); +} + + +static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, int *offset_prevhdr) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); + u8 nextnexthdr; + + *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; + + while (offset + 1 < packet_len) { + + switch (*nexthdr) { + + case NEXTHDR_HOP: + case NEXTHDR_ROUTING: + *offset_prevhdr = offset; + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + case NEXTHDR_DEST: + nextnexthdr = + ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; + /* XXX We know the option is inner dest opt + with next next header check. */ + if (nextnexthdr != NEXTHDR_HOP && + nextnexthdr != NEXTHDR_ROUTING && + nextnexthdr != NEXTHDR_DEST) { + return offset; + } + *offset_prevhdr = offset; + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + default : + return offset; + } + } + + return offset; +} + + +int mip6_output(struct sk_buff *skb) + +{ + struct ipv6hdr *iph = NULL, *top_iph; + struct dst_entry *dst = skb->dst; + struct ipv6_opt_hdr *prevhdr = NULL; + struct rt6_info *rt = (struct rt6_info *)skb->dst; + u8 nexthdr; + int offset_prevhdr = 0; + int hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &offset_prevhdr); + int len, err = 0; + + if (nexthdr == IPPROTO_MOBILITY) /* No exthdrs for MH */ + goto out; + + /* First, if the skb is not checksummed, complete checksum. 
*/ + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) { + err = -EINVAL; + goto error; + } + iph = kmalloc(hdr_len, GFP_ATOMIC); + + if (!iph) { + err = -ENOMEM; + goto error; + } + + memcpy(iph, skb->nh.raw, hdr_len); + __skb_pull(skb, hdr_len); + + /* TODO: Is this correct ? */ + if ((err = skb_cow(skb, mip6_hdrlen(rt->binding.flags))) != 0) + goto error; + if (rt->binding.flags & MIPV6_F_BULE) { + struct ipv6_opt_hdr *dstopt; + dstopt = (struct ipv6_opt_hdr *)skb_push(skb, sizeof(struct ipv6_dstopt_homeaddr) + 6); + dstopt->nexthdr = nexthdr; + len = mipv6_append_home_addr((u8 *)dstopt, &iph->saddr); + dstopt->hdrlen = (len >> 3) - 1; + ipv6_addr_copy(&iph->saddr, &rt->binding.lcoa); + skb->h.raw = (unsigned char *)dstopt; + nexthdr = IPPROTO_DSTOPTS; + + } + if (rt->binding.flags & MIPV6_F_BCE) { + struct rt2_hdr *rt2; + rt2 = (struct rt2_hdr *)skb_push(skb, sizeof(struct rt2_hdr)); + skb->h.raw = (unsigned char *)rt2; + mipv6_append_rt2hdr(rt2, &iph->daddr); + ipv6_addr_copy(&iph->daddr, &rt->binding.rcoa); + rt2->rt_hdr.nexthdr = nexthdr; + nexthdr = IPPROTO_ROUTING; + } + + top_iph = (struct ipv6hdr *)skb_push(skb, hdr_len); + memcpy(top_iph, iph, hdr_len); + skb->nh.raw = skb->data; + kfree(iph); + top_iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + if (offset_prevhdr) { + prevhdr = (struct ipv6_opt_hdr *)((u8 *)top_iph + offset_prevhdr); + prevhdr->nexthdr = nexthdr; + + } else { + top_iph->nexthdr = nexthdr; + } + + out: + if ((skb->dst = dst_pop(dst)) == NULL) { + err = -EHOSTUNREACH; + goto error; + } + + return NET_XMIT_BYPASS; + error: + kfree_skb(skb); + return err; +} + +int mip6_init(void) +{ + mipv6_mh_socket = sock_alloc(); + mipv6_mh_socket->type = SOCK_RAW; + struct sock *sk; + int err; + + if ((err = sock_create(PF_INET6, SOCK_RAW, IPPROTO_MOBILITY, + &mipv6_mh_socket)) < 0) { + printk(KERN_ERR + "Failed to initialize the MIP6 MH control socket (err %d).\n", + err); + sock_release(mipv6_mh_socket); + 
mipv6_mh_socket = NULL; /* for safety */ + return err; + } + + sk = mipv6_mh_socket->sk; + sk->allocation = GFP_ATOMIC; + sk->sndbuf = SK_WMEM_MAX; + sk->prot->unhash(sk); + return 0; +} + +void mip6_cleanup(void) +{ + if (mipv6_mh_socket) sock_release(mipv6_mh_socket); + mipv6_mh_socket = NULL; /* For safety. */ +} + + +MODULE_LICENSE("GPL"); --- net/ipv6/route.c 2003-06-05 08:48:57.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/route.c 2003-06-05 06:37:59.000000000 -0200 @@ -52,7 +52,7 @@ #include #include #include - +#include #include #ifdef CONFIG_SYSCTL @@ -336,7 +336,7 @@ return err; } -/* No rt6_lock! If COW failed, the function returns dead route entry +/* No rt6_lock! If COW faild, the function returns dead route entry with dst->error set to errno value. */ @@ -363,12 +363,8 @@ rt->u.dst.flags |= DST_HOST; #ifdef CONFIG_IPV6_SUBTREES - if (rt->rt6i_src.plen && saddr) { - ipv6_addr_copy(&rt->rt6i_src.addr, saddr); - rt->rt6i_src.plen = 128; - } + rt->rt6i_src.plen = ort->rt6i_src.plen; #endif - rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway); dst_hold(&rt->u.dst); @@ -885,7 +881,7 @@ struct rt6_info *rt, *nrt; /* Locate old route to this destination. */ - rt = rt6_lookup(dest, NULL, neigh->dev->ifindex, 1); + rt = rt6_lookup(dest, saddr, neigh->dev->ifindex, 1); if (rt == NULL) return; @@ -1052,6 +1048,9 @@ nrt = ip6_rt_copy(rt); if (nrt == NULL) goto out; +#ifdef CONFIG_IPV6_SUBTREES + nrt->rt6i_src.plen = rt->rt6i_src.plen; +#endif ipv6_addr_copy(&nrt->rt6i_dst.addr, daddr); nrt->rt6i_dst.plen = 128; nrt->u.dst.flags |= DST_HOST; @@ -1162,7 +1161,107 @@ } read_unlock_bh(&rt6_lock); } +/* TODO: Move struct definition + * to a header file under include/linux +*/ +struct mipv6_info_user +{ + struct mip6_info bind; + unsigned long expires; + struct in6_addr src; + struct in6_addr dst; +}; + +/* Adds mip6 related info and a stacked dst entry to the new cached route. 
+ */ +static void fill_mip6_rt(struct rt6_info *mip6rt, struct rt6_info *coart, struct mip6_info *bind) +{ + mip6rt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; + mip6rt->u.dst.flags = DST_HOST; + mip6rt->u.dst.header_len = mip6_hdrlen(bind->flags); + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1] - + mip6rt->u.dst.header_len; + mip6rt->u.dst.metrics[RTAX_ADVMSS-1] = max_t(unsigned int, dst_pmtu(&mip6rt->u.dst) - 60, ip6_rt_min_advmss); + if (mip6rt->u.dst.metrics[RTAX_ADVMSS-1] > 65535-20) + mip6rt->u.dst.metrics[RTAX_ADVMSS-1] = 65535; + mip6rt->u.dst.child = dst_clone(&coart->u.dst); /* Is this correct ? */ + memcpy(&mip6rt->binding, bind, sizeof(*bind)); + mip6rt->u.dst.output = mip6_output; +} +/* Add mipv6 information to a new cache route entry. + * Mostly copied code from rt6_pmtu_discovery + */ +int ip6_add_miproute(struct mipv6_info_user *mipinfo) +{ + /* First look up the coa route */ + struct rt6_info *rt, *mip6rt, *coart = NULL; + int err = 0; + + if ((rt = (struct rt6_info *)rt6_lookup(&mipinfo->dst, &mipinfo->src, 0, 0)) == NULL) { + return -ENOENT; + } + + /* + * Delete old host route before adding new one. TODO: Could we just modify the existing cache + * route after locking the routing table ? + */ + if (rt->rt6i_flags & RTF_CACHE) { + ip6_del_rt(rt, NULL, NULL); + rt = NULL; + } + + if ((coart = rt6_lookup(&mipinfo->bind.rcoa, &mipinfo->bind.lcoa, 0, 0)) == NULL) { + err = -ENOENT; + goto out; + } + /* Network route. + Two cases are possible: + 1. It is connected route. Action: COW + 2. It is gatewayed route or NONEXTHOP route. Action: clone it. 
+ */ + if (!coart->rt6i_nexthop && !(coart->rt6i_flags & RTF_NONEXTHOP)) { + mip6rt = rt6_cow(coart, &mipinfo->dst, &mipinfo->src); + if (!mip6rt->u.dst.error) { + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1]; + dst_set_expires(&mip6rt->u.dst, HZ*mipinfo->expires); + fill_mip6_rt(mip6rt, coart, &mipinfo->bind); + dst_release(&mip6rt->u.dst); + } + } else { + + mip6rt = ip6_rt_copy(coart); + ipv6_addr_copy(&mip6rt->rt6i_dst.addr, &mipinfo->dst); + +#ifdef CONFIG_IPV6_SUBTREES + ipv6_addr_copy(&mip6rt->rt6i_src.addr, &mipinfo->src); + mip6rt->rt6i_src.plen = 128; +#endif + mip6rt->rt6i_dst.plen = 128; + mip6rt->u.dst.flags |= DST_HOST; + mip6rt->rt6i_nexthop = neigh_clone(coart->rt6i_nexthop); + dst_set_expires(&mip6rt->u.dst, HZ*mipinfo->expires); + mip6rt->rt6i_flags |= RTF_DYNAMIC|RTF_CACHE|RTF_EXPIRES; + mip6rt->u.dst.metrics[RTAX_MTU-1] = coart->u.dst.metrics[RTAX_MTU-1]; + fill_mip6_rt(mip6rt, coart, &mipinfo->bind); + rt6_ins(mip6rt, NULL, NULL); + } + out: + if (coart) dst_release(&coart->u.dst); + if (rt) dst_release(&rt->u.dst); + return err; +} + +static int add_mip6_binding(void *arg) +{ + + struct mipv6_info_user mip; + if (copy_from_user(&mip, arg, sizeof(mip))) { + return -EINVAL; + } + + return ip6_add_miproute(&mip); +} int ipv6_route_ioctl(unsigned int cmd, void *arg) { struct in6_rtmsg rtmsg; @@ -1192,9 +1291,18 @@ rtnl_unlock(); return err; + case SIOCADDMIPINFO: + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + rtnl_lock(); + err = add_mip6_binding(arg); + rtnl_unlock(); + return err; }; - return -EINVAL; + + + return -EINVAL; } /* @@ -1786,12 +1894,11 @@ static int rt6_stats_seq_show(struct seq_file *seq, void *v) { - seq_printf(seq, "%04x %04x %04x %04x %04x %04x %04x\n", + seq_printf(seq, "%04x %04x %04x %04x %04x %04x\n", rt6_stats.fib_nodes, rt6_stats.fib_route_nodes, rt6_stats.fib_rt_alloc, rt6_stats.fib_rt_entries, rt6_stats.fib_rt_cache, - atomic_read(&ip6_dst_ops.entries), - rt6_stats.fib_discarded_routes); + 
atomic_read(&ip6_dst_ops.entries)); return 0; } --- net/ipv6/af_inet6.c 2003-06-05 08:48:57.000000000 -0200 +++ ../mipv6-kernel/net/ipv6/af_inet6.c 2003-06-03 10:11:38.000000000 -0200 @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -310,7 +311,7 @@ } else { if (addr_type != IPV6_ADDR_ANY) { /* ipv4 addr of the socket is invalid. Only the - * unspecified and mapped address have a v4 equivalent. + * unpecified and mapped address have a v4 equivalent. */ v4addr = LOOPBACK4_IPV6; if (!(addr_type & IPV6_ADDR_MULTICAST)) { @@ -475,7 +476,7 @@ case SIOCADDRT: case SIOCDELRT: - + case SIOCADDMIPINFO: return(ipv6_route_ioctl(cmd,(void *)arg)); case SIOCSIFADDR: @@ -780,6 +781,14 @@ err = ndisc_init(&inet6_family_ops); if (err) goto ndisc_fail; +#ifdef CONFIG_IPV6_TUNNEL + err = ip6_tunnel_init(); + if (err) + goto ip6_tunnel_fail; +#endif + err = mip6_init(); + if (err) + goto mip6_fail; err = igmp6_init(&inet6_family_ops); if (err) goto igmp_fail; @@ -816,7 +825,6 @@ /* Init v6 transport protocols. 
*/ udpv6_init(); tcpv6_init(); - return 0; #ifdef CONFIG_PROC_FS @@ -834,6 +842,12 @@ igmp6_cleanup(); #endif igmp_fail: + mip6_cleanup(); +mip6_fail: +#ifdef CONFIG_IPV6_TUNNEL + ip6_tunnel_cleanup(); +ip6_tunnel_fail: +#endif ndisc_cleanup(); ndisc_fail: icmpv6_cleanup(); @@ -869,6 +883,10 @@ ip6_route_cleanup(); ipv6_packet_cleanup(); igmp6_cleanup(); + mip6_cleanup(); +#ifdef CONFIG_IPV6_TUNNEL + ip6_tunnel_cleanup(); +#endif ndisc_cleanup(); icmpv6_cleanup(); #ifdef CONFIG_SYSCTL --- include/net/mipv6.h 1969-12-31 22:00:00.000000000 -0200 +++ ../mipv6-kernel/include/net/mipv6.h 2003-06-05 06:21:37.000000000 -0200 @@ -0,0 +1,53 @@ +/* mipv6.h - Mobile IPv6 kernel support */ + +#ifndef _NET_MIPV6_H +#define _NET_MIPV6_H + +#define MIPV6_F_BULE 0x1 +#define MIPV6_F_BCE 0x2 +#define MIPV6_OPT_PAD1 0x00 +#define MIPV6_OPT_PADN 0x01 +/** + * NIPV6ADDR - macro for IPv6 addresses + * @addr: Network byte order IPv6 address + * + * Macro for printing IPv6 addresses. Used in conjunction with + * printk() or derivatives (such as DEBUG macro). 
+ **/ +#define NIPV6ADDR(addr) \ + ntohs(((u16 *)addr)[0]), \ + ntohs(((u16 *)addr)[1]), \ + ntohs(((u16 *)addr)[2]), \ + ntohs(((u16 *)addr)[3]), \ + ntohs(((u16 *)addr)[4]), \ + ntohs(((u16 *)addr)[5]), \ + ntohs(((u16 *)addr)[6]), \ + ntohs(((u16 *)addr)[7]) + +struct ipv6_dstopt_homeaddr +{ + __u8 type; /* type-code for option */ + __u8 length; /* option length */ + struct in6_addr addr; /* home address */ +} __attribute__ ((packed)); +static inline int mip6_hdrlen(int flags) +{ + int miphdrlen = 0; + + if (flags & MIPV6_F_BULE) + miphdrlen = sizeof(struct ipv6_dstopt_homeaddr) + 6; + if (flags & MIPV6_F_BCE) + miphdrlen += sizeof(struct rt2_hdr); + return miphdrlen; +} +int mip6_output(struct sk_buff *skb); +struct ipv6_txoptions * +mipv6_modify_txoptions(struct sock *sk, + struct ipv6_txoptions *old_opt, struct flowi *fl, + struct dst_entry **dst); + +int mip6_hao_check(struct sk_buff *skb, u8 nexthdr); +int mip6_init(void); +void mip6_cleanup(void); + +#endif /* _NET_MIPV6_H */ --- include/net/ip6_fib.h 2003-06-05 08:48:46.000000000 -0200 +++ ../mipv6-kernel/include/net/ip6_fib.h 2003-06-03 10:11:19.000000000 -0200 @@ -50,6 +50,13 @@ int plen; }; +struct mip6_info +{ + struct in6_addr lcoa; + struct in6_addr rcoa; + u32 flags; +}; + struct rt6_info { union { @@ -71,8 +78,9 @@ struct rt6key rt6i_dst; struct rt6key rt6i_src; - + u8 rt6i_protocol; + struct mip6_info binding; }; struct fib6_walker_t @@ -111,10 +119,9 @@ struct rt6_statistics { __u32 fib_nodes; __u32 fib_route_nodes; - __u32 fib_rt_alloc; /* permanent routes */ + __u32 fib_rt_alloc; /* permanet routes */ __u32 fib_rt_entries; /* rt entries in table */ __u32 fib_rt_cache; /* cache routes */ - __u32 fib_discarded_routes; }; #define RTN_TL_ROOT 0x0001 --------------080703090805070306090804-- From pb@bieringer.de Thu Jun 5 07:57:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 07:57:43 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by 
oss.sgi.com (8.12.9/8.12.9) with SMTP id h55EvM2x029532 for ; Thu, 5 Jun 2003 07:57:23 -0700 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id BE48C1387E; Thu, 5 Jun 2003 16:20:40 +0200 (CEST) X-AV-Checked: Thu Jun 5 16:20:40 2003 smtp2.aerasec.de Received: from [10.3.62.6] (pD9E8B60A.dip.t-dialin.net [217.232.182.10]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 2E69F1387A; Thu, 5 Jun 2003 16:20:40 +0200 (CEST) Date: Thu, 05 Jun 2003 16:20:38 +0200 From: "Dr. Peter Bieringer" To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: IPsec 2.5.70-bk9 and Check Point VPN-1 NG FP4 RC Message-ID: <3555172.1054830038@[10.3.62.6]> In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-Mailer: Mulberry/3.0.3 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2917 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev Hi, Here are some results (tunnel mode only tested, auth=SHA1): DES : ok 3DES : ok AES-128: ok AES-192: not supported by CP VPN-1 AES-256: ok CAST* : not supported by used Linux kernel BTW: be warned, not using RHL's ipsec-tools from rawhide...looks like the racoon isn't compiled in a proper environment :-( it doesn't support DES and causes trouble on 3DES *grmml*). Peter -- Dr. 
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From shemminger@osdl.org Thu Jun 5 09:55:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 09:55:43 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Gtc2x031390 for ; Thu, 5 Jun 2003 09:55:38 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h55GtDX21057; Thu, 5 Jun 2003 09:55:13 -0700 Date: Thu, 5 Jun 2003 09:55:12 -0700 From: Stephen Hemminger To: James Morris Cc: patmans@us.ibm.com, akpm@digeo.com, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, hch@infradead.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030605095512.022ea3be.shemminger@osdl.org> In-Reply-To: References: <20030604184341.A10256@beaverton.ibm.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2919 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Thu, 5 Jun 2003 13:25:58 +1000 (EST) James Morris wrote: > On Wed, 4 Jun 2003, Patrick Mansfield wrote: > > > [root@elm3b79 root]# ifup eth0 > > sender address length == 0 > > This is a bug introduced by a coding style cleanup, fix below. 
> > > - James > -- > James Morris > > > --- bk.pending/net/core/iovec.c 2003-06-05 11:12:59.000000000 +1000 > +++ bk.w1/net/core/iovec.c 2003-06-05 13:30:06.000000000 +1000 > @@ -47,10 +47,10 @@ int verify_iovec(struct msghdr *m, struc > address); > if (err < 0) > return err; > - m->msg_name = address; > - } else > - m->msg_name = NULL; > - } > + } > + m->msg_name = address; > + } else > + m->msg_name = NULL; > > size = m->msg_iovlen * sizeof(struct iovec); > if (copy_from_user(iov, m->msg_iov, size)) Thanks, this works for me. I will see if it fixes the other gnome mystery as well. From mk@karaba.org Thu Jun 5 09:54:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 09:54:50 -0700 (PDT) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55Gsd2x031239 for ; Thu, 5 Jun 2003 09:54:40 -0700 Received: from [3ffe:501:1057:710:202:b3ff:feb4:25aa] (helo=mokuba.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 19Ny04-0002qd-00; Fri, 06 Jun 2003 01:54:04 +0900 Date: Fri, 06 Jun 2003 01:54:06 +0900 Message-ID: <873cioqxch.wl@karaba.org> From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: Henrik Petander Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: Bug in ipv6 ipsec in handling of packets with extension headers In-Reply-To: <3EDF3EB4.8010105@tml.hut.fi> References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> <3EDF3EB4.8010105@tml.hut.fi> MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 2918 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev Hello, At Thu, 05 Jun 2003 15:59:32 +0300, Henrik Petander wrote: > > David S. 
Miller wrote: > > From: Henrik Petander > > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > > > A possible fix is to change the pointer into an offset from the start of > > the packet and use the offset later to set the nexthdr value in the > > extension header. > > > > Please indicate the version of the sources you are looking > > at when making reports. > > Sure, esp6.c bitkeeper version was 1.16. Also a fix to the bug report: > the problem is with esp6 and not with ah6, which does not use the > get_offset function. I have fixed this in our tree (replaced by ip6_found_nexthdr()). I will send a patch collecting these ipsec6 fixes by this weekend. Regards, -mk From Andrew.Morton@digeo.com Thu Jun 5 13:38:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 13:38:28 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55KcK2x012769 for ; Thu, 5 Jun 2003 13:38:21 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 20:26:22 -0700 Date: Wed, 4 Jun 2003 20:26:22 -0700 From: Andrew Morton To: Arnaldo Carvalho de Melo Cc: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604202622.1be40092.akpm@digeo.com> In-Reply-To: <20030605023349.GH24515@conectiva.com.br> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> <20030604185652.31958d1f.akpm@digeo.com> <20030605023349.GH24515@conectiva.com.br> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 Jun 2003 03:26:22.0152 (UTC) FILETIME=[3C88F480:01C32B12] X-archive-position: 2920 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev broken "cleanup" net/core/iovec.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff -puN net/core/iovec.c~iovec-fix net/core/iovec.c --- 25/net/core/iovec.c~iovec-fix 2003-06-04 20:23:03.000000000 -0700 +++ 25-akpm/net/core/iovec.c 2003-06-04 20:24:05.000000000 -0700 @@ -47,9 +47,10 @@ int verify_iovec(struct msghdr *m, struc address); if (err < 0) return err; - m->msg_name = address; - } else - m->msg_name = NULL; + } + m->msg_name = address; + } else { + m->msg_name = NULL; } size = m->msg_iovlen * sizeof(struct iovec); _ From Andrew.Morton@digeo.com Thu Jun 5 14:00:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 14:00:30 -0700 (PDT) Received: from pao-ex01.pao.digeo.com (pao-ex01.pao.digeo.com [12.47.58.20]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h55L0M2x013214 for ; Thu, 5 Jun 2003 14:00:22 -0700 Received: from mnm ([172.17.144.18]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 4 Jun 2003 18:56:52 -0700 Date: Wed, 4 Jun 2003 18:56:52 -0700 From: Andrew Morton To: shemminger@osdl.org, jgarzik@pobox.com, davem@redhat.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 2.5.70-bk+ broken networking Message-Id: <20030604185652.31958d1f.akpm@digeo.com> In-Reply-To: <3EDE7FEB.2C7FAEC7@digeo.com> References: <20030604161437.2b4d3a79.shemminger@osdl.org> <3EDE7FEB.2C7FAEC7@digeo.com> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 Jun 2003 01:56:52.0736 (UTC) FILETIME=[BC1D1800:01C32B05] X-archive-position: 2921 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@digeo.com Precedence: bulk X-list: netdev Andrew Morton wrote: > > Stephen Hemminger wrote: > > > > Test machine running 
2.5.70-bk latest can't boot because eth2 won't > > come up. The same machine and configuration successfully brings up > > all the devices and runs on 2.5.70. > > kjournald is stuck waiting for IO to complete against some buffer > during transaction commit. > > I'd be suspecting block layer or device drivers. What device driver > is handling your /var/log? I take that back. Your sysrq-T woke up syslogd which did a synchronous write which poked kjournald. You happened to catch it in mid-commit. So that's all normal and sane. Something is up with netdevice initialisation. My eth0 (e100) is in a strange half-there state and won't come up. Reverting the post-2.5.70 e100 changes does not help. It's something which went into the tree today I think. From davem@redhat.com Thu Jun 5 22:01:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:01:43 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5651a2x026593 for ; Thu, 5 Jun 2003 22:01:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA03527; Thu, 5 Jun 2003 21:59:08 -0700 Date: Thu, 05 Jun 2003 21:59:07 -0700 (PDT) Message-Id: <20030605.215907.71090944.davem@redhat.com> To: pb@bieringer.de Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods From: "David S. Miller" In-Reply-To: <35410000.1054818456@klopffest.muc.aerasec.de> References: <35410000.1054818456@klopffest.muc.aerasec.de> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2923 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "Dr. Peter Bieringer" Date: Thu, 05 Jun 2003 15:07:36 +0200 because I got no success, I've tried different encryption methods than 3DES. And *suddenly* it began to work. Sounds like an out-of-date include/linux/pfkeyv2.h file used during tool building. From garzik@gtf.org Thu Jun 5 22:40:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:40:49 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565ed2x027263 for ; Thu, 5 Jun 2003 22:40:40 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 5F3396657; Fri, 6 Jun 2003 01:40:38 -0400 (EDT) Date: Fri, 6 Jun 2003 01:40:38 -0400 From: Jeff Garzik To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [PATCHES] 2.4.x net driver updates Message-ID: <20030606054038.GA3479@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 2924 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev BK users may do bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.4 Others may obtain the patch from ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-rc8-netdrvr1.patch.bz2 This will update the following files: drivers/net/bonding.c | 3434 ------------------------- Documentation/Configure.help | 9 Documentation/networking/bonding.txt | 537 ++- Documentation/networking/ifenslave.c | 496 
++- drivers/net/8139cp.c | 9 drivers/net/8139too.c | 6 drivers/net/Config.in | 3 drivers/net/Makefile | 8 drivers/net/amd8111e.c | 1063 ++++--- drivers/net/amd8111e.h | 968 +++---- drivers/net/arcnet/arcnet.c | 2 drivers/net/arcnet/rfc1201.c | 6 drivers/net/bonding.c | 266 + drivers/net/bonding/Makefile | 18 drivers/net/bonding/bond_3ad.c | 2667 ++++++++++++++++++- drivers/net/bonding/bond_3ad.h | 342 ++ drivers/net/bonding/bond_alb.c | 1585 +++++++++++ drivers/net/bonding/bond_alb.h | 129 drivers/net/bonding/bond_main.c | 4795 ++++++++++++++++++++++++++++++++--- drivers/net/bonding/bonding.h | 209 + drivers/net/dl2k.h | 1 drivers/net/e1000/e1000.h | 3 drivers/net/e1000/e1000_main.c | 167 + drivers/net/eepro.c | 2 drivers/net/ns83820.c | 2 drivers/net/pci-skeleton.c | 4 drivers/net/pcnet32.c | 7 drivers/net/r8169.c | 52 drivers/net/sk98lin/skge.c | 2 drivers/net/sundance.c | 144 - drivers/net/tg3.c | 2 drivers/net/tlan.c | 258 + drivers/net/tlan.h | 7 drivers/net/tokenring/olympic.c | 3 drivers/net/tulip/tulip_core.c | 7 drivers/net/typhoon.c | 4 drivers/net/via-rhine.c | 2 drivers/net/wireless/airo.c | 2 include/linux/ethtool.h | 27 include/linux/if_arcnet.h | 4 include/linux/if_bonding.h | 101 include/linux/if_vlan.h | 1 include/linux/skbuff.h | 4 include/net/if_inet6.h | 5 include/net/irda/irlan_common.h | 2 net/core/dev.c | 4 net/core/skbuff.c | 3 net/ipv6/addrconf.c | 13 net/ipv6/ndisc.c | 3 net/irda/irlan/irlan_eth.c | 6 50 files changed, 11856 insertions(+), 5538 deletions(-) through these ChangeSets: (03/06/06 1.1205) [PATCH] Bonding 2.4 update patch 6 Fix to the ifenslave -c fix, fix to version control (plus change log update). I've got an additional fix for version control that I'll send you on Monday. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1204) [PATCH] Bonding 2.4 update patch 5 Fix to prevent routes on the bonding device from being lost during enslavement processing. 
Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1203) [PATCH] Bonding 2.4 update patch 4 A fix for ifenslave -c. Later patches have fixes for this fix. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/ifenslave.c (03/06/06 1.1202) [PATCH] Bonding 2.4 update patch 3 A patch with some miscellaneous little stuff (comments, mode names, fix a printk). Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1201) [PATCH] Bonding 2.4 update patch 2 Small patch to fix endless failover problem in the ARP monitor. Index: linux-2.4.21-rc6-netdrvr1/drivers/net/bonding/bond_main.c (03/06/06 1.1200) [PATCH] Bonding 2.4 update patch 1 Documentation. Index: linux-2.4.21-rc6-netdrvr1/Documentation/networking/bonding.txt (03/06/06 1.1199) [PATCH] remove ethtool privileged references dev_ioctl already checks capable(CAP_NET_ADMIN) for SOICETHTOOL, so privileged reference are not necessary. (03/06/06 1.1198) [PATCH] 10GbE ethtool support Add 10GbE support for ethtool. (03/06/05 1.1197) [netdrvr amd8111e] link against mii lib (03/06/04 1.1196) [netdrvr] gcc 3.3 cleanups Mostly marking 64-bit constants as ULL. (03/05/29 1.1185.1.52) [netdrvr amd8111e] remove out-of-tree feature that snuck in (03/05/29 1.1185.1.51) [netdrvr amd8111e] interrupt coalescing, libmii, bug fixes * Dynamic interrupt coalescing * mii lib support * dynamic IPG support (disabled by default) * jumbo frame fix * vlan fix * rx irq coalescing fix (03/05/29 1.1185.1.50) [netdrvr tlan] fix 64-bit issues (03/05/29 1.1185.1.49) [netdrvr r8169] sync with 2.5 (backport whitespace cleanups) (03/05/29 1.1185.1.48) [netdrvr r8169] use alloc_etherdev (fix race), pci_disable_device (03/05/29 1.1185.1.47) [netdrvr olympic] fix build with gcc 3.3 (03/05/29 1.1185.6.3) [netdrvr 8139too] add comment, whitespace cleanup (03/05/28 1.1185.6.2) [netdrvr] s/init_etherdev/alloc_etherdev/ in code comments, in 8139too and pci-skeleton drivers. 
(03/05/28 1.1185.6.1) [netdrvr tlan] backport fixes and cleanups from 2.5 * alloc_etherdev (fixes race) * PCI DMA API * C99 initializers * speling fixes * use pci_{request,release}_regions for PCI devices * propagate error returns back from pci_xxx functions * call pci_set_dma_mask * use keventd for adapter error reset (2.5 uses workqueue) (03/05/27 1.1185.1.45) [netdrvr pcnet32] bug fixes I would like to see a couple of the pcnet32 changes that I think we can agree on be put into the trees so a couple of the potential defects can be avoided. The following patch contains just these pieces. The only controversial one is an arbitrary change in the number of iterations in a while loop spinning on hardware state. No matter how this is done, I am not especially fond of this bit of code as it has no reasonable error recovery path -- however, as a half-way, incremental solution, increasing the polling time should help as the 100 value was certainly found to be insufficient. 1000 may not be sufficient either, but it is certainly no worse. Both of the other changes were hit in testing (and I belive the wmb() at a customer even), so it would help reduce some debug if these go in. Any feedback is appreciated - thanks. (03/05/27 1.1185.1.44) [netdrvr eepro] update MODULE_AUTHOR per old-author request (03/05/27 1.1185.1.43) [netdrvr sundance] fix another flow control bug (03/05/27 1.1185.1.42) [netdrvr sundance] fix flow control bug (03/05/27 1.1185.1.41) [netdrvr bonding] fix ABI version control problem This fix makes bonding not commit to a specific ABI version if the ioctl command is not supported by bonding. (It also removes the '\n' in the continuous printk reporting the link down event in bond_mii_monitor - it got in there by mistake in our previous patch set and caused log messages to appear funny in some situations). 
(03/05/27 1.1185.1.40) [netdrvr bonding] fix long failover in 802.3ad mode This patch fixes the bug reported by Jay on April 3rd regarding long failover time when releasing the last slave in the active aggregator. The fix, as suggested by Jay, is to follow the spec recommendation and send a LACPDU to the partner saying this port is no longer aggregatable and therefore trigger an immediate re-selection of a new aggregator instead of waiting the entire expiration timeout. (03/05/25 1.1185.1.39) IPv6 over ARCnet (RFC2497) support, IPv6 part. (03/05/25 1.1185.1.38) IPv6 over ARCnet (RFC2497) support, driver part (03/05/25 1.1185.1.37) [irda] module refcounts for irlan (03/05/23 1.1185.3.7) [bonding] small cleanups (03/05/23 1.1185.3.6) [bonding] add rcv load balancing mode This patch adds a new mode that enables receive load balancing for IPv4 traffic on top of the transmit load balancing mode. This capability is achieved by intercepting and manipulating the ARP negotiation to teach clients several MAC addresses for the bond and thus distribute incoming traffic among all slaves with the highest link speed. In order to function properly, slaves are required to be able to have their MAC address set even while the interface is up since once the primary slave looses its link, the new primary slave (and only it) must be able to take over and receive the incoming traffic instead. If a non-primary slave looses its link, ARP packets will be sent to all clients communicating through it in order to teach them a replacement MAC address, and the primary slave will be put in promiscuous mode for 10 seconds for fault tolerance reasons. This patch is against bonding-20030415, but must come only after the locking scheme changing patch since it uses dev_set_promiscuity() that would otherwise cause a system hang. 
(03/05/23 1.1185.3.5) [bonding] support xmit load balancing mode (03/05/23 1.1185.3.4) [bonding] much improved locking This patch replaces the use of lock_irqsave/unlock_irqrestore in bonding with lock/unlock or lock_bh/unlock_bh as appropriate according to context. This change is based on a previous discussion regarding the fact that holding a lock_irqsave doesn't prevent softirqs from running which can cause deadlocks in certain situations. This new locking scheme has already undergone massive testing cycle by our QA group and we feel it is ready for release (some new modes and enhancements will not work properly without it). (03/05/23 1.1185.3.3) [bonding] better 802.3ad mode control, some cleanup This patch adds the lacp_rate module param to enable better control over the IEEE 802.3ad mode. This param controls the rate at which the partner system is asked to send LACPDUs to bonding. Two options exist: - slow (or 0) - LACPDUs are 30 seconds apart - fast (or 1) - LACPDUs are 1 second apart The default is slow (like most switches around). There are also some code beautifications (mainly converting comments to C style in code segments we added in the past). (03/05/23 1.1185.3.2) [bonding] ABI versioning This patch adds user-land to kernel ABI version control in bonding to restore backward compatibility between different versions of ifenslave and the bonding module. It uses ethtool's GDRVINFO ioctl to pass the ABI version number between ifenslave and the bonding module in both directions so both the driver and the application can tell which partner they're working against and take the appropriate measures when enslaving/releasing an interface. 
The bonding module remembers the ABI version received from the application, and from that moment on will deny enslave and release commands from an application using a different ABI version, which means that if you want to switch to an ifenslave with a different ABI version (or with non at all), you'll first have to re-load the bonding module. This patch also changes the driver/application versioning scheme to contain 3 fields X.Y.Z with the follows meaning: X - Major version - big behavior changes Y - Minor version - addition of features Z - Extra version - minor changes and bug fixes There are also three minor bug fixes: 1. Prevent enslaving an interface that is already a slave. 2. Prevent enslaving if the bond is down. 3. In bond_release_all, save old value of current_slave before assigning NULL to it to enable using it's original value later on. This patch is against bonding-20030415. (03/04/27 1.1137.1.6) [netdrvr e1000] add TSO support -- disabled * Copy TSO support for 2.5 e1000. Wrapped with NETIF_F_TSO, so not currently enabled in 2.4. Done to keep 2.4 and 2.5 drivers in-sync as much as possible. (03/04/27 1.1137.1.5) [netdrvr e1000] add support for NAPI * Copy NAPI support from 2.5 e1000 driver * Add CONFIG_E1000_NAPI option (03/04/27 1.1137.1.4) [netdrvr tulip] support DM910x chip from ALi (03/04/27 1.1137.1.3) Remove duplicate CONFIG_TULIP_MWI entry in Configure.help Noticed by Geert Uytterhoeven (03/04/27 1.1137.1.2) [netdrvr 8139cp] enable MWI via pci_set_mwi, rather than manually (03/04/26 1.1131.2.6) [netdrvr typhoon] s/#if/#ifdef/ for a CONFIG_ var (03/04/25 1.1131.2.5) [netdrvr sundance] small cleanups from 2.5 - s/long flag/unsigned long flag/ - C99 initializers (03/04/25 1.1131.2.4) [netdrvr sundance] bug fixes, VLAN support - Fix tx bugs in big-endian machines - Remove unused max_interrupt_work module parameter, the new NAPI-like rx scheme doesn't need it. 
- Remove redundancy get_stats() in intr_handler(), those I/O access could affect performance in ARM-based system - Add Linux software VLAN support - Fix bug of custom mac address (StationAddr register only accept word write) (03/04/25 1.1131.2.3) [netdrvr via-rhine] fix promisc mode I found a via-rhine bug, it can't receive BPDU (mac: 0180c2000000) in promiscuous mode. Fill all "1" in hash table to fix this problem in promiscuous mode. (RCR remain 0x1c, write it as 0x1f don't work) (03/04/25 1.1131.2.2) [wireless airo] fix end-of-array test FYI statsLabels[] is an array of char*, so the fix below is pretty obvious. (03/04/25 1.1131.2.1) [PATCH] fix .text.exit error in drivers/net/r8169.c In drivers/net/r8169.c the function rtl8169_remove_one is __devexit but the pointer to it didn't use __devexit_p resulting in a.text.exit compile error when !CONFIG_HOTPLUG. The fix is simple: (03/04/17 1.1101.8.7) [bonding] add support for IEEE 802.3ad Dynamic link aggregation Contributed by Shmulik Hen @ Intel, merge by Jay Vosburgh @ IBM (03/04/17 1.1101.8.6) [bonding] move private decls into new drv/net/bonding/bonding.h file (03/04/17 1.1101.8.5) [bonding] move driver into new drivers/net/bonding directory (03/04/17 1.1101.8.4) [bonding] Moved setting slave mac addr, and open, from app to the driver This patch enables support of modes that need to use the unique mac address of each slave. It moves setting the slave's mac address and opening it from the application to the driver. This breaks backward compatibility between the new driver and older applications ! It also blocks possibility of enslaving before the master is up (to prevent putting the system in an unstable state), and removes the code that unconditionally restores all base driver's flags (flags are automatically restored once all undo stages are done in proper order). 
Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.3) [bonding] add support for getting slave's speed and duplex via ethtool Contributed by Shmulik Hen @ Intel (03/04/17 1.1101.8.2) [bonding] fix comment to prevent future merge difficulties Contributed by Jay Vosburgh @ IBM (03/04/17 1.1101.8.1) [net] store physical device a packet arrives in on (Needed for bonding) Contributed by Jay Vosburgh @ IBM, Shmulik Hen @ Intel, and others. From nakam@linux-ipv6.org Thu Jun 5 22:42:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:42:28 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565gI2x027582 for ; Thu, 5 Jun 2003 22:42:19 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19O9w9-0000SR-00; Fri, 06 Jun 2003 14:38:49 +0900 From: Masahide NAKAMURA To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Cc: usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030606143844.0604c306.nakam@linux-ipv6.org> In-Reply-To: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Date: Fri, 06 Jun 2003 14:38:49 +0900 X-archive-position: 2925 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev

Hello, I'm Nakamura, a member of USAGI.

On Thu, 05 Jun 2003 19:12:24 +0900 (JST) YOSHIFUJI Hideaki / 吉藤英明 wrote:

> Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
> The reason why I won't hurry is because I still believe it is not
> required for MIP6. Nakamura, one of our members, will describe the details.
> It takes precedence over "limited" policy(?) routing to introduce generic
> policy routing.

As you know, we've been planning the MIPv6 design to use XFRM. To use MIPv6 we need some fixes and extensions to XFRM, and as a result XFRM becomes more generic.

On output processing, our design is as follows:

From userland, through netlink/xfrm, we have to set xfrm_policy and xfrm_state with something like the ip command (or an extended ip command).

The xfrm_policy has two templates now:
- template handling Routing Header type 2 (RT2) ...(a)
- template handling Destination Options Header (DST) ...(b)

And we have to add one address field (c) in xfrm_state for MIPv6. (Currently it is named mip6_state.addr.)

Template (a) finds an xfrm_state that points to a function like mip6_rthdr_output(), which inserts RT2 and replaces the dst address of the IP header with the specified address (c).

Likewise, template (b) finds an xfrm_state that points to a function like mip6_destopt_output(), which inserts DST and replaces the src address of the IP header with the specified address (c).

Of course, both mip6_rthdr_output() and mip6_destopt_output() are called as dst_output internally in the XFRM world. For example, if two states are found, both RT2 and DST are added to the packet. We have tested that on our tree.

In the tunneling case, we think we would likewise add a template and prepare a dst_output function in the XFRM world as above. (Maybe xfrm6_tunnel needs some fixes to be used with MIPv6, as Henrik said.)

Could you give us comments?
BTW, I have read Henrik's patch(mip6-exthdr.patch) sent to netdev in other thread and I feel that is simple code to implement MIPv6 and is clean one. Thanks, Henrik. As he said, it is similar one to use XFRM like ours. We know that the big difference between yours and ours is to modify either routing table or XFRM. Anyway, we'll show you our patch later. Regards, -- Masahide NAKAMURA From garzik@gtf.org Thu Jun 5 22:42:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:42:37 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565gV2x027616 for ; Thu, 5 Jun 2003 22:42:32 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 7E1376611; Fri, 6 Jun 2003 01:42:31 -0400 (EDT) Date: Fri, 6 Jun 2003 01:42:31 -0400 From: Jeff Garzik To: torvalds@transmeta.com Cc: davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [BK PATCHES] net driver merges Message-ID: <20030606054231.GA3545@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-archive-position: 2926 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Linus, please do a bk pull bk://kernel.bkbits.net/jgarzik/net-drivers-2.5 Others may obtain the patch from ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.70-bk10-netdrvr1.patch.bz2 This will update the following files: MAINTAINERS | 7 drivers/net/8139cp.c | 2 drivers/net/Makefile | 2 drivers/net/arcnet/arc-rawmode.c | 10 drivers/net/arcnet/arcnet.c | 10 drivers/net/arcnet/rfc1051.c | 10 drivers/net/arcnet/rfc1201.c | 12 drivers/net/dl2k.h | 1 drivers/net/ns83820.c | 2 drivers/net/pcmcia/fmvj18x_cs.c | 12 drivers/net/sb1000.c | 22 drivers/net/sk98lin/skge.c | 2 
drivers/net/tg3.c | 2 drivers/net/wireless/Kconfig | 15 drivers/net/wireless/Makefile | 1 drivers/net/wireless/atmel.c | 3943 +++++++++++++++++++++++++++++++++++++++ drivers/net/wireless/atmel_cs.c | 768 +++++++ include/linux/ethtool.h | 27 18 files changed, 4791 insertions(+), 57 deletions(-) through these ChangeSets: (03/06/06 1.1313) [netdrvr] C99 initializers for arcnet (03/06/06 1.1312) [PATCH] remove ethtool privileged references dev_ioctl already checks capable(CAP_NET_ADMIN) for SOICETHTOOL, so privileged reference are not necessary. (03/06/06 1.1311) [PATCH] 10GbE ethtool support Add 10GbE support for ethtool. (03/06/06 1.1310) [PATCH] cli/sti cleanup for fmvj18x This one should be safe as we're protected by the xmit_lock in all instances (03/06/06 1.1309) [netdrvr] add MAINTAINERS entry for atmel wireless driver (03/06/06 1.1308) [netdrvr] add atmel[_cs], new wireless driver Attached is a driver for Atmel at76c50x WiFi cards. This code started out as a GPL release from Atmel of pretty horrible quality and I've extensively re-worked it with the aim of making it acceptable in the kernel. Please could you take a look and either pass it into the patch stream or let me know what's wrong with it? The code has been tested on at least three different brand cards by different people. Jean Tourrilhes took a look at an earlier version an was positive. He's put incorporating this into 2.6 as a priority 1. The patch works fine on 2.5.70. The firmware issue has been addressed now. The only firmware in the driver is a small stub which reads the MAC address from NVRAM on the card. The source for that is included so there are no GPL issues. The main firmware is loaded from userspace using Manuel Estrada Sainz's sysfs firmware class. I know that the patch for that has been accepted but it hasn't turned up anywhere I can see yet. The driver compiles fine even without the firmware class. I've made a package of the firmware images which is available from my website. 
The remaining issues with the driver are migrating PCMCIA to the new driver model and PCI support. I'm happy to produce followup patches as the PCMCIA system gets evolved to the new driver model: the timing on that is controlled by others. This set of chips includes a PCI version and the driver should support that, but AFAIK there is no PCI hardware available anywhere. If Atmel can provide me with some it will be simple to add PCI support.

The driver uses the CRC32 library module and the firmware loader. I've not put in dependencies on those, but when the latest set of patches goes into Kconfig I'll set it up so that selecting the Atmel driver selects CRC32 and FW_LOADER too.

(03/06/06 1.1307) [PATCH] sb1000 driver bugs
Inspecting the sb1000 driver showed some interesting bugs:
- net device pointer is used before the device is allocated; gcc does catch this.
- unregister is called even though device not registered successfully
- net device is not freed on remove.
Compiles but don't have hardware to test. Don't know how it ever worked though.

(03/06/05 1.1306) [netdrvr amd8111e] link against mii lib

(03/06/05 1.1305) [netdrvr skge] add ULL modifier to 64-bit constant

(03/06/05 1.1304) [netdrvr] gcc 3.3 cleanups
Mostly adding 'ULL' modifier to 64-bit constants.
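A note on the 'ULL' cleanups listed above, for readers who have not met them: on a 32-bit target a constant wider than 32 bits does not fit in "long", and as far as I recall gcc 3.3 began warning about such constants unless they carry an explicit suffix. The sketch below is only an illustration of the idiom. The names TOY_ADDR_MASK, toy_page_base() and toy_low_mask() are made up for this example and are not taken from any driver in the patch kit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit address mask; the ULL suffix makes the constant
 * unsigned long long regardless of how wide "long" is on the target. */
#define TOY_ADDR_MASK 0xfffffffffffff000ULL

/* Return the 4 KiB-aligned part of a 64-bit bus address. */
static uint64_t toy_page_base(uint64_t addr)
{
	return addr & TOY_ADDR_MASK;
}

/* Build a mask with the low `bits` bits set, for 1 <= bits <= 64. */
static uint64_t toy_low_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	/* 1ULL keeps the shift in 64-bit arithmetic. A plain 1 is an int,
	 * and shifting it by 32 or more is undefined behaviour. */
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}
```

The same reasoning applies to shifts as to literal constants: the suffix decides the type in which the expression is evaluated, not the type of the variable the result is assigned to.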
From kazunori@miyazawa.org Thu Jun 5 22:48:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:48:38 -0700 (PDT) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565mS2x028208 for ; Thu, 5 Jun 2003 22:48:29 -0700 Received: from monza.miyazawa.org (softdnserr [3ffe:501:41c:3:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Fri, 06 Jun 2003 14:46:20 +0900 Date: Fri, 6 Jun 2003 14:49:25 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Message-Id: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2927 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev

Hello,

I observed invalid refcnt increments when using IPsec with IPv6. I configured IPsec and ran ping6; the refcnt of the dst then went up in steps of two. I observed this with "route -A inet6" and also checked it with printk.

This patch fixes dst reference count management.

In dst_pop, the reference count of every dst except the last is incremented in dst_clone and decremented in the next call to dst_pop, but the reference count of the last dst is never decremented. All dsts are held by xfrm_policy, so there is no need to touch the reference count here.

In the output functions, dst is changed by xfrm_lookup if there is a matching policy. The original dst, which was held before calling xfrm_lookup, is therefore never released. When xfrm_lookup succeeds and dst is changed, the original dst should be released.
Patch-Name: fix dst refcnt with xfrm
Patch-Id: FIX_2_5_70+CS1_1259_DST_REFCNT_WITH_XFRM
Patch-Author: Kazunori Miyazawa / USAGI Project
Credit: Kazunori Miyazawa / USAGI Project

Index: linux25/include/net/dst.h
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/dst.h,v
retrieving revision 1.1.1.9
retrieving revision 1.1.1.9.22.1
diff -u -r1.1.1.9 -r1.1.1.9.22.1
--- linux25/include/net/dst.h	17 Apr 2003 18:15:56 -0000	1.1.1.9
+++ linux25/include/net/dst.h	6 Jun 2003 05:02:36 -0000	1.1.1.9.22.1
@@ -160,10 +160,7 @@
 
 static inline struct dst_entry *dst_pop(struct dst_entry *dst)
 {
-	struct dst_entry *child = dst_clone(dst->child);
-
-	dst_release(dst);
-	return child;
+	return dst->child;
 }
 
 extern void * dst_alloc(struct dst_ops * ops);
Index: linux25/net/ipv6/ip6_output.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v
retrieving revision 1.1.1.16
retrieving revision 1.1.1.16.14.1
diff -u -r1.1.1.16 -r1.1.1.16.14.1
--- linux25/net/ipv6/ip6_output.c	26 May 2003 08:04:10 -0000	1.1.1.16
+++ linux25/net/ipv6/ip6_output.c	6 Jun 2003 05:00:58 -0000	1.1.1.16.14.1
@@ -211,6 +211,8 @@
 	if ((err = xfrm_lookup(&skb->dst, fl, sk, 0)) < 0) {
 		return err;
 	}
+	if (dst != skb->dst)
+		dst_release(dst);
 
 	if (opt) {
 		int head_room;
@@ -595,10 +597,13 @@
 	pktlength = length;
 
 	if (dst) {
+		struct dst_entry *dst0 = dst;
 		if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0) {
 			dst_release(dst);
 			return -ENETUNREACH;
 		}
+		if (dst0 != dst)
+			dst_release(dst0);
 	}
 
 	if (hlimit < 0) {
@@ -1194,10 +1199,13 @@
 	}
 
 	if (*dst) {
+		struct dst_entry *dst0 = *dst;
 		if ((err = xfrm_lookup(dst, fl, sk, 0)) < 0) {
 			dst_release(*dst);
 			return -ENETUNREACH;
 		}
+		if (*dst != dst0)
+			dst_release(dst0);
 	}
 
 	return 0;

From davem@redhat.com Thu Jun 5 22:58:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 22:58:59 -0700 (PDT) Received: from
pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h565wt2x028553 for ; Thu, 5 Jun 2003 22:58:55 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA03766; Thu, 5 Jun 2003 22:55:47 -0700 Date: Thu, 05 Jun 2003 22:55:47 -0700 (PDT) Message-Id: <20030605.225547.28789693.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm From: "David S. Miller" In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2928 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Kazunori Miyazawa
   Date: Fri, 6 Jun 2003 14:49:25 +0900

   In dst_pop reference count of dsts except for last are incremented in dst_clone and decremented in next call dst_pop but last dst reference count will be never decremented. All dst are held by xfrm_policy and there is no need to touch the reference count here.

Ok, so the idea is to hold onto the top-level parent DST entry the entire time, and this prevents the DST and all its children from being destroyed. Is this correct?

Let me study this a little bit; I want to make sure it is correct.
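For readers following this thread, the two dst_pop() variants from the patch can be modelled in user space to see what each one does to the counters. This is a toy sketch under stated assumptions, not kernel code: struct toy_dst and the toy_* helpers are hypothetical stand-ins for struct dst_entry, dst_clone() and dst_release(), the counters are plain ints rather than atomics, and nothing is freed when a count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct dst_entry: only the two fields discussed here. */
struct toy_dst {
	int refcnt;
	struct toy_dst *child;
};

/* Models dst_clone(): take one more reference, tolerate NULL. */
static struct toy_dst *toy_clone(struct toy_dst *d)
{
	if (d)
		d->refcnt++;
	return d;
}

/* Models dst_release(): drop one reference, tolerate NULL. */
static void toy_release(struct toy_dst *d)
{
	if (d) {
		assert(d->refcnt > 0);	/* dropping below zero would be a bug */
		d->refcnt--;
	}
}

/* dst_pop() as it reads before the patch: reference the child, then
 * drop the reference on the entry being popped. */
static struct toy_dst *toy_pop_old(struct toy_dst *d)
{
	struct toy_dst *child = toy_clone(d->child);

	toy_release(d);
	return child;
}

/* dst_pop() as the patch proposes: return the child, touch no counters. */
static struct toy_dst *toy_pop_new(struct toy_dst *d)
{
	return d->child;
}
```

With illustrative starting values of 2 for a parent and 3 for its child, toy_pop_old() leaves the parent at 1 and the child at 4, while toy_pop_new() leaves both unchanged. Which behaviour is right depends on who is expected to own and later drop each reference, and the model does not settle that question.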
From pb@bieringer.de Thu Jun 5 23:25:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 23:26:03 -0700 (PDT) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h566Pv2x029425 for ; Thu, 5 Jun 2003 23:25:58 -0700 Received: by smtp2.aerasec.de (Postfix, from userid 995) id BA7111387E; Fri, 6 Jun 2003 08:25:51 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id D88461387F; Fri, 6 Jun 2003 08:25:50 +0200 (CEST) X-AV-Checked: Fri Jun 6 08:25:50 2003 smtp2.aerasec.de Received: from p50805418.dip.t-dialin.net (p50805418.dip.t-dialin.net [80.128.84.24]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 919BB1387E; Fri, 6 Jun 2003 08:25:49 +0200 (CEST) Date: Fri, 06 Jun 2003 08:25:47 +0200 From: Peter Bieringer To: netdev@oss.sgi.com Cc: usagi-users@linux-ipv6.org Subject: Re: IPsec 2.5.70-bk9 and FreeS/WAN 1.99 with algopatches 0.8.1rc2 (in)compatible encryption methods Message-ID: <122560000.1054880747@gate.muc.bieringer.de> In-Reply-To: <20030605.215907.71090944.davem@redhat.com> References: <35410000.1054818456@klopffest.muc.aerasec.de> <20030605.215907.71090944.davem@redhat.com> X-Mailer: Mulberry/3.0.3 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2929 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, June 05, 2003 09:59:07 PM -0700 "David S. Miller" wrote: > From: "Dr. Peter Bieringer" > Date: Thu, 05 Jun 2003 15:07:36 +0200 > > because I got no success, I've tried different encryption methods than > 3DES. And *suddenly* it began to work. 
> > Sounds like an out-of-date include/linux/pfkeyv2.h file > used during tool building. Yes, it looks like. BTW: is there something like a "version information" which is used in that way that user space tools can detect and report such changes at runtime? Would be perhaps helpful if racoon reports something like "incompatible" in this case. Very much better than such strange problems... Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From kazunori@miyazawa.org Thu Jun 5 23:36:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 05 Jun 2003 23:36:32 -0700 (PDT) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h566aP2x029878 for ; Thu, 5 Jun 2003 23:36:25 -0700 Received: from monza.miyazawa.org (softdnserr [3ffe:501:41c:3:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Fri, 06 Jun 2003 15:34:18 +0900 Date: Fri, 6 Jun 2003 15:37:18 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Message-Id: <20030606153718.4923bbf9.kazunori@miyazawa.org> In-Reply-To: <20030605.225547.28789693.davem@redhat.com> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> <20030605.225547.28789693.davem@redhat.com> X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2930 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev On Thu, 05 Jun 2003 22:55:47 -0700 (PDT) "David S. 
Miller" wrote:

>    From: Kazunori Miyazawa
>    Date: Fri, 6 Jun 2003 14:49:25 +0900
>
>    In dst_pop reference count of dsts except for last are incremented in
>    dst_clone and decremented in next call dst_pop but last dst reference
>    count will be never decremented.
>    All dst are held by xfrm_policy and there is no need to touch the
>    reference count here.
>
> Ok, so the idea is to hold onto top-level parent DST entry the entire
> time, and this prevents the DST and all its children from being
> destroyed. Is this correct?
>

Yes. Additionally, the DST refcount is incremented in the process but never decremented correctly. Let me explain. (It must be "Don't try to teach your grandmother to suck eggs" :-)

"O" is the original dst structure; its refcnt is 1 in the routing table.
"C" is the child.
"DEST" is some parameter in the stack.
(X) after "O" or "C" is its reference count.

First, as the result of the routing lookup, DEST holds "O" via dst_hold/dst_clone.

DEST=>O(2)

In xfrm_lookup and related functions the child is created and connected to "O". Their reference counts are incremented because xfrm_policy holds them. Then the stack builds up a stackable destination like this:

DEST=>C(1)
       |=>O(3)

After this the stack regards "C" as the original destination. I assume the datagram case. The stack calls dst_clone before passing DST to skb->dst.

skb->dst=DEST=>C(2)
                |=>O(3)

In dst_pop it increments O with dst_clone and releases C with dst_release.

skb->dst => C(2)
             |=>O(3)

call dst_pop....

skb->dst =>O(4)
DST=>C(1)
      |=>O(4)

The stack finishes the process and releases DST with dst_release. "O"'s reference count is decremented in kfree_skb.

skb->dst=DEST=>C(0)
                |=>O(3)

I hope this helps you. I don't think I understand the whole dst life cycle. Please correct me if I misunderstand.

BTW, why does the stack set the dst reference count to "0" at initialization? IMHO it should be "1".
Thank you,

--Kazunori Miyazawa (Yokogawa Electric Corporation)

From davem@redhat.com Fri Jun 6 00:30:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 00:31:04 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h567UP2x030603 for ; Fri, 6 Jun 2003 00:30:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA04066; Fri, 6 Jun 2003 00:27:20 -0700 Date: Fri, 06 Jun 2003 00:27:19 -0700 (PDT) Message-Id: <20030606.002719.35016156.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm From: "David S. Miller" In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org> References: <20030606144925.29ad2a9f.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2931 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Kazunori Miyazawa
   Date: Fri, 6 Jun 2003 14:49:25 +0900

   In dst_pop the reference count of every dst except the last is
   incremented in dst_clone and decremented in the next call to dst_pop,
   but the last dst reference count will never be decremented.
   All dsts are held by xfrm_policy and there is no need to touch the
   reference count here.

Ok, there is problem with this logic. Final dst is set to skb->dst,
and when SKB is freed then we do dst_release(skb->dst).
Therefore it _IS_ decremented. (see net/core/skbuff.c:__kfree_skb(),
it is where this final DST reference is decremented).

Something is going wrong in ipv6 code if this is not happening.
If you modify skb->dst, it is your job to maintain the reference properly.

Look at how ipv4 works: we do all the work in the route lookup, and
furthermore we never pass &skb->dst into these lookups.

What ipv6 output does looks really really strange. It is silly to do
flow lookups in places like ip6_xmit(). And this is where all the
refcount bugs are really coming from. Like ipv4, flow lookups should be
occurring at the end of ip6_route_output() processing.

As far as I can tell, ip6_xmit() makes calculations based upon "dst"
and this is wrong. It updates only skb->dst, but this is not what that
function uses to make decisions. 'dst' is an old copy :(

From vnuorval@tcs.hut.fi Fri Jun 6 01:57:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 01:57:24 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h568vD2x000352 for ; Fri, 6 Jun 2003 01:57:14 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Fri, 6 Jun 2003 11:48:38 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h568mcjH002713; Fri, 6 Jun 2003 11:48:38 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h568maNb002709; Fri, 6 Jun 2003 11:48:36 +0300 Date: Fri, 6 Jun 2003 11:48:36 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2932 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev

On Thu, 5 Jun
2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote:

> In article <20030531.000319.114704530.yoshfuji@linux-ipv6.org> (at Sat, 31 May 2003 00:03:19 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says:
>
> > In article (at Fri, 30 May 2003 17:34:40 +0300 (EEST)), Ville Nuorvala says:
> >
> > > here is a patch that fixes CONFIG_IPV6_SUBTREES and allows overriding
> > > normal routes with source address specific ones. This is for example
> > > needed in MIPv6 for handling the traffic to and from a mobile node's home
> > > address correctly.
> >
> > Let us test the patch. It seemed buggy when USAGI tested before.
>
> I've re-tested your latest CONFIG_IPV6_SUBTREES patch.
> The results of the retesting seem fine.

Great! :)

> However, I won't accept your patch as-is for now.
>
> The patch consists of several parts:
>
> 1. fixing bugs in IPv6 code
> 2. fixing bugs in CONFIG_IPV6_SUBTREES code
> 3. changing majority of keys of routing table.
>
> There are no problems with 1 and 2.
> However, we need to discuss 3.

I have of course no objections against 1. :)
However 2 and 3 are in my view quite interrelated.

> As I said in other thread, the policy routing should be done in the
> other way. And, it is not good to change the semantics of
> CONFIG_IPV6_SUBTREES.

Even if the semantics are flawed? I'll try to explain my reasoning below.

> In original, routing is looked up by destination address, and then,
> looked up by the source address; destination takes precedence over source.
> Your patch changes this. Source address takes precedence over destination
> address.

The main problem with the original destination,source lookup is the
cached host routes created by ip6_route_{input,output} (or actually
rt6_cow). Since these routes have destination prefix length 128, they
will override all source routes, unless they also are host routes.
This happens because the (non-host) source route ends up in the subtree
of a node higher up in the destination tree, which will never be reached
because the cached host route already matches the destination address.

Since the initial mode of communication between a mobile node (using its
home address) and any correspondent node is reverse tunneling, we at
least need something like a default (i.e. a non-host) route through the
tunnel for the MN's home address. Not until route optimization is set up
between the MN and the CN do we actually get host routes for the traffic
between the two.

If we switch the order of keys to source,destination we don't get this
problem, since the cached host routes end up at the bottom of the
subtrees and won't interfere with the normal routing.

Prefix routes also cause problems with the destination,source key order,
since we must create a duplicate route for each prefix and home address.

Hope I explained it clearly enough :)

> From the point of the policy routing, both (and other attributes) should be
> considered equally, and this is what IPv4 routing table does.

This of course seems like the optimal solution.

> Well, I won't hurry introducing IPv6 policy routing just because of MIP6.
> The reason why I won't hurry is because I still believe it is not
> required for MIP6. Nakamura, one of our members, will describe the details.
> It takes precedence over "limited" policy(?) routing to introduce generic
> policy routing.

I still think _some_ routing changes are necessary, but I guess we need
to discuss what the changes are. I'm btw willing to help with the IPv6
policy routing if that helps getting it into the kernel sooner.

> Anyway, will you split up your patch (into 1-3 above) first, please?

I'll check if there still is anything to do in 1 after the patch you
already submitted, but let's please discuss 3 before I split it into
2 and 3.
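The key-order argument in this message can be illustrated with a toy
model. This is only a sketch under loud assumptions: it uses 32-bit
addresses instead of 128-bit ones, a flat table instead of the fib6 tree,
and made-up toy_* names and addresses. It models each lookup order as
"best match on the first key, ties broken by the second key" among the
routes that match both keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy route keyed by a destination prefix and a source prefix. */
struct toy_route {
	uint32_t dst;
	int dst_len;
	uint32_t src;
	int src_len;
	int id;
};

enum toy_order { TOY_DST_FIRST, TOY_SRC_FIRST };

/* Does addr fall inside prefix/len? A zero length matches everything. */
static int toy_prefix_match(uint32_t addr, uint32_t prefix, int len)
{
	uint32_t mask;

	assert(len >= 0 && len <= 32);
	mask = len ? ~(uint32_t)0 << (32 - len) : 0;
	return (addr & mask) == (prefix & mask);
}

/* Longest match on the first key wins; the second key breaks ties. */
static const struct toy_route *toy_lookup(const struct toy_route *tbl,
					  size_t n, uint32_t dst,
					  uint32_t src, enum toy_order order)
{
	const struct toy_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_route *r = &tbl[i];
		int r1, r2, b1, b2;

		if (!toy_prefix_match(dst, r->dst, r->dst_len) ||
		    !toy_prefix_match(src, r->src, r->src_len))
			continue;
		if (!best) {
			best = r;
			continue;
		}
		r1 = order == TOY_DST_FIRST ? r->dst_len : r->src_len;
		r2 = order == TOY_DST_FIRST ? r->src_len : r->dst_len;
		b1 = order == TOY_DST_FIRST ? best->dst_len : best->src_len;
		b2 = order == TOY_DST_FIRST ? best->src_len : best->dst_len;
		if (r1 > b1 || (r1 == b1 && r2 > b2))
			best = r;
	}
	return best;
}

/* Returns the id of the route chosen for MN traffic towards the CN. */
static int toy_demo(enum toy_order order)
{
	const uint32_t home = 0x0a000001u;	/* hypothetical home address */
	const uint32_t cn = 0xc0000201u;	/* hypothetical correspondent */
	const struct toy_route table[] = {
		{ 0,  0,  0,    0,  1 },	/* plain default route */
		{ 0,  0,  home, 32, 2 },	/* default via tunnel for home addr */
		{ cn, 32, 0,    0,  3 },	/* cached host route for the CN */
	};
	const struct toy_route *r = toy_lookup(table, 3, cn, home, order);

	return r ? r->id : 0;
}
```

In this model toy_demo(TOY_DST_FIRST) picks route 3, the cached host
route, so the tunnel route is shadowed; toy_demo(TOY_SRC_FIRST) picks
route 2, the source-specific tunnel route.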
Thanks, Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From yoshfuji@linux-ipv6.org Fri Jun 6 03:31:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 03:31:50 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56AVc2x007066 for ; Fri, 6 Jun 2003 03:31:39 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h56AWJBo020634; Fri, 6 Jun 2003 19:32:19 +0900 Date: Fri, 06 Jun 2003 19:32:18 +0900 (JST) Message-Id: <20030606.193218.117654914.yoshfuji@linux-ipv6.org> To: vnuorval@tcs.hut.fi Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, nakam@linux-ipv6.org, usagi-core@linux-ipv6.org Subject: Re: CONFIG_IPV6_SUBTREES (was [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6) From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20030605.191224.68706097.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2933 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article (at Fri, 6 Jun 2003 11:48:36 +0300 (EEST)), Ville Nuorvala 
says: > Since the initial mode of communication between a mobile node (using its > home address) and any correspondent node is reverse tunneling we at least > need something like a default (i.e. a non-host) route through the tunnel > for the MN's home address. Excuse me, please forget anything related to "Mobile IP" during this discussion; do not assume that Mobile IP is the only user of CONFIG_IPV6_SUBTREES. Thank you. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From lpetande@tml.hut.fi Fri Jun 6 04:07:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 04:07:40 -0700 (PDT) Received: from smtp-2.hut.fi (root@smtp-2.hut.fi [130.233.228.92]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56B7M2x008423 for ; Fri, 6 Jun 2003 04:07:28 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-2.hut.fi (8.12.9/8.12.9) with ESMTP id h56B6hdv012100; Fri, 6 Jun 2003 14:06:43 +0300 Message-ID: <3EE0779B.9080300@tml.hut.fi> Date: Fri, 06 Jun 2003 14:14:35 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Masahide NAKAMURA CC: =?ISO-2022-JP?B?WU9TSElGVUpJIEhpZGVha2kgLyAbJEI1SEYjMVFMQBsoQg==?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> <20030606143844.0604c306.nakam@linux-ipv6.org> In-Reply-To: <20030606143844.0604c306.nakam@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-2.hut.fi) X-DCC-HUTCC-Metrics: smtp-2.hut.fi 
1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 2934 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Hello Nakamura, Masahide NAKAMURA wrote: > Hello, > I'm Nakamura, a member of USAGI. > > > As you know, we've been planning the MIPv6 design to use XFRM. > If we use MIPv6, we need some fix and extension to XFRM and it results to make XFRM > more generic. I like your idea, as it allows a high level of flexibility in use of mipv6 with flows through the definition of policies. This is the way I would also do mipv6 extension header addition with xfrm. However, if you insert mipv6 policies into xfrm, you need to take care of the interactions between ipsec and mipv6 policies. The system needs to cope with data flows to which both ipsec and mipv6 should be applied. As a result of this the logic of the xfrm lookups probably needs some changes to return both the matching ipsec and mipv6 policies. How have you planned to solve this problem? BTW, feel free to use relevant parts of my code for the output functionality to speed up the work. After your code is ready we can look at which approach is better suited for implementing the kernel support and get a working kernel infrastructure for mipv6 into 2.6 kernels. 
Henrik From vnuorval@tcs.hut.fi Fri Jun 6 04:21:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 04:22:06 -0700 (PDT) Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56BLs2x008832 for ; Fri, 6 Jun 2003 04:21:55 -0700 Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Fri, 6 Jun 2003 14:17:00 +0300 (EEST) Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h56BGxjH003777; Fri, 6 Jun 2003 14:16:59 +0300 Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h56BGwIQ003772; Fri, 6 Jun 2003 14:16:58 +0300 Date: Fri, 6 Jun 2003 14:16:57 +0300 (EEST) From: Ville Nuorvala To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: davem@redhat.com, , , , , , , , Subject: Re: CONFIG_IPV6_SUBTREES (was [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6) In-Reply-To: <20030606.193218.117654914.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-15 X-archive-position: 2935 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vnuorval@tcs.hut.fi Precedence: bulk X-list: netdev On Fri, 6 Jun 2003, YOSHIFUJI Hideaki / [iso-2022-jp] 吉藤英明 wrote: > Excuse me, please forget anything related to "Mobile IP" during this > discussion; do not assume that Mobile IP is the only user of > CONFIG_IPV6_SUBTREES. At the moment it is :) I was just making a point about the IMHO flawed semantics of CONFIG_IPV6_SUBTREES. If you keep the original (first dest, then src) key ordering you basically can't use the subtrees for anything else but storing source address specific host routes. With the reversed order you can do a lot more... 
-Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 From nakam@linux-ipv6.org Fri Jun 6 06:34:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 06:34:45 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56DYb2x011827 for ; Fri, 6 Jun 2003 06:34:38 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19OHJ7-0001Hh-00; Fri, 06 Jun 2003 22:31:01 +0900 From: Masahide NAKAMURA To: Henrik Petander Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> In-Reply-To: <3EE0779B.9080300@tml.hut.fi> References: <20030424132559.GA15894@morphine.tml.hut.fi> <20030531.000319.114704530.yoshfuji@linux-ipv6.org> <20030605.191224.68706097.yoshfuji@linux-ipv6.org> <20030606143844.0604c306.nakam@linux-ipv6.org> <3EE0779B.9080300@tml.hut.fi> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Date: Fri, 06 Jun 2003 22:31:01 +0900 X-archive-position: 2936 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev Hello Henrik, On Fri, 06 Jun 2003 14:14:35 +0300 Henrik Petander wrote: > I like your idea, as it allows a high level of flexibility in use of > 
mipv6 with flows through the definition of policies. This is the way I
> would also do mipv6 extension header addition with xfrm.

Thanks.

> However, if you insert mipv6 policies into xfrm, you need to take care
> of the interactions between ipsec and mipv6 policies. The system needs
> to cope with data flows to which both ipsec and mipv6 should be applied.
> As a result of this the logic of the xfrm lookups probably needs some
> changes to return both the matching ipsec and mipv6 policies. How have
> you planned to solve this problem?

We don't think we have to change the policy handling logic, because we
can treat a MIPv6 policy just like an IPsec one.
When we want to apply both MIPv6 and IPsec to the same target, we need
one policy that has two or more templates (e.g. one is MIPv6's template
and the other is IPsec's).

In that case, however, we have the following problem.
The draft (section 9.3.1 of draft-ietf-mobileip-ipv6-22) says:

   When attempting to verify AH authentication data in a packet that
   contains a Home Address option, the receiving node MUST calculate
   the AH authentication data as if the following were true: The Home
   Address option contains the care-of address, and the source IPv6
   address field of the IPv6 header contains the home address.

Because xfrm calls dst_output in the order of the templates, at first we
did not know which template should come first, MIPv6 or IPsec (Home
Address Option or AH). We then discussed this with our IPsec developers,
and our current idea is to use xfrm6_clear_mutable_options() to replace
the addresses again for the AH calculation when calling ah6_output().

Anyway, I think this is not a matter specific to xfrm.
(Is this also what you were pointing out, Henrik?)
Or do you have any other idea?

> BTW, feel free to use relevant parts of my code for the output
> functionality to speed up the work.
After your code is ready we can look > at which approach is better suited for implementing the kernel support > and get a working kernel infrastructure for mipv6 into 2.6 kernels. Thank you for your kindness. Of course I agree with you. Regards, -- Masahide NAKAMURA From hadmut@danisch.de Fri Jun 6 09:53:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 09:53:47 -0700 (PDT) Received: from sklave3.rackland.de (sklave3.rackland.de [213.133.101.23]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56GrZ2x019947 for ; Fri, 6 Jun 2003 09:53:36 -0700 Received: from sodom (uucp@localhost) by sklave3.rackland.de (8.12.9/8.12.9/Debian-1) with BSMTP id h56GrYLe024482 for netdev@oss.sgi.com; Fri, 6 Jun 2003 18:53:34 +0200 Received: (from hadmut@localhost) by sodom.home.danisch.de (8.12.9/8.12.9/Debian-1) id h56GrEqq012738 for netdev@oss.sgi.com; Fri, 6 Jun 2003 18:53:14 +0200 From: Hadmut Danisch Date: Fri, 6 Jun 2003 18:53:14 +0200 To: netdev@oss.sgi.com Subject: Cisco Aironet Problem Message-ID: <20030606165314.GA12669@danisch.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-archive-position: 2937 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadmut@danisch.de Precedence: bulk X-list: netdev Hi, I have a problem with 2.4.20 and my Cisco Aironet 340 PCMCIA card. When I use the keyboard while traffic is high, something gets broken in the kernel: Both the keyboard and the network device freeze permanently, a reboot is required. I've reported this bug to the aironet driver team (sourceforge), but did not receive any feedback. Meanwhile I found a bug report in the dmesg after a crash. Unfortunately I'm not familiar with that part of the kernel. Maybe you can give me a hint what could be the reason for such a bug message in dmesg? airo: BAP error 4000 2 Warning: kfree_skb passed an skb still on a list (from c01206ea). 
kernel BUG at skbuff.c:315! invalid operand: 0000 CPU: 0 EIP: 0010:[] Not tainted EFLAGS: 00013286 eax: 00000045 ebx: cfde31c0 ecx: cc6f2000 edx: 00000001 esi: c12f5f84 edi: 00000000 ebp: c12f4000 esp: c12f5f6c ds: 0018 es: 0018 ss: 0018 Process keventd (pid: 2, stackpage=c12f5000) Stack: c0238740 c01206ea 00000000 c12f5f84 c01206ea cfde31c0 cc4002e4 cc4002e4 00000000 00000000 c0128c83 c02489d0 c12f5fb0 00000000 c12f4560 c12f4570 c12f4000 00000001 00000000 c12c9f80 00010000 00000000 00000700 c0128b50 Call Trace: [] [] [] [] [] [] [] Code: 0f 0b 3b 01 5f 79 23 c0 8b 5c 24 14 e9 ce fe ff ff 8d 74 26 regards Hadmut From mk@karaba.org Fri Jun 6 11:17:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 11:17:26 -0700 (PDT) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56IHE2x022281 for ; Fri, 6 Jun 2003 11:17:15 -0700 Received: from [3ffe:501:1057:710::1] (helo=hyakusiki.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 19OLlz-0002Fw-00; Sat, 07 Jun 2003 03:17:08 +0900 Date: Sat, 07 Jun 2003 03:17:10 +0900 Message-ID: <87wufzxe8p.wl@karaba.org> From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: "David S. Miller" Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: [PATCH] fix esp6 extension headers handling In-Reply-To: <873cioqxch.wl@karaba.org> <3EDF36AA.9020403@tml.hut.fi> <3EDF3EB4.8010105@tml.hut.fi> References: <3EDF36AA.9020403@tml.hut.fi> <20030605.051709.104035049.davem@redhat.com> <3EDF3EB4.8010105@tml.hut.fi> <873cioqxch.wl@karaba.org> MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 2938 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev Hello, At Thu, 05 Jun 2003 15:59:32 +0300, Henrik Petander wrote: > > David S. 
Miller wrote: > > From: Henrik Petander > > Date: Thu, 05 Jun 2003 15:25:14 +0300 > > > > A possible fix is to change the pointer into an offset from the start of > > the packet and use the offset later to set the nexthdr value in the > > extension header. > > > > Please indicate the version of the sources you are looking > > at when making reports. > > Sure, esp6.c bitkeeper version was 1.16. Also a fix to the bug report: > the problem is with esp6 and not with ah6, which does not use the > get_offset function. > > Henrik > The attached diff fixes esp6 extension headers handling bug which reported by Henrik. I introduced ip6_find_1stfragopt() instead of get_offset(). # ip6_found_nexthdr() is just renamed ip6_find_1stfragopt() # in order to represent collect functionality. Regards, -mk Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/ipv6.h,v retrieving revision 1.1.1.12 retrieving revision 1.1.1.12.8.1 diff -u -r1.1.1.12 -r1.1.1.12.8.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h 31 May 2003 07:30:34 -0000 1.1.1.12 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/include/net/ipv6.h 6 Jun 2003 15:43:46 -0000 1.1.1.12.8.1 @@ -315,7 +315,7 @@ unsigned length, struct ipv6_txoptions *opt, int hlimit, int flags); -extern int ip6_found_nexthdr(struct sk_buff *skb, u8 **nexthdr); +extern int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr); extern int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/esp6.c,v retrieving revision 1.1.1.13 retrieving revision 1.1.1.13.12.1 diff -u -r1.1.1.13 -r1.1.1.13.12.1 --- 
linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c 26 May 2003 08:04:11 -0000 1.1.1.13 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/esp6.c 6 Jun 2003 16:23:01 -0000 1.1.1.13.12.1 @@ -39,57 +39,6 @@ #define MAX_SG_ONSTACK 4 -/* BUGS: - * - we assume replay seqno is always present. - */ - -/* Move to common area: it is shared with AH. */ -/* Common with AH after some work on arguments. */ - -static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, struct ipv6_opt_hdr **prevhdr) -{ - u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); - u8 nextnexthdr; - - *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; - - while (offset + 1 < packet_len) { - - switch (*nexthdr) { - - case NEXTHDR_HOP: - case NEXTHDR_ROUTING: - offset += ipv6_optlen(exthdr); - *nexthdr = exthdr->nexthdr; - *prevhdr = exthdr; - exthdr = (struct ipv6_opt_hdr*)(packet + offset); - break; - - case NEXTHDR_DEST: - nextnexthdr = - ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; - /* XXX We know the option is inner dest opt - with next next header check. */ - if (nextnexthdr != NEXTHDR_HOP && - nextnexthdr != NEXTHDR_ROUTING && - nextnexthdr != NEXTHDR_DEST) { - return offset; - } - offset += ipv6_optlen(exthdr); - *nexthdr = exthdr->nexthdr; - *prevhdr = exthdr; - exthdr = (struct ipv6_opt_hdr*)(packet + offset); - break; - - default : - return offset; - } - } - - return offset; -} - int esp6_output(struct sk_buff *skb) { int err; @@ -101,12 +50,12 @@ struct crypto_tfm *tfm; struct esp_data *esp; struct sk_buff *trailer; - struct ipv6_opt_hdr *prevhdr = NULL; int blksize; int clen; int alen; int nfrags; - u8 nexthdr; + u8 *prevhdr; + u8 nexthdr = 0; /* First, if the skb is not checksummed, complete checksum. */ if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) { @@ -123,7 +72,9 @@ /* Strip IP header in transport mode. Save it. 
*/ if (!x->props.mode) { - hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &prevhdr); + hdr_len = ip6_find_1stfragopt(skb, &prevhdr); + nexthdr = *prevhdr; + *prevhdr = IPPROTO_ESP; iph = kmalloc(hdr_len, GFP_ATOMIC); if (!iph) { err = -ENOMEM; @@ -178,18 +129,12 @@ ipv6_addr_copy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr); } else { - /* XXX exthdr */ esph = (struct ipv6_esp_hdr*)skb_push(skb, x->props.header_len); skb->h.raw = (unsigned char*)esph; top_iph = (struct ipv6hdr*)skb_push(skb, hdr_len); memcpy(top_iph, iph, hdr_len); kfree(iph); top_iph->payload_len = htons(skb->len + alen - sizeof(struct ipv6hdr)); - if (prevhdr) { - prevhdr->nexthdr = IPPROTO_ESP; - } else { - top_iph->nexthdr = IPPROTO_ESP; - } *(u8*)(trailer->tail - 1) = nexthdr; } @@ -302,6 +247,7 @@ struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; struct scatterlist *sg = sgbuf; u8 padlen; + u8 *prevhdr; if (unlikely(nfrags > MAX_SG_ONSTACK)) { sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); @@ -325,11 +271,13 @@ } /* ... check padding bits here. Silly. 
:-) */ - ret_nexthdr = ((struct ipv6hdr*)tmp_hdr)->nexthdr = nexthdr[1]; pskb_trim(skb, skb->len - alen - padlen - 2); skb->h.raw = skb_pull(skb, sizeof(struct ipv6_esp_hdr) + esp->conf.ivlen); skb->nh.raw += sizeof(struct ipv6_esp_hdr) + esp->conf.ivlen; memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + ip6_find_1stfragopt(skb, &prevhdr); + ret_nexthdr = *prevhdr = nexthdr[1]; } kfree(tmp_hdr); return ret_nexthdr; Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v retrieving revision 1.1.1.16 retrieving revision 1.1.1.16.16.1 diff -u -r1.1.1.16 -r1.1.1.16.16.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c 26 May 2003 08:04:10 -0000 1.1.1.16 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ip6_output.c 6 Jun 2003 15:43:34 -0000 1.1.1.16.16.1 @@ -887,7 +887,7 @@ #endif } -int ip6_found_nexthdr(struct sk_buff *skb, u8 **nexthdr) +int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr) { u16 offset = sizeof(struct ipv6hdr); struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.ipv6h + 1); @@ -929,7 +929,7 @@ u8 *prevhdr, nexthdr = 0; dev = rt->u.dst.dev; - hlen = ip6_found_nexthdr(skb, &prevhdr); + hlen = ip6_find_1stfragopt(skb, &prevhdr); nexthdr = *prevhdr; mtu = dst_pmtu(&rt->u.dst) - hlen - sizeof(struct frag_hdr); Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipcomp6.c,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.14.1 diff -u -r1.1.1.2 -r1.1.1.2.14.1 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c 21 May 2003 13:15:20 -0000 1.1.1.2 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipcomp6.c 6 Jun 2003 15:43:34 -0000 1.1.1.2.14.1 @@ -105,7 
+105,7 @@ iph = skb->nh.ipv6h; iph->payload_len = htons(skb->len); - ip6_found_nexthdr(skb, &prevhdr); + ip6_find_1stfragopt(skb, &prevhdr); *prevhdr = nexthdr; out: if (tmp_hdr) @@ -160,7 +160,7 @@ skb->nh.raw = skb->data; /* == top_iph */ skb->h.raw = skb->nh.raw + hdr_len; } else { - hdr_len = ip6_found_nexthdr(skb, &prevhdr); + hdr_len = ip6_find_1stfragopt(skb, &prevhdr); nexthdr = *prevhdr; } @@ -203,7 +203,7 @@ top_iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); skb->nh.raw = skb->data; /* top_iph */ - ip6_found_nexthdr(skb, &prevhdr); + ip6_find_1stfragopt(skb, &prevhdr); *prevhdr = IPPROTO_COMP; ipch = (struct ipv6_comp_hdr *)((unsigned char *)top_iph + hdr_len); Index: linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipv6_syms.c,v retrieving revision 1.1.1.12 retrieving revision 1.1.1.12.16.4 diff -u -r1.1.1.12 -r1.1.1.12.16.4 --- linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c 26 May 2003 08:04:11 -0000 1.1.1.12 +++ linux25-b2_5_70+CS1_1314_IPSEC6_CLEANUP/net/ipv6/ipv6_syms.c 6 Jun 2003 17:38:20 -0000 1.1.1.12.16.4 @@ -35,6 +35,6 @@ EXPORT_SYMBOL(in6addr_any); EXPORT_SYMBOL(in6addr_loopback); EXPORT_SYMBOL(in6_dev_finish_destroy); -EXPORT_SYMBOL(ip6_found_nexthdr); +EXPORT_SYMBOL(ip6_find_1stfragopt); EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_clear_mutable_options); From shemminger@osdl.org Fri Jun 6 14:58:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 14:59:04 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56Lwr2x025321 for ; Fri, 6 Jun 2003 14:58:53 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h56LwZX11697; Fri, 6 Jun 2003 14:58:35 -0700 Date: Fri, 6 Jun 2003 14:58:35 -0700 From: Stephen Hemminger To: "David S. 
Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-Id: <20030606145835.3a263df8.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2939 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This is the first phase of a sequence of patches to resolve network device reference count issues exposed by the new sysfs interface. Phase I: introduces release_netdev which is the hook to allow later changes to hold onto the net device after the device has potentially unloaded. Includes patch for the easy to fix devices. Phase II: fixes devices that encapsulate network device structure inside their own structure, or allocate private data in a way that will break later. Phase III: changes release_netdev to handle the case of delayed freeing of the network device, and appropriate state checking. 
diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h Thu Jun 5 14:44:28 2003 +++ b/include/linux/netdevice.h Thu Jun 5 14:44:28 2003 @@ -491,6 +491,7 @@ extern int dev_queue_xmit(struct sk_buff *skb); extern int register_netdevice(struct net_device *dev); extern int unregister_netdevice(struct net_device *dev); +extern void release_netdev(struct net_device *dev); extern void synchronize_net(void); extern int register_netdevice_notifier(struct notifier_block *nb); extern int unregister_netdevice_notifier(struct notifier_block *nb); diff -Nru a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c Thu Jun 5 14:44:28 2003 +++ b/net/core/dev.c Thu Jun 5 14:44:28 2003 @@ -2768,6 +2768,21 @@ } } + +/** + * release_netdev - free network device + * @dev: device + * + * This function does the last stage of destroying an allocated device + * interface. Currently, it just frees the device. + * + */ + +void release_netdev(struct net_device *dev) +{ + kfree(dev); +} + /* Synchronize with packet receive processing. 
*/ void synchronize_net(void) { diff -Nru a/net/netsyms.c b/net/netsyms.c --- a/net/netsyms.c Thu Jun 5 14:44:28 2003 +++ b/net/netsyms.c Thu Jun 5 14:44:28 2003 @@ -558,6 +558,7 @@ EXPORT_SYMBOL(loopback_dev); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(unregister_netdevice); +EXPORT_SYMBOL(release_netdev); EXPORT_SYMBOL(synchronize_net); EXPORT_SYMBOL(netdev_state_change); EXPORT_SYMBOL(dev_new_index); From shemminger@osdl.org Fri Jun 6 16:07:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 06 Jun 2003 16:08:08 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h56N7s2x007585 for ; Fri, 6 Jun 2003 16:07:55 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h56N7dX32762; Fri, 6 Jun 2003 16:07:39 -0700 Date: Fri, 6 Jun 2003 16:07:39 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-Id: <20030606160739.0581bf39.shemminger@osdl.org> In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> References: <20030606145835.3a263df8.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h56N7s2x007585 X-archive-position: 2940 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Here is a patch to convert the "easy" drivers to use release_netdev, instead of directly freeing the 
net_device. They all compile but only e100 and e1000 have been tested with real hardware. diff -Nru a/drivers/net/3c59x.c b/drivers/net/3c59x.c --- a/drivers/net/3c59x.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/3c59x.c Thu Jun 5 15:51:50 2003 @@ -1021,7 +1021,7 @@ outw (TotalReset|0x14, ioaddr + EL3_CMD); release_region (ioaddr, VORTEX_TOTAL_SIZE); - kfree (dev); + release_netdev (dev); return 0; } #endif @@ -3072,7 +3072,7 @@ vp->rx_ring_dma); if (vp->must_free_region) release_region(dev->base_addr, vp->io_size); - kfree(dev); + release_netdev(dev); } diff -Nru a/drivers/net/8139cp.c b/drivers/net/8139cp.c --- a/drivers/net/8139cp.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/8139cp.c Thu Jun 5 15:51:50 2003 @@ -1969,7 +1969,7 @@ pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } #ifdef CONFIG_PM diff -Nru a/drivers/net/8139too.c b/drivers/net/8139too.c --- a/drivers/net/8139too.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/8139too.c Thu Jun 5 15:51:50 2003 @@ -721,7 +721,7 @@ sizeof (struct rtl8139_private)); #endif /* RTL8139_NDEBUG */ - kfree (dev); + release_netdev (dev); pci_set_drvdata (pdev, NULL); } diff -Nru a/drivers/net/a2065.c b/drivers/net/a2065.c --- a/drivers/net/a2065.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/a2065.c Thu Jun 5 15:51:50 2003 @@ -820,7 +820,7 @@ release_mem_region(ZTWO_PADDR(dev->base_addr), sizeof(struct lance_regs)); release_mem_region(ZTWO_PADDR(dev->mem_start), A2065_RAM_SIZE); - kfree(dev); + release_netdev(dev); root_a2065_dev = next; } #endif diff -Nru a/drivers/net/amd8111e.c b/drivers/net/amd8111e.c --- a/drivers/net/amd8111e.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/amd8111e.c Thu Jun 5 15:51:50 2003 @@ -1709,7 +1709,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct amd8111e_priv *)(dev->priv))->mmio); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru 
a/drivers/net/ariadne.c b/drivers/net/ariadne.c --- a/drivers/net/ariadne.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ariadne.c Thu Jun 5 15:51:50 2003 @@ -852,7 +852,7 @@ unregister_netdev(dev); release_mem_region(ZTWO_PADDR(dev->base_addr), sizeof(struct Am79C960)); release_mem_region(ZTWO_PADDR(dev->mem_start), ARIADNE_RAM_SIZE); - kfree(dev); + release_netdev(dev); root_ariadne_dev = next; } #endif diff -Nru a/drivers/net/ariadne2.c b/drivers/net/ariadne2.c --- a/drivers/net/ariadne2.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ariadne2.c Thu Jun 5 15:51:50 2003 @@ -413,7 +413,7 @@ unregister_netdev(dev); free_irq(IRQ_AMIGA_PORTS, dev); release_mem_region(ZTWO_PADDR(dev->base_addr), NE_IO_EXTENT*2); - kfree(dev); + release_netdev(dev); root_ariadne2_dev = next; } #endif diff -Nru a/drivers/net/arm/am79c961a.c b/drivers/net/arm/am79c961a.c --- a/drivers/net/arm/am79c961a.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/am79c961a.c Thu Jun 5 15:51:50 2003 @@ -677,7 +677,7 @@ release_region(dev->base_addr, 0x18); nodev: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } diff -Nru a/drivers/net/arm/ether1.c b/drivers/net/arm/ether1.c --- a/drivers/net/arm/ether1.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/ether1.c Thu Jun 5 15:51:50 2003 @@ -1058,7 +1058,7 @@ release_region(dev->base_addr, 16); release_region(dev->base_addr + 0x800, 4096); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } @@ -1073,7 +1073,7 @@ release_region(dev->base_addr, 16); release_region(dev->base_addr + 0x800, 4096); - kfree(dev); + release_netdev(dev); } static const struct ecard_id ether1_ids[] = { diff -Nru a/drivers/net/arm/ether3.c b/drivers/net/arm/ether3.c --- a/drivers/net/arm/ether3.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/arm/ether3.c Thu Jun 5 15:51:50 2003 @@ -899,7 +899,7 @@ release_region(dev->base_addr, 128); free: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); out: return ret; } @@ -912,7 +912,7 @@ 
unregister_netdev(dev); release_region(dev->base_addr, 128); - kfree(dev); + release_netdev(dev); } static const struct ecard_id ether3_ids[] = { diff -Nru a/drivers/net/au1000_eth.c b/drivers/net/au1000_eth.c --- a/drivers/net/au1000_eth.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/au1000_eth.c Thu Jun 5 15:51:50 2003 @@ -823,7 +823,7 @@ MAX_BUF_SIZE * (NUM_TX_BUFFS+NUM_RX_BUFFS)); printk(KERN_ERR "%s: au1000_probe1 failed. Returns %d\n", dev->name, retval); - kfree(dev); + release_netdev(dev); return retval; } diff -Nru a/drivers/net/b44.c b/drivers/net/b44.c --- a/drivers/net/b44.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/b44.c Thu Jun 5 15:51:50 2003 @@ -1830,7 +1830,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct b44 *)(dev->priv))->regs); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/bmac.c b/drivers/net/bmac.c --- a/drivers/net/bmac.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/bmac.c Thu Jun 5 15:51:50 2003 @@ -1452,7 +1452,7 @@ pmac_call_feature(PMAC_FTR_BMAC_ENABLE, bp->node, 0, 0); } unregister_netdev(dev); - kfree(dev); + release_netdev(dev); } static int bmac_open(struct net_device *dev) @@ -1710,7 +1710,7 @@ free_irq(bp->tx_dma_intr, dev); free_irq(bp->rx_dma_intr, dev); - kfree(dev); + release_netdev(dev); } while (bmac_devs != NULL); } diff -Nru a/drivers/net/declance.c b/drivers/net/declance.c --- a/drivers/net/declance.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/declance.c Thu Jun 5 15:51:50 2003 @@ -1203,7 +1203,7 @@ err_out: unregister_netdev(dev); - kfree(dev); + release_netdev(dev); return ret; } diff -Nru a/drivers/net/dl2k.c b/drivers/net/dl2k.c --- a/drivers/net/dl2k.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/dl2k.c Thu Jun 5 15:51:50 2003 @@ -1844,7 +1844,7 @@ #ifdef MEM_MAPPING iounmap ((char *) (dev->base_addr)); #endif - kfree (dev); + release_netdev (dev); pci_release_regions (pdev); pci_disable_device (pdev); } diff -Nru 
a/drivers/net/e100/e100_main.c b/drivers/net/e100/e100_main.c --- a/drivers/net/e100/e100_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/e100/e100_main.c Thu Jun 5 15:51:50 2003 @@ -716,7 +716,7 @@ e100_dealloc_space(bdp); err_dev: pci_set_drvdata(pcid, NULL); - kfree(dev); + release_netdev(dev); out: return rc; } @@ -738,7 +738,7 @@ e100_dealloc_space(bdp); pci_set_drvdata(bdp->pdev, NULL); - kfree(dev); + release_netdev(dev); } static void __devexit diff -Nru a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c --- a/drivers/net/e1000/e1000_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/e1000/e1000_main.c Thu Jun 5 15:51:50 2003 @@ -542,7 +542,7 @@ iounmap(adapter->hw.hw_addr); pci_release_regions(pdev); - kfree(netdev); + release_netdev(netdev); } /** diff -Nru a/drivers/net/eepro100.c b/drivers/net/eepro100.c --- a/drivers/net/eepro100.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/eepro100.c Thu Jun 5 15:51:50 2003 @@ -2364,7 +2364,7 @@ + sizeof(struct speedo_stats), sp->tx_ring, sp->tx_ring_dma); pci_disable_device(pdev); - kfree(dev); + release_netdev(dev); } static struct pci_device_id eepro100_pci_tbl[] __devinitdata = { diff -Nru a/drivers/net/epic100.c b/drivers/net/epic100.c --- a/drivers/net/epic100.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/epic100.c Thu Jun 5 15:51:50 2003 @@ -1482,7 +1482,7 @@ iounmap((void*) dev->base_addr); #endif pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); /* pci_power_off(pdev, -1); */ } diff -Nru a/drivers/net/fealnx.c b/drivers/net/fealnx.c --- a/drivers/net/fealnx.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/fealnx.c Thu Jun 5 15:51:50 2003 @@ -712,7 +712,7 @@ #ifndef USE_IO_OPS iounmap((void *)dev->base_addr); #endif - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); } else diff -Nru a/drivers/net/hamachi.c b/drivers/net/hamachi.c --- a/drivers/net/hamachi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/hamachi.c Thu Jun 5 
15:51:50 2003 @@ -1976,7 +1976,7 @@ hmp->tx_ring_dma); unregister_netdev(dev); iounmap((char *)dev->base_addr); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/hydra.c b/drivers/net/hydra.c --- a/drivers/net/hydra.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/hydra.c Thu Jun 5 15:51:50 2003 @@ -243,7 +243,7 @@ unregister_netdev(dev); free_irq(IRQ_AMIGA_PORTS, dev); release_mem_region(ZTWO_PADDR(dev->base_addr)-HYDRA_NIC_BASE, 0x10000); - kfree(dev); + release_netdev(dev); root_hydra_dev = next; } #endif diff -Nru a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c --- a/drivers/net/ixgb/ixgb_main.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ixgb/ixgb_main.c Thu Jun 5 15:51:50 2003 @@ -478,7 +478,7 @@ iounmap((void *) adapter->hw.hw_addr); pci_release_regions(pdev); - kfree(netdev); + release_netdev(netdev); } /** diff -Nru a/drivers/net/mace.c b/drivers/net/mace.c --- a/drivers/net/mace.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/mace.c Thu Jun 5 15:51:50 2003 @@ -254,7 +254,7 @@ release_OF_resource(mp->of_node, 1); release_OF_resource(mp->of_node, 2); } - kfree(dev); + release_netdev(dev); } static void dbdma_reset(volatile struct dbdma_regs *dma) @@ -976,7 +976,7 @@ release_OF_resource(mp->of_node, 1); release_OF_resource(mp->of_node, 2); - kfree(dev); + release_netdev(dev); } if (dummy_buf != NULL) { kfree(dummy_buf); diff -Nru a/drivers/net/myri_sbus.c b/drivers/net/myri_sbus.c --- a/drivers/net/myri_sbus.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/myri_sbus.c Thu Jun 5 15:51:50 2003 @@ -1090,7 +1090,7 @@ return 0; err: unregister_netdev(dev); /* This will also free the co-allocated 'dev->priv' */ - kfree(dev); + release_netdev(dev); return -ENODEV; } diff -Nru a/drivers/net/natsemi.c b/drivers/net/natsemi.c --- a/drivers/net/natsemi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/natsemi.c Thu Jun 5 15:51:50 2003 @@ -838,7 +838,7 @@ if (i) { pci_release_regions(pdev); 
unregister_netdev(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); return i; } diff -Nru a/drivers/net/ne2k-pci.c b/drivers/net/ne2k-pci.c --- a/drivers/net/ne2k-pci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/ne2k-pci.c Thu Jun 5 15:51:50 2003 @@ -635,7 +635,7 @@ unregister_netdev(dev); release_region(dev->base_addr, NE_IO_EXTENT); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/pci-skeleton.c b/drivers/net/pci-skeleton.c --- a/drivers/net/pci-skeleton.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pci-skeleton.c Thu Jun 5 15:51:50 2003 @@ -871,7 +871,7 @@ sizeof (struct netdrv_private)); #endif /* NETDRV_NDEBUG */ - kfree (dev); + release_netdev (dev); pci_set_drvdata (pdev, NULL); diff -Nru a/drivers/net/pcmcia/ibmtr_cs.c b/drivers/net/pcmcia/ibmtr_cs.c --- a/drivers/net/pcmcia/ibmtr_cs.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pcmcia/ibmtr_cs.c Thu Jun 5 15:51:50 2003 @@ -310,7 +310,7 @@ /* Unlink device structure, free bits */ *linkp = link->next; unregister_netdev(dev); - kfree(dev); + release_netdev(dev); } /* ibmtr_detach */ /*====================================================================== diff -Nru a/drivers/net/pcnet32.c b/drivers/net/pcnet32.c --- a/drivers/net/pcnet32.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/pcnet32.c Thu Jun 5 15:51:50 2003 @@ -1762,7 +1762,7 @@ if (lp->pci_dev) pci_unregister_driver(&pcnet32_driver); pci_free_consistent(lp->pci_dev, sizeof(*lp), lp, lp->dma_addr); - kfree(pcnet32_dev); + release_netdev(pcnet32_dev); pcnet32_dev = next_dev; } } diff -Nru a/drivers/net/r8169.c b/drivers/net/r8169.c --- a/drivers/net/r8169.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/r8169.c Thu Jun 5 15:51:50 2003 @@ -646,7 +646,7 @@ sizeof (struct net_device) + sizeof (struct rtl8169_private)); pci_disable_device(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/rrunner.c b/drivers/net/rrunner.c --- a/drivers/net/rrunner.c Thu Jun 
5 15:51:50 2003 +++ b/drivers/net/rrunner.c Thu Jun 5 15:51:50 2003 @@ -253,7 +253,7 @@ rr->tx_ring_dma); unregister_netdev(dev); iounmap(rr->regs); - kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/sis900.c b/drivers/net/sis900.c --- a/drivers/net/sis900.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sis900.c Thu Jun 5 15:51:50 2003 @@ -493,7 +493,7 @@ pci_set_drvdata(pci_dev, NULL); pci_release_regions(pci_dev); err_out: - kfree(net_dev); + release_netdev(net_dev); return ret; } @@ -2189,7 +2189,7 @@ pci_free_consistent(pci_dev, TX_TOTAL_SIZE, sis_priv->tx_ring, sis_priv->tx_ring_dma); unregister_netdev(net_dev); - kfree(net_dev); + release_netdev(net_dev); pci_release_regions(pci_dev); pci_set_drvdata(pci_dev, NULL); } diff -Nru a/drivers/net/skfp/skfddi.c b/drivers/net/skfp/skfddi.c --- a/drivers/net/skfp/skfddi.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/skfp/skfddi.c Thu Jun 5 15:51:50 2003 @@ -2633,7 +2633,7 @@ } unregister_netdev(p); printk("%s: unloaded\n", p->name); - kfree(p); /* Free the device structure */ + release_netdev(p); /* Free the device structure */ return next; } // unlink_modules diff -Nru a/drivers/net/starfire.c b/drivers/net/starfire.c --- a/drivers/net/starfire.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/starfire.c Thu Jun 5 15:51:50 2003 @@ -2196,7 +2196,7 @@ pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); /* Will also free np!! */ + release_netdev(dev); /* Will also free np!! 
*/ } diff -Nru a/drivers/net/sunbmac.c b/drivers/net/sunbmac.c --- a/drivers/net/sunbmac.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sunbmac.c Thu Jun 5 15:51:50 2003 @@ -1209,7 +1209,7 @@ unregister_netdev(dev); /* This also frees the co-located 'dev->priv' */ - kfree(dev); + release_netdev(dev); return -ENODEV; } diff -Nru a/drivers/net/sundance.c b/drivers/net/sundance.c --- a/drivers/net/sundance.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sundance.c Thu Jun 5 15:51:50 2003 @@ -730,7 +730,7 @@ #endif pci_release_regions(pdev); err_out_netdev: - kfree (dev); + release_netdev(dev); return -ENODEV; } @@ -1784,7 +1784,7 @@ #ifndef USE_IO_OPS iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } } diff -Nru a/drivers/net/sungem.c b/drivers/net/sungem.c --- a/drivers/net/sungem.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sungem.c Thu Jun 5 15:51:50 2003 @@ -2885,7 +2885,7 @@ gp->gblock_dvma); iounmap((void *) gp->regs); pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/sunhme.c b/drivers/net/sunhme.c --- a/drivers/net/sunhme.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/sunhme.c Thu Jun 5 15:51:50 2003 @@ -3351,7 +3351,7 @@ pci_release_regions(hp->happy_dev); } #endif - kfree(dev); + release_netdev(dev); root_happy_dev = next; } diff -Nru a/drivers/net/tc35815.c b/drivers/net/tc35815.c --- a/drivers/net/tc35815.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tc35815.c Thu Jun 5 15:51:50 2003 @@ -1762,7 +1762,7 @@ next_dev = ((struct tc35815_local *)dev->priv)->next_module; iounmap((void *)(dev->base_addr)); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); root_tc35815_dev = next_dev; } } diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c --- a/drivers/net/tg3.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tg3.c Thu Jun 5 15:51:50 2003 @@ -6942,7 +6942,7 @@ if (dev) { unregister_netdev(dev); iounmap((void *) ((struct tg3 *)(dev->priv))->regs); 
- kfree(dev); + release_netdev(dev); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/tlan.c b/drivers/net/tlan.c --- a/drivers/net/tlan.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tlan.c Thu Jun 5 15:51:50 2003 @@ -447,7 +447,7 @@ pci_release_regions(pdev); - kfree( dev ); + release_netdev( dev ); pci_set_drvdata( pdev, NULL ); } @@ -695,7 +695,7 @@ release_region( dev->base_addr, 0x10); unregister_netdev( dev ); TLan_Eisa_Devices = priv->nextDevice; - kfree( dev ); + release_netdev( dev ); tlan_have_eisa--; } } diff -Nru a/drivers/net/tokenring/abyss.c b/drivers/net/tokenring/abyss.c --- a/drivers/net/tokenring/abyss.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/abyss.c Thu Jun 5 15:51:50 2003 @@ -443,7 +443,7 @@ release_region(dev->base_addr-0x10, ABYSS_IO_EXTENT); free_irq(dev->irq, dev); tmsdev_term(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tokenring/lanstreamer.c b/drivers/net/tokenring/lanstreamer.c --- a/drivers/net/tokenring/lanstreamer.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/lanstreamer.c Thu Jun 5 15:51:50 2003 @@ -433,7 +433,7 @@ /* shouldn't we do iounmap here? 
*/ release_region(pci_resource_start(pdev, 0), pci_resource_len(pdev,0)); release_mem_region(pci_resource_start(pdev, 1), pci_resource_len(pdev,1)); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tokenring/olympic.c b/drivers/net/tokenring/olympic.c --- a/drivers/net/tokenring/olympic.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/olympic.c Thu Jun 5 15:51:50 2003 @@ -1778,7 +1778,7 @@ iounmap(olympic_priv->olympic_lap) ; pci_release_regions(pdev) ; pci_set_drvdata(pdev,NULL) ; - kfree(dev) ; + release_netdev(dev) ; } static struct pci_driver olympic_driver = { diff -Nru a/drivers/net/tokenring/smctr.c b/drivers/net/tokenring/smctr.c --- a/drivers/net/tokenring/smctr.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/smctr.c Thu Jun 5 15:51:50 2003 @@ -5730,7 +5730,7 @@ if (dev) { unregister_netdev(dev); cleanup_card(dev); - kfree(dev); + release_netdev(dev); } } } diff -Nru a/drivers/net/tokenring/tmspci.c b/drivers/net/tokenring/tmspci.c --- a/drivers/net/tokenring/tmspci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tokenring/tmspci.c Thu Jun 5 15:51:50 2003 @@ -229,7 +229,7 @@ release_region(dev->base_addr, TMS_PCI_IO_EXTENT); free_irq(dev->irq, dev); tmsdev_term(dev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c --- a/drivers/net/tulip/de2104x.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/de2104x.c Thu Jun 5 15:51:50 2003 @@ -2153,7 +2153,7 @@ pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } #ifdef CONFIG_PM diff -Nru a/drivers/net/tulip/dmfe.c b/drivers/net/tulip/dmfe.c --- a/drivers/net/tulip/dmfe.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/dmfe.c Thu Jun 5 15:51:50 2003 @@ -478,7 +478,7 @@ db->buf_pool_ptr, db->buf_pool_dma_ptr); unregister_netdev(dev); pci_release_regions(pdev); - kfree(dev); /* free board information */ + 
release_netdev(dev); /* free board information */ pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c --- a/drivers/net/tulip/tulip_core.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/tulip_core.c Thu Jun 5 15:51:50 2003 @@ -1767,7 +1767,7 @@ #ifndef USE_IO_OPS iounmap((void *)dev->base_addr); #endif - kfree (dev); + release_netdev (dev); pci_release_regions (pdev); pci_set_drvdata (pdev, NULL); diff -Nru a/drivers/net/tulip/winbond-840.c b/drivers/net/tulip/winbond-840.c --- a/drivers/net/tulip/winbond-840.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/winbond-840.c Thu Jun 5 15:51:50 2003 @@ -1623,7 +1623,7 @@ #ifndef USE_IO_OPS iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); } pci_set_drvdata(pdev, NULL); diff -Nru a/drivers/net/tulip/xircom_cb.c b/drivers/net/tulip/xircom_cb.c --- a/drivers/net/tulip/xircom_cb.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/xircom_cb.c Thu Jun 5 15:51:50 2003 @@ -338,7 +338,7 @@ } release_region(dev->base_addr, 128); unregister_netdev(dev); - kfree(dev); + release_netdev(dev); leave("xircom_remove"); } diff -Nru a/drivers/net/tulip/xircom_tulip_cb.c b/drivers/net/tulip/xircom_tulip_cb.c --- a/drivers/net/tulip/xircom_tulip_cb.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/tulip/xircom_tulip_cb.c Thu Jun 5 15:51:50 2003 @@ -645,11 +645,11 @@ return 0; err_out_cleardev: + unregister_netdev(dev); pci_set_drvdata(pdev, NULL); pci_release_regions(pdev); err_out_free_netdev: - unregister_netdev(dev); - kfree(dev); + release_netdev(dev); return -ENODEV; } @@ -1702,7 +1702,7 @@ printk(KERN_INFO "xircom_remove_one(%s)\n", dev->name); unregister_netdev(dev); pci_release_regions(pdev); - kfree(dev); + release_netdev(dev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/typhoon.c b/drivers/net/typhoon.c --- a/drivers/net/typhoon.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/typhoon.c Thu Jun 5 15:51:50 2003 @@ -2476,7 +2476,7 @@ 
pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); } static struct pci_driver typhoon_driver = { diff -Nru a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c --- a/drivers/net/via-rhine.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/via-rhine.c Thu Jun 5 15:51:50 2003 @@ -1872,7 +1872,7 @@ iounmap((char *)(dev->base_addr)); #endif - kfree(dev); + release_netdev(dev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); } diff -Nru a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c --- a/drivers/net/wireless/airo.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/airo.c Thu Jun 5 15:51:50 2003 @@ -1573,7 +1573,7 @@ release_region( dev->base_addr, 64 ); } del_airo_dev( dev ); - kfree( dev ); + release_netdev( dev ); } EXPORT_SYMBOL(stop_airo_card); diff -Nru a/drivers/net/wireless/orinoco_cs.c b/drivers/net/wireless/orinoco_cs.c --- a/drivers/net/wireless/orinoco_cs.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_cs.c Thu Jun 5 15:51:50 2003 @@ -290,8 +290,9 @@ DEBUG(0, "orinoco_cs: About to unregister net device %p\n", dev); unregister_netdev(dev); - } - kfree(dev); + release_netdev(dev); + } else + kfree(dev); } /* orinoco_cs_detach */ /* diff -Nru a/drivers/net/wireless/orinoco_pci.c b/drivers/net/wireless/orinoco_pci.c --- a/drivers/net/wireless/orinoco_pci.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_pci.c Thu Jun 5 15:51:50 2003 @@ -289,7 +289,7 @@ iounmap((unsigned char *) priv->hw.iobase); pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); pci_disable_device(pdev); } diff -Nru a/drivers/net/wireless/orinoco_tmd.c b/drivers/net/wireless/orinoco_tmd.c --- a/drivers/net/wireless/orinoco_tmd.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/wireless/orinoco_tmd.c Thu Jun 5 15:51:50 2003 @@ -182,7 +182,7 @@ pci_set_drvdata(pdev, NULL); - kfree(dev); + release_netdev(dev); release_region(pci_resource_start(pdev, 2), pci_resource_len(pdev, 2)); diff 
-Nru a/drivers/net/yellowfin.c b/drivers/net/yellowfin.c --- a/drivers/net/yellowfin.c Thu Jun 5 15:51:50 2003 +++ b/drivers/net/yellowfin.c Thu Jun 5 15:51:50 2003 @@ -1486,7 +1486,7 @@ iounmap ((void *) dev->base_addr); #endif - kfree (dev); + release_netdev (dev); pci_set_drvdata(pdev, NULL); } From davem@redhat.com Sat Jun 7 02:08:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 02:08:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h579872x023757 for ; Sat, 7 Jun 2003 02:08:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA07413; Sat, 7 Jun 2003 02:05:28 -0700 Date: Sat, 07 Jun 2003 02:05:28 -0700 (PDT) Message-Id: <20030607.020528.68152135.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com, viro@parcelfarce.linux.theplanet.co.uk Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> References: <20030606145835.3a263df8.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2941 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Fri, 6 Jun 2003 14:58:35 -0700 Phase I: introduces release_netdev which is the hook to allow later changes to hold onto the net device after the device has potentially unloaded. Includes patch for the easy to fix devices. Besides naming (thought this was going to be named netdev_drop) I have no problems. Al? 
From davem@redhat.com Sat Jun 7 02:25:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 02:25:26 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h579PM2x025507 for ; Sat, 7 Jun 2003 02:25:22 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA07492; Sat, 7 Jun 2003 02:22:41 -0700 Date: Sat, 07 Jun 2003 02:22:41 -0700 (PDT) Message-Id: <20030607.022241.98862720.davem@redhat.com> To: mk@linux-ipv6.org Cc: netdev@oss.sgi.com, usagi@linux-ipv6.org Subject: Re: [PATCH] fix esp6 extension headers handling From: "David S. Miller" In-Reply-To: <87wufzxe8p.wl@karaba.org> References: <3EDF3EB4.8010105@tml.hut.fi> <873cioqxch.wl@karaba.org> <87wufzxe8p.wl@karaba.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2942 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Mitsuru KANDA / 神田 充
   Date: Sat, 07 Jun 2003 03:17:10 +0900

   The attached diff fixes the esp6 extension header handling bug
   that was reported by Henrik.
   I introduced ip6_find_1stfragopt() instead of get_offset().
   # ip6_found_nexthdr() is just renamed to ip6_find_1stfragopt()
   # in order to represent the correct functionality.

Patch applied, thank you.
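[Editorial sketch, not part of the applied patch.] For readers wondering what a helper with the interface of ip6_find_1stfragopt() has to do, here is a simplified userspace version that works on a plain byte buffer instead of an sk_buff. All sketch_ names are invented for illustration. The header type numbers (0, 43, 60) and the "(hdrlen + 1) * 8" length rule come from the IPv6 specification; the walking logic is an assumption about the general approach and should not be read as the kernel's exact code.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	SKETCH_NEXTHDR_HOP     = 0,   /* hop-by-hop options header */
	SKETCH_NEXTHDR_ROUTING = 43,  /* routing header */
	SKETCH_NEXTHDR_DEST    = 60,  /* destination options header */
	SKETCH_IPV6_HDR_LEN    = 40,  /* size of the fixed IPv6 header */
	SKETCH_NEXTHDR_OFFSET  = 6    /* next-header byte inside the fixed header */
};

/*
 * Walk the leading extension headers of the IPv6 packet in pkt[0..len).
 * Returns the length of the part a new header would be inserted after,
 * and sets *prevhdr to the next-header byte naming what follows that
 * part, so the caller can overwrite it with its own protocol number.
 */
static size_t sketch_find_1stfragopt(uint8_t *pkt, size_t len, uint8_t **prevhdr)
{
	size_t offset = SKETCH_IPV6_HDR_LEN;
	int seen_routing = 0;

	if (len < SKETCH_IPV6_HDR_LEN) {
		*prevhdr = NULL;
		return 0;
	}
	*prevhdr = &pkt[SKETCH_NEXTHDR_OFFSET];

	/* Need two readable bytes: next-header and header length. */
	while (offset + 2 <= len) {
		uint8_t type = **prevhdr;

		if (type != SKETCH_NEXTHDR_HOP &&
		    type != SKETCH_NEXTHDR_ROUTING &&
		    type != SKETCH_NEXTHDR_DEST)
			break;
		if (type == SKETCH_NEXTHDR_ROUTING)
			seen_routing = 1;
		else if (type == SKETCH_NEXTHDR_DEST && seen_routing)
			break;  /* dest options after routing stay with the payload */

		*prevhdr = &pkt[offset];  /* byte 0 of an extension header */
		offset += ((size_t)pkt[offset + 1] + 1) * 8;
	}
	return offset;
}
```

This mirrors how the callers in the diff use the real function: take the returned length as hdr_len, read the old protocol through *prevhdr, then store the new one (for example IPPROTO_COMP) through the same pointer.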
From davem@redhat.com Sat Jun 7 03:35:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 03:35:27 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57AZD2x027804 for ; Sat, 7 Jun 2003 03:35:14 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA07711; Sat, 7 Jun 2003 03:30:59 -0700 Date: Sat, 07 Jun 2003 03:30:59 -0700 (PDT) Message-Id: <20030607.033059.48393210.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 From: "David S. Miller" In-Reply-To: References: <20030603.213458.112594590.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2943 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: Ville Nuorvala
   Date: Wed, 4 Jun 2003 15:40:02 +0300 (EEST)

   The revised version is attached to this mail,

Looks ok, but sorry two things need to be fixed up first:

1) Doesn't apply anymore, I think it's because of the struct sock
   member renames, just replace sk->foo with sk->sk_foo

2) Just export all those routines from net/ipv6/ipv6_syms.c always,
   remove the ifdefs.

I promise to apply it after you fix this stuff up :)))

Thank you.
From yoshfuji@linux-ipv6.org Sat Jun 7 03:41:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 03:41:12 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57Af72x028217 for ; Sat, 7 Jun 2003 03:41:07 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h57AfpBo027249; Sat, 7 Jun 2003 19:41:51 +0900 Date: Sat, 07 Jun 2003 19:41:51 +0900 (JST) Message-Id: <20030607.194151.112395246.yoshfuji@linux-ipv6.org> To: davem@redhat.com, vnuorval@tcs.hut.fi Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: [patch]: ipv6 tunnel for MIPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030607.033059.48393210.davem@redhat.com> References: <20030603.213458.112594590.davem@redhat.com> <20030607.033059.48393210.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2944 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030607.033059.48393210.davem@redhat.com> (at Sat, 07 Jun 2003 03:30:59 -0700 (PDT)), "David S. Miller" says: > I promise to apply it after you fix this stuff up :))) Please be sure not to include "for MIPv6" from the changeset. 
:-) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From shemminger@osdl.org Sat Jun 7 08:25:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 08:25:25 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57FPJ2x002720 for ; Sat, 7 Jun 2003 08:25:20 -0700 Received: from mylinux.hemminger.net (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h57FP3X20042; Sat, 7 Jun 2003 08:25:03 -0700 Date: Sat, 7 Jun 2003 08:25:15 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: jgarzik@pobox.com, netdev@oss.sgi.com, viro@parcelfarce.linux.theplanet.co.uk Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-Id: <20030607082515.6168be46.shemminger@osdl.org> In-Reply-To: <20030607.020528.68152135.davem@redhat.com> References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607.020528.68152135.davem@redhat.com> Organization: OSDL X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2945 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Sat, 07 Jun 2003 02:05:28 -0700 (PDT) "David S. Miller" wrote: > From: Stephen Hemminger > Date: Fri, 6 Jun 2003 14:58:35 -0700 > > Phase I: introduces release_netdev which is the hook to allow later > changes to hold onto the net device after the device has potentially > unloaded. Includes patch for the easy to fix devices. > > Besides naming (thought this was going to be named netdev_drop) > I have no problems. > > Al? 
My (admittedly weak) rationale for this was:
  - it seemed more like part of the register/unregister process
    and those functions are named {un}register_netdevice
  - RTNL should not be held, same as unregister_netdev (vs
    unregister_netdevice which requires it).
  - release rather than drop because release is used as name in
    the kobject callback hook

But it's easy to change now.

From bunk@fs.tum.de Sat Jun 7 12:12:44 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:12:54 -0700 (PDT)
Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JCg2x011120 for ; Sat, 7 Jun 2003 12:12:43 -0700
Received: (qmail 5044 invoked from network); 7 Jun 2003 19:12:37 -0000
Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 7 Jun 2003 19:12:37 -0000
Date: Sat, 7 Jun 2003 21:12:35 +0200
From: Adrian Bunk
To: Jon Grimm
Cc: Margit Schubert-While , lksctp-developers@lists.sourceforge.net, linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: [Lksctp-developers] Re: SCTP config 2.5.70(-bk)
Message-ID: <20030607191235.GE13377@fs.tum.de>
References: <5.1.0.14.2.20030602094232.00aeda18@pop.t-online.de> <20030603130308.GC27168@fs.tum.de> <3EDD0DFC.4080806@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3EDD0DFC.4080806@us.ibm.com>
User-Agent: Mutt/1.4.1i
X-archive-position: 2946
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: bunk@fs.tum.de
Precedence: bulk
X-list: netdev

On Tue, Jun 03, 2003 at 04:07:08PM -0500, Jon Grimm wrote:
> Hi Adrian,

Hi Jon,

> Sorry for a bit of delay... We are away at an SCTP Interoperability
> event.

the delay before my answer was bigger...

> Adrian Bunk wrote:
> >On Mon, Jun 02, 2003 at 09:53:04AM +0200, Margit Schubert-While wrote:
> >
> >
> >>CONFIG_IPV6_SCTP__ is always being set to "y" even though
> >>not selected (CONFIG_IPV6 not set)
> >
> >
> >First, this doesn't do any harm since CONFIG_IPV6_SCTP__ alone doesn't
> >result in anything getting compiled.
> >
> >But besides, it seems a bit broken.
> >
> >From net/sctp/Kconfig:
> >
> ><-- snip -->
> >
> >...
> >
> >config IPV6_SCTP__
> >	tristate
> >	default y if IPV6=n
> >	default IPV6 if IPV6
> >
> >config IP_SCTP
> >	tristate "The SCTP Protocol (EXPERIMENTAL)"
> >	depends on IPV6_SCTP__
> >...
> >
> ><-- snip -->
> >
> >
> >Semantically equivalent is the following for IPV6_SCTP__:
> >
> >config IPV6_SCTP__
> >	tristate
> >	default y if IPV6=n || IPV6=y
> >	default m if IPV6=m
> >
> >
> >If it was intended to disallow a static IP_SCTP with a modular IPV6 it
> >doesn't work: It's perfectly allowed to set IPV6=n and IP_SCTP=y and
> >later compile and install a modular IPV6 for the same kernel.
> >
>
> Are you sure? I vaguely remember one of the network structs having
> #ifdef'd fields for v6. Consequently, if one compiles first without,
> but then later compiles/loads ipv6... bad things happen as the
> kernel has a different concept of what the sock is.

after reading this at net/Kconfig:

<-- snip -->

...
# IPv6 as module will cause a CRASH if you try to unload it
config IPV6
	tristate "The IPv6 protocol (EXPERIMENTAL)"
...

<-- snip -->

I'm wondering whether it might be an idea to disallow the modular
building of IPv6 support?

> >Could someone from the SCTP developers comment on the intentions behind
> >IPV6_SCTP__ ?
> >
>
> Yes. The intent was to at least discourage a configuration that will
> segfault.

It's currently discouraged but not completely impossible to select...

> Thanks,
> jon

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From garzik@gtf.org Sat Jun 7 12:15:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:15:27 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JFN2x011431 for ; Sat, 7 Jun 2003 12:15:23 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 595FF6611; Sat, 7 Jun 2003 15:15:22 -0400 (EDT) Date: Sat, 7 Jun 2003 15:15:22 -0400 From: Jeff Garzik To: Stephen Hemminger Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030607191522.GB3346@gtf.org> References: <20030606145835.3a263df8.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> User-Agent: Mutt/1.3.28i X-archive-position: 2947 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Fri, Jun 06, 2003 at 02:58:35PM -0700, Stephen Hemminger wrote: > This is the first phase of a sequence of patches to resolve network > device reference count issues exposed by the new sysfs interface. > > Phase I: introduces release_netdev which is the hook to allow later > changes to hold onto the net device after the device has potentially > unloaded. Includes patch for the easy to fix devices. > > Phase II: fixes devices that encapsulate network device structure > inside their own structure, or allocate private data in a way > that will break later. I would prefer to fix the drivers _before_ anything else. i.e. Phase 2 becomes Phase 1. These often need to be merged into 2.4 as well, and they can be applied to all drivers without any API changes. 
The changes are separated out from any refcounting/sysfs stuff, and can (potentially) be considered and reviewed by the respective maintainers. Jeff From garzik@gtf.org Sat Jun 7 12:16:29 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 12:16:32 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h57JGS2x011749 for ; Sat, 7 Jun 2003 12:16:29 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 984AA6611; Sat, 7 Jun 2003 15:16:28 -0400 (EDT) Date: Sat, 7 Jun 2003 15:16:28 -0400 From: Jeff Garzik To: Stephen Hemminger Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030607191628.GC3346@gtf.org> References: <20030606145835.3a263df8.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030606145835.3a263df8.shemminger@osdl.org> User-Agent: Mutt/1.3.28i X-archive-position: 2948 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev IOW, release_netdev is basically a search-n-replace change that can be done to drivers anytime. Let's apply the "meat" changes to mainline first, the bug fixes / cleanups to use dynamic alloc. 
Jeff From ryan@michonline.com Sat Jun 7 18:49:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 07 Jun 2003 18:49:41 -0700 (PDT) Received: from michonline.com (mail@pcp01184054pcs.strl301.mi.comcast.net [68.60.186.73]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h581nU2x022061 for ; Sat, 7 Jun 2003 18:49:30 -0700 Received: from mythical ([10.37.3.11] ident=mail) by michonline.com with esmtp (Exim 3.36 #1 (Debian)) id 19OpJG-0007g7-00; Sat, 07 Jun 2003 21:49:26 -0400 Received: from ryan by mythical with local (Exim 3.36 #1 (Debian)) id 19OpOl-0004yl-00; Sat, 07 Jun 2003 21:55:07 -0400 Date: Sat, 7 Jun 2003 21:55:07 -0400 From: Ryan Anderson To: linux-kernel@vger.kernel.org, Linus Torvalds , "David S. Miller" , kernel-janitor-discuss@lists.sourceforge.net Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Remove K&R prototypes in ppp_deflate.c Message-ID: <20030608015507.GA19133@michonline.com> Mail-Followup-To: linux-kernel@vger.kernel.org, Linus Torvalds , "David S. Miller" , kernel-janitor-discuss@lists.sourceforge.net, netdev@oss.sgi.com References: <20030608003916.GF20872@michonline.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608003916.GF20872@michonline.com> User-Agent: Mutt/1.5.4i X-archive-position: 2949 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ryan@michonline.com Precedence: bulk X-list: netdev I forgot to cc: netdev at first, sorry! On Sat, Jun 07, 2003 at 08:39:16PM -0400, Ryan Anderson wrote: > This patch removes the K&R initializers in ppp_deflate.c in favor of > more modern constructions. > > Once the other zlib cleanups appear to be stabilized, I'll look at > moving those cleanups into ppp_deflate.c as well. > > Dave, I think I sent this to you already once, if it's in your queue > already, please ignore this resend. 
>
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Linux kernel tree
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> #	           ChangeSet	1.1259  -> 1.1260
> #	drivers/net/ppp_deflate.c	1.10    -> 1.11
> #
> # The following is the BitKeeper ChangeSet Log
> # --------------------------------------------
> # 03/06/02	ryan@mythryan2.(none)	1.1260
> # Remove the use of K&R prototypes from ppp_deflate.c
> # --------------------------------------------
> #
> diff -Nru a/drivers/net/ppp_deflate.c b/drivers/net/ppp_deflate.c
> --- a/drivers/net/ppp_deflate.c	Mon Jun  2 09:39:01 2003
> +++ b/drivers/net/ppp_deflate.c	Mon Jun  2 09:39:01 2003
> @@ -78,8 +78,7 @@
>  static void	z_comp_stats __P((void *state, struct compstat *stats));
>  
>  static void
> -z_comp_free(arg)
> -	void *arg;
> +z_comp_free(void *arg)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -95,9 +94,7 @@
>   * Allocate space for a compressor.
>   */
>  static void *
> -z_comp_alloc(options, opt_len)
> -	unsigned char *options;
> -	int opt_len;
> +z_comp_alloc(unsigned char *options, int opt_len)
>  {
>  	struct ppp_deflate_state *state;
>  	int w_size;
> @@ -136,10 +133,8 @@
>  }
>  
>  static int
> -z_comp_init(arg, options, opt_len, unit, hdrlen, debug)
> -	void *arg;
> -	unsigned char *options;
> -	int opt_len, unit, hdrlen, debug;
> +z_comp_init(void *arg, unsigned char *options,
> +	    int opt_len, int unit, int hdrlen, int debug)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -161,8 +156,7 @@
>  }
>  
>  static void
> -z_comp_reset(arg)
> -	void *arg;
> +z_comp_reset(void *arg)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -171,11 +165,9 @@
>  }
>  
>  int
> -z_compress(arg, rptr, obuf, isize, osize)
> -	void *arg;
> -	unsigned char *rptr;	/* uncompressed packet (in) */
> -	unsigned char *obuf;	/* compressed packet (out) */
> -	int isize, osize;
> +z_compress(void *arg, unsigned char *rptr, /* uncompressed packet (in) */
> +	   unsigned char *obuf, /* compressed packet (out) */
> +	   int isize, int osize)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  	int r, proto, off, olen, oavail;
> @@ -252,9 +244,7 @@
>  }
>  
>  static void
> -z_comp_stats(arg, stats)
> -	void *arg;
> -	struct compstat *stats;
> +z_comp_stats(void *arg, struct compstat *stats)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -262,8 +252,7 @@
>  }
>  
>  static void
> -z_decomp_free(arg)
> -	void *arg;
> +z_decomp_free(void *arg)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -279,9 +268,7 @@
>   * Allocate space for a decompressor.
>   */
>  static void *
> -z_decomp_alloc(options, opt_len)
> -	unsigned char *options;
> -	int opt_len;
> +z_decomp_alloc(unsigned char *options, int opt_len)
>  {
>  	struct ppp_deflate_state *state;
>  	int w_size;
> @@ -318,10 +305,8 @@
>  }
>  
>  static int
> -z_decomp_init(arg, options, opt_len, unit, hdrlen, mru, debug)
> -	void *arg;
> -	unsigned char *options;
> -	int opt_len, unit, hdrlen, mru, debug;
> +z_decomp_init(void *arg, unsigned char *options,
> +	      int opt_len, int unit, int hdrlen, int mru, int debug)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -344,8 +329,7 @@
>  }
>  
>  static void
> -z_decomp_reset(arg)
> -	void *arg;
> +z_decomp_reset(void *arg)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  
> @@ -370,12 +354,8 @@
>   * compression, even though they are detected by inspecting the input.
>   */
>  int
> -z_decompress(arg, ibuf, isize, obuf, osize)
> -	void *arg;
> -	unsigned char *ibuf;
> -	int isize;
> -	unsigned char *obuf;
> -	int osize;
> +z_decompress(void *arg, unsigned char *ibuf, int isize,
> +	     unsigned char *obuf, int osize)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  	int olen, seq, r;
> @@ -478,10 +458,7 @@
>   * Incompressible data has arrived - add it to the history.
>   */
>  static void
> -z_incomp(arg, ibuf, icnt)
> -	void *arg;
> -	unsigned char *ibuf;
> -	int icnt;
> +z_incomp(void *arg, unsigned char *ibuf, int icnt)
>  {
>  	struct ppp_deflate_state *state = (struct ppp_deflate_state *) arg;
>  	int proto, r;
>
>
> -- 
>
> Ryan Anderson
>   sometimes Pug Majere
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-- 

Ryan Anderson
  sometimes Pug Majere

From davem@redhat.com Sun Jun 8 00:01:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 00:01:24 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5871F2x026737 for ; Sun, 8 Jun 2003 00:01:15 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA08997; Sat, 7 Jun 2003 23:58:26 -0700
Date: Sat, 07 Jun 2003 23:58:25 -0700 (PDT)
Message-Id: <20030607.235825.71096085.davem@redhat.com>
To: jgarzik@pobox.com
Cc: shemminger@osdl.org, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup
From: "David S. Miller"
In-Reply-To: <20030607191522.GB3346@gtf.org>
References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607191522.GB3346@gtf.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2950
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jeff Garzik
   Date: Sat, 7 Jun 2003 15:15:22 -0400

   I would prefer to fix the drivers _before_ anything else.  i.e. Phase 2
   becomes Phase 1.
These often need to be merged into 2.4 as well, and they can be applied to all drivers without any API changes. The changes are separated out from any refcounting/sysfs stuff, and can (potentially) be considered and reviewed by the respective maintainers. Have you extracted out all the init_etherdev() killings Al and myself did so you can backport them to 2.4.x too? If you're not going to do that, there is not much point in trying to sync other such things back to 2.4.x as well. From fw@deneb.enyo.de Sun Jun 8 04:40:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 04:40:19 -0700 (PDT) Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58Be42x010062 for ; Sun, 8 Jun 2003 04:40:06 -0700 Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19OyWb-0001Jh-00; Sun, 08 Jun 2003 13:39:49 +0200 Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19OyWb-0001RY-FS; Sun, 08 Jun 2003 13:39:49 +0200 To: "David S. Miller" Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress References: <87adda6uro.fsf@deneb.enyo.de> <20030526.002934.132904126.davem@redhat.com> <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> From: Florian Weimer Date: Sun, 08 Jun 2003 13:39:49 +0200 In-Reply-To: <20030526.233211.54217447.davem@redhat.com> (David S. Miller's message of "Mon, 26 May 2003 23:32:11 -0700 (PDT)") Message-ID: <87he70re62.fsf@deneb.enyo.de> User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2951 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fw@deneb.enyo.de Precedence: bulk X-list: netdev "David S. 
Miller" writes:

> Of course, this will result in vastly decreased functionality (no
> arbitrary netmasks, no policy-based routing, code will be fine-tuned
> for typical Internet routing tables), so this proposal definitely
> comes at a price.
>
> As a general purpose operating system, where people DO in fact use
> these features quite regularly,

Even non-CIDR netmasks?  AFAIK, it's hard to find dedicated networking
devices (and routing protocols!) which support them. 8-/

Anyway, I've played a bit with something inspired by CEF (more
precisely speaking, one diagram in the IOS internals book and some IOS
diagnostic output).

Basically, it's a 256-way trie, with "adjacency information" at the
leaves (consisting of L2 addressing information and the prefix
length).  The leaves contain a full list of child nodes which refer
back to the leaf itself.  This allows for branch-free routing (see
below).  (A further optimization would not allocate the
self-referencing pointers for leaves which are at the fourth layer of
the trie, but this is unlikely to have a huge performance impact.)

The trie has 7862 internal nodes for my copy of the Internet routing
table, which amounts to 8113584 bytes (excluding memory management
overhead, twice the value for 64 bit architectures).  The number of
internal nodes does not depend on the number of interfaces/peerings,
and prefix filtering based on their lengths (/27 or even /24) doesn't
make a huge difference either.

For each adjacency, space for the L2 addressing information is
required plus 256 pointers for the self-references (of course, for
each relevant prefix length, so you have a few kilobytes for a typical
peering).

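As a quick cross-check of the memory figure quoted above: 8113584 bytes over 7862 internal nodes is exactly 1032 bytes per node, which is what 256 four-byte child pointers plus 8 further bytes per node would give. The 8-byte remainder (and its doubling to 16 on 64-bit) is only an inference from the arithmetic, not something stated in the mail, and the function names below are made up for illustration.

```c
enum { CEF_FANOUT = 256 };  /* one child slot per octet value */

/* Bytes per internal node: FANOUT pointers plus an assumed fixed remainder.
 * The remainder (8 bytes on 32-bit) is inferred from 8113584 / 7862 = 1032. */
unsigned long cef_node_bytes(unsigned long ptr_size, unsigned long extra)
{
    return CEF_FANOUT * ptr_size + extra;
}

/* Total bytes for all internal nodes, excluding allocator overhead. */
unsigned long cef_trie_bytes(unsigned long nodes, unsigned long ptr_size,
                             unsigned long extra)
{
    return nodes * cef_node_bytes(ptr_size, extra);
}
```

Under these assumptions, `cef_trie_bytes(7862, 4, 8)` reproduces the 8113584 bytes for 32-bit pointers, and `cef_trie_bytes(7862, 8, 16)` gives exactly twice that, matching the remark about 64 bit architectures.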
The routing function looks like this:

struct cef_entry *
cef_route (struct cef_table *table, ipv4_t address)
{
  unsigned char octet1 = address >> 24;
  unsigned char octet2 = (address >> 16) & 0xFF;
  unsigned char octet3 = (address >> 8) & 0xFF;
  unsigned char octet4 = address & 0xFF;

  struct cef_entry * entry1 = table->children[octet1];
  struct cef_entry * entry2 = entry1->table[octet2];
  struct cef_entry * entry3 = entry2->table[octet3];
  struct cef_entry * entry4 = entry3->table[octet4];

  return entry4;
}

For the full routing table with "maximum" adjacency information
(different L2 addressing information for each origin AS) and
"real-world" addresses (captured at the border of a medium-size
network, local addresses filtered), the function needs about 82 cycles
per routing decision on my Athlon XP (including function call
overhead).  For random addresses, we have 155 cycles.

In a simulation of a moderate peering (only 94 adjacencies, simulated
interfaces to half a dozen AS concentrated in Germany), I measured 45
cycles per routing decision for real-world traffic, and 70 cycles for
random addresses.  (More peerings result in more adjacencies which
lead to fewer cache hits.)

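To make the self-referencing leaves concrete, here is a minimal userspace sketch of how such a table might be laid out and walked. It is not the prototype described in this mail: the `adjacency` field, the helpers `cef_leaf`, `cef_node` and `cef_demo_table`, and the three demo routes are assumptions chosen purely for illustration, and the demo never frees its allocations.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t ipv4_t;

struct cef_entry {
    int prefix_length;             /* < 0 marks an internal node */
    int adjacency;                 /* stand-in for the L2 addressing info */
    struct cef_entry *table[256];  /* children, or self-references in a leaf */
};

struct cef_table {
    struct cef_entry *children[256];
};

/* Allocate an entry whose 256 slots all point at `fill` (or at itself). */
static struct cef_entry *cef_alloc(int prefix_length, int adjacency,
                                   struct cef_entry *fill)
{
    struct cef_entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->prefix_length = prefix_length;
    e->adjacency = adjacency;
    for (int i = 0; i < 256; i++)
        e->table[i] = fill ? fill : e;
    return e;
}

/* A leaf refers back to itself, so extra lookup steps are harmless. */
struct cef_entry *cef_leaf(int prefix_length, int adjacency)
{
    return cef_alloc(prefix_length, adjacency, NULL);
}

/* An internal node starts with every slot at the covering (shorter) prefix. */
struct cef_entry *cef_node(struct cef_entry *fallback)
{
    return cef_alloc(-1, -1, fallback);
}

/* Four dependent loads, no data-dependent branches. */
struct cef_entry *cef_route(const struct cef_table *table, ipv4_t address)
{
    struct cef_entry *e = table->children[address >> 24];
    e = e->table[(address >> 16) & 0xFF];
    e = e->table[(address >> 8) & 0xFF];
    return e->table[address & 0xFF];
}

/* Hypothetical demo: no route (0), 10.0.0.0/8 (1), 10.1.0.0/16 (2). */
void cef_demo_table(struct cef_table *t)
{
    struct cef_entry *none = cef_leaf(0, 0);
    struct cef_entry *net8 = cef_leaf(8, 1);
    struct cef_entry *net16 = cef_leaf(16, 2);
    struct cef_entry *ten = cef_node(net8);

    for (int i = 0; i < 256; i++)
        t->children[i] = none;
    ten->table[1] = net16;   /* 10.1.x.x overrides the /8 */
    t->children[10] = ten;
}
```

With the demo table, 10.1.2.3 ends at the /16 leaf and 10.2.3.4 at the /8 leaf, while any address outside 10/8 ends at the "no route" leaf. The point of the layout is that a lookup which reaches a leaf early simply keeps landing on that same leaf for the remaining steps.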
You can save 1K (or 2K on 64-bit architectures) per adjacency if you
introduce data-dependent branches:

struct cef_entry *
cef_route (struct cef_table *table, ipv4_t address)
{
  unsigned char octet1 = address >> 24;
  struct cef_entry * entry1 = table->children[octet1];
  if (entry1->prefix_length < 0)
    {
      unsigned char octet2 = (address >> 16) & 0xFF;
      struct cef_entry * entry2 = entry1->table[octet2];
      if (entry2->prefix_length < 0)
        {
          unsigned char octet3 = (address >> 8) & 0xFF;
          struct cef_entry * entry3 = entry2->table[octet3];
          if (entry3->prefix_length < 0)
            {
              unsigned char octet4 = address & 0xFF;
              struct cef_entry * entry4 = entry3->table[octet4];
              return entry4;
            }
          else
            {
              return entry3;
            }
        }
      else
        {
          return entry2;
        }
    }
  else
    {
      return entry1;
    }
}

However, this decreases performance (even on my Athlon XP with just
256 KB cache).

At the moment, I've got a userspace prototype for simulations which
can build the trie and make routing decisions.  Removing entries is a
bit tricky and requires more data because formerly overridden prefixes
might have to be resurrected.  I'm unsure which data structures should
be used to solve this problem.  Memory management is a related
question, too.  And locking. *sigh*

From davem@redhat.com Sun Jun 8 05:07:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 05:07:59 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58C7p2x010585 for ; Sun, 8 Jun 2003 05:07:52 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA15712; Sun, 8 Jun 2003 05:05:01 -0700
Date: Sun, 08 Jun 2003 05:05:00 -0700 (PDT)
Message-Id: <20030608.050500.28795668.davem@redhat.com>
To: fw@deneb.enyo.de
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S.
Miller"
In-Reply-To: <87he70re62.fsf@deneb.enyo.de>
References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2952
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Florian Weimer
   Date: Sun, 08 Jun 2003 13:39:49 +0200

   "David S. Miller" writes:

   > As a general purpose operating system, where people DO in fact use
   > these features quite regularly,

   Even non-CIDR netmasks?  AFAIK, it's hard to find dedicated networking
   devices (and routing protocols!) which support them. 8-/

Yes, people use source based routing to block specific IPs and
subnets; it's also needed for Mobile IPv4.

   Anyway, I've played a bit with something inspired by CEF (more
   precisely speaking, one diagram in the IOS internals book and some IOS
   diagnostic output).

Thanks, Alexey and myself will need to study this deeply.

Although, I hope it's not "too similar" to what CEF does because
undoubtedly Cisco has a bazillion patents in this area.  This is
actually an argument for coming up with our own algorithms without
any knowledge of what CEF does or might do. :(

From fw@deneb.enyo.de Sun Jun 8 06:10:40 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 06:10:45 -0700 (PDT)
Received: from mail.enyo.de (gw.enyo.de [212.9.189.178]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58DAc2x011651 for ; Sun, 8 Jun 2003 06:10:39 -0700
Received: from [212.9.189.171] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.34 #2) id 19OzwH-0006NA-00; Sun, 08 Jun 2003 15:10:25 +0200
Received: from fw by deneb.enyo.de with local (Exim 4.14) id 19OzwH-0001c8-Lv; Sun, 08 Jun 2003 15:10:25 +0200
To: "David S.
Miller"
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com>
From: Florian Weimer
Date: Sun, 08 Jun 2003 15:10:25 +0200
In-Reply-To: <20030608.050500.28795668.davem@redhat.com> (David S. Miller's message of "Sun, 08 Jun 2003 05:05:00 -0700 (PDT)")
Message-ID: <874r30r9z2.fsf@deneb.enyo.de>
User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-archive-position: 2953
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: fw@deneb.enyo.de
Precedence: bulk
X-list: netdev

"David S. Miller" writes:

> Although, I hope it's not "too similar" to what CEF does because
> undoubtedly Cisco has a bazillion patents in this area.

Most things in this area are patented, and the patents are extremely
fuzzy (e.g. policy-based routing with hierarchical sequence of
decisions has been patented countless times). 8-(

> This is actually an argument for coming up with our own algorithms
> without any knowledge of what CEF does or might do. :(

The branchless variant is not described in the IOS book, and I can't
tell if Cisco routers use it.  If this idea is really novel, we are in
pretty good shape because we no longer use trees, tries or whatever,
but a DFA. 8-)

A further parameter which could be tweaked is the kind of adjacency
information (where to store the L2 information, whether to include the
prefix length in the adjacency record etc.).

From jmorris@intercode.com.au Sun Jun 8 08:49:01 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 08:49:05 -0700 (PDT)
Received: from blackbird.intercode.com.au (IDENT:+znejfElQ/lNDmBoGnO9/mGdqkwwfswy@blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58Fmw2x024215 for ; Sun, 8 Jun 2003 08:48:59 -0700
Received: from excalibur.intercode.com.au (excalibur.intercode.com.au [203.32.101.12]) by blackbird.intercode.com.au (8.11.6p2/8.9.3) with ESMTP id h58Flrr27183; Mon, 9 Jun 2003 01:47:53 +1000
Date: Mon, 9 Jun 2003 01:47:52 +1000 (EST)
From: James Morris
To: Kazunori Miyazawa
cc: davem@redhat.com, , ,
Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
In-Reply-To: <20030606144925.29ad2a9f.kazunori@miyazawa.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2954
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Fri, 6 Jun 2003, Kazunori Miyazawa wrote:

> In output functions, dst is changed by xfrm_lookup if there is
> any matching policy. Therefore original dst which is held before
> calling xfrm_lookup will be never released.
> When xfrm_lookup succeeds and dst is changed, original dst should
> be released.

It is released in xfrm_lookup(): *dst_p = dst; ip_rt_put(rt); xfrm_pol_put(policy); return 0; - James -- James Morris From pekkas@netcore.fi Sun Jun 8 10:59:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 10:59:21 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58HxD2x025483 for ; Sun, 8 Jun 2003 10:59:14 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h58HwvY03749; Sun, 8 Jun 2003 20:58:58 +0300 Date: Sun, 8 Jun 2003 20:58:57 +0300 (EEST) From: Pekka Savola To: Florian Weimer cc: "David S. Miller" , , Subject: Re: Route cache performance under stress In-Reply-To: <87he70re62.fsf@deneb.enyo.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2955 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Sun, 8 Jun 2003, Florian Weimer wrote: > "David S. Miller" writes: > > > Of course, this will result in vastly decreased functionality (no > > arbitary netmasks, no policy-based routing, code will be fine-tuned > > for typical Internet routing tables), so this proposal definitely > > comes at a price. > > > > As a general purpose operating system, where people DO in fact use > > these features quite regularly, > > Even non-CIDR netmasks? AFAIK, it's hard to find dedicated networking > devices (and routing protocols!) which support them. 8-/ Do you mean netmasks like "255.128.255.0" ? Those are a real abomination and probably not supported.. and I don't know of anything that would require them. Or do you mean netmasks such as 1.1.1.1/19? I don't know of any credible networking devices which wouldn't support them. If so, please come out of the cave. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. 
Martin: A Clash of Kings From sim@netnation.com Sun Jun 8 16:49:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 16:49:33 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h58NnQ2x028846 for ; Sun, 8 Jun 2003 16:49:27 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19P9ug-00037G-9G; Sun, 08 Jun 2003 16:49:26 -0700 Date: Sun, 8 Jun 2003 16:49:26 -0700 From: Simon Kirby To: Florian Weimer Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030608234926.GA9453@netnation.com> References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <874r30r9z2.fsf@deneb.enyo.de> User-Agent: Mutt/1.5.4i X-archive-position: 2956 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 03:10:25PM +0200, Florian Weimer wrote: > Further parameters which could be tweaked is the kind of adjacency > information (where to store the L2 information, whether to include the > prefix length in the adjacency record etc.). What is the problem with the current approach? Does the overhead come from having to iterate through the hashes for each prefix? Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. 
]

From xerox@foonet.net Sun Jun 8 17:20:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 17:20:49 -0700 (PDT)
Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h590Kb2x029346 for ; Sun, 8 Jun 2003 17:20:38 -0700
Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h58Nuveq028645; Sun, 8 Jun 2003 19:56:57 -0400
From: "CIT/Paul"
To: "'Simon Kirby'" , "'Florian Weimer'"
Cc: ,
Subject: RE: Route cache performance under stress
Date: Sun, 8 Jun 2003 19:55:58 -0400
Organization: CIT
Message-ID: <001001c32e19$81bc7ea0$4a00000a@badass>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2616
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
In-Reply-To: <20030608234926.GA9453@netnation.com>
Importance: Normal
X-archive-position: 2957
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: xerox@foonet.net
Precedence: bulk
X-list: netdev

The problem with the route cache as it stands is that it adds an entry for every new packet that isn't already in the route cache. Say you have a denial of service attack going on, OR you just have millions of hosts going through the router (if you were an ISP). Anything with seemingly random source IPs (something like juno-z.101f.c will generate the worst-case scenario for forwarding packets) will cause the cache to constantly add new entries at pretty much the rate of the attack. This can stifle just about any Linux router with a measly 10 megabits/second of traffic unless the router is tuned up to a large degree (NAPI, certain NICs, route cache timings, etc.), and even then it can still be destroyed, no matter what the CPU is, with less than 100,000 packets per second and in most cases less than 30k.

That's why it's just not acceptable for companies using it as a replacement for, say, a Cisco 7200 VXR series (npe300,400 nsf-1, etc.), which can do 300K+ packets per second of routing (and yes, it can even route juno-z.101f.c at 300kpps, I have tested it). Linux has no problem doing 300kpps from a single source to a single destination provided you have NAPI or ITR or something limiting the interrupts. The overhead is the route cache and the related systems that use it, and also netfilter is very slow :/ One of these days they will fix it.....

If anyone has any ideas or needs a test-bed to try out code on, or would like me to test some of their code, I would be happy to test it on our development platforms (single and dual processor with Intel e1000 82545/6 and above, also e100 and tulip).

Thanks for your time

P.S. To answer your iteration question: it does not seem to be such an overhead on the CPU even if the route cache is 600,000 in size. I have tested this, and while there is a definite increase in CPU, it comes nowhere close to the code that has to add every new arriving packet to the list. IMHO the best way to do this would be like CEF w/ adjacency lists and not have it add every new packet that comes along.

Paul xerox@foonet.net http://www.httpd.net

From kazunori@miyazawa.org Sun Jun 8 17:48:25 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 17:48:29 -0700 (PDT)
Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h590mO2x029798 for ; Sun, 8 Jun 2003 17:48:24 -0700
Received: from monza.miyazawa.org ([2001:200:1b0:1000:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Mon, 09 Jun 2003 09:46:01 +0900
Date: Mon, 9 Jun 2003 09:49:18 +0900
From: Kazunori Miyazawa
To: James Morris
Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org, netdev@oss.sgi.com
Subject: Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
Message-Id: <20030609094918.3c26d296.kazunori@miyazawa.org>
In-Reply-To:
References: <20030606144925.29ad2a9f.kazunori@miyazawa.org>
X-Mailer: Sylpheed version 0.9.0 (GTK+ 1.2.10; i386-debian-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 2958
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kazunori@miyazawa.org
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003 01:47:52 +1000 (EST) James Morris wrote:

> On Fri, 6 Jun 2003, Kazunori Miyazawa wrote:
>
> > In output functions, dst is changed by xfrm_lookup if there is
> > any matching policy. Therefore the original dst, which is held before
> > calling xfrm_lookup, will never be released.
> > When xfrm_lookup succeeds and dst is changed, the original dst should
> > be released.
>
> It is released in xfrm_lookup():
>
> *dst_p = dst;
> ip_rt_put(rt);
> xfrm_pol_put(policy);
> return 0;
>

I overlooked it. Thank you.
--Kazunori Miyazawa (Yokogawa Electric Corporation)

From hadi@shell.cyberus.ca Sun Jun 8 18:35:38 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 18:35:44 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h591Zb2x030372 for ; Sun, 8 Jun 2003 18:35:38 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PBZ0-0008eR-0q; Sun, 08 Jun 2003 21:35:10 -0400
Date: Sun, 8 Jun 2003 21:35:09 -0400 (EDT)
From: Jamal Hadi
To: Hisham Kotry
cc: david-b@pacbell.net, rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: netlink tester program
In-Reply-To: <20030603075742.34434.qmail@web14305.mail.yahoo.com>
Message-ID: <20030608212033.Y33230@shell.cyberus.ca>
References: <20030603075742.34434.qmail@web14305.mail.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2959
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Tue, 3 Jun 2003, Hisham Kotry wrote:

> It was definitely a nice read, but the netlink2 draft
> is somewhat inconsistent, it mentions reducing the
> 32-bit length field to 16-bits and equally
> distributing the remaining 16-bits between the new
> version and extended flags fields, but the draft makes
> no further reference to the version field. In fact the
> netlink2 message header diagram on page 16, as well as
> the pseudo message on page 28, show a 16-bits extended
> flags field with no version field in the header. So
> this is probably one of those cases in which specs
> aren't clear enough and code usually has the final
> word in such situations.
>
> I mailed Jamal about this a while ago but never got a
> reply back.
>

apologies, I actually have an unrelated daytime job that tends to keep me too occupied at times ;->

Netlink2 draft is work in progress. The draft tends to lag reality. I believe what you refer to has been fixed. Refer to the slides at: http://www.zurich.ibm.com/~rha/netlink2.pdf

> BTW, is netlink2 support planned for linux in the near
> future?
>

You will see code from us that is GPL. Consider netlink2 as a distributed netlink. netlink is already proven so why reinvent the wheel? Essentially you should be able to manage clusters of linux network devices (think firewalls, routers etc) with netlink/netlink2. There are some mechanisms for distributedness that are missing. These are the holes we are going to fill.

Note some of the stuff i am working on at: www.cyberus.ca/~hadi/patches/action which fits the whole forces paradigm and works quite well with netlink today and netlink2 next. (I stopped updating that web page for some time now, talk to me if interested in the patches and if you would like to help in testing, coding, etc)

cheers,
jamal

From hadi@shell.cyberus.ca Sun Jun 8 20:16:07 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 20:16:19 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h593G62x031157 for ; Sun, 8 Jun 2003 20:16:07 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PD8M-0008hG-U2; Sun, 08 Jun 2003 23:15:46 -0400
Date: Sun, 8 Jun 2003 23:15:46 -0400 (EDT)
From: Jamal Hadi
To: CIT/Paul
cc: "'Simon Kirby'" , "'Florian Weimer'" , netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: RE: Route cache performance under stress
In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass>
Message-ID: <20030608230300.X33412@shell.cyberus.ca>
References: <001001c32e19$81bc7ea0$4a00000a@badass>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2960
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Sun, 8 Jun 2003, CIT/Paul wrote:

> The problem with the route cache as it stands is that it adds every new
> packet that isn't in the route cache to the cache, say you have
> A denial of service attack going on, OR you just have millions of hosts
> going through the router (if you were an ISP). Anything with seemingly
> Random source ips (something like juno-z.101f.c will generate worst case
> scenario for forwarding packets) will cause the cache to constantly
> Add new entries at pretty much the rate of the attack.. This can stifle
> just about any linux router with a measly 10 megabits/second of traffic
> unless

foo have you tried the latest patches posted recently? get the latest kernel 2.5.x and try it out. BTW, i don't think it is true you can die with 10mbps. I was reading some emails where someone said it was a few 100 pps that will kill the linux system (theory mixed with nonsense;->)

> The router is tuned up to a large degree (NAPI, certain nics, route
> cache timings, etc.) and even then it can still be destroyed no matter
> what
> The cpu is with less than 100,000 packets per second and in most cases
> less than 30k..

btw that's way above 10Mbps.

> That's why it's just not acceptable for companies using
> it as a replacement for say a cisco 7200 VXR series (npe300,400 nsf-1,
> etc.) which can do 300K+ packet per second of routing (and yes it can
> even route juno-z.101f.c at 300kpps, I have tested it). Linux has no
> problem doing 300kpps from a single source to a single destination
> provided you have NAPI or ITR or something limiting the interrupts.. The
> overhead is the route cache and the related systems that use it and also
> netfilter is very slow :/ One of these days they will fix it..... If
> anyone has any ideas or needs a test-bed to try out code on or would
> like me to test some of their code I would be happy to test it on our
> development platforms (single and dual processor with intel e1000
> 82545/6 and above, also e100 and tulip).
>

I think Robert has some numbers with the new patches with similar setups as you. Why don't you compare the cost of CISCO npex devices with Linux PCs with e1000s as well while you are at it ?;-> I am sure there are people who would like to sell you linux devices at half the cisco prices doing Millions of PPS via hardware assists. Support these linux supporting companies instead ;->

The more i think about it the more i think CEF is a lame escape from route caches. What we need is multi-tries at the slow path and perhaps a binary tree on hash collision buckets of the dst cache (instead of a linked list). You can avoid the packet-driven cache generation event by being a little creative if it gets overwhelming. Fix zebra to fully resolve each BGP nexthop periodically.

In any case who said forwarding by itself was sexy anymore?

cheers,
jamal

> Thanks for your time
>
> P.S. to answer your iteration question.. It does not seem to be such
> overhead on the cpu even if the route-cache is 600,000 in size.. I have
> tested this and while there is a definite increase in cpu it comes
> nothing close to the code that has to add every new arriving packet to
> the list. IMHO the best way to do this would be like CEF w/ adjacency
> lists and not have it add every new packet that comes along
>
> Paul xerox@foonet.net http://www.httpd.net

From jgarzik@pobox.com Sun Jun 8 20:52:11 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 20:52:19 -0700 (PDT)
Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h593q92x031603 for ; Sun, 8 Jun 2003 20:52:10 -0700
Received: from rdu26-227-011.nc.rr.com ([66.26.227.11] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.14) id 19PDhY-00038h-GA; Mon, 09 Jun 2003 04:52:08 +0100
Message-ID: <3EE4045D.4040002@pobox.com>
Date: Sun, 08 Jun 2003 23:51:57 -0400
From: Jeff Garzik
Organization: none
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk
X-Accept-Language: en
MIME-Version: 1.0
To: "David S. Miller"
CC: shemminger@osdl.org, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup
References: <20030606145835.3a263df8.shemminger@osdl.org> <20030607191522.GB3346@gtf.org> <20030607.235825.71096085.davem@redhat.com>
In-Reply-To: <20030607.235825.71096085.davem@redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2961
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

David S. Miller wrote:
> Have you extracted out all the init_etherdev() killings Al and
> myself did so you can backport them to 2.4.x too?

That's the plan, yes.

Jeff

From xerox@foonet.net Sun Jun 8 22:28:54 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:29:07 -0700 (PDT)
Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595Sq2x032490 for ; Sun, 8 Jun 2003 22:28:53 -0700
Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h595Smeq001442; Mon, 9 Jun 2003 01:28:48 -0400
From: "CIT/Paul"
To: "'Jamal Hadi'"
Cc: "'Simon Kirby'" , "'Florian Weimer'" , ,
Subject: RE: Route cache performance under stress
Date: Mon, 9 Jun 2003 01:27:48 -0400
Organization: CIT
Message-ID: <000701c32e47$ddd25290$4a00000a@badass>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2616
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
In-Reply-To: <20030608230300.X33412@shell.cyberus.ca>
Importance: Normal
X-archive-position: 2962
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: xerox@foonet.net
Precedence: bulk
X-list: netdev

Ahah Jamal!! Yes, I have tried. It does absolutely nothing for the constant randomness of packets. It increases the overall distribution of the hash in the cache, but it does nothing for the addition of new packets. Try forwarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache. Every one. And at 30,000 pps it destroys the cache, because every single packet coming in is NOT in the route cache, because it's random IPs. Nothing you can do about that except make the cache and everything related to it wicked faster, OR remove the per-packet additions to the cache. (I'm not even sure why this is necessary anyway. Who would want to add every single src/dst flow to a cache? That's what conntrack does, and we all know how much you despise that, heheheh.)

And yes, you can die with 10mbps. Try putting in some netfilter rules and some basic traffic on it, then hit it with 10mbps of juno-z and see what happens to your CPU. Granted, if there is a Linux router doing ABSOLUTELY NOTHING, you might be able to hit 50kpps of juno with dual P3 CPUs w/ 512k cache each and tricked-out settings for the hash and route cache, but you will also drop some packets along the way. Still, this is not acceptable yet :>

Point me at some decent-cost Linux hardware assist platforms.
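
For scale, the megabit and packet-rate figures being traded in this thread can be related with a little arithmetic. The following is a minimal sketch, assuming minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap per packet; the thread does not state the actual juno-z packet size, and `max_pps` and `mbps_needed` are illustrative names, not anything from the kernel.

```python
# Assumption: minimum-size Ethernet frames (64 bytes including FCS) plus
# 8 bytes of preamble/SFD and 12 bytes of inter-frame gap on the wire.
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def max_pps(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on packets per second that fit in a given link rate."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps // bits_per_packet


def mbps_needed(pps, frame_bytes=MIN_FRAME_BYTES):
    """Wire bandwidth in Mbit/s consumed by a given packet rate."""
    return pps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / 1_000_000


if __name__ == "__main__":
    print("10 Mbit/s carries at most", max_pps(10_000_000), "pps")
    print("1 Gbit/s carries at most", max_pps(1_000_000_000), "pps")
    print("30,000 pps needs", mbps_needed(30_000), "Mbit/s")
```

Under these assumptions a 10 Mbit/s stream tops out a little under 15,000 packets per second, so a 12,000 pps flood does fit in 10 Mbit/s, while 30,000 pps needs roughly 20 Mbit/s.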

IMHO the only thing that needs hardware assist is the darn route cache (in its entirety).

BTW, juno-z can send 12,000 packets per second or more and it's still 10mbps :>

If anyone has any ideas please feel free to e-mail me directly :>

Paul xerox@foonet.net http://www.httpd.net

From davem@redhat.com Sun Jun 8 22:41:25 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:41:34 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595fO2x000407 for ; Sun, 8 Jun 2003 22:41:25 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16685; Sun, 8 Jun 2003 22:38:25 -0700
Date: Sun, 08 Jun 2003 22:38:25 -0700 (PDT)
Message-Id: <20030608.223825.104049415.davem@redhat.com>
To: sim@netnation.com
Cc: fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030608234926.GA9453@netnation.com>
References: <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> <20030608234926.GA9453@netnation.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2963
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Simon Kirby
   Date: Sun, 8 Jun 2003 16:49:26 -0700

   On Sun, Jun 08, 2003 at 03:10:25PM +0200, Florian Weimer wrote:
   > Further parameters which could be tweaked is the kind of adjacency
   > information (where to store the L2 information, whether to include the
   > prefix length in the adjacency record etc.).

   What is the problem with the current approach? Does the overhead come
   from having to iterate through the hashes for each prefix?

It comes from doing the slow path, which actually had a bug (wouldn't grow the hash tables past a certain point). I bet most of Florian's performance problems go away if he runs with the fib_hash fix that I put into the tree.

In fact, the current slow path is _OPTIMAL_ for any sane routing table. The lookups are exactly O(n_prefixes), where n_prefixes is the number of unique subnet prefixes you've added to your routing table. This is precisely the same complexity as you'd get with a trie-based approach with guaranteed depth not exceeding 32.

I think most people are unaware of how the slow path we have actually works.

The place I see bugs is in routing cache GC operation: it can't keep up with how fast we can create new routing cache entries, and this is merely because it isn't tuned, not because it is incapable of keeping equilibrium properly.
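
As an illustration of the slow-path lookup cost described above, here is a toy model, not kernel code. It assumes that "unique subnet prefixes" means distinct prefix lengths, keeps one hash table per prefix length in use, and probes them from longest to shortest. The class name `FibHash` and its methods are hypothetical, and the model handles IPv4 only.

```python
import ipaddress


class FibHash:
    """Toy IPv4 FIB: one hash table ("zone") per prefix length in use."""

    def __init__(self):
        self.zones = {}    # prefix length -> {masked address: next hop}
        self.lengths = []  # populated prefix lengths, longest first

    def add(self, cidr, nexthop):
        net = ipaddress.IPv4Network(cidr)
        if net.prefixlen not in self.zones:
            self.zones[net.prefixlen] = {}
            self.lengths = sorted(self.zones, reverse=True)
        self.zones[net.prefixlen][int(net.network_address)] = nexthop

    def lookup(self, addr):
        """Return (next hop, hash probes used); next hop is None on no match."""
        a = int(ipaddress.IPv4Address(addr))
        probes = 0
        for plen in self.lengths:
            probes += 1  # one hash probe per populated prefix length
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            nexthop = self.zones[plen].get(a & mask)
            if nexthop is not None:
                return nexthop, probes  # longest match wins
        return None, probes


if __name__ == "__main__":
    fib = FibHash()
    fib.add("0.0.0.0/0", "default-gw")
    fib.add("10.0.0.0/8", "core")
    fib.add("10.1.0.0/16", "edge")
    print(fib.lookup("10.1.2.3"))
    print(fib.lookup("8.8.8.8"))
```

In this model the number of probes is bounded by the number of distinct prefix lengths present, at most 33 for IPv4, regardless of how many routes each table holds.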

This is why I really wish Florian would explore this area instead of ripping the whole thing apart :-)

From davem@redhat.com Sun Jun 8 22:47:51 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:48:00 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595lo2x000740 for ; Sun, 8 Jun 2003 22:47:51 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16705; Sun, 8 Jun 2003 22:44:47 -0700
Date: Sun, 08 Jun 2003 22:44:46 -0700 (PDT)
Message-Id: <20030608.224446.78724665.davem@redhat.com>
To: xerox@foonet.net
Cc: sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass>
References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2964
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Sun, 8 Jun 2003 19:55:58 -0400

   The problem with the route cache as it stands is that it adds every
   new packet that isn't in the route cache to the cache, say you have
   A denial of service attack going on, OR you just have millions of
   hosts going through the router (if you were an ISP).

We now perform rather acceptably in such scenarios. Robert Olsson has demonstrated that even if the attacker could fill up your entire bandwidth with random source address packets, we'd still provide 50kpps routing speed.

And this can be made much higher, because the performance limiter is the routing cache GC, which isn't tuned properly. It can't keep up because it doesn't try to purge the right number of entries each pass.

All the performance problems I've seen have been algorithmic or outright bugs. Bad hash functions and limits in how big the FIB hash tables would grow. And what's left is fixing GC.

There is nothing AT ALL fundamental about a routing cache that precludes it from behaving sanely in the presence of a random source address DoS load. Absolutely NOTHING.

   This can stifle just about any linux router with a measly 10
   megabits/second of traffic unless

Not true, that happens because of BUGs. Not because routing caches cannot behave sanely in such situations.

   The router is tuned up to a large degree (NAPI, certain nics, route
   cache timings, etc.) and even then it can still be destroyed no
   matter what

And today, this is because of BUGs in how the GC works. You can design the GC process so that it does the right thing and recycles only the DoS entries (those being very non-localized).

You should interact with Robert Olsson, who has been doing tests on the effect of gigabit rate full-on DoS runs where every packet creates a new routing cache entry.

Franks a lot,
David S. Miller
davem@redhat.com

From xerox@foonet.net Sun Jun 8 22:52:49 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:52:54 -0700 (PDT)
Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595qm2x001135 for ; Sun, 8 Jun 2003 22:52:49 -0700
Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h595qjeq027523; Mon, 9 Jun 2003 01:52:45 -0400
From: "CIT/Paul"
To: "'David S. Miller'"
Cc: , , ,
Subject: RE: Route cache performance under stress
Date: Mon, 9 Jun 2003 01:51:45 -0400
Organization: CIT
Message-ID: <001501c32e4b$35d67d60$4a00000a@badass>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2616
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
In-Reply-To: <20030608.224446.78724665.davem@redhat.com>
Importance: Normal
X-archive-position: 2965
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: xerox@foonet.net
Precedence: bulk
X-list: netdev

I'd love to test this out. If it could do full gigabit line rate with random IPs that would be soooooooo nice :> We wouldn't have to have so many routers any more!! :)

Paul xerox@foonet.net http://www.httpd.net

From davem@redhat.com Sun Jun 8 22:56:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 22:56:12 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h595u82x001454 for ; Sun, 8 Jun 2003 22:56:08 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16763; Sun, 8 Jun 2003 22:53:10 -0700
Date: Sun, 08 Jun 2003 22:53:09 -0700 (PDT)
Message-Id: <20030608.225309.39172149.davem@redhat.com>
To: jgarzik@pobox.com
Cc: shemminger@osdl.org, netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup
From: "David S. Miller"
In-Reply-To: <3EE4045D.4040002@pobox.com>
References: <20030607191522.GB3346@gtf.org> <20030607.235825.71096085.davem@redhat.com> <3EE4045D.4040002@pobox.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2966 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Sun, 08 Jun 2003 23:51:57 -0400 David S. Miller wrote: > Have you extracted out all the init_etherdev() killings Al and > myself did so you can backport them to 2.4.x too? That's the plan, yes. That's your plan, but did you do any of this yet? It'll keep going deeper and deeper into bitkeeper history the longer that you wait :-) From davem@redhat.com Sun Jun 8 23:01:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:01:52 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5961h2x001807 for ; Sun, 8 Jun 2003 23:01:44 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA16798; Sun, 8 Jun 2003 22:58:37 -0700 Date: Sun, 08 Jun 2003 22:58:37 -0700 (PDT) Message-Id: <20030608.225837.115923841.davem@redhat.com> To: xerox@foonet.net Cc: hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <000701c32e47$ddd25290$4a00000a@badass> References: <20030608230300.X33412@shell.cyberus.ca> <000701c32e47$ddd25290$4a00000a@badass> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2967 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "CIT/Paul" Date: Mon, 9 Jun 2003 01:27:48 -0400 Ahah Jamal!! Yes I have tried.. It does absoutely nothing for the constant randomness of packets. It increases the overall distribution of the hash in the cache but it does nothing for the addition of new packets.. Try fowarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache.. Every one. And at 30,000 pps It destroys the cache because every single packet coming in is NOT in the route cache because it's random ips. So you make packets that do things like this GC the oldest (LRU) routing cache entry. This isn't rocket science, and well behaved flows will still get all the benefits of the routing cache. The only person penalized will be the attacker since his routing cache entries will purge out quickly and as a response to HIS traffic. Nothing you can do No, there are many things we can do. Prove to me that routing caches are unable to behave acceptibly in random source address DoS situations. (I'm not Even sure why this is necessary anyway.. Who would want to add every single src/dst flow to a cache? Because %99 of traffic is well behaved flows, trains of packets. Even the most loaded core routers see flow lifetimes of at least 8 or 9 packets. Even if the flows lasted 3 packets, the input route lookup work saved (source address validation in particular, which requires access to a centralized global table and thus does not scale well on SMP) is entriely worth it. 
From davem@redhat.com Sun Jun 8 23:06:33 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:06:37 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5966W2x002155 for ; Sun, 8 Jun 2003 23:06:33 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16822; Sun, 8 Jun 2003 23:03:32 -0700 Date: Sun, 08 Jun 2003 23:03:32 -0700 (PDT) Message-Id: <20030608.230332.48514434.davem@redhat.com> To: xerox@foonet.net Cc: sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <001501c32e4b$35d67d60$4a00000a@badass> References: <20030608.224446.78724665.davem@redhat.com> <001501c32e4b$35d67d60$4a00000a@badass> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2968 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "CIT/Paul" Date: Mon, 9 Jun 2003 01:51:45 -0400 I'd love to test this out.. If it could do full gigabit line rate with random ips that would be soooooooo nice :> It isn't impossible with the current design, that I am quite sure of. Here is a simple idea: make the routing cache miss case steal an entry sitting at the end of the hash chain this new one will map to. It only steals entries which have not been recently used. The big problem area on SMP is fib_validate_source. I'm sure some clear thinking can wipe that off the profiles too.
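The "steal a stale entry from the same hash chain on a miss" idea above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel code: the names (struct entry, cache_lookup_or_insert, CHAIN_COUNT, STALE_AFTER) are hypothetical, the key is a plain integer standing in for the flow key, there is no locking, and tick wraparound (which the kernel handles with the time_before family of macros) is ignored.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHAIN_COUNT 4u     /* hypothetical, tiny on purpose */
#define STALE_AFTER 10ul   /* hypothetical "not recently used" age, in ticks */

struct entry {
    struct entry *next;
    unsigned int key;        /* stands in for the flow key */
    unsigned long lastuse;   /* tick of the most recent hit */
};

static struct entry *chains[CHAIN_COUNT];

/*
 * Look up `key`. On a hit, move the entry to the front of its chain.
 * On a miss, reuse the least recently used entry of the same chain if
 * it is at least STALE_AFTER ticks old; otherwise allocate a new one.
 * *recycled is set to 1 only when an old entry was reused.
 * Returns NULL only if allocation fails.
 */
struct entry *cache_lookup_or_insert(unsigned int key, unsigned long now,
                                     int *recycled)
{
    unsigned int idx = key % CHAIN_COUNT;
    struct entry **pp = &chains[idx];
    struct entry **candp = NULL;     /* link that points at the victim */
    unsigned long cand_use = now;
    struct entry *e;

    *recycled = 0;

    /* One walk does both jobs: duplicate check and victim selection. */
    for (; (e = *pp) != NULL; pp = &e->next) {
        if (e->key == key) {
            *pp = e->next;           /* unlink, then put it first */
            e->next = chains[idx];
            chains[idx] = e;
            e->lastuse = now;
            return e;
        }
        if (now - e->lastuse >= STALE_AFTER && e->lastuse <= cand_use) {
            candp = pp;
            cand_use = e->lastuse;
        }
    }

    if (candp != NULL) {
        e = *candp;                  /* steal the stale entry */
        *candp = e->next;
        *recycled = 1;
    } else {
        e = malloc(sizeof(*e));
        if (e == NULL)
            return NULL;
    }

    e->key = key;
    e->lastuse = now;
    e->next = chains[idx];
    chains[idx] = e;
    return e;
}
```

Under a random-source flood, most new entries are never hit again, so they age out and become the victims for later misses on the same chain, while entries belonging to real flows keep a fresh lastuse and are left alone.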
From xerox@foonet.net Sun Jun 8 23:29:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:29:39 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596TX2x002674 for ; Sun, 8 Jun 2003 23:29:33 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h596TTeq022703; Mon, 9 Jun 2003 02:29:29 -0400 From: "CIT/Paul" To: "'David S. Miller'" Cc: , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 02:28:30 -0400 Organization: CIT Message-ID: <001801c32e50$57ef0750$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <20030608.225837.115923841.davem@redhat.com> Importance: Normal X-archive-position: 2970 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev OK so let's try this.. If you can show me a linux router can can route 100mbps or more of juno-z.101f.c attack without dropping packets I will be thoroughly impressed :) I am willing to test out any code/patches and settings that you can think of and post some results.. Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: David S. Miller [mailto:davem@redhat.com] Sent: Monday, June 09, 2003 1:59 AM To: xerox@foonet.net Cc: hadi@shell.cyberus.ca; sim@netnation.com; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "CIT/Paul" Date: Mon, 9 Jun 2003 01:27:48 -0400 Ahah Jamal!! Yes I have tried.. It does absoutely nothing for the constant randomness of packets. 
It increases the overall distribution of the hash in the cache but it does nothing for the addition of new packets.. Try fowarding packets generated by juno-z.101f.c and it adds EVERY packet to the route cache.. Every one. And at 30,000 pps It destroys the cache because every single packet coming in is NOT in the route cache because it's random ips. So you make packets that do things like this GC the oldest (LRU) routing cache entry. This isn't rocket science, and well behaved flows will still get all the benefits of the routing cache. The only person penalized will be the attacker since his routing cache entries will purge out quickly and as a response to HIS traffic. Nothing you can do No, there are many things we can do. Prove to me that routing caches are unable to behave acceptibly in random source address DoS situations. (I'm not Even sure why this is necessary anyway.. Who would want to add every single src/dst flow to a cache? Because %99 of traffic is well behaved flows, trains of packets. Even the most loaded core routers see flow lifetimes of at least 8 or 9 packets. Even if the flows lasted 3 packets, the input route lookup work saved (source address validation in particular, which requires access to a centralized global table and thus does not scale well on SMP) is entriely worth it. 
From davem@redhat.com Sun Jun 8 23:28:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:28:50 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596Sd2x002595 for ; Sun, 8 Jun 2003 23:28:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16891; Sun, 8 Jun 2003 23:25:37 -0700 Date: Sun, 08 Jun 2003 23:25:37 -0700 (PDT) Message-Id: <20030608.232537.102562046.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030608230300.X33412@shell.cyberus.ca> References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2969 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Sun, 8 Jun 2003 23:15:46 -0400 (EDT) The more i think about it the more i think CEF is a lame escape from route caches. It is one perspective :-) What we need is multi-tries at the slow path and perhaps a binary tree on hash collisions buckets of the dst cache (instead of a linked list). I do not believe that slow path is slow. In fact after I fixed hash table growth in fib_hash.c Simon showed us clearly how DoS performance was _NOT_ tied to the number of routes loaded into the kernel. What is slow are things like fib_validate_source() on SMP and the GC (and some other things, I need to study Simon's profiles more deeply). 
The GC is apparently really badly behaved now during DoS-like traffic. My main current quick idea is to make rt_intern_hash() attempt to flush out entries in the same hash chain instead of allocating new entries. I also question the setting of ip_rt_max_size in relation to the number of hash chains (it's set to n_hashchains * 16 currently, that sounds wrong, maybe something more like n_hashchains * 2 or even n_hashchains * 3). I'll try to cook up a patch to test. We might even be able to kill off route cache GC entirely if this scheme works well. From davem@redhat.com Sun Jun 8 23:31:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:31:35 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596VU2x003089 for ; Sun, 8 Jun 2003 23:31:30 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16930; Sun, 8 Jun 2003 23:28:28 -0700 Date: Sun, 08 Jun 2003 23:28:27 -0700 (PDT) Message-Id: <20030608.232827.88487519.davem@redhat.com> To: xerox@foonet.net Cc: hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <001801c32e50$57ef0750$4a00000a@badass> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2971 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: "CIT/Paul" Date: Mon, 9 Jun 2003 02:28:30 -0400 OK so let's try this..
If you can show me a linux router can can route 100mbps or more of juno-z.101f.c attack without dropping packets I will be thoroughly impressed :) I am willing to test out any code/patches and settings that you can think of and post some results.. Ok, Robert are you willing to help too? :-) From sim@netnation.com Sun Jun 8 23:47:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:47:25 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596lJ2x003568 for ; Sun, 8 Jun 2003 23:47:20 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGR5-00065R-4W; Sun, 08 Jun 2003 23:47:19 -0700 Date: Sun, 8 Jun 2003 23:47:19 -0700 From: Simon Kirby To: CIT/Paul Cc: "'Florian Weimer'" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609064719.GA20613@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001001c32e19$81bc7ea0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2972 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 07:55:58PM -0400, CIT/Paul wrote: > A denial of service attack going on, OR you just have millions of hosts > going through the router (if you were an ISP). Anything with seeminly > Random source ips (something like juno-z.101f.c will generate worst case > scenario for forwarding packets) will cause the cache to constantly > Add new entries at pretty much the rate of the attack.. 
This can stifle > just about any linux router with a measly 10 megabits/second of traffic > unless > The router is tuned up to a large degree (NAPI, certain nics, route > cache timings, etc.) and even then it can still be destroyed no matter > what > The cpu is with less than 100,000 packets per second and in most cases > less than 30k.. That's why it's just not acceptable for companies using > it as a replacement for say a cisco 7200 VXR series (npe300,400 nsf-1, > etc.) which can do 300K+ packets per second of routing (and yes it can > even route juno-z.101f.c at 300kpps, I have tested it). Linux has no > problem doing 300kpps from a single source to a single destination > provided you have NAPI or ITR or something limiting the interrupts.. The > overhead is the route cache and the related systems that use it and also > netfilter is very slow :/ One of these days they will fix it..... If Whoa, wait a second. You got a 7200 VXR to do 300kpps? I would have liked to see that. We couldn't get our 7206 VXR routers to do anything more than about 12 Mbit/second of small packets, which I believe is about 40,000 packets per second. This is with CEF disabled, because it ended up duplicating packets and doing some other strange things with CEF enabled. Also, I remember trying with a bucketload of netfilter rules and finding that the performance difference was hardly noticeable. Linux can route small packets with random src/dst much faster than 10 Mbit/sec. It depends on the hardware as you say, but it shouldn't ever be that slow on reasonable hardware. I remember that back in 1998 even the 2.0 kernel (before the route cache existed) on a Celeron 300A with eepro100 cards (eepro100 driver, no interrupt coalescing, definitely no NAPI) was capable of routing at least 20 Mbit/second of SYN packets from random sources. In fact, I remember it happily choking some old 3Com switches we had at the time.
I recently saw 90 Mbit/second of additional traffic (small packets with random sources) going through our routers (now single Athlon 1800MP (MP for APIC), tg3, NAPI, BGP routing tables), and they didn't seem to care. It's definitely not yet perfect, but it's not bad. The hashing fixes for large routing tables which Dave M. recently posted has made the situation much better -- it was very broken before. What did your routing table look like when you were doing tests? I have fiddled with the route cache garbage collection parameters a bit, but I haven't really been able to reduce the CPU usage by much at all. Really, though, shouldn't the route cache overhead be fairly small in comparison to everything else involved in forwarding? Simon- From sim@netnation.com Sun Jun 8 23:52:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:52:14 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596qB2x003901 for ; Sun, 8 Jun 2003 23:52:11 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGVn-00067g-2Y; Sun, 08 Jun 2003 23:52:11 -0700 Date: Sun, 8 Jun 2003 23:52:11 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609065211.GB20613@netnation.com> References: <20030608.224446.78724665.davem@redhat.com> <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.230332.48514434.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2973 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:03:32PM -0700, David S. 
Miller wrote: > I'd love to test this out.. If it could do full gigabit line rate with > random ips that would be soooooooo nice :> Agreed. :) > It isn't impossible with the current design, that I am > quire sure of. > > Here is a simple idea, make the routing cache miss case steal > an entry sitting at the end of the hash chain this new one will > map to. It only steals entries which have not been recently used. I just asked whether this was possible in a previous email, but you must have missed it. I am seeing a lot of memory management stuff in profiles, so I think recycling routing cache entries (if only when the table is full and the garbage collector would otherwise need to run) would be very helpful. Is it possible to get a good guess of what cache entry to recycle without walking for a while or without some kind of LRU? > The big problem area on SMP is fib_validate_source. I'm sure some > clear thinking can wipe that off the profiles too. Not running the important stuff with SMP yet, so I don't care about this at the moment. O:) Simon- From davem@redhat.com Sun Jun 8 23:52:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:52:51 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596ql2x003985 for ; Sun, 8 Jun 2003 23:52:48 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA16999; Sun, 8 Jun 2003 23:49:46 -0700 Date: Sun, 08 Jun 2003 23:49:46 -0700 (PDT) Message-Id: <20030608.234946.35677224.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. 
Miller" In-Reply-To: <20030609064719.GA20613@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2974 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:47:19 -0700 Really, though, shouldn't the route cache overhead be fairly small in comparison to everything else involved in forwarding? If GC is just doing dumb things, it is possible. These costs can be hidden in non-rtcache places in the form of cache misses and displacement on rtcache objects which can show up as higher costs in other places. From davem@redhat.com Sun Jun 8 23:59:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 08 Jun 2003 23:59:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596xO2x004566 for ; Sun, 8 Jun 2003 23:59:24 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA17051; Sun, 8 Jun 2003 23:56:23 -0700 Date: Sun, 08 Jun 2003 23:56:22 -0700 (PDT) Message-Id: <20030608.235622.38700262.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609065211.GB20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2975 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:52:11 -0700 On Sun, Jun 08, 2003 at 11:03:32PM -0700, David S. Miller wrote: > Here is a simple idea, make the routing cache miss case steal > an entry sitting at the end of the hash chain this new one will > map to. It only steals entries which have not been recently used. I just asked whether this was possible in a previous email, but you must have missed it. I am seeing a lot of memory management stuff in profiles, so I think recycling routing cache entries (if only when the table is full and the garbage collector would otherwise need to run) would be very helpful. Yes, indeed. Is it possible to get a good guess of what cache entry to recycle without walking for a while or without some kind of LRU? This is what my (and therefore your) suggested scheme is trying to do. We have to walk the entire destination hash chain _ANYWAYS_ to verify that a matching entry has not been put into the cache while we were procuring the new one. During this walk we can also choose a candidate rtcache entry to free. Something like the patch at the end of this email, doesn't compile it's just a work in progress. The trick is picking TIMEOUT1 and TIMEOUT2 :) Another point is that the default ip_rt_gc_min_interval is absolutely horrible for DoS like attacks. When DoS traffic can fill the rtcache multiple times per second, using a GC interval of 5 seconds is the worst possible choice. :) When I see things like this, I can only come to the conclusion that the tuning Alexey originally did when coding up the rtcache merely needs to be scaled up to modern day packet rates. 
--- net/ipv4/route.c.~1~	Sun Jun  8 23:28:00 2003
+++ net/ipv4/route.c	Sun Jun  8 23:45:47 2003
@@ -717,14 +717,15 @@
 
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
-	struct rtable	*rth, **rthp;
-	unsigned long	now = jiffies;
+	struct rtable	*rth, **rthp, *cand, **candp;
+	unsigned long	now = jiffies, cand_use = now;
 	int attempts = !in_softirq();
 
 restart:
 	rthp = &rt_hash_table[hash].chain;
 
 	spin_lock_bh(&rt_hash_table[hash].lock);
+	cand = NULL;
 	while ((rth = *rthp) != NULL) {
 		if (compare_keys(&rth->fl, &rt->fl)) {
 			/* Put it first */
@@ -753,7 +754,21 @@
 			return 0;
 		}
 
+		if (rt_may_expire(rth, TIMEOUT1, TIMEOUT2)) {
+			unsigned long this_use = rth->u.dst.lastuse;
+
+			if (time_before_eq(this_use, cand_use)) {
+				cand = rth;
+				candp = rthp;
+				cand_use = this_use;
+			}
+		}
 		rthp = &rth->u.rt_next;
+	}
+
+	if (cand) {
+		*candp = cand->u.rt_next;
+		rt_free(cand);
 	}
 
 	/* Try to bind route to arp only if it is output

From sim@netnation.com Sun Jun 8 23:59:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:00:01 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h596xu2x004600 for ; Sun, 8 Jun 2003 23:59:56 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGdH-0006FW-RS; Sun, 08 Jun 2003 23:59:55 -0700 Date: Sun, 8 Jun 2003 23:59:55 -0700 From: Simon Kirby To: "David S.
Miller" Cc: hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609065955.GC20613@netnation.com> References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca> <20030608.232537.102562046.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.232537.102562046.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2976 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:25:37PM -0700, David S. Miller wrote: > I do not believe that slow path is slow. In fact after I fixed > hash table growth in fib_hash.c Simon showed us clearly how DoS > performance was _NOT_ tied to the number of routes loaded into > the kernel. Not anymore. :) Btw, that patch seems to be stable here. Will we be seeing it sneak into 2.4? > My main current quick idea is to make rt_intern_hash() attempt > to flush out entries in the same hash chain instead of allocating > new entries. > > I also question the setting of ip_rt_max_size in relation to the > number of hash chains (it's set to n_hashchains * 16 currently, > that sounds wrong, maybe something more like n_hashchains * 2 or > even n_hashchains * 3). The route cache on our routers here grows to several thousand entries most of the time because of the quantity of traffic we route, and then all gets happily blown away when the next BGP table change comes along, which seems to happen about 10-20 times per miunte (!). It would probably be beneficial for us to reduce the amount of work required when blowing it away and keep it as small as possible. > I'll try to cook up a patch to test. We might even be able to Woohoo! 
> kill of route cache GC entriely if this scheme works well. I asked Alexey about this before and he mentioned it was there because it made a big difference in processing latency to postpone cleanup to a GC run. It should be possible to do recycling only when the table is full (when the box is getting smashed). This way latencies would be lowest in the common case and it would recycle and not have spurts of GC latency in the DoS case. Simon- From davem@redhat.com Mon Jun 9 00:06:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:06:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597662x005309 for ; Mon, 9 Jun 2003 00:06:06 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA17107; Mon, 9 Jun 2003 00:03:01 -0700 Date: Mon, 09 Jun 2003 00:03:00 -0700 (PDT) Message-Id: <20030609.000300.35030075.davem@redhat.com> To: sim@netnation.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609065955.GC20613@netnation.com> References: <20030608230300.X33412@shell.cyberus.ca> <20030608.232537.102562046.davem@redhat.com> <20030609065955.GC20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2977 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Sun, 8 Jun 2003 23:59:55 -0700 On Sun, Jun 08, 2003 at 11:25:37PM -0700, David S. Miller wrote: > I do not believe that slow path is slow. 
In fact after I fixed > hash table growth in fib_hash.c Simon showed us clearly how DoS > performance was _NOT_ tied to the number of routes loaded into > the kernel. Not anymore. :) Btw, that patch seems to be stable here. Will we be seeing it sneak into 2.4? Yes, 2.4.22-pre1 will get it or somewhere thereabouts. > I also question the setting of ip_rt_max_size in relation to the > number of hash chains (it's set to n_hashchains * 16 currently, > that sounds wrong, maybe something more like n_hashchains * 2 or > even n_hashchains * 3). The route cache on our routers here grows to several thousand entries most of the time because of the quantity of traffic we route, and then all gets happily blown away when the next BGP table change comes along, which seems to happen about 10-20 times per minute (!). It would probably be beneficial for us to reduce the amount of work required when blowing it away and keep it as small as possible. This is simple, by using a generation count. When route lookup sees a matching entry with a stale generation count, we pass this entry as-is into ip_route_{input,output}_slow() and use it instead of allocating a new entry. It is the same trick as used by the flow cache. I'll code this up as well. > kill off route cache GC entirely if this scheme works well. I asked Alexey about this before and he mentioned it was there because it made a big difference in processing latency to postpone cleanup to a GC run. The problem is that GC cannot currently keep up with DoS-like traffic patterns. As a result, routing latency is not smooth at all, you get spikes because each GC run goes for up to an entire jiffy because it has so much work to do. Meanwhile, during this expensive GC processing, packet processing is frozen on a UP system. net/core/flow.c:flow_cache_lookup() is instructive, it implements several of these ideas being discussed today.
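The generation-count trick described above can be sketched in user space like this. It is a minimal illustration under assumptions, not the flow cache or route cache code: the cache is direct-mapped to keep it short, and every name here (struct cache_entry, cache_genid, route_table_changed, cached_route_lookup, slow_path_resolve, default_nexthop) is hypothetical. The slow path is a stub that returns one global next hop.

```c
#include <stddef.h>

#define SLOTS 8u   /* hypothetical, direct-mapped for brevity */

struct cache_entry {
    int in_use;
    unsigned int dst;
    unsigned int nexthop;
    unsigned int genid;      /* generation this entry was resolved in */
};

static struct cache_entry cache[SLOTS];
static unsigned int cache_genid;        /* bumped on every table change */
static unsigned int default_nexthop = 1;
static unsigned int slow_lookups;       /* counts slow-path resolutions */

/* Stub for the full routing table lookup (the "slow path"). */
static unsigned int slow_path_resolve(unsigned int dst)
{
    (void)dst;
    slow_lookups++;
    return default_nexthop;
}

/*
 * A table change (a BGP update, say) does not walk or free the cache.
 * It only bumps the generation, which makes every entry stale at once.
 */
void route_table_changed(unsigned int new_nexthop)
{
    default_nexthop = new_nexthop;
    cache_genid++;
}

unsigned int cached_route_lookup(unsigned int dst)
{
    struct cache_entry *e = &cache[dst % SLOTS];

    if (e->in_use && e->dst == dst && e->genid == cache_genid)
        return e->nexthop;               /* fresh hit */

    /* Miss or stale generation: refill the same entry in place,
     * with no free and no allocation. */
    e->in_use = 1;
    e->dst = dst;
    e->nexthop = slow_path_resolve(dst);
    e->genid = cache_genid;
    return e->nexthop;
}

unsigned int slow_lookup_count(void)
{
    return slow_lookups;
}
```

With this shape, a table change costs a single increment regardless of cache size, and the re-resolution work is paid lazily, one entry at a time, only for destinations that are actually looked up again.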
From sim@netnation.com Mon Jun 9 00:13:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:13:51 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597DV2x006021 for ; Mon, 9 Jun 2003 00:13:31 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PGqQ-0006RC-Je; Mon, 09 Jun 2003 00:13:30 -0700 Date: Mon, 9 Jun 2003 00:13:30 -0700 From: Simon Kirby To: CIT/Paul Cc: "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609071330.GD20613@netnation.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001801c32e50$57ef0750$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2978 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote: > OK so let's try this.. If you can show me a linux router can can route > 100mbps or more of juno-z.101f.c attack without dropping packets I will > be thoroughly impressed :) > > I am willing to test out any code/patches and settings that you can > think of and post some results.. I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. :) In the tests I've been doing so far, I've been dropping responses (in the INPUT chain), so I haven't been testing the forwarding through of packets (though it is testing the routing input). I'll see if I can set up a router, target, and DoS box. 
I haven't been able to get juno-z.101f.c to saturate 100 Mbit/sec outgoing, but I've only tried it on eepro100 boxes. Has anybody got it to send more? Mmm, need more tg3 cards... Simon- From sim@netnation.com Mon Jun 9 00:36:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 00:36:54 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h597ai2x007239 for ; Mon, 9 Jun 2003 00:36:45 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHCu-0006gK-5x; Mon, 09 Jun 2003 00:36:44 -0700 Date: Mon, 9 Jun 2003 00:36:44 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609073644.GE20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.235622.38700262.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2979 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > We have to walk the entire destination hash chain _ANYWAYS_ to verify > that a matching entry has not been put into the cache while we were > procuring the new one. During this walk we can also choose a > candidate rtcache entry to free. Ah, neat. I should try reading this stuff. :) > Something like the patch at the end of this email, doesn't compile > it's just a work in progress. 
The trick is picking TIMEOUT1 and > TIMEOUT2 :) > > Another point is that the default ip_rt_gc_min_interval is > absolutely horrible for DoS like attacks. When DoS traffic > can fill the rtcache multiple times per second, using a GC > interval of 5 seconds is the worst possible choice. :) Yes, I've reduced the gc_min_interval to 1, and it has been that way for some time. BTW, you may be interested in this old email from Alexey: http://www.tux.org/hypermail/linux-kernel/1999week05/1113.html (This was back when the GC was limited so much that legitimate traffic was overflowing the table. DoS attacks must have been really effective then. :)) Simon- [ Simon Kirby ][ Network Operations ] [ sim@netnation.com ][ NetNation Communications Inc. ] [ Opinions expressed are not necessarily those of my employer. ] From xerox@foonet.net Mon Jun 9 01:12:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:12:05 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598Bx2x008585 for ; Mon, 9 Jun 2003 01:12:00 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h598Bseq021814; Mon, 9 Jun 2003 04:11:55 -0400 From: "CIT/Paul" To: "'Simon Kirby'" Cc: "'David S. 
Miller'" , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 04:10:55 -0400 Organization: CIT Message-ID: <000401c32e5e$a707b6d0$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030609071330.GD20613@netnation.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 2980 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3 1.26 ghz routers.. I can't even send 50mbit of this though one of my routers Without it using 100% of both cpus because of the route cache.. It goes up to 500,000 entries if I let it and it adds 80,000 new entries per second and they are all cache misses.. I'd be glad to show you the setup sometime :) I showed it to jamal and we tested some stuff. Paul xerox@foonet.net http://www.httpd.net -----Original Message----- From: Simon Kirby [mailto:sim@netnation.com] Sent: Monday, June 09, 2003 3:14 AM To: CIT/Paul Cc: 'David S. Miller'; hadi@shell.cyberus.ca; fw@deneb.enyo.de; netdev@oss.sgi.com; linux-net@vger.kernel.org Subject: Re: Route cache performance under stress On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote: > OK so let's try this.. If you can show me a linux router can can route > 100mbps or more of juno-z.101f.c attack without dropping packets I > will be thoroughly impressed :) > > I am willing to test out any code/patches and settings that you can > think of and post some results.. I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. 
:) In the tests I've been doing so far, I've been dropping responses (in the INPUT chain), so I haven't been testing the forwarding through of packets (though it is testing the routing input). I'll see if I can set up a router, target, and DoS box. I haven't been able to get juno-z.101f.c to saturate 100 Mbit/sec outgoing, but I've only tried it on eepro100 boxes. Has anybody got it to send more? Mmm, need more tg3 cards... Simon- From sim@netnation.com Mon Jun 9 01:18:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:18:11 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598I32x009001 for ; Mon, 9 Jun 2003 01:18:04 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHqt-00075s-2l; Mon, 09 Jun 2003 01:18:03 -0700 Date: Mon, 9 Jun 2003 01:18:03 -0700 From: Simon Kirby To: "David S. Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609081803.GF20613@netnation.com> References: <001501c32e4b$35d67d60$4a00000a@badass> <20030608.230332.48514434.davem@redhat.com> <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030608.235622.38700262.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2981 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > + if (cand) { > + *candp = cand->u.rt_next; > + rt_free(cand); > } Hmm...It looks like this is still freeing the entry.. Is it possible to recycle the dst without reallocating it? 
This is the end of the time-sorted profile output of the test box saturated by incoming juno packets (firewalled in INPUT chain to avoid responses to spoofed src IPs), NAPI 100% of the time, tg3:

   158 tg3_poll                         0.5197
  1630 ip_rcv_finish                    2.8348
   142 ipv4_dst_destroy                 2.9583
   429 fib_rules_policy                 3.8304
  8959 ip_route_input_slow              3.8885
  2438 ip_rcv                           4.3536
  2504 alloc_skb                        5.2167
  1991 __kfree_skb                      5.4103
  2279 netif_receive_skb                5.6975
   929 skb_release_data                 6.4514
   669 ip_local_deliver                 6.9688
  1175 __constant_c_and_count_memset    7.3438
  2367 tcp_match                        7.3969
   124 kmem_cache_alloc                 7.7500
  4535 fib_validate_source              8.0982
   598 __fib_res_prefsrc                9.3438
  8896 rt_garbage_collect               9.4237
  3582 inet_select_addr                 9.7337
  1747 kfree                            9.9261
   717 ipt_hook                        11.2031
   938 kmalloc                         11.7250
  1747 jhash_3words                    12.1319
  6879 nf_hook_slow                    12.6452
  2439 eth_type_trans                  12.7031
  1695 kfree_skbmem                    13.2422
  2358 nf_iterate                      13.3977
   872 rt_hash_code                    13.6250
  2933 fib_semantic_match              14.1010
 16553 ipt_do_table                    14.9937
 15339 tg3_rx                          16.2489
  2482 tg3_recycle_rx                  17.2361
  5967 __kmem_cache_alloc              18.6469
  1237 ipt_route_hook                  19.3281
  3120 do_gettimeofday                 21.6667
  8299 ip_packet_match                 24.6994
  8031 fib_lookup                      25.0969
  1877 fib_rule_put                    29.3281
  6088 dst_destroy                     34.5909
 26833 rt_intern_hash                  34.9388
 10666 kmem_cache_free                 66.6625
 20193 fn_hash_lookup                  70.1146
 10516 dst_alloc                       73.0278
 64803 ip_route_input                 150.0069

This is with a routing table of 300,000 entries (though only one prefix) and with your hash fix patch. ip_route_input is still highest, but dst_alloc is an obvious second. ip_route_input is actually always the highest (excluding the IRQ handling stuff), and doesn't seem to change at all based on routing table size.

http://blue.netnation.com/sim/ref/

Simon-

[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer.
] From davem@redhat.com Mon Jun 9 01:25:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:25:10 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598P52x009351 for ; Mon, 9 Jun 2003 01:25:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA17380; Mon, 9 Jun 2003 01:22:02 -0700 Date: Mon, 09 Jun 2003 01:22:02 -0700 (PDT) Message-Id: <20030609.012202.68055632.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609081803.GF20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2982 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 01:18:03 -0700 On Sun, Jun 08, 2003 at 11:56:22PM -0700, David S. Miller wrote: > + if (cand) { > + *candp = cand->u.rt_next; > + rt_free(cand); > } Hmm...It looks like this is still freeing the entry.. Is it possible to recycle the dst without reallocating it? Yes, can you test the patch I just sent you? We can modify that to recycle easily instead of freeing. Well... one problem is that in 2.5.x we have to kill off entries using RCU so such recycling may not be so easy there. This is with a routing table of 300,000 entries (though only one prefix) and with your hash fix patch. 
ip_route_input is still highest, but dst_alloc is an obvious second. ip_route_input is actually always the highest (excluding the IRQ handling stuff), and doesn't seem to change at all based on routing table size. We spend a decent amount of time mucking with fib rules, turning off multiple-tables support would kill that, although I suspect you're actually using that :) From sim@netnation.com Mon Jun 9 01:27:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:27:28 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598RJ2x009750 for ; Mon, 9 Jun 2003 01:27:19 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PHzq-0007An-Pu; Mon, 09 Jun 2003 01:27:18 -0700 Date: Mon, 9 Jun 2003 01:27:18 -0700 From: Simon Kirby To: CIT/Paul Cc: "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609082718.GG20613@netnation.com> References: <20030609071330.GD20613@netnation.com> <000401c32e5e$a707b6d0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000401c32e5e$a707b6d0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 2983 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 04:10:55AM -0400, CIT/Paul wrote: > I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3 > 1.26 ghz routers.. I can't even send 50mbit of this though one of my > routers > Without it using 100% of both cpus because of the route cache.. It goes > up to 500,000 entries if I let it and it adds 80,000 new entries per > second and they are all cache misses.. 
I'd be glad to show you the setup
> sometime :) I showed it to jamal and we tested some stuff.

Hmm.. We're running on single 1800MP Athlons here. Have you had a chance to profile it?

- add "profile=1" to the kernel command line
- reboot
- run juno-z.101f.c from remote box
- run "readprofile -r" on the router
- twiddle fingers for a while
- run "readprofile -n -m your_System.map > foo"
- stop juno :)
- run "sort -n +2 < foo > readprofile.time_sorted"

I'm interested to see if your profile results line up to what I'm seeing here on UP (though I have the kernel compiled SMP...Oops).

Wait a second... 500,000 entries in the route cache? WTF? What is your max_size set to? That will massively overfill the hash bucket and definitely take up way too much CPU. It shouldn't be able to get there at all unless you have raised max_size. Here I have:

echo 4 > gc_elasticity    # Higher is weaker, 0 will nuke all [dfl: 8]
echo 1 > gc_interval      # Garbage collection interval (seconds) [dfl: 60]
echo 1 > gc_min_interval  # Garbage collection min interval (seconds) [dfl: 5]
echo 90 > gc_timeout      # Entry lifetime (seconds) [dfl: 300]

[sroot@r1:/proc/sys/net/ipv4/route]# grep . *
...
gc_elasticity:4
gc_interval:1
gc_min_interval:1
gc_thresh:4096
gc_timeout:90
max_delay:10
max_size:65536

Simon-

From sim@netnation.com Mon Jun 9 01:32:00 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 01:32:10 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598W02x011901 for ; Mon, 9 Jun 2003 01:32:00 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PI4N-0007DJ-Sw; Mon, 09 Jun 2003 01:31:59 -0700
Date: Mon, 9 Jun 2003 01:31:59 -0700
From: Simon Kirby
To: "David S.
Miller" Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609083159.GH20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> <20030609.012202.68055632.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.012202.68055632.davem@redhat.com> User-Agent: Mutt/1.5.4i X-archive-position: 2984 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 01:22:02AM -0700, David S. Miller wrote: > Hmm...It looks like this is still freeing the entry.. Is it possible to > recycle the dst without reallocating it? > > Yes, can you test the patch I just sent you? We can modify that > to recycle easily instead of freeing. Cool. I'll see if I can set something up to try that at work tomorrow. Insufficient hardware here at home. > We spend a decent amount of time mucking with fib rules, turning > off multiple-tables support would kill that, although I suspect > you're actually using that :) We use it occasionally for various things. I'll try profiling with it turned off to see how much of an impact it has. 
Simon-

From davem@redhat.com Mon Jun 9 01:59:55 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:00:02 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h598xs2x014044 for ; Mon, 9 Jun 2003 01:59:54 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA17478; Mon, 9 Jun 2003 01:56:49 -0700
Date: Mon, 09 Jun 2003 01:56:48 -0700 (PDT)
Message-Id: <20030609.015648.55736734.davem@redhat.com>
To: sim@netnation.com
Cc: xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609071330.GD20613@netnation.com>
References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030609071330.GD20613@netnation.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2985
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

From: Simon Kirby
Date: Mon, 9 Jun 2003 00:13:30 -0700

On Mon, Jun 09, 2003 at 02:28:30AM -0400, CIT/Paul wrote:
> I am willing to test out any code/patches and settings that you can
> think of and post some results..

I'll see if I can set up a test bed this week. I think we should already be able to do close to this, but I'll let the numbers will do the talking. :)

BTW, ignoring juno, Robert Olsson has some pktgen hacks that allow it to generate new-dst-per-packet DoS-like traffic. It's much more effective than Juno-z. Robert, could you show these guys your hacks to do that?
Next, here is an interesting first pass patch to try. Once we hit gc_thresh, at every new DST allocation we try to shrink the destination hash chain. It ought to be very effective in the presence of poorly behaved traffic such as random-src-address DoS.

The patch is against 2.5.x current...

The next task is to try and handle rt_cache_flush more cheaply, given Simon's mention that he gets from 10 to 20 BGP updates per minute. Another idea to this dilemma is maybe to see if Zebra can batch things a little bit... but that kind of solution might not be possible since I don't know how that stuff works.

--- net/ipv4/route.c.~1~	Sun Jun 8 23:28:00 2003
+++ net/ipv4/route.c	Mon Jun 9 01:09:45 2003
@@ -882,6 +882,42 @@ static void rt_del(unsigned hash, struct
 	spin_unlock_bh(&rt_hash_table[hash].lock);
 }

+static void __rt_hash_shrink(unsigned int hash)
+{
+	struct rtable *rth, **rthp;
+	struct rtable *cand, **candp;
+	unsigned int min_use = ~(unsigned int) 0;
+
+	spin_lock_bh(&rt_hash_table[hash].lock);
+	cand = NULL;
+	candp = NULL;
+	rthp = &rt_hash_table[hash].chain;
+	while ((rth = *rthp) != NULL) {
+		if (!atomic_read(&rth->u.dst.__refcnt) &&
+		    ((unsigned int) rth->u.dst.__use) < min_use) {
+			cand = rth;
+			candp = rthp;
+			min_use = rth->u.dst.__use;
+		}
+		rthp = &rth->u.rt_next;
+	}
+	if (cand) {
+		*candp = cand->u.rt_next;
+		rt_free(cand);
+	}
+
+	spin_unlock_bh(&rt_hash_table[hash].lock);
+}
+
+static inline struct rtable *ip_rt_dst_alloc(unsigned int hash)
+{
+	if (atomic_read(&ipv4_dst_ops.entries) >
+	    ipv4_dst_ops.gc_thresh)
+		__rt_hash_shrink(hash);
+
+	return dst_alloc(&ipv4_dst_ops);
+}
+
 void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 		    u32 saddr, u8 tos, struct net_device *dev)
 {
@@ -912,9 +948,10 @@ void ip_rt_redirect(u32 old_gw, u32 dadd

 	for (i = 0; i < 2; i++) {
 		for (k = 0; k < 2; k++) {
-			unsigned hash = rt_hash_code(daddr,
-						     skeys[i] ^ (ikeys[k] << 5),
-						     tos);
+			unsigned int hash = rt_hash_code(daddr,
+							 skeys[i] ^
+							 (ikeys[k] << 5),
+							 tos);

 			rthp=&rt_hash_table[hash].chain;

@@ -942,7 +979,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
 				dst_hold(&rth->u.dst);
 				rcu_read_unlock();

-				rt = dst_alloc(&ipv4_dst_ops);
+				rt = ip_rt_dst_alloc(hash);
 				if (rt == NULL) {
 					ip_rt_put(rth);
 					in_dev_put(in_dev);
@@ -1352,7 +1389,7 @@ static void rt_set_nexthop(struct rtable
 static int ip_route_input_mc(struct sk_buff *skb, u32 daddr, u32 saddr,
 				u8 tos, struct net_device *dev, int our)
 {
-	unsigned hash;
+	unsigned int hash;
 	struct rtable *rth;
 	u32 spec_dst;
 	struct in_device *in_dev = in_dev_get(dev);
@@ -1375,7 +1412,9 @@ static int ip_route_input_mc(struct sk_b
 					dev, &spec_dst, &itag) < 0)
 		goto e_inval;

-	rth = dst_alloc(&ipv4_dst_ops);
+	hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos);
+
+	rth = ip_rt_dst_alloc(hash);
 	if (!rth)
 		goto e_nobufs;

@@ -1421,7 +1460,6 @@ static int ip_route_input_mc(struct sk_b
 	RT_CACHE_STAT_INC(in_slow_mc);

 	in_dev_put(in_dev);
-	hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos);
 	return rt_intern_hash(hash, rth, (struct rtable**) &skb->dst);

 e_nobufs:
@@ -1584,7 +1622,7 @@ int ip_route_input_slow(struct sk_buff *
 		goto e_inval;
 	}

-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = ip_rt_dst_alloc(hash);
 	if (!rth)
 		goto e_nobufs;

@@ -1663,7 +1701,7 @@ brd_input:
 	RT_CACHE_STAT_INC(in_brd);

 local_input:
-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = ip_rt_dst_alloc(hash);
 	if (!rth)
 		goto e_nobufs;

@@ -2048,7 +2086,10 @@ make_route:
 		}
 	}

-	rth = dst_alloc(&ipv4_dst_ops);
+	hash = rt_hash_code(oldflp->fl4_dst,
+			    oldflp->fl4_src ^ (oldflp->oif << 5), tos);
+
+	rth = ip_rt_dst_alloc(hash);
 	if (!rth)
 		goto e_nobufs;

@@ -2107,7 +2148,6 @@ make_route:

 	rth->rt_flags = flags;

-	hash = rt_hash_code(oldflp->fl4_dst, oldflp->fl4_src ^ (oldflp->oif << 5), tos);
 	err = rt_intern_hash(hash, rth, rp);
 done:
 	if (free_res)

From davem@redhat.com Mon Jun 9 02:04:21 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:04:25 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net
[216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5994L2x014390 for ; Mon, 9 Jun 2003 02:04:21 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA17503; Mon, 9 Jun 2003 02:01:16 -0700 Date: Mon, 09 Jun 2003 02:01:16 -0700 (PDT) Message-Id: <20030609.020116.10308258.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609081803.GF20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2986 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 01:18:03 -0700 10516 dst_alloc 73.0278 Gross, we effectively initialize a new dst multiple times :( In fact, we modify the same cache lines at least 3 times. There's a lot more we can do in this area. But this patch below kills some of it. Again, patch is against 2.5.x-current. 
Actually, it is a relatively good sign, it means this is a relatively unexplored area of the networking :-))) --- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003 +++ net/core/dst.c Mon Jun 9 01:53:41 2003 @@ -122,13 +122,31 @@ void * dst_alloc(struct dst_ops * ops) dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); if (!dst) return NULL; - memset(dst, 0, ops->entry_size); + dst->next = NULL; atomic_set(&dst->__refcnt, 0); - dst->ops = ops; + dst->__use = 0; + dst->child = NULL; + dst->dev = NULL; + dst->obsolete = 0; + dst->flags = 0; dst->lastuse = jiffies; + dst->expires = 0; + dst->header_len = 0; + dst->trailer_len = 0; + memset(dst->metrics, 0, sizeof(dst->metrics)); dst->path = dst; + dst->rate_last = 0; + dst->rate_tokens = 0; + dst->error = 0; + dst->neighbour = NULL; + dst->hh = NULL; + dst->xfrm = NULL; dst->input = dst_discard; dst->output = dst_blackhole; + dst->ops = ops; + INIT_RCU_HEAD(&dst->rcu_head); + memset(dst->info, 0, + ops->entry_size - offsetof(struct dst_entry, info)); #if RT_CACHE_DEBUG >= 2 atomic_inc(&dst_total); #endif From lpetande@morphine.tml.hut.fi Mon Jun 9 02:07:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:07:13 -0700 (PDT) Received: from tml-gw.tml.hut.fi (tml.hut.fi [130.233.44.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h599762x014739 for ; Mon, 9 Jun 2003 02:07:07 -0700 Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id MAA32560 for ; Mon, 9 Jun 2003 12:07:05 +0300 X-Authentication-Warning: tml-gw.tml.hut.fi: smap set sender to using -f Received: from mail.tml.hut.fi(130.233.45.70) by tml-gw.tml.hut.fi via smap (V2.0) id xma032548; Mon, 9 Jun 03 12:06:43 +0300 Received: from localhost (localhost [127.0.0.1]) by mail.tml.hut.fi (Postfix) with ESMTP id 7B84018C235; Mon, 9 Jun 2003 12:06:43 +0300 (EEST) Received: from mail.tml.hut.fi ([127.0.0.1]) by localhost (mail.tml.hut.fi [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 26944-01-3; Mon, 9 Jun 2003 12:06:42 +0300 (EEST) 
Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by mail.tml.hut.fi (Postfix) with ESMTP id AC40F18C233; Mon, 9 Jun 2003 12:06:42 +0300 (EEST) Received: from tml.hut.fi (localhost [127.0.0.1]) by morphine.tml.hut.fi (8.12.2+Sun/8.12.2) with ESMTP id h5996gF5025239; Mon, 9 Jun 2003 12:06:42 +0300 (EEST) Received: from localhost (lpetande@localhost) by tml.hut.fi (8.12.2+Sun/8.12.2/Submit) with ESMTP id h5996Zfr025236; Mon, 9 Jun 2003 12:06:41 +0300 (EEST) Date: Mon, 9 Jun 2003 12:06:35 +0300 (EEST) From: Henrik Petander To: Masahide NAKAMURA Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 In-Reply-To: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2987 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@morphine.tml.hut.fi Precedence: bulk X-list: netdev On Fri, 6 Jun 2003, Masahide NAKAMURA wrote: > > We don't think we have to change the logic handling policy with > the reason because we can treat MIPv6 policy just like IPsec. > > When we want to apply both MIPv6 and IPsec to the same target, > we need one policy that has two or more of templates(e.g. one is > MIPv6's template and the other is IPsec's). Does this also mean that the IPSec and MIPv6 policies and SAs need to be configured at the same time or is it possible to add templates to an existing policy? 
> > Regarding above case, however, we have a problem like below: > > draft(9.3.1 in draft-ietf-mobileip-ipv6-22) says, > > When attempting to verify AH authentication data in a packet that > contains a Home Address option, the receiving node MUST calculate > the AH authentication data as if the following were true: The Home > Address option contains the care-of address, and the source IPv6 > address field of the IPv6 header contains the home address. Yes, and this also applies to routing header types 0 and 2. They also need to be processed by AH so that the addresses are as the receiver sees them after processing the headers: home address in destination address and care-of address in the routing header. This is just not said in the mipv6 spec as the routing header IPSec interactions are not specified by it. > > Because xfrm decides to call dst_output in the order of templates, > at first we had no idea which is the former template, MIPv6 or IPsec(Home > Address Option or AH). MIPv6 headers should be added first for AH to work. A different issue related to the different addresses is that the SPD lookup should be done with the original source address, i.e. home address, if home address option is used and with the final destination address, if routing header is used. SPD lookup works now for TCP (with RT header), but not for raw sockets, which the mipv6 daemon will use. We will provide a patch for fixing the SPD lookups with raw sockets, which add routing header and home address option from socket options. 
Henrik ---------------------------------- Henrik Petander Helsinki University of Technology, GO/Core Project Henrik.Petander@hut.fi Office: +358 (0)9 451 5846 GSM: +358 (0)40 741 5248 ---------------------------------- From ak@suse.de Mon Jun 9 02:47:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 02:47:49 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h599lf2x019157 for ; Mon, 9 Jun 2003 02:47:42 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 1D6BA1483F; Mon, 9 Jun 2003 11:47:36 +0200 (MEST) Date: Mon, 9 Jun 2003 11:47:34 +0200 From: Andi Kleen To: "David S. Miller" Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress Message-ID: <20030609094734.GD2728@wotan.suse.de> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.020116.10308258.davem@redhat.com> X-archive-position: 2988 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 02:01:16AM -0700, David S. Miller wrote: > From: Simon Kirby > Date: Mon, 9 Jun 2003 01:18:03 -0700 > > 10516 dst_alloc 73.0278 > > Gross, we effectively initialize a new dst multiple times :( > In fact, we modify the same cache lines at least 3 times. > > There's a lot more we can do in this area. But this patch below kills > some of it. Again, patch is against 2.5.x-current. 
> > Actually, it is a relatively good sign, it means this is a relatively > unexplored area of the networking :-))) > > --- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003 > +++ net/core/dst.c Mon Jun 9 01:53:41 2003 > @@ -122,13 +122,31 @@ void * dst_alloc(struct dst_ops * ops) > dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); > if (!dst) > return NULL; > - memset(dst, 0, ops->entry_size); > + dst->next = NULL; > atomic_set(&dst->__refcnt, 0); > - dst->ops = ops; > + dst->__use = 0; > + dst->child = NULL; > + dst->dev = NULL; > + dst->obsolete = 0; > + dst->flags = 0; > dst->lastuse = jiffies; > + dst->expires = 0; > + dst->header_len = 0; > + dst->trailer_len = 0; > + memset(dst->metrics, 0, sizeof(dst->metrics)); gcc will generate a lot better code for the memsets if you can tell it somehow they are long aligned and a multiple of 8 bytes. e.g. redeclare them as long instead of char. If it cannot figure out the alignment it often (or least on x86) calls to the external memset function. > dst->path = dst; > + dst->rate_last = 0; > + dst->rate_tokens = 0; > + dst->error = 0; > + dst->neighbour = NULL; > + dst->hh = NULL; > + dst->xfrm = NULL; > dst->input = dst_discard; > dst->output = dst_blackhole; > + dst->ops = ops; > + INIT_RCU_HEAD(&dst->rcu_head); > + memset(dst->info, 0, > + ops->entry_size - offsetof(struct dst_entry, info)); Same here. 
-Andi From davem@redhat.com Mon Jun 9 03:06:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:06:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59A6e2x020087 for ; Mon, 9 Jun 2003 03:06:40 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17659; Mon, 9 Jun 2003 03:03:35 -0700 Date: Mon, 09 Jun 2003 03:03:34 -0700 (PDT) Message-Id: <20030609.030334.02284330.davem@redhat.com> To: ak@suse.de Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609094734.GD2728@wotan.suse.de> References: <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> <20030609094734.GD2728@wotan.suse.de> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2989 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Andi Kleen Date: Mon, 9 Jun 2003 11:47:34 +0200 gcc will generate a lot better code for the memsets if you can tell it somehow they are long aligned and a multiple of 8 bytes. True, but the real bug is that we're initializing any of this crap here at all. Right now we write over the same cachelines 3 or so times. It should really just happen once. 
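Read as an illustration only, the point made above (write each field once, rather than clearing the object and then overwriting parts of it) might look like this in plain C. The struct toy_dst type, its fields and the helper names are invented for the example and are far smaller than the kernel's struct dst_entry; whether the compiler really emits a single pass of stores is up to the compiler, so this shows the source-level shape rather than a measured fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for dst_ops and dst_entry. */
struct toy_ops {
	size_t entry_size;
};

struct toy_dst {
	struct toy_dst *next;
	struct toy_dst *path;
	const struct toy_ops *ops;
	unsigned long lastuse;
	int refcnt;
	int (*input)(void);
	unsigned int metrics[4];
};

static int toy_discard(void)
{
	return 0;
}

/*
 * One assignment names every field that needs a non-zero value; all
 * members not named in the compound literal are zero-initialized by
 * the language, so no separate memset() pass is written.
 */
static void toy_dst_init_once(struct toy_dst *dst, const struct toy_ops *ops,
			      unsigned long now)
{
	*dst = (struct toy_dst){
		.path    = dst,
		.ops     = ops,
		.lastuse = now,
		.input   = toy_discard,
	};
}

/* Small self-check: start from garbage and confirm every field is set. */
static void toy_dst_demo(void)
{
	struct toy_ops ops = { sizeof(struct toy_dst) };
	struct toy_dst d;

	memset(&d, 0xff, sizeof(d));        /* simulate a recycled slab object */
	toy_dst_init_once(&d, &ops, 1234UL);

	assert(d.next == NULL);
	assert(d.path == &d);
	assert(d.ops == &ops);
	assert(d.lastuse == 1234UL);
	assert(d.refcnt == 0);
	assert(d.input == toy_discard);
	assert(d.metrics[0] == 0 && d.metrics[3] == 0);
}
```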

From ak@suse.de Mon Jun 9 03:13:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:13:14 -0700 (PDT)
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AD82x020428 for ; Mon, 9 Jun 2003 03:13:09 -0700
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 49DA51480C; Mon, 9 Jun 2003 12:13:03 +0200 (MEST)
Date: Mon, 9 Jun 2003 12:13:02 +0200
From: Andi Kleen
To: "David S. Miller"
Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se
Subject: Re: Route cache performance under stress
Message-ID: <20030609101302.GA9643@wotan.suse.de>
References: <20030609081803.GF20613@netnation.com> <20030609.020116.10308258.davem@redhat.com> <20030609094734.GD2728@wotan.suse.de> <20030609.030334.02284330.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030609.030334.02284330.davem@redhat.com>
X-archive-position: 2990
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev

On Mon, Jun 09, 2003 at 03:03:34AM -0700, David S. Miller wrote:
>    From: Andi Kleen
>    Date: Mon, 9 Jun 2003 11:47:34 +0200
>
>    gcc will generate a lot better code for the memsets if you can tell
>    it somehow they are long aligned and a multiple of 8 bytes.
>
> True, but the real bug is that we're initializing any of this
> crap here at all. Right now we write over the same cachelines
> 3 or so times. It should really just happen once.

It's unlikely to be the reason for the profile hit on a modern x86.
They are all really fast at reading/writing L1.

More likely it is the cache miss for fetching the lines initially.
Perhaps it is cache thrashing the dst_entry heads.
Adding a strategic prefetch somewhere early may help a lot.

-Andi

From davem@redhat.com Mon Jun 9 03:16:46 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:16:50 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AGj2x020736 for ; Mon, 9 Jun 2003 03:16:46 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17734; Mon, 9 Jun 2003 03:13:41 -0700
Date: Mon, 09 Jun 2003 03:13:41 -0700 (PDT)
Message-Id: <20030609.031341.77044985.davem@redhat.com>
To: ak@suse.de
Cc: sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609101302.GA9643@wotan.suse.de>
References: <20030609094734.GD2728@wotan.suse.de> <20030609.030334.02284330.davem@redhat.com> <20030609101302.GA9643@wotan.suse.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2991
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Andi Kleen
   Date: Mon, 9 Jun 2003 12:13:02 +0200

   On Mon, Jun 09, 2003 at 03:03:34AM -0700, David S. Miller wrote:
   > True, but the real bug is that we're initializing any of this
   > crap here at all. Right now we write over the same cachelines
   > 3 or so times. It should really just happen once.

   It's unlikely to be the reason for the profile hit on a modern x86.
   They are all really fast at reading/writing L1.

It's store buffer compression that's being messed up. I've seen this
on just about any processor.
This is also why the net/core/skbuff.c initialization hacks are so
effective as well.

Trust me, this has every symptom of excess store buffer traffic :)

From yoshfuji@wide.ad.jp Mon Jun 9 03:40:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:40:26 -0700 (PDT)
Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59AeH2x021150 for ; Mon, 9 Jun 2003 03:40:18 -0700
Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h59AemBo007231; Mon, 9 Jun 2003 19:40:49 +0900
Date: Mon, 09 Jun 2003 19:40:46 +0900 (JST)
Message-Id: <20030609.194046.29425359.yoshfuji@wide.ad.jp>
To: davem@redhat.com
Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se
Subject: Re: Route cache performance under stress
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To: <20030609.031341.77044985.davem@redhat.com>
References: <20030609.030334.02284330.davem@redhat.com> <20030609101302.GA9643@wotan.suse.de> <20030609.031341.77044985.davem@redhat.com>
X-URL: http://www.yoshifuji.org/%7Ehideaki/
X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc
X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2992
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: yoshfuji@wide.ad.jp
Precedence: bulk
X-list: netdev

In article <20030609.031341.77044985.davem@redhat.com> (at Mon, 09 Jun 2003 03:13:41 -0700 (PDT)), "David S. Miller" says:

>    It's unlikely to be the reason for the profile hit on a modern x86.
>    They are all really fast at reading/writing L1.
:
> This is also why the net/core/skbuff.c initialization hacks are so
> effective as well.
>
> Trust me, this has every symptom of excess store buffer traffic :)

Ok, how about this?

Index: linux25/include/net/dst.h
===================================================================
RCS file: /cvsroot/usagi/usagi/kernel/linux25/include/net/dst.h,v
retrieving revision 1.7
diff -u -r1.7 dst.h
--- linux25/include/net/dst.h	20 Apr 2003 14:55:48 -0000	1.7
+++ linux25/include/net/dst.h	9 Jun 2003 10:26:30 -0000
@@ -38,7 +38,7 @@
 struct dst_entry
 {
 	struct dst_entry        *next;
-	atomic_t		__refcnt;	/* client references	*/
+
 	int			__use;
 	struct dst_entry	*child;
 	struct net_device       *dev;
@@ -48,14 +48,12 @@
 #define DST_NOXFRM		2
 #define DST_NOPOLICY		4
 #define DST_NOHASH		8
-	unsigned long		lastuse;
 	unsigned long		expires;
 
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
 
 	u32			metrics[RTAX_MAX];
-	struct dst_entry	*path;
 
 	unsigned long		rate_last;	/* rate limiting for ICMP */
 	unsigned long		rate_tokens;
@@ -66,16 +64,24 @@
 	struct hh_cache		*hh;
 	struct xfrm_state	*xfrm;
 
-	int			(*input)(struct sk_buff*);
-	int			(*output)(struct sk_buff*);
-
 #ifdef CONFIG_NET_CLS_ROUTE
 	__u32			tclassid;
 #endif
 
-	struct  dst_ops	        *ops;
 	struct rcu_head		rcu_head;
-		
+
+	/* These elements should be at the end of dst_entry{};
+	 * see net/core/dst.c:dst_alloc() -- yoshfuji */
+	u32			__dst_memset_tail[0];
+
+	atomic_t		__refcnt;	/* client references	*/
+	unsigned long		lastuse;
+
+	struct dst_entry	*path;
+	int			(*input)(struct sk_buff*);
+	int			(*output)(struct sk_buff*);
+	struct  dst_ops	        *ops;
+
 	char			info[0];
 };
 
Index: linux25/net/core/dst.c
===================================================================
RCS file: /cvsroot/usagi/usagi/kernel/linux25/net/core/dst.c,v
retrieving revision 1.1.1.9
diff -u -r1.1.1.9 dst.c
--- linux25/net/core/dst.c	27 May 2003 02:59:54 -0000	1.1.1.9
+++ linux25/net/core/dst.c	9 Jun 2003 10:26:30 -0000
@@ -122,13 +122,16 @@
 	dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC);
 	if (!dst)
 		return NULL;
-	memset(dst, 0, ops->entry_size);
+	memset(dst, 0, offsetof(struct dst_entry, __dst_memset_tail));
 	atomic_set(&dst->__refcnt, 0);
-	dst->ops = ops;
 	dst->lastuse = jiffies;
 	dst->path = dst;
 	dst->input = dst_discard;
 	dst->output = dst_blackhole;
+	dst->ops = ops;
+	if (ops->entry_size > offsetof(struct dst_entry, info))
+		memset(&dst->info, 0, ops->entry_size - offsetof(struct dst_entry, info));
+
 #if RT_CACHE_DEBUG >= 2 
 	atomic_inc(&dst_total);
 #endif

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From davem@redhat.com Mon Jun 9 03:43:50 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 03:43:53 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Ahn2x021455 for ; Mon, 9 Jun 2003 03:43:50 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA17839; Mon, 9 Jun 2003 03:40:40 -0700
Date: Mon, 09 Jun 2003 03:40:39 -0700 (PDT)
Message-Id: <20030609.034039.26980950.davem@redhat.com>
To: yoshfuji@wide.ad.jp
Cc: ak@suse.de, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru, Robert.Olsson@data.slu.se
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609.194046.29425359.yoshfuji@wide.ad.jp>
References: <20030609101302.GA9643@wotan.suse.de> <20030609.031341.77044985.davem@redhat.com> <20030609.194046.29425359.yoshfuji@wide.ad.jp>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 2993
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Mon, 09 Jun 2003 19:40:46 +0900 (JST)

   Ok, how about this?

The memset_tail thing is unnecessary, and better to put
the non-zero objects at the beginning then you can go.

	memset(dst->${FIRST_ZERO_MEMBER}, 0,
	       ops->entry_size - offsetof(struct dst_entry, ${FIRST_ZERO_MEMBER}));

But even _THIS_ is stupid. All this initialization really should move
to caller. We can provide a "dst_init()" helper for protocols that
don't want to bother optimizing this.

From hadi@shell.cyberus.ca Mon Jun 9 04:39:21 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:39:33 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59BdK2x022548 for ; Mon, 9 Jun 2003 04:39:21 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PKz7-0008yC-0u; Mon, 09 Jun 2003 07:38:45 -0400
Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT)
From: Jamal Hadi
To: CIT/Paul
cc: "'Simon Kirby'" , "'David S.
Miller'" , fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: RE: Route cache performance under stress
In-Reply-To: <000401c32e5e$a707b6d0$4a00000a@badass>
Message-ID: <20030609072227.R34462@shell.cyberus.ca>
References: <000401c32e5e$a707b6d0$4a00000a@badass>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2994
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003, CIT/Paul wrote:

> I've got juno-z.101f.c to send 500,000 pps at 300+mbit on our dual p3
> 1.26 ghz routers.. I can't even send 50mbit of this though one of my
> routers
> Without it using 100% of both cpus because of the route cache.. It goes
> up to 500,000 entries if I let it and it adds 80,000 new entries per
> second and they are all cache misses.. I'd be glad to show you the setup
> sometime :) I showed it to jamal and we tested some stuff.
>

Yes, you have a nice setup and that's why you should test all the
patches DaveM is posting. Dave, Paul is running in a real ISP
environment; i think he is very valuable in helping to test these
patches and collect any stats that might be needed. Now watch him
disappear ;->

BTW, re: BGP, someone should fix zebra to do batching if it doesn't do
it already (i saw that in one of the emails). In addition, arp all the
nexthops right before installing the entries in the FIB. Repeat the arp
every X timeout. Nexthops failing ARP should be removed.
That should give you something close to what i think CEF was designed
for, i.e. when the packets get to us, part of the route is resolved
already.

Additional thought Dave: i think prefetching the rth would help in 2.5
at least when you have lotsa collisions.
call prefetch(nextrth) right after smp_read_barrier_depends() everywhere
in route.c

cheers,
jamal

PS:- this is one of those fun times i wish i had a setup and time ;->

From vnuorval@tcs.hut.fi Mon Jun 9 04:53:12 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:53:34 -0700 (PDT)
Received: from saturn.tcs.hut.fi (root@saturn.tcs.hut.fi [130.233.215.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59BrA2x022952 for ; Mon, 9 Jun 2003 04:53:11 -0700
Received: from rhea.tcs.hut.fi (really [130.233.215.147]) by tcs.hut.fi via smail with esmtp id (Debian Smail3.2.0.102) for ; Mon, 9 Jun 2003 14:43:18 +0300 (EEST)
Received: from rhea.tcs.hut.fi (localhost [127.0.0.1]) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h59BhHjH014978; Mon, 9 Jun 2003 14:43:17 +0300
Received: from localhost (vnuorval@localhost) by rhea.tcs.hut.fi (8.12.3/8.12.3/Debian-5) with ESMTP id h59BhBQ7014972; Mon, 9 Jun 2003 14:43:11 +0300
Date: Mon, 9 Jun 2003 14:43:11 +0300 (EEST)
From: Ville Nuorvala
To: "David S. Miller"
cc: kuznet@ms2.inr.ac.ru, , , , , ,
Subject: ipv6 tunnel patch (was: Re: [patch]: ipv6 tunnel for MIPv6)
In-Reply-To: <20030607.033059.48393210.davem@redhat.com>
Message-ID:
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="-377318441-1375410448-1055158991=:13811"
X-archive-position: 2995
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: vnuorval@tcs.hut.fi
Precedence: bulk
X-list: netdev

  This message is in MIME format. The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

---377318441-1375410448-1055158991=:13811
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Sat, 7 Jun 2003, David S.
Miller wrote: > Looks ok, but sorry two things need to be fixed up first: > > 1) Doesn't apply anymore, I think it's because of the > struct sock member renames, just replace sk->foo > with sk->sk_foo > Done... > 2) Just export all those routines from net/ipv6/ipv6_syms.c > always, remove the ifdefs. ...and done! > > I promise to apply it after you fix this stuff up :))) Ok here's the last revision of the patch :) It's done against ChangeSet 1.1308. Also available at: http://www.mipl.mediapoli.com/patches/ip6-tunnel-r3.patch Thanks! -Ville -- Ville Nuorvala Research Assistant, Institute of Digital Communications, Helsinki University of Technology email: vnuorval@tcs.hut.fi, phone: +358 (0)9 451 5257 ---377318441-1375410448-1055158991=:13811 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="ip6-tunnel-r3.patch" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="ip6-tunnel-r3.patch" ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvaW5jbHVkZS9saW51eC9p Zl9hcnAuaCBtZXJnZS0yLjUvaW5jbHVkZS9saW51eC9pZl9hcnAuaA0KLS0t IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lmX2FycC5oCVdlZCBKdW4gIDQg MTM6NDM6MDMgMjAwMw0KKysrIG1lcmdlLTIuNS9pbmNsdWRlL2xpbnV4L2lm X2FycC5oCU1vbiBKdW4gIDkgMTA6MTQ6MjQgMjAwMw0KQEAgLTYwLDcgKzYw LDcgQEANCiAjZGVmaW5lIEFSUEhSRF9SQVdIRExDCTUxOAkJLyogUmF3IEhE TEMJCQkqLw0KIA0KICNkZWZpbmUgQVJQSFJEX1RVTk5FTAk3NjgJCS8qIElQ SVAgdHVubmVsCQkJKi8NCi0jZGVmaW5lIEFSUEhSRF9UVU5ORUw2CTc2OQkJ LyogSVBJUDYgdHVubmVsCQkJKi8NCisjZGVmaW5lIEFSUEhSRF9UVU5ORUw2 CTc2OQkJLyogSVA2SVA2IHR1bm5lbCAgICAgICAJCSovDQogI2RlZmluZSBB UlBIUkRfRlJBRAk3NzAgICAgICAgICAgICAgLyogRnJhbWUgUmVsYXkgQWNj ZXNzIERldmljZSAgICAqLw0KICNkZWZpbmUgQVJQSFJEX1NLSVAJNzcxCQkv KiBTS0lQIHZpZgkJCSovDQogI2RlZmluZSBBUlBIUkRfTE9PUEJBQ0sJNzcy CQkvKiBMb29wYmFjayBkZXZpY2UJCSovDQpkaWZmIC1OdXIgLS1leGNsdWRl PVNDQ1MgLS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0 IGxpbnV4LTIuNS9pbmNsdWRlL2xpbnV4L2lwNl90dW5uZWwuaCBtZXJnZS0y 
LjUvaW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgNCi0tLSBsaW51eC0yLjUv aW5jbHVkZS9saW51eC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDow MCAxOTcwDQorKysgbWVyZ2UtMi41L2luY2x1ZGUvbGludXgvaXA2X3R1bm5l bC5oCU1vbiBKdW4gIDkgMTA6MTQ6MjQgMjAwMw0KQEAgLTAsMCArMSwzMiBA QA0KKy8qDQorICogJElkJA0KKyAqLw0KKw0KKyNpZm5kZWYgX0lQNl9UVU5O RUxfSA0KKyNkZWZpbmUgX0lQNl9UVU5ORUxfSA0KKw0KKyNkZWZpbmUgSVBW Nl9UTFZfVE5MX0VOQ0FQX0xJTUlUIDQNCisjZGVmaW5lIElQVjZfREVGQVVM VF9UTkxfRU5DQVBfTElNSVQgNA0KKw0KKy8qIGRvbid0IGFkZCBlbmNhcHN1 bGF0aW9uIGxpbWl0IGlmIG9uZSBpc24ndCBwcmVzZW50IGluIGlubmVyIHBh Y2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX0lHTl9FTkNBUF9MSU1JVCAw eDENCisvKiBjb3B5IHRoZSB0cmFmZmljIGNsYXNzIGZpZWxkIGZyb20gdGhl IGlubmVyIHBhY2tldCAqLw0KKyNkZWZpbmUgSVA2X1ROTF9GX1VTRV9PUklH X1RDTEFTUyAweDINCisvKiBjb3B5IHRoZSBmbG93bGFiZWwgZnJvbSB0aGUg aW5uZXIgcGFja2V0ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfVVNFX09SSUdf RkxPV0xBQkVMIDB4NA0KKy8qIGJlaW5nIHVzZWQgZm9yIE1vYmlsZSBJUHY2 ICovDQorI2RlZmluZSBJUDZfVE5MX0ZfTUlQNl9ERVYgMHg4DQorDQorc3Ry dWN0IGlwNl90bmxfcGFybSB7DQorCWNoYXIgbmFtZVtJRk5BTVNJWl07CS8q IG5hbWUgb2YgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgbGluazsJCS8qIGlm aW5kZXggb2YgdW5kZXJseWluZyBMMiBpbnRlcmZhY2UgKi8NCisJX191OCBw cm90bzsJCS8qIHR1bm5lbCBwcm90b2NvbCAqLw0KKwlfX3U4IGVuY2FwX2xp bWl0OwkvKiBlbmNhcHN1bGF0aW9uIGxpbWl0IGZvciB0dW5uZWwgKi8NCisJ X191OCBob3BfbGltaXQ7CQkvKiBob3AgbGltaXQgZm9yIHR1bm5lbCAqLw0K KwlfX3UzMiBmbG93aW5mbzsJCS8qIHRyYWZmaWMgY2xhc3MgYW5kIGZsb3ds YWJlbCBmb3IgdHVubmVsICovDQorCV9fdTMyIGZsYWdzOwkJLyogdHVubmVs IGZsYWdzICovDQorCXN0cnVjdCBpbjZfYWRkciBsYWRkcjsJLyogbG9jYWwg dHVubmVsIGVuZC1wb2ludCBhZGRyZXNzICovDQorCXN0cnVjdCBpbjZfYWRk ciByYWRkcjsJLyogcmVtb3RlIHR1bm5lbCBlbmQtcG9pbnQgYWRkcmVzcyAq Lw0KK307DQorDQorI2VuZGlmDQpkaWZmIC1OdXIgLS1leGNsdWRlPVNDQ1Mg LS1leGNsdWRlPUJpdEtlZXBlciAtLWV4Y2x1ZGU9Q2hhbmdlU2V0IGxpbnV4 LTIuNS9pbmNsdWRlL25ldC9pcDZfdHVubmVsLmggbWVyZ2UtMi41L2luY2x1 ZGUvbmV0L2lwNl90dW5uZWwuaA0KLS0tIGxpbnV4LTIuNS9pbmNsdWRlL25l dC9pcDZfdHVubmVsLmgJVGh1IEphbiAgMSAwMjowMDowMCAxOTcwDQorKysg 
bWVyZ2UtMi41L2luY2x1ZGUvbmV0L2lwNl90dW5uZWwuaAlNb24gSnVuICA5 IDEwOjE0OjI0IDIwMDMNCkBAIC0wLDAgKzEsNDQgQEANCisvKg0KKyAqICRJ ZCQNCisgKi8NCisNCisjaWZuZGVmIF9ORVRfSVA2X1RVTk5FTF9IDQorI2Rl ZmluZSBfTkVUX0lQNl9UVU5ORUxfSA0KKw0KKyNpbmNsdWRlIDxsaW51eC9p cHY2Lmg+DQorI2luY2x1ZGUgPGxpbnV4L25ldGRldmljZS5oPg0KKyNpbmNs dWRlIDxsaW51eC9pcDZfdHVubmVsLmg+DQorDQorLyogY2FwYWJsZSBvZiBz ZW5kaW5nIHBhY2tldHMgKi8NCisjZGVmaW5lIElQNl9UTkxfRl9DQVBfWE1J VCAweDEwMDAwDQorLyogY2FwYWJsZSBvZiByZWNlaXZpbmcgcGFja2V0cyAq Lw0KKyNkZWZpbmUgSVA2X1ROTF9GX0NBUF9SQ1YgMHgyMDAwMA0KKw0KKyNk ZWZpbmUgSVA2X1ROTF9NQVggMTI4DQorDQorLyogSVB2NiB0dW5uZWwgKi8N CisNCitzdHJ1Y3QgaXA2X3RubCB7DQorCXN0cnVjdCBpcDZfdG5sICpuZXh0 OwkvKiBuZXh0IHR1bm5lbCBpbiBsaXN0ICovDQorCXN0cnVjdCBuZXRfZGV2 aWNlICpkZXY7CS8qIHZpcnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0 dW5uZWwgKi8NCisJc3RydWN0IG5ldF9kZXZpY2Vfc3RhdHMgc3RhdDsJLyog c3RhdGlzdGljcyBmb3IgdHVubmVsIGRldmljZSAqLw0KKwlpbnQgcmVjdXJz aW9uOwkJLyogZGVwdGggb2YgaGFyZF9zdGFydF94bWl0IHJlY3Vyc2lvbiAq Lw0KKwlzdHJ1Y3QgaXA2X3RubF9wYXJtIHBhcm1zOwkvKiB0dW5uZWwgY29u ZmlndXJhdGlvbiBwYXJhbXRlcnMgKi8NCisJc3RydWN0IGZsb3dpIGZsOwkv KiBmbG93aSB0ZW1wbGF0ZSBmb3IgeG1pdCAqLw0KK307DQorDQorLyogVHVu bmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgZGVzdGluYXRpb24gc3ViLW9wdGlv biAqLw0KKw0KK3N0cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSB7DQorCV9f dTggdHlwZTsJCS8qIHR5cGUtY29kZSBmb3Igb3B0aW9uICAgICAgICAgKi8N CisJX191OCBsZW5ndGg7CQkvKiBvcHRpb24gbGVuZ3RoICAgICAgICAgICAg ICAgICovDQorCV9fdTggZW5jYXBfbGltaXQ7CS8qIHR1bm5lbCBlbmNhcHN1 bGF0aW9uIGxpbWl0ICAgKi8NCit9IF9fYXR0cmlidXRlX18gKChwYWNrZWQp KTsNCisNCisjaWZkZWYgX19LRVJORUxfXw0KKyNpZmRlZiBDT05GSUdfSVBW Nl9UVU5ORUwNCitleHRlcm4gaW50IF9faW5pdCBpcDZfdHVubmVsX2luaXQo dm9pZCk7DQorZXh0ZXJuIHZvaWQgaXA2X3R1bm5lbF9jbGVhbnVwKHZvaWQp Ow0KKyNlbmRpZg0KKyNlbmRpZg0KKyNlbmRpZg0KZGlmZiAtTnVyIC0tZXhj bHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5n ZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvS2NvbmZpZyBtZXJnZS0yLjUvbmV0 L2lwdjYvS2NvbmZpZw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9LY29uZmln 
CU1vbiBKdW4gIDkgMDk6MTE6MjQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQv aXB2Ni9LY29uZmlnCU1vbiBKdW4gIDkgMTA6MTQ6MjcgMjAwMw0KQEAgLTU1 LDQgKzU1LDEyIEBADQogDQogCSAgSWYgdW5zdXJlLCBzYXkgWS4NCiANCitj b25maWcgSVBWNl9UVU5ORUwNCisJdHJpc3RhdGUgIklQdjY6IElQdjYtaW4t SVB2NiB0dW5uZWwiDQorCWRlcGVuZHMgb24gSVBWNg0KKwktLS1oZWxwLS0t DQorCSAgU3VwcG9ydCBmb3IgSVB2Ni1pbi1JUHY2IHR1bm5lbHMgZGVzY3Jp YmVkIGluIFJGQyAyNDczLg0KKw0KKwkgIElmIHVuc3VyZSwgc2F5IE4uDQor DQogc291cmNlICJuZXQvaXB2Ni9uZXRmaWx0ZXIvS2NvbmZpZyINCmRpZmYg LU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2VlcGVyIC0tZXhj bHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L01ha2VmaWxlIG1l cmdlLTIuNS9uZXQvaXB2Ni9NYWtlZmlsZQ0KLS0tIGxpbnV4LTIuNS9uZXQv aXB2Ni9NYWtlZmlsZQlXZWQgSnVuICA0IDEzOjQzOjA2IDIwMDMNCisrKyBt ZXJnZS0yLjUvbmV0L2lwdjYvTWFrZWZpbGUJTW9uIEp1biAgOSAxMDoxNDoy NyAyMDAzDQpAQCAtMTUsMyArMTUsNSBAQA0KIG9iai0kKENPTkZJR19JTkVU Nl9FU1ApICs9IGVzcDYubw0KIG9iai0kKENPTkZJR19JTkVUNl9JUENPTVAp ICs9IGlwY29tcDYubw0KIG9iai0kKENPTkZJR19ORVRGSUxURVIpCSs9IG5l dGZpbHRlci8NCisNCitvYmotJChDT05GSUdfSVBWNl9UVU5ORUwpICs9IGlw Nl90dW5uZWwubw0KZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVk ZT1CaXRLZWVwZXIgLS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0 L2lwdjYvYWZfaW5ldDYuYyBtZXJnZS0yLjUvbmV0L2lwdjYvYWZfaW5ldDYu Yw0KLS0tIGxpbnV4LTIuNS9uZXQvaXB2Ni9hZl9pbmV0Ni5jCU1vbiBKdW4g IDkgMDk6MTE6MjQgMjAwMw0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9hZl9p bmV0Ni5jCU1vbiBKdW4gIDkgMTA6MTQ6MzYgMjAwMw0KQEAgLTU3LDYgKzU3 LDkgQEANCiAjaW5jbHVkZSA8bmV0L3RyYW5zcF92Ni5oPg0KICNpbmNsdWRl IDxuZXQvaXA2X3JvdXRlLmg+DQogI2luY2x1ZGUgPG5ldC9hZGRyY29uZi5o Pg0KKyNpZiBDT05GSUdfSVBWNl9UVU5ORUwNCisjaW5jbHVkZSA8bmV0L2lw Nl90dW5uZWwuaD4NCisjZW5kaWYNCiANCiAjaW5jbHVkZSA8YXNtL3VhY2Nl c3MuaD4NCiAjaW5jbHVkZSA8YXNtL3N5c3RlbS5oPg0KQEAgLTc3Niw2ICs3 NzksMTEgQEANCiAJZXJyID0gbmRpc2NfaW5pdCgmaW5ldDZfZmFtaWx5X29w cyk7DQogCWlmIChlcnIpDQogCQlnb3RvIG5kaXNjX2ZhaWw7DQorI2lmZGVm IENPTkZJR19JUFY2X1RVTk5FTA0KKwllcnIgPSBpcDZfdHVubmVsX2luaXQo KTsNCisJaWYgKGVycikNCisJCWdvdG8gaXA2X3R1bm5lbF9mYWlsOw0KKyNl 
bmRpZg0KIAllcnIgPSBpZ21wNl9pbml0KCZpbmV0Nl9mYW1pbHlfb3BzKTsN CiAJaWYgKGVycikNCiAJCWdvdG8gaWdtcF9mYWlsOw0KQEAgLTgzMCw2ICs4 MzgsMTAgQEANCiAJaWdtcDZfY2xlYW51cCgpOw0KICNlbmRpZg0KIGlnbXBf ZmFpbDoNCisjaWZkZWYgQ09ORklHX0lQVjZfVFVOTkVMDQorCWlwNl90dW5u ZWxfY2xlYW51cCgpOw0KK2lwNl90dW5uZWxfZmFpbDoNCisjZW5kaWYNCiAJ bmRpc2NfY2xlYW51cCgpOw0KIG5kaXNjX2ZhaWw6DQogCWljbXB2Nl9jbGVh bnVwKCk7DQpAQCAtODY1LDYgKzg3Nyw5IEBADQogCWlwNl9yb3V0ZV9jbGVh bnVwKCk7DQogCWlwdjZfcGFja2V0X2NsZWFudXAoKTsNCiAJaWdtcDZfY2xl YW51cCgpOw0KKyNpZmRlZiBDT05GSUdfSVBWNl9UVU5ORUwNCisJaXA2X3R1 bm5lbF9jbGVhbnVwKCk7DQorI2VuZGlmDQogCW5kaXNjX2NsZWFudXAoKTsN CiAJaWNtcHY2X2NsZWFudXAoKTsNCiAjaWZkZWYgQ09ORklHX1NZU0NUTA0K ZGlmZiAtTnVyIC0tZXhjbHVkZT1TQ0NTIC0tZXhjbHVkZT1CaXRLZWVwZXIg LS1leGNsdWRlPUNoYW5nZVNldCBsaW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1 bm5lbC5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVsLmMNCi0tLSBs aW51eC0yLjUvbmV0L2lwdjYvaXA2X3R1bm5lbC5jCVRodSBKYW4gIDEgMDI6 MDA6MDAgMTk3MA0KKysrIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcDZfdHVubmVs LmMJTW9uIEp1biAgOSAxMDozOTo1MCAyMDAzDQpAQCAtMCwwICsxLDEyNjEg QEANCisvKg0KKyAqCUlQdjYgb3ZlciBJUHY2IHR1bm5lbCBkZXZpY2UNCisg KglMaW51eCBJTkVUNiBpbXBsZW1lbnRhdGlvbg0KKyAqDQorICoJQXV0aG9y czoNCisgKglWaWxsZSBOdW9ydmFsYQkJPHZudW9ydmFsQHRjcy5odXQuZmk+ CQ0KKyAqDQorICoJJElkJA0KKyAqDQorICogICAgICBCYXNlZCBvbjoNCisg KiAgICAgIGxpbnV4L25ldC9pcHY2L3NpdC5jDQorICoNCisgKiAgICAgIFJG QyAyNDczDQorICoNCisgKglUaGlzIHByb2dyYW0gaXMgZnJlZSBzb2Z0d2Fy ZTsgeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yDQorICogICAgICBt b2RpZnkgaXQgdW5kZXIgdGhlIHRlcm1zIG9mIHRoZSBHTlUgR2VuZXJhbCBQ dWJsaWMgTGljZW5zZQ0KKyAqICAgICAgYXMgcHVibGlzaGVkIGJ5IHRoZSBG cmVlIFNvZnR3YXJlIEZvdW5kYXRpb247IGVpdGhlciB2ZXJzaW9uDQorICog ICAgICAyIG9mIHRoZSBMaWNlbnNlLCBvciAoYXQgeW91ciBvcHRpb24pIGFu eSBsYXRlciB2ZXJzaW9uLg0KKyAqDQorICovDQorDQorI2luY2x1ZGUgPGxp bnV4L2NvbmZpZy5oPg0KKyNpbmNsdWRlIDxsaW51eC9tb2R1bGUuaD4NCisj aW5jbHVkZSA8bGludXgvZXJybm8uaD4NCisjaW5jbHVkZSA8bGludXgvdHlw ZXMuaD4NCisjaW5jbHVkZSA8bGludXgvc29ja2V0Lmg+DQorI2luY2x1ZGUg 
PGxpbnV4L3NvY2tpb3MuaD4NCisjaW5jbHVkZSA8bGludXgvaWYuaD4NCisj aW5jbHVkZSA8bGludXgvaW4uaD4NCisjaW5jbHVkZSA8bGludXgvaXAuaD4N CisjaW5jbHVkZSA8bGludXgvaWZfdHVubmVsLmg+DQorI2luY2x1ZGUgPGxp bnV4L25ldC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pbjYuaD4NCisjaW5jbHVk ZSA8bGludXgvbmV0ZGV2aWNlLmg+DQorI2luY2x1ZGUgPGxpbnV4L2lmX2Fy cC5oPg0KKyNpbmNsdWRlIDxsaW51eC9pY21wdjYuaD4NCisjaW5jbHVkZSA8 bGludXgvaW5pdC5oPg0KKyNpbmNsdWRlIDxsaW51eC9yb3V0ZS5oPg0KKyNp bmNsdWRlIDxsaW51eC9ydG5ldGxpbmsuaD4NCisNCisjaW5jbHVkZSA8YXNt L3VhY2Nlc3MuaD4NCisjaW5jbHVkZSA8YXNtL2F0b21pYy5oPg0KKw0KKyNp bmNsdWRlIDxuZXQvaXAuaD4NCisjaW5jbHVkZSA8bmV0L3NvY2suaD4NCisj aW5jbHVkZSA8bmV0L2lwdjYuaD4NCisjaW5jbHVkZSA8bmV0L3Byb3RvY29s Lmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfcm91dGUuaD4NCisjaW5jbHVkZSA8 bmV0L2FkZHJjb25mLmg+DQorI2luY2x1ZGUgPG5ldC9pcDZfdHVubmVsLmg+ DQorDQorTU9EVUxFX0FVVEhPUigiVmlsbGUgTnVvcnZhbGEiKTsNCitNT0RV TEVfREVTQ1JJUFRJT04oIklQdjYtaW4tSVB2NiB0dW5uZWwiKTsNCitNT0RV TEVfTElDRU5TRSgiR1BMIik7DQorDQorI2RlZmluZSBJUFY2X1RMVl9URUxf RFNUX1NJWkUgOA0KKw0KKyNpZmRlZiBJUDZfVE5MX0RFQlVHDQorI2RlZmlu ZSBJUDZfVE5MX1RSQUNFKHguLi4pIHByaW50ayhLRVJOX0RFQlVHICIlczoi IHggIlxuIiwgX19GVU5DVElPTl9fKQ0KKyNlbHNlDQorI2RlZmluZSBJUDZf VE5MX1RSQUNFKHguLi4pIGRvIHs7fSB3aGlsZSgwKQ0KKyNlbmRpZg0KKw0K KyNkZWZpbmUgSVBWNl9UQ0xBU1NfTUFTSyAoSVBWNl9GTE9XSU5GT19NQVNL ICYgfklQVjZfRkxPV0xBQkVMX01BU0spDQorDQorLyogc29ja2V0KHMpIHVz ZWQgYnkgaXA2aXA2X3RubF94bWl0KCkgZm9yIHJlc2VuZGluZyBwYWNrZXRz ICovDQorc3RhdGljIHN0cnVjdCBzb2NrZXQgKl9faXA2X3NvY2tldFtOUl9D UFVTXTsNCisjZGVmaW5lIGlwNl9zb2NrZXQgX19pcDZfc29ja2V0W3NtcF9w cm9jZXNzb3JfaWQoKV0NCisNCitzdGF0aWMgdm9pZCBpcDZfeG1pdF9sb2Nr KHZvaWQpDQorew0KKwlsb2NhbF9iaF9kaXNhYmxlKCk7DQorCWlmICh1bmxp a2VseSghc3Bpbl90cnlsb2NrKCZpcDZfc29ja2V0LT5zay0+c2tfbG9jay5z bG9jaykpKQ0KKwkJQlVHKCk7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlwNl94 bWl0X3VubG9jayh2b2lkKQ0KK3sNCisJc3Bpbl91bmxvY2tfYmgoJmlwNl9z b2NrZXQtPnNrLT5za19sb2NrLnNsb2NrKTsNCit9DQorDQorI2RlZmluZSBI QVNIX1NJWkUgIDMyDQorDQorI2RlZmluZSBIQVNIKGFkZHIpICgoKGFkZHIp 
LT5zNl9hZGRyMzJbMF0gXiAoYWRkciktPnM2X2FkZHIzMlsxXSBeIFwNCisJ ICAgICAgICAgICAgIChhZGRyKS0+czZfYWRkcjMyWzJdIF4gKGFkZHIpLT5z Nl9hZGRyMzJbM10pICYgXA0KKyAgICAgICAgICAgICAgICAgICAgKEhBU0hf U0laRSAtIDEpKQ0KKw0KK3N0YXRpYyBpbnQgaXA2aXA2X2ZiX3RubF9kZXZf aW5pdChzdHJ1Y3QgbmV0X2RldmljZSAqZGV2KTsNCitzdGF0aWMgaW50IGlw NmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldik7DQor DQorLyogdGhlIElQdjYgdHVubmVsIGZhbGxiYWNrIGRldmljZSAqLw0KK3N0 YXRpYyBzdHJ1Y3QgbmV0X2RldmljZSBpcDZpcDZfZmJfdG5sX2RldiA9IHsN CisJLm5hbWUgPSAiaXA2dG5sMCIsDQorCS5pbml0ID0gaXA2aXA2X2ZiX3Ru bF9kZXZfaW5pdA0KK307DQorDQorLyogdGhlIElQdjYgZmFsbGJhY2sgdHVu bmVsICovDQorc3RhdGljIHN0cnVjdCBpcDZfdG5sIGlwNmlwNl9mYl90bmwg PSB7DQorCS5kZXYgPSAmaXA2aXA2X2ZiX3RubF9kZXYsDQorCS5wYXJtcyA9 ey5uYW1lID0gImlwNnRubDAiLCAucHJvdG8gPSBJUFBST1RPX0lQVjZ9DQor fTsNCisNCisvKiBsaXN0cyBmb3Igc3RvcmluZyB0dW5uZWxzIGluIHVzZSAq Lw0KK3N0YXRpYyBzdHJ1Y3QgaXA2X3RubCAqdG5sc19yX2xbSEFTSF9TSVpF XTsNCitzdGF0aWMgc3RydWN0IGlwNl90bmwgKnRubHNfd2NbMV07DQorc3Rh dGljIHN0cnVjdCBpcDZfdG5sICoqdG5sc1syXSA9IHsgdG5sc193YywgdG5s c19yX2wgfTsNCisNCisvKiBsb2NrIGZvciB0aGUgdHVubmVsIGxpc3RzICov DQorc3RhdGljIHJ3bG9ja190IGlwNmlwNl9sb2NrID0gUldfTE9DS19VTkxP Q0tFRDsNCisNCisvKioNCisgKiBpcDZpcDZfdG5sX2xvb2t1cCAtIGZldGNo IHR1bm5lbCBtYXRjaGluZyB0aGUgZW5kLXBvaW50IGFkZHJlc3Nlcw0KKyAq ICAgQHJlbW90ZTogdGhlIGFkZHJlc3Mgb2YgdGhlIHR1bm5lbCBleGl0LXBv aW50IA0KKyAqICAgQGxvY2FsOiB0aGUgYWRkcmVzcyBvZiB0aGUgdHVubmVs IGVudHJ5LXBvaW50IA0KKyAqDQorICogUmV0dXJuOiAgDQorICogICB0dW5u ZWwgbWF0Y2hpbmcgZ2l2ZW4gZW5kLXBvaW50cyBpZiBmb3VuZCwNCisgKiAg IGVsc2UgZmFsbGJhY2sgdHVubmVsIGlmIGl0cyBkZXZpY2UgaXMgdXAsIA0K KyAqICAgZWxzZSAlTlVMTA0KKyAqKi8NCisNCitzdHJ1Y3QgaXA2X3RubCAq DQoraXA2aXA2X3RubF9sb29rdXAoc3RydWN0IGluNl9hZGRyICpyZW1vdGUs IHN0cnVjdCBpbjZfYWRkciAqbG9jYWwpDQorew0KKwl1bnNpZ25lZCBoMCA9 IEhBU0gocmVtb3RlKTsNCisJdW5zaWduZWQgaDEgPSBIQVNIKGxvY2FsKTsN CisJc3RydWN0IGlwNl90bmwgKnQ7DQorDQorCWZvciAodCA9IHRubHNfcl9s W2gwIF4gaDFdOyB0OyB0ID0gdC0+bmV4dCkgew0KKwkJaWYgKCFpcHY2X2Fk 
ZHJfY21wKGxvY2FsLCAmdC0+cGFybXMubGFkZHIpICYmDQorCQkgICAgIWlw djZfYWRkcl9jbXAocmVtb3RlLCAmdC0+cGFybXMucmFkZHIpICYmDQorCQkg ICAgKHQtPmRldi0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJCXJldHVybiB0Ow0K Kwl9DQorCWlmICgodCA9IHRubHNfd2NbMF0pICE9IE5VTEwgJiYgKHQtPmRl di0+ZmxhZ3MgJiBJRkZfVVApKQ0KKwkJcmV0dXJuIHQ7DQorDQorCXJldHVy biBOVUxMOw0KK30NCisNCisvKioNCisgKiBpcDZpcDZfYnVja2V0IC0gZ2V0 IGhlYWQgb2YgbGlzdCBtYXRjaGluZyBnaXZlbiB0dW5uZWwgcGFyYW1ldGVy cw0KKyAqICAgQHA6IHBhcmFtZXRlcnMgY29udGFpbmluZyB0dW5uZWwgZW5k LXBvaW50cyANCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2 X2J1Y2tldCgpIHJldHVybnMgdGhlIGhlYWQgb2YgdGhlIGxpc3QgbWF0Y2hp bmcgdGhlIA0KKyAqICAgJnN0cnVjdCBpbjZfYWRkciBlbnRyaWVzIGxhZGRy IGFuZCByYWRkciBpbiBAcC4NCisgKg0KKyAqIFJldHVybjogaGVhZCBvZiBJ UHY2IHR1bm5lbCBsaXN0IA0KKyAqKi8NCisNCitzdGF0aWMgc3RydWN0IGlw Nl90bmwgKioNCitpcDZpcDZfYnVja2V0KHN0cnVjdCBpcDZfdG5sX3Bhcm0g KnApDQorew0KKwlzdHJ1Y3QgaW42X2FkZHIgKnJlbW90ZSA9ICZwLT5yYWRk cjsNCisJc3RydWN0IGluNl9hZGRyICpsb2NhbCA9ICZwLT5sYWRkcjsNCisJ dW5zaWduZWQgaCA9IDA7DQorCWludCBwcmlvID0gMDsNCisNCisJaWYgKCFp cHY2X2FkZHJfYW55KHJlbW90ZSkgfHwgIWlwdjZfYWRkcl9hbnkobG9jYWwp KSB7DQorCQlwcmlvID0gMTsNCisJCWggPSBIQVNIKHJlbW90ZSkgXiBIQVNI KGxvY2FsKTsNCisJfQ0KKwlyZXR1cm4gJnRubHNbcHJpb11baF07DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfbGluayAtIGFkZCB0dW5uZWwgdG8g aGFzaCB0YWJsZQ0KKyAqICAgQHQ6IHR1bm5lbCB0byBiZSBhZGRlZA0KKyAq Ki8NCisNCitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfbGluayhzdHJ1Y3Qg aXA2X3RubCAqdCkNCit7DQorCXN0cnVjdCBpcDZfdG5sICoqdHAgPSBpcDZp cDZfYnVja2V0KCZ0LT5wYXJtcyk7DQorDQorCXdyaXRlX2xvY2tfYmgoJmlw NmlwNl9sb2NrKTsNCisJdC0+bmV4dCA9ICp0cDsNCisJd3JpdGVfdW5sb2Nr X2JoKCZpcDZpcDZfbG9jayk7DQorCSp0cCA9IHQ7DQorfQ0KKw0KKy8qKg0K KyAqIGlwNmlwNl90bmxfdW5saW5rIC0gcmVtb3ZlIHR1bm5lbCBmcm9tIGhh c2ggdGFibGUNCisgKiAgIEB0OiB0dW5uZWwgdG8gYmUgcmVtb3ZlZA0KKyAq Ki8NCisNCitzdGF0aWMgdm9pZA0KK2lwNmlwNl90bmxfdW5saW5rKHN0cnVj dCBpcDZfdG5sICp0KQ0KK3sNCisJc3RydWN0IGlwNl90bmwgKip0cDsNCisN CisJZm9yICh0cCA9IGlwNmlwNl9idWNrZXQoJnQtPnBhcm1zKTsgKnRwOyB0 
cCA9ICYoKnRwKS0+bmV4dCkgew0KKwkJaWYgKHQgPT0gKnRwKSB7DQorCQkJ d3JpdGVfbG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwkJCSp0cCA9IHQtPm5l eHQ7DQorCQkJd3JpdGVfdW5sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQkJ YnJlYWs7DQorCQl9DQorCX0NCit9DQorDQorLyoqDQorICogaXA2X3RubF9j cmVhdGUoKSAtIGNyZWF0ZSBhIG5ldyB0dW5uZWwNCisgKiAgIEBwOiB0dW5u ZWwgcGFyYW1ldGVycw0KKyAqICAgQHB0OiBwb2ludGVyIHRvIG5ldyB0dW5u ZWwNCisgKg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgQ3JlYXRlIHR1bm5l bCBtYXRjaGluZyBnaXZlbiBwYXJhbWV0ZXJzLg0KKyAqIA0KKyAqIFJldHVy bjogDQorICogICAwIG9uIHN1Y2Nlc3MNCisgKiovDQorDQorc3RhdGljIGlu dA0KK2lwNl90bmxfY3JlYXRlKHN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAsIHN0 cnVjdCBpcDZfdG5sICoqcHQpDQorew0KKwlzdHJ1Y3QgbmV0X2RldmljZSAq ZGV2Ow0KKwlpbnQgZXJyID0gLUVOT0JVRlM7DQorCXN0cnVjdCBpcDZfdG5s ICp0Ow0KKw0KKwlkZXYgPSBrbWFsbG9jKHNpemVvZiAoKmRldikgKyBzaXpl b2YgKCp0KSwgR0ZQX0tFUk5FTCk7DQorCWlmICghZGV2KQ0KKwkJcmV0dXJu IGVycjsNCisNCisJbWVtc2V0KGRldiwgMCwgc2l6ZW9mICgqZGV2KSArIHNp emVvZiAoKnQpKTsNCisJZGV2LT5wcml2ID0gKHZvaWQgKikgKGRldiArIDEp Ow0KKwl0ID0gKHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJdC0+ ZGV2ID0gZGV2Ow0KKwlkZXYtPmluaXQgPSBpcDZpcDZfdG5sX2Rldl9pbml0 Ow0KKwltZW1jcHkoJnQtPnBhcm1zLCBwLCBzaXplb2YgKCpwKSk7DQorCXQt PnBhcm1zLm5hbWVbSUZOQU1TSVogLSAxXSA9ICdcMCc7DQorCWlmICh0LT5w YXJtcy5ob3BfbGltaXQgPiAyNTUpDQorCQl0LT5wYXJtcy5ob3BfbGltaXQg PSAtMTsNCisJc3RyY3B5KGRldi0+bmFtZSwgdC0+cGFybXMubmFtZSk7DQor CWlmICghZGV2LT5uYW1lWzBdKSB7DQorCQlpbnQgaSA9IDA7DQorCQlpbnQg ZXhpc3RzID0gMDsNCisNCisJCWRvIHsNCisJCQlzcHJpbnRmKGRldi0+bmFt ZSwgImlwNnRubCVkIiwgKytpKTsNCisJCQlleGlzdHMgPSAoX19kZXZfZ2V0 X2J5X25hbWUoZGV2LT5uYW1lKSAhPSBOVUxMKTsNCisJCX0gd2hpbGUgKGkg PCBJUDZfVE5MX01BWCAmJiBleGlzdHMpOw0KKw0KKwkJaWYgKGkgPT0gSVA2 X1ROTF9NQVgpIHsNCisJCQlnb3RvIGZhaWxlZDsNCisJCX0NCisJCW1lbWNw eSh0LT5wYXJtcy5uYW1lLCBkZXYtPm5hbWUsIElGTkFNU0laKTsNCisJfQ0K KwlTRVRfTU9EVUxFX09XTkVSKGRldik7DQorCWlmICgoZXJyID0gcmVnaXN0 ZXJfbmV0ZGV2aWNlKGRldikpIDwgMCkgew0KKwkJZ290byBmYWlsZWQ7DQor CX0NCisJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkqcHQgPSB0Ow0KKwlyZXR1 
cm4gMDsNCitmYWlsZWQ6DQorCWtmcmVlKGRldik7DQorCXJldHVybiBlcnI7 DQorfQ0KKw0KKy8qKg0KKyAqIGlwNl90bmxfZGVzdHJveSgpIC0gZGVzdHJv eSBvbGQgdHVubmVsDQorICogICBAdDogdHVubmVsIHRvIGJlIGRlc3Ryb3ll ZA0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgd2hhdGV2ZXIgdW5yZWdpc3Rl cl9uZXRkZXZpY2UoKSByZXR1cm5zDQorICoqLw0KKw0KK3N0YXRpYyBpbmxp bmUgaW50DQoraXA2X3RubF9kZXN0cm95KHN0cnVjdCBpcDZfdG5sICp0KQ0K K3sNCisJcmV0dXJuIHVucmVnaXN0ZXJfbmV0ZGV2aWNlKHQtPmRldik7DQor fQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfbG9jYXRlIC0gZmluZCBvciBj cmVhdGUgdHVubmVsIG1hdGNoaW5nIGdpdmVuIHBhcmFtZXRlcnMNCisgKiAg IEBwOiB0dW5uZWwgcGFyYW1ldGVycyANCisgKiAgIEBjcmVhdGU6ICE9IDAg aWYgYWxsb3dlZCB0byBjcmVhdGUgbmV3IHR1bm5lbCBpZiBubyBtYXRjaCBm b3VuZA0KKyAqDQorICogRGVzY3JpcHRpb246DQorICogICBpcDZpcDZfdG5s X2xvY2F0ZSgpIGZpcnN0IHRyaWVzIHRvIGxvY2F0ZSBhbiBleGlzdGluZyB0 dW5uZWwNCisgKiAgIGJhc2VkIG9uIEBwYXJtcy4gSWYgdGhpcyBpcyB1bnN1 Y2Nlc3NmdWwsIGJ1dCBAY3JlYXRlIGlzIHNldCBhIG5ldw0KKyAqICAgdHVu bmVsIGRldmljZSBpcyBjcmVhdGVkIGFuZCByZWdpc3RlcmVkIGZvciB1c2Uu DQorICoNCisgKiBSZXR1cm46DQorICogICAwIGlmIHR1bm5lbCBsb2NhdGVk IG9yIGNyZWF0ZWQsDQorICogICAtRUlOVkFMIGlmIHBhcmFtZXRlcnMgaW5j b3JyZWN0LA0KKyAqICAgLUVOT0RFViBpZiBubyBtYXRjaGluZyB0dW5uZWwg YXZhaWxhYmxlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitpcDZpcDZfdG5s X2xvY2F0ZShzdHJ1Y3QgaXA2X3RubF9wYXJtICpwLCBzdHJ1Y3QgaXA2X3Ru bCAqKnB0LCBpbnQgY3JlYXRlKQ0KK3sNCisJc3RydWN0IGluNl9hZGRyICpy ZW1vdGUgPSAmcC0+cmFkZHI7DQorCXN0cnVjdCBpbjZfYWRkciAqbG9jYWwg PSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKw0KKwlpZiAo cC0+cHJvdG8gIT0gSVBQUk9UT19JUFY2KQ0KKwkJcmV0dXJuIC1FSU5WQUw7 DQorDQorCWZvciAodCA9ICppcDZpcDZfYnVja2V0KHApOyB0OyB0ID0gdC0+ bmV4dCkgew0KKwkJaWYgKCFpcHY2X2FkZHJfY21wKGxvY2FsLCAmdC0+cGFy bXMubGFkZHIpICYmDQorCQkgICAgIWlwdjZfYWRkcl9jbXAocmVtb3RlLCAm dC0+cGFybXMucmFkZHIpKSB7DQorCQkJKnB0ID0gdDsNCisJCQlyZXR1cm4g KGNyZWF0ZSA/IC1FRVhJU1QgOiAwKTsNCisJCX0NCisJfQ0KKwlpZiAoIWNy ZWF0ZSkgew0KKwkJcmV0dXJuIC1FTk9ERVY7DQorCX0NCisJcmV0dXJuIGlw Nl90bmxfY3JlYXRlKHAsIHB0KTsNCit9DQorDQorLyoqDQorICogaXA2aXA2 
X3RubF9kZXZfZGVzdHJ1Y3RvciAtIHR1bm5lbCBkZXZpY2UgZGVzdHJ1Y3Rv cg0KKyAqICAgQGRldjogdGhlIGRldmljZSB0byBiZSBkZXN0cm95ZWQNCisg KiovDQorDQorc3RhdGljIHZvaWQNCitpcDZpcDZfdG5sX2Rldl9kZXN0cnVj dG9yKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYpDQorew0KKwlrZnJlZShkZXYp Ow0KK30NCisNCisvKioNCisgKiBpcDZpcDZfdG5sX2Rldl91bmluaXQgLSB0 dW5uZWwgZGV2aWNlIHVuaW5pdGlhbGl6ZXINCisgKiAgIEBkZXY6IHRoZSBk ZXZpY2UgdG8gYmUgZGVzdHJveWVkDQorICogICANCisgKiBEZXNjcmlwdGlv bjoNCisgKiAgIGlwNmlwNl90bmxfZGV2X3VuaW5pdCgpIHJlbW92ZXMgdHVu bmVsIGZyb20gaXRzIGxpc3QNCisgKiovDQorDQorc3RhdGljIHZvaWQNCitp cDZpcDZfdG5sX2Rldl91bmluaXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikN Cit7DQorCWlmIChkZXYgPT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7DQorCQl3 cml0ZV9sb2NrX2JoKCZpcDZpcDZfbG9jayk7DQorCQl0bmxzX3djWzBdID0g TlVMTDsNCisJCXdyaXRlX3VubG9ja19iaCgmaXA2aXA2X2xvY2spOw0KKwl9 IGVsc2Ugew0KKwkJc3RydWN0IGlwNl90bmwgKnQgPSAoc3RydWN0IGlwNl90 bmwgKikgZGV2LT5wcml2Ow0KKwkJaXA2aXA2X3RubF91bmxpbmsodCk7DQor CX0NCit9DQorDQorLyoqDQorICogcGFyc2VfdHZsX3RubF9lbmNfbGltIC0g aGFuZGxlIGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICogICBAc2ti OiByZWNlaXZlZCBzb2NrZXQgYnVmZmVyDQorICoNCisgKiBSZXR1cm46IA0K KyAqICAgMCBpZiBub25lIHdhcyBmb3VuZCwgDQorICogICBlbHNlIGluZGV4 IHRvIGVuY2Fwc3VsYXRpb24gbGltaXQNCisgKiovDQorDQorc3RhdGljIF9f dTE2DQorcGFyc2VfdGx2X3RubF9lbmNfbGltKHN0cnVjdCBza19idWZmICpz a2IsIF9fdTggKiByYXcpDQorew0KKwlzdHJ1Y3QgaXB2NmhkciAqaXB2Nmgg PSAoc3RydWN0IGlwdjZoZHIgKikgcmF3Ow0KKwlfX3U4IG5leHRoZHIgPSBp cHY2aC0+bmV4dGhkcjsNCisJX191MTYgb2ZmID0gc2l6ZW9mICgqaXB2Nmgp Ow0KKw0KKwl3aGlsZSAoaXB2Nl9leHRfaGRyKG5leHRoZHIpICYmIG5leHRo ZHIgIT0gTkVYVEhEUl9OT05FKSB7DQorCQlfX3UxNiBvcHRsZW4gPSAwOw0K KwkJc3RydWN0IGlwdjZfb3B0X2hkciAqaGRyOw0KKwkJaWYgKHJhdyArIG9m ZiArIHNpemVvZiAoKmhkcikgPiBza2ItPmRhdGEgJiYNCisJCSAgICAhcHNr Yl9tYXlfcHVsbChza2IsIHJhdyAtIHNrYi0+ZGF0YSArIG9mZiArIHNpemVv ZiAoKmhkcikpKQ0KKwkJCWJyZWFrOw0KKw0KKwkJaGRyID0gKHN0cnVjdCBp cHY2X29wdF9oZHIgKikgKHJhdyArIG9mZik7DQorCQlpZiAobmV4dGhkciA9 PSBORVhUSERSX0ZSQUdNRU5UKSB7DQorCQkJc3RydWN0IGZyYWdfaGRyICpm 
cmFnX2hkciA9IChzdHJ1Y3QgZnJhZ19oZHIgKikgaGRyOw0KKwkJCWlmIChm cmFnX2hkci0+ZnJhZ19vZmYpDQorCQkJCWJyZWFrOw0KKwkJCW9wdGxlbiA9 IDg7DQorCQl9IGVsc2UgaWYgKG5leHRoZHIgPT0gTkVYVEhEUl9BVVRIKSB7 DQorCQkJb3B0bGVuID0gKGhkci0+aGRybGVuICsgMikgPDwgMjsNCisJCX0g ZWxzZSB7DQorCQkJb3B0bGVuID0gaXB2Nl9vcHRsZW4oaGRyKTsNCisJCX0N CisJCWlmIChuZXh0aGRyID09IE5FWFRIRFJfREVTVCkgew0KKwkJCV9fdTE2 IGkgPSBvZmYgKyAyOw0KKwkJCXdoaWxlICgxKSB7DQorCQkJCXN0cnVjdCBp cHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKw0KKwkJCQkvKiBObyBtb3Jl IHJvb20gZm9yIGVuY2Fwc3VsYXRpb24gbGltaXQgKi8NCisJCQkJaWYgKGkg KyBzaXplb2YgKCp0ZWwpID4gb2ZmICsgb3B0bGVuKQ0KKwkJCQkJYnJlYWs7 DQorDQorCQkJCXRlbCA9IChzdHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0g KikgJnJhd1tpXTsNCisJCQkJLyogcmV0dXJuIGluZGV4IG9mIG9wdGlvbiBp ZiBmb3VuZCBhbmQgdmFsaWQgKi8NCisJCQkJaWYgKHRlbC0+dHlwZSA9PSBJ UFY2X1RMVl9UTkxfRU5DQVBfTElNSVQgJiYNCisJCQkJICAgIHRlbC0+bGVu Z3RoID09IDEpDQorCQkJCQlyZXR1cm4gaTsNCisJCQkJLyogZWxzZSBqdW1w IHRvIG5leHQgb3B0aW9uICovDQorCQkJCWlmICh0ZWwtPnR5cGUpDQorCQkJ CQlpICs9IHRlbC0+bGVuZ3RoICsgMjsNCisJCQkJZWxzZQ0KKwkJCQkJaSsr Ow0KKwkJCX0NCisJCX0NCisJCW5leHRoZHIgPSBoZHItPm5leHRoZHI7DQor CQlvZmYgKz0gb3B0bGVuOw0KKwl9DQorCXJldHVybiAwOw0KK30NCisNCisv KioNCisgKiBpcDZpcDZfZXJyIC0gdHVubmVsIGVycm9yIGhhbmRsZXINCisg Kg0KKyAqIERlc2NyaXB0aW9uOg0KKyAqICAgaXA2aXA2X2VycigpIHNob3Vs ZCBoYW5kbGUgZXJyb3JzIGluIHRoZSB0dW5uZWwgYWNjb3JkaW5nDQorICog ICB0byB0aGUgc3BlY2lmaWNhdGlvbnMgaW4gUkZDIDI0NzMuDQorICoqLw0K Kw0KK3ZvaWQgaXA2aXA2X2VycihzdHJ1Y3Qgc2tfYnVmZiAqc2tiLCBzdHJ1 Y3QgaW5ldDZfc2tiX3Bhcm0gKm9wdCwNCisJCSAgIGludCB0eXBlLCBpbnQg Y29kZSwgaW50IG9mZnNldCwgX191MzIgaW5mbykNCit7DQorCXN0cnVjdCBp cHY2aGRyICppcHY2aCA9IChzdHJ1Y3QgaXB2NmhkciAqKSBza2ItPmRhdGE7 DQorCXN0cnVjdCBpcDZfdG5sICp0Ow0KKwlpbnQgcmVsX21zZyA9IDA7DQor CWludCByZWxfdHlwZSA9IElDTVBWNl9ERVNUX1VOUkVBQ0g7DQorCWludCBy ZWxfY29kZSA9IElDTVBWNl9BRERSX1VOUkVBQ0g7DQorCV9fdTMyIHJlbF9p bmZvID0gMDsNCisJX191MTYgbGVuOw0KKw0KKwkvKiBJZiB0aGUgcGFja2V0 IGRvZXNuJ3QgY29udGFpbiB0aGUgb3JpZ2luYWwgSVB2NiBoZWFkZXIgd2Ug 
YXJlIA0KKwkgICBpbiB0cm91YmxlIHNpbmNlIHdlIG1pZ2h0IG5lZWQgdGhl IHNvdXJjZSBhZGRyZXNzIGZvciBmdXJ0ZXIgDQorCSAgIHByb2Nlc3Npbmcg b2YgdGhlIGVycm9yLiAqLw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2Nr KTsNCisJaWYgKCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5kYWRk ciwgJmlwdjZoLT5zYWRkcikpID09IE5VTEwpDQorCQlnb3RvIG91dDsNCisN CisJc3dpdGNoICh0eXBlKSB7DQorCQlfX3UzMiB0ZWxpOw0KKwkJc3RydWN0 IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCQlfX3UzMiBtdHU7DQor CWNhc2UgSUNNUFY2X0RFU1RfVU5SRUFDSDoNCisJCWlmIChuZXRfcmF0ZWxp bWl0KCkpDQorCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCSAgICAgICAi JXM6IFBhdGggdG8gZGVzdGluYXRpb24gaW52YWxpZCAiDQorCQkJICAgICAg ICJvciBpbmFjdGl2ZSFcbiIsIHQtPnBhcm1zLm5hbWUpOw0KKwkJcmVsX21z ZyA9IDE7DQorCQlicmVhazsNCisJY2FzZSBJQ01QVjZfVElNRV9FWENFRUQ6 DQorCQlpZiAoY29kZSA9PSBJQ01QVjZfRVhDX0hPUExJTUlUKSB7DQorCQkJ aWYgKG5ldF9yYXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklO Rw0KKwkJCQkgICAgICAgIiVzOiBUb28gc21hbGwgaG9wIGxpbWl0IG9yICIN CisJCQkJICAgICAgICJyb3V0aW5nIGxvb3AgaW4gdHVubmVsIVxuIiwgDQor CQkJCSAgICAgICB0LT5wYXJtcy5uYW1lKTsNCisJCQlyZWxfbXNnID0gMTsN CisJCX0NCisJCWJyZWFrOw0KKwljYXNlIElDTVBWNl9QQVJBTVBST0I6DQor CQkvKiBpZ25vcmUgaWYgcGFyYW1ldGVyIHByb2JsZW0gbm90IGNhdXNlZCBi eSBhIHR1bm5lbA0KKwkJICAgZW5jYXBzdWxhdGlvbiBsaW1pdCBzdWItb3B0 aW9uICovDQorCQlpZiAoY29kZSAhPSBJQ01QVjZfSERSX0ZJRUxEKSB7DQor CQkJYnJlYWs7DQorCQl9DQorCQl0ZWxpID0gcGFyc2VfdGx2X3RubF9lbmNf bGltKHNrYiwgc2tiLT5kYXRhKTsNCisNCisJCWlmICh0ZWxpICYmIHRlbGkg PT0gaW5mbyAtIDIpIHsNCisJCQl0ZWwgPSAoc3RydWN0IGlwdjZfdGx2X3Ru bF9lbmNfbGltICopICZza2ItPmRhdGFbdGVsaV07DQorCQkJaWYgKHRlbC0+ ZW5jYXBfbGltaXQgPD0gMSkgew0KKwkJCQlpZiAobmV0X3JhdGVsaW1pdCgp KQ0KKwkJCQkJcHJpbnRrKEtFUk5fV0FSTklORw0KKwkJCQkJICAgICAgICIl czogVG9vIHNtYWxsIGVuY2Fwc3VsYXRpb24gIg0KKwkJCQkJICAgICAgICJs aW1pdCBvciByb3V0aW5nIGxvb3AgaW4gIg0KKwkJCQkJICAgICAgICJ0dW5u ZWwhXG4iLCB0LT5wYXJtcy5uYW1lKTsNCisJCQkJcmVsX21zZyA9IDE7DQor CQkJfQ0KKwkJfQ0KKwkJYnJlYWs7DQorCWNhc2UgSUNNUFY2X1BLVF9UT09C SUc6DQorCQltdHUgPSBpbmZvIC0gb2Zmc2V0Ow0KKwkJaWYgKG10dSA8PSBJ 
UFY2X01JTl9NVFUpIHsNCisJCQltdHUgPSBJUFY2X01JTl9NVFU7DQorCQl9 DQorCQl0LT5kZXYtPm10dSA9IG10dTsNCisNCisJCWlmICgobGVuID0gc2l6 ZW9mICgqaXB2NmgpICsgaXB2NmgtPnBheWxvYWRfbGVuKSA+IG10dSkgew0K KwkJCXJlbF90eXBlID0gSUNNUFY2X1BLVF9UT09CSUc7DQorCQkJcmVsX2Nv ZGUgPSAwOw0KKwkJCXJlbF9pbmZvID0gbXR1Ow0KKwkJCXJlbF9tc2cgPSAx Ow0KKwkJfQ0KKwkJYnJlYWs7DQorCX0NCisJaWYgKHJlbF9tc2cgJiYgIHBz a2JfbWF5X3B1bGwoc2tiLCBvZmZzZXQgKyBzaXplb2YgKCppcHY2aCkpKSB7 DQorCQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0Ow0KKwkJc3RydWN0IHNrX2J1ZmYg KnNrYjIgPSBza2JfY2xvbmUoc2tiLCBHRlBfQVRPTUlDKTsNCisJCWlmICgh c2tiMikNCisJCQlnb3RvIG91dDsNCisNCisJCWRzdF9yZWxlYXNlKHNrYjIt PmRzdCk7DQorCQlza2IyLT5kc3QgPSBOVUxMOw0KKwkJc2tiX3B1bGwoc2ti Miwgb2Zmc2V0KTsNCisJCXNrYjItPm5oLnJhdyA9IHNrYjItPmRhdGE7DQor DQorCQkvKiBUcnkgdG8gZ3Vlc3MgaW5jb21pbmcgaW50ZXJmYWNlICovDQor CQlydCA9IHJ0Nl9sb29rdXAoJnNrYjItPm5oLmlwdjZoLT5zYWRkciwgTlVM TCwgMCwgMCk7DQorDQorCQlpZiAocnQgJiYgcnQtPnJ0NmlfZGV2KQ0KKwkJ CXNrYjItPmRldiA9IHJ0LT5ydDZpX2RldjsNCisNCisJCWljbXB2Nl9zZW5k KHNrYjIsIHJlbF90eXBlLCByZWxfY29kZSwgcmVsX2luZm8sIHNrYjItPmRl dik7DQorDQorCQlpZiAocnQpDQorCQkJZHN0X2ZyZWUoJnJ0LT51LmRzdCk7 DQorDQorCQlrZnJlZV9za2Ioc2tiMik7DQorCX0NCitvdXQ6DQorCXJlYWRf dW5sb2NrKCZpcDZpcDZfbG9jayk7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlw Nl9yY3YgLSBkZWNhcHN1bGF0ZSBJUHY2IHBhY2tldCBhbmQgcmV0cmFuc21p dCBpdCBsb2NhbGx5DQorICogICBAc2tiOiByZWNlaXZlZCBzb2NrZXQgYnVm ZmVyDQorICoNCisgKiBSZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlw Nl9yY3Yoc3RydWN0IHNrX2J1ZmYgKipwc2tiLCB1bnNpZ25lZCBpbnQgKm5o b2ZmcCkNCit7DQorCXN0cnVjdCBza19idWZmICpza2IgPSAqcHNrYjsNCisJ c3RydWN0IGlwdjZoZHIgKmlwdjZoOw0KKwlzdHJ1Y3QgaXA2X3RubCAqdDsN CisNCisJaWYgKCFwc2tiX21heV9wdWxsKHNrYiwgc2l6ZW9mICgqaXB2Nmgp KSkNCisJCWdvdG8gZGlzY2FyZDsNCisNCisJaXB2NmggPSBza2ItPm5oLmlw djZoOw0KKw0KKwlyZWFkX2xvY2soJmlwNmlwNl9sb2NrKTsNCisNCisJaWYg KCh0ID0gaXA2aXA2X3RubF9sb29rdXAoJmlwdjZoLT5zYWRkciwgJmlwdjZo LT5kYWRkcikpICE9IE5VTEwpIHsNCisJCWlmICghKHQtPnBhcm1zLmZsYWdz ICYgSVA2X1ROTF9GX0NBUF9SQ1YpKSB7DQorCQkJdC0+c3RhdC5yeF9kcm9w 
cGVkKys7DQorCQkJcmVhZF91bmxvY2soJmlwNmlwNl9sb2NrKTsNCisJCQln b3RvIGRpc2NhcmQ7DQorCQl9DQorCQlza2ItPm1hYy5yYXcgPSBza2ItPm5o LnJhdzsNCisJCXNrYi0+bmgucmF3ID0gc2tiLT5kYXRhOw0KKwkJc2tiLT5w cm90b2NvbCA9IGh0b25zKEVUSF9QX0lQVjYpOw0KKwkJc2tiLT5wa3RfdHlw ZSA9IFBBQ0tFVF9IT1NUOw0KKwkJbWVtc2V0KHNrYi0+Y2IsIDAsIHNpemVv ZihzdHJ1Y3QgaW5ldDZfc2tiX3Bhcm0pKTsNCisJCXNrYi0+ZGV2ID0gdC0+ ZGV2Ow0KKwkJZHN0X3JlbGVhc2Uoc2tiLT5kc3QpOw0KKwkJc2tiLT5kc3Qg PSBOVUxMOw0KKwkJdC0+c3RhdC5yeF9wYWNrZXRzKys7DQorCQl0LT5zdGF0 LnJ4X2J5dGVzICs9IHNrYi0+bGVuOw0KKwkJbmV0aWZfcngoc2tiKTsNCisJ CXJlYWRfdW5sb2NrKCZpcDZpcDZfbG9jayk7DQorCQlyZXR1cm4gMDsNCisJ fQ0KKwlyZWFkX3VubG9jaygmaXA2aXA2X2xvY2spOw0KKwlpY21wdjZfc2Vu ZChza2IsIElDTVBWNl9ERVNUX1VOUkVBQ0gsIElDTVBWNl9BRERSX1VOUkVB Q0gsIDAsIHNrYi0+ZGV2KTsNCitkaXNjYXJkOg0KKwlrZnJlZV9za2Ioc2ti KTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIHR4b3B0X2xlbiAt IGdldCBuZWNlc3Nhcnkgc2l6ZSBmb3IgbmV3ICZzdHJ1Y3QgaXB2Nl90eG9w dGlvbnMNCisgKiAgIEBvcmlnX29wdDogb2xkIG9wdGlvbnMNCisgKg0KKyAq IFJldHVybjoNCisgKiAgIFNpemUgb2Ygb2xkIG9uZSBwbHVzIHNpemUgb2Yg dHVubmVsIGVuY2Fwc3VsYXRpb24gbGltaXQgb3B0aW9uDQorICoqLw0KKw0K K3N0YXRpYyBpbmxpbmUgaW50DQordHhvcHRfbGVuKHN0cnVjdCBpcHY2X3R4 b3B0aW9ucyAqb3JpZ19vcHQpDQorew0KKwlpbnQgbGVuID0gc2l6ZW9mICgq b3JpZ19vcHQpICsgODsNCisNCisJaWYgKG9yaWdfb3B0ICYmIG9yaWdfb3B0 LT5kc3Qwb3B0KQ0KKwkJbGVuICs9IGlwdjZfb3B0bGVuKG9yaWdfb3B0LT5k c3Qwb3B0KTsNCisJcmV0dXJuIGxlbjsNCit9DQorDQorLyoqDQorICogbWVy Z2Vfb3B0aW9ucyAtIGFkZCBlbmNhcHN1bGF0aW9uIGxpbWl0IHRvIG9yaWdp bmFsIG9wdGlvbnMNCisgKiAgIEBlbmNhcF9saW1pdDogbnVtYmVyIG9mIGFs bG93ZWQgZW5jYXBzdWxhdGlvbiBsaW1pdHMNCisgKiAgIEBvcmlnX29wdDog b3JpZ2luYWwgb3B0aW9ucw0KKyAqIA0KKyAqIFJldHVybjoNCisgKiAgIFBv aW50ZXIgdG8gbmV3ICZzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgY29udGFpbmlu ZyB0aGUgdHVubmVsDQorICogICBlbmNhcHN1bGF0aW9uIGxpbWl0DQorICoq Lw0KKw0KK3N0YXRpYyBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKg0KK21lcmdl X29wdGlvbnMoc3RydWN0IHNvY2sgKnNrLCBfX3U4IGVuY2FwX2xpbWl0LA0K KwkgICAgICBzdHJ1Y3QgaXB2Nl90eG9wdGlvbnMgKm9yaWdfb3B0KQ0KK3sN 
CisJc3RydWN0IGlwdjZfdGx2X3RubF9lbmNfbGltICp0ZWw7DQorCXN0cnVj dCBpcHY2X3R4b3B0aW9ucyAqb3B0Ow0KKwlfX3U4ICpyYXc7DQorCV9fdTgg cGFkX3RvID0gODsNCisJaW50IG9wdF9sZW4gPSB0eG9wdF9sZW4ob3JpZ19v cHQpOw0KKw0KKwlpZiAoIShvcHQgPSBzb2NrX2ttYWxsb2Moc2ssIG9wdF9s ZW4sIEdGUF9BVE9NSUMpKSkgew0KKwkJcmV0dXJuIE5VTEw7DQorCX0NCisN CisJbWVtc2V0KG9wdCwgMCwgb3B0X2xlbik7DQorCW9wdC0+dG90X2xlbiA9 IG9wdF9sZW47DQorCW9wdC0+ZHN0MG9wdCA9IChzdHJ1Y3QgaXB2Nl9vcHRf aGRyICopIChvcHQgKyAxKTsNCisJb3B0LT5vcHRfbmZsZW4gPSA4Ow0KKw0K KwlyYXcgPSAoX191OCAqKSBvcHQtPmRzdDBvcHQ7DQorDQorCXRlbCA9IChz dHJ1Y3QgaXB2Nl90bHZfdG5sX2VuY19saW0gKikgKG9wdC0+ZHN0MG9wdCAr IDEpOw0KKwl0ZWwtPnR5cGUgPSBJUFY2X1RMVl9UTkxfRU5DQVBfTElNSVQ7 DQorCXRlbC0+bGVuZ3RoID0gMTsNCisJdGVsLT5lbmNhcF9saW1pdCA9IGVu Y2FwX2xpbWl0Ow0KKw0KKwlpZiAob3JpZ19vcHQpIHsNCisJCV9fdTggKm9y aWdfcmF3Ow0KKw0KKwkJb3B0LT5ob3BvcHQgPSBvcmlnX29wdC0+aG9wb3B0 Ow0KKw0KKwkJLyogS2VlcCB0aGUgb3JpZ2luYWwgZGVzdGluYXRpb24gb3B0 aW9ucyBwcm9wZXJseQ0KKwkJICAgYWxpZ25lZCBhbmQgbWVyZ2UgcG9zc2li bGUgb2xkIHBhZGRpbmdzIHRvIHRoZQ0KKwkJICAgbmV3IHBhZGRpbmcgb3B0 aW9uICovDQorCQlpZiAoKG9yaWdfcmF3ID0gKF9fdTggKikgb3JpZ19vcHQt PmRzdDBvcHQpICE9IE5VTEwpIHsNCisJCQlfX3U4IHR5cGU7DQorCQkJaW50 IGkgPSBzaXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXBhZF90 byArPSBzaXplb2YgKHN0cnVjdCBpcHY2X29wdF9oZHIpOw0KKwkJCXdoaWxl IChpIDwgaXB2Nl9vcHRsZW4ob3JpZ19vcHQtPmRzdDBvcHQpKSB7DQorCQkJ CXR5cGUgPSBvcmlnX3Jhd1tpKytdOw0KKwkJCQlpZiAodHlwZSA9PSBJUFY2 X1RMVl9QQUQwKQ0KKwkJCQkJcGFkX3RvKys7DQorCQkJCWVsc2UgaWYgKHR5 cGUgPT0gSVBWNl9UTFZfUEFETikgew0KKwkJCQkJaW50IGxlbiA9IG9yaWdf cmF3W2krK107DQorCQkJCQlpICs9IGxlbjsNCisJCQkJCXBhZF90byArPSBs ZW4gKyAyOw0KKwkJCQl9IGVsc2Ugew0KKwkJCQkJYnJlYWs7DQorCQkJCX0N CisJCQl9DQorCQkJb3B0LT5kc3Qwb3B0LT5oZHJsZW4gPSBvcmlnX29wdC0+ ZHN0MG9wdC0+aGRybGVuICsgMTsNCisJCQltZW1jcHkocmF3ICsgcGFkX3Rv LCBvcmlnX3JhdyArIHBhZF90byAtIDgsDQorCQkJICAgICAgIG9wdF9sZW4g LSBzaXplb2YgKCpvcHQpIC0gcGFkX3RvKTsNCisJCX0NCisJCW9wdC0+c3Jj cnQgPSBvcmlnX29wdC0+c3JjcnQ7DQorCQlvcHQtPm9wdF9uZmxlbiArPSBv 
cmlnX29wdC0+b3B0X25mbGVuOw0KKw0KKwkJb3B0LT5kc3Qxb3B0ID0gb3Jp Z19vcHQtPmRzdDFvcHQ7DQorCQlvcHQtPmF1dGggPSBvcmlnX29wdC0+YXV0 aDsNCisJCW9wdC0+b3B0X2ZsZW4gPSBvcmlnX29wdC0+b3B0X2ZsZW47DQor CX0NCisJcmF3WzVdID0gSVBWNl9UTFZfUEFETjsNCisNCisJLyogc3VidHJh Y3QgbGVuZ3RocyBvZiBkZXN0aW5hdGlvbiBzdWJvcHRpb24gaGVhZGVyLA0K KwkgICB0dW5uZWwgZW5jYXBzdWxhdGlvbiBsaW1pdCBhbmQgcGFkIE4gaGVh ZGVyICovDQorCXJhd1s2XSA9IHBhZF90byAtIDc7DQorDQorCXJldHVybiBv cHQ7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfYWRkcl9jb25mbGlj dCAtIGNvbXBhcmUgcGFja2V0IGFkZHJlc3NlcyB0byB0dW5uZWwncyBvd24N CisgKiAgIEB0OiB0aGUgb3V0Z29pbmcgdHVubmVsIGRldmljZQ0KKyAqICAg QGhkcjogSVB2NiBoZWFkZXIgZnJvbSB0aGUgaW5jb21pbmcgcGFja2V0IA0K KyAqDQorICogRGVzY3JpcHRpb246DQorICogICBBdm9pZCB0cml2aWFsIHR1 bm5lbGluZyBsb29wIGJ5IGNoZWNraW5nIHRoYXQgdHVubmVsIGV4aXQtcG9p bnQgDQorICogICBkb2Vzbid0IG1hdGNoIHNvdXJjZSBvZiBpbmNvbWluZyBw YWNrZXQuDQorICoNCisgKiBSZXR1cm46IA0KKyAqICAgMSBpZiBjb25mbGlj dCwNCisgKiAgIDAgZWxzZQ0KKyAqKi8NCisNCitzdGF0aWMgaW5saW5lIGlu dA0KK2lwNmlwNl90bmxfYWRkcl9jb25mbGljdChzdHJ1Y3QgaXA2X3RubCAq dCwgc3RydWN0IGlwdjZoZHIgKmhkcikNCit7DQorCXJldHVybiAhaXB2Nl9h ZGRyX2NtcCgmdC0+cGFybXMucmFkZHIsICZoZHItPnNhZGRyKTsNCit9DQor DQorLyoqDQorICogaXA2aXA2X3RubF94bWl0IC0gZW5jYXBzdWxhdGUgcGFj a2V0IGFuZCBzZW5kIA0KKyAqICAgQHNrYjogdGhlIG91dGdvaW5nIHNvY2tl dCBidWZmZXINCisgKiAgIEBkZXY6IHRoZSBvdXRnb2luZyB0dW5uZWwgZGV2 aWNlIA0KKyAqDQorICogRGVzY3JpcHRpb246DQorICogICBCdWlsZCBuZXcg aGVhZGVyIGFuZCBkbyBzb21lIHNhbml0eSBjaGVja3Mgb24gdGhlIHBhY2tl dCBiZWZvcmUgc2VuZGluZw0KKyAqICAgaXQgdG8gaXA2X2J1aWxkX3htaXQo KS4NCisgKg0KKyAqIFJldHVybjogDQorICogICAwDQorICoqLw0KKw0KK2lu dCBpcDZpcDZfdG5sX3htaXQoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3RydWN0 IG5ldF9kZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0g KHN0cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJc3RydWN0IG5ldF9k ZXZpY2Vfc3RhdHMgKnN0YXRzID0gJnQtPnN0YXQ7DQorCXN0cnVjdCBpcHY2 aGRyICppcHY2aCA9IHNrYi0+bmguaXB2Nmg7DQorCXN0cnVjdCBpcHY2X3R4 b3B0aW9ucyAqb3JpZ19vcHQgPSBOVUxMOw0KKwlzdHJ1Y3QgaXB2Nl90eG9w 
dGlvbnMgKm9wdCA9IE5VTEw7DQorCV9fdTggZW5jYXBfbGltaXQgPSAwOw0K KwlfX3UxNiBvZmZzZXQ7DQorCXN0cnVjdCBmbG93aSBmbDsNCisJc3RydWN0 IGlwNl9mbG93bGFiZWwgKmZsX2xibCA9IE5VTEw7DQorCWludCBlcnIgPSAw Ow0KKwlzdHJ1Y3QgZHN0X2VudHJ5ICpkc3Q7DQorCWludCBsaW5rX2ZhaWx1 cmUgPSAwOw0KKwlzdHJ1Y3Qgc29jayAqc2sgPSBpcDZfc29ja2V0LT5zazsN CisJc3RydWN0IGlwdjZfcGluZm8gKm5wID0gaW5ldDZfc2soc2spOw0KKwlp bnQgbXR1Ow0KKw0KKwlpZiAodC0+cmVjdXJzaW9uKyspIHsNCisJCXN0YXRz LT5jb2xsaXNpb25zKys7DQorCQlnb3RvIHR4X2VycjsNCisJfQ0KKwlpZiAo c2tiLT5wcm90b2NvbCAhPSBodG9ucyhFVEhfUF9JUFY2KSB8fA0KKwkgICAg ISh0LT5wYXJtcy5mbGFncyAmIElQNl9UTkxfRl9DQVBfWE1JVCkgfHwNCisJ ICAgIGlwNmlwNl90bmxfYWRkcl9jb25mbGljdCh0LCBpcHY2aCkpIHsNCisJ CWdvdG8gdHhfZXJyOw0KKwl9DQorCWlmICgob2Zmc2V0ID0gcGFyc2VfdGx2 X3RubF9lbmNfbGltKHNrYiwgc2tiLT5uaC5yYXcpKSA+IDApIHsNCisJCXN0 cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSAqdGVsOw0KKwkJdGVsID0gKHN0 cnVjdCBpcHY2X3Rsdl90bmxfZW5jX2xpbSAqKSAmc2tiLT5uaC5yYXdbb2Zm c2V0XTsNCisJCWlmICh0ZWwtPmVuY2FwX2xpbWl0IDw9IDEpIHsNCisJCQlp Y21wdjZfc2VuZChza2IsIElDTVBWNl9QQVJBTVBST0IsDQorCQkJCSAgICBJ Q01QVjZfSERSX0ZJRUxELCBvZmZzZXQgKyAyLCBza2ItPmRldik7DQorCQkJ Z290byB0eF9lcnI7DQorCQl9DQorCQllbmNhcF9saW1pdCA9IHRlbC0+ZW5j YXBfbGltaXQgLSAxOw0KKwl9IGVsc2UgaWYgKCEodC0+cGFybXMuZmxhZ3Mg JiBJUDZfVE5MX0ZfSUdOX0VOQ0FQX0xJTUlUKSkgew0KKwkJZW5jYXBfbGlt aXQgPSB0LT5wYXJtcy5lbmNhcF9saW1pdDsNCisJfQ0KKwlpcDZfeG1pdF9s b2NrKCk7DQorDQorCW1lbWNweSgmZmwsICZ0LT5mbCwgc2l6ZW9mIChmbCkp Ow0KKw0KKwlpZiAoKHQtPnBhcm1zLmZsYWdzICYgSVA2X1ROTF9GX1VTRV9P UklHX1RDTEFTUykpDQorCQlmbC5mbDZfZmxvd2xhYmVsIHw9ICgqKF9fdTMy ICopIGlwdjZoICYgSVBWNl9UQ0xBU1NfTUFTSyk7DQorCWlmICgodC0+cGFy bXMuZmxhZ3MgJiBJUDZfVE5MX0ZfVVNFX09SSUdfRkxPV0xBQkVMKSkNCisJ CWZsLmZsNl9mbG93bGFiZWwgfD0gKCooX191MzIgKikgaXB2NmggJiBJUFY2 X0ZMT1dMQUJFTF9NQVNLKTsNCisNCisJaWYgKGZsLmZsNl9mbG93bGFiZWwp IHsNCisJCWZsX2xibCA9IGZsNl9zb2NrX2xvb2t1cChzaywgZmwuZmw2X2Zs b3dsYWJlbCk7DQorCQlpZiAoZmxfbGJsKQ0KKwkJCW9yaWdfb3B0ID0gZmxf bGJsLT5vcHQ7DQorCX0NCisJaWYgKGVuY2FwX2xpbWl0ID4gMCkgew0KKwkJ 
aWYgKCEob3B0ID0gbWVyZ2Vfb3B0aW9ucyhzaywgZW5jYXBfbGltaXQsIG9y aWdfb3B0KSkpIHsNCisJCQlnb3RvIHR4X2Vycl9mcmVlX2ZsX2xibDsNCisJ CX0NCisJfSBlbHNlIHsNCisJCW9wdCA9IG9yaWdfb3B0Ow0KKwl9DQorCWRz dCA9IF9fc2tfZHN0X2NoZWNrKHNrLCBucC0+ZHN0X2Nvb2tpZSk7DQorDQor CWlmIChkc3QpIHsNCisJCWlmIChucC0+ZGFkZHJfY2FjaGUgPT0gTlVMTCB8 fA0KKwkJICAgIGlwdjZfYWRkcl9jbXAoJmZsLmZsNl9kc3QsIG5wLT5kYWRk cl9jYWNoZSkgfHwNCisJCSAgICAoZmwub2lmICYmIGZsLm9pZiAhPSBkc3Qt PmRldi0+aWZpbmRleCkpIHsNCisJCQlkc3QgPSBOVUxMOw0KKwkJfQ0KKwl9 DQorCWlmIChkc3QgPT0gTlVMTCkgew0KKwkJZHN0ID0gaXA2X3JvdXRlX291 dHB1dChzaywgJmZsKTsNCisJCWlmIChkc3QtPmVycm9yKSB7DQorCQkJc3Rh dHMtPnR4X2NhcnJpZXJfZXJyb3JzKys7DQorCQkJbGlua19mYWlsdXJlID0g MTsNCisJCQlnb3RvIHR4X2Vycl9kc3RfcmVsZWFzZTsNCisJCX0NCisJCS8q IGxvY2FsIHJvdXRpbmcgbG9vcCAqLw0KKwkJaWYgKGRzdC0+ZGV2ID09IGRl dikgew0KKwkJCXN0YXRzLT5jb2xsaXNpb25zKys7DQorCQkJaWYgKG5ldF9y YXRlbGltaXQoKSkNCisJCQkJcHJpbnRrKEtFUk5fV0FSTklORyANCisJCQkJ ICAgICAgICIlczogTG9jYWwgcm91dGluZyBsb29wIGRldGVjdGVkIVxuIiwN CisJCQkJICAgICAgIHQtPnBhcm1zLm5hbWUpOw0KKwkJCWdvdG8gdHhfZXJy X2RzdF9yZWxlYXNlOw0KKwkJfQ0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5k YWRkciwgJmZsLmZsNl9kc3QpOw0KKwkJaXB2Nl9hZGRyX2NvcHkoJm5wLT5z YWRkciwgJmZsLmZsNl9zcmMpOw0KKwl9DQorCW10dSA9IGRzdF9wbXR1KGRz dCkgLSBzaXplb2YgKCppcHY2aCk7DQorCWlmIChvcHQpIHsNCisJCW10dSAt PSAob3B0LT5vcHRfbmZsZW4gKyBvcHQtPm9wdF9mbGVuKTsNCisJfQ0KKwlp ZiAobXR1IDwgSVBWNl9NSU5fTVRVKQ0KKwkJbXR1ID0gSVBWNl9NSU5fTVRV Ow0KKwlpZiAoc2tiLT5kc3QgJiYgbXR1IDwgZHN0X3BtdHUoc2tiLT5kc3Qp KSB7DQorCQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gKHN0cnVjdCBydDZfaW5m byAqKSBza2ItPmRzdDsNCisJCXJ0LT5ydDZpX2ZsYWdzIHw9IFJURl9NT0RJ RklFRDsNCisJCXJ0LT51LmRzdC5tZXRyaWNzW1JUQVhfTVRVLTFdID0gbXR1 Ow0KKwl9DQorCWlmIChza2ItPmxlbiA+IG10dSkgew0KKwkJaWNtcHY2X3Nl bmQoc2tiLCBJQ01QVjZfUEtUX1RPT0JJRywgMCwgbXR1LCBkZXYpOw0KKwkJ Z290byB0eF9lcnJfb3B0X3JlbGVhc2U7DQorCX0NCisJZXJyID0gaXA2X2Fw cGVuZF9kYXRhKHNrLCBpcF9nZW5lcmljX2dldGZyYWcsIHNrYi0+bmgucmF3 LCBza2ItPmxlbiwgMCwNCisJCQkgICAgICB0LT5wYXJtcy5ob3BfbGltaXQs 
IG9wdCwgJmZsLCANCisJCQkgICAgICAoc3RydWN0IHJ0Nl9pbmZvICopZHN0 LCBNU0dfRE9OVFdBSVQpOw0KKw0KKwlpZiAoZXJyKSB7DQorCQlpcDZfZmx1 c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwl9IGVsc2Ugew0KKwkJZXJyID0g aXA2X3B1c2hfcGVuZGluZ19mcmFtZXMoc2spOw0KKwkJZXJyID0gKGVyciA8 IDAgPyBlcnIgOiAwKTsNCisJfQ0KKwlpZiAoIWVycikgew0KKwkJc3RhdHMt PnR4X2J5dGVzICs9IHNrYi0+bGVuOw0KKwkJc3RhdHMtPnR4X3BhY2tldHMr KzsNCisJfSBlbHNlIHsNCisJCXN0YXRzLT50eF9lcnJvcnMrKzsNCisJCXN0 YXRzLT50eF9hYm9ydGVkX2Vycm9ycysrOw0KKwl9DQorCWlmIChvcHQgJiYg b3B0ICE9IG9yaWdfb3B0KQ0KKwkJc29ja19rZnJlZV9zKHNrLCBvcHQsIG9w dC0+dG90X2xlbik7DQorDQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsN CisJaXA2X2RzdF9zdG9yZShzaywgZHN0LCAmbnAtPmRhZGRyKTsNCisJaXA2 X3htaXRfdW5sb2NrKCk7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0LT5yZWN1 cnNpb24tLTsNCisJcmV0dXJuIDA7DQordHhfZXJyX2RzdF9yZWxlYXNlOg0K Kwlkc3RfcmVsZWFzZShkc3QpOw0KK3R4X2Vycl9vcHRfcmVsZWFzZToNCisJ aWYgKG9wdCAmJiBvcHQgIT0gb3JpZ19vcHQpDQorCQlzb2NrX2tmcmVlX3Mo c2ssIG9wdCwgb3B0LT50b3RfbGVuKTsNCit0eF9lcnJfZnJlZV9mbF9sYmw6 DQorCWZsNl9zb2NrX3JlbGVhc2UoZmxfbGJsKTsNCisJaXA2X3htaXRfdW5s b2NrKCk7DQorCWlmIChsaW5rX2ZhaWx1cmUpDQorCQlkc3RfbGlua19mYWls dXJlKHNrYik7DQordHhfZXJyOg0KKwlzdGF0cy0+dHhfZXJyb3JzKys7DQor CXN0YXRzLT50eF9kcm9wcGVkKys7DQorCWtmcmVlX3NrYihza2IpOw0KKwl0 LT5yZWN1cnNpb24tLTsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyB2 b2lkIGlwNl90bmxfc2V0X2NhcChzdHJ1Y3QgaXA2X3RubCAqdCkNCit7DQor CXN0cnVjdCBpcDZfdG5sX3Bhcm0gKnAgPSAmdC0+cGFybXM7DQorCXN0cnVj dCBpbjZfYWRkciAqbGFkZHIgPSAmcC0+bGFkZHI7DQorCXN0cnVjdCBpbjZf YWRkciAqcmFkZHIgPSAmcC0+cmFkZHI7DQorCWludCBsdHlwZSA9IGlwdjZf YWRkcl90eXBlKGxhZGRyKTsNCisJaW50IHJ0eXBlID0gaXB2Nl9hZGRyX3R5 cGUocmFkZHIpOw0KKw0KKwlwLT5mbGFncyAmPSB+KElQNl9UTkxfRl9DQVBf WE1JVHxJUDZfVE5MX0ZfQ0FQX1JDVik7DQorDQorCWlmIChsdHlwZSAhPSBJ UFY2X0FERFJfQU5ZICYmIHJ0eXBlICE9IElQVjZfQUREUl9BTlkgJiYNCisJ ICAgICgobHR5cGV8cnR5cGUpICYNCisJICAgICAoSVBWNl9BRERSX1VOSUNB U1R8DQorCSAgICAgIElQVjZfQUREUl9MT09QQkFDS3xJUFY2X0FERFJfTElO S0xPQ0FMfA0KKwkgICAgICBJUFY2X0FERFJfTUFQUEVEfElQVjZfQUREUl9S 
RVNFUlZFRCkpID09IElQVjZfQUREUl9VTklDQVNUKSB7DQorCQlzdHJ1Y3Qg bmV0X2RldmljZSAqbGRldiA9IE5VTEw7DQorCQlpbnQgbF9vayA9IDE7DQor CQlpbnQgcl9vayA9IDE7DQorDQorCQlpZiAocC0+bGluaykNCisJCQlsZGV2 ID0gZGV2X2dldF9ieV9pbmRleChwLT5saW5rKTsNCisJCQ0KKwkJaWYgKChs dHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkgJiYgIWlwdjZfY2hrX2FkZHIobGFk ZHIsIGxkZXYpKQ0KKwkJCWxfb2sgPSAwOw0KKwkJDQorCQlpZiAoKHJ0eXBl JklQVjZfQUREUl9VTklDQVNUKSAmJiBpcHY2X2Noa19hZGRyKHJhZGRyLCBO VUxMKSkNCisJCQlyX29rID0gMDsNCisJCQ0KKwkJaWYgKGxfb2sgJiYgcl9v aykgew0KKwkJCWlmIChsdHlwZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJ cC0+ZmxhZ3MgfD0gSVA2X1ROTF9GX0NBUF9YTUlUOw0KKwkJCWlmIChydHlw ZSZJUFY2X0FERFJfVU5JQ0FTVCkNCisJCQkJcC0+ZmxhZ3MgfD0gSVA2X1RO TF9GX0NBUF9SQ1Y7DQorCQl9DQorCQlpZiAobGRldikNCisJCQlkZXZfcHV0 KGxkZXYpOw0KKwl9DQorfQ0KKw0KKw0KK3N0YXRpYyB2b2lkIGlwNmlwNl90 bmxfbGlua19jb25maWcoc3RydWN0IGlwNl90bmwgKnQpDQorew0KKwlzdHJ1 Y3QgbmV0X2RldmljZSAqZGV2ID0gdC0+ZGV2Ow0KKwlzdHJ1Y3QgaXA2X3Ru bF9wYXJtICpwID0gJnQtPnBhcm1zOw0KKwlzdHJ1Y3QgZmxvd2kgKmZsOw0K KwkvKiBTZXQgdXAgZmxvd2kgdGVtcGxhdGUgKi8NCisJZmwgPSAmdC0+Zmw7 DQorCWlwdjZfYWRkcl9jb3B5KCZmbC0+Zmw2X3NyYywgJnAtPmxhZGRyKTsN CisJaXB2Nl9hZGRyX2NvcHkoJmZsLT5mbDZfZHN0LCAmcC0+cmFkZHIpOw0K KwlmbC0+b2lmID0gcC0+bGluazsNCisJZmwtPmZsNl9mbG93bGFiZWwgPSAw Ow0KKw0KKwlpZiAoIShwLT5mbGFncyZJUDZfVE5MX0ZfVVNFX09SSUdfVENM QVNTKSkNCisJCWZsLT5mbDZfZmxvd2xhYmVsIHw9IElQVjZfVENMQVNTX01B U0sgJiBodG9ubChwLT5mbG93aW5mbyk7DQorCWlmICghKHAtPmZsYWdzJklQ Nl9UTkxfRl9VU0VfT1JJR19GTE9XTEFCRUwpKQ0KKwkJZmwtPmZsNl9mbG93 bGFiZWwgfD0gSVBWNl9GTE9XTEFCRUxfTUFTSyAmIGh0b25sKHAtPmZsb3dp bmZvKTsNCisNCisJaXA2X3RubF9zZXRfY2FwKHQpOw0KKw0KKwlpZiAocC0+ ZmxhZ3MmSVA2X1ROTF9GX0NBUF9YTUlUICYmIHAtPmZsYWdzJklQNl9UTkxf Rl9DQVBfUkNWKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRPUE9JTlQ7 DQorCWVsc2UNCisJCWRldi0+ZmxhZ3MgJj0gfklGRl9QT0lOVE9QT0lOVDsN CisNCisJaWYgKHAtPmZsYWdzICYgSVA2X1ROTF9GX0NBUF9YTUlUKSB7DQor CQlzdHJ1Y3QgcnQ2X2luZm8gKnJ0ID0gcnQ2X2xvb2t1cCgmcC0+cmFkZHIs ICZwLT5sYWRkciwNCisJCQkJCQkgcC0+bGluaywgMCk7DQorCQlpZiAocnQp 
IHsNCisJCQlzdHJ1Y3QgbmV0X2RldmljZSAqcnRkZXY7DQorCQkJaWYgKCEo cnRkZXYgPSBydC0+cnQ2aV9kZXYpIHx8DQorCQkJICAgIHJ0ZGV2LT50eXBl ID09IEFSUEhSRF9UVU5ORUw2KSB7DQorCQkJCS8qIGFzIGxvbmcgYXMgdHVu bmVscyB1c2UgdGhlIHNhbWUgc29ja2V0IA0KKwkJCQkgICBmb3IgdHJhbnNt aXNzaW9uLCBsb2NhbGx5IG5lc3RlZCB0dW5uZWxzIA0KKwkJCQkgICB3b24n dCB3b3JrICovDQorCQkJCWRzdF9yZWxlYXNlKCZydC0+dS5kc3QpOw0KKwkJ CQlnb3RvIG5vX2xpbms7DQorCQkJfSBlbHNlIHsNCisJCQkJZGV2LT5pZmxp bmsgPSBydGRldi0+aWZpbmRleDsNCisJCQkJZGV2LT5oYXJkX2hlYWRlcl9s ZW4gPSBydGRldi0+aGFyZF9oZWFkZXJfbGVuICsNCisJCQkJCXNpemVvZiAo c3RydWN0IGlwdjZoZHIpOw0KKwkJCQlkZXYtPm10dSA9IHJ0ZGV2LT5tdHUg LSBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCQkJaWYgKGRldi0+bXR1 IDwgSVBWNl9NSU5fTVRVKQ0KKwkJCQkJZGV2LT5tdHUgPSBJUFY2X01JTl9N VFU7DQorCQkJCQ0KKwkJCQlkc3RfcmVsZWFzZSgmcnQtPnUuZHN0KTsNCisJ CQl9DQorCQl9DQorCX0gZWxzZSB7DQorCW5vX2xpbms6DQorCQlkZXYtPmlm bGluayA9IDA7DQorCQlkZXYtPmhhcmRfaGVhZGVyX2xlbiA9IExMX01BWF9I RUFERVIgKyBzaXplb2YgKHN0cnVjdCBpcHY2aGRyKTsNCisJCWRldi0+bXR1 ID0gRVRIX0RBVEFfTEVOIC0gc2l6ZW9mIChzdHJ1Y3QgaXB2Nmhkcik7DQor CX0NCit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9jaGFuZ2UgLSB1cGRh dGUgdGhlIHR1bm5lbCBwYXJhbWV0ZXJzDQorICogICBAdDogdHVubmVsIHRv IGJlIGNoYW5nZWQNCisgKiAgIEBwOiB0dW5uZWwgY29uZmlndXJhdGlvbiBw YXJhbWV0ZXJzDQorICogICBAYWN0aXZlOiAhPSAwIGlmIHR1bm5lbCBpcyBy ZWFkeSBmb3IgdXNlDQorICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIGlw NmlwNl90bmxfY2hhbmdlKCkgdXBkYXRlcyB0aGUgdHVubmVsIHBhcmFtZXRl cnMNCisgKiovDQorDQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfY2hhbmdl KHN0cnVjdCBpcDZfdG5sICp0LCBzdHJ1Y3QgaXA2X3RubF9wYXJtICpwKQ0K K3sNCisJaXB2Nl9hZGRyX2NvcHkoJnQtPnBhcm1zLmxhZGRyLCAmcC0+bGFk ZHIpOw0KKwlpcHY2X2FkZHJfY29weSgmdC0+cGFybXMucmFkZHIsICZwLT5y YWRkcik7DQorCXQtPnBhcm1zLmZsYWdzID0gcC0+ZmxhZ3M7DQorCXQtPnBh cm1zLmhvcF9saW1pdCA9IChwLT5ob3BfbGltaXQgPD0gMjU1ID8gcC0+aG9w X2xpbWl0IDogLTEpOw0KKwl0LT5wYXJtcy5lbmNhcF9saW1pdCA9IHAtPmVu Y2FwX2xpbWl0Ow0KKwl0LT5wYXJtcy5mbG93aW5mbyA9IHAtPmZsb3dpbmZv Ow0KKwlpcDZpcDZfdG5sX2xpbmtfY29uZmlnKHQpOw0KKwlyZXR1cm4gMDsN 
Cit9DQorDQorLyoqDQorICogaXA2aXA2X3RubF9pb2N0bCAtIGNvbmZpZ3Vy ZSBpcHY2IHR1bm5lbHMgZnJvbSB1c2Vyc3BhY2UgDQorICogICBAZGV2OiB2 aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQorICogICBA aWZyOiBwYXJhbWV0ZXJzIHBhc3NlZCBmcm9tIHVzZXJzcGFjZQ0KKyAqICAg QGNtZDogY29tbWFuZCB0byBiZSBwZXJmb3JtZWQNCisgKg0KKyAqIERlc2Ny aXB0aW9uOg0KKyAqICAgaXA2aXA2X3RubF9pb2N0bCgpIGlzIHVzZWQgZm9y IG1hbmFnaW5nIElQdjYgdHVubmVscyANCisgKiAgIGZyb20gdXNlcnNwYWNl LiANCisgKg0KKyAqICAgVGhlIHBvc3NpYmxlIGNvbW1hbmRzIGFyZSB0aGUg Zm9sbG93aW5nOg0KKyAqICAgICAlU0lPQ0dFVFRVTk5FTDogZ2V0IHR1bm5l bCBwYXJhbWV0ZXJzIGZvciBkZXZpY2UNCisgKiAgICAgJVNJT0NBRERUVU5O RUw6IGFkZCB0dW5uZWwgbWF0Y2hpbmcgZ2l2ZW4gdHVubmVsIHBhcmFtZXRl cnMNCisgKiAgICAgJVNJT0NDSEdUVU5ORUw6IGNoYW5nZSB0dW5uZWwgcGFy YW1ldGVycyB0byB0aG9zZSBnaXZlbg0KKyAqICAgICAlU0lPQ0RFTFRVTk5F TDogZGVsZXRlIHR1bm5lbA0KKyAqDQorICogICBUaGUgZmFsbGJhY2sgZGV2 aWNlICJpcDZ0bmwwIiwgY3JlYXRlZCBkdXJpbmcgbW9kdWxlIA0KKyAqICAg aW5pdGlhbGl6YXRpb24sIGNhbiBiZSB1c2VkIGZvciBjcmVhdGluZyBvdGhl ciB0dW5uZWwgZGV2aWNlcy4NCisgKg0KKyAqIFJldHVybjoNCisgKiAgIDAg b24gc3VjY2VzcywNCisgKiAgICUtRUZBVUxUIGlmIHVuYWJsZSB0byBjb3B5 IGRhdGEgdG8gb3IgZnJvbSB1c2Vyc3BhY2UsDQorICogICAlLUVQRVJNIGlm IGN1cnJlbnQgcHJvY2VzcyBoYXNuJ3QgJUNBUF9ORVRfQURNSU4gc2V0DQor ICogICAlLUVJTlZBTCBpZiBwYXNzZWQgdHVubmVsIHBhcmFtZXRlcnMgYXJl IGludmFsaWQsDQorICogICAlLUVFWElTVCBpZiBjaGFuZ2luZyBhIHR1bm5l bCdzIHBhcmFtZXRlcnMgd291bGQgY2F1c2UgYSBjb25mbGljdA0KKyAqICAg JS1FTk9ERVYgaWYgYXR0ZW1wdGluZyB0byBjaGFuZ2Ugb3IgZGVsZXRlIGEg bm9uZXhpc3RpbmcgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBpbnQNCitp cDZpcDZfdG5sX2lvY3RsKHN0cnVjdCBuZXRfZGV2aWNlICpkZXYsIHN0cnVj dCBpZnJlcSAqaWZyLCBpbnQgY21kKQ0KK3sNCisJaW50IGVyciA9IDA7DQor CWludCBjcmVhdGU7DQorCXN0cnVjdCBpcDZfdG5sX3Bhcm0gcDsNCisJc3Ry dWN0IGlwNl90bmwgKnQgPSBOVUxMOw0KKw0KKwlzd2l0Y2ggKGNtZCkgew0K KwljYXNlIFNJT0NHRVRUVU5ORUw6DQorCQlpZiAoZGV2ID09ICZpcDZpcDZf ZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5X2Zyb21fdXNlcigmcCwNCisJ CQkJCSAgIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLA0KKwkJCQkJICAgc2l6 
ZW9mIChwKSkpIHsNCisJCQkJZXJyID0gLUVGQVVMVDsNCisJCQkJYnJlYWs7 DQorCQkJfQ0KKwkJCWlmICgoZXJyID0gaXA2aXA2X3RubF9sb2NhdGUoJnAs ICZ0LCAwKSkgPT0gLUVOT0RFVikNCisJCQkJdCA9IChzdHJ1Y3QgaXA2X3Ru bCAqKSBkZXYtPnByaXY7DQorCQkJZWxzZSBpZiAoZXJyKQ0KKwkJCQlicmVh azsNCisJCX0gZWxzZQ0KKwkJCXQgPSAoc3RydWN0IGlwNl90bmwgKikgZGV2 LT5wcml2Ow0KKw0KKwkJbWVtY3B5KCZwLCAmdC0+cGFybXMsIHNpemVvZiAo cCkpOw0KKwkJaWYgKGNvcHlfdG9fdXNlcihpZnItPmlmcl9pZnJ1LmlmcnVf ZGF0YSwgJnAsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsN CisJCX0NCisJCWJyZWFrOw0KKwljYXNlIFNJT0NBRERUVU5ORUw6DQorCWNh c2UgU0lPQ0NIR1RVTk5FTDoNCisJCWVyciA9IC1FUEVSTTsNCisJCWNyZWF0 ZSA9IChjbWQgPT0gU0lPQ0FERFRVTk5FTCk7DQorCQlpZiAoIWNhcGFibGUo Q0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorCQlpZiAoY29weV9mcm9t X3VzZXIoJnAsIGlmci0+aWZyX2lmcnUuaWZydV9kYXRhLCBzaXplb2YgKHAp KSkgew0KKwkJCWVyciA9IC1FRkFVTFQ7DQorCQkJYnJlYWs7DQorCQl9DQor CQlpZiAoIWNyZWF0ZSAmJiBkZXYgIT0gJmlwNmlwNl9mYl90bmxfZGV2KSB7 DQorCQkJdCA9IChzdHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9 DQorCQlpZiAoIXQgJiYgKGVyciA9IGlwNmlwNl90bmxfbG9jYXRlKCZwLCAm dCwgY3JlYXRlKSkpIHsNCisJCQlicmVhazsNCisJCX0NCisJCWlmIChjbWQg PT0gU0lPQ0NIR1RVTk5FTCkgew0KKwkJCWlmICh0LT5kZXYgIT0gZGV2KSB7 DQorCQkJCWVyciA9IC1FRVhJU1Q7DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJ CQlpcDZpcDZfdG5sX3VubGluayh0KTsNCisJCQllcnIgPSBpcDZpcDZfdG5s X2NoYW5nZSh0LCAmcCk7DQorCQkJaXA2aXA2X3RubF9saW5rKHQpOw0KKwkJ CW5ldGRldl9zdGF0ZV9jaGFuZ2UoZGV2KTsNCisJCX0NCisJCWlmIChjb3B5 X3RvX3VzZXIoaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJCSAmdC0+ cGFybXMsIHNpemVvZiAocCkpKSB7DQorCQkJZXJyID0gLUVGQVVMVDsNCisJ CX0gZWxzZSB7DQorCQkJZXJyID0gMDsNCisJCX0NCisJCWJyZWFrOw0KKwlj YXNlIFNJT0NERUxUVU5ORUw6DQorCQllcnIgPSAtRVBFUk07DQorCQlpZiAo IWNhcGFibGUoQ0FQX05FVF9BRE1JTikpDQorCQkJYnJlYWs7DQorDQorCQlp ZiAoZGV2ID09ICZpcDZpcDZfZmJfdG5sX2Rldikgew0KKwkJCWlmIChjb3B5 X2Zyb21fdXNlcigmcCwgaWZyLT5pZnJfaWZydS5pZnJ1X2RhdGEsDQorCQkJ CQkgICBzaXplb2YgKHApKSkgew0KKwkJCQllcnIgPSAtRUZBVUxUOw0KKwkJ CQlicmVhazsNCisJCQl9DQorCQkJZXJyID0gaXA2aXA2X3RubF9sb2NhdGUo 
JnAsICZ0LCAwKTsNCisJCQlpZiAoZXJyKQ0KKwkJCQlicmVhazsNCisJCQlp ZiAodCA9PSAmaXA2aXA2X2ZiX3RubCkgew0KKwkJCQllcnIgPSAtRVBFUk07 DQorCQkJCWJyZWFrOw0KKwkJCX0NCisJCX0gZWxzZSB7DQorCQkJdCA9IChz dHJ1Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCQl9DQorCQllcnIgPSBp cDZfdG5sX2Rlc3Ryb3kodCk7DQorCQlicmVhazsNCisJZGVmYXVsdDoNCisJ CWVyciA9IC1FSU5WQUw7DQorCX0NCisJcmV0dXJuIGVycjsNCit9DQorDQor LyoqDQorICogaXA2aXA2X3RubF9nZXRfc3RhdHMgLSByZXR1cm4gdGhlIHN0 YXRzIGZvciB0dW5uZWwgZGV2aWNlIA0KKyAqICAgQGRldjogdmlydHVhbCBk ZXZpY2UgYXNzb2NpYXRlZCB3aXRoIHR1bm5lbA0KKyAqDQorICogUmV0dXJu OiBzdGF0cyBmb3IgZGV2aWNlDQorICoqLw0KKw0KK3N0YXRpYyBzdHJ1Y3Qg bmV0X2RldmljZV9zdGF0cyAqDQoraXA2aXA2X3RubF9nZXRfc3RhdHMoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCXJldHVybiAmKCgoc3RydWN0 IGlwNl90bmwgKikgZGV2LT5wcml2KS0+c3RhdCk7DQorfQ0KKw0KKy8qKg0K KyAqIGlwNmlwNl90bmxfY2hhbmdlX210dSAtIGNoYW5nZSBtdHUgbWFudWFs bHkgZm9yIHR1bm5lbCBkZXZpY2UNCisgKiAgIEBkZXY6IHZpcnR1YWwgZGV2 aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiAgIEBuZXdfbXR1OiB0 aGUgbmV3IG10dQ0KKyAqDQorICogUmV0dXJuOg0KKyAqICAgMCBvbiBzdWNj ZXNzLA0KKyAqICAgJS1FSU5WQUwgaWYgbXR1IHRvbyBzbWFsbA0KKyAqKi8N CisNCitzdGF0aWMgaW50DQoraXA2aXA2X3RubF9jaGFuZ2VfbXR1KHN0cnVj dCBuZXRfZGV2aWNlICpkZXYsIGludCBuZXdfbXR1KQ0KK3sNCisJaWYgKG5l d19tdHUgPCBJUFY2X01JTl9NVFUpIHsNCisJCXJldHVybiAtRUlOVkFMOw0K Kwl9DQorCWRldi0+bXR1ID0gbmV3X210dTsNCisJcmV0dXJuIDA7DQorfQ0K Kw0KKy8qKg0KKyAqIGlwNmlwNl90bmxfZGV2X2luaXRfZ2VuIC0gZ2VuZXJh bCBpbml0aWFsaXplciBmb3IgYWxsIHR1bm5lbCBkZXZpY2VzDQorICogICBA ZGV2OiB2aXJ0dWFsIGRldmljZSBhc3NvY2lhdGVkIHdpdGggdHVubmVsDQor ICoNCisgKiBEZXNjcmlwdGlvbjoNCisgKiAgIFNldCBmdW5jdGlvbiBwb2lu dGVycyBhbmQgaW5pdGlhbGl6ZSB0aGUgJnN0cnVjdCBmbG93aSB0ZW1wbGF0 ZSB1c2VkDQorICogICBieSB0aGUgdHVubmVsLg0KKyAqKi8NCisNCitzdGF0 aWMgdm9pZA0KK2lwNmlwNl90bmxfZGV2X2luaXRfZ2VuKHN0cnVjdCBuZXRf ZGV2aWNlICpkZXYpDQorew0KKwlzdHJ1Y3QgaXA2X3RubCAqdCA9IChzdHJ1 Y3QgaXA2X3RubCAqKSBkZXYtPnByaXY7DQorCXN0cnVjdCBmbG93aSAqZmwg PSAmdC0+Zmw7DQorDQorCW1lbXNldChmbCwgMCwgc2l6ZW9mICgqZmwpKTsN 
CisJZmwtPnByb3RvID0gSVBQUk9UT19JUFY2Ow0KKw0KKwlkZXYtPmRlc3Ry dWN0b3IgPSBpcDZpcDZfdG5sX2Rldl9kZXN0cnVjdG9yOw0KKwlkZXYtPnVu aW5pdCA9IGlwNmlwNl90bmxfZGV2X3VuaW5pdDsNCisJZGV2LT5oYXJkX3N0 YXJ0X3htaXQgPSBpcDZpcDZfdG5sX3htaXQ7DQorCWRldi0+Z2V0X3N0YXRz ID0gaXA2aXA2X3RubF9nZXRfc3RhdHM7DQorCWRldi0+ZG9faW9jdGwgPSBp cDZpcDZfdG5sX2lvY3RsOw0KKwlkZXYtPmNoYW5nZV9tdHUgPSBpcDZpcDZf dG5sX2NoYW5nZV9tdHU7DQorCWRldi0+dHlwZSA9IEFSUEhSRF9UVU5ORUw2 Ow0KKwlkZXYtPmZsYWdzIHw9IElGRl9OT0FSUDsNCisJaWYgKGlwdjZfYWRk cl90eXBlKCZ0LT5wYXJtcy5yYWRkcikgJiBJUFY2X0FERFJfVU5JQ0FTVCAm Jg0KKwkgICAgaXB2Nl9hZGRyX3R5cGUoJnQtPnBhcm1zLmxhZGRyKSAmIElQ VjZfQUREUl9VTklDQVNUKQ0KKwkJZGV2LT5mbGFncyB8PSBJRkZfUE9JTlRP UE9JTlQ7DQorCS8qIEhtbS4uLiBNQVhfQUREUl9MRU4gaXMgOCwgc28gdGhl IGlwdjYgYWRkcmVzc2VzIGNhbid0IGJlIA0KKwkgICBjb3BpZWQgdG8gZGV2 LT5kZXZfYWRkciBhbmQgZGV2LT5icm9hZGNhc3QsIGxpa2UgdGhlIGlwdjQN CisJICAgYWRkcmVzc2VzIHdlcmUgaW4gaXBpcC5jLCBpcF9ncmUuYyBhbmQg c2l0LmMuICovDQorCWRldi0+YWRkcl9sZW4gPSAwOw0KK30NCisNCisvKioN CisgKiBpcDZpcDZfdG5sX2Rldl9pbml0IC0gaW5pdGlhbGl6ZXIgZm9yIGFs bCBub24gZmFsbGJhY2sgdHVubmVsIGRldmljZXMNCisgKiAgIEBkZXY6IHZp cnR1YWwgZGV2aWNlIGFzc29jaWF0ZWQgd2l0aCB0dW5uZWwNCisgKiovDQor DQorc3RhdGljIGludA0KK2lwNmlwNl90bmxfZGV2X2luaXQoc3RydWN0IG5l dF9kZXZpY2UgKmRldikNCit7DQorCXN0cnVjdCBpcDZfdG5sICp0ID0gKHN0 cnVjdCBpcDZfdG5sICopIGRldi0+cHJpdjsNCisJaXA2aXA2X3RubF9kZXZf aW5pdF9nZW4oZGV2KTsNCisJaXA2aXA2X3RubF9saW5rX2NvbmZpZyh0KTsN CisJcmV0dXJuIDA7DQorfQ0KKw0KKy8qKg0KKyAqIGlwNmlwNl9mYl90bmxf ZGV2X2luaXQgLSBpbml0aWFsaXplciBmb3IgZmFsbGJhY2sgdHVubmVsIGRl dmljZQ0KKyAqICAgQGRldjogZmFsbGJhY2sgZGV2aWNlDQorICoNCisgKiBS ZXR1cm46IDANCisgKiovDQorDQoraW50IGlwNmlwNl9mYl90bmxfZGV2X2lu aXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikNCit7DQorCWlwNmlwNl90bmxf ZGV2X2luaXRfZ2VuKGRldik7DQorCXRubHNfd2NbMF0gPSAmaXA2aXA2X2Zi X3RubDsNCisJcmV0dXJuIDA7DQorfQ0KKw0KK3N0YXRpYyBzdHJ1Y3QgaW5l dDZfcHJvdG9jb2wgaXA2aXA2X3Byb3RvY29sID0gew0KKwkuaGFuZGxlciA9 IGlwNmlwNl9yY3YsDQorCS5lcnJfaGFuZGxlciA9IGlwNmlwNl9lcnIsDQor 
CS5mbGFncyA9IElORVQ2X1BST1RPX0ZJTkFMDQorfTsNCisNCisvKioNCisg KiBpcDZfdHVubmVsX2luaXQgLSByZWdpc3RlciBwcm90b2NvbCBhbmQgcmVz ZXJ2ZSBuZWVkZWQgcmVzb3VyY2VzDQorICoNCisgKiBSZXR1cm46IDAgb24g c3VjY2Vzcw0KKyAqKi8NCisNCitpbnQgX19pbml0IGlwNl90dW5uZWxfaW5p dCh2b2lkKQ0KK3sNCisJaW50IGksIGosIGVycjsNCisJc3RydWN0IHNvY2sg KnNrOw0KKwlzdHJ1Y3QgaXB2Nl9waW5mbyAqbnA7DQorDQorCWlwNmlwNl9m Yl90bmxfZGV2LnByaXYgPSAodm9pZCAqKSAmaXA2aXA2X2ZiX3RubDsNCisN CisJZm9yIChpID0gMDsgaSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFj cHVfcG9zc2libGUoaSkpDQorCQkJY29udGludWU7DQorDQorCQllcnIgPSBz b2NrX2NyZWF0ZShQRl9JTkVUNiwgU09DS19SQVcsIElQUFJPVE9fSVBWNiwg DQorCQkJCSAgJl9faXA2X3NvY2tldFtpXSk7DQorCQlpZiAoZXJyIDwgMCkg ew0KKwkJCXByaW50ayhLRVJOX0VSUiANCisJCQkgICAgICAgIkZhaWxlZCB0 byBjcmVhdGUgdGhlIElQdjYgdHVubmVsIHNvY2tldCAiDQorCQkJICAgICAg ICIoZXJyICVkKS5cbiIsIA0KKwkJCSAgICAgICBlcnIpOw0KKwkJCWdvdG8g ZmFpbDsNCisJCX0NCisJCXNrID0gX19pcDZfc29ja2V0W2ldLT5zazsNCisJ CXNrLT5za19hbGxvY2F0aW9uID0gR0ZQX0FUT01JQzsNCisNCisJCW5wID0g aW5ldDZfc2soc2spOw0KKwkJbnAtPmhvcF9saW1pdCA9IDI1NTsNCisJCW5w LT5tY19sb29wID0gMDsNCisNCisJCXNrLT5za19wcm90LT51bmhhc2goc2sp Ow0KKwl9DQorCWlmICgoZXJyID0gaW5ldDZfYWRkX3Byb3RvY29sKCZpcDZp cDZfcHJvdG9jb2wsIElQUFJPVE9fSVBWNikpIDwgMCkgew0KKwkJcHJpbnRr KEtFUk5fRVJSICJGYWlsZWQgdG8gcmVnaXN0ZXIgSVB2NiBwcm90b2NvbFxu Iik7DQorCQlnb3RvIGZhaWw7DQorCX0NCisNCisJU0VUX01PRFVMRV9PV05F UigmaXA2aXA2X2ZiX3RubF9kZXYpOw0KKwlyZWdpc3Rlcl9uZXRkZXYoJmlw NmlwNl9mYl90bmxfZGV2KTsNCisNCisJcmV0dXJuIDA7DQorZmFpbDoNCisJ Zm9yIChqID0gMDsgaiA8IGk7IGorKykgew0KKwkJaWYgKCFjcHVfcG9zc2li bGUoaikpDQorCQkJY29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZf c29ja2V0W2pdKTsNCisJCV9faXA2X3NvY2tldFtqXSA9IE5VTEw7DQorCX0N CisJcmV0dXJuIGVycjsNCit9DQorDQorLyoqDQorICogaXA2X3R1bm5lbF9j bGVhbnVwIC0gZnJlZSByZXNvdXJjZXMgYW5kIHVucmVnaXN0ZXIgcHJvdG9j b2wNCisgKiovDQorDQordm9pZCBpcDZfdHVubmVsX2NsZWFudXAodm9pZCkN Cit7DQorCWludCBpOw0KKw0KKwl1bnJlZ2lzdGVyX25ldGRldigmaXA2aXA2 X2ZiX3RubF9kZXYpOw0KKw0KKwlpbmV0Nl9kZWxfcHJvdG9jb2woJmlwNmlw 
Nl9wcm90b2NvbCwgSVBQUk9UT19JUFY2KTsNCisNCisJZm9yIChpID0gMDsg aSA8IE5SX0NQVVM7IGkrKykgew0KKwkJaWYgKCFjcHVfcG9zc2libGUoaSkp DQorCQkJY29udGludWU7DQorCQlzb2NrX3JlbGVhc2UoX19pcDZfc29ja2V0 W2ldKTsNCisJCV9faXA2X3NvY2tldFtpXSA9IE5VTEw7DQorCX0NCit9DQor DQorI2lmZGVmIE1PRFVMRQ0KK21vZHVsZV9pbml0KGlwNl90dW5uZWxfaW5p dCk7DQorbW9kdWxlX2V4aXQoaXA2X3R1bm5lbF9jbGVhbnVwKTsNCisjZW5k aWYNCmRpZmYgLU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1ZGU9Qml0S2Vl cGVyIC0tZXhjbHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25ldC9pcHY2L2lw djZfc3ltcy5jIG1lcmdlLTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYw0KLS0t IGxpbnV4LTIuNS9uZXQvaXB2Ni9pcHY2X3N5bXMuYwlNb24gSnVuICA5IDA5 OjExOjI1IDIwMDMNCisrKyBtZXJnZS0yLjUvbmV0L2lwdjYvaXB2Nl9zeW1z LmMJTW9uIEp1biAgOSAxMDozNjo0NiAyMDAzDQpAQCAtMzgsMyArMzgsOSBA QA0KIEVYUE9SVF9TWU1CT0woaXA2X2ZpbmRfMXN0ZnJhZ29wdCk7DQogRVhQ T1JUX1NZTUJPTCh4ZnJtNl9yY3YpOw0KIEVYUE9SVF9TWU1CT0woeGZybTZf Y2xlYXJfbXV0YWJsZV9vcHRpb25zKTsNCitFWFBPUlRfU1lNQk9MKHJ0Nl9s b29rdXApOw0KK0VYUE9SVF9TWU1CT0woZmw2X3NvY2tfbG9va3VwKTsNCitF WFBPUlRfU1lNQk9MKGlwdjZfZXh0X2hkcik7DQorRVhQT1JUX1NZTUJPTChp cDZfYXBwZW5kX2RhdGEpOw0KK0VYUE9SVF9TWU1CT0woaXA2X2ZsdXNoX3Bl bmRpbmdfZnJhbWVzKTsNCitFWFBPUlRfU1lNQk9MKGlwNl9wdXNoX3BlbmRp bmdfZnJhbWVzKTsNCmRpZmYgLU51ciAtLWV4Y2x1ZGU9U0NDUyAtLWV4Y2x1 ZGU9Qml0S2VlcGVyIC0tZXhjbHVkZT1DaGFuZ2VTZXQgbGludXgtMi41L25l dC9uZXRzeW1zLmMgbWVyZ2UtMi41L25ldC9uZXRzeW1zLmMNCi0tLSBsaW51 eC0yLjUvbmV0L25ldHN5bXMuYwlNb24gSnVuICA5IDA4OjQ1OjU3IDIwMDMN CisrKyBtZXJnZS0yLjUvbmV0L25ldHN5bXMuYwlNb24gSnVuICA5IDEwOjM4 OjE2IDIwMDMNCkBAIC00NzcsOCArNDc3LDEwIEBADQogRVhQT1JUX1NZTUJP TChzeXNjdGxfbWF4X3N5bl9iYWNrbG9nKTsNCiAjZW5kaWYNCiANCi1FWFBP UlRfU1lNQk9MKGlwX2dlbmVyaWNfZ2V0ZnJhZyk7DQorI2VuZGlmDQogDQor I2lmIGRlZmluZWQgKENPTkZJR19JUFY2X01PRFVMRSkgfHwgZGVmaW5lZCAo Q09ORklHX0lQX1NDVFBfTU9EVUxFKSB8fCBkZWZpbmVkIChDT05GSUdfSVBW Nl9UVU5ORUxfTU9EVUxFKQ0KK0VYUE9SVF9TWU1CT0woaXBfZ2VuZXJpY19n ZXRmcmFnKTsNCiAjZW5kaWYNCiANCiBFWFBPUlRfU1lNQk9MKHRjcF9yZWFk X3NvY2spOw0K ---377318441-1375410448-1055158991=:13811-- From davem@redhat.com Mon Jun 9 04:58:53 2003 
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 04:58:58 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Bwr2x023287 for ; Mon, 9 Jun 2003 04:58:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id EAA18251; Mon, 9 Jun 2003 04:55:48 -0700 Date: Mon, 09 Jun 2003 04:55:47 -0700 (PDT) Message-Id: <20030609.045547.91327851.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609072227.R34462@shell.cyberus.ca> References: <000401c32e5e$a707b6d0$4a00000a@badass> <20030609072227.R34462@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2996 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT) Yes, you have a nice setup and thats why you should test all the patches DaveM is posting. Dave, Paul is running in a real ISP environment i think he is very valuable in helping to test these patches and collect any says that might be needed. Now watch him disapear ;-> If he doesn't test my patches he isn't very useful, so we'll see :-) Additional thought Dave: i think prefetching the rth would help in 2.5 at least when you have lotsa collisions. call prefetch(nextrth) right after smp_read_barrier_depends() everywhere in route.c You're going to prefetch "nextrth" when the first thing we're going to access is "&nextrth->fl"? 
:-) It only makes sense to prefetch the 'fl' member of the first hash chain entry and that's what I've done in my tree. This points out that it would make sense to put the struct flowi up into the dst entry. From hch@lst.de Mon Jun 9 05:06:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:06:18 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [212.34.189.10]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59C662x023725 for ; Mon, 9 Jun 2003 05:06:07 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-6.4) with ESMTP id h59C64DC031415 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Mon, 9 Jun 2003 14:06:04 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.3) id h59C63tA031413 for netdev@oss.sgi.com; Mon, 9 Jun 2003 14:06:03 +0200 Date: Mon, 9 Jun 2003 14:06:03 +0200 From: Christoph Hellwig To: netdev@oss.sgi.com Subject: [PATCH] switch skfp over to initcalls Message-ID: <20030609120603.GA31393@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Score: -3 () PATCH_UNIFIED_DIFF,USER_AGENT_MUTT X-Scanned-By: MIMEDefang 2.33 (www . roaringpenguin . com / mimedefang) X-archive-position: 2997 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev This is a PCI driver and has no business in Space.c. 
Also allows to kill all the fddi code in there (and the stale reference to the long gone apfddi driver) --- 1.20/drivers/net/Space.c Wed May 21 03:56:26 2003 +++ edited/drivers/net/Space.c Tue Jun 3 22:17:09 2003 @@ -105,9 +105,6 @@ /* Detachable devices ("pocket adaptors") */ extern int de620_probe(struct net_device *); -/* FDDI adapters */ -extern int skfp_probe(struct net_device *dev); - /* Fibre Channel adapters */ extern int iph5526_probe(struct net_device *dev); @@ -401,29 +398,6 @@ return -ENODEV; } -#ifdef CONFIG_FDDI -static int __init fddiif_probe(struct net_device *dev) -{ - unsigned long base_addr = dev->base_addr; - - if (base_addr == 1) - return 1; /* ENXIO */ - - if (1 -#ifdef CONFIG_APFDDI - && apfddi_init(dev) -#endif -#ifdef CONFIG_SKFP - && skfp_probe(dev) -#endif - && 1 ) { - return 1; /* -ENODEV or -EAGAIN would be more accurate. */ - } - return 0; -} -#endif - - #ifdef CONFIG_NET_FC static int fcif_probe(struct net_device *dev) { @@ -614,52 +588,6 @@ #define NEXT_DEV (&tr0_dev) #endif - -#ifdef CONFIG_FDDI -static struct net_device fddi7_dev = { - .name = "fddi7", - .next = NEXT_DEV, - .init = fddiif_probe -}; -static struct net_device fddi6_dev = { - .name = "fddi6", - .next = &fddi7_dev, - .init = fddiif_probe -}; -static struct net_device fddi5_dev = { - .name = "fddi5", - .next = &fddi6_dev, - .init = fddiif_probe -}; -static struct net_device fddi4_dev = { - .name = "fddi4", - .next = &fddi5_dev, - .init = fddiif_probe -}; -static struct net_device fddi3_dev = { - .name = "fddi3", - .next = &fddi4_dev, - .init = fddiif_probe -}; -static struct net_device fddi2_dev = { - .name = "fddi2", - .next = &fddi3_dev, - .init = fddiif_probe -}; -static struct net_device fddi1_dev = { - .name = "fddi1", - .next = &fddi2_dev, - .init = fddiif_probe -}; -static struct net_device fddi0_dev = { - .name = "fddi0", - .next = &fddi1_dev, - .init = fddiif_probe -}; -#undef NEXT_DEV -#define NEXT_DEV (&fddi0_dev) -#endif - #ifdef CONFIG_NET_FC static struct 
net_device fc1_dev = { --- 1.12/drivers/net/skfp/skfddi.c Fri May 9 02:40:17 2003 +++ edited/drivers/net/skfp/skfddi.c Tue Jun 3 22:19:04 2003 @@ -2539,72 +2539,25 @@ } // drv_reset_indication - -//--------------- functions for use as a module ---------------- - -#ifdef MODULE -/************************ - * - * Note now that module autoprobing is allowed under PCI. The - * IRQ lines will not be auto-detected; instead I'll rely on the BIOSes - * to "do the right thing". - * - ************************/ -#define LP(a) ((struct s_smc*)(a)) static struct net_device *mdev; -/************************ - * - * init_module - * - * If compiled as a module, find - * adapters and initialize them. - * - ************************/ -int init_module(void) +static int __init skfd_init(void) { struct net_device *p; - PRINTK(KERN_INFO "FDDI init module\n"); if ((mdev = insert_device(NULL, skfp_probe)) == NULL) return -ENOMEM; - for (p = mdev; p != NULL; p = LP(p->priv)->os.next_module) { - PRINTK(KERN_INFO "device to register: %s\n", p->name); + for (p = mdev; p != NULL; p = ((struct s_smc *)p->priv)->os.next_module) { if (register_netdev(p) != 0) { printk("skfddi init_module failed\n"); return -EIO; } } - PRINTK(KERN_INFO "+++++ exit with success +++++\n"); return 0; -} // init_module +} -/************************ - * - * cleanup_module - * - * Release all resources claimed by this module. - * - ************************/ -void cleanup_module(void) -{ - PRINTK(KERN_INFO "cleanup_module\n"); - while (mdev != NULL) { - mdev = unlink_modules(mdev); - } - return; -} // cleanup_module - - -/************************ - * - * unlink_modules - * - * Unregister devices and release their memory. 
- * - ************************/ static struct net_device *unlink_modules(struct net_device *p) { struct net_device *next = NULL; @@ -2638,5 +2591,11 @@ return next; } // unlink_modules +static void __exit skfd_exit(void) +{ + while (mdev) + mdev = unlink_modules(mdev); +} -#endif /* MODULE */ +module_init(skfd_init); +module_exit(skfd_exit); From hadi@shell.cyberus.ca Mon Jun 9 05:19:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:19:24 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59CJE2x024134 for ; Mon, 9 Jun 2003 05:19:15 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PLbu-0008zh-Lh; Mon, 09 Jun 2003 08:18:50 -0400 Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT) From: Jamal Hadi To: "David S. Miller" cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609.045547.91327851.davem@redhat.com> Message-ID: <20030609080430.I34540@shell.cyberus.ca> References: <000401c32e5e$a707b6d0$4a00000a@badass> <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2998 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, David S. Miller wrote: > From: Jamal Hadi > Date: Mon, 9 Jun 2003 07:38:44 -0400 (EDT) > > Yes, you have a nice setup and thats why you should test all the patches > DaveM is posting. Dave, Paul is running in a real ISP environment i think > he is very valuable in helping to test these patches and collect > any says that might be needed. 
Now watch him disappear ;-> > > If he doesn't test my patches he isn't very useful, > so we'll see :-) Ok foo the pressure is on you now ;-> You wanna see things fixed then run the damn tests or stop bitching ;-> > You're going to prefetch "nextrth" when the first thing we're > going to access is "&nextrth->fl"? :-) > > It only makes sense to prefetch the 'fl' member of the first hash > chain entry and that's what I've done in my tree. This points out > that it would make sense to put the struct flowi up into the dst > entry. Yes, moving the flowi up makes more sense. I found in my tests with an Ethernet driver that prefetching the _next_ DMA descriptor gave better numbers than prefetching the current one, but I didn't spend too much time on it. I am going to revisit this. Good thought on rearranging the structure; it may help with the descriptors as well. cheers, jamal From davem@redhat.com Mon Jun 9 05:35:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:35:40 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59CZT2x024615 for ; Mon, 9 Jun 2003 05:35:31 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id FAA18415; Mon, 9 Jun 2003 05:32:19 -0700 Date: Mon, 09 Jun 2003 05:32:18 -0700 (PDT) Message-Id: <20030609.053218.54202815.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609080430.I34540@shell.cyberus.ca> References: <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> <20030609080430.I34540@shell.cyberus.ca> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2999 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT) I found in my tests with a ethernet driver that prefetching the _next_ dma descriptor gave better numbers than prefetching the current one but i didnt spend too much time. Two issues: 1) We have some cycles to borrow for head entry, we can make prefetch right before rcu_read_lock() 2) Ideally, hash chains will not exceed 1 (2 at the max) entries. Just some thinking... From nakam@linux-ipv6.org Mon Jun 9 05:39:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 05:39:47 -0700 (PDT) Received: from localhost ([203.178.141.107]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Cdb2x024938 for ; Mon, 9 Jun 2003 05:39:38 -0700 Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19PKxT-00017F-00; Mon, 09 Jun 2003 20:37:03 +0900 From: Masahide NAKAMURA To: Henrik Petander Cc: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= , , , , , , , , Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 Message-Id: <20030609203659.089b241b.nakam@linux-ipv6.org> In-Reply-To: References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> Organization: USAGI Project X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu) X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Date: Mon, 09 Jun 2003 20:37:03 +0900 X-archive-position: 3000 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev On Mon, 9 Jun 2003 12:06:35 +0300 (EEST) Henrik Petander wrote: > On Fri, 6 Jun 2003, Masahide NAKAMURA wrote: > > > > We don't think we have to change the logic handling policy with > > the reason because we can treat MIPv6 policy just like IPsec. > > > > When we want to apply both MIPv6 and IPsec to the same target, > > we need one policy that has two or more of templates(e.g. one is > > MIPv6's template and the other is IPsec's). > > Does this also mean that the IPSec and MIPv6 policies and SAs need to be > configured at the same time or is it possible to add templates to an > existing policy? Currently no interface to add templates directly to it. : > A different issue related to the different addresses is that the SPD > lookup should be done with the original source address, i.e. home address, > if home address option is used and with the final destination address, if > routing header is used. SPD lookup works now for TCP (with RT header), but > not for raw sockets, which the mipv6 daemon will use. We will provide a > patch for fixing the SPD lookups with raw sockets, which add routing > header and home address option from socket options. > Ok, I want to see your patch when it is provided because now I'm not so clear about using socket option in the above case. 
Regards, -- Masahide NAKAMURA From ralph@istop.com Mon Jun 9 06:04:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:04:27 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59D472x026506 for ; Mon, 9 Jun 2003 06:04:18 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 4393636AEB; Mon, 9 Jun 2003 09:04:06 -0400 (EDT) Date: Mon, 9 Jun 2003 09:04:06 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'Florian Weimer'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030608230300.X33412@shell.cyberus.ca> Message-ID: References: <001001c32e19$81bc7ea0$4a00000a@badass> <20030608230300.X33412@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3001 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Sun, 8 Jun 2003, Jamal Hadi wrote: > I am sure there are people who will like to sell you linux devices > at half the cisco prices doing Millions of PPS via hardware assists. > Support these linux supporting companies instead ;-> Are you serious? Who is making these boxes? -Ralph From hadi@shell.cyberus.ca Mon Jun 9 06:22:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:22:43 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59DMa2x028671 for ; Mon, 9 Jun 2003 06:22:37 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PMbD-00091q-Ju; Mon, 09 Jun 2003 09:22:11 -0400 Date: Mon, 9 Jun 2003 09:22:11 -0400 (EDT) From: Jamal Hadi To: "David S. 
Miller" cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609.053218.54202815.davem@redhat.com> Message-ID: <20030609091907.Y34702@shell.cyberus.ca> References: <20030609072227.R34462@shell.cyberus.ca> <20030609.045547.91327851.davem@redhat.com> <20030609080430.I34540@shell.cyberus.ca> <20030609.053218.54202815.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3002 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, David S. Miller wrote: > From: Jamal Hadi > Date: Mon, 9 Jun 2003 08:18:50 -0400 (EDT) > > I found in my tests with an Ethernet driver that prefetching the > _next_ DMA descriptor gave better numbers than prefetching the > current one, but I didn't spend too much time on it. > > Two issues: > > 1) We have some cycles to borrow for head entry, we can make > prefetch right before rcu_read_lock() > > 2) Ideally, hash chains will not exceed 1 (2 at the max) > entries. > I don't think you'll see much benefit with 1 or 2 entries. I was thinking more along the lines of people with over 100K entries total. Let me run with this and get back to you.
cheers, jamal From davem@redhat.com Mon Jun 9 06:25:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 06:25:27 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59DPM2x029047 for ; Mon, 9 Jun 2003 06:25:23 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id GAA18572; Mon, 9 Jun 2003 06:22:17 -0700 Date: Mon, 09 Jun 2003 06:22:17 -0700 (PDT) Message-Id: <20030609.062217.48383829.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609091907.Y34702@shell.cyberus.ca> References: <20030609080430.I34540@shell.cyberus.ca> <20030609.053218.54202815.davem@redhat.com> <20030609091907.Y34702@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3003 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Mon, 9 Jun 2003 09:22:11 -0400 (EDT) I dont think youll see much benefit with 1 or 2 entries. I was thinking more along the lines of people with over 100K entries total; You simply don't want the chains to get that long. In my experience, even with prefetching tricks, past 2 or 3 entry deep hash chains you run into serious problems. TCP has the same issue BTW, in fact DoS-like behavior is the common thing there. Every time you create a new TCP connection on a server it's exactly like a routing cache miss. Let me run with this and get back to you. Ok. 
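A minimal user-space sketch of the idea discussed in this thread: prefetch the lookup key of the first entry in a hash chain, then walk the chain comparing keys. This is not the kernel code. The names flow_key, cache_entry, cache_table, flow_hash and TABLE_SIZE are hypothetical, the hash function is an arbitrary stand-in for rt_hash_code(), and __builtin_prefetch is the GCC/Clang builtin used here in place of the kernel's prefetch().

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u /* hypothetical bucket count, must be a power of two */

/* Hypothetical lookup key, loosely modelled on the fields hashed in the thread. */
struct flow_key {
	uint32_t dst;
	uint32_t src;
	uint8_t tos;
};

struct cache_entry {
	struct cache_entry *next; /* next entry in the same hash chain */
	struct flow_key key;      /* first thing a lookup touches */
	int value;
};

struct cache_table {
	struct cache_entry *chain[TABLE_SIZE];
};

/* Arbitrary illustrative hash; NOT the kernel's rt_hash_code(). */
static unsigned int flow_hash(const struct flow_key *k)
{
	uint32_t h = k->dst ^ (k->src * 2654435761u) ^ k->tos;

	h ^= h >> 16;
	return h & (TABLE_SIZE - 1);
}

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->dst == b->dst && a->src == b->src && a->tos == b->tos;
}

/* Insert at the head of the chain for the entry's bucket. */
static void cache_insert(struct cache_table *t, struct cache_entry *e)
{
	unsigned int h = flow_hash(&e->key);

	e->next = t->chain[h];
	t->chain[h] = e;
}

/* Look up a key, prefetching only the key of the head entry. */
static struct cache_entry *cache_lookup(const struct cache_table *t,
					const struct flow_key *k)
{
	struct cache_entry *e = t->chain[flow_hash(k)];

	/* Prefetch the member that is compared first, not the entry start. */
	if (e)
		__builtin_prefetch(&e->key);

	for (; e; e = e->next)
		if (flow_key_equal(&e->key, k))
			return e;
	return NULL;
}
```

Only the head entry is prefetched because, as argued above, chains are expected to stay at one or two entries; if they grow much longer, the sketch does nothing to hide the extra cache misses.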
From ralph@istop.com Mon Jun 9 07:06:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:06:31 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59E6E2x030168 for ; Mon, 9 Jun 2003 07:06:15 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 1D94736952; Mon, 9 Jun 2003 09:28:59 -0400 (EDT) Date: Mon, 9 Jun 2003 09:28:59 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: CIT/Paul , "'Florian Weimer'" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030609064719.GA20613@netnation.com> Message-ID: References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3004 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Sun, 8 Jun 2003, Simon Kirby wrote: > You got a 7200 VXR to do 300kpps? I would have liked to see that. > We couldn't get our 7206 VXR routers to do anything more than about 12 > Mbit/second of small packets, which I believe is about 40,000 packets > per second. This is with CEF disabled, because it ended up duplicating > packets and doing some other strange things with CEF enabled. The trick is finding the good IOS revs. 12.0(7)T and 12.2(11)T have been good ones for me. Finding other ISPs running ciscos to exchange tips and ideas has been much easier than finding folks running linux. A sure-fire way to get flamed is to post to NANOG asking what's the best Linux router setup! 
For most ISPs it's better to spend $20K on a 7206VXR/NPE-G1 than to spend days trying to figure out what kernel + patch set, NIC, and motherboard combination will squeeze the best performance out of a PC router. And once you've done that you still have zebra quirks to worry about... -Ralph From davem@redhat.com Mon Jun 9 07:17:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:18:06 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59EHv2x030561 for ; Mon, 9 Jun 2003 07:17:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA18814; Mon, 9 Jun 2003 07:14:52 -0700 Date: Mon, 09 Jun 2003 07:14:51 -0700 (PDT) Message-Id: <20030609.071451.108794109.davem@redhat.com> To: sim@netnation.com Cc: xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, hadi@shell.cyberus.ca, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609081803.GF20613@netnation.com> References: <20030609065211.GB20613@netnation.com> <20030608.235622.38700262.davem@redhat.com> <20030609081803.GF20613@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3005 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Ok Simon/Robert/Mr.Foo :), give this a try, it's my final installment for the evening :-) If this shows improvement, we can make even larger strides by moving the struct flowi up into struct dst_entry. 
--- net/core/dst.c.~1~ Mon Jun 9 01:47:26 2003 +++ net/core/dst.c Mon Jun 9 03:13:56 2003 @@ -122,13 +122,34 @@ void * dst_alloc(struct dst_ops * ops) dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); if (!dst) return NULL; - memset(dst, 0, ops->entry_size); + dst->next = NULL; atomic_set(&dst->__refcnt, 0); - dst->ops = ops; + dst->__use = 0; + dst->child = NULL; + dst->dev = NULL; + dst->obsolete = 0; + dst->flags = 0; dst->lastuse = jiffies; + dst->expires = 0; + dst->header_len = 0; + dst->trailer_len = 0; + memset(dst->metrics, 0, sizeof(dst->metrics)); dst->path = dst; + dst->rate_last = 0; + dst->rate_tokens = 0; + dst->error = 0; + dst->neighbour = NULL; + dst->hh = NULL; + dst->xfrm = NULL; dst->input = dst_discard; dst->output = dst_blackhole; +#ifdef CONFIG_NET_CLS_ROUTE + dst->tclassid = 0; +#endif + dst->ops = ops; + INIT_RCU_HEAD(&dst->rcu_head); + memset(dst->info, 0, + ops->entry_size - offsetof(struct dst_entry, info)); #if RT_CACHE_DEBUG >= 2 atomic_inc(&dst_total); #endif --- net/ipv4/route.c.~1~ Sun Jun 8 23:28:00 2003 +++ net/ipv4/route.c Mon Jun 9 06:49:15 2003 @@ -88,6 +88,7 @@ #include #include #include +#include #include #include #include @@ -882,6 +883,60 @@ static void rt_del(unsigned hash, struct spin_unlock_bh(&rt_hash_table[hash].lock); } +static void __rt_hash_shrink(unsigned int hash) +{ + struct rtable *rth, **rthp; + struct rtable *cand, **candp; + unsigned int min_use = ~(unsigned int) 0; + + spin_lock_bh(&rt_hash_table[hash].lock); + cand = NULL; + candp = NULL; + rthp = &rt_hash_table[hash].chain; + while ((rth = *rthp) != NULL) { + if (!atomic_read(&rth->u.dst.__refcnt) && + ((unsigned int) rth->u.dst.__use) < min_use) { + cand = rth; + candp = rthp; + min_use = rth->u.dst.__use; + } + rthp = &rth->u.rt_next; + } + if (cand) { + *candp = cand->u.rt_next; + rt_free(cand); + } + + spin_unlock_bh(&rt_hash_table[hash].lock); +} + +static inline struct rtable *ip_rt_dst_alloc(unsigned int hash) +{ + if 
(atomic_read(&ipv4_dst_ops.entries) > + ipv4_dst_ops.gc_thresh) + __rt_hash_shrink(hash); + + return dst_alloc(&ipv4_dst_ops); +} + +static void ip_rt_copy(struct rtable *rt, struct rtable *old) +{ + memcpy(rt, old, sizeof(*rt)); + + INIT_RCU_HEAD(&rt->u.dst.rcu_head); + rt->u.dst.__use = 1; + atomic_set(&rt->u.dst.__refcnt, 1); + rt->u.dst.child = NULL; + if (rt->u.dst.dev) + dev_hold(rt->u.dst.dev); + rt->u.dst.obsolete = 0; + rt->u.dst.lastuse = jiffies; + rt->u.dst.path = &rt->u.dst; + rt->u.dst.neighbour = NULL; + rt->u.dst.hh = NULL; + rt->u.dst.xfrm = NULL; +} + void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, u32 saddr, u8 tos, struct net_device *dev) { @@ -912,9 +967,10 @@ void ip_rt_redirect(u32 old_gw, u32 dadd for (i = 0; i < 2; i++) { for (k = 0; k < 2; k++) { - unsigned hash = rt_hash_code(daddr, - skeys[i] ^ (ikeys[k] << 5), - tos); + unsigned int hash = rt_hash_code(daddr, + skeys[i] ^ + (ikeys[k] << 5), + tos); rthp=&rt_hash_table[hash].chain; @@ -942,7 +998,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd dst_hold(&rth->u.dst); rcu_read_unlock(); - rt = dst_alloc(&ipv4_dst_ops); + rt = ip_rt_dst_alloc(hash); if (rt == NULL) { ip_rt_put(rth); in_dev_put(in_dev); @@ -950,19 +1006,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd } /* Copy all the information. 
*/ - *rt = *rth; - INIT_RCU_HEAD(&rt->u.dst.rcu_head); - rt->u.dst.__use = 1; - atomic_set(&rt->u.dst.__refcnt, 1); - rt->u.dst.child = NULL; - if (rt->u.dst.dev) - dev_hold(rt->u.dst.dev); - rt->u.dst.obsolete = 0; - rt->u.dst.lastuse = jiffies; - rt->u.dst.path = &rt->u.dst; - rt->u.dst.neighbour = NULL; - rt->u.dst.hh = NULL; - rt->u.dst.xfrm = NULL; + ip_rt_copy(rt, rth); rt->rt_flags |= RTCF_REDIRECTED; @@ -1352,7 +1396,7 @@ static void rt_set_nexthop(struct rtable static int ip_route_input_mc(struct sk_buff *skb, u32 daddr, u32 saddr, u8 tos, struct net_device *dev, int our) { - unsigned hash; + unsigned int hash; struct rtable *rth; u32 spec_dst; struct in_device *in_dev = in_dev_get(dev); @@ -1375,7 +1419,9 @@ static int ip_route_input_mc(struct sk_b dev, &spec_dst, &itag) < 0) goto e_inval; - rth = dst_alloc(&ipv4_dst_ops); + hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos); + + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; @@ -1421,7 +1467,6 @@ static int ip_route_input_mc(struct sk_b RT_CACHE_STAT_INC(in_slow_mc); in_dev_put(in_dev); - hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos); return rt_intern_hash(hash, rth, (struct rtable**) &skb->dst); e_nobufs: @@ -1584,45 +1629,42 @@ int ip_route_input_slow(struct sk_buff * goto e_inval; } - rth = dst_alloc(&ipv4_dst_ops); + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; atomic_set(&rth->u.dst.__refcnt, 1); - rth->u.dst.flags= DST_HOST; - if (in_dev->cnf.no_policy) - rth->u.dst.flags |= DST_NOPOLICY; - if (in_dev->cnf.no_xfrm) - rth->u.dst.flags |= DST_NOXFRM; - rth->fl.fl4_dst = daddr; + rth->u.dst.dev = out_dev->dev; + dev_hold(out_dev->dev); + rth->u.dst.flags= (DST_HOST | + (in_dev->cnf.no_policy ? DST_NOPOLICY : 0) | + (in_dev->cnf.no_xfrm ? 
DST_NOXFRM : 0)); + rth->u.dst.input = ip_forward; + rth->u.dst.output = ip_output; + + rth->rt_flags = flags; + rth->rt_src = saddr; rth->rt_dst = daddr; - rth->fl.fl4_tos = tos; + rth->rt_iif = dev->ifindex; + rth->rt_gateway = daddr; + + rth->fl.iif = dev->ifindex; + rth->fl.fl4_dst = daddr; + rth->fl.fl4_src = saddr; #ifdef CONFIG_IP_ROUTE_FWMARK rth->fl.fl4_fwmark= skb->nfmark; #endif - rth->fl.fl4_src = saddr; - rth->rt_src = saddr; - rth->rt_gateway = daddr; + rth->fl.fl4_tos = tos; + rth->rt_spec_dst= spec_dst; #ifdef CONFIG_IP_ROUTE_NAT rth->rt_src_map = fl.fl4_src; rth->rt_dst_map = fl.fl4_dst; - if (flags&RTCF_DNAT) + if (flags & RTCF_DNAT) rth->rt_gateway = fl.fl4_dst; #endif - rth->rt_iif = - rth->fl.iif = dev->ifindex; - rth->u.dst.dev = out_dev->dev; - dev_hold(rth->u.dst.dev); - rth->fl.oif = 0; - rth->rt_spec_dst= spec_dst; - - rth->u.dst.input = ip_forward; - rth->u.dst.output = ip_output; rt_set_nexthop(rth, &res, itag); - rth->rt_flags = flags; - #ifdef CONFIG_NET_FASTROUTE if (netdev_fastroute && !(flags&(RTCF_NAT|RTCF_MASQ|RTCF_DOREDIRECT))) { struct net_device *odev = rth->u.dst.dev; @@ -1663,45 +1705,45 @@ brd_input: RT_CACHE_STAT_INC(in_brd); local_input: - rth = dst_alloc(&ipv4_dst_ops); + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; + atomic_set(&rth->u.dst.__refcnt, 1); + rth->u.dst.dev = &loopback_dev; + dev_hold(&loopback_dev); + rth->u.dst.flags= (DST_HOST | + (in_dev->cnf.no_policy ? 
DST_NOPOLICY : 0)); + rth->u.dst.input= ip_local_deliver; rth->u.dst.output= ip_rt_bug; +#ifdef CONFIG_NET_CLS_ROUTE + rth->u.dst.tclassid = itag; +#endif - atomic_set(&rth->u.dst.__refcnt, 1); - rth->u.dst.flags= DST_HOST; - if (in_dev->cnf.no_policy) - rth->u.dst.flags |= DST_NOPOLICY; - rth->fl.fl4_dst = daddr; + rth->rt_flags = flags|RTCF_LOCAL; + rth->rt_type = res.type; + rth->rt_src = saddr; rth->rt_dst = daddr; - rth->fl.fl4_tos = tos; + rth->rt_iif = dev->ifindex; + rth->rt_gateway = daddr; + + rth->fl.iif = dev->ifindex; + rth->fl.fl4_dst = daddr; + rth->fl.fl4_src = saddr; #ifdef CONFIG_IP_ROUTE_FWMARK rth->fl.fl4_fwmark= skb->nfmark; #endif - rth->fl.fl4_src = saddr; - rth->rt_src = saddr; + rth->fl.fl4_tos = tos; + rth->rt_spec_dst= spec_dst; #ifdef CONFIG_IP_ROUTE_NAT rth->rt_dst_map = fl.fl4_dst; rth->rt_src_map = fl.fl4_src; #endif -#ifdef CONFIG_NET_CLS_ROUTE - rth->u.dst.tclassid = itag; -#endif - rth->rt_iif = - rth->fl.iif = dev->ifindex; - rth->u.dst.dev = &loopback_dev; - dev_hold(rth->u.dst.dev); - rth->rt_gateway = daddr; - rth->rt_spec_dst= spec_dst; - rth->u.dst.input= ip_local_deliver; - rth->rt_flags = flags|RTCF_LOCAL; if (res.type == RTN_UNREACHABLE) { rth->u.dst.input= ip_error; rth->u.dst.error= -err; rth->rt_flags &= ~RTCF_LOCAL; } - rth->rt_type = res.type; goto intern; no_route: @@ -1767,6 +1809,8 @@ int ip_route_input(struct sk_buff *skb, tos &= IPTOS_RT_MASK; hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos); + prefetch(&rt_hash_table[hash].chain->fl); + rcu_read_lock(); for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { smp_read_barrier_depends(); @@ -2048,7 +2092,10 @@ make_route: } } - rth = dst_alloc(&ipv4_dst_ops); + hash = rt_hash_code(oldflp->fl4_dst, + oldflp->fl4_src ^ (oldflp->oif << 5), tos); + + rth = ip_rt_dst_alloc(hash); if (!rth) goto e_nobufs; @@ -2104,10 +2151,6 @@ make_route: rt_set_nexthop(rth, &res, 0); - - rth->rt_flags = flags; - - hash = rt_hash_code(oldflp->fl4_dst, oldflp->fl4_src ^ 
(oldflp->oif << 5), tos); err = rt_intern_hash(hash, rth, rp); done: if (free_res) @@ -2132,6 +2175,8 @@ int __ip_route_output_key(struct rtable struct rtable *rth; hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos); + + prefetch(&rt_hash_table[hash].chain->fl); rcu_read_lock(); for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) { From babydr@baby-dragons.com Mon Jun 9 07:37:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 07:37:12 -0700 (PDT) Received: from filesrv1.baby-dragons.com (filesrv1.system-techniques.com [199.33.245.55]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Eb42x001024 for ; Mon, 9 Jun 2003 07:37:05 -0700 Received: from filesrv1.baby-dragons.com (localhost [127.0.0.1]) by filesrv1.baby-dragons.com (8.12.9/8.12.7) with ESMTP id h59Eb17W006564; Mon, 9 Jun 2003 10:37:01 -0400 Received: from localhost (babydr@localhost) by filesrv1.baby-dragons.com (8.12.9/8.12.7/Submit) with ESMTP id h59Eb1SL006561; Mon, 9 Jun 2003 10:37:01 -0400 X-Authentication-Warning: filesrv1.baby-dragons.com: babydr owned process doing -bs Date: Mon, 9 Jun 2003 10:37:01 -0400 (EDT) From: "Mr. James W. Laferriere" To: Jamal Hadi cc: Linux networking maillist , netdev@oss.sgi.com Subject: Re: netlink tester program In-Reply-To: <20030608212033.Y33230@shell.cyberus.ca> Message-ID: References: <20030603075742.34434.qmail@web14305.mail.yahoo.com> <20030608212033.Y33230@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3006 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: babydr@baby-dragons.com Precedence: bulk X-list: netdev Hello Jamal , Is there a time frame to update the draft ? Many people do not see these lists . Yours & Mr. Haas's work is only viewable to these thru the ietf draft/rfc/... process . Rather narrow view but necessary at times .
Tia , JimL On Sun, 8 Jun 2003, Jamal Hadi wrote: ...snip... > apologies, I actually have an unrelated daytime job that tends to keep me > too occupied at times ;-> > > Netlink2 draft is work in progress. The draft tends to lag reality. > I believe what you refer to has been fixed. Refer to the slides at: > http://www.zurich.ibm.com/~rha/netlink2.pdf > > > BTW, is netlink2 support planned for linux in the near > > future? > > You will see code from us that is GPL. Consider netlink2 as a distributed > netlink. netlink is already proven so why reinvent the wheel? > Essentially you should be able to manage clusters of linux network > devices (think firewalls, routers etc) with netlink/netlink2. > There are some mechanisms for distributedness that are missing. These are > the holes we are going to fill. > > Note some of the stuff i am working on at: > www.cyberus.ca/~hadi/patches/action which fits the whole forces paradigm > and works quite well with netlink today and netlink2 next. > (I stopped updating that web page for some time now, talk to me if > interested in the patches and if you would like to help in testing, > coding, etc) -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O.
Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ From davem@redhat.com Mon Jun 9 08:00:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 08:00:28 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59F0I2x002275 for ; Mon, 9 Jun 2003 08:00:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA18915; Mon, 9 Jun 2003 07:55:35 -0700 Date: Mon, 09 Jun 2003 07:55:34 -0700 (PDT) Message-Id: <20030609.075534.124074402.davem@redhat.com> To: vnuorval@tcs.hut.fi Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, lpetande@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com Subject: Re: ipv6 tunnel patch From: "David S. Miller" In-Reply-To: References: <20030607.033059.48393210.davem@redhat.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3007 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ville Nuorvala Date: Mon, 9 Jun 2003 14:43:11 +0300 (EEST) Ok here's the last revision of the patch :) It's done against ChangeSet 1.1308. Applied, thank you. 
From shemminger@osdl.org Mon Jun 9 09:24:32 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:24:42 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GOV2x003418 for ; Mon, 9 Jun 2003 09:24:32 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59GNRX06913; Mon, 9 Jun 2003 09:23:27 -0700 Date: Mon, 9 Jun 2003 09:23:27 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: xerox@foonet.net, hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress Message-Id: <20030609092327.41899cb5.shemminger@osdl.org> In-Reply-To: <20030608.232827.88487519.davem@redhat.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030608.232827.88487519.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3008 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Has anyone looked into using Judy arrays to speed up the route cache? HP has opened it up (see http://sourceforge.net/projects/judy ) and it should have better scaling for these types of attacks.
From sim@netnation.com Mon Jun 9 09:30:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:30:37 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GUA2x003792 for ; Mon, 9 Jun 2003 09:30:11 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PPX8-0005GM-6M; Mon, 09 Jun 2003 09:30:10 -0700 Date: Mon, 9 Jun 2003 09:30:10 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030609163010.GA11509@netnation.com> References: <20030608234926.GA9453@netnation.com> <001001c32e19$81bc7ea0$4a00000a@badass> <20030609064719.GA20613@netnation.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3009 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 09:28:59AM -0400, Ralph Doncaster wrote: > The trick is finding the good IOS revs. 12.0(7)T and 12.2(11)T have been > good ones for me. Finding other ISPs running ciscos to exchange tips and > ideas has been much easier than finding folks running linux. A sure-fire > way to get flamed is to post to NANOG asking what's the best Linux router > setup! > > For most ISPs it's better to spend $20K on a 7206VXR/NPE-G1 than to spend > days trying to figure out what kernel + patch set, NIC, and motherboard > combination will squeeze the best performance out of a PC router. And > once you've done that you still have zebra quirks to worry about... I beg to differ. We had much more pain trying to get those things to work properly than putting together two boxes that have been up now for almost a year without incident. 
Running Zebra, keepalived, etc., without any problems at all. What Zebra quirks? There has not yet been one crash or failure, which is much better than we could say for the 7206s. And I wouldn't exactly call it difficult to "squeeze" performance out of a PC when the 7206 VXRs have a 200 MHz processor. The main reason we switched is when we realized we could set up a powerful Linux box full of gigabit NICs for less than the price of one gigabit interface. At the time we purchased the NICs (3C996B-T) for less than $150 CDN each, and they're probably cheaper now. Simon- From davem@redhat.com Mon Jun 9 09:41:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:41:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GfD2x004186 for ; Mon, 9 Jun 2003 09:41:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA19368; Mon, 9 Jun 2003 09:37:56 -0700 Date: Mon, 09 Jun 2003 09:37:55 -0700 (PDT) Message-Id: <20030609.093755.51688774.davem@redhat.com> To: shemminger@osdl.org Cc: xerox@foonet.net, hadi@shell.cyberus.ca, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030609092327.41899cb5.shemminger@osdl.org> References: <001801c32e50$57ef0750$4a00000a@badass> <20030608.232827.88487519.davem@redhat.com> <20030609092327.41899cb5.shemminger@osdl.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3010 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 9 Jun 2003 09:23:27 -0700 Has anyone looked into using Judy arrays to speed up the route cache? HP has opened it up (see http://sourceforge.net/projects/judy ) and it should have better scaling for these types of attacks. Like all such seemingly promising schemes, insert/retrieve are optimized at the expense of delete. I normally don't even look at such algorithms anymore; they are all amazing if you only build tables and look for things in them, but are unusable when O(1) insert/delete/lookup is absolutely required. From davem@redhat.com Mon Jun 9 09:54:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 09:54:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59GsB2x004619 for ; Mon, 9 Jun 2003 09:54:11 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA19441; Mon, 9 Jun 2003 09:51:06 -0700 Date: Mon, 09 Jun 2003 09:51:06 -0700 (PDT) Message-Id: <20030609.095106.22030084.davem@redhat.com> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] switch skfp over to initcalls From: "David S. Miller" In-Reply-To: <20030609120603.GA31393@lst.de> References: <20030609120603.GA31393@lst.de> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3011 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Christoph Hellwig Date: Mon, 9 Jun 2003 14:06:03 +0200 This is a PCI driver and has no business in Space.c. Also allows to kill all the fddi code in there (and the stale reference to the long gone apfddi driver) Applied, thanks. From shemminger@osdl.org Mon Jun 9 10:10:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:10:39 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HAU2x005409; Mon, 9 Jun 2003 10:10:30 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59HAJX22232; Mon, 9 Jun 2003 10:10:19 -0700 Date: Mon, 9 Jun 2003 10:10:18 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: ralf@oss.sgi.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-Id: <20030609101018.0ca2e1f9.shemminger@osdl.org> In-Reply-To: <20030607.013010.116359540.davem@redhat.com> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3012 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Sat, 07 Jun 2003 01:30:10 -0700 (PDT) "David S. Miller" wrote: > From: Stephen Hemminger > Date: Fri, 6 Jun 2003 16:16:58 -0700 > > This driver never calls unregister in its module exit function: > > static void __exit ioc3_cleanup_module(void) > { > pci_unregister_driver(&ioc3_driver); > } > > pci_unregister_driver() invokes, for each PCI driver instance > registered, the ->remove() method for that driver. > > What is the problem? tg3.c and many other drivers work exactly > like this, using the PCI registry mechanism as a helper to do > all the grunge work or device iteration. pci_unregister_driver does the iteration but not the net device cleanup. The problem is the driver never calls unregister_netdev; it just frees the device structure. If this ever happens, the net device list would be corrupt. Don't have the hardware to actually do this though.
Looks like the right fix is: --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 +++ ioc3-eth.c 2003-06-09 10:04:45.000000000 -0700 @@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s struct ioc3 *ioc3 = ip->regs; iounmap(ioc3); + unregister_netdev(dev); pci_release_regions(pdev); kfree(dev); } From garzik@gtf.org Mon Jun 9 10:12:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:32 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCP2x005765; Mon, 9 Jun 2003 10:12:28 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id A73FE666D; Mon, 9 Jun 2003 13:12:24 -0400 (EDT) Date: Mon, 9 Jun 2003 13:12:24 -0400 From: Jeff Garzik To: Stephen Hemminger Cc: "David S. Miller" , ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-ID: <20030609171224.GA14623@gtf.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609101018.0ca2e1f9.shemminger@osdl.org> User-Agent: Mutt/1.3.28i X-archive-position: 3014 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 10:10:18AM -0700, Stephen Hemminger wrote: > Looks like the right fix is: > --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 > +++ ioc3-eth.c 2003-06-09 10:04:45.000000000 -0700 > @@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s > struct ioc3 *ioc3 = ip->regs; > > iounmap(ioc3); > + unregister_netdev(dev); > pci_release_regions(pdev); > kfree(dev); > } You want to unregister before iounmap. 
Jeff From davem@redhat.com Mon Jun 9 10:12:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCF2x005736; Mon, 9 Jun 2003 10:12:16 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19545; Mon, 9 Jun 2003 10:09:14 -0700 Date: Mon, 09 Jun 2003 10:09:13 -0700 (PDT) Message-Id: <20030609.100913.67895069.davem@redhat.com> To: shemminger@osdl.org Cc: ralf@oss.sgi.com, jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 From: "David S. Miller" In-Reply-To: <20030609101018.0ca2e1f9.shemminger@osdl.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3013 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Stephen Hemminger Date: Mon, 9 Jun 2003 10:10:18 -0700 pci_unregister_driver does the iteration but not the net device cleanup. You're absolutely right. 
From davem@redhat.com Mon Jun 9 10:12:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:12:57 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HCr2x006045 for ; Mon, 9 Jun 2003 10:12:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19560; Mon, 9 Jun 2003 10:09:49 -0700 Date: Mon, 09 Jun 2003 10:09:48 -0700 (PDT) Message-Id: <20030609.100948.84378474.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <3EE4BF39.2020503@pobox.com> References: <3EE4045D.4040002@pobox.com> <20030608.225309.39172149.davem@redhat.com> <3EE4BF39.2020503@pobox.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3015 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Mon, 09 Jun 2003 13:09:13 -0400 David S. Miller wrote: > That's your plan, but did you do any of this yet? It'll keep > going deeper and deeper into bitkeeper history the longer that > you wait :-) Yes, I have been following my plan. You will see when Marcelo opens 2.4.22-pre1 that I have been committing these to my net-drivers-2.4 queue. Awesome. Now why was I asking this to begin with? You wanted to do something, what was that? 
:-) From garzik@gtf.org Mon Jun 9 10:14:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:14:50 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HEl2x006644 for ; Mon, 9 Jun 2003 10:14:47 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id DA97C666D; Mon, 9 Jun 2003 13:14:46 -0400 (EDT) Date: Mon, 9 Jun 2003 13:14:46 -0400 From: Jeff Garzik To: "David S. Miller" Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030609171446.GA15239@gtf.org> References: <3EE4045D.4040002@pobox.com> <20030608.225309.39172149.davem@redhat.com> <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.100948.84378474.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3016 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 10:09:48AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Mon, 09 Jun 2003 13:09:13 -0400 > > David S. Miller wrote: > > That's your plan, but did you do any of this yet? It'll keep > > going deeper and deeper into bitkeeper history the longer that > > you wait :-) > > Yes, I have been following my plan. You will see when Marcelo opens > 2.4.22-pre1 that I have been committing these to my net-drivers-2.4 queue. > > Awesome. > > Now why was I asking this to begin with? You wanted to do something, > what was that? :-) I wanted to wait on the s/kfree/release_netdev/ patch until the other stuff is done. Said patch can be applied anytime, and doing it in this order reduces 2.4 backport merge pain. 
:) Jeff From davem@redhat.com Mon Jun 9 10:21:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:21:17 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HLD2x007011 for ; Mon, 9 Jun 2003 10:21:13 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19625; Mon, 9 Jun 2003 10:18:07 -0700 Date: Mon, 09 Jun 2003 10:18:07 -0700 (PDT) Message-Id: <20030609.101807.119873069.davem@redhat.com> To: jgarzik@pobox.com Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup From: "David S. Miller" In-Reply-To: <20030609171446.GA15239@gtf.org> References: <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> <20030609171446.GA15239@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3017 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Mon, 9 Jun 2003 13:14:46 -0400 I wanted to wait on the s/kfree/release_netdev/ patch until the other stuff is done. Said patch can be applied anytime, and doing it in this order reduces 2.4 backport merge pain. :) No problem, but once all the init_etherdev() etc. crap is abolished, Stephen's work or something similar goes in... Ok? 
From davem@redhat.com Mon Jun 9 10:21:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:21:18 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HL72x007006 for ; Mon, 9 Jun 2003 10:21:07 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19607; Mon, 9 Jun 2003 10:16:15 -0700 Date: Mon, 09 Jun 2003 10:16:15 -0700 (PDT) Message-Id: <20030609.101615.133898193.davem@redhat.com> To: hadi@shell.cyberus.ca Cc: etsh_cucu@yahoo.com, david-b@pacbell.net, rddunlap@osdl.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: netlink tester program From: "David S. Miller" In-Reply-To: <20030608212033.Y33230@shell.cyberus.ca> References: <20030603075742.34434.qmail@web14305.mail.yahoo.com> <20030608212033.Y33230@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3018 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jamal Hadi Date: Sun, 8 Jun 2003 21:35:09 -0400 (EDT) Netlink2 draft is work in progress. The draft tends to lag reality. I believe what you refer to has been fixed. Refer to the slides at: http://www.zurich.ibm.com/~rha/netlink2.pdf ... Consider netlink2 as a distributed netlink. Beautiful, if you're going to allow this protocol to go over the wire, you have to choose a network byte order and swap in/out of it. Should be fun :) Unfortunately I see no mention of this issue in the slides, should I be scared? 
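To illustrate the byte-order point above, here is a minimal user-space sketch. It assumes a hypothetical 12-byte message header: struct demo_hdr and the demo_hdr_pack()/demo_hdr_unpack() helpers are made up for illustration and are not the real struct nlmsghdr or anything taken from the netlink2 draft. Each field is converted with htonl()/htons() before it is copied into the wire buffer and converted back with ntohl()/ntohs() on receipt, so hosts of either endianness should produce and accept the same bytes.

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header, host byte order. Not the real netlink layout. */
struct demo_hdr {
    uint32_t len;
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
};

/* Size of the header on the wire: 4 + 2 + 2 + 4 bytes, no padding. */
#define DEMO_HDR_WIRE_SIZE 12

/* Write the header into buf in network (big-endian) byte order. */
static void demo_hdr_pack(const struct demo_hdr *h, unsigned char *buf)
{
    uint32_t len = htonl(h->len);
    uint16_t type = htons(h->type);
    uint16_t flags = htons(h->flags);
    uint32_t seq = htonl(h->seq);

    assert(h != NULL && buf != NULL);
    memcpy(buf, &len, sizeof(len));
    memcpy(buf + 4, &type, sizeof(type));
    memcpy(buf + 6, &flags, sizeof(flags));
    memcpy(buf + 8, &seq, sizeof(seq));
}

/* Read a header from buf, converting back to host byte order. */
static void demo_hdr_unpack(const unsigned char *buf, struct demo_hdr *h)
{
    uint32_t len, seq;
    uint16_t type, flags;

    assert(buf != NULL && h != NULL);
    memcpy(&len, buf, sizeof(len));
    memcpy(&type, buf + 4, sizeof(type));
    memcpy(&flags, buf + 6, sizeof(flags));
    memcpy(&seq, buf + 8, sizeof(seq));

    h->len = ntohl(len);
    h->type = ntohs(type);
    h->flags = ntohs(flags);
    h->seq = ntohl(seq);
}
```

Copying field by field with memcpy(), rather than casting the buffer to the struct, avoids relying on the compiler's padding or on the buffer's alignment; only the explicit offsets define the wire format.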
From davem@redhat.com Mon Jun 9 10:23:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:23:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HN52x007608 for ; Mon, 9 Jun 2003 10:23:05 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19658; Mon, 9 Jun 2003 10:20:00 -0700 Date: Mon, 09 Jun 2003 10:19:59 -0700 (PDT) Message-Id: <20030609.101959.62355600.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16091.11735.721251.925522@robur.slu.se> References: <20030522.153330.74735095.davem@redhat.com> <20030529205125.GA30058@netnation.com> <16091.11735.721251.925522@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3019 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Mon, 2 Jun 2003 12:58:31 +0200 And later GC has to remove all entries with spin_lock_bh held (no packet processing runs). I see packet drops exactly when GC runs. Tuning GC might help but it's something to observe. Please note, in 2.5.x, holding this lock on one cpu does not prevent packet processing (even for routes on the same hash chain) on another cpu because we use RCU there.
From davem@redhat.com Mon Jun 9 10:24:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:24:41 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HOZ2x007920 for ; Mon, 9 Jun 2003 10:24:37 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA19665; Mon, 9 Jun 2003 10:21:32 -0700 Date: Mon, 09 Jun 2003 10:21:31 -0700 (PDT) Message-Id: <20030609.102131.73669235.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: sim@netnation.com, netdev@oss.sgi.com, linux-net@vger.kernel.org, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <16091.32021.75335.227150@robur.slu.se> References: <16091.11735.721251.925522@robur.slu.se> <20030602151852.GA6070@netnation.com> <16091.32021.75335.227150@robur.slu.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3020 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Robert Olsson Date: Mon, 2 Jun 2003 18:36:37 +0200 Simon Kirby writes: > Is it possible to have a dst LRU or a simpler approximation of such and > recycle dst entries rather than deallocating/reallocating them? This > would relieve a lot of work from the garbage collector and avoid the > periodic large garbage collection latency. It could be tuned to only > occur in an attack (I remember Alexey saying that the deferred garbage > collection was implemented to reduce latency in normal operation). I don't see how this can be done. Others may?
Full recycle is very doable in 2.4.x, in 2.5.x it is an enormously hard problem because we use RCU there (readers run completely without locks). From shemminger@osdl.org Mon Jun 9 10:51:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 10:51:23 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59HpJ2x008863 for ; Mon, 9 Jun 2003 10:51:20 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Hp6X03779; Mon, 9 Jun 2003 10:51:06 -0700 Date: Mon, 9 Jun 2003 10:51:06 -0700 From: Stephen Hemminger To: Jeff Garzik , "David S. Miller" Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] warning in ethtool ixgb Message-Id: <20030609105106.0330bbec.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3021 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Looks like ethtool now knows about 10G devices in 2.5.70 bk-latest, so ixgb generates a warning. 
This should fix that: diff -Nru a/drivers/net/ixgb/ixgb_ethtool.c b/drivers/net/ixgb/ixgb_ethtool.c --- a/drivers/net/ixgb/ixgb_ethtool.c Mon Jun 9 10:49:01 2003 +++ b/drivers/net/ixgb/ixgb_ethtool.c Mon Jun 9 10:49:01 2003 @@ -50,9 +50,6 @@ return (IXGB_EEPROM_SIZE << 1); } -#define SUPPORTED_10000baseT_Full (1 << 11) -#define SPEED_10000 10000 - static void ixgb_ethtool_gset(struct ixgb_adapter *adapter, struct ethtool_cmd *ecmd) { From shemminger@osdl.org Mon Jun 9 11:09:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:09:21 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59I9A2x009422; Mon, 9 Jun 2003 11:09:11 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59I8uX09307; Mon, 9 Jun 2003 11:08:56 -0700 Date: Mon, 9 Jun 2003 11:08:55 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: davem@redhat.com, ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: [BUG] drivers/net/ioc3_eth.c in 2.5 Message-Id: <20030609110855.2e264ce1.shemminger@osdl.org> In-Reply-To: <20030609171224.GA14623@gtf.org> References: <20030606161658.1f01b8f9.shemminger@osdl.org> <20030607.013010.116359540.davem@redhat.com> <20030609101018.0ca2e1f9.shemminger@osdl.org> <20030609171224.GA14623@gtf.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3022 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Mon, 9 Jun 2003 13:12:24 -0400 Jeff Garzik wrote: > On Mon, Jun 09, 2003 at 10:10:18AM 
-0700, Stephen Hemminger wrote: > > Looks like the right fix is: > > --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 > > +++ ioc3-eth.c 2003-06-09 10:04:45.000000000 -0700 > > @@ -1614,6 +1614,7 @@ static void __devexit ioc3_remove_one (s > > struct ioc3 *ioc3 = ip->regs; > > > > iounmap(ioc3); > > + unregister_netdev(dev); > > pci_release_regions(pdev); > > kfree(dev); > > } > > You want to unregister before iounmap. > > Jeff > > Okay: --- ioc3-eth.c.orig 2003-06-09 10:05:45.000000000 -0700 +++ ioc3-eth.c 2003-06-09 11:08:01.000000000 -0700 @@ -1613,6 +1613,7 @@ static void __devexit ioc3_remove_one (s struct ioc3_private *ip = dev->priv; struct ioc3 *ioc3 = ip->regs; + unregister_netdev(dev); iounmap(ioc3); pci_release_regions(pdev); kfree(dev); From garzik@gtf.org Mon Jun 9 11:11:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:11:58 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IBq2x009776 for ; Mon, 9 Jun 2003 11:11:52 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 963BA666E; Mon, 9 Jun 2003 14:11:51 -0400 (EDT) Date: Mon, 9 Jun 2003 14:11:51 -0400 From: Jeff Garzik To: "David S. 
Miller" Cc: shemminger@osdl.org, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70] Add release_netdev -- hook for sysfs/net device cleanup Message-ID: <20030609181151.GA20308@gtf.org> References: <3EE4BF39.2020503@pobox.com> <20030609.100948.84378474.davem@redhat.com> <20030609171446.GA15239@gtf.org> <20030609.101807.119873069.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030609.101807.119873069.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3023 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 10:18:07AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Mon, 9 Jun 2003 13:14:46 -0400 > > I wanted to wait on the s/kfree/release_netdev/ patch until the other > stuff is done. Said patch can be applied anytime, and doing it in this > order reduces 2.4 backport merge pain. :) > > No problem, but once all the init_etherdev() etc. crap is abolished, > Stephen's work or something similar goes in... > > Ok? Right. That's what I want :) Jeff From shemminger@osdl.org Mon Jun 9 11:51:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:51:19 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IpD2x011074 for ; Mon, 9 Jun 2003 11:51:13 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Ip1X25445; Mon, 9 Jun 2003 11:51:01 -0700 Date: Mon, 9 Jun 2003 11:51:01 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] expose alloc_netdev for use by drivers. 
Message-Id: <20030609115101.1f875e0c.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3024 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Several network drivers (tun, bridge, slip, vlan, ...) need to allocate private data structures in a like manner to ether_allocdev. This exposes the net_init hook for them to use. diff -Nru a/drivers/net/net_init.c b/drivers/net/net_init.c --- a/drivers/net/net_init.c Mon Jun 9 11:42:25 2003 +++ b/drivers/net/net_init.c Mon Jun 9 11:42:25 2003 @@ -70,7 +70,7 @@ */ -static struct net_device *alloc_netdev(int sizeof_priv, const char *mask, +struct net_device *alloc_netdev(int sizeof_priv, const char *mask, void (*setup)(struct net_device *)) { struct net_device *dev; @@ -96,6 +96,7 @@ return dev; } +EXPORT_SYMBOL(alloc_netdev); static struct net_device *init_alloc_dev(int sizeof_priv) { diff -Nru a/include/linux/etherdevice.h b/include/linux/etherdevice.h --- a/include/linux/etherdevice.h Mon Jun 9 11:42:25 2003 +++ b/include/linux/etherdevice.h Mon Jun 9 11:42:25 2003 @@ -40,7 +40,6 @@ unsigned char *haddr); extern struct net_device *init_etherdev(struct net_device *dev, int sizeof_priv); extern struct net_device *alloc_etherdev(int sizeof_priv); - static inline void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) { memcpy (dest->data, src, len); diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h Mon Jun 9 11:42:25 2003 +++ b/include/linux/netdevice.h Mon Jun 9 11:42:25 2003 @@ -815,6 +815,8 @@ 
extern void fc_setup(struct net_device *dev); extern void fc_freedev(struct net_device *dev); /* Support for loadable net-drivers */ +extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, + void (*setup)(struct net_device *)); extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); /* Functions used for multicast support */ From shemminger@osdl.org Mon Jun 9 11:54:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:54:28 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IsM2x011688 for ; Mon, 9 Jun 2003 11:54:23 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59Is8X25887; Mon, 9 Jun 2003 11:54:08 -0700 Date: Mon, 9 Jun 2003 11:54:08 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] bridge using alloc_netdev Message-Id: <20030609115408.6b90dc4e.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3025 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev This changes the bridge driver to encapsulate its private information the same way that ether drivers do. This allows later delayed release to work properly. Tested on my machine. 
Since release_netdev isn't in yet, the destructor is just set to kfree -- that is why I wanted the release_netdev hook to make it in part I, so all these sub drivers got patched only once. diff -Nru a/net/bridge/br_device.c b/net/bridge/br_device.c --- a/net/bridge/br_device.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_device.c Mon Jun 9 11:42:38 2003 @@ -110,10 +110,6 @@ return -1; } -static void br_dev_destruct(struct net_device *dev) -{ - kfree(dev->priv); -} void br_dev_setup(struct net_device *dev) { @@ -124,10 +120,13 @@ dev->hard_start_xmit = br_dev_xmit; dev->open = br_dev_open; dev->set_multicast_list = br_dev_set_multicast_list; - dev->destructor = br_dev_destruct; + dev->destructor = (void (*)(struct net_device *))kfree; SET_MODULE_OWNER(dev); dev->stop = br_dev_stop; dev->accept_fastpath = br_dev_accept_fastpath; dev->tx_queue_len = 0; dev->set_mac_address = NULL; + dev->priv_flags = IFF_EBRIDGE; + + ether_setup(dev); } diff -Nru a/net/bridge/br_if.c b/net/bridge/br_if.c --- a/net/bridge/br_if.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_if.c Mon Jun 9 11:42:38 2003 @@ -78,17 +78,14 @@ struct net_bridge *br; struct net_device *dev; - if ((br = kmalloc(sizeof(*br), GFP_KERNEL)) == NULL) + dev = alloc_netdev(sizeof(struct net_bridge), name, + br_dev_setup); + + if (!dev) return NULL; - memset(br, 0, sizeof(*br)); - dev = &br->dev; - - strlcpy(dev->name, name, sizeof(dev->name)); - dev->priv = br; - dev->priv_flags = IFF_EBRIDGE; - ether_setup(dev); - br_dev_setup(dev); + br = dev->priv; + br->dev = dev; br->lock = SPIN_LOCK_UNLOCKED; INIT_LIST_HEAD(&br->port_list); @@ -159,9 +156,9 @@ if ((br = new_nb(name)) == NULL) return -ENOMEM; - ret = register_netdev(&br->dev); + ret = register_netdev(br->dev); if (ret) - kfree(br); + kfree(br->dev); return ret; } @@ -219,7 +216,7 @@ br_stp_recalculate_bridge_id(br); br_fdb_insert(br, p, dev->dev_addr, 1); - if ((br->dev.flags & IFF_UP) && (dev->flags & IFF_UP)) + if ((br->dev->flags & IFF_UP) && (dev->flags & 
IFF_UP)) br_stp_enable_port(p); spin_unlock_bh(&br->lock); diff -Nru a/net/bridge/br_input.c b/net/bridge/br_input.c --- a/net/bridge/br_input.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_input.c Mon Jun 9 11:42:38 2003 @@ -40,7 +40,7 @@ br->statistics.rx_bytes += skb->len; indev = skb->dev; - skb->dev = &br->dev; + skb->dev = br->dev; NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL, br_pass_frame_up_finish); @@ -67,7 +67,7 @@ br = p->br; passedup = 0; - if (br->dev.flags & IFF_PROMISC) { + if (br->dev->flags & IFF_PROMISC) { struct sk_buff *skb2; skb2 = skb_clone(skb, GFP_ATOMIC); @@ -140,7 +140,7 @@ return -1; } - if (!memcmp(p->br->dev.dev_addr, dest, ETH_ALEN)) + if (!memcmp(p->br->dev->dev_addr, dest, ETH_ALEN)) skb->pkt_type = PACKET_HOST; NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL, diff -Nru a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c --- a/net/bridge/br_netfilter.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_netfilter.c Mon Jun 9 11:42:38 2003 @@ -37,7 +37,7 @@ sizeof(struct bridge_skb_cb))) #define has_bridge_parent(device) ((device)->br_port != NULL) -#define bridge_parent(device) (&((device)->br_port->br->dev)) +#define bridge_parent(device) ((device)->br_port->br->dev) /* We need these fake structures to make netfilter happy -- * lots of places assume that skb->dst != NULL, which isn't diff -Nru a/net/bridge/br_notify.c b/net/bridge/br_notify.c --- a/net/bridge/br_notify.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_notify.c Mon Jun 9 11:42:38 2003 @@ -52,7 +52,7 @@ break; case NETDEV_DOWN: - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { spin_lock_bh(&br->lock); br_stp_disable_port(p); spin_unlock_bh(&br->lock); @@ -60,7 +60,7 @@ break; case NETDEV_UP: - if (!(br->dev.flags & IFF_UP)) { + if (!(br->dev->flags & IFF_UP)) { spin_lock_bh(&br->lock); br_stp_enable_port(p); spin_unlock_bh(&br->lock); diff -Nru a/net/bridge/br_private.h b/net/bridge/br_private.h --- a/net/bridge/br_private.h Mon Jun 9 11:42:38 
2003 +++ b/net/bridge/br_private.h Mon Jun 9 11:42:38 2003 @@ -81,7 +81,7 @@ { spinlock_t lock; struct list_head port_list; - struct net_device dev; + struct net_device *dev; struct net_device_stats statistics; rwlock_t hash_lock; struct hlist_head hash[BR_HASH_SIZE]; diff -Nru a/net/bridge/br_stp.c b/net/bridge/br_stp.c --- a/net/bridge/br_stp.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp.c Mon Jun 9 11:42:38 2003 @@ -26,7 +26,7 @@ void br_log_state(const struct net_bridge_port *p) { pr_info("%s: port %d(%s) entering %s state\n", - p->br->dev.name, p->port_no, p->dev->name, + p->br->dev->name, p->port_no, p->dev->name, br_port_state_names[p->state]); } @@ -130,7 +130,7 @@ br_topology_change_detection(br); del_timer(&br->tcn_timer); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); mod_timer(&br->hello_timer, jiffies + br->hello_time); } @@ -289,10 +289,10 @@ /* called under bridge lock */ void br_topology_change_detection(struct net_bridge *br) { - if (!(br->dev.flags & IFF_UP)) + if (!(br->dev->flags & IFF_UP)) return; - pr_info("%s: topology change detected", br->dev.name); + pr_info("%s: topology change detected", br->dev->name); if (br_is_root_bridge(br)) { printk(", propagating"); br->topology_change = 1; @@ -446,7 +446,7 @@ { if (br_is_designated_port(p)) { pr_info("%s: received tcn bpdu on port %i(%s)\n", - p->br->dev.name, p->port_no, p->dev->name); + p->br->dev->name, p->port_no, p->dev->name); br_topology_change_detection(p->br); br_topology_change_acknowledge(p); diff -Nru a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c --- a/net/bridge/br_stp_bpdu.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_bpdu.c Mon Jun 9 11:42:38 2003 @@ -145,7 +145,7 @@ spin_lock_bh(&br->lock); if (p->state == BR_STATE_DISABLED - || !(br->dev.flags & IFF_UP) + || !(br->dev->flags & IFF_UP) || !br->stp_enabled || memcmp(buf, header, 6)) goto out; diff -Nru a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c --- 
a/net/bridge/br_stp_if.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_if.c Mon Jun 9 11:42:38 2003 @@ -93,7 +93,7 @@ br = p->br; printk(KERN_INFO "%s: port %i(%s) entering %s state\n", - br->dev.name, p->port_no, p->dev->name, "disabled"); + br->dev->name, p->port_no, p->dev->name, "disabled"); wasroot = br_is_root_bridge(br); br_become_designated_port(p); @@ -124,7 +124,7 @@ memcpy(oldaddr, br->bridge_id.addr, ETH_ALEN); memcpy(br->bridge_id.addr, addr, ETH_ALEN); - memcpy(br->dev.dev_addr, addr, ETH_ALEN); + memcpy(br->dev->dev_addr, addr, ETH_ALEN); list_for_each_entry(p, &br->port_list, list) { if (!memcmp(p->designated_bridge.addr, oldaddr, ETH_ALEN)) diff -Nru a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c --- a/net/bridge/br_stp_timer.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/br_stp_timer.c Mon Jun 9 11:42:38 2003 @@ -38,9 +38,9 @@ { struct net_bridge *br = (struct net_bridge *)arg; - pr_debug("%s: hello timer expired\n", br->dev.name); + pr_debug("%s: hello timer expired\n", br->dev->name); spin_lock_bh(&br->lock); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); br->hello_timer.expires = jiffies + br->hello_time; @@ -61,7 +61,7 @@ pr_info("%s: neighbor %.2x%.2x.%.2x:%.2x:%.2x:%.2x:%.2x:%.2x lost on port %d(%s)\n", - br->dev.name, + br->dev->name, id->prio[0], id->prio[1], id->addr[0], id->addr[1], id->addr[2], id->addr[3], id->addr[4], id->addr[5], @@ -89,7 +89,7 @@ struct net_bridge *br = p->br; pr_debug("%s: %d(%s) forward delay timer\n", - br->dev.name, p->port_no, p->dev->name); + br->dev->name, p->port_no, p->dev->name); spin_lock_bh(&br->lock); if (p->state == BR_STATE_LISTENING) { p->state = BR_STATE_LEARNING; @@ -108,9 +108,9 @@ { struct net_bridge *br = (struct net_bridge *) arg; - pr_debug("%s: tcn timer expired\n", br->dev.name); + pr_debug("%s: tcn timer expired\n", br->dev->name); spin_lock_bh(&br->lock); - if (br->dev.flags & IFF_UP) { + if (br->dev->flags & IFF_UP) { 
br_transmit_tcn(br); br->tcn_timer.expires = jiffies + br->bridge_hello_time; @@ -123,7 +123,7 @@ { struct net_bridge *br = (struct net_bridge *) arg; - pr_debug("%s: topo change timer expired\n", br->dev.name); + pr_debug("%s: topo change timer expired\n", br->dev->name); spin_lock_bh(&br->lock); br->topology_change_detected = 0; br->topology_change = 0; @@ -135,7 +135,7 @@ struct net_bridge_port *p = (struct net_bridge_port *) arg; pr_debug("%s: %d(%s) hold timer expired\n", - p->br->dev.name, p->port_no, p->dev->name); + p->br->dev->name, p->port_no, p->dev->name); spin_lock_bh(&p->br->lock); if (p->config_pending) diff -Nru a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c --- a/net/bridge/netfilter/ebt_redirect.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/netfilter/ebt_redirect.c Mon Jun 9 11:42:38 2003 @@ -22,7 +22,7 @@ if (hooknr != NF_BR_BROUTING) memcpy((**pskb).mac.ethernet->h_dest, - in->br_port->br->dev.dev_addr, ETH_ALEN); + in->br_port->br->dev->dev_addr, ETH_ALEN); else { memcpy((**pskb).mac.ethernet->h_dest, in->dev_addr, ETH_ALEN); diff -Nru a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c --- a/net/bridge/netfilter/ebtables.c Mon Jun 9 11:42:38 2003 +++ b/net/bridge/netfilter/ebtables.c Mon Jun 9 11:42:38 2003 @@ -135,10 +135,10 @@ if (FWINV2(ebt_dev_check(e->out, out), EBT_IOUT)) return 1; if ((!in || !in->br_port) ? 0 : FWINV2(ebt_dev_check( - e->logical_in, &in->br_port->br->dev), EBT_ILOGICALIN)) + e->logical_in, in->br_port->br->dev), EBT_ILOGICALIN)) return 1; if ((!out || !out->br_port) ? 
0 : FWINV2(ebt_dev_check( - e->logical_out, &out->br_port->br->dev), EBT_ILOGICALOUT)) + e->logical_out, out->br_port->br->dev), EBT_ILOGICALOUT)) return 1; if (e->bitmask & EBT_SOURCEMAC) { From shemminger@osdl.org Mon Jun 9 11:55:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:56:02 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Itv2x012008 for ; Mon, 9 Jun 2003 11:55:58 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59IthX26254; Mon, 9 Jun 2003 11:55:43 -0700 Date: Mon, 9 Jun 2003 11:55:43 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] vlan network device using alloc_netdev Message-Id: <20030609115543.42092c96.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3026 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change how vlan driver allocates private data so that it is like ether devices, and will support later changes for delayed free. diff -Nru a/net/8021q/vlan.c b/net/8021q/vlan.c --- a/net/8021q/vlan.c Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan.c Mon Jun 9 11:43:20 2003 @@ -334,6 +334,33 @@ return ret; } +static void vlan_setup(struct net_device *new_dev) +{ + SET_MODULE_OWNER(new_dev); + + /* new_dev->ifindex = 0; it will be set when added to + * the global list. + * iflink is set as well. 
+ */ + new_dev->get_stats = vlan_dev_get_stats; + + /* Make this thing known as a VLAN device */ + new_dev->priv_flags |= IFF_802_1Q_VLAN; + + /* Set us up to have no queue, as the underlying Hardware device + * can do all the queueing we could want. + */ + new_dev->tx_queue_len = 0; + + /* set up method calls */ + new_dev->change_mtu = vlan_dev_change_mtu; + new_dev->open = vlan_dev_open; + new_dev->stop = vlan_dev_stop; + new_dev->set_mac_address = vlan_dev_set_mac_address; + new_dev->set_multicast_list = vlan_dev_set_multicast_list; + new_dev->destructor = (void (*)(struct net_device *)) kfree; +} + /* Attach a VLAN device to a mac address (ie Ethernet Card). * Returns the device that was created, or NULL if there was * an error of some kind. @@ -344,8 +371,8 @@ struct vlan_group *grp; struct net_device *new_dev; struct net_device *real_dev; /* the ethernet device */ - int malloc_size = 0; int r; + char name[IFNAMSIZ]; #ifdef VLAN_DEBUG printk(VLAN_DBG "%s: if_name -:%s:- vid: %i\n", @@ -403,21 +430,6 @@ goto out_unlock; } - malloc_size = (sizeof(struct net_device)); - new_dev = (struct net_device *) kmalloc(malloc_size, GFP_KERNEL); - VLAN_MEM_DBG("net_device malloc, addr: %p size: %i\n", - new_dev, malloc_size); - - if (new_dev == NULL) - goto out_unlock; - - memset(new_dev, 0, malloc_size); - - /* Set us up to have no queue, as the underlying Hardware device - * can do all the queueing we could want. - */ - new_dev->tx_queue_len = 0; - /* Gotta set up the fields for the device. */ #ifdef VLAN_DEBUG printk(VLAN_DBG "About to allocate name, vlan_name_type: %i\n", @@ -426,54 +438,44 @@ switch (vlan_name_type) { case VLAN_NAME_TYPE_RAW_PLUS_VID: /* name will look like: eth1.0005 */ - sprintf(new_dev->name, "%s.%.4i", real_dev->name, VLAN_ID); + snprintf(name, IFNAMSIZ, "%s.%.4i", real_dev->name, VLAN_ID); break; case VLAN_NAME_TYPE_PLUS_VID_NO_PAD: /* Put our vlan.VID in the name. 
* Name will look like: vlan5 */ - sprintf(new_dev->name, "vlan%i", VLAN_ID); + snprintf(name, IFNAMSIZ, "vlan%i", VLAN_ID); break; case VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD: /* Put our vlan.VID in the name. * Name will look like: eth0.5 */ - sprintf(new_dev->name, "%s.%i", real_dev->name, VLAN_ID); + snprintf(name, IFNAMSIZ, "%s.%i", real_dev->name, VLAN_ID); break; case VLAN_NAME_TYPE_PLUS_VID: /* Put our vlan.VID in the name. * Name will look like: vlan0005 */ default: - sprintf(new_dev->name, "vlan%.4i", VLAN_ID); + snprintf(name, IFNAMSIZ, "vlan%.4i", VLAN_ID); }; + new_dev = alloc_netdev(sizeof(struct vlan_dev_info), name, + vlan_setup); + if (new_dev == NULL) + goto out_unlock; + #ifdef VLAN_DEBUG printk(VLAN_DBG "Allocated new name -:%s:-\n", new_dev->name); #endif - /* set up method calls */ - new_dev->init = vlan_dev_init; - new_dev->destructor = vlan_dev_destruct; - SET_MODULE_OWNER(new_dev); - - /* new_dev->ifindex = 0; it will be set when added to - * the global list. - * iflink is set as well. - */ - new_dev->get_stats = vlan_dev_get_stats; - /* IFF_BROADCAST|IFF_MULTICAST; ??? */ new_dev->flags = real_dev->flags; new_dev->flags &= ~IFF_UP; - /* Make this thing known as a VLAN device */ - new_dev->priv_flags |= IFF_802_1Q_VLAN; - /* need 4 bytes for extra VLAN header info, * hope the underlying device can handle it. */ new_dev->mtu = real_dev->mtu; - new_dev->change_mtu = vlan_dev_change_mtu; /* TODO: maybe just assign it to be ETHERNET? 
*/ new_dev->type = real_dev->type; @@ -484,24 +486,14 @@ new_dev->hard_header_len += VLAN_HLEN; } - new_dev->priv = kmalloc(sizeof(struct vlan_dev_info), - GFP_KERNEL); VLAN_MEM_DBG("new_dev->priv malloc, addr: %p size: %i\n", new_dev->priv, sizeof(struct vlan_dev_info)); - if (new_dev->priv == NULL) - goto out_free_newdev; - - memset(new_dev->priv, 0, sizeof(struct vlan_dev_info)); - memcpy(new_dev->broadcast, real_dev->broadcast, real_dev->addr_len); memcpy(new_dev->dev_addr, real_dev->dev_addr, real_dev->addr_len); new_dev->addr_len = real_dev->addr_len; - new_dev->open = vlan_dev_open; - new_dev->stop = vlan_dev_stop; - if (real_dev->features & NETIF_F_HW_VLAN_TX) { new_dev->hard_header = real_dev->hard_header; new_dev->hard_start_xmit = vlan_dev_hwaccel_hard_start_xmit; @@ -512,8 +504,6 @@ new_dev->rebuild_header = vlan_dev_rebuild_header; } new_dev->hard_header_parse = real_dev->hard_header_parse; - new_dev->set_mac_address = vlan_dev_set_mac_address; - new_dev->set_multicast_list = vlan_dev_set_multicast_list; VLAN_DEV_INFO(new_dev)->vlan_id = VLAN_ID; /* 1 through VLAN_VID_MASK */ VLAN_DEV_INFO(new_dev)->real_dev = real_dev; @@ -526,7 +516,7 @@ #endif if (register_netdevice(new_dev)) - goto out_free_newdev_priv; + goto out_free_newdev; /* So, got the sucker initialized, now lets place * it into our local structure. 
@@ -572,9 +562,7 @@ out_free_unregister: unregister_netdev(new_dev); - -out_free_newdev_priv: - kfree(new_dev->priv); + goto out_put_dev; out_free_newdev: kfree(new_dev); diff -Nru a/net/8021q/vlan.h b/net/8021q/vlan.h --- a/net/8021q/vlan.h Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan.h Mon Jun 9 11:43:20 2003 @@ -65,8 +65,6 @@ int vlan_dev_set_mac_address(struct net_device *dev, void* addr); int vlan_dev_open(struct net_device* dev); int vlan_dev_stop(struct net_device* dev); -int vlan_dev_init(struct net_device* dev); -void vlan_dev_destruct(struct net_device* dev); int vlan_dev_set_ingress_priority(char* dev_name, __u32 skb_prio, short vlan_prio); int vlan_dev_set_egress_priority(char* dev_name, __u32 skb_prio, short vlan_prio); int vlan_dev_set_vlan_flag(char* dev_name, __u32 flag, short flag_val); diff -Nru a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c --- a/net/8021q/vlan_dev.c Mon Jun 9 11:43:20 2003 +++ b/net/8021q/vlan_dev.c Mon Jun 9 11:43:20 2003 @@ -766,28 +766,6 @@ vlan_flush_mc_list(dev); return 0; } - -int vlan_dev_init(struct net_device *dev) -{ - /* TODO: figure this out, maybe do nothing?? */ - return 0; -} - -void vlan_dev_destruct(struct net_device *dev) -{ - if (dev) { - vlan_flush_mc_list(dev); - if (dev->priv) { - if (VLAN_DEV_INFO(dev)->dent) - BUG(); - - kfree(dev->priv); - dev->priv = NULL; - } - kfree(dev); - } -} - /** Taken from Gleb + Lennert's VLAN code, and modified... 
*/ void vlan_dev_set_multicast_list(struct net_device *vlan_dev) { From shemminger@osdl.org Mon Jun 9 11:59:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 11:59:15 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59IxA2x012340 for ; Mon, 9 Jun 2003 11:59:11 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h59IwvX27144; Mon, 9 Jun 2003 11:58:57 -0700 Date: Mon, 9 Jun 2003 11:58:57 -0700 From: Stephen Hemminger To: "David S. Miller" , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH 2.5.70+] tun using alloc_netdev Message-Id: <20030609115857.38bb31d6.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3027 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change how TUN allocates private data, to be like ethernet devices. This allows later changes that make network device structure persist if sysfs hooks are open. Compiles and loads, but I don't know how to test it. One gratuitous change was to add C99 initializer for tun_miscdev. 
diff -Nru a/drivers/net/tun.c b/drivers/net/tun.c --- a/drivers/net/tun.c Mon Jun 9 11:43:37 2003 +++ b/drivers/net/tun.c Mon Jun 9 11:43:37 2003 @@ -122,12 +122,6 @@ DBG(KERN_INFO "%s: tun_net_init\n", tun->name); - SET_MODULE_OWNER(dev); - dev->open = tun_net_open; - dev->hard_start_xmit = tun_net_xmit; - dev->stop = tun_net_close; - dev->get_stats = tun_net_stats; - switch (tun->flags & TUN_TYPE_MASK) { case TUN_TUN_DEV: /* Point-to-Point TUN Device */ @@ -199,14 +193,14 @@ skb_reserve(skb, 2); memcpy_fromiovec(skb_put(skb, len), iv, len); - skb->dev = &tun->dev; + skb->dev = tun->dev; switch (tun->flags & TUN_TYPE_MASK) { case TUN_TUN_DEV: skb->mac.raw = skb->data; skb->protocol = pi.proto; break; case TUN_TAP_DEV: - skb->protocol = eth_type_trans(skb, &tun->dev); + skb->protocol = eth_type_trans(skb, tun->dev); break; }; @@ -325,7 +319,7 @@ schedule(); continue; } - netif_start_queue(&tun->dev); + netif_start_queue(tun->dev); ret = tun_put_user(tun, skb, (struct iovec *) iv, len); @@ -347,6 +341,24 @@ return tun_chr_readv(file, &iv, 1, pos); } +static void tun_setup(struct net_device *dev) +{ + struct tun_struct *tun = dev->priv; + + skb_queue_head_init(&tun->readq); + init_waitqueue_head(&tun->read_wait); + + tun->owner = -1; + dev->init = tun_net_init; + tun->name = dev->name; + SET_MODULE_OWNER(dev); + dev->open = tun_net_open; + dev->hard_start_xmit = tun_net_xmit; + dev->stop = tun_net_close; + dev->get_stats = tun_net_stats; + dev->destructor = (void (*)(struct net_device *))kfree; +} + static int tun_set_iff(struct file *file, struct ifreq *ifr) { struct tun_struct *tun; @@ -367,30 +379,18 @@ return -EPERM; } else { char *name; - - /* Allocate new device */ - if (!(tun = kmalloc(sizeof(struct tun_struct), GFP_KERNEL)) ) - return -ENOMEM; - memset(tun, 0, sizeof(struct tun_struct)); - - skb_queue_head_init(&tun->readq); - init_waitqueue_head(&tun->read_wait); - - tun->owner = -1; - tun->dev.init = tun_net_init; - tun->dev.priv = tun; - 
SET_MODULE_OWNER(&tun->dev); + unsigned long flags = 0; err = -EINVAL; /* Set dev type */ if (ifr->ifr_flags & IFF_TUN) { /* TUN device */ - tun->flags |= TUN_TUN_DEV; + flags |= TUN_TUN_DEV; name = "tun%d"; } else if (ifr->ifr_flags & IFF_TAP) { /* TAP device */ - tun->flags |= TUN_TAP_DEV; + flags |= TUN_TAP_DEV; name = "tap%d"; } else goto failed; @@ -398,12 +398,19 @@ if (*ifr->ifr_name) name = ifr->ifr_name; - if ((err = dev_alloc_name(&tun->dev, name)) < 0) - goto failed; - if ((err = register_netdevice(&tun->dev))) + dev = alloc_netdev(sizeof(struct tun_struct), name, + tun_setup); + if (!dev) + return -ENOMEM; + + tun = dev->priv; + tun->flags = flags; + + if ((err = register_netdevice(tun->dev))) { + kfree(dev); goto failed; + } - tun->name = tun->dev.name; } DBG(KERN_INFO "%s: tun_set_iff\n", tun->name); @@ -419,9 +426,7 @@ strcpy(ifr->ifr_name, tun->name); return 0; - -failed: - kfree(tun); + failed: return err; } @@ -548,10 +553,8 @@ /* Drop read queue */ skb_queue_purge(&tun->readq); - if (!(tun->flags & TUN_PERSIST)) { - dev_close(&tun->dev); - unregister_netdevice(&tun->dev); - } + if (!(tun->flags & TUN_PERSIST)) + unregister_netdevice(tun->dev); rtnl_unlock(); @@ -574,11 +577,10 @@ .fasync = tun_chr_fasync }; -static struct miscdevice tun_miscdev= -{ - TUN_MINOR, - "net/tun", - &tun_fops +static struct miscdevice tun_miscdev = { + .minor = TUN_MINOR, + .name = "net/tun", + .fops = &tun_fops }; int __init tun_init(void) diff -Nru a/include/linux/if_tun.h b/include/linux/if_tun.h --- a/include/linux/if_tun.h Mon Jun 9 11:43:37 2003 +++ b/include/linux/if_tun.h Mon Jun 9 11:43:37 2003 @@ -40,7 +40,7 @@ wait_queue_head_t read_wait; struct sk_buff_head readq; - struct net_device dev; + struct net_device *dev; struct net_device_stats stats; struct fasync_struct *fasync; From davem@redhat.com Mon Jun 9 12:03:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 12:03:15 -0700 (PDT) Received: from pizda.ninka.net 
(IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59J3B2x012707 for ; Mon, 9 Jun 2003 12:03:12 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA20323; Mon, 9 Jun 2003 12:00:02 -0700 Date: Mon, 09 Jun 2003 12:00:02 -0700 (PDT) Message-Id: <20030609.120002.129768701.davem@redhat.com> To: shemminger@osdl.org Cc: jgarzik@pobox.com, netdev@oss.sgi.com Subject: Re: [PATCH 2.5.70+] tun using alloc_netdev From: "David S. Miller" In-Reply-To: <20030609115857.38bb31d6.shemminger@osdl.org> References: <20030609115857.38bb31d6.shemminger@osdl.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3028 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev All this stuff looks great Stephen, I'll apply this tomorrow unless there are major objections. From davem@redhat.com Mon Jun 9 14:33:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 14:34:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59LXv2x017152 for ; Mon, 9 Jun 2003 14:33:58 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA20856; Mon, 9 Jun 2003 14:30:46 -0700 Date: Mon, 09 Jun 2003 14:30:46 -0700 (PDT) Message-Id: <20030609.143046.48683937.davem@redhat.com> To: xerox@foonet.net Cc: sim@netnation.com, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. 
Miller"
In-Reply-To: <004f01c32ebe$b4bd88d0$4a00000a@badass>
References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3029
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Mon, 9 Jun 2003 15:38:30 -0400

   I've tried other settings, secret-interval 1 which seems to 'flush' the cache every second or 60 seconds as I have it here.. If I have secret interval set to 1 the GC never runs because the cache never gets > my gc thresh..

Set secret interval to infinity. Even the default setting of 10 minutes is overly anal. It's only picking a new random secret for the hash so that algorithmic attacks are less likely even if the attacker finds a method by which to determine the secret key on your system. It is impossible for an attacker to do this as far as I am aware.

   Also tried with max_size 16000 but juno pegs the route cache

What do you mean, specifically, by "pegs"?

   This seems to be a good compromise for now..

Setting the secret interval smaller than its default serves no purpose. I would recommend instead to increase it.

   Ok you see this happening but during this the router is almost unusable..

   PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
   3 root 20 -1 0 0 0 RW< 48.5 0.0 34:04 ksoftirqd_CPU0
   4 root 20 -1 0 0 0 RW< 46.7 0.0 34:14 ksoftirqd_CPU1

   Both cpus are slammed at 100% by the ksoftirqds.

ksoftirqd kicks in WAY too early, try my patch below.

   This is using e1000 with interrups limited to ~ 4000/second (ITR), no NAPI.. NAPI messes it up big time and drops more packets than without :>

Something is very wrong, NAPI can only give your system more CPU time by which to do packet processing.

Some good kernel profiles would be nice too.

Anyways, here is the patch to make ksoftirqd not kick in so quickly, it's based upon a 2.4.x patch from Ingo Molnar:

--- kernel/softirq.c.~1~ Mon Jun 9 14:28:02 2003
+++ kernel/softirq.c Mon Jun 9 14:29:28 2003
@@ -52,11 +52,22 @@
 		wake_up_process(tsk);
 }
 
+/*
+ * We restart softirq processing MAX_SOFTIRQ_RESTART times,
+ * and we fall back to softirqd after that.
+ *
+ * This number has been established via experimentation.
+ * The two things to balance is latency against fairness -
+ * we want to handle softirqs as soon as possible, but they
+ * should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_RESTART 10
+
 asmlinkage void do_softirq(void)
 {
+	int max_restart = MAX_SOFTIRQ_RESTART;
 	__u32 pending;
 	unsigned long flags;
-	__u32 mask;
 
 	if (in_interrupt())
 		return;
@@ -68,7 +79,6 @@
 	if (pending) {
 		struct softirq_action *h;
 
-		mask = ~pending;
 		local_bh_disable();
 restart:
 		/* Reset the pending bitmask before enabling irqs */
@@ -88,10 +98,8 @@
 		local_irq_disable();
 
 		pending = local_softirq_pending();
-		if (pending & mask) {
-			mask &= ~pending;
+		if (pending && --max_restart)
 			goto restart;
-		}
 		if (pending)
 			wakeup_softirqd(smp_processor_id());
 		__local_bh_enable();

From sim@netnation.com Mon Jun 9 15:19:12 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:19:19 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59MJC2x017843 for ; Mon, 9 Jun 2003 15:19:12 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PUyt-00037Z-97; Mon, 09 Jun 2003 15:19:11 -0700
Date: Mon, 9 Jun 2003 15:19:11 -0700
From: Simon Kirby
To: CIT/Paul
Cc: "'David S. 
Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress Message-ID: <20030609221911.GF11509@netnation.com> References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <004f01c32ebe$b4bd88d0$4a00000a@badass> User-Agent: Mutt/1.5.4i X-archive-position: 3030 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 03:38:30PM -0400, CIT/Paul wrote: > gc_elasticity:1 > gc_interval:600 > gc_min_interval:1 > gc_thresh:60000 > gc_timeout:15 > max_delay:10 > max_size:512000 ^^^ EEP, no! Even the default of 65536 is too big. No wonder you have no CPU left. This should never be bigger than 65536 (unless the hash is increased), but even then it should be set smaller and the GC interval should be fixed. With a table that large, it's going to be walking the buckets all of the time. > I've tried other settings, secret-interval 1 which seems to 'flush' the > cache every second or 60 seconds as I have it here.. That's only for permutating the hash table to avoid remote hash exploits. Ideally, you don't want anything clearing the route cache except for the regular garbage collection (where the gc_elasticity controls how much of it gets nuked). > If I have secret interval set to 1 the GC never runs because the cache > never gets > my gc thresh.. I've also tried this with > Gc_thresh 2000 and more aggressive settings (timeout 5, interval 10).. > Also tried with max_size 16000 but juno pegs the route cache > And I get massive amounts of dst_cache_overflow messages .. 
Try setting gc_min_interval to 0 and gc_elasticity to 4 (so that it doesn't entirely nuke it all the time, but so that it runs fairly often and prunes quite a bit). gc_min_interval:0 will actually make it clear as it allocates, if I remember correctly. > This is 'normal' traffic on the router (using the rtstat program) > > ./rts -i 1 > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot > mc GC: tot ignored goal_miss ovrf > 59272 26954 1826 0 0 0 0 0 6 0 > 0 0 0 0 0 Yes, your route cache is way too large for the hash. Ours looks like this: [sroot@r2:/root]# rtstat -i 1 size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc 870721946 16394 1013 8 4 4 0 0 38 12 0 870722937 16278 1007 8 0 10 0 0 32 6 0 870723935 16362 999 5 0 6 0 0 34 8 0 870725083 16483 1158 1 0 0 0 2 26 6 0 870726047 16634 974 0 0 4 0 0 42 0 0 870726168 14315 2338 13 10 8 0 0 34 44 2 870726168 14683 1383 0 8 2 0 0 30 12 2 870726864 16172 1155 0 6 2 0 0 28 4 0 870728079 17842 1234 0 0 0 0 0 28 12 0 870729106 17545 1036 2 0 2 0 0 30 6 0 ...Hmm, the size is a bit off there. I'm not sure what that's all about. Did you have to hack on rtstat.c at all? Alternative: [sroot@r2:/root]# while (1) [sroot@r2:(while)]# sleep 1 [sroot@r2:(while)]# ip -o route show cache | wc -l [sroot@r2:(while)]# end 8064 8706 9299 9939 10277 10857 11426 11731 12328 12796 13096 13623 1139 2712 4233 561 2468 3948 5075 5459 6114 6768 7502 7815 8303 8969 9602 10090 10566 11194 11765 11987 12678 12920 13563 14136 14693 2336 3652 4814 5954 6449 6741 7412 8036 ....Hmm, even that is growing a bit large. Pfft. I guess we were doing less traffic last time I checked this. :) Maybe you have a bit more traffic than us in normal operation and it's growing faster because of that. Still, with a gc_elasticity of 1 it should be clearing it out very quickly. ...Though I just tried that, and it's not. In fact, the gc_elasticity doesn't seem to be making much of a difference at all. 
The only thing that seems to really change it is if I set gc_min_interval to 0: [sroot@r2:/proc/sys/net/ipv4/route]# echo 0 > gc_min_interval [sroot@r2:/proc/sys/net/ipv4/route]# while ( 1 ) [sroot@r2:(while)]# sleep 1 [sroot@r2:(while)]# ip -o route show cache | wc -l [sroot@r2:(while)]# end 9674 9547 9678 9525 9625 9544 9385 497 2579 3820 4083 4099 4068 4054 4089 4095 4137 4072 4071 4137 2141 3414 4044 2487 3759 4047 4085 4092 4156 4089 4008 475 2497 3729 4146 4085 4116 It seems to regulate it after it gets cleared the first time. If I set gc_elasticity to 1 it seems to bounce around a lot more -- 4 is much smoother. It didn't seem to make a difference with gc_min_interval set to 1, though... hmmm. We've been running normally with gc_min_interval set to 1, but it looks like the BGP table updates have kept the cache from growing too large. > Check what happens when I load up juno.. Yeah... Juno's just going to hit it harder and show the problems with it having to walk through such large hash buckets. How big is your routing table on this box? Is it running BGP? > slammed at 100% by the ksoftirqds. This is using e1000 with interrups > limited to ~ 4000/second (ITR), no NAPI.. NAPI messes it up big time and > drops more packets than without :> Hmm, that's weird. It works quite well here on a single CPU box with tg3 cards. Simon- From Robert.Olsson@data.slu.se Mon Jun 9 15:40:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:41:01 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Mer2x018302 for ; Mon, 9 Jun 2003 15:40:55 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id AAA14450; Tue, 10 Jun 2003 00:39:56 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.3260.654259.708727@robur.slu.se> Date: Tue, 10 Jun 2003 00:39:56 +0200 To: "David S. 
Miller" Cc: sim@netnation.com, xerox@foonet.net, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org, Robert.Olsson@data.slu.se, kuznet@ms2.inr.ac.ru Subject: Re: Route cache performance under stress In-Reply-To: <20030609.015648.55736734.davem@redhat.com> References: <20030608.225837.115923841.davem@redhat.com> <001801c32e50$57ef0750$4a00000a@badass> <20030609071330.GD20613@netnation.com> <20030609.015648.55736734.davem@redhat.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3031 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > BTW, ignoring juno, Robert Olsson has some pktgen hacks that allow > that to generate new-dst-per-packet DoS like traffic. It's much > more effective than Juno-z > > Robert could you should these guys your hacks to do that? Sure. What a discussion... Well I'm happy for the past lazy days. I've include some references in the experiment from last week and it should be interesting for people in this discussion. Summary: Forwarding experiment with different rates of new incoming destinations/sec. Ranging from DoS attack to single destination flow. With full 123k routes. http://robur.slu.se/Linux/net-development/experiments/router-flow-test.html Your latest patch looks interesting... good thinking. Operations and tuning would be simplier. Hope to have time for a test tomorrow. Testing is very manual work still. Cheers. 
--ro From Robert.Olsson@data.slu.se Mon Jun 9 15:55:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:55:18 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59Mt92x018711 for ; Mon, 9 Jun 2003 15:55:10 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id AAA14703; Tue, 10 Jun 2003 00:54:32 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.4136.328760.955758@robur.slu.se> Date: Tue, 10 Jun 2003 00:54:32 +0200 To: Simon Kirby Cc: CIT/Paul , "'David S. Miller'" , hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress In-Reply-To: <20030609221911.GF11509@netnation.com> References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> X-Mailer: VM 6.92 under Emacs 19.34.1 X-archive-position: 3032 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Simon Kirby writes: > [sroot@r2:/root]# rtstat -i 1 > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc > 870721946 16394 1013 8 4 4 0 0 38 12 0 > 870722937 16278 1007 8 0 10 0 0 32 6 0 > ...Hmm, the size is a bit off there. I'm not sure what that's all about. Seems you have an older version of rtstat. There are stats for the GC process there too. You can get recent rtstat from: robur.slu.se:/pub/Linux/net-development/rt_cache_stat/rtstat.c I'm about to propose some stats even for hash spinning.... 
--- linux/include/net/route.h.orig 2003-03-24 22:59:53.000000000 +0100
+++ linux/include/net/route.h 2003-05-16 11:04:07.000000000 +0200
@@ -102,6 +102,8 @@
 	unsigned int gc_ignored;
 	unsigned int gc_goal_miss;
 	unsigned int gc_dst_overflow;
+	unsigned int in_hlist_search;
+	unsigned int out_hlist_search;
 };
 
 extern struct rt_cache_stat *rt_cache_stat;
--- linux/net/ipv4/route.c.orig 2003-03-24 23:01:48.000000000 +0100
+++ linux/net/ipv4/route.c 2003-05-16 11:18:54.000000000 +0200
@@ -321,7 +321,7 @@
 	for (i = 0; i < NR_CPUS; i++) {
 		if (!cpu_possible(i))
 			continue;
-		len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
+		len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
 			       dst_entries,
 			       per_cpu_ptr(rt_cache_stat, i)->in_hit,
 			       per_cpu_ptr(rt_cache_stat, i)->in_slow_tot,
@@ -338,7 +338,9 @@
 			       per_cpu_ptr(rt_cache_stat, i)->gc_total,
 			       per_cpu_ptr(rt_cache_stat, i)->gc_ignored,
 			       per_cpu_ptr(rt_cache_stat, i)->gc_goal_miss,
-			       per_cpu_ptr(rt_cache_stat, i)->gc_dst_overflow
+			       per_cpu_ptr(rt_cache_stat, i)->gc_dst_overflow,
+			       per_cpu_ptr(rt_cache_stat, i)->in_hlist_search,
+			       per_cpu_ptr(rt_cache_stat, i)->out_hlist_search
 			);
 	}
 
@@ -1771,6 +1773,7 @@
 			skb->dst = (struct dst_entry*)rth;
 			return 0;
 		}
+		RT_CACHE_STAT_INC(in_hlist_search);
 	}
 	rcu_read_unlock();
 
@@ -2137,6 +2140,7 @@
 			*rp = rth;
 			return 0;
 		}
+		RT_CACHE_STAT_INC(out_hlist_search);
 	}
 	rcu_read_unlock();
 

Cheers.
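
A toy userspace model may help show what the two new counters measure. The sketch below is not kernel code: toy_cache, toy_cache_lookup and the modulo hash are hypothetical names and choices made only for illustration. It assumes, as the placement of RT_CACHE_STAT_INC() in the patch above suggests, that the counter is bumped once for every chain entry a lookup steps past without a match, so the cost per lookup grows with chain length (roughly cache entries divided by hash buckets).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy chained hash table. Hypothetical names; this is NOT the kernel's
 * rt_hash_table, only a model of what a "hlist search" counter counts. */
struct toy_entry {
	uint32_t key;
	struct toy_entry *next;
};

struct toy_cache {
	struct toy_entry **buckets;
	unsigned int nbuckets;
	unsigned long hlist_search;	/* entries stepped past without a match */
};

/* Returns 0 on success, -1 if allocation fails. */
int toy_cache_init(struct toy_cache *c, unsigned int nbuckets)
{
	assert(nbuckets > 0);
	c->buckets = calloc(nbuckets, sizeof(*c->buckets));
	c->nbuckets = nbuckets;
	c->hlist_search = 0;
	return c->buckets ? 0 : -1;
}

/* Insert at the head of the chain, so newer keys are found first. */
int toy_cache_insert(struct toy_cache *c, uint32_t key)
{
	struct toy_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->key = key;
	e->next = c->buckets[key % c->nbuckets];
	c->buckets[key % c->nbuckets] = e;
	return 0;
}

/* Returns 1 if key is cached, 0 otherwise. */
int toy_cache_lookup(struct toy_cache *c, uint32_t key)
{
	struct toy_entry *e;

	for (e = c->buckets[key % c->nbuckets]; e; e = e->next) {
		if (e->key == key)
			return 1;
		/* Reached only when this entry did not match, mirroring
		 * where the patch places its counter increment. */
		c->hlist_search++;
	}
	return 0;
}

/* Release every chain entry and the bucket array. */
void toy_cache_free(struct toy_cache *c)
{
	unsigned int i;

	for (i = 0; i < c->nbuckets; i++) {
		while (c->buckets[i]) {
			struct toy_entry *e = c->buckets[i];

			c->buckets[i] = e->next;
			free(e);
		}
	}
	free(c->buckets);
	c->buckets = NULL;
}
```

With keys 0..63 inserted into 8 buckets, every chain holds 8 entries: looking up the newest key in a chain adds nothing to hlist_search, the oldest adds 7, and a miss adds 8. Growing the cache without growing the bucket count makes those numbers scale linearly.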
--ro From xerox@foonet.net Mon Jun 9 15:57:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 15:57:33 -0700 (PDT) Received: from foonix.foonet.net (root@foonix.foonet.net [66.252.0.130]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59MvM2x019029 for ; Mon, 9 Jun 2003 15:57:22 -0700 Received: from badass (web-proxy2.foonet.net [65.117.175.254]) by foonix.foonet.net (8.12.8/8.12.5) with ESMTP id h59MvHeq019566; Mon, 9 Jun 2003 18:57:17 -0400 From: "CIT/Paul" To: "'Simon Kirby'" Cc: "'David S. Miller'" , , , , Subject: RE: Route cache performance under stress Date: Mon, 9 Jun 2003 18:56:18 -0400 Organization: CIT Message-ID: <008001c32eda$56760830$4a00000a@badass> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 In-Reply-To: <20030609221911.GF11509@netnation.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal X-archive-position: 3033 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss under high load.. So I find that the e1000 ITR works just as well And there is no reason for NAPI at this point. I will try your settings :) net.ipv4.route.secret_interval = 600 net.ipv4.route.min_adv_mss = 256 net.ipv4.route.min_pmtu = 552 net.ipv4.route.mtu_expires = 600 net.ipv4.route.gc_elasticity = 4 net.ipv4.route.error_burst = 500 net.ipv4.route.error_cost = 100 net.ipv4.route.redirect_silence = 2048 net.ipv4.route.redirect_number = 9 net.ipv4.route.redirect_load = 2 net.ipv4.route.gc_interval = 600 net.ipv4.route.gc_timeout = 15 net.ipv4.route.gc_min_interval = 0 net.ipv4.route.max_size = 32768 net.ipv4.route.gc_thresh = 2000 net.ipv4.route.max_delay = 10 net.ipv4.route.min_delay = 5 Current settings.... 
Rtstat output:

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
2010 9014 14039 0 0 0 0 0 0 6 2 14038 0 49 0
2008 8675 13999 0 0 0 1 0 1 5 2 13992 0 56 0
2002 8529 16484 0 0 0 1 0 0 7 2 16483 0 43 0
2009 8549 15304 0 0 0 0 0 1 10 2 15303 0 55 0
2007 8491 16118 0 0 0 0 0 0 10 2 16117 0 50 0
2024 8219 18306 0 0 0 1 0 0 7 2 18309 0 14 0
2005 8586 15536 0 0 0 0 0 0 9 2 15536 0 42 0
2007 8804 15797 0 0 0 0 0 0 7 2 15796 0 42 0
2012 8535 16519 0 0 0 1 0 0 7 2 16518 0 28 0
2004 8348 15709 0 0 0 0 1 0 8 2 15707 0 42 0
...
2043 8600 18278 0 0 0 0 0 0 12 2 18285 0 15 0
2030 8631 17731 0 0 0 1 0 0 9 2 17737 0 7 0
2002 8489 14653 0 0 0 1 0 2 5 2 14650 0 35 0
2015 8147 15004 0 0 0 0 0 0 9 2 15003 0 57 0
2015 8352 17303 0 0 0 2 0 0 8 2 17308 0 7 0
2025 8451 16768 0 0 0 0 0 0 6 2 16768 0 35 0
2013 8531 16464 0 0 0 0 0 0 13 2 16476 0 7 0
2013 8117 15202 0 0 0 1 1 0 7 2 15198 0 35 0
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
2019 7913 15054 0 0 0 1 0 0 9 2 15057 0 42 0
2008 8258 16019 0 0 0 0 0 1 9 2 16020 0 43 0
2025 8211 17897 0 0 0 1 0 0 5 2 17902 0 0 0

CPU NORMAL:

CPU0 states: 36.0% user, 29.0% system, 0.0% nice, 33.0% idle
CPU1 states: 18.0% user, 61.0% system, 0.0% nice, 19.0% idle
CPU0 states: 21.0% user, 44.0% system, 0.0% nice, 35.0% idle
CPU1 states: 18.0% user, 47.0% system, 0.0% nice, 35.0% idle
3 root 10 -1 0 0 0 SW< 0.0 0.0 35:29 ksoftirqd_CPU0
4 root 10 -1 0 0 0 SW< 0.0 0.0 35:35 ksoftirqd_CPU1

Rtstat under light juno:

2315 7955 51691 0 0 0 1 1 1 5 1 51695 0 0 0
2336 6620 47387 0 0 0 1 0 1 5 1 47393 0 0 0
2371 5630 49726 0 0 0 0 0 1 12 2 49737 0 0 0
2372 5420 53458 0 0 0 1 0 0 2 1 53460 0 0 0
2369 4891 48983 0 0 0 0 0 1 5 2 48988 0 0 0
2389 4529 50525 0 0 0 0 1 1 8 1 50532 0 0 0
2334 4645 49092 0 0 1 1 0 0 1 1 49093 0 0 0
2358 5033 48971 0 0 0 1 0 1 6 2 48977 0 0 0
2366 4864 51411 0 0 0 2 0 1 8 1 51419 0 0 0
2370 5035 49444 0 0 0 0 0 0 4 2 49448 0 0 0
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
2391 5328 49098 0 0 0 1 0 3 12 3 49110 0 0 0
2363 5586 50687 0 0 0 2 0 0 7 1 50693 0 0 0
2361 4571 49243 0 0 0 0 0 0 2 1 49243 0 0 0
2356 5758 56664 0 0 1 1 0 1 5 1 56666 0 0 0
2375 5581 62098 0 0 0 2 0 0 8 2 62103 0 0 0
2393 3895 50762 0 0 0 1 0 0 5 0 50764 0 0 0
2335 4066 56659 0 0 0 1 0 0 10 2 56667 0 0 0
2315 3607 49990 0 0 0 1 0 0 4 1 49992 0 0 0
2339 4369 54149 0 0 0 1 0 0 7 1 54153 0 0 0

CPU under JUNO:

CPU0 states: 0.0% user, 99.3% system, 0.2% nice, 0.0% idle
CPU1 states: 0.2% user, 99.3% system, 0.1% nice, 0.0% idle
4 root 14 -1 0 0 0 SW< 21.0 0.0 35:33 ksoftirqd_CPU1
3 root 15 -1 0 0 0 SW< 20.1 0.0 35:27 ksoftirqd_CPU0

This is 10mbit of juno....... Or around 9.6 or so...

RTS normal with 8000 thresh:

size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
8003 11474 9076 0 0 0 2 0 0 4 2 9071 0 10 0
8010 11425 9205 0 0 0 0 0 0 7 2 9203 0 14 0
8006 11393 12516 0 0 0 1 0 4 5 0 12509 0 20 0
8005 12082 9188 0 0 0 2 0 0 5 2 9184 0 14 0
8004 11447 8893 0 0 0 0 0 0 8 2 8890 0 12 0
8004 12346 8898 0 0 0 1 0 2 5 2 8891 0 10 0
8003 11557 8944 0 0 0 2 0 1 7 1 8942 0 14 0
8004 12812 9890 0 0 0 0 0 1 5 1 9878 0 16 0
8004 12166 11363 0 0 0 1 0 2 3 2 11349 0 23 0
8012 11933 8881 0 0 0 2 0 0 6 2 8874 0 15 0
8003 11938 9024 0 0 0 0 0 1 5 1 9017 0 12 0
8003 12107 8682 0 0 0 1 0 2 3 2 8674 0 13 0
8008 11328 8945 0 0 0 1 0 2 6 1 8942 0 10 0

CPU:

CPU0 states: 0.0% user, 50.0% system, 0.0% nice, 49.0% idle
CPU1 states: 1.0% user, 57.0% system, 0.0% nice, 40.0% idle
CPU0 states: 0.0% user, 27.0% system, 0.0% nice, 72.0% idle
CPU1 states: 0.0% user, 41.0% system, 0.0% nice, 58.0% idle
3 root 12 -1 0 0 0 SW< 0.0 0.0 35:29 ksoftirqd_CPU0
4 root 9 -1 0 0 0 SW< 0.0 0.0 35:35 ksoftirqd_CPU1

I've mucked with TONNnss of settings.. I've even had the route-cache up to over 600,000 entries and the CPU still has room left for more.. 
It can't possibly be the size of the cache, it simply has to be the constant creation and teardown of entries .. I can't hit anywhere NEAR 100kpps On this router with the amount of load on it..

The routing table:
ip ro ls | wc
516 2598 21032

Doesn't have too much in it.. It's running bgp but im not taking the full routes right now.. We will later though.
There are some ip rules
Also some netfilters
iptables-save | wc
1154 7658 46126

Of course there isn't 1154 entries because some of that is the chains and things but there are a lot of rules in netfilter also.. Everything seems to slow it down :/ especially the mangle table.. If I add 1000 entries to the mangle table in netfilter it uses massive cpu .. Netfilter seems to be a hog.

Like I said I've tested this with NO netfilter and nothing else on a test box except for the kernel, e1000 , ITR set to ~4000 and all sorts of changing the settings and I still can't hit 100kpps routing with juno-z

Paul
xerox@foonet.net
http://www.httpd.net

From davem@redhat.com Mon Jun 9 16:08:58 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 16:09:05 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h59N8v2x019498 for ; Mon, 9 Jun 2003 16:08:58 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA21213; Mon, 9 Jun 2003 16:05:48 -0700
Date: Mon, 09 Jun 2003 16:05:47 -0700 (PDT)
Message-Id: <20030609.160547.41648991.davem@redhat.com>
To: xerox@foonet.net
Cc: sim@netnation.com, hadi@shell.cyberus.ca, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <008001c32eda$56760830$4a00000a@badass>
References: <20030609221911.GF11509@netnation.com> <008001c32eda$56760830$4a00000a@badass>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3034
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "CIT/Paul"
   Date: Mon, 9 Jun 2003 18:56:18 -0400

   And there is no reason for NAPI at this point.

Intel's ITR give you high latency, NAPI is far superior than any hardware based interrupt mitigation scheme whatsoever. You have some system specific problem with NAPI and we need to analyze that.

   I've mucked with TONNnss of settings.. I've even had the route-cache up to over 600,000 entries and the CPU still has room left for more.. 
   It can't possibly be the size of the cache,

You are letting your hash chains reach the size of "max_size" divided by the number of hash chains. This means that every packet into your machine has to walk that many entries in its hash chain.

You can keep doing some shaman's dance saying that the size you have chosen doesn't matter, but the people who have written this code and work with it every day know that it does.

From hadi@shell.cyberus.ca Mon Jun 9 17:03:47 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:03:56 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A03k2x020464 for ; Mon, 9 Jun 2003 17:03:47 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PWbf-0009IC-Jy; Mon, 09 Jun 2003 20:03:19 -0400
Date: Mon, 9 Jun 2003 20:03:19 -0400 (EDT)
From: Jamal Hadi
To: CIT/Paul
cc: "'Simon Kirby'" , "'David S. Miller'" , fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: RE: Route cache performance under stress
In-Reply-To: <008001c32eda$56760830$4a00000a@badass>
Message-ID: <20030609195652.E35696@shell.cyberus.ca>
References: <008001c32eda$56760830$4a00000a@badass>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3035
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Mon, 9 Jun 2003, CIT/Paul wrote:

> NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss
> under high load.. So I find that the e1000 ITR works just as well
> And there is no reason for NAPI at this point.
>

Foo, you on cheap crack again?
Please just try the tests as described if you want to help. It doesn't help anyone when you wildly wave your hands like that.

Why don't we take you offline - give me access to your machine, I have a couple of hours to kill.

cheers, jamal From ralph@istop.com Mon Jun 9 17:32:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:32:59 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A0Wn2x021061 for ; Mon, 9 Jun 2003 17:32:50 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id B186E3699E; Mon, 9 Jun 2003 20:32:45 -0400 (EDT) Date: Mon, 9 Jun 2003 20:32:48 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030609195652.E35696@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3036 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Jamal Hadi wrote: > On Mon, 9 Jun 2003, CIT/Paul wrote: > > > NAPI despises SMP.. Any SMP box we run NAPI on has major packet loss > > under high load.. So I find that the e1000 ITR works just as well > > And there is no reason for NAPI at this point. > > > > Foo, you on cheap crack again? > Please just try the tests as described if you want to help. It doesnt help > anyone when you wildly wave your hands like that. From personal experience, after trying numerous things for over a year one can get very frustrated. Although your contribution has been useful, you are also guilty of wildly waving your hands around too. Many moons ago when I lamented that my 2.2.19 kernel, 750Mhz duron, 3c59x core router performance sucked you told me NAPI would solve the performance problems. It didn't. 
And Rob's latest numbers seem to show that even with the latest and greatest patches 148kpps is still a dream. It's good to see that people are finally doing tests to simulate real-world routing (instead of just pretending the problem doesn't exist because they were able to get 148kpps in some contrived test). Here's my CPU graphs for the box; it's only doing routing and firewalling isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html eth1 and eth2 are both sending and receiving ~30mbps of traffic (at 8-10kpps in and out on each interface). The other variable that I haven't seen people discuss but have anecdotal evidence will measurably impact performance is the motherboard used (chipset and chipset configuration/timing). Lastly from the software side Linux doesn't seem to have anything like BSD's parameter to control user/system CPU sharing. Once my CPU load reaches 70-80%, I'd rather have some dropped packets than let the CPU hit 100% and end up with my BGP sessions drop. -Ralph From krkumar@us.ibm.com Mon Jun 9 17:55:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 17:56:05 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A0tk2x021487 for ; Mon, 9 Jun 2003 17:55:56 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5A0tesZ173504; Mon, 9 Jun 2003 20:55:40 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5A0tbSe129102; Mon, 9 Jun 2003 20:55:38 -0400 Message-ID: <3EE52C92.4060509@us.ibm.com> Date: Mon, 09 Jun 2003 17:55:46 -0700 From: Krishna Kumar Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru, "David S. 
Miller" , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: [PATCH] Panic in ipv6_add_dev Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3037 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: krkumar@us.ibm.com Precedence: bulk X-list: netdev

Hi,

I am using 2.5.70 and using VLAN to configure many interfaces, and after some are configured, the system panics in unregister_sysctl_table called from (STACK) neigh_sysctl_unregister, neigh_parms_release, ipv6_add_dev.

The problem is that we have called neigh_parms_alloc, but not neigh_sysctl_register. Hence calling neigh_parms_release() in the middle frees up the sysctl_header entry for the nd_table as a side-effect (due to the memcpy in neigh_parms_alloc). We need to initialize sysctl_table to NULL in neigh_parms_alloc so that a release can be called safely at any time.

Thanks,

- KK

diff -ruN linux-2.5.70.org/net/core/neighbour.c linux-2.5.70/net/core/neighbour.c
--- linux-2.5.70.org/net/core/neighbour.c	2003-06-09 17:32:10.000000000 -0700
+++ linux-2.5.70/net/core/neighbour.c	2003-06-09 17:36:22.000000000 -0700
@@ -1094,6 +1094,7 @@
 			kfree(p);
 			return NULL;
 		}
+		p->sysctl_table = NULL;
 		write_lock_bh(&tbl->lock);
 		p->next = tbl->parms.next;
 		tbl->parms.next = p;

From ralph@istop.com Mon Jun 9 18:30:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 18:30:41 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A1UC2x022793 for ; Mon, 9 Jun 2003 18:30:13 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 3A11136A81; Mon, 9 Jun 2003 20:56:41 -0400 (EDT) Date: Mon, 9 Jun 2003 20:56:43 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: CIT/Paul , "'David S.
Miller'" , "hadi@shell.cyberus.ca" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030609221911.GF11509@netnation.com> Message-ID: References: <20030609082718.GG20613@netnation.com> <004f01c32ebe$b4bd88d0$4a00000a@badass> <20030609221911.GF11509@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3038 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev

On Mon, 9 Jun 2003, Simon Kirby wrote:

> [sroot@r2:/root]# while (1)
> [sroot@r2:(while)]# sleep 1
> [sroot@r2:(while)]# ip -o route show cache | wc -l
> [sroot@r2:(while)]# end

I considered doing the same test on my box, but I don't have enough juice left to do it every second:

root@tor-router# time ip -o route show cache | wc -l
15023

real 0m1.563s
user 0m0.380s
sys 0m1.180s

So instead...

root@tor-router# while (true); do sleep 5; ip -o route show cache | wc -l; done
12630
15659
17951
20733
8875
9282
11913
4216
9437
11973
14503
17088

-Ralph

From sim@netnation.com Mon Jun 9 18:53:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 18:53:24 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A1rD2x023445 for ; Mon, 9 Jun 2003 18:53:13 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PYK0-0006cC-0Q; Mon, 09 Jun 2003 18:53:12 -0700 Date: Mon, 9 Jun 2003 18:53:12 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: Jamal Hadi , CIT/Paul , "'David S.
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030610015311.GB23009@netnation.com> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3039 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 08:32:48PM -0400, Ralph Doncaster wrote: > Here's my CPU graphs for the box; it's only doing routing and firewalling > isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) > http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html > > eth1 and eth2 are both sending and receiving ~30mbps of traffic (at > 8-10kpps in and out on each interface). Interesting! Your CPU use is quite a bit higher than ours. It looks like we have fairly similar network configurations. We're advertising a /24 and a /20 of which about 60% of the IPs are in use. Each router forwards about 60 Mbit/second (16 kpps) during the day, and the CPU load is usually around 18-25%. This is with a single CPU, though I accidentally compiled the kernel SMP. I had forgotten to add CPU utilization to the cricket graphs, so I'll have a better idea from now on, but I've never seen it above 30% (from "vmstat 1") except in attack cases. The difference is probably just the fact that this is running on slightly faster hardware (single Athlon 1800MP, Tyan Tiger MPX board). > Lastly from the software side Linux doesn't seem to have anything like > BSD's parameter to control user/system CPU sharing. Once my CPU load > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > 100% and end up with my BGP sessions drop. Hmm. 
I found that once NAPI was happening, userspace seemed to get a fairly decent amount of time. I'm not exactly sure what the settings are, but I was able to run things through SSH quite easily (not without noticeable slowness, though). Actually, the slowness appeared to be mostly the result of incoming packet drops ("vmstat 1" output where it was _sending_ data and getting the ACKs some time later was perfectly smooth).

We just set up a dual Opteron box today with dual onboard Tigon3s, so I'll see if I can do some profiling. I hooked it via crossover to a Xeon 2.4 GHz box with onboard e1000, so I should be able to do some remote profiling tonight.

Simon-

From ralph@istop.com Mon Jun 9 19:45:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 19:45:42 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A2jS2x025147 for ; Mon, 9 Jun 2003 19:45:31 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 6857936A7A; Mon, 9 Jun 2003 22:45:27 -0400 (EDT) Date: Mon, 9 Jun 2003 22:45:29 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030609204257.L35799@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3040 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev

On Mon, 9 Jun 2003, Jamal Hadi wrote:

> Problem is people disappear real quick when asked to run tests that
> could validate certain concepts.
I wish everyone would emulate S Kirby > he actually gives good info. The test results Rob posted today show that the testing can be done in a lab environment. Most of the people I know that would actually see 50kpps in the real world don't have the time to apply various patches and run a bunch of tests; pretending the problem doesn't exist when someone doesn't run tests to prove is a poor excuse. > > Here's my CPU graphs for the box; it's only doing routing and firewalling > > isn't even built into the kernel (2.4.20 with 3c59x NAPI patches) > > http://66.11.168.198/mrtg/tbgp/tbgp_usrsys.html > > > > eth1 and eth2 are both sending and receiving ~30mbps of traffic (at > > 8-10kpps in and out on each interface). > > Is this still the duron 750Mhz? Are you running zebra? Did you > check out some of the ideas i talked about earlier? Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra 0.93b. If the ideas you're referring to are changing the zebra source to arp the next-nops, then no, I haven't tried it (and am not likely to any time soon). > Robert has a good collection for what is good hardware. I am so outdated > i dont keep track anymore. My fastest machine is still an ASuse dual > 450Mhz. There's still more dead-end suggestions than good ones (i.e. the O'Reilley high performance routing book). > > Lastly from the software side Linux doesn't seem to have anything like > > BSD's parameter to control user/system CPU sharing. Once my CPU load > > reaches 70-80%, I'd rather have some dropped packets than let the CPU hit > > 100% and end up with my BGP sessions drop. > > > > Well, heres a good example: With NAPI, have your sessions been dropped? Yup, twice in the last 2 weeks. > Have you tried a different NIC? Not sure how well the 3com is maintained > for example. > Try a tulip or tg3 or e1000 or the dlink gige. Initially I was looking for tulip cards but almost nobody is producing them any more. 
Almost a year ago I came across the following list, which is why I went with the 3com (at the time it indicated rx/tx irqmit for the 3com, until I emailed the author that I found out it was tx only) http://www.fefe.de/linuxeth/ I had joined the vortex list last fall looking for some tips and that didn't help much (other than telling me that the 3com wasn't the best choice). I've since bought a couple tg3 and a bunch of e1000 cards that I'm planning to put into production. Rob's test results seem to show that even if I replace my 3c905cx cards with e1000's I'll still get killed with a 50kpps synflood with my current CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I can't do that in a 1U rack-mount box. Yeah, I could probably do it with water cooling, but that's not an option in a telco hotel like 151 Front St. (Toronto). A couple weeks ago I got one of my techs to test freeBSD/polling with full routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to suggest it will handle a 50kpps synflood DOS. It would be nice if Linux could do the same. Despite the BSD bashing (to be expected on a Linux list, I guess), I will be using BSD as well as Linux for core routing. The plan is 1 linux router and 1 bsd router each running zebra, connected to separate upstream transit providers, running ibgp between them, and both advertising a default route into OSPF. Then if I get hit with a DOS that kills Linux, the BSD box will have a much better chance of staying up than if I just used a second Linux box for redundancy. If the BSD boxes turn out to have twice the performance of the linux boxes, it may be better for me to dump linux for routing altogether. 
:-( -Ralph From slblake@petri-meat.com Mon Jun 9 20:05:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:06:02 -0700 (PDT) Received: from server26.totalchoicehosting.com (rs-207-44-248-87.ev1.net [207.44.248.87] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A35o2x025994 for ; Mon, 9 Jun 2003 20:05:51 -0700 Received: from rdu74-174-070.nc.rr.com ([24.74.174.70]) by server26.totalchoicehosting.com with esmtp (Exim 3.36 #1) id 19PZS9-00032B-00; Mon, 09 Jun 2003 22:05:41 -0500 Subject: Re: Route cache performance under stress From: Steven Blake To: Florian Weimer Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: <874r30r9z2.fsf@deneb.enyo.de> References: <87wuge59w2.fsf@deneb.enyo.de> <20030526.233211.54217447.davem@redhat.com> <87he70re62.fsf@deneb.enyo.de> <20030608.050500.28795668.davem@redhat.com> <874r30r9z2.fsf@deneb.enyo.de> Content-Type: text/plain Organization: Message-Id: <1055214346.1199.65.camel@photon> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 09 Jun 2003 23:05:47 -0400 Content-Transfer-Encoding: 7bit X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server26.totalchoicehosting.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [0 0] X-AntiAbuse: Sender Address Domain - petri-meat.com X-archive-position: 3041 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: slblake@petri-meat.com Precedence: bulk X-list: netdev On Sun, 2003-06-08 at 09:10, Florian Weimer wrote: > "David S. Miller" writes: > > > Although, I hope it's not "too similar" to what CEF does because > > undoubtedly Cisco has a bazillion patents in this area. > > Most things in this area are patented, and the patents are extremely > fuzzy (e.g. 
policy-based routing with hierarchical sequence of > decisions has been patented countless times). 8-( > > > This is actually an argument for coming up with out own algorithms > > without any knowledge of what CEF does or might do. :( > > The branchless variant is not described in the IOS book, and I can't > tell if Cisco routers use it. If this idea is really novel, we are in > pretty good shape because we no longer use trees, tries or whatever, > but a DFA. 8-) Based on my quick reading of your code sample, I think you have just reinvented multibit trees; in your case with a fixed stride of 8 bits. > Further parameters which could be tweaked is the kind of adjacency > information (where to store the L2 information, whether to include the > prefix length in the adjacency record etc.). If you are curious, or just have a lot of time on your hands, you might find the following set of references useful: http://www.petri-meat.com/slblake/networking/refs/lpm_pkt-class/ IMHO, the best LPM algorithm (in terms of balancing lookup speed vs. memory consumption vs. update rate) is CRT, described in the first paper [ASIK]. It is patented, but there is hope that it might get released under GPL in the near future. Regards, // Steve From ralph@istop.com Mon Jun 9 20:18:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:18:52 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A3Ih2x026731 for ; Mon, 9 Jun 2003 20:18:44 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id B5F0D36A2B; Mon, 9 Jun 2003 23:18:42 -0400 (EDT) Date: Mon, 9 Jun 2003 23:18:45 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610015311.GB23009@netnation.com> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030610015311.GB23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3042 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > "vmstat 1") except in attack cases. The difference is probably just the > fact that this is running on slightly faster hardware (single Athlon > 1800MP, Tyan Tiger MPX board). What happened to Linux users being able to brag about how much they could do with CPUs that were useless for running Windows? On a 1Ghz CPU you've got almost 7,000 cycles to route a packet in order to handle 148kpps. I can't see why the slow path should be more than 2,000 cycles. I know some people's attitude is don't talk if you're not going to write the code. If I had the time I would; from my earliest days of programming I've been optimizing performance to the maximum. I can still remember using page 0 on my c64 to store an 8-bit register in 3 cycles instead of four... So to put a stake in the ground, I'd like to see a 1Ghz celeron with e1000 cards handle 148kpps of DOS traffic at <50% CPU utilization (with full routing tables & no firewalling). If that's not a reasonable expectation, someone please let me know. Even if my time was only worth $500/day, in the past year and a half I spent enough time working on Linux routers to buy a Cisco NPE-G1. 
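For reference, the cycle budget mentioned above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the 148kpps figure refers to 100 Mbit Ethernet saturated with minimum-size frames (64-byte frame plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire), and the function names are made up for illustration.

```python
# Back-of-the-envelope per-packet cycle budget.
# Assumption: 148kpps is 100 Mbit Ethernet line rate at minimum frame size.

PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
MIN_FRAME_BYTES = 64       # minimum Ethernet frame, including FCS
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames


def line_rate_pps(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum frames per second on an Ethernet link of the given bit rate."""
    bits_on_wire = (PREAMBLE_BYTES + frame_bytes + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz, pps):
    """CPU cycles available per packet if the CPU did nothing but forward."""
    return cpu_hz / pps


if __name__ == "__main__":
    pps = line_rate_pps(100_000_000)
    budget = cycles_per_packet(1_000_000_000, pps)
    print(f"100 Mbit line rate: {pps:,.0f} pps")
    print(f"1 GHz CPU budget:   {budget:,.0f} cycles/packet")
    print(f"at 50% utilization: {budget / 2:,.0f} cycles/packet")
```

Under those assumptions the link tops out at roughly 148,810 pps, which leaves a 1 GHz CPU about 6,720 cycles per packet ("almost 7,000"), or about 3,360 if the box is to stay under 50% utilization.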
:-(

-Ralph

From greearb@candelatech.com Mon Jun 9 20:24:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 20:24:16 -0700 (PDT) Received: from grok.yi.org (IDENT:Kec0++FnlFSJ3MW+qk3egPD7hn3dYXea@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A3O92x027209 for ; Mon, 9 Jun 2003 20:24:10 -0700 Received: from candelatech.com (IDENT:keqxvO2uK1Z9NpHE2qFZHfwyAQ+0s8ct@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.6) with ESMTP id h5A3Nvl29701; Mon, 9 Jun 2003 20:23:57 -0700 Message-ID: <3EE54F4D.50909@candelatech.com> Date: Mon, 09 Jun 2003 20:23:57 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030529 X-Accept-Language: en-us, en MIME-Version: 1.0 To: ralph+d@istop.com CC: "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3043 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev

Ralph Doncaster wrote:

> Initially I was looking for tulip cards but almost nobody is producing
> them any more. Almost a year ago I came across the following list, which
> is why I went with the 3com (at the time it indicated rx/tx irqmit for the
> 3com, until I emailed the author that I found out it was tx only)
> http://www.fefe.de/linuxeth/

If you want 4-port tulip NICs, I've had decent luck with the Phobos p430tx NICs ($350 or so per NIC, so not cheap). That said, the e1000s are definitely better as far as my own testing has been concerned. (I'm doing packet pushing & reception, no significant routing, though).
One warning about e1000's: make sure you have active airflow across the NICs if you put two together. Otherwise, buy a dual port NIC...it has a single chip and you will have fewer cooling issues.

Ben

>
> I had joined the vortex list last fall looking for some tips and that
> didn't help much (other than telling me that the 3com wasn't the best
> choice). I've since bought a couple tg3 and a bunch of e1000 cards that
> I'm planning to put into production.
>
> Rob's test results seem to show that even if I replace my 3c905cx cards
> with e1000's I'll still get killed with a 50kpps synflood with my current
> CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I
> can't do that in a 1U rack-mount box. Yeah, I could probably do it with
> water cooling, but that's not an option in a telco hotel like 151 Front
> St. (Toronto).
>
> A couple weeks ago I got one of my techs to test freeBSD/polling with full
> routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to
> suggest it will handle a 50kpps synflood DOS. It would be nice if Linux
> could do the same.
>
> Despite the BSD bashing (to be expected on a Linux list, I guess), I will
> be using BSD as well as Linux for core routing. The plan is 1 linux
> router and 1 bsd router each running zebra, connected to separate upstream
> transit providers, running ibgp between them, and both advertising a
> default route into OSPF. Then if I get hit with a DOS that kills Linux,
> the BSD box will have a much better chance of staying up than if I just
> used a second Linux box for redundancy. If the BSD boxes turn out to have
> twice the performance of the linux boxes, it may be better for me to dump
> linux for routing altogether.
:-(
>
> -Ralph
>

--
Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

From ralph@istop.com Mon Jun 9 21:17:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:17:31 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4HC2x029435 for ; Mon, 9 Jun 2003 21:17:15 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 2FC4B369E5; Mon, 9 Jun 2003 23:41:05 -0400 (EDT) Date: Mon, 9 Jun 2003 23:41:07 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Ben Greear Cc: "'netdev@oss.sgi.com'" Subject: Re: Route cache performance under stress In-Reply-To: <3EE54F4D.50909@candelatech.com> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <3EE54F4D.50909@candelatech.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3044 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev

On Mon, 9 Jun 2003, Ben Greear wrote:

> One warning about e1000's: make sure you have active airflow across the NICs
> if you put two together. Otherwise, buy a dual port NIC...it has a single
> chip and you will have fewer cooling issues.

I liked how easy the e1000's are to come by; even more so than the 3com cards. Intel seems to be grabbing market share by aggressive pricing (bought 4 last week for C$50 ea), so almost every computer equipment distributor carries the Intel cards.

Since I already have the single-port cards, I guess I'll install them with a couple empty PCI slots between them to help with the cooling.
-Ralph From sim@netnation.com Mon Jun 9 21:34:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:35:01 -0700 (PDT) Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4Yr2x030207 for ; Mon, 9 Jun 2003 21:34:54 -0700 Received: from sim by peace.netnation.com with local (Exim 4.20) id 19PaqT-0007tW-JC; Mon, 09 Jun 2003 21:34:53 -0700 Date: Mon, 9 Jun 2003 21:34:53 -0700 From: Simon Kirby To: ralph+d@istop.com Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress Message-ID: <20030610043453.GC23009@netnation.com> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-archive-position: 3045 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sim@netnation.com Precedence: bulk X-list: netdev On Mon, Jun 09, 2003 at 11:18:45PM -0400, Ralph Doncaster wrote: > What happened to Linux users being able to brag about how much they could > do with CPUs that were useless for running Windows? On a 1Ghz CPU you've > got almost 7,000 cycles to route a packet in order to handle 148kpps. I > can't see why the slow path should be more than 2,000 cycles. We're still here. I want the code to be fast and efficient as much as you do. I'd be willing to bet that a lot of this will get fixed now, though. Broken parts of the code only get fixed if enough people whine or especially if somebody decides to actually fix it. My guess is that the "using Linux as an Internet router with more than 10 Mbit/sec of bandwidth" user base is relatively small. > I know some people's attitude is don't talk if you're not going to write > the code. 
If I had the time I would; from my earliest days of programming > I've been optimizing performance to the maximum. I can still remember > using page 0 on my c64 to store an 8-bit register in 3 cycles instead of > four... I wrote an entire game in TASM once. :) > So to put a stake in the ground, I'd like to see a 1Ghz celeron with e1000 > cards handle 148kpps of DOS traffic at <50% CPU utilization (with full > routing tables & no firewalling). Sounds reasonable. The routing table size issue has now been eliminated, so that should make no difference to the equation. > If that's not a reasonable expectation, someone please let me know. > Even if my time was only worth $500/day, in the past year and a half I > spent enough time working on Linux routers to buy a Cisco NPE-G1. :-( But in the end you'll end up with a system that you'll know the inner workings of and that will be open source, maintainable, scalable, easy to replicate, and easy to upgrade. And it'll have tcpdump, damn it. :) On Mon, Jun 09, 2003 at 10:45:29PM -0400, Ralph Doncaster wrote: > A couple weeks ago I got one of my techs to test freeBSD/polling with full > routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to > suggest it will handle a 50kpps synflood DOS. It would be nice if Linux > could do the same. I was going to ask before, and it's probably not even possible anymore, but have you tried on a 2.0 kernel before? 2.0 kernels probably have a lot of other problems and don't support the new hardware, but it would be interesting to see how it scales to many srcs/dsts before the route cache was integrated. It probably scales a lot more like FreeBSD does. You'd probably have to use eepro100s or something, though. > Despite the BSD bashing (to be expected on a Linux list, I guess), I will > be using BSD as well as Linux for core routing. 
The plan is 1 linux > router and 1 bsd router each running zebra, connected to separate upstream > transit providers, running ibgp between them, and both advertising a > default route into OSPF. Then if I get hit with a DOS that kills Linux, > the BSD box will have a much better chance of staying up than if I just > used a second Linux box for redundancy. Good idea. Others have also suggested using Zebra on one and another of the BGP routing daemons on another to avoid routing-daemon-specific DoS issues (or accidental remote crash bugs). Anyway, the performance issues should be fixable. It is going to take some work, but there seem to be some interested people. I'm going to try to set up something that will allow for easy comparisons of patches so that we can measure progress, and perhaps reach an eventual goal. Simon- From yoshfuji@linux-ipv6.org Mon Jun 9 21:55:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 09 Jun 2003 21:55:23 -0700 (PDT) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A4tB2x003991 for ; Mon, 9 Jun 2003 21:55:12 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian-5) with ESMTP id h5A4u1Bo012511; Tue, 10 Jun 2003 13:56:01 +0900 Date: Tue, 10 Jun 2003 13:56:01 +0900 (JST) Message-Id: <20030610.135601.20565349.yoshfuji@linux-ipv6.org> To: netdev@oss.sgi.com, linux-net@vger.kernel.org Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, krkumar@us.ibm.com Subject: Re: [PATCH] Panic in ipv6_add_dev From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <3EE52C92.4060509@us.ibm.com> References: <3EE52C92.4060509@us.ibm.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: 
"5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3046 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <3EE52C92.4060509@us.ibm.com> (at Mon, 09 Jun 2003 17:55:46 -0700), Krishna Kumar says: > We need to initialize sysctl_table to NULL in neigh_parms_alloc so that a > release can be called safely at any time. It solves the problem, patch should be applied. Well, it is also the problem that the tasks of neigh_parms_alloc() / neigh_sysctl_register() and neigh_parms_release() / neigh_sysctl_unregister() were not symmetric. We have neigh_parms_alloc() - neigh_parms_release() pair and neigh_sysctl_register() - neigh_sysctl_unregister() pair. Memory for sysctl table is allocated by neigh_sysctl_register(). While it was/is very natural to free it by neigh_sysctl_unregister(), it was freed by neigh_parms_release(), in rather different context... Here's the fix. (This patch alone also solve the problem.) 
Index: linux25-LINUS/net/netsyms.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/netsyms.c,v
retrieving revision 1.1.1.29
diff -u -r1.1.1.29 netsyms.c
--- linux25-LINUS/net/netsyms.c	31 May 2003 07:30:46 -0000	1.1.1.29
+++ linux25-LINUS/net/netsyms.c	10 Jun 2003 04:25:32 -0000
@@ -190,6 +190,7 @@
 #endif
 #ifdef CONFIG_SYSCTL
 EXPORT_SYMBOL(neigh_sysctl_register);
+EXPORT_SYMBOL(neigh_sysctl_unregister);
 #endif
 EXPORT_SYMBOL(pneigh_lookup);
 EXPORT_SYMBOL(pneigh_enqueue);
Index: linux25-LINUS/net/core/neighbour.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/core/neighbour.c,v
retrieving revision 1.1.1.7
diff -u -r1.1.1.7 neighbour.c
--- linux25-LINUS/net/core/neighbour.c	26 May 2003 08:04:08 -0000	1.1.1.7
+++ linux25-LINUS/net/core/neighbour.c	10 Jun 2003 04:25:32 -0000
@@ -1113,9 +1113,6 @@
 		if (*p == parms) {
 			*p = parms->next;
 			write_unlock_bh(&tbl->lock);
-#ifdef CONFIG_SYSCTL
-			neigh_sysctl_unregister(parms);
-#endif
 			kfree(parms);
 			return;
 		}
@@ -1178,9 +1175,6 @@
 		}
 	}
 	write_unlock(&neigh_tbl_lock);
-#ifdef CONFIG_SYSCTL
-	neigh_sysctl_unregister(&tbl->parms);
-#endif
 	return 0;
 }
 
Index: linux25-LINUS/net/ipv4/devinet.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv4/devinet.c,v
retrieving revision 1.1.1.10
diff -u -r1.1.1.10 devinet.c
--- linux25-LINUS/net/ipv4/devinet.c	26 May 2003 08:04:08 -0000	1.1.1.10
+++ linux25-LINUS/net/ipv4/devinet.c	10 Jun 2003 04:25:32 -0000
@@ -197,7 +197,9 @@
 	/* in_dev_put following below will kill the in_device */
 	write_unlock_bh(&inetdev_lock);
 
-
+#ifdef CONFIG_SYSCTL
+	neigh_sysctl_unregister(in_dev->arp_parms);
+#endif
 	neigh_parms_release(&arp_tbl, in_dev->arp_parms);
 	in_dev_put(in_dev);
 }
Index: linux25-LINUS/net/ipv6/addrconf.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/addrconf.c,v
retrieving revision 1.1.1.20
diff -u -r1.1.1.20 addrconf.c
--- linux25-LINUS/net/ipv6/addrconf.c	5 Jun 2003 07:47:43 -0000	1.1.1.20
+++ linux25-LINUS/net/ipv6/addrconf.c	10 Jun 2003 04:25:33 -0000
@@ -1925,10 +1925,11 @@
 	/* Shot the device (if unregistered) */
 
 	if (how == 1) {
-		neigh_parms_release(&nd_tbl, idev->nd_parms);
 #ifdef CONFIG_SYSCTL
 		addrconf_sysctl_unregister(&idev->cnf);
+		neigh_sysctl_unregister(&idev->nd_parms);
 #endif
+		neigh_parms_release(&nd_tbl, idev->nd_parms);
 		in6_dev_put(idev);
 	}
 	return 0;
Index: linux25-LINUS/net/ipv6/ndisc.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ndisc.c,v
retrieving revision 1.1.1.17
diff -u -r1.1.1.17 ndisc.c
--- linux25-LINUS/net/ipv6/ndisc.c	31 May 2003 07:30:52 -0000	1.1.1.17
+++ linux25-LINUS/net/ipv6/ndisc.c	10 Jun 2003 04:25:33 -0000
@@ -1487,6 +1487,9 @@
 
 void ndisc_cleanup(void)
 {
+#ifdef CONFIG_SYSCTL
+	neigh_sysctl_unregister(&nd_tbl.parms);
+#endif
 	neigh_table_clear(&nd_tbl);
 	sock_release(ndisc_socket);
 	ndisc_socket = NULL; /* For safety. */

-- 
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From sim@netnation.com Tue Jun 10 00:57:34 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 00:57:47 -0700 (PDT)
Received: from peace.netnation.com (newpeace.netnation.com [204.174.223.7])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5A7vX2x011756
	for ; Tue, 10 Jun 2003 00:57:34 -0700
Received: from sim by peace.netnation.com with local (Exim 4.20)
	id 19Pe0a-0001fd-HO; Tue, 10 Jun 2003 00:57:32 -0700
Date: Tue, 10 Jun 2003 00:57:32 -0700
From: Simon Kirby
To: ralph+d@istop.com, Jamal Hadi , CIT/Paul , "'David S.
	Miller'" , "fw@deneb.enyo.de"
Cc: "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Route cache performance tests
Message-ID: <20030610075732.GD23009@netnation.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.4i
X-archive-position: 3047
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sim@netnation.com
Precedence: bulk
X-list: netdev

Okay, I got a chance to run some first tests and have found some simple
results that might be worth a read.

The test setup is as follows (I'll probably be using this setup for a
number of other tests):

[ My work desktop, other test boxes on network ]
                 | | | | |
            [ 100 Mbit Switch ]
                     |
                     | (100 Mbit)
                     |
[ Dual tg3 dual 1.4 GHz Opertron box, 1 GB RAM ]
                     |
                     | (1000 MBit)
                     |
    [ Single e1000 single 2.4 GHz Xeon box ]

I have a route added on the test boxes to stuff traffic destined for the
Xeon box through the Opertron box. Forwarding is enabled on the Opertron
box, and it has a route for the Xeon box.

I am testing with Juno right now because it generates the (pseudo-)random
IP traffic, which is where the problem is right now. We already know
Linux can do hundreds of thousands of pps of ip<->ip traffic, so we can
test that later.

Juno seems to be able to send about 150,000 pps from my Celery desktop.
Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward
an amazing 39,000 packets per second. Woohoo! NAPI definitely kicks in
and seems to work even on SMP (blink?).

The output of "rtstat -i 1" is somewhat interesting.
The "GC: tot" field seems to almost exactly match the forwarded packet
count, which is handy:

  size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
     8       4      4  0     0     0     0     0        0   0  0       0       0         0    0
     8       3      3  0     0     0     0     0        0   0  0       0       0         0    0
     8       5      6  0     0     0     0     0        0   0  0       0       0         0    0
     8       4      4  0     0     0     0     0        0   0  0       0       0         0    0
     8       5      5  0     0     0     0     0        0   0  0       0       0         0    0
     9       3      5  0     0     1     0     0        0   0  0       0       0         0    0
 33549      11  65533  0     0     0     0     0        0   0  0   57347   57345         1    0
 53499      13  65200  0     0     1     0     0        0   0  0   65196   65194         1    0
 65536      19  65540  0     0     1     0     0        0   0  0   65538   64879         0    0
 65536      11  33980  0     0     0     0     0        0   0  0   33978    6123         0    0
 65536       9  37491  0     0     1     0     0        0   0  0   37489     930         0    0
 65536      13  40487  0     0     0     0     0        0   0  0   40484     991         0    0
 65536      13  39287  0     0     1     0     0        0   0  0   39284     933         0    0
 65536      10  40790  0     0     1     0     0        0   0  0   40789    1006         0    0
 65536      17  37783  0     0     0     0     0        0   0  0   37781     866         0    0
 65536       8  38092  0     0     0     0     0        0   0  0   38090     880         0    0
 65536      14  38086  0     0     1     0     0        0   0  0   38085     877         0    0
 65536      13  39587  0     0     0     0     0        0   0  0   39586     922         0    0
 65536      18  39882  0     0     1     0     0        0   0  0   39880     908         0    0
 65536       8  39292  0     0     0     0     0        0   0  0   39290     894         0    0
 65536      10  38390  0     0     4     0     0        0   0  0   38389     879         0    0
 65536      13  38087  0     0     0     0     0        0   0  0   38086     830         0    0
 65536      10  38692  0     0     0     0     0        0   0  0   38690     845         0    0
 65536      16  38982  0     0     1     0     0        0   0  0   38981     899         0    0

The above is with stock settings. Note how the table completely fills up
causing the forward rate to suffer.
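A rough way to read rows like these is to compute the input cache hit
ratio and the share of GC invocations that were ignored. The Python sketch
below is illustrative only: the column order is copied from the rtstat
header line, the function and field names are hypothetical, and it assumes
that "IN: tot" counts lookups that missed the cache and took the slow
path, which the output itself does not state.

```python
# Column order copied from the rtstat header line; the names are made up here.
COLUMNS = [
    "size",
    "in_hit", "in_tot", "in_mc", "in_no_rt", "in_bcast", "in_madst", "in_masrc",
    "out_hit", "out_tot", "out_mc",
    "gc_tot", "gc_ignored", "gc_goal_miss", "gc_ovrf",
]


def parse_row(line):
    """Turn one whitespace-separated rtstat row into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def in_hit_ratio(row):
    """Fraction of input lookups served from the cache.

    ASSUMPTION: in_tot counts slow-path lookups (misses), so the total
    number of lookups is in_hit + in_tot.
    """
    lookups = row["in_hit"] + row["in_tot"]
    return row["in_hit"] / lookups if lookups else 0.0


def gc_ignored_ratio(row):
    """Fraction of GC invocations that were skipped."""
    return row["gc_ignored"] / row["gc_tot"] if row["gc_tot"] else 0.0


# One row from the saturated part of the stock-settings run.
saturated = parse_row("65536 13 39587 0 0 0 0 0 0 0 0 39586 922 0 0")
print("cache size:  %d" % saturated["size"])
print("hit ratio:   %.4f%%" % (100 * in_hit_ratio(saturated)))
print("GC ignored:  %.2f%%" % (100 * gc_ignored_ratio(saturated)))
```

Under that assumption, the randomized-destination load leaves almost no
hits, which is consistent with the cache filling to 65536 entries without
helping the forward rate.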
In an attempt to improve performance, I tried "echo 0 > gc_min_interval":

  size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
 65536      15  39585  0     0     0     0     0        0   0  0   39585     909         0    0
 65535      13  39587  0     0     1     0     0        0   0  0   39587     877         0    0
 32027      10  70044  0     0     0     0     0        0   0  0   70043       0         6    0
 32013       8  71092  0     0     0     0     0        0   0  0   71091       0         0    0
 31995      10  72290  0     0     1     0     0        0   0  0   72290       0         0    0
 31969      13  71087  0     0     2     0     0        0   0  0   71083       0         0    0
 31950       5  71695  0     0     0     0     0        0   0  0   71693       0         0    0
 31937      10  71690  0     0     2     0     0        0   0  0   71690       0         0    0
 31927      10  71390  0     0     0     0     0        0   0  0   71389       0         0    0
 31915      18  71382  0     0     0     0     0        0   0  0   71381       0         0    0
 31897       5  71395  0     0     0     0     0        0   0  0   71394       0         0    0
 31881       7  70793  0     0     0     0     0        0   0  0   70793       0         0    0
 31869       5  71095  0     0     0     0     0        0   0  0   71094       0         0    0
 31863      16  71084  0     0     0     0     0        0   0  0   71082       0         0    0
 31846      22  70778  0     0     0     0     0        0   0  0   70776       0         0    0
 31825       5  70795  0     0     1     0     0        0   0  0   70795       0         0    0
 31816      10  70490  0     0     0     0     0        0   0  0   70488       0         0    0

And then decided to try "ip route flush cache":

  size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
 31768       8  70192  0     0     0     0     0        0   0  0   70190       0         0    0
 31757      15  70185  0     0     1     0     0        0   0  0   70184       0         0    0
 31743       5  70495  0     0     1     0     0        0   0  0   70491       0         0    0
  8204       2  83314  0     0     0     0     0        1   2  0   75524       0        89    0
  8204       2  88859  0     0     0     0     0        1   0  0   88449       0        84    0
  8203       3  85797  0     0     1     0     0        0   0  0   85795       0         0    0
  8203       0  86100  0     0     0     0     0        0   0  0   86098       0         0    0

...And then I tried reducing gc_thresh:

  size IN: hit    tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
  8200       7  85793  0     0     1     0     0        0   0  0   85790       0         0    0
  8200       4  85796  0     0     1     0     0        0   0  0   85792       0         0    0
  8200      13  86087  0     0     0     0     0        0   0  0   86086       0         0    0
  8200       3  86097  0     0     0     0     0        0   0  0   86096       0         0    0
  1530       4  87896  0     0     0     0     0        0   0  0   87277       0       562    0
  1370       0 135832  0     0     0     0     0        0   0  0  135829       0       617    0
  1348       0 135952  0     0     2     0     0        0   0  0  135952       0       543    0
  1341       0 135740  0     0     0     0     0        0   0  0  135739       0       529    0
  1348       1 135817  0     0     1     0     0        0   0  0  135817       0       567    0

I tried fiddling with more settings, even setting gc_thresh to 1, but I
wasn't able to get the route cache much smaller than that or get it to
forward any
more packets per second. In any case, setting gc_min_interval to 0 definitely helped, but I suspect Dave's patches will make a bigger difference. Next up is 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday. Simon- From hadi@shell.cyberus.ca Tue Jun 10 03:53:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 03:53:48 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AArX2x027234 for ; Tue, 10 Jun 2003 03:53:34 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PgkS-0009dD-Js; Tue, 10 Jun 2003 06:53:04 -0400 Date: Tue, 10 Jun 2003 06:53:04 -0400 (EDT) From: Jamal Hadi To: ralph+d@istop.com cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610061010.Y36963@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3048 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Ralph Doncaster wrote: > On Mon, 9 Jun 2003, Jamal Hadi wrote: > > The test results Rob posted today show that the testing can be done in a > lab environment. I thought you were saying those were _not_ real world traffic patterns. Robert is just doing a worst case scenario testing. What would be useful is we actually test on real environments or maybe even collect real world traffic patterns and run them in the lab. Typically, real world is less intense than the lab. Ex: noone sends 100Mbps at 64 byte packet size. Typical packet is around 500 bytes average. 
If linux can handle that forwarding capacity, it should easily be doing close to Gige real world capacity. Have you seen how the big boys advertise? when tuning specs they talk about bits/sec. Juniper just announced a blade at supercom that can do firewalling at 500Mbps. > Most of the people I know that would actually see 50kpps > in the real world don't have the time to apply various patches and run a Now thats one big dilema, isnt it? Do you think i have time? Let me assure you that I dont get paid by anybody to do any of this stuff. Infact i havent been paid to do any of this stuff since 1994. Thats a lot of man hours in corporate speak. The point i am making is as a community we gotta put the hours together; the coder, the user etc. As someone who is not maintaining anything (lucky bastard that i am, my name is not even in the credits file - by choice) so i have the luxury to disappear once in a while. Imagine Davems reaction to a message like the above. > bunch of tests; pretending the problem doesn't exist when someone doesn't > run tests to prove is a poor excuse. > I think you _may_ be right theres a problem. However, as a defensive mechanism it is easier to tell someone to go away and come back with solid data. For example, you CPU graphs are very strange: Theres a few hundred variables that may be involved. I have spent many hours investigating peoples problems sshing to their machines only to find out they didnt follow instructions. After the 10th person doing the same thing, what do you expect my reaction to be? Please see the view from this side as well because it is almost a thankless task. > Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra > 0.93b. If the ideas you're referring to are changing the zebra source to > arp the next-nops, then no, I haven't tried it (and am not likely to any > time soon). > I think you may be suffering from the "too low" traffic NAPI syndrome. 
Under low traffic (1-2 Mbps) on lower end machines NAPI will consume more CPU because of an extra PCI operation per packet that is performed. As for the zebra thing, if you post my message to the Zebra list i am sure someone will be excited enough to do it. I need a few hours to do it but like you i dont have much time. > > Robert has a good collection for what is good hardware. I am so outdated > > i dont keep track anymore. My fastest machine is still an ASuse dual > > 450Mhz. > > There's still more dead-end suggestions than good ones (i.e. the > O'Reilley high performance routing book). > URL? > > Well, heres a good example: With NAPI, have your sessions been dropped? > Yup, twice in the last 2 weeks. > I have seen NAPI slow down throughput because of an intensive user space app. > > Have you tried a different NIC? Not sure how well the 3com is maintained > > for example. > > Try a tulip or tg3 or e1000 or the dlink gige. > > Initially I was looking for tulip cards but almost nobody is producing > them any more. Almost a year ago I came across the following list, which Thats not true. You could buy them off znyx. Yes, intel has EOLed the chips so i dont think Znyx will be doing this for much longer. Get yourself the giges instead. > is why I went with the 3com (at the time it indicated rx/tx irqmit for the > 3com, until I emailed the author that I found out it was tx only) > http://www.fefe.de/linuxeth/ > > I had joined the vortex list last fall looking for some tips and that > didn't help much (other than telling me that the 3com wasn't the best > choice). I've since bought a couple tg3 and a bunch of e1000 cards that > I'm planning to put into production. > yes, move to the giges then lets talk again. I think your main problem is that 3com NAPI is not well supported. Lennert disappeared right after he released the patch and noone else has the interest of maintaining it. 
> Rob's test results seem to show that even if I replace my 3c905cx cards > with e1000's I'll still get killed with a 50kpps synflood with my current > CPU. Upgrading to dual 2Ghz CPUs is not a preferred solution since I > can't do that in a 1U rack-mount box. Yeah, I could probably do it with > water cooling, but that's not an option in a telco hotel like 151 Front > St. (Toronto). > where are you getting the 50Kpps data from? I see him talkking of input rate of no less than 200Kpps. > A couple weeks ago I got one of my techs to test freeBSD/polling with full > routing tables on a 1Ghz celeron and 2 e1000 cards. His testing seems to > suggest it will handle a 50kpps synflood DOS. It would be nice if Linux > could do the same. > > Despite the BSD bashing (to be expected on a Linux list, I guess), I will > be using BSD as well as Linux for core routing. The plan is 1 linux > router and 1 bsd router each running zebra, connected to separate upstream > transit providers, running ibgp between them, and both advertising a > default route into OSPF. Then if I get hit with a DOS that kills Linux, > the BSD box will have a much better chance of staying up than if I just > used a second Linux box for redundancy. If the BSD boxes turn out to have > twice the performance of the linux boxes, it may be better for me to dump > linux for routing altogether. :-( > This is why you dont get very positivre reaction. You use religious scripture and you expect that people will help prove you are wrong. Let the person who showed that BSD can do better publish the data. If they are in town, let me know because i am willing to walk to meet the challenge. Maybe we'll learn something. 
cheers, jamal From hadi@shell.cyberus.ca Tue Jun 10 04:01:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:01:58 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AB1s2x028215 for ; Tue, 10 Jun 2003 04:01:55 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Pgsi-0009di-C3; Tue, 10 Jun 2003 07:01:36 -0400 Date: Tue, 10 Jun 2003 07:01:36 -0400 (EDT) From: Jamal Hadi To: Simon Kirby cc: ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610043453.GC23009@netnation.com> Message-ID: <20030610070045.N37047@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3049 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > Anyway, the performance issues should be fixable. It is going to take > some work, but there seem to be some interested people. I'm going to try > to set up something that will allow for easy comparisons of patches so > that we can measure progress, and perhaps reach an eventual goal. > Now heres the right spirit. 
cheers,
jamal

From hadi@shell.cyberus.ca Tue Jun 10 04:23:54 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:23:59 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABNs2x029563
	for ; Tue, 10 Jun 2003 04:23:54 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14)
	id 19PhDs-0009eM-V6; Tue, 10 Jun 2003 07:23:28 -0400
Date: Tue, 10 Jun 2003 07:23:28 -0400 (EDT)
From: Jamal Hadi
To: Simon Kirby
cc: ralph+d@istop.com, CIT/Paul , "'David S. Miller'" , "fw@deneb.enyo.de" ,
	"netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Re: Route cache performance tests
In-Reply-To: <20030610075732.GD23009@netnation.com>
Message-ID: <20030610071638.R37090@shell.cyberus.ca>
References: <20030610075732.GD23009@netnation.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3050
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@shell.cyberus.ca
Precedence: bulk
X-list: netdev

On Tue, 10 Jun 2003, Simon Kirby wrote:

[some good stuff deleted]

Simon,

I haven't looked at your data in detail; i will. Someone like Robert
would be able to snuff it much faster than i do. I just wanna say
thanks for the effort, I will spend time catching up with you folks.
It is clear that our next hurdle is gc.
Do you have profiles for your data? Profiles would be nice to collect
as well.

> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference. Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>

Also, since you are doing all that work, post the kernels somewhere so
people like foo can grab them and test as well.
cheers, jamal From hadi@shell.cyberus.ca Tue Jun 10 04:28:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:28:39 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABSZ2x030044 for ; Tue, 10 Jun 2003 04:28:35 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PhIW-0009ed-Pu; Tue, 10 Jun 2003 07:28:16 -0400 Date: Tue, 10 Jun 2003 07:28:16 -0400 (EDT) From: Jamal Hadi To: Simon Kirby cc: ralph+d@istop.com, "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <20030610043453.GC23009@netnation.com> Message-ID: <20030610072444.Q37105@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3051 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Mon, 9 Jun 2003, Simon Kirby wrote: > I was going to ask before, and it's probably not even possible anymore, > but have you tried on a 2.0 kernel before? 2.0 kernels probably have a > lot of other problems and don't support the new hardware, but it would be > interesting to see how it scales to many srcs/dsts before the route cache > was integrated. It probably scales a lot more like FreeBSD does. You'd > probably have to use eepro100s or something, though. > As a side note, note that stateless forwarding like BSD patricie tries is no longer sufficient. Its no longer just looking up a nexthop, dec ttl, recompute csum that we are optimizing for. The dst cache/flowi is the way to go, so theres no going back;-> - we just gotta make what we have work better. 
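For readers unfamiliar with the "dec ttl, recompute csum" step mentioned
above, here is a minimal Python sketch of it. It is not kernel code: it
patches the IPv4 header checksum incrementally using the update rule from
RFC 1624 (eqn. 3) instead of summing the whole header again, and the
helper names and the sample header values are made up for illustration.

```python
import struct


def fold(total):
    """Add the carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ip_checksum(header):
    """Full IPv4 header checksum; the checksum field must be zero on input."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    return ~fold(sum(words)) & 0xFFFF


def decrement_ttl(header):
    """Decrement the TTL in place and patch the checksum incrementally.

    Uses HC' = ~(~HC + ~m + m') from RFC 1624, where m is the 16-bit word
    holding TTL (byte 8) and protocol (byte 9), and HC sits in bytes 10-11.
    """
    if header[8] <= 1:
        # A real forwarder would drop the packet and send ICMP time exceeded.
        raise ValueError("TTL expired")
    old_word = (header[8] << 8) | header[9]
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    new_csum = ~fold((~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF
    header[10:12] = struct.pack("!H", new_csum)


def sample_header(ttl=64):
    """Build a 20-byte IPv4 header with arbitrary illustrative field values."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 84, 0x1234, 0, ttl, 17, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    hdr[10:12] = struct.pack("!H", ip_checksum(bytes(hdr)))
    return hdr


hdr = sample_header(64)
decrement_ttl(hdr)
print("ttl=%d checksum=0x%04x" % (hdr[8], (hdr[10] << 8) | hdr[11]))
```

The point of the incremental form is that the per-packet cost no longer
depends on the header length, which is why the stateless part of
forwarding is cheap compared with the lookup and cache management
discussed in this thread.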
cheers, jamal From pekkas@netcore.fi Tue Jun 10 04:42:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:42:13 -0700 (PDT) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABg42x030794 for ; Tue, 10 Jun 2003 04:42:05 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h5ABf9H21359; Tue, 10 Jun 2003 14:41:09 +0300 Date: Tue, 10 Jun 2003 14:41:08 +0300 (EEST) From: Pekka Savola To: Jamal Hadi cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610061010.Y36963@shell.cyberus.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3052 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > Typically, real world is less intense than the lab. Ex: noone sends > 100Mbps at 64 byte packet size. Some attackers do, and if your box dies because of that.. well, you don't like it and your managers certainly don't :-) > Typical packet is around 500 bytes > average. Not sure that's really the case. I have the impression the traffic is basically something like: - close to 1500 bytes (data transfers) - between 40-100 bytes (TCP acks, simple UDP requests, etc.) - something in between > If linux can handle that forwarding capacity, it should easily > be doing close to Gige real world capacity. Yes, but not the worst case capacity you really have to plan for :-( > Have you seen how the big boys advertise? when tuning specs they talk > about bits/sec. Juniper just announced a blade at supercom that can do > firewalling at 500Mbps. 
May be for some, but they *DO* give their pps figures also; many operators do, in fact, *explicitly* check the pps figures especially when there are some slower-path features in use (ACL's, IPv6, multicast, RPF, etc.): that's much more important than the optimal figures which are great for advertising material and press releases :-). -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 04:43:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:43:30 -0700 (PDT) Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABhL2x031390 for ; Tue, 10 Jun 2003 04:43:22 -0700 Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5ABgssG004210; Tue, 10 Jun 2003 07:42:54 -0400 (EDT) Message-Id: <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> To: Jamal Hadi cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-reply-to: Your message of "Tue, 10 Jun 2003 06:53:04 EDT." <20030610061010.Y36963@shell.cyberus.ca> X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html X-mailer: nmh 1.0 Date: Tue, 10 Jun 2003 07:41:01 -0400 From: chas williams X-Spam-Score: () hits=-0.9 X-Virus-Scanned: NAI Completed X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . 
com / mimedefang)
X-archive-position: 3053
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

In message <20030610061010.Y36963@shell.cyberus.ca>,Jamal Hadi writes:
>is we actually test on real environments or maybe even collect real
>world traffic patterns and run them in the lab.
>Typically, real world is less intense than the lab. Ex: noone sends
>100Mbps at 64 byte packet size. Typical packet is around 500 bytes
>average. If linux can handle that forwarding capacity, it should easily

i was curious at one point and collected some packet size stats on our
border router. while the average packet size is close to 500, the bulk
(by count) of the traffic seems to be in the 64-95 byte range. (the length
here is the link level size as given by tcpdump -e)

# 27100000 packets average length = 747
0-31             5271
32-63               0
64-95        12143442
96-127         934314
128-159        202984
160-191         98772
192-223         49279
224-255         37826
256-287         28276
288-319         41675
320-351         42359
352-383         93709
384-415         24557
416-447         73969
448-479         25100
480-511         23210
512-543         86515
544-575         77779
576-607        146066
608-639         23967
640-671         23005
672-703         87471
704-735         13154
736-767          8818
768-799         20850
800-831          7678
832-863          7379
864-895          7920
896-927          5789
928-959         48122
960-991         35512
992-1023        26081
1024-1055       63541
1056-1087       23673
1088-1119        8397
1120-1151        5780
1152-1183        5133
1184-1215        8820
1216-1247       40251
1248-1279        6295
1280-1311       11420
1312-1343       31610
1344-1375       21802
1376-1407       22442
1408-1439     4932071
1440-1471      594385
1472-1503      439460
1504-1535     6434071

From jsd@monmouth.com Tue Jun 10 04:58:45 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 04:58:49 -0700 (PDT)
Received: from tadenker.com (tadenker.com [65.103.215.217])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ABwh2x032259
	for ; Tue, 10 Jun 2003 04:58:44 -0700
Received: (qmail 16143 invoked from network); 10 Jun 2003 11:58:36 -0000
Received:
from unknown (HELO av8n.net) (10.200.2.1) by jeeves.office.tad.private with SMTP; 10 Jun 2003 11:58:36 -0000 Received: (qmail 638 invoked from network); 10 Jun 2003 11:58:34 -0000 Received: from localhost (HELO monmouth.com) (127.0.0.1) by localhost with SMTP; 10 Jun 2003 11:58:34 -0000 Message-ID: <3EE5C7E9.6090401@monmouth.com> Date: Tue, 10 Jun 2003 07:58:33 -0400 From: "John S. Denker" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030323 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Pekka Savola CC: Jamal Hadi , ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3054 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jsd@monmouth.com Precedence: bulk X-list: netdev On 06/10/2003 07:41 AM, Pekka Savola wrote: > >>Typical packet is around 500 bytes >>average. > > Not sure that's really the case. I have the impression the traffic is > basically something like: > - close to 1500 bytes (data transfers) > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > - something in between It helps to take a more sophisticated view of things. In typical networks: Most of the packet-count is to be found in small packets. Most of the byte-count is to be found in large packets. Some things (e.g. routing) depend mainly on the packet-count. Other things (e.g. encryption, layer-1 hardware requirements, memory bandwidth usage, ISP contracts) are sensitive to the byte-count. We shouldn't optimize one at the expense of the other. 
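
The packet-count versus byte-count split can be checked against the
border-router histogram that chas williams posted earlier in the thread.
The Python sketch below is only an estimate under stated assumptions: the
counts are transcribed from that message, bytes are approximated from
each bin's midpoint because the raw packet sizes are not available, and
the 128-byte and 1408-byte cut-offs for "small" and "large" are arbitrary
choices made here.

```python
# (length range, packet count) pairs transcribed from the tcpdump -e
# statistics posted earlier in the thread.
HISTOGRAM = """
0-31 5271          32-63 0            64-95 12143442     96-127 934314
128-159 202984     160-191 98772      192-223 49279      224-255 37826
256-287 28276      288-319 41675      320-351 42359      352-383 93709
384-415 24557      416-447 73969      448-479 25100      480-511 23210
512-543 86515      544-575 77779      576-607 146066     608-639 23967
640-671 23005      672-703 87471      704-735 13154      736-767 8818
768-799 20850      800-831 7678       832-863 7379       864-895 7920
896-927 5789       928-959 48122      960-991 35512      992-1023 26081
1024-1055 63541    1056-1087 23673    1088-1119 8397     1120-1151 5780
1152-1183 5133     1184-1215 8820     1216-1247 40251    1248-1279 6295
1280-1311 11420    1312-1343 31610    1344-1375 21802    1376-1407 22442
1408-1439 4932071  1440-1471 594385   1472-1503 439460   1504-1535 6434071
"""


def load_bins(text):
    """Parse 'low-high count' pairs into (low, high, count) tuples."""
    tokens = text.split()
    bins = []
    for span, count in zip(tokens[0::2], tokens[1::2]):
        low, high = (int(x) for x in span.split("-"))
        bins.append((low, high, int(count)))
    return bins


def shares(bins, low_cut, high_cut):
    """Return (packet share, estimated byte share) of bins inside the cut range."""
    total_pkts = sum(c for _, _, c in bins)
    # ESTIMATE: every packet in a bin is charged the bin's midpoint length.
    total_bytes = sum(c * (lo + hi) / 2 for lo, hi, c in bins)
    chosen = [(lo, hi, c) for lo, hi, c in bins if lo >= low_cut and hi <= high_cut]
    pkts = sum(c for _, _, c in chosen)
    byts = sum(c * (lo + hi) / 2 for lo, hi, c in chosen)
    return pkts / total_pkts, byts / total_bytes


BINS = load_bins(HISTOGRAM)
small = shares(BINS, 0, 127)       # ACK-sized frames
large = shares(BINS, 1408, 1535)   # near-MTU frames
print("small: %.1f%% of packets, ~%.1f%% of bytes" % (100 * small[0], 100 * small[1]))
print("large: %.1f%% of packets, ~%.1f%% of bytes" % (100 * large[0], 100 * large[1]))
```

On this sample, small frames make up nearly half of the packets but only
a small fraction of the estimated bytes, so a per-packet cost such as a
route lookup is dominated by traffic that hardly shows up in a bits/sec
figure.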
From hadi@shell.cyberus.ca Tue Jun 10 05:08:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 05:08:29 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AC8I2x000469 for ; Tue, 10 Jun 2003 05:08:19 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19Phuf-0009fj-TO; Tue, 10 Jun 2003 08:07:41 -0400 Date: Tue, 10 Jun 2003 08:07:41 -0400 (EDT) From: Jamal Hadi To: Pekka Savola cc: ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610075702.I37165@shell.cyberus.ca> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3055 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Pekka Savola wrote: > On Tue, 10 Jun 2003, Jamal Hadi wrote: > > Typically, real world is less intense than the lab. Ex: noone sends > > 100Mbps at 64 byte packet size. > > Some attackers do, and if your box dies because of that.. well, you don't > like it and your managers certainly don't :-) > Assuming the attacker has a 100mbps link to you, yes ;-> I am not trying to say we should ignore it; infact all our tests have been worst case scenarios. > > Typical packet is around 500 bytes > > average. > > Not sure that's really the case. I have the impression the traffic is > basically something like: > - close to 1500 bytes (data transfers) > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > - something in between > Its is typically trimodal (the ACKs, something in the 500 bytes and the 1500 byte end). 
The 500 average is derived from staring at cdf graphs:
slightly dated, more thorough:
http://www.nlanr.net/NA/Learn/packetsizes.html
Frequent collections by sprint:
http://ipmon.sprint.com/packstat/packet.php?030407

so 500 bytes does sound reasonable. There are a lot of papers that have
been written on this subject.

> > If linux can handle that forwarding capacity, it should easily
> > be doing close to Gige real world capacity.
>
> Yes, but not the worst case capacity you really have to plan for :-(
>

agreed.

> > Have you seen how the big boys advertise? when tuning specs they talk
> > about bits/sec. Juniper just announced a blade at supercom that can do
> > firewalling at 500Mbps.
>
> May be for some, but they *DO* give their pps figures also; many operators
> do, in fact, *explicitly* check the pps figures especially when there are
> some slower-path features in use (ACL's, IPv6, multicast, RPF, etc.):
> that's much more important than the optimal figures which are great for
> advertising material and press releases :-).
>

The announcement in question i saw in some post about supercom2003. I kept
looking for the conditions that apply to get that 500Mbps but couldn't find
any. A lot of people fall for the big brand name, so granted some people
will check, quite a few don't have that expertise and will buy because it
reads "juniper".

cheers,
jamal

From hadi@shell.cyberus.ca Tue Jun 10 05:13:26 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 05:13:31 -0700 (PDT)
Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ACDP2x000859
	for ; Tue, 10 Jun 2003 05:13:26 -0700
Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14)
	id 19Phzm-0009g1-1l; Tue, 10 Jun 2003 08:12:58 -0400
Date: Tue, 10 Jun 2003 08:12:58 -0400 (EDT)
From: Jamal Hadi
To: "John S. Denker"
cc: Pekka Savola , ralph+d@istop.com, CIT/Paul , "'Simon Kirby'" , "'David S.
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance under stress In-Reply-To: <3EE5C7E9.6090401@monmouth.com> Message-ID: <20030610080901.M37190@shell.cyberus.ca> References: <3EE5C7E9.6090401@monmouth.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3056 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, John S. Denker wrote: > On 06/10/2003 07:41 AM, Pekka Savola wrote: > > > >>Typical packet is around 500 bytes > >>average. > > > > Not sure that's really the case. I have the impression the traffic is > > basically something like: > > - close to 1500 bytes (data transfers) > > - between 40-100 bytes (TCP acks, simple UDP requests, etc.) > > - something in between > > It helps to take a more sophisticated view of things. > In typical networks: > Most of the packet-count is to be found in small packets. > Most of the byte-count is to be found in large packets. > > Some things (e.g. routing) depend mainly on the packet-count. > Other things (e.g. encryption, layer-1 hardware requirements, > memory bandwidth usage, ISP contracts) are sensitive to the > byte-count. > > We shouldn't optimize one at the expense of the other. You bring a good point. Theres another dimension actually: mostly driven by BSD mbuff style packet allocation; some tests show that some vendors are optimized for certain packet sizes, Linux skbuffs dont have this problem. We dont optimize for packet sizes given the linear nature of skbuffs. Donalds ether drivers tend to amortize some of the costs by reallocating skbs when the packet <= 100 bytes, but this is no longer valid with skb recycling and the magazine layer appearing in the slab. 
cheers, jamal From ralph@istop.com Tue Jun 10 06:11:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:11:33 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADBG2x005746 for ; Tue, 10 Jun 2003 06:11:17 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id E669F36D0F; Tue, 10 Jun 2003 09:10:38 -0400 (EDT) Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610061010.Y36963@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3057 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > I thought you were saying those were _not_ real world traffic patterns. I'm saying the tests that you and Rob did in the past did not reflect real-world use of Linux as a core router (i.e. small routing table and not many different traffic flows). The tests he posted yesterday are a big step forward. > Typically, real world is less intense than the lab. Ex: noone sends > 100Mbps at 64 byte packet size. Typical packet is around 500 bytes > average. If linux can handle that forwarding capacity, it should easily > be doing close to Gige real world capacity. No, it needs to work in the worst case. If some script kiddie can peg my CPU with a synflood then there's still a problem. 
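For a sense of scale behind the packet-size argument in this thread, the following sketch computes the theoretical packet rate of an Ethernet link for a few frame sizes. It assumes the standard 20 bytes of per-frame wire overhead (8 bytes of preamble plus a 12-byte inter-frame gap) and treats the quoted 64, 500 and 1500 byte sizes as on-wire frame sizes; the function name `max_pps` is made up for illustration.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 8 bytes of preamble/SFD plus a 12-byte inter-frame gap (standard Ethernet).
WIRE_OVERHEAD_BYTES = 20


def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on frames per second for a link speed and frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    for label, bps in (("100 Mbps", 100e6), ("1 Gbps", 1e9)):
        for size in (64, 500, 1500):
            kpps = max_pps(bps, size) / 1e3
            print(f"{label:>8} @ {size:4d} bytes: {kpps:8.1f} kpps")
```

Under those assumptions a 100 Mbps link carries roughly 149 kpps of 64-byte frames but only about 24 kpps at a 500-byte average, which is why a small-packet flood stresses per-packet costs such as route lookups far more than the bit rate suggests.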
> > Most of the people I know that would actually see 50kpps > > in the real world don't have the time to apply various patches and run a > > Now thats one big dilema, isnt it? Do you think i have time? Let me > assure you that I dont get paid by anybody to do any of this stuff. Sure I realize that. The problem I've seen occur is that Linux developers with big egos say "linux can route as well as a cisco 3640", or "linux routing is beats BSD any day". Then guys like me decide to give it a try, not realizing we're walking into a tarpit. If I had been told in the first place that running linux as a high-throughput router in a service provider environment was an unknown, things would have been different. > I have spent many hours investigating peoples problems sshing to their > machines only to find out they didnt follow instructions. After the > 10th person doing the same thing, what do you expect my reaction to be? Take 15 minutes and write a web page with the magic settings required to make things work. > > Yup, still a duron 750 on an Asus mobo (Via chipset). Running Zebra > > 0.93b. If the ideas you're referring to are changing the zebra source to > > arp the next-nops, then no, I haven't tried it (and am not likely to any > > time soon). > > > > I think you may be suffering from the "too low" traffic NAPI syndrome. > Under low traffic (1-2 Mbps) on lower end machines NAPI will consume > more CPU because of an extra PCI operation per packet that is performed. No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx cards. > As for the zebra thing, if you post my message to the Zebra list i am sure > someone will be excited enough to do it. I need a few hours to do it > but like you i dont have much time. The last time I looked at the zebra list things seemed pretty dead. Most of the new work is now happening on the commercial zebra development. > > > Well, heres a good example: With NAPI, have your sessions been dropped? 
> > Yup, twice in the last 2 weeks. > > > > I have seen NAPI slow down throughput because of an intensive user space > app. This is a router with just zebra (zebra, ospfd, bgpd) running. > > I had joined the vortex list last fall looking for some tips and that > > didn't help much (other than telling me that the 3com wasn't the best > > choice). I've since bought a couple tg3 and a bunch of e1000 cards that > > I'm planning to put into production. > > yes, move to the giges then lets talk again. I think your main problem is > that 3com NAPI is not well supported. Lennert disappeared right after he > released the patch and noone else has the interest of maintaining it. Yes, and it would be nice if you mentioned in your NAPI docs that people should use a tulip, tg3, or e1000 if they want it to work well. In making your sales pitches for NAPI you made it sound like any high-performance card should do fine (i.e. anything but a Realtek). > > Rob's test results seem to show that even if I replace my 3c905cx cards > > with e1000's I'll still get killed with a 50kpps synflood with my current > > CPU. > > where are you getting the 50Kpps data from? I see him talkking of > input rate of no less than 200Kpps. On his first graph, for 50k new incoming dst/sec throughput looks to be ~175kpps. And he's running a 1.8Ghz Xenon vs my 750Mhz Duron. > > used a second Linux box for redundancy. If the BSD boxes turn out to have > > twice the performance of the linux boxes, it may be better for me to dump > > linux for routing altogether. :-( > > > > This is why you dont get very positivre reaction. You use religious > scripture and you expect that people will help prove you are wrong. You don't seem to get it. There's at least a dozen things more important to me than seeing Linux routing performance compete with Cisco and BSD. I'm annoyed that people like you have told me linux is up to the task, and then when it's not I'm left SOL. 
I thought I was talking to competent techies, but now I see most of the techies were also Linux evangelists. Now that people like Rob and Dave are taking a hard look at it I think it's worth my while to ante up for a couple more rounds. I still fell like a sucker that should have walked away from the table a long time ago though. Jim Mercer and Marc Ackley at 151.net/tht.net told me they tried Linux/Zebra and gave up (and went with 7206vxr routers). And they're very pro-unix (still do all their netflow collection and billing on Unix). They're not likely to go back and give Linux another try. If the linux evangelists had just said Linux would be ready for core routing in a year (or whatever) instead, I think network operators would look at it more seriously rather than they joke that they see it as now. -Ralph From ralph@istop.com Tue Jun 10 06:34:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:34:20 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADYE2x006400 for ; Tue, 10 Jun 2003 06:34:14 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id C7E3E36C48; Tue, 10 Jun 2003 09:34:10 -0400 (EDT) Date: Tue, 10 Jun 2003 09:34:15 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Simon Kirby Cc: Jamal Hadi , CIT/Paul , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: Re: Route cache performance tests In-Reply-To: <20030610075732.GD23009@netnation.com> Message-ID: References: <20030610075732.GD23009@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3058 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Simon Kirby wrote: > Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward > an amazing 39,000 packets per second. Woohoo! I hope that's sarcasm. I know if you posted to NANOG saying it took a dual 1.4Ghz Opteron to route 39kpps under linux you'd be laughed off the list. Maybe I should be bragging about my 3-minute lap times on the Shannonville track in my M5! -Ralph From hadi@shell.cyberus.ca Tue Jun 10 06:37:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:37:12 -0700 (PDT) Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.236.4]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADb22x006711 for ; Tue, 10 Jun 2003 06:37:03 -0700 Received: from hadi (helo=localhost) by shell.cyberus.ca with local-esmtp (Exim 4.14) id 19PjId-0009iP-LR; Tue, 10 Jun 2003 09:36:31 -0400 Date: Tue, 10 Jun 2003 09:36:31 -0400 (EDT) From: Jamal Hadi To: ralph+d@istop.com cc: CIT/Paul , "'Simon Kirby'" , "'David S. 
Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: Message-ID: <20030610091736.V37313@shell.cyberus.ca> References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3059 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@shell.cyberus.ca Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Ralph Doncaster wrote: > On Tue, 10 Jun 2003, Jamal Hadi wrote: > > > I thought you were saying those were _not_ real world traffic patterns. > > I'm saying the tests that you and Rob did in the past did not reflect > real-world use of Linux as a core router (i.e. small routing table and not > many different traffic flows). The tests he posted yesterday are a big > step forward. > I think at a minimal define what "real world" means. Is it 100 flows/sec at 20Kpps? what is it? > > Typically, real world is less intense than the lab. Ex: noone sends > > 100Mbps at 64 byte packet size. Typical packet is around 500 bytes > > average. If linux can handle that forwarding capacity, it should easily > > be doing close to Gige real world capacity. > > No, it needs to work in the worst case. If some script kiddie can peg my > CPU with a synflood then there's still a problem. > Lets work on defining "real world". Factor in the script kiddie. > Sure I realize that. The problem I've seen occur is that Linux developers > with big egos say "linux can route as well as a cisco 3640", or "linux > routing is beats BSD any day". Then guys like me decide to give it a try, > not realizing we're walking into a tarpit. 
If I had been told in the > first place that running linux as a high-throughput router in a service > provider environment was an unknown, things would have been different. > Heres where the problem is: If you interact at this low level then you oughta produce low level input. Provide people with data to help. Otherwise its a high maintanance task. > > I have spent many hours investigating peoples problems sshing to their > > machines only to find out they didnt follow instructions. After the > > 10th person doing the same thing, what do you expect my reaction to be? > > Take 15 minutes and write a web page with the magic settings required to > make things work. > I have many times. I still do. It is also a thankless task. > > I think you may be suffering from the "too low" traffic NAPI syndrome. > > Under low traffic (1-2 Mbps) on lower end machines NAPI will consume > > more CPU because of an extra PCI operation per packet that is performed. > > No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx > cards. > Change your NICs. I dont know what else to suggest. > > As for the zebra thing, if you post my message to the Zebra list i am sure > > someone will be excited enough to do it. I need a few hours to do it > > but like you i dont have much time. > > The last time I looked at the zebra list things seemed pretty dead. Most > of the new work is now happening on the commercial zebra development. > Maybe its time to fork Zebra into something that has the same momentum it had in the earlier days. > > yes, move to the giges then lets talk again. I think your main problem is > > that 3com NAPI is not well supported. Lennert disappeared right after he > > released the patch and noone else has the interest of maintaining it. > > Yes, and it would be nice if you mentioned in your NAPI docs that people > should use a tulip, tg3, or e1000 if they want it to work well. 
In making > your sales pitches for NAPI you made it sound like any high-performance > card should do fine (i.e. anything but a Realtek). > Theres a URL which points people to where the various NICS supported are. > On his first graph, for 50k new incoming dst/sec throughput looks to be > ~175kpps. And he's running a 1.8Ghz Xenon vs my 750Mhz Duron. > i think what would be interesting is to show CPU utilization as well. > > This is why you dont get very positivre reaction. You use religious > > scripture and you expect that people will help prove you are wrong. > > You don't seem to get it. There's at least a dozen things more important > to me than seeing Linux routing performance compete with Cisco and BSD. Again, if you wanna complain about it at the level you are i think its only fair you help. I actually dont care about CISCO or BSD. We dont win because someone else looses. We simply want to be the best. If you tell me BSD works better, i told you i will walk all the way downtown in the hope i'll find somethuing we can improve on. > I'm annoyed that people like you have told me linux is up to the task, and > then when it's not I'm left SOL. I thought I was talking to competent > techies, but now I see most of the techies were also Linux evangelists. > > Now that people like Rob and Dave are taking a hard look at it I think > it's worth my while to ante up for a couple more rounds. I still fell > like a sucker that should have walked away from the table a long time ago > though. > I think your setup maybe the question. Like i said theres probably a hunderd variables involved. It is up to you to isolate things. Yes, theres a support line in open source, but it is rewarded more when people show some effort. > Jim Mercer and Marc Ackley at 151.net/tht.net told me they tried > Linux/Zebra and gave up (and went with 7206vxr routers). And they're very > pro-unix (still do all their netflow collection and billing on Unix). 
> They're not likely to go back and give Linux another try. If the linux > evangelists had just said Linux would be ready for core routing in a year > (or whatever) instead, I think network operators would look at it more > seriously rather than they joke that they see it as now. > Theres a lot of BSD bigots in a lot of ISPS and IETF. It's human nature to be comfortable with what they know best. Most of the people i have met that put Linux down or consider it a joke come from the old BSD camp. Its their loss and i dismiss anything they have to say. Lets work on facts. What is it that we can do to improve Linux? Provide data. If you want to compare against BSD, what is it that _ you have facts on_ and not heard from other people that BSD does better? cheers, jamal From Robert.Olsson@data.slu.se Tue Jun 10 06:41:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 06:41:56 -0700 (PDT) Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5ADfk2x007045 for ; Tue, 10 Jun 2003 06:41:51 -0700 Received: (from robert@localhost) by robur.slu.se (8.9.3p2/8.9.3) id PAA28797; Tue, 10 Jun 2003 15:41:09 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16101.57333.369129.622540@robur.slu.se> Date: Tue, 10 Jun 2003 15:41:09 +0200 To: "David S. 
Miller"
Cc: hadi@shell.cyberus.ca, sim@netnation.com, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-Reply-To: <20030609.160547.41648991.davem@redhat.com>
References: <20030609221911.GF11509@netnation.com> <008001c32eda$56760830$4a00000a@badass> <20030609.160547.41648991.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-archive-position: 3060
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

First run...

Worst scenario: 1 dst/pkt w. 64 byte pkts. 2*10 million packets injected
on eth0 and eth2. Input rate 2*190 kpps, clone_skb=1. Routing table of
123946 routes. UP. NAPI gives fairness between both DoS attackers. :-)
But more testing to be done.

       plain   w. DaveM patch
  ----------------------------------
          72              114   kpps throughput
    30271883         12246290   hash misses (second last in my rt_cache_stat)

58% better... and it can be further improved.

Iface   MTU Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 1964858 9793618 9793618 8035147      16      0      0      0 BRU
eth1   1500   0      19       0       0       0 1887577      0      0      0 BRU
eth2   1500   0 1964698 9793419 9793419 8035305       3      0      0      0 BRU
eth3   1500   0       1       0       0       0 1886904      0      0      0 BRU

/proc/net/rt_cache_stat
000004ba 00000e27 003be7ba 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000000 003869c1 00360b4d 00025dcb 00025dca 01cde98b 00000000

With DaveM hash-list limit patch.
Input rate 2*190 kpps clone_skb=1

Iface   MTU Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 2990462 9680257 9680257 7009542      12      0      0      0 BRU
eth1   1500   0      12       0       0       0 2990467      0      0      0 BRU
eth2   1500   0 2990460 9673421 9673421 7009544       4      0      0      0 BRU
eth3   1500   0       1       0       0       0 2990459      0      0      0 BRU

/proc/net/rt_cache_stat
00000000 00000607 005b3cfb 00000000 00000000 00000000 00000000 00000000 00000000 00000002 00000000 005b2cfa 005b2ced 00000008 00000000 00badd12 00000003

Cheers.

--ro

From ralph@istop.com Tue Jun 10 07:00:35 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 07:00:42 -0700 (PDT)
Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AE0O2x007515 for ; Tue, 10 Jun 2003 07:00:25 -0700
Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id E3ADC374AD; Tue, 10 Jun 2003 09:18:32 -0400 (EDT)
Date: Tue, 10 Jun 2003 09:18:37 -0400 (EDT)
From: Ralph Doncaster
Reply-To: ralph+d@istop.com
To: Jamal Hadi
Cc: Simon Kirby , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org"
Subject: Re: Route cache performance under stress
In-Reply-To: <20030610072444.Q37105@shell.cyberus.ca>
Message-ID:
References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com> <20030610072444.Q37105@shell.cyberus.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3061
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ralph@istop.com
Precedence: bulk
X-list: netdev

On Tue, 10 Jun 2003, Jamal Hadi wrote:

> As a side note, note that stateless forwarding like BSD patricia tries
> is no longer sufficient. Its no longer just looking up a nexthop, dec ttl,
> recompute csum that we are optimizing for.

It would certainly be sufficient for core routing.
If I can have flow manipulation at no extra cost, I'll take it. If it's going to double the horsepower requirements, I don't want it. -Ralph From ralph@istop.com Tue Jun 10 07:33:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 07:33:54 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AEXV2x008077 for ; Tue, 10 Jun 2003 07:33:32 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id 2442638360; Tue, 10 Jun 2003 10:03:29 -0400 (EDT) Date: Tue, 10 Jun 2003 10:03:33 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610091736.V37313@shell.cyberus.ca> Message-ID: References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca> <20030610061010.Y36963@shell.cyberus.ca> <20030610091736.V37313@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3062 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > > No, it needs to work in the worst case. If some script kiddie can peg my > > CPU with a synflood then there's still a problem. > > > > Lets work on defining "real world". Factor in the script kiddie. "real world" is the worst-case DOS tool available. Synflood tools like juno seem to fit that category. If you think juno is not a good real-world test, then keep pissing people off and you'll find out how real it is. 
;-) > > > I have spent many hours investigating peoples problems sshing to their > > > machines only to find out they didnt follow instructions. After the > > > 10th person doing the same thing, what do you expect my reaction to be? > > > > Take 15 minutes and write a web page with the magic settings required to > > make things work. > > > > I have many times. I still do. It is also a thankless task. URL? I've looked at almost everything on your web page since you were involved in the pppoe client software. I haven't seen anything that says how to sprinkle the pixie dust so my router works well. > > No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx > > cards. > > > > Change your NICs. I dont know what else to suggest. Yup. It just takes a bit of time and planning when the box is deployed in a POP 400km away... > > The last time I looked at the zebra list things seemed pretty dead. Most > > of the new work is now happening on the commercial zebra development. > > > > Maybe its time to fork Zebra into something that has the same momentum it > had in the earlier days. Hmmm... maybe we can both bug MCR to try your suggested changes... > > You don't seem to get it. There's at least a dozen things more important > > to me than seeing Linux routing performance compete with Cisco and BSD. > > Again, if you wanna complain about it at the level you are i think its > only fair you help. I actually dont care about CISCO or BSD. We dont win > because someone else looses. We simply want to be the best. You can want to be the best, but I don't think it's fair to sucker people into using Linux as a core router with false claims. > > Now that people like Rob and Dave are taking a hard look at it I think > > it's worth my while to ante up for a couple more rounds. I still fell > > like a sucker that should have walked away from the table a long time ago > > though. > > > > I think your setup maybe the question. 
Like i said theres probably a > hunderd variables involved. It is up to you to isolate things. > Yes, theres a support line in open source, but it is rewarded more > when people show some effort. Fuck, if you think I haven't put any effort into it already then there's no point in even trying any more. > to be comfortable with what they know best. Most of the people i have > met that put Linux down or consider it a joke come from the old > BSD camp. Its their loss and i dismiss anything they have to say. In my case I would have been better off to dismiss your advice a year ago. How does that help the Linux cause? -Ralph From ralph@istop.com Tue Jun 10 08:29:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:29:19 -0700 (PDT) Received: from smtp.istop.com (dci.doncaster.on.ca [66.11.168.194]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFTD2x008866 for ; Tue, 10 Jun 2003 08:29:13 -0700 Received: from ns.istop.com (ns.istop.com [66.11.168.199]) by smtp.istop.com (Postfix) with ESMTP id CFB8636AAF; Tue, 10 Jun 2003 11:29:12 -0400 (EDT) Date: Tue, 10 Jun 2003 11:29:18 -0400 (EDT) From: Ralph Doncaster Reply-To: ralph+d@istop.com To: Jamal Hadi Cc: Pekka Savola , CIT/Paul , "'Simon Kirby'" , "'David S. Miller'" , "fw@deneb.enyo.de" , "netdev@oss.sgi.com" , "linux-net@vger.kernel.org" Subject: RE: Route cache performance under stress In-Reply-To: <20030610075702.I37165@shell.cyberus.ca> Message-ID: References: <20030610075702.I37165@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3063 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralph@istop.com Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, Jamal Hadi wrote: > Assuming the attacker has a 100mbps link to you, yes ;-> A script kiddie 0wning a box with a FE connection is nothing. 
During what was probably the worst DoS I got hit with, one of my upstream
providers said they were seeing about 600Mbps of traffic related to the
attack.

-Ralph

From nakam@linux-ipv6.org Tue Jun 10 08:45:46 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:45:58 -0700 (PDT)
Received: from localhost (p2162-ipbf07hodogaya.kanagawa.ocn.ne.jp [220.104.10.162]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFji2x009551 for ; Tue, 10 Jun 2003 08:45:45 -0700
Received: from localhost ([127.0.0.1]) by localhost with smtp (Exim 3.36 #1 (Debian)) id 19PlEq-000139-00; Wed, 11 Jun 2003 00:40:44 +0900
From: Masahide NAKAMURA
To: Henrik Petander
Cc: YOSHIFUJI Hideaki , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org
Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6
Message-Id: <20030611004035.40027642.nakam@linux-ipv6.org>
In-Reply-To: <3EE5F85E.9080006@tml.hut.fi>
References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi>
Organization: USAGI Project
X-Mailer: Sylpheed version 0.9.0claws (GTK+ 1.2.10; i386-pc-linux-gnu)
X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Date: Wed, 11 Jun 2003 00:40:44 +0900
X-archive-position: 3064
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: nakam@linux-ipv6.org
Precedence: bulk
X-list: netdev

On Tue, 10 Jun 2003 18:25:18 +0300 Henrik Petander wrote:

> Then the policies for mipv6 would need to be specified at the same time
> as the ipsec policies. This is not a problem as long as the policies are
> loaded at start up. However, this could lead to problems with
> applications which specify their own policies, e.g. racoon.

How about providing an interface for handling templates to update an
existing policy in the kernel?

Regards,

--
Masahide NAKAMURA

From davem@redhat.com Tue Jun 10 08:53:06 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:53:09 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFr52x010028 for ; Tue, 10 Jun 2003 08:53:06 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22890; Tue, 10 Jun 2003 08:49:41 -0700
Date: Tue, 10 Jun 2003 08:49:40 -0700 (PDT)
Message-Id: <20030610.084940.74727904.davem@redhat.com>
To: ralph+d@istop.com, ralph@istop.com
Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To:
References: <008001c32eda$56760830$4a00000a@badass> <20030609195652.E35696@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3065
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Ralph Doncaster
   Date: Mon, 9 Jun 2003 20:32:48 -0400 (EDT)

   Lastly from the software side Linux doesn't seem to have anything like
   BSD's parameter to control user/system CPU sharing. Once my CPU load
   reaches 70-80%, I'd rather have some dropped packets than let the CPU hit
   100% and end up with my BGP sessions drop.

When packet (more specifically, software interrupt) processing reaches a
certain level, we offload the work into process context.
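As a rough illustration of the offloading idea described above (not the kernel's actual code), the toy model below handles a bounded amount of work inline and leaves the rest for a deferred worker. In Linux the deferred worker is the per-CPU ksoftirqd thread, which the scheduler treats like other processes; the budget value and all function names here are illustrative assumptions.

```python
from collections import deque


def process_with_budget(queue: deque, budget: int, handle) -> bool:
    """Handle at most `budget` items inline; return True if work remains."""
    done = 0
    while queue and done < budget:
        handle(queue.popleft())
        done += 1
    return bool(queue)


def simulate(n_packets: int, budget: int) -> dict:
    """Toy model: one burst arrives, the inline context handles up to
    `budget` packets, and whatever is left is deferred to a worker."""
    queue = deque(range(n_packets))
    inline, deferred = [], []
    if process_with_budget(queue, budget, inline.append):
        # Remaining work goes to a schedulable worker (the role ksoftirqd
        # plays in Linux), so it competes for CPU with user processes
        # instead of starving them.
        while queue:
            deferred.append(queue.popleft())
    return {"inline": len(inline), "deferred": len(deferred)}
```

A light burst such as `simulate(5, 10)` is handled entirely inline, while `simulate(25, 10)` pushes 15 packets to the deferred path. The point of the design is that under overload the packet work becomes ordinary schedulable work, so a routing daemon still gets a share of the CPU.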
From chas@relax.cmf.nrl.navy.mil Tue Jun 10 08:55:04 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:55:07 -0700 (PDT)
Received: from relax.cmf.nrl.navy.mil (relax.cmf.nrl.navy.mil [134.207.10.227]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFt32x010377 for ; Tue, 10 Jun 2003 08:55:03 -0700
Received: (from chas@localhost) by relax.cmf.nrl.navy.mil (8.11.6/8.11.6) id h5AFtbQ00899 for netdev@oss.sgi.com; Tue, 10 Jun 2003 11:55:37 -0400
Date: Tue, 10 Jun 2003 11:55:37 -0400
From: chas williams
Message-Id: <200306101555.h5AFtbQ00899@relax.cmf.nrl.navy.mil>
To: netdev@oss.sgi.com
Subject: [RFC] suggest changes cleanup to atm svc/pvc family
X-archive-position: 3066
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

i was hoping some people might take a look at the following changes
and let me know what they think.

ftp://galileo.cmf.nrl.navy.mil/pub/chas/linux-atm/2_5_70_vcc_sklist_diffs

a quick summary (since the diff is rather lengthy):

- vcc are now in a global list protected by a rw lock (much like other
  protocol families). this means the atm devices don't hold a list of
  vcc's. this makes some things much easier to write.

- a few things were renamed to vcc_XXX from atm_XXX. eventually routines
  dealing with vcc's will be vcc_, svc_, or pvc_. atm device functions
  should be called atm_dev_XXX. this makes things a bit easier to read.

- vcc are now reference counted properly (or so i think). this doesn't
  mean all the atm drivers understand this yet. the he driver should do
  the right thing though, holding a read on the vcc sklist lock during
  recv operations to keep vcc's from prematurely disappearing.

- SOCKOPS_WRAP was removed and lock_sock's introduced in the appropriate
  locations. i might have missed some.

- atm_ioctl was split into vcc_ioctl and atm_dev_ioctl

- recvmsg was rewritten to take advantage of some of the existing kernel
  routines that make datagram manipulation so much easier.

- sendmsg needs to be rewritten but the ip components will need to
  skb_clone so they can skb_set_owner_w on skb's that might already be
  owned by another socket. right?

- changed add_wait_queue to prepare_to_wait and finish_wait. is this
  the accepted interface?

From davem@redhat.com Tue Jun 10 08:57:00 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:57:04 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFv02x010729 for ; Tue, 10 Jun 2003 08:57:00 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22919; Tue, 10 Jun 2003 08:53:42 -0700
Date: Tue, 10 Jun 2003 08:53:42 -0700 (PDT)
Message-Id: <20030610.085342.41654796.davem@redhat.com>
To: hadi@shell.cyberus.ca
Cc: ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030609204257.L35799@shell.cyberus.ca>
References: <20030609195652.E35696@shell.cyberus.ca> <20030609204257.L35799@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3067
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jamal Hadi
   Date: Mon, 9 Jun 2003 21:15:18 -0400 (EDT)

   Have you tried a different NIC? Not sure how well the 3com is
   maintained for example.

Actually, the main issue with 3c59x is that it still uses PIO accesses.
This basically makes it useless for routing or anything wanting serious latency. Andrew Morton knows this, but he is such a good maintainer that he doesn't want to change over the MEM I/O accesses for fear of breaking something. It's actually a simple change to make if someone wants to spend a few cycles on it, then you can see what kind of performance you'll get with that. From davem@redhat.com Tue Jun 10 08:59:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 08:59:23 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AFxJ2x011097 for ; Tue, 10 Jun 2003 08:59:19 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA22933; Tue, 10 Jun 2003 08:56:01 -0700 Date: Tue, 10 Jun 2003 08:56:00 -0700 (PDT) Message-Id: <20030610.085600.71109220.davem@redhat.com> To: sim@netnation.com Cc: ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <20030610015311.GB23009@netnation.com> References: <20030609195652.E35696@shell.cyberus.ca> <20030610015311.GB23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3068 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Simon Kirby Date: Mon, 9 Jun 2003 18:53:12 -0700 Your CPU use is quite a bit higher than ours. Yeah, but his faster cpu is all being burnt to a crisp doing PIO accesses to the 3c59x card. I found that once NAPI was happening, userspace seemed to get a fairly decent amount of time. 
Unfortunately, NAPI won't help him with the current way the 3c59x driver works. It needs to provide a way to use MEM I/O before NAPI would start to be of use to him. From davem@redhat.com Tue Jun 10 09:10:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:10:25 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGAG2x011788 for ; Tue, 10 Jun 2003 09:10:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA22960; Tue, 10 Jun 2003 09:06:56 -0700 Date: Tue, 10 Jun 2003 09:06:56 -0700 (PDT) Message-Id: <20030610.090656.104052471.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: sim@netnation.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610015311.GB23009@netnation.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3069 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Mon, 9 Jun 2003 23:18:45 -0400 (EDT) Even if my time was only worth $500/day, in the past year and a half I spent enough time working on Linux routers to buy a Cisco NPE-G1. :-( Slapping different machines together and mucking with zebra config files is not going to fix the kind of issues you are talking about. It is pure wasted effort. Someone needs to apply brains to the code and improve the algorithms and schemes we use. So far I see approximately 1 person doing something for every 1,000 guys complaining. 
So shut your yap and open up an editor and some algorithms books and papers. :)

If you stop using Linux right now, I won't cry nor will I lose sleep tonight. I've never felt threatened by such things, so I wouldn't advise using them to coerce me into somehow "working harder". :)

See, I know the reasonable people will stick around and back me up as I continue to improve the code. Because I'm actually doing something about the problems.

From davem@redhat.com Tue Jun 10 09:13:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:14:01 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGDv2x012407 for ; Tue, 10 Jun 2003 09:13:57 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA22983; Tue, 10 Jun 2003 09:10:44 -0700
Date: Tue, 10 Jun 2003 09:10:44 -0700 (PDT)
Message-Id: <20030610.091044.78724912.davem@redhat.com>
To: sim@netnation.com
Cc: ralph+d@istop.com, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030610043453.GC23009@netnation.com>
References: <20030609204257.L35799@shell.cyberus.ca> <20030610043453.GC23009@netnation.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3070
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Simon Kirby
   Date: Mon, 9 Jun 2003 21:34:53 -0700

   Broken parts of the code only get fixed if enough people whine

This isn't how I operate...

   or especially if somebody decides to actually fix it.

This is. I hack on something because I want to and it seems interesting to me at the moment.
Not because someone is shitting their pants in public about it. :) So, for future reference, you'll get more using honey than vinegar from me :) Franks a lot, David S. Miller davem@redhat.com From bogdan.costescu@iwr.uni-heidelberg.de Tue Jun 10 09:15:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:15:30 -0700 (PDT) Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGFM2x012857 for ; Tue, 10 Jun 2003 09:15:25 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:q2RTacGrgKO+YS81+Qyx5C81bBI5qhL+@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.12.9/8.12.9) with ESMTP id h5AGF7F4014193; Tue, 10 Jun 2003 18:15:08 +0200 (MET DST) Received: from kenzo.iwr.uni-heidelberg.de (localhost.localdomain [127.0.0.1]) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8) with ESMTP id h5AGF8f0027696; Tue, 10 Jun 2003 18:15:08 +0200 Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8/Submit) with ESMTP id h5AGF72r027692; Tue, 10 Jun 2003 18:15:07 +0200 Date: Tue, 10 Jun 2003 18:15:07 +0200 (CEST) From: Bogdan Costescu To: "David S. Miller" cc: hadi@shell.cyberus.ca, , , , , , Subject: Re: 3c59x (was Route cache performance under stress) In-Reply-To: <20030610.085342.41654796.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 3071 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bogdan.costescu@iwr.uni-heidelberg.de Precedence: bulk X-list: netdev On Tue, 10 Jun 2003, David S. Miller wrote: > Acutally, the main issue with 3c59x is that it still > uses PIO accesses. This basically makes it useless > for routing or anything wanting serious latency. I did try about 2 years ago and converted the driver to MMIO. 
I wasn't able to see _any_ kind of improvement and I was using it in parallel computation where latency counts. I have to say though that I wasn't interested at that time in obtaining profiles and such because only the end-user performance was important. > Andrew Morton knows this, ... and knows about my MMIO trial too (mentioned also on vortex-list)... > but he is such a good maintainer that he doesn't want to change over the > MEM I/O accesses for fear of breaking something. Given that the 3c59x driver supports several generations of cards most of them being EOL-ed years ago, it's pretty hard to do such change. If a new driver would be forked that serviced only the latest generations (Cyclone = 905B and Tornado = 905C(X)), switching to MMIO would probably make sense along with lots of others small changes (large MTU/VLAN, polling descriptors, MII-only media selection etc.) and maybe have NAPI in the mix as well... > It's actually a simple change to make if someone wants to > spend a few cycles on it, Not if you include testing in those cycles :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From lpetande@tml.hut.fi Tue Jun 10 09:17:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:17:17 -0700 (PDT) Received: from smtp-1.hut.fi (root@smtp-1.hut.fi [130.233.228.91]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGH92x013263 for ; Tue, 10 Jun 2003 09:17:12 -0700 Received: from tml.hut.fi (tcs-pc-5.tcs.hut.fi [130.233.215.132]) by smtp-1.hut.fi (8.12.9/8.12.9) with ESMTP id h5AFHFji009028; Tue, 10 Jun 2003 18:17:17 +0300 Message-ID: <3EE5F85E.9080006@tml.hut.fi> Date: Tue, 10 Jun 2003 18:25:18 +0300 From: Henrik Petander User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 
To: Masahide NAKAMURA CC: Henrik Petander , YOSHIFUJI Hideaki , vnuorval@tcs.hut.fi, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 References: <20030606223057.41ac1c9d.nakam@linux-ipv6.org> <20030609203659.089b241b.nakam@linux-ipv6.org> In-Reply-To: <20030609203659.089b241b.nakam@linux-ipv6.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-RAVMilter-Version: 8.4.3(snapshot 20030212) (smtp-1.hut.fi) X-DCC-HUTCC-Metrics: smtp-1.hut.fi 1165; Body=11 Fuz1=11 Fuz2=11 X-archive-position: 3072 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lpetande@tml.hut.fi Precedence: bulk X-list: netdev Masahide NAKAMURA wrote: > On Mon, 9 Jun 2003 12:06:35 +0300 (EEST) > Henrik Petander wrote: > > >>On Fri, 6 Jun 2003, Masahide NAKAMURA wrote: >> >>>We don't think we have to change the logic handling policy with >>>the reason because we can treat MIPv6 policy just like IPsec. >>> >>>When we want to apply both MIPv6 and IPsec to the same target, >>>we need one policy that has two or more of templates(e.g. one is >>>MIPv6's template and the other is IPsec's). >> >>Does this also mean that the IPSec and MIPv6 policies and SAs need to be >>configured at the same time or is it possible to add templates to an >>existing policy? > > > Currently no interface to add templates directly to it. Then the policies for mipv6 would need to be specified at the same time as the ipsec policies. This is not a problem as long as the policies are loaded at start up. However, this could lead to problems with applications which specify their own policies, e.g. racoon. 
Henrik From ak@suse.de Tue Jun 10 09:20:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:20:41 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGKa2x013671 for ; Tue, 10 Jun 2003 09:20:37 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 02B7C142CE; Tue, 10 Jun 2003 18:20:31 +0200 (MEST) Date: Tue, 10 Jun 2003 18:20:29 +0200 From: Andi Kleen To: Bogdan Costescu Cc: "David S. Miller" , hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610162029.GA8168@wotan.suse.de> References: <20030610.085342.41654796.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 3073 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > > > but he is such a good maintainer that he doesn't want to change over the > > MEM I/O accesses for fear of breaking something. > > Given that the 3c59x driver supports several generations of cards most of > them being EOL-ed years ago, it's pretty hard to do such change. If a new > driver would be forked that serviced only the latest generations (Cyclone > = 905B and Tornado = 905C(X)), switching to MMIO would probably make sense > along with lots of others small changes (large MTU/VLAN, polling > descriptors, MII-only media selection etc.) and maybe have NAPI in the mix > as well... Can't you just wrap it in a few macros and offer a config for those who want the best performance and a runtime test for the others? Then switch between PIO and mmio dynamically. 
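[A minimal sketch of the wrapper idea above, not code from the 3c59x driver. It assumes a per-device flag chosen once at probe time plus an optional build-time switch; the names struct nic, NIC_ALWAYS_MMIO, NIC_USE_MMIO, nic_read16 and nic_write16 are hypothetical. Both access paths are modelled as plain arrays so the example runs in user space; a real driver would use the kernel's port I/O and MMIO accessors instead.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the register window used by this sketch. */
#define NIC_REG_WINDOW 64

/* Stand-ins for the two access paths. In a real driver the PIO path
 * would go through port I/O and the MMIO path through a mapped BAR;
 * here both are arrays so the sketch is runnable as-is. */
struct nic {
    int use_mmio;                         /* decided once, e.g. at probe time */
    uint16_t pio_space[NIC_REG_WINDOW];   /* fake I/O port window */
    uint16_t mmio_space[NIC_REG_WINDOW];  /* fake memory-mapped window */
};

/* Build-time option: define NIC_ALWAYS_MMIO to compile the runtime
 * test away entirely; otherwise the per-device flag is consulted. */
#ifdef NIC_ALWAYS_MMIO
#define NIC_USE_MMIO(n) ((void)(n), 1)
#else
#define NIC_USE_MMIO(n) ((n)->use_mmio)
#endif

/* Read a 16-bit register through whichever path is selected. */
static inline uint16_t nic_read16(const struct nic *n, size_t reg)
{
    assert(reg < NIC_REG_WINDOW);
    return NIC_USE_MMIO(n) ? n->mmio_space[reg] : n->pio_space[reg];
}

/* Write a 16-bit register through whichever path is selected. */
static inline void nic_write16(struct nic *n, size_t reg, uint16_t val)
{
    assert(reg < NIC_REG_WINDOW);
    if (NIC_USE_MMIO(n))
        n->mmio_space[reg] = val;
    else
        n->pio_space[reg] = val;
}
```

[With every register access funnelled through two helpers like these, the rest of the driver would not need to know which path is in use.]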
Even runtime test should be pretty painless these days, the CPU normally can execute hundreds or even thousands of tests in the time it takes to wait for an mmio or even PIO. > > > It's actually a simple change to make if someone wants to > > spend a few cycles on it, > > Not if you include testing in those cycles :-) Just make it a whitelist + a force module param. -Andi (who has a 3c980 and could do it, but already has too much on his todo list..) From garzik@gtf.org Tue Jun 10 09:23:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:23:50 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGNk2x014074 for ; Tue, 10 Jun 2003 09:23:47 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 48F006641; Tue, 10 Jun 2003 12:23:42 -0400 (EDT) Date: Tue, 10 Jun 2003 12:23:42 -0400 From: Jeff Garzik To: Andi Kleen Cc: Bogdan Costescu , "David S. Miller" , hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610162342.GB1959@gtf.org> References: <20030610.085342.41654796.davem@redhat.com> <20030610162029.GA8168@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610162029.GA8168@wotan.suse.de> User-Agent: Mutt/1.3.28i X-archive-position: 3074 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 06:20:29PM +0200, Andi Kleen wrote: > Can't you just wrap it in a few macros and offer a config for those > who want the best performance and a runtime test for the others? > Then switch between PIO and mmio dynamically. 
> > Even runtime test should be pretty painless these days, the CPU normally > can execute hundreds or even thousands of tests in the time it takes to > wait for an mmio or even PIO. I prefer a compile-time test. But yes, this is what several other net drivers do: offer a config option for MMIO (or PIO), and the default is MMIO unless that is known to be unsafe on certain boards (which, unfortunately, it is). Jeff From davem@redhat.com Tue Jun 10 09:31:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:31:21 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGVH2x014605 for ; Tue, 10 Jun 2003 09:31:18 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23117; Tue, 10 Jun 2003 09:27:48 -0700 Date: Tue, 10 Jun 2003 09:27:48 -0700 (PDT) Message-Id: <20030610.092748.115929981.davem@redhat.com> To: chas@cmf.nrl.navy.mil Cc: hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> References: <20030610061010.Y36963@shell.cyberus.ca> <200306101142.h5ABgssG004210@ginger.cmf.nrl.navy.mil> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3075 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: chas williams Date: Tue, 10 Jun 2003 07:41:01 -0400 the bulk (by count) of the traffic seems to be in the 64-95 byte range. 
Ok, time to deploy ATM everywhere to replace our IP routers :)

Sorry Chas, I couldn't resist... :)

Regardless, there are some sites on the net that publish things like BGP tables and traffic samples that people can use to do performance testing on new algorithms. I've read about it in papers by Vern Paxson (he used it to do his Bro thing) and others. I don't have a reference handy, anyone?

I think it's called the IPMA project...

From davem@redhat.com Tue Jun 10 09:37:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:37:19 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGbF2x015080 for ; Tue, 10 Jun 2003 09:37:16 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23136; Tue, 10 Jun 2003 09:33:50 -0700
Date: Tue, 10 Jun 2003 09:33:49 -0700 (PDT)
Message-Id: <20030610.093349.48511220.davem@redhat.com>
To: hadi@shell.cyberus.ca
Cc: jsd@monmouth.com, pekkas@netcore.fi, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S. Miller"
In-Reply-To: <20030610080901.M37190@shell.cyberus.ca>
References: <3EE5C7E9.6090401@monmouth.com> <20030610080901.M37190@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3076
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Jamal Hadi
   Date: Tue, 10 Jun 2003 08:12:58 -0400 (EDT)

   There's another dimension actually: mostly driven by BSD mbuf style packet allocation; some tests show that some vendors are optimized for certain packet sizes. Linux skbuffs don't have this problem.

Well, the most amusing part for me is that if you read all the papers on TCP congestion algorithms you'd think that routers dropped based upon packet sizes, since the majority work on multiple of MSS this and multiple of MSS that. :)

Routers drop packets, period. They do so using a variety of selection schemes (RED, CBQ, actually just egrep net/sched/sch_*.c :) but your contribution to the router's work is measured in terms of packets and time when you come right down to it.

From davem@redhat.com Tue Jun 10 09:41:36 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:41:39 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGfZ2x015536 for ; Tue, 10 Jun 2003 09:41:36 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23156; Tue, 10 Jun 2003 09:38:12 -0700
Date: Tue, 10 Jun 2003 09:38:11 -0700 (PDT)
Message-Id: <20030610.093811.08342771.davem@redhat.com>
To: ralph+d@istop.com, ralph@istop.com
Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
From: "David S.
Miller" In-Reply-To: References: <20030610061010.Y36963@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3077 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Ralph Doncaster Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT) No, as I said I'm moving ~30mbps and ~10kpps in and out of 2 3c905cx cards. This is because the driver still uses PIO, I am rather sure of this. From davem@redhat.com Tue Jun 10 09:42:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:42:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGgk2x015785 for ; Tue, 10 Jun 2003 09:42:46 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23165; Tue, 10 Jun 2003 09:39:27 -0700 Date: Tue, 10 Jun 2003 09:39:27 -0700 (PDT) Message-Id: <20030610.093927.21906828.davem@redhat.com> To: ralph+d@istop.com, ralph@istop.com Cc: hadi@shell.cyberus.ca, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: Route cache performance under stress From: "David S. Miller" In-Reply-To: References: <20030610061010.Y36963@shell.cyberus.ca> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 3078
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Ralph Doncaster
   Date: Tue, 10 Jun 2003 09:10:43 -0400 (EDT)

   Yes, and it would be nice if you mentioned in your NAPI docs that people should use a tulip, tg3, or e1000 if they want it to work well. In making your sales pitches for NAPI you made it sound like any high-performance card should do fine (i.e. anything but a Realtek).

The problems the 3c59x has have nothing to do with NAPI vs. non-NAPI.

Your routing rate is limited by how much time a PIO to the PCI device takes :)

From bogdan.costescu@iwr.uni-heidelberg.de Tue Jun 10 09:45:15 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:45:20 -0700 (PDT)
Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGjE2x016234 for ; Tue, 10 Jun 2003 09:45:15 -0700
Received: from kenzo.iwr.uni-heidelberg.de (IDENT:q5UrjJhLGKaJ/W/fmsQF9/+ZUycREyip@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.12.9/8.12.9) with ESMTP id h5AGj3F4014940; Tue, 10 Jun 2003 18:45:03 +0200 (MET DST)
Received: from kenzo.iwr.uni-heidelberg.de (localhost.localdomain [127.0.0.1]) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8) with ESMTP id h5AGj3f0027954; Tue, 10 Jun 2003 18:45:03 +0200
Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.12.8/8.12.8/Submit) with ESMTP id h5AGj3Vm027950; Tue, 10 Jun 2003 18:45:03 +0200
Date: Tue, 10 Jun 2003 18:45:03 +0200 (CEST)
From: Bogdan Costescu
To: "David S.
Miller"
cc: sim@netnation.com, , , , , ,
Subject: Re: 3c59x (was Route cache performance under stress)
In-Reply-To: <20030610.085600.71109220.davem@redhat.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 3079
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: bogdan.costescu@iwr.uni-heidelberg.de
Precedence: bulk
X-list: netdev

On Tue, 10 Jun 2003, David S. Miller wrote:

> Unfortunately, NAPI won't help him with the current way the 3c59x
> driver works. It needs to provide a way to use MEM I/O before NAPI
> would start to be of use to him.

I don't really want to sound like I'm defending the 3c59x driver, but...

The 3c90x driver released by 3Com uses some mechanism "similar" to NAPI which is based on the on-board timer; these timer interrupts are scheduled dynamically. With this driver I would typically get TCP bandwidth figures 4-5 Mbps lower than those obtained with 3c59x, and a noticeable difference in the parallel jobs timing (using MPI over TCP). I'm not saying that NAPI will perform the same way, just that there might also be hardware limits somewhere...

But the real question is: does it make sense to spend time now in trying to improve a driver with hope for only a marginal speed increase? After using these cards and the 3c59x driver with very good results for the past 4 years, I'm looking for GigE replacements. Shouldn't anybody concerned with performance do the same? Does it make sense to pair a very fast CPU and memory with a 33MHz-32bit PCI bus?

And another important question: how much improvement can be gained from the driver? Folks that do parallel computation over TCP over Ethernet know very well that the software in the kernel is the bottleneck (extra copies, TCP, IRQ management, etc).
Packages that throw away TCP and use another communication protocol can typically achieve much better ping-pong times (they do have some other problems though) which shows that the hardware and NIC driver are capable enough. So until I see a profile showing that the CPU is spending most of the time in the driver, I won't be convinced that these changes are needed.... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From ak@suse.de Tue Jun 10 09:50:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:50:23 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGoJ2x016605 for ; Tue, 10 Jun 2003 09:50:20 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 978631505F; Tue, 10 Jun 2003 18:49:49 +0200 (MEST) Date: Tue, 10 Jun 2003 18:49:49 +0200 From: Andi Kleen To: Bogdan Costescu Cc: "David S. Miller" , sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x (was Route cache performance under stress) Message-ID: <20030610164949.GB13246@wotan.suse.de> References: <20030610.085600.71109220.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 3080 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev > > And another important question: how much improvement can be gained from > the driver ? 
Folks that do parallel computation over TCP over Ethernet You can play some tricks with the driver to make eth_type_trans disappear from the profiles. This usually helps a lot because it avoids one full "fetch from cache cold memory" roundtrip per packet, which is slow on any CPU. -Andi From davem@redhat.com Tue Jun 10 09:56:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:56:33 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGuT2x017079 for ; Tue, 10 Jun 2003 09:56:29 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23255; Tue, 10 Jun 2003 09:51:36 -0700 Date: Tue, 10 Jun 2003 09:51:35 -0700 (PDT) Message-Id: <20030610.095135.28806569.davem@redhat.com> To: lpetande@tml.hut.fi Cc: nakam@linux-ipv6.org, lpetande@morphine.tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6 From: "David S. Miller" In-Reply-To: <3EE5F85E.9080006@tml.hut.fi> References: <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3081 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Henrik Petander Date: Tue, 10 Jun 2003 18:25:18 +0300 Then the policies for mipv6 would need to be specified at the same time as the ipsec policies. This is not a problem as long as the policies are loaded at start up. 
   However, this could lead to problems with applications which specify their own policies, e.g. racoon.

It is an important point.

Ask yourself this: why do we have tunnel devices, and don't implement them with cool routing or XFRM rules?

We don't do this because as soon as you type "zebra" all your by-hand routes are gone, and as soon as you type "racoon" all your by-hand xfrm rules are gone.

If you want to do these things using routes or xfrm rules, you must integrate the creation of them into either zebra or racoon.

You cannot have a setup where mipv6d and racoon/zebra fight each other, flushing each other's settings. It doesn't work.

From chas@locutus.cmf.nrl.navy.mil Tue Jun 10 09:59:13 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 09:59:15 -0700 (PDT)
Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AGxC2x017482 for ; Tue, 10 Jun 2003 09:59:13 -0700
Received: from locutus.cmf.nrl.navy.mil (locutus.cmf.nrl.navy.mil [134.207.10.66]) by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h5AGwtsG008215; Tue, 10 Jun 2003 12:58:55 -0400 (EDT)
Message-Id: <200306101658.h5AGwtsG008215@ginger.cmf.nrl.navy.mil>
To: "David S. Miller"
cc: hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
In-reply-to: Your message of "Tue, 10 Jun 2003 09:27:48 PDT." <20030610.092748.115929981.davem@redhat.com>
X-url: http://www.nrl.navy.mil/CCS/people/chas/index.html
X-mailer: nmh 1.0
Date: Tue, 10 Jun 2003 12:57:02 -0400
From: chas williams
X-Spam-Score: (*) hits=1.7
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin .
com / mimedefang)
X-archive-position: 3082
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev

In message <20030610.092748.115929981.davem@redhat.com>,"David S. Miller" writes:
>Ok, time to deploy ATM everywhere to replace our IP routers :)
>Sorry Chas, I couldn't resist... :)

i see a lot of crying about the 'atm tax' but it seems to me that the 'ip tax' is typically much steeper (except when you graph packet_count*packet_size, then you will see that the bulk of the data is carried by larger packets where the tax isn't as high). so for some applications, like voice, atm might actually be a winner as far as the tax goes (as long as you aren't doing voice over ip over atm).

honestly i needed real numbers to tune the atm driver on our linux-router. i have two recv buffer pools--small and large (duh). i needed an idea of what to use for the small value.

From davem@redhat.com Tue Jun 10 10:00:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:01:00 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AH0v2x017881 for ; Tue, 10 Jun 2003 10:00:57 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA23285; Tue, 10 Jun 2003 09:56:07 -0700
Date: Tue, 10 Jun 2003 09:56:06 -0700 (PDT)
Message-Id: <20030610.095606.23033683.davem@redhat.com>
To: nakam@linux-ipv6.org
Cc: lpetande@tml.hut.fi, yoshfuji@linux-ipv6.org, vnuorval@tcs.hut.fi, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, ajtuomin@morphine.tml.hut.fi, jagana@us.ibm.com, kumarkr@us.ibm.com, usagi-core@linux-ipv6.org
Subject: Re: [patch]: CONFIG_IPV6_SUBTREES fix for MIPv6
From: "David S.
Miller" In-Reply-To: <20030611004035.40027642.nakam@linux-ipv6.org> References: <20030609203659.089b241b.nakam@linux-ipv6.org> <3EE5F85E.9080006@tml.hut.fi> <20030611004035.40027642.nakam@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3083 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Masahide NAKAMURA Date: Wed, 11 Jun 2003 00:40:44 +0900 How about providing interface of handling templates to update existing policy in kernel? Who will manage mipv6 policies? racoon? See my other email on why any other setup simply will not work. From davem@redhat.com Tue Jun 10 10:05:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:05:53 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AH5n2x022527 for ; Tue, 10 Jun 2003 10:05:49 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23322; Tue, 10 Jun 2003 10:02:09 -0700 Date: Tue, 10 Jun 2003 10:02:09 -0700 (PDT) Message-Id: <20030610.100209.70199702.davem@redhat.com> To: jgarzik@pobox.com Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: <20030610162342.GB1959@gtf.org> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3084 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 10 Jun 2003 12:23:42 -0400 I prefer a compile-time test. This means end users don't see the benefit, so I definitely prefer Andi's idea. From davem@redhat.com Tue Jun 10 10:15:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:15:45 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHFe2x023005 for ; Tue, 10 Jun 2003 10:15:41 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA23355; Tue, 10 Jun 2003 10:12:19 -0700 Date: Tue, 10 Jun 2003 10:12:19 -0700 (PDT) Message-Id: <20030610.101219.38691038.davem@redhat.com> To: bogdan.costescu@iwr.uni-heidelberg.de Cc: sim@netnation.com, ralph+d@istop.com, hadi@shell.cyberus.ca, xerox@foonet.net, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x From: "David S. Miller" In-Reply-To: References: <20030610.085600.71109220.davem@redhat.com> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 3085 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Bogdan Costescu Date: Tue, 10 Jun 2003 18:45:03 +0200 (CEST) With this driver I would typically get TCP bandwidth figures 4-5 Mbps lower than those obtained with 3c59x and noticeable difference in the parallel jobs timing (using MPI over TCP). I'm not saying that NAPI will perform the same way, just that there might be also hardware limits somewhere... I think it won't, hardware interrupt mitigation schemes have lots of problems that NAPI is more apt to deal with. But the real question is: does it make sense to spend time now in trying to improve a driver with hope for only a marginal speed increase? People who have the cards care, and I think PIO-->MMIO is more than marginal. Your attempt to get "latency" was ill founded :) Your limits have to do with the wire speed, not all the cpu cycles being eaten by PIO accesses. On a DoS'd router, it's another situation altogether. And another important question: how much improvement can be gained from the driver? Folks that do parallel computation over TCP over Ethernet know very well that the software in the kernel is the bottleneck (extra copies, TCP, IRQ management, etc). Your limitations in parallel computation have to do with how TCP behaves more than how TCP is implemented. For starters try: echo "1" >/proc/sys/net/ipv4/tcp_low_latency That's the kind of thing that will help parallel computation folks, not driver hacks. 
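[Editor's note] For anyone who would rather flip that knob from a job launcher than from the shell, the echo above can be sketched in C roughly as follows. This is only a minimal sketch: the helper names write_setting() and read_setting() are made up for illustration and are not part of any kernel or libc interface; the only thing taken from the message is the /proc/sys/net/ipv4/tcp_low_latency path. It assumes the running kernel exposes that sysctl and that the caller has permission to write it (normally root).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` plus a newline to the file at `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_setting(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%s\n", value) >= 0);
    /* fclose() flushes the buffer, so a failed write shows up here at the latest. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the first line of `path` into `buf`
 * (at most len-1 characters) and strip the trailing newline.
 * Returns 0 on success, -1 on failure. */
int read_setting(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling write_setting("/proc/sys/net/ipv4/tcp_low_latency", "1") would then be the equivalent of the echo, and read_setting() on the same path lets the caller check that the value was accepted. Like the echo, the change does not persist across a reboot.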
From garzik@gtf.org Tue Jun 10 10:16:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:16:21 -0700 (PDT) Received: from havoc.gtf.org (host-64-213-145-173.atlantasolutions.com [64.213.145.173] (may be forged)) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHGH2x023193 for ; Tue, 10 Jun 2003 10:16:18 -0700 Received: by havoc.gtf.org (Postfix, from userid 500) id 38D3F6641; Tue, 10 Jun 2003 13:16:17 -0400 (EDT) Date: Tue, 10 Jun 2003 13:16:17 -0400 From: Jeff Garzik To: "David S. Miller" Cc: ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x Message-ID: <20030610171617.GC1959@gtf.org> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610.100209.70199702.davem@redhat.com> User-Agent: Mutt/1.3.28i X-archive-position: 3086 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S. Miller wrote: > From: Jeff Garzik > Date: Tue, 10 Jun 2003 12:23:42 -0400 > > I prefer a compile-time test. > > This means end users don't see the benefit, so I definitely > prefer Andi's idea. Making every IO a conditional branch? Ug. 
Jeff From ak@suse.de Tue Jun 10 10:18:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:18:26 -0700 (PDT) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h5AHIM2x023942 for ; Tue, 10 Jun 2003 10:18:22 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id ED74814456; Tue, 10 Jun 2003 19:18:16 +0200 (MEST) Date: Tue, 10 Jun 2003 19:18:16 +0200 From: Andi Kleen To: Jeff Garzik Cc: "David S. Miller" , ak@suse.de, bogdan.costescu@iwr.uni-heidelberg.de, hadi@shell.cyberus.ca, ralph+d@istop.com, xerox@foonet.net, sim@netnation.com, fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: 3c59x Message-ID: <20030610171816.GA24640@wotan.suse.de> References: <20030610162029.GA8168@wotan.suse.de> <20030610162342.GB1959@gtf.org> <20030610.100209.70199702.davem@redhat.com> <20030610171617.GC1959@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030610171617.GC1959@gtf.org> X-archive-position: 3088 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Tue, Jun 10, 2003 at 01:16:17PM -0400, Jeff Garzik wrote: > On Tue, Jun 10, 2003 at 10:02:09AM -0700, David S. Miller wrote: > > From: Jeff Garzik > > Date: Tue, 10 Jun 2003 12:23:42 -0400 > > > > I prefer a compile-time test. > > > > This means end users don't see the benefit, so I definitely > > prefer Andi's idea. > > Making every IO a conditional branch? Ug. An IO takes hundreds or even thousands of cycles. The test and branch is completely lost in the noise. I bet you won't be able to measure a difference on any modern CPU. -Andi From davem@redhat.com Tue Jun 10 10:18:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 10 Jun 2003 10:18:21 -0700 (PDT) Received: from