From devnetfs@yahoo.com Sat Feb 1 10:15:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 01 Feb 2003 10:15:18 -0800 (PST)
Received: from web20422.mail.yahoo.com (web20410.mail.yahoo.com [66.163.169.98]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h11IF93v021316 for ; Sat, 1 Feb 2003 10:15:09 -0800
Message-ID: <20030201182235.89076.qmail@web20422.mail.yahoo.com>
Received: from [143.127.3.10] by web20410.mail.yahoo.com via HTTP; Sat, 01 Feb 2003 10:22:35 PST
Date: Sat, 1 Feb 2003 10:22:35 -0800 (PST)
From: devnetfs
Subject: fragmentation
To: linux-net@vger.kernel.org, netdev@oss.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-archive-position: 1643
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: devnetfs@yahoo.com
Precedence: bulk
X-list: netdev

Hello,

I have an skb whose data I want to fragment into fixed-size chunks, with each fragment having its own (fixed-size) header. What is the fastest way to do this, with minimal copying of data?

The easiest approach looks to be:

1. allocate a *new* skb of size = header_size + fragment_size
2. *copy* part of the data from my original skb into this new packet
3. set up the header in the fragmented packet
4. xmit this sk_buff (dev_queue_xmit)

What I don't like is the *new* and (extra) *copy* part. As my original skb is in the kernel itself, an additional copy (of fragment size) before xmit seems wasteful to me.

Is it possible to allocate buffer space for just the header and then have pointers into the original skb for the fragment data, avoiding the extra *copy* of the fragment data?

So the question(s) are:

[1] Can I submit a packet (an sk_buff) to the low-level device (using dev_queue_xmit()) that has its buffer/data in discontinuous blocks in the kernel -- some kind of iovec? And if the network device supports DMA gather, this would make the xmit path very fast, right?
OR

[2] Can I submit a chain of sk_buffs that need to be xmited as ONE Ethernet frame to dev_queue_xmit()?

OR

[3] Such chaining is not very useful and the above approach (steps 1-4) is OK.

[BTW, this data is being generated in the kernel and sent over Ethernet.]

Any insights or possible solutions would be helpful.

Thanks,
A.

PS: I've just subscribed to netdev, so I'm not sure if I will get the reply to this mail. So please Cc: me on the reply -- thanks.

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

From greearb@candelatech.com Sat Feb 1 14:05:32 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 01 Feb 2003 14:05:39 -0800 (PST)
Received: from grok.yi.org (IDENT:oCDAy0/7McZuq1EWWAiGEOouESgdj9O7@dhcp101-dsl-usw4.w-link.net [208.161.125.101]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h11M5V3v024701 for ; Sat, 1 Feb 2003 14:05:31 -0800
Received: from candelatech.com (IDENT:LgAYiQwHH9vPPQTxRNP3V3KQa8az8ErP@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id h11MD1406888 for ; Sat, 1 Feb 2003 14:13:01 -0800
Message-ID: <3E3C466D.7030602@candelatech.com>
Date: Sat, 01 Feb 2003 14:13:01 -0800
From: Ben Greear
Organization: Candela Technologies
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "'netdev@oss.sgi.com'"
Subject: problems achieving decent throughput with latency.
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 1644
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: greearb@candelatech.com
Precedence: bulk
X-list: netdev

I am testing my latency-insertion tool, and I notice that TCP will not use all of the available bandwidth if there is any significant amount of latency on the wire.
For example, with 25ms latency in both directions, I see about 8Mbps bi-directional throughput. If I lower that to 15ms, I see 12Mbps bi-directional throughput. I see 27Mbps at 5ms.

Here is the /proc/net/tcp output at 5ms latency.

machine demo2
13: 050302AC:80EB 070302AC:80EB 01 0005900C:0002012E 01:00000016 00000000 0 0 578943 3 c6628a80 22 4 1 45 -1

machine demo1
11: 070302AC:80EB 050302AC:80EB 01 00010DDB:00000000 01:00000014 00000000 0 0 513094 3 c62c5080 21 4 1 45 -1

Any ideas why it is so slow at the higher latencies? Any other info I can gather to help determine the cause?

(UDP does not experience this slowdown, so I believe my latency insertion tool is working as designed, but it's always possible it is to blame...)

--
Ben Greear
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

From hona0005@yahoo.fr Mon Feb 3 00:21:48 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 03 Feb 2003 00:21:56 -0800 (PST)
Received: from web13306.mail.yahoo.com (web13306.mail.yahoo.com [216.136.175.42]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h138Ll3v021958 for ; Mon, 3 Feb 2003 00:21:48 -0800
Message-ID: <20030203082920.38177.qmail@web13306.mail.yahoo.com>
Received: from [194.94.121.80] by web13306.mail.yahoo.com via HTTP; Mon, 03 Feb 2003 09:29:20 CET
Date: Mon, 3 Feb 2003 09:29:20 +0100 (CET)
From: =?iso-8859-1?q?nadia=20houziri?=
Subject: Re: how to set the configuration EEPROM back to the default setting
To: netdev@oss.sgi.com
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
Content-length: 1062
X-archive-position: 1645
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hona0005@yahoo.fr
Precedence: bulk
X-list: netdev

Hello,

I have a COMBO 3COM900B network card. I tried to set up a simple crossover link under Linux SuSE 7.2 (kernel 2.4.4.4-GB). When I use coaxial cable as the media type it does not work; with twisted-pair cable it works very well.

Then, using the 3Com tool floppy under DOS, I made the following change: I selected coaxial cable as the media type. Coaxial then worked, but twisted-pair cable no longer does.

My questions are:

1) How can I set the configuration EEPROM back to the default setting, using the module options from http://www.scyld.com/network/vortex.html? What are the steps to follow? I know there is no way to reliably detect a connected 10base2 network cable,
2) but how can I explicitly set the media type?
3) How can I change the EEPROM to switch?

Thanks,
Nadia Houziri
Yours sincerely

---------------------------------
Do You Yahoo!? -- A free @yahoo.fr address, in French!
Try the new Yahoo! Mail

[[HTML alternate version deleted]]

From bogdan.costescu@iwr.uni-heidelberg.de Mon Feb 3 04:16:50 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 03 Feb 2003 04:16:56 -0800 (PST)
Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h13CGl3v031720 for ; Mon, 3 Feb 2003 04:16:49 -0800
Received: from kenzo.iwr.uni-heidelberg.de (kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id h13COJJ25904; Mon, 3 Feb 2003 13:24:19 +0100 (MET)
Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.11.6/8.11.6) with ESMTP id h13COJa27900; Mon, 3 Feb 2003 13:24:19 +0100
Date: Mon, 3 Feb 2003 13:24:19 +0100 (CET)
From: Bogdan Costescu
To: =?iso-8859-1?q?nadia=20houziri?=
cc: netdev@oss.sgi.com
Subject: Re: how to set the configuration EEPROM back to the default setting
In-Reply-To: <20030203082920.38177.qmail@web13306.mail.yahoo.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 1646
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: bogdan.costescu@iwr.uni-heidelberg.de
Precedence: bulk
X-list: netdev

On Mon, 3 Feb 2003, [iso-8859-1] nadia houziri wrote:

> 1)how can i please Set the configuration EEPROM back

By using the DOS-based tool that you used to set it to coaxial :-)

> 2)but how can i to explicitly set the media

The Scyld page that you mentioned explains this. In /etc/modules.conf, you need something like:

options 3c59x options=3

to select BNC.

> 3)how to change the EEPROM to switch?

If you mean by this to just change the EEPROM every time you boot to switch to a different media type, then you shouldn't do it! The EEPROM has a limited number of writes and at some point you will not be able to write to it anymore.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De

From emailtintashtml@yahoo.com Mon Feb 3 18:55:11 2003
Received: with ECARTIS (v1.0.0; list netdev); Mon, 03 Feb 2003 18:55:24 -0800 (PST)
Received: from pentium3. (168-226-107-57.speedy.com.ar [168.226.107.57]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h142t93v006177 for ; Mon, 3 Feb 2003 18:55:10 -0800
x-esmtp: 0 0 1
Message-ID: <117523-220032242492956@pentium3>
Reply-To: "TintasInk"
From: "TintasInk"
To: "netdev@oss.sgi.com"
Subject: TINTAS PARA IMPRESORAS
Date: Mon, 3 Feb 2003 23:49:02 -0300
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h142t93v006177
X-archive-position: 1647
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tintasink@techniciansupport.com.ar
Precedence: bulk
X-list: netdev

Hello!
We offer INKS FOR INKJET PRINTERS, for all makes and models (HP - EPSON - CANON - LEXMARK).

PIGMENTED INKS - ORIGINAL QUALITY

1 LITER OF BLACK INK $ 90.-
1 LITER OF COLOR INK $ 105.-

Also available by the 1/2 liter, at $ 60.- for black and $ 70 for each color. And by the 1/4 liter at $ 40.- for any color, including black.

All prices are net. We provide refill tools and instructions at no charge.

REFILL YOUR CARTRIDGE UP TO 70 TIMES!!!!

Shipments to the interior by cash on delivery.

You will find all the necessary information on OUR WEBSITE: www.techniciansupport.com.ar/tintasink

CONTACT US:
BY PHONE AT (011) 4666-6733
BY E-MAIL AT tintasink@techniciansupport.com.ar
OR ON OUR WEBSITE: www.techniciansupport.com.ar/tintasink
OR BY ICQ: 159631730

To stop receiving these e-mails, please REPLY with the word REMOVE in the subject line. All requests are processed. Thank you very much.

From mrozhavsky@mrv.com Wed Feb 5 00:04:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 05 Feb 2003 00:05:25 -0800 (PST)
Received: from apollo.nbase.co.il (apollo.nbase.co.il [194.90.137.2]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1584b3v030045 for ; Wed, 5 Feb 2003 00:04:38 -0800
Received: from mike.nbase.co.il ([194.90.136.58]) by apollo.nbase.co.il (Post.Office MTA v3.1.2 release (PO205-101c) ID# 0-44418U200L2S100) with ESMTP id AAA638; Wed, 5 Feb 2003 10:17:09 +0200
Received: by mike.nbase.co.il (Postfix, from userid 1000) id B3D7F11881; Wed, 5 Feb 2003 10:12:10 +0200 (IST)
Date: Wed, 5 Feb 2003 10:12:10 +0200
From: Michael Rozhavsky
To: linux.nics@intel.com
Cc: netdev@oss.sgi.com
Subject: [PATCH] e100 oops in 2.4.20
Message-ID: <20030205081210.GB10600@mike.nbase.co.il>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="p2kqVDKq5asng8Dg"
Content-Disposition: inline
User-Agent: Mutt/1.4i
X-archive-position: 1648
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mrozhavsky@mrv.com
Precedence: bulk
X-list: netdev

--p2kqVDKq5asng8Dg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

In 2.4.20 with CONFIG_NET_CARRIER_NETLINK compiled in, I get a consistent oops when loading the e100 module.

The reason is that e100_update_link_state() calls netif_carrier_on() or netif_carrier_off() during device init, while dev->carrier_task is not initialized yet (it will be initialized on register_netdev), and thus it causes a NULL pointer exception.

[] e100_find_speed_duplex
[] e100_auto_neg
[] e100_phy_set_speed_duplex
[] e100_phy_set_loopback
[] e100_hw_init
[] e100_init
[] e100_found1

I'm not sure that my fix is elegant but it works for me.

--
Michael Rozhavsky
Senior Software Engineer
MRV International
Tel: +972 (4) 993-6248
Fax: +972 (4) 989-0564
http://www.mrv.com

--p2kqVDKq5asng8Dg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="e100_phy.c.diff"

--- linux-2.4.20/drivers/net/e100/e100_phy.c 2002-11-29 01:53:13.000000000 +0200
+++ linux-2.4.20-test/drivers/net/e100/e100_phy.c 2003-02-05 09:38:07.000000000 +0200
@@ -963,14 +963,16 @@
 
 	/* Logical AND PHY link & netif_running */
 	link = e100_get_link_state(bdp) && netif_running(bdp->device);
-
-	if (link) {
-		if (!netif_carrier_ok(bdp->device))
-			netif_carrier_on(bdp->device);
-	} else {
-		if (netif_carrier_ok(bdp->device))
-			netif_carrier_off(bdp->device);
-	}
+#ifdef CONFIG_NET_CARRIER_NETLINK
+	if (bdp->device->carrier_task)
+#endif
+	if (link) {
+		if (!netif_carrier_ok(bdp->device))
+			netif_carrier_on(bdp->device);
+	} else {
+		if (netif_carrier_ok(bdp->device))
+			netif_carrier_off(bdp->device);
+	}
 
 	return link;
 }

--p2kqVDKq5asng8Dg--

From mrozhavsky@mrv.com Wed Feb 5 00:14:34 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 05 Feb 2003 00:14:39 -0800 (PST)
Received: from apollo.nbase.co.il (apollo.nbase.co.il [194.90.137.2]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h158EW3v030533 for ; Wed, 5 Feb 2003 00:14:33 -0800
Received: from
mike.nbase.co.il ([194.90.136.58]) by apollo.nbase.co.il (Post.Office MTA v3.1.2 release (PO205-101c) ID# 0-44418U200L2S100) with ESMTP id AAA679; Wed, 5 Feb 2003 10:27:05 +0200 Received: by mike.nbase.co.il (Postfix, from userid 1000) id 51E6511881; Wed, 5 Feb 2003 10:22:06 +0200 (IST) Date: Wed, 5 Feb 2003 10:22:06 +0200 From: Michael Rozhavsky To: linux.nics@intel.com Cc: netdev@oss.sgi.com Subject: Re: [PATCH] e100 oops in 2.4.20 Message-ID: <20030205082206.GC10600@mike.nbase.co.il> References: <20030205081210.GB10600@mike.nbase.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030205081210.GB10600@mike.nbase.co.il> User-Agent: Mutt/1.4i X-archive-position: 1649 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mrozhavsky@mrv.com Precedence: bulk X-list: netdev Please ignore this message. On Wed, Feb 05, 2003 at 10:12:10AM +0200, Michael Rozhavsky wrote: > Hi, > > In 2.4.20 with CONFIG_NET_CARRIER_NETLINK compiled when I'm loading e100 > module I'm receiving consistent OOPS. > > The reason for this is e100_update_link_state() on device init calls to > netif_carrier_on() or netif_carrier_off() while dev->carrier_task is not > initialized yet (will be initialized on register_netdev) and thus it > causes NULL pointer exception. > > [] e100_find_speed_duplex > [] e100_auto_neg > [] e100_phy_set_speed_duplex > [] e100_phy_set_loopback > [] e100_hw_init > [] e100_init > [] e100_found1 > > I'm not sure that my fix is elegant but it works for me. 
> > > -- > Michael Rozhavsky > Senior Software Engineer > MRV International > Tel: +972 (4) 993-6248 > Fax: +972 (4) 989-0564 > http://www.mrv.com > --- linux-2.4.20/drivers/net/e100/e100_phy.c 2002-11-29 01:53:13.000000000 +0200 > +++ linux-2.4.20-test/drivers/net/e100/e100_phy.c 2003-02-05 09:38:07.000000000 +0200 > @@ -963,14 +963,16 @@ > > /* Logical AND PHY link & netif_running */ > link = e100_get_link_state(bdp) && netif_running(bdp->device); > - > - if (link) { > - if (!netif_carrier_ok(bdp->device)) > - netif_carrier_on(bdp->device); > - } else { > - if (netif_carrier_ok(bdp->device)) > - netif_carrier_off(bdp->device); > - } > +#ifdef CONFIG_NET_CARRIER_NETLINK > + if (bdp->device->carrier_task) > +#endif > + if (link) { > + if (!netif_carrier_ok(bdp->device)) > + netif_carrier_on(bdp->device); > + } else { > + if (netif_carrier_ok(bdp->device)) > + netif_carrier_off(bdp->device); > + } > > return link; > } -- Michael Rozhavsky Senior Software Engineer MRV International Tel: +972 (4) 993-6248 Fax: +972 (4) 989-0564 http://www.mrv.com From ramarayaluv@yahoo.com Wed Feb 5 02:34:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 05 Feb 2003 02:34:22 -0800 (PST) Received: from web10905.mail.yahoo.com (web10905.mail.yahoo.com [216.136.131.41]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h15AYE3v001257 for ; Wed, 5 Feb 2003 02:34:15 -0800 Message-ID: <20030205104156.18004.qmail@web10905.mail.yahoo.com> Received: from [202.142.82.43] by web10905.mail.yahoo.com via HTTP; Wed, 05 Feb 2003 02:41:56 PST Date: Wed, 5 Feb 2003 02:41:56 -0800 (PST) From: ramarayalu vattikuti Subject: i need u r help. 
To: netdev@oss.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-archive-position: 1650
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ramarayaluv@yahoo.com
Precedence: bulk
X-list: netdev

Hi All,

Can we make a PCI-DPM card, which provides communication between two processing elements, work as a network device (as an IP device)?

Any help is greatly appreciated.

Thanks,
rayalu

From christopher.leech@intel.com Wed Feb 5 13:03:09 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 05 Feb 2003 13:03:14 -0800 (PST)
Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h15L383v005522 for ; Wed, 5 Feb 2003 13:03:09 -0800
Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by hermes.fm.intel.com (8.11.6/8.11.6/d: outer.mc,v 1.51 2002/09/23 20:43:23 dmccart Exp $) with ESMTP id h15L7ws27732 for ; Wed, 5 Feb 2003 21:07:58 GMT
Received: from fmsmsxvs043.fm.intel.com (fmsmsxvs043.fm.intel.com [132.233.42.129]) by petasus.fm.intel.com (8.11.6/8.11.6/d: inner.mc,v 1.28 2003/01/13 19:44:39 dmccart Exp $) with SMTP id h15L5fi21813 for ; Wed, 5 Feb 2003 21:05:41 GMT
Received: from [134.134.177.102] ([134.134.177.102]) by fmsmsxvs043.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003020513085206033 ; Wed, 05 Feb 2003 13:08:52 -0800
Subject: skb_padto and small fragmented transmits
From: Chris Leech
To: netdev@oss.sgi.com, linux-kernel
Content-Type: text/plain
Organization:
Message-Id: <1044481190.9268.43.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.1
Date: 05 Feb 2003 13:39:51 -0800
Content-Transfer-Encoding: 7bit
X-archive-position: 1651
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: christopher.leech@intel.com
Precedence: bulk
X-list: netdev

While looking at the new software padding routines, something caught my eye in skb_padto. It seemed that the fragmented portion of a packet would actually be counted twice when checking to see if padding is needed, as skb->len already includes the count of skb->data_len.

> unsigned int size = skb->len + skb->data_len;

I tested this by modifying e1000 to use skb_padto, disabling TCP timestamps, and writing a small app to transmit 4 bytes using sendfile. The resulting packet had 54 bytes of headers, and 4 bytes of data in a separate fragment. Calling skb_padto(skb,60) should have linearized the skb, and zeroed out the first 2 bytes of tailroom. Instead the length was incorrectly calculated as 62 bytes, and the buffer was returned as is.

Changing skb_padto to simply use size = skb->len fixed the padding, but then I started seeing incorrect TCP checksums going out. I found this comment in skb_copy_expand that seemed to explain things.

> BUG ALERT: ip_summed is not copied. Why does this work? Is it used
> only by netfilter in the cases when checksum is recalculated? --ANK

So after calling skb_copy_expand the checksum is not recalculated in software, but the checksum offload information is discarded.
--
Chris Leech

From davem@redhat.com Thu Feb 6 03:08:01 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 03:08:08 -0800 (PST)
Received: from rth.ninka.net (rth.ninka.net [216.101.162.244]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16B803v017371 for ; Thu, 6 Feb 2003 03:08:01 -0800
Received: from rth.ninka.net (localhost.localdomain [127.0.0.1]) by rth.ninka.net (8.12.5/8.12.5) with ESMTP id h16BwElh007707; Thu, 6 Feb 2003 03:58:14 -0800
Received: (from davem@localhost) by rth.ninka.net (8.12.5/8.12.5/Submit) id h16BwETJ007705; Thu, 6 Feb 2003 03:58:14 -0800
X-Authentication-Warning: rth.ninka.net: davem set sender to davem@redhat.com using -f
Subject: Re: skb_padto and small fragmented transmits
From: "David S. Miller"
To: Chris Leech
Cc: netdev@oss.sgi.com, linux-kernel
In-Reply-To: <1044481190.9268.43.camel@localhost.localdomain>
References: <1044481190.9268.43.camel@localhost.localdomain>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10)
Date: 06 Feb 2003 03:58:14 -0800
Message-Id: <1044532694.7679.1.camel@rth.ninka.net>
Mime-Version: 1.0
X-archive-position: 1652
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

skb_padto() only works on linear skb. And if you look at all the drivers where it is used, they do not enable things like scatter-gather.
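
[Editorial illustration, not part of the original thread.] The length arithmetic discussed in the two messages above can be checked with a small standalone sketch. This is not kernel code: struct fake_skb and the three helper functions are hypothetical names made up for the example, and only the field names len and data_len mirror struct sk_buff. It assumes, as Chris Leech states, that len already includes data_len.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct sk_buff: only the two length fields. */
struct fake_skb {
	unsigned int len;      /* total bytes: linear part plus paged fragments */
	unsigned int data_len; /* bytes held in paged fragments only */
};

/* Size check as quoted in the thread: fragment bytes are counted twice. */
static int needs_pad_double_count(const struct fake_skb *skb, unsigned int min)
{
	unsigned int size = skb->len + skb->data_len;

	return size < min;
}

/* Size check using len alone, the change Chris Leech tried. */
static int needs_pad_len_only(const struct fake_skb *skb, unsigned int min)
{
	return skb->len < min;
}

/* Bytes in the linear area, given that len includes data_len. */
static unsigned int linear_len(const struct fake_skb *skb)
{
	return skb->len - skb->data_len;
}
```

With the packet from the report (54 bytes of headers in the linear area, 4 bytes in a fragment, so len = 58 and data_len = 4), the first check computes 62 and reports that no padding to 60 is needed, while the second reports that padding is needed. For a linear skb data_len is 0 and the two checks agree, which is consistent with David Miller's remark that skb_padto() is only used on linear skbs.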
From jmorris@intercode.com.au Thu Feb 6 07:03:52 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:03:59 -0800 (PST)
Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16F3n3v025057 for ; Thu, 6 Feb 2003 07:03:51 -0800
Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02269; Fri, 7 Feb 2003 02:11:10 +1100
Date: Fri, 7 Feb 2003 02:11:09 +1100 (EST)
From: James Morris
To: "David S. Miller" ,
cc: linux-security-module@wirex.com,
Subject: [PATCH] LSM networking update: summary (0/5)
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 1653
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

The following five patches are an updated version of the LSM (Linux Security Modules) networking support hooks, submitted for inclusion in 2.5 mainline.

Since the post last week, the networking hooks have been reworked so that they are more generalized and do not poke as deeply into network protocols.

Change summary:

o The netdevice, skb and ipv4 hooks are gone.

o The sock_queue_rcv_skb() hook has been encapsulated within sk_filter() as suggested by David Miller.

o The sk->security field has been removed (use the socket inode field instead, if needed, or infer the value).

o The sk_filter() calls for TCPv4 and TCPv6 have been relocated so that they are called before skb->dev is cleared (which also fixes a mainline issue).

o An sk_filter() call was added to SCTP.

o The default Netlink capability hooks have been inlined so that they do not call out to a module when CONFIG_SECURITY is disabled, per requirements from David Miller.

o The Netlink hooks now also cover ip6_queue and xfrm_user.
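
[Editorial illustration, not part of the original message.] The point about inlined hooks that "do not call out to a module" when the option is disabled can be sketched as a config-gated static inline wrapper. This is a minimal standalone sketch, not the patch's code: DEMO_SECURITY_NETWORK, struct demo_security_ops, demo_socket_create and demo_security_socket_create are hypothetical names, and the disabled stub returning 0 ("permission granted") is an assumption made for the example.

```c
/* Hypothetical hook table; stands in for a table of security callbacks. */
struct demo_security_ops {
	int (*socket_create)(int family, int type, int protocol);
};

#ifdef DEMO_SECURITY_NETWORK

/* Example policy for the sketch: refuse family 17, allow everything else. */
static int demo_socket_create(int family, int type, int protocol)
{
	(void)type;
	(void)protocol;
	return family == 17 ? -1 : 0;
}

static struct demo_security_ops demo_ops = { demo_socket_create };
static struct demo_security_ops *demo_ops_ptr = &demo_ops;

/* Hooks enabled: the wrapper dispatches through the ops table. */
static inline int demo_security_socket_create(int family, int type, int protocol)
{
	return demo_ops_ptr->socket_create(family, type, protocol);
}

#else /* !DEMO_SECURITY_NETWORK */

/* Hooks disabled: the wrapper is an inline constant with no indirect call. */
static inline int demo_security_socket_create(int family, int type, int protocol)
{
	(void)family;
	(void)type;
	(void)protocol;
	return 0;
}

#endif /* DEMO_SECURITY_NETWORK */
```

Callers always use demo_security_socket_create(); built without -DDEMO_SECURITY_NETWORK, the call reduces to a constant that the compiler can fold away, so the fast path pays nothing for the hook.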
Full diffstat:

 include/linux/security.h       |  429 ++++++++++++++++++++++++++++++++++++++++-
 include/net/sock.h             |   95 ++++++---
 net/core/rtnetlink.c           |    3
 net/decnet/dn_nsp_in.c         |   29 +-
 net/ipv4/netfilter/ip_queue.c  |    3
 net/ipv4/tcp_ipv4.c            |    9
 net/ipv4/xfrm_user.c           |    3
 net/ipv6/netfilter/ip6_queue.c |    6
 net/ipv6/tcp_ipv6.c            |   15 -
 net/netlink/af_netlink.c       |    8
 net/sctp/input.c               |    4
 net/socket.c                   |   72 ++++++
 net/unix/af_unix.c             |   16 +
 security/Kconfig               |    9
 security/capability.c          |    2
 security/dummy.c               |  135 ++++++++++++
 16 files changed, 760 insertions(+), 78 deletions(-)

- James
--
James Morris

From jmorris@intercode.com.au Thu Feb 6 07:05:40 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:05:43 -0800 (PST)
Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16F5b3v025433 for ; Thu, 6 Feb 2003 07:05:39 -0800
Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02281; Fri, 7 Feb 2003 02:13:10 +1100
Date: Fri, 7 Feb 2003 02:13:09 +1100 (EST)
From: James Morris
To: "David S.
Miller" , cc: linux-security-module@wirex.com, Subject: [PATCH] LSM networking update: kconfig (1/5) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1654 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev include/linux/security.h | 13 +++++++++---- security/Kconfig | 9 +++++++++ security/dummy.c | 5 +++++ 3 files changed, 23 insertions(+), 4 deletions(-) diff -urN -X dontdiff linux-2.5.59.w0/include/linux/security.h linux-2.5.59.w1/include/linux/security.h --- linux-2.5.59.w0/include/linux/security.h Thu Jan 16 22:51:34 2003 +++ linux-2.5.59.w1/include/linux/security.h Fri Feb 7 01:13:34 2003 @@ -63,16 +63,14 @@ /* setfsuid or setfsgid, id0 == fsuid or fsgid */ #define LSM_SETID_FS 8 - -#ifdef CONFIG_SECURITY - /* forward declares to avoid warnings */ struct sk_buff; -struct net_device; struct nfsctl_arg; struct sched_param; struct swap_info_struct; +#ifdef CONFIG_SECURITY + /** * struct security_operations - main security structure * @@ -952,6 +950,9 @@ struct security_operations *ops); int (*unregister_security) (const char *name, struct security_operations *ops); + +#ifdef CONFIG_SECURITY_NETWORK +#endif /* CONFIG_SECURITY_NETWORK */ }; /* global variables */ @@ -2106,5 +2107,9 @@ #endif /* CONFIG_SECURITY */ +#ifdef CONFIG_SECURITY_NETWORK +#else /* CONFIG_SECURITY_NETWORK */ +#endif /* CONFIG_SECURITY_NETWORK */ + #endif /* ! __LINUX_SECURITY_H */ diff -urN -X dontdiff linux-2.5.59.w0/security/Kconfig linux-2.5.59.w1/security/Kconfig --- linux-2.5.59.w0/security/Kconfig Tue Dec 24 23:31:09 2002 +++ linux-2.5.59.w1/security/Kconfig Fri Feb 7 01:13:34 2003 @@ -15,6 +15,15 @@ If you are unsure how to answer this question, answer N. 
+config SECURITY_NETWORK + bool "Socket and Networking Security Hooks" + depends on SECURITY + help + This enables the socket and networking security hooks. + If enabled, a security module can use these hooks to + implement socket and networking access controls. + If you are unsure how to answer this question, answer N. + config SECURITY_CAPABILITIES tristate "Default Linux Capabilities" depends on SECURITY!=n diff -urN -X dontdiff linux-2.5.59.w0/security/dummy.c linux-2.5.59.w1/security/dummy.c --- linux-2.5.59.w0/security/dummy.c Thu Jan 16 22:51:35 2003 +++ linux-2.5.59.w1/security/dummy.c Fri Feb 7 01:13:34 2003 @@ -597,6 +597,9 @@ return 0; } +#ifdef CONFIG_SECURITY_NETWORK +#endif /* CONFIG_SECURITY_NETWORK */ + static int dummy_register_security (const char *name, struct security_operations *ops) { return -EINVAL; @@ -725,5 +728,7 @@ set_to_dummy_if_null(ops, sem_semop); set_to_dummy_if_null(ops, register_security); set_to_dummy_if_null(ops, unregister_security); +#ifdef CONFIG_SECURITY_NETWORK +#endif /* CONFIG_SECURITY_NETWORK */ } From jmorris@intercode.com.au Thu Feb 6 07:07:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:07:51 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16F7h3v026002 for ; Thu, 6 Feb 2003 07:07:45 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02307; Fri, 7 Feb 2003 02:15:10 +1100 Date: Fri, 7 Feb 2003 02:15:09 +1100 (EST) From: James Morris To: "David S. 
Miller" , cc: linux-security-module@wirex.com, Subject: [PATCH] LSM networking update: socket.c hooks (2/5) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1655 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev include/linux/security.h | 285 +++++++++++++++++++++++++++++++++++++++++++++++ net/socket.c | 72 +++++++++++ security/dummy.c | 92 +++++++++++++++ 3 files changed, 447 insertions(+), 2 deletions(-) diff -urN -X dontdiff linux-2.5.59.w0/include/linux/security.h linux-2.5.59.w1/include/linux/security.h --- linux-2.5.59.w0/include/linux/security.h Fri Feb 7 01:14:49 2003 +++ linux-2.5.59.w1/include/linux/security.h Fri Feb 7 01:14:59 2003 @@ -64,6 +64,10 @@ #define LSM_SETID_FS 8 /* forward declares to avoid warnings */ +struct sock; +struct socket; +struct sockaddr; +struct msghdr; struct sk_buff; struct nfsctl_arg; struct sched_param; @@ -584,6 +588,103 @@ * is being reparented to the init task. * @p contains the task_struct for the kernel thread. * + * Security hooks for socket operations. + * + * @socket_create: + * Check permissions prior to creating a new socket. + * @family contains the requested protocol family. + * @type contains the requested communications type. + * @protocol contains the requested protocol. + * Return 0 if permission is granted. + * @socket_post_create: + * This hook allows a module to update or allocate a per-socket security + * structure. Note that the security field was not added directly to the + * socket structure, but rather, the socket security information is stored + * in the associated inode. Typically, the inode alloc_security hook will + * allocate and and attach security information to + * sock->inode->i_security. 
This hook may be used to update the + * sock->inode->i_security field with additional information that wasn't + * available when the inode was allocated. + * @sock contains the newly created socket structure. + * @family contains the requested protocol family. + * @type contains the requested communications type. + * @protocol contains the requested protocol. + * @socket_bind: + * Check permission before socket protocol layer bind operation is + * performed and the socket @sock is bound to the address specified in the + * @address parameter. + * @sock contains the socket structure. + * @address contains the address to bind to. + * @addrlen contains the length of address. + * Return 0 if permission is granted. + * @socket_connect: + * Check permission before socket protocol layer connect operation + * attempts to connect socket @sock to a remote address, @address. + * @sock contains the socket structure. + * @address contains the address of remote endpoint. + * @addrlen contains the length of address. + * Return 0 if permission is granted. + * @socket_listen: + * Check permission before socket protocol layer listen operation. + * @sock contains the socket structure. + * @backlog contains the maximum length for the pending connection queue. + * Return 0 if permission is granted. + * @socket_accept: + * Check permission before accepting a new connection. Note that the new + * socket, @newsock, has been created and some information copied to it, + * but the accept operation has not actually been performed. + * @sock contains the listening socket structure. + * @newsock contains the newly created server socket for connection. + * Return 0 if permission is granted. + * @socket_post_accept: + * This hook allows a security module to copy security + * information into the newly created socket's inode. + * @sock contains the listening socket structure. + * @newsock contains the newly created server socket for connection. 
+ * @socket_sendmsg: + * Check permission before transmitting a message to another socket. + * @sock contains the socket structure. + * @msg contains the message to be transmitted. + * @size contains the size of message. + * Return 0 if permission is granted. + * @socket_recvmsg: + * Check permission before receiving a message from a socket. + * @sock contains the socket structure. + * @msg contains the message structure. + * @size contains the size of message structure. + * @flags contains the operational flags. + * Return 0 if permission is granted. + * @socket_getsockname: + * Check permission before the local address (name) of the socket object + * @sock is retrieved. + * @sock contains the socket structure. + * Return 0 if permission is granted. + * @socket_getpeername: + * Check permission before the remote address (name) of a socket object + * @sock is retrieved. + * @sock contains the socket structure. + * Return 0 if permission is granted. + * @socket_getsockopt: + * Check permissions before retrieving the options associated with socket + * @sock. + * @sock contains the socket structure. + * @level contains the protocol level to retrieve option from. + * @optname contains the name of option to retrieve. + * Return 0 if permission is granted. + * @socket_setsockopt: + * Check permissions before setting the options associated with socket + * @sock. + * @sock contains the socket structure. + * @level contains the protocol level to set options for. + * @optname contains the name of the option to set. + * Return 0 if permission is granted. + * @socket_shutdown: + * Checks permission before all or part of a connection on the socket + * @sock is shut down. + * @sock contains the socket structure. + * @how contains the flag indicating how future sends and receives are handled. + * Return 0 if permission is granted. + * * Security hooks affecting all System V IPC operations. 
* * @ipc_permission: @@ -952,6 +1053,26 @@ struct security_operations *ops); #ifdef CONFIG_SECURITY_NETWORK + int (*socket_create) (int family, int type, int protocol); + void (*socket_post_create) (struct socket * sock, int family, + int type, int protocol); + int (*socket_bind) (struct socket * sock, + struct sockaddr * address, int addrlen); + int (*socket_connect) (struct socket * sock, + struct sockaddr * address, int addrlen); + int (*socket_listen) (struct socket * sock, int backlog); + int (*socket_accept) (struct socket * sock, struct socket * newsock); + void (*socket_post_accept) (struct socket * sock, + struct socket * newsock); + int (*socket_sendmsg) (struct socket * sock, + struct msghdr * msg, int size); + int (*socket_recvmsg) (struct socket * sock, + struct msghdr * msg, int size, int flags); + int (*socket_getsockname) (struct socket * sock); + int (*socket_getpeername) (struct socket * sock); + int (*socket_getsockopt) (struct socket * sock, int level, int optname); + int (*socket_setsockopt) (struct socket * sock, int level, int optname); + int (*socket_shutdown) (struct socket * sock, int how); #endif /* CONFIG_SECURITY_NETWORK */ }; @@ -2108,7 +2229,171 @@ #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK +static inline int security_socket_create (int family, int type, int protocol) +{ + return security_ops->socket_create(family, type, protocol); +} + +static inline void security_socket_post_create(struct socket * sock, + int family, + int type, + int protocol) +{ + security_ops->socket_post_create(sock, family, type, protocol); +} + +static inline int security_socket_bind(struct socket * sock, + struct sockaddr * address, + int addrlen) +{ + return security_ops->socket_bind(sock, address, addrlen); +} + +static inline int security_socket_connect(struct socket * sock, + struct sockaddr * address, + int addrlen) +{ + return security_ops->socket_connect(sock, address, addrlen); +} + +static inline int security_socket_listen(struct 
socket * sock, int backlog) +{ + return security_ops->socket_listen(sock, backlog); +} + +static inline int security_socket_accept(struct socket * sock, + struct socket * newsock) +{ + return security_ops->socket_accept(sock, newsock); +} + +static inline void security_socket_post_accept(struct socket * sock, + struct socket * newsock) +{ + security_ops->socket_post_accept(sock, newsock); +} + +static inline int security_socket_sendmsg(struct socket * sock, + struct msghdr * msg, int size) +{ + return security_ops->socket_sendmsg(sock, msg, size); +} + +static inline int security_socket_recvmsg(struct socket * sock, + struct msghdr * msg, int size, + int flags) +{ + return security_ops->socket_recvmsg(sock, msg, size, flags); +} + +static inline int security_socket_getsockname(struct socket * sock) +{ + return security_ops->socket_getsockname(sock); +} + +static inline int security_socket_getpeername(struct socket * sock) +{ + return security_ops->socket_getpeername(sock); +} + +static inline int security_socket_getsockopt(struct socket * sock, + int level, int optname) +{ + return security_ops->socket_getsockopt(sock, level, optname); +} + +static inline int security_socket_setsockopt(struct socket * sock, + int level, int optname) +{ + return security_ops->socket_setsockopt(sock, level, optname); +} + +static inline int security_socket_shutdown(struct socket * sock, int how) +{ + return security_ops->socket_shutdown(sock, how); +} #else /* CONFIG_SECURITY_NETWORK */ +static inline int security_socket_create (int family, int type, int protocol) +{ + return 0; +} + +static inline void security_socket_post_create(struct socket * sock, + int family, + int type, + int protocol) +{ +} + +static inline int security_socket_bind(struct socket * sock, + struct sockaddr * address, + int addrlen) +{ + return 0; +} + +static inline int security_socket_connect(struct socket * sock, + struct sockaddr * address, + int addrlen) +{ + return 0; +} + +static inline int 
security_socket_listen(struct socket * sock, int backlog) +{ + return 0; +} + +static inline int security_socket_accept(struct socket * sock, + struct socket * newsock) +{ + return 0; +} + +static inline void security_socket_post_accept(struct socket * sock, + struct socket * newsock) +{ +} + +static inline int security_socket_sendmsg(struct socket * sock, + struct msghdr * msg, int size) +{ + return 0; +} + +static inline int security_socket_recvmsg(struct socket * sock, + struct msghdr * msg, int size, + int flags) +{ + return 0; +} + +static inline int security_socket_getsockname(struct socket * sock) +{ + return 0; +} + +static inline int security_socket_getpeername(struct socket * sock) +{ + return 0; +} + +static inline int security_socket_getsockopt(struct socket * sock, + int level, int optname) +{ + return 0; +} + +static inline int security_socket_setsockopt(struct socket * sock, + int level, int optname) +{ + return 0; +} + +static inline int security_socket_shutdown(struct socket * sock, int how) +{ + return 0; +} #endif /* CONFIG_SECURITY_NETWORK */ #endif /* ! 
__LINUX_SECURITY_H */ diff -urN -X dontdiff linux-2.5.59.w0/net/socket.c linux-2.5.59.w1/net/socket.c --- linux-2.5.59.w0/net/socket.c Thu Jan 9 16:08:27 2003 +++ linux-2.5.59.w1/net/socket.c Fri Feb 7 01:14:59 2003 @@ -77,6 +77,7 @@ #include #include #include +#include #if defined(CONFIG_KMOD) && defined(CONFIG_NET) #include @@ -527,6 +528,10 @@ si->msg = msg; si->size = size; + err = security_socket_sendmsg(sock, msg, size); + if (err) + return err; + err = scm_send(sock, msg, si->scm); if (err >= 0) { err = sock->ops->sendmsg(iocb, sock, msg, size, si->scm); @@ -551,6 +556,7 @@ int __sock_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, int size, int flags) { + int err; struct sock_iocb *si = kiocb_to_siocb(iocb); si->sock = sock; @@ -560,6 +566,10 @@ si->size = size; si->flags = flags; + err = security_socket_recvmsg(sock, msg, size, flags); + if (err) + return err; + memset(si->scm, 0, sizeof(*si->scm)); size = sock->ops->recvmsg(iocb, sock, msg, size, flags, si->scm); @@ -963,6 +973,7 @@ int sock_create(int family, int type, int protocol, struct socket **res) { int i; + int err; struct socket *sock; /* @@ -986,6 +997,10 @@ } family = PF_PACKET; } + + err = security_socket_create(family, type, protocol); + if (err) + return err; #if defined(CONFIG_KMOD) && defined(CONFIG_NET) /* Attempt to load a protocol module if the find failed. 
@@ -1031,6 +1046,7 @@ } *res = sock; + security_socket_post_create(sock, family, type, protocol); out: net_family_read_unlock(); @@ -1141,8 +1157,14 @@ if((sock = sockfd_lookup(fd,&err))!=NULL) { - if((err=move_addr_to_kernel(umyaddr,addrlen,address))>=0) + if((err=move_addr_to_kernel(umyaddr,addrlen,address))>=0) { + err = security_socket_bind(sock, (struct sockaddr *)address, addrlen); + if (err) { + sockfd_put(sock); + return err; + } err = sock->ops->bind(sock, (struct sockaddr *)address, addrlen); + } sockfd_put(sock); } return err; @@ -1163,6 +1185,13 @@ if ((sock = sockfd_lookup(fd, &err)) != NULL) { if ((unsigned) backlog > SOMAXCONN) backlog = SOMAXCONN; + + err = security_socket_listen(sock, backlog); + if (err) { + sockfd_put(sock); + return err; + } + err=sock->ops->listen(sock, backlog); sockfd_put(sock); } @@ -1199,6 +1228,10 @@ newsock->type = sock->type; newsock->ops = sock->ops; + err = security_socket_accept(sock, newsock); + if (err) + goto out_release; + err = sock->ops->accept(sock, newsock, sock->file->f_flags); if (err < 0) goto out_release; @@ -1218,6 +1251,8 @@ if ((err = sock_map_fd(newsock)) < 0) goto out_release; + security_socket_post_accept(sock, newsock); + out_put: sockfd_put(sock); out: @@ -1253,6 +1288,11 @@ err = move_addr_to_kernel(uservaddr, addrlen, address); if (err < 0) goto out_put; + + err = security_socket_connect(sock, (struct sockaddr *)address, addrlen); + if (err) + goto out_put; + err = sock->ops->connect(sock, (struct sockaddr *) address, addrlen, sock->file->f_flags); out_put: @@ -1275,6 +1315,11 @@ sock = sockfd_lookup(fd, &err); if (!sock) goto out; + + err = security_socket_getsockname(sock); + if (err) + goto out_put; + err = sock->ops->getname(sock, (struct sockaddr *)address, &len, 0); if (err) goto out_put; @@ -1299,6 +1344,12 @@ if ((sock = sockfd_lookup(fd, &err))!=NULL) { + err = security_socket_getpeername(sock); + if (err) { + sockfd_put(sock); + return err; + } + err = sock->ops->getname(sock, (struct 
sockaddr *)address, &len, 1); if (!err) err=move_addr_to_user(address,len, usockaddr, usockaddr_len); @@ -1427,6 +1478,12 @@ if ((sock = sockfd_lookup(fd, &err))!=NULL) { + err = security_socket_setsockopt(sock,level,optname); + if (err) { + sockfd_put(sock); + return err; + } + if (level == SOL_SOCKET) err=sock_setsockopt(sock,level,optname,optval,optlen); else @@ -1448,6 +1505,13 @@ if ((sock = sockfd_lookup(fd, &err))!=NULL) { + err = security_socket_getsockopt(sock, level, + optname); + if (err) { + sockfd_put(sock); + return err; + } + if (level == SOL_SOCKET) err=sock_getsockopt(sock,level,optname,optval,optlen); else @@ -1469,6 +1533,12 @@ if ((sock = sockfd_lookup(fd, &err))!=NULL) { + err = security_socket_shutdown(sock, how); + if (err) { + sockfd_put(sock); + return err; + } + err=sock->ops->shutdown(sock, how); sockfd_put(sock); } diff -urN -X dontdiff linux-2.5.59.w0/security/dummy.c linux-2.5.59.w1/security/dummy.c --- linux-2.5.59.w0/security/dummy.c Fri Feb 7 01:14:49 2003 +++ linux-2.5.59.w1/security/dummy.c Fri Feb 7 01:14:59 2003 @@ -20,7 +20,7 @@ #include #include #include - +#include static int dummy_ptrace (struct task_struct *parent, struct task_struct *child) { @@ -598,6 +598,82 @@ } #ifdef CONFIG_SECURITY_NETWORK +static int dummy_socket_create (int family, int type, int protocol) +{ + return 0; +} + +static void dummy_socket_post_create (struct socket *sock, int family, int type, + int protocol) +{ + return; +} + +static int dummy_socket_bind (struct socket *sock, struct sockaddr *address, + int addrlen) +{ + return 0; +} + +static int dummy_socket_connect (struct socket *sock, struct sockaddr *address, + int addrlen) +{ + return 0; +} + +static int dummy_socket_listen (struct socket *sock, int backlog) +{ + return 0; +} + +static int dummy_socket_accept (struct socket *sock, struct socket *newsock) +{ + return 0; +} + +static void dummy_socket_post_accept (struct socket *sock, + struct socket *newsock) +{ + return; +} + +static int 
dummy_socket_sendmsg (struct socket *sock, struct msghdr *msg, + int size) +{ + return 0; +} + +static int dummy_socket_recvmsg (struct socket *sock, struct msghdr *msg, + int size, int flags) +{ + return 0; +} + +static int dummy_socket_getsockname (struct socket *sock) +{ + return 0; +} + +static int dummy_socket_getpeername (struct socket *sock) +{ + return 0; +} + +static int dummy_socket_setsockopt (struct socket *sock, int level, int optname) +{ + return 0; +} + +static int dummy_socket_getsockopt (struct socket *sock, int level, int optname) +{ + return 0; +} + +static int dummy_socket_shutdown (struct socket *sock, int how) +{ + return 0; +} + #endif /* CONFIG_SECURITY_NETWORK */ static int dummy_register_security (const char *name, struct security_operations *ops) @@ -729,6 +805,20 @@ set_to_dummy_if_null(ops, register_security); set_to_dummy_if_null(ops, unregister_security); #ifdef CONFIG_SECURITY_NETWORK + set_to_dummy_if_null(ops, socket_create); + set_to_dummy_if_null(ops, socket_post_create); + set_to_dummy_if_null(ops, socket_bind); + set_to_dummy_if_null(ops, socket_connect); + set_to_dummy_if_null(ops, socket_listen); + set_to_dummy_if_null(ops, socket_accept); + set_to_dummy_if_null(ops, socket_post_accept); + set_to_dummy_if_null(ops, socket_sendmsg); + set_to_dummy_if_null(ops, socket_recvmsg); + set_to_dummy_if_null(ops, socket_getsockname); + set_to_dummy_if_null(ops, socket_getpeername); + set_to_dummy_if_null(ops, socket_setsockopt); + set_to_dummy_if_null(ops, socket_getsockopt); + set_to_dummy_if_null(ops, socket_shutdown); #endif /* CONFIG_SECURITY_NETWORK */ } From jmorris@intercode.com.au Thu Feb 6 07:10:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:10:19 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16FAC3v026572 for ; Thu, 6 Feb 2003 07:10:14 -0800 Received: from localhost (jmorris@localhost) by 
blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02348; Fri, 7 Feb 2003 02:17:39 +1100 Date: Fri, 7 Feb 2003 02:17:39 +1100 (EST) From: James Morris To: "David S. Miller" , cc: linux-security-module@wirex.com, Subject: [PATCH] LSM networking update: socket_sock_rcv_skb() hook (3/5) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1656 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev include/linux/security.h | 18 ++++++++ include/net/sock.h | 95 +++++++++++++++++++++++++++++++---------------- net/decnet/dn_nsp_in.c | 29 +++++--------- net/ipv4/tcp_ipv4.c | 9 +--- net/ipv6/tcp_ipv6.c | 15 +++---- net/sctp/input.c | 4 + security/dummy.c | 5 ++ 7 files changed, 112 insertions(+), 63 deletions(-) diff -urN -X dontdiff linux-2.5.59.w0/include/linux/security.h linux-2.5.59.w1/include/linux/security.h --- linux-2.5.59.w0/include/linux/security.h Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/include/linux/security.h Fri Feb 7 01:16:22 2003 @@ -684,6 +684,12 @@ * @sock contains the socket structure. * @how contains the flag indicating how future sends and receives are handled. * Return 0 if permission is granted. + * @socket_sock_rcv_skb: + * Check permissions on incoming network packets. This hook is distinct + * from Netfilter's IP input hooks since it is the first time that the + * incoming sk_buff @skb has been associated with a particular socket, @sk. + * @sk contains the sock (not socket) associated with the incoming sk_buff. + * @skb contains the incoming network data. * * Security hooks affecting all System V IPC operations. 
* @@ -1073,6 +1079,7 @@ int (*socket_getsockopt) (struct socket * sock, int level, int optname); int (*socket_setsockopt) (struct socket * sock, int level, int optname); int (*socket_shutdown) (struct socket * sock, int how); + int (*socket_sock_rcv_skb) (struct sock * sk, struct sk_buff * skb); #endif /* CONFIG_SECURITY_NETWORK */ }; @@ -2312,6 +2319,12 @@ { return security_ops->socket_shutdown(sock, how); } + +static inline int security_sock_rcv_skb (struct sock * sk, + struct sk_buff * skb) +{ + return security_ops->socket_sock_rcv_skb (sk, skb); +} #else /* CONFIG_SECURITY_NETWORK */ static inline int security_socket_create (int family, int type, int protocol) { @@ -2394,6 +2407,11 @@ { return 0; } +static inline int security_sock_rcv_skb (struct sock * sk, + struct sk_buff * skb) +{ + return 0; +} #endif /* CONFIG_SECURITY_NETWORK */ #endif /* ! __LINUX_SECURITY_H */ diff -urN -X dontdiff linux-2.5.59.w0/include/net/sock.h linux-2.5.59.w1/include/net/sock.h --- linux-2.5.59.w0/include/net/sock.h Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/include/net/sock.h Fri Feb 7 01:18:37 2003 @@ -44,6 +44,7 @@ #include #include /* struct sk_buff */ +#include #ifdef CONFIG_FILTER #include @@ -458,28 +459,45 @@ #ifdef CONFIG_FILTER /** - * sk_filter - run a packet through a socket filter + * __sk_filter - run a packet through a socket filter + * @sk: sock associated with &sk_buff * @skb: buffer to filter - * @filter: filter to apply + * @needlock: set to 1 if the sock is not locked by caller. * * Run the filter code and then cut skb->data to correct size returned by * sk_run_filter. If pkt_len is 0 we toss packet. If skb->len is smaller * than pkt_len we keep whole skb->data. This is the socket level * wrapper to sk_run_filter. It returns 0 if the packet should - * be accepted or 1 if the packet should be tossed. + * be accepted or -EPERM if the packet should be tossed. 
+ * + * This function should not be called directly, use sk_filter instead + * to ensure that the LSM security check is also performed. */ - -static inline int sk_filter(struct sk_buff *skb, struct sk_filter *filter) + +static inline int __sk_filter(struct sock *sk, struct sk_buff *skb, int needlock) { - int pkt_len; + int err = 0; - pkt_len = sk_run_filter(skb, filter->insns, filter->len); - if(!pkt_len) - return 1; /* Toss Packet */ - else - skb_trim(skb, pkt_len); + if (sk->filter) { + struct sk_filter *filter; + + if (needlock) + bh_lock_sock(sk); + + filter = sk->filter; + if (filter) { + int pkt_len = sk_run_filter(skb, filter->insns, + filter->len); + if (!pkt_len) + err = -EPERM; + else + skb_trim(skb, pkt_len); + } - return 0; + if (needlock) + bh_unlock_sock(sk); + } + return err; } /** @@ -506,8 +524,26 @@ atomic_add(sk_filter_len(fp), &sk->omem_alloc); } +#else + +static inline int __sk_filter(struct sock *sk, struct sk_buff *skb, int needlock) +{ + return 0; +} + #endif /* CONFIG_FILTER */ +static inline int sk_filter(struct sock *sk, struct sk_buff *skb, int needlock) +{ + int err; + + err = security_sock_rcv_skb(sk, skb); + if (err) + return err; + + return __sk_filter(sk, skb, needlock); +} + /* * Socket reference counting postulates. * @@ -712,36 +748,31 @@ static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { + int err = 0; + /* Cast skb->rcvbuf to unsigned... It's pointless, but reduces number of warnings when compiling with -W --ANK */ - if (atomic_read(&sk->rmem_alloc) + skb->truesize >= (unsigned)sk->rcvbuf) - return -ENOMEM; - -#ifdef CONFIG_FILTER - if (sk->filter) { - int err = 0; - struct sk_filter *filter; - - /* It would be deadlock, if sock_queue_rcv_skb is used - with socket lock! We assume that users of this - function are lock free. 
- */ - bh_lock_sock(sk); - if ((filter = sk->filter) != NULL && sk_filter(skb, filter)) - err = -EPERM; - bh_unlock_sock(sk); - if (err) - return err; /* Toss packet */ + if (atomic_read(&sk->rmem_alloc) + skb->truesize >= (unsigned)sk->rcvbuf) { + err = -ENOMEM; + goto out; } -#endif /* CONFIG_FILTER */ + + /* It would be deadlock, if sock_queue_rcv_skb is used + with socket lock! We assume that users of this + function are lock free. + */ + err = sk_filter(sk, skb, 1); + if (err) + goto out; skb->dev = NULL; skb_set_owner_r(skb, sk); skb_queue_tail(&sk->receive_queue, skb); if (!sk->dead) sk->data_ready(sk,skb->len); - return 0; +out: + return err; } static inline int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb) diff -urN -X dontdiff linux-2.5.59.w0/net/decnet/dn_nsp_in.c linux-2.5.59.w1/net/decnet/dn_nsp_in.c --- linux-2.5.59.w0/net/decnet/dn_nsp_in.c Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/net/decnet/dn_nsp_in.c Fri Feb 7 01:16:22 2003 @@ -566,26 +566,19 @@ */ static __inline__ int dn_queue_skb(struct sock *sk, struct sk_buff *skb, int sig, struct sk_buff_head *queue) { -#ifdef CONFIG_FILTER - struct sk_filter *filter; -#endif - + int err; + /* Cast skb->rcvbuf to unsigned... It's pointless, but reduces number of warnings when compiling with -W --ANK */ - if (atomic_read(&sk->rmem_alloc) + skb->truesize >= (unsigned)sk->rcvbuf -) - return -ENOMEM; - -#ifdef CONFIG_FILTER - if (sk->filter) { - int err = 0; - if ((filter = sk->filter) != NULL && sk_filter(skb, sk->filter)) - err = -EPERM; /* Toss packet */ - if (err) - return err; + if (atomic_read(&sk->rmem_alloc) + skb->truesize >= (unsigned)sk->rcvbuf) { + err = -ENOMEM; + goto out; } -#endif /* CONFIG_FILTER */ + + err = sk_filter(sk, skb, 0); + if (err) + goto out; skb_set_owner_r(skb, sk); skb_queue_tail(queue, skb); @@ -603,8 +596,8 @@ (sig == SIGURG) ? 
POLL_PRI : POLL_IN); } read_unlock(&sk->callback_lock); - - return 0; +out: + return err; } static void dn_nsp_otherdata(struct sock *sk, struct sk_buff *skb) diff -urN -X dontdiff linux-2.5.59.w0/net/ipv4/tcp_ipv4.c linux-2.5.59.w1/net/ipv4/tcp_ipv4.c --- linux-2.5.59.w0/net/ipv4/tcp_ipv4.c Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/net/ipv4/tcp_ipv4.c Fri Feb 7 01:16:22 2003 @@ -1696,12 +1696,6 @@ */ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) { -#ifdef CONFIG_FILTER - struct sk_filter *filter = sk->filter; - if (filter && sk_filter(skb, filter)) - goto discard; -#endif /* CONFIG_FILTER */ - if (sk->state == TCP_ESTABLISHED) { /* Fast path */ TCP_CHECK_TIMER(sk); if (tcp_rcv_established(sk, skb, skb->h.th, skb->len)) @@ -1804,6 +1798,9 @@ if (!xfrm_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; + if (sk_filter(sk, skb, 0)) + goto discard_and_relse; + skb->dev = NULL; bh_lock_sock(sk); diff -urN -X dontdiff linux-2.5.59.w0/net/ipv6/tcp_ipv6.c linux-2.5.59.w1/net/ipv6/tcp_ipv6.c --- linux-2.5.59.w0/net/ipv6/tcp_ipv6.c Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/net/ipv6/tcp_ipv6.c Fri Feb 7 01:16:22 2003 @@ -1470,9 +1470,6 @@ { struct ipv6_pinfo *np = inet6_sk(sk); struct tcp_opt *tp; -#ifdef CONFIG_FILTER - struct sk_filter *filter; -#endif struct sk_buff *opt_skb = NULL; /* Imagine: socket is IPv6. 
IPv4 packet arrives, @@ -1486,11 +1483,8 @@ if (skb->protocol == htons(ETH_P_IP)) return tcp_v4_do_rcv(sk, skb); -#ifdef CONFIG_FILTER - filter = sk->filter; - if (filter && sk_filter(skb, filter)) + if (sk_filter(sk, skb, 0)) goto discard; -#endif /* CONFIG_FILTER */ /* * socket locking is here for SMP purposes as backlog rcv @@ -1641,6 +1635,9 @@ if(sk->state == TCP_TIME_WAIT) goto do_time_wait; + if (sk_filter(sk, skb, 0)) + goto discard_and_relse; + skb->dev = NULL; bh_lock_sock(sk); @@ -1672,6 +1669,10 @@ kfree_skb(skb); return 0; +discard_and_relse: + sock_put(sk); + goto discard_it; + do_time_wait: if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(TcpInErrs); diff -urN -X dontdiff linux-2.5.59.w0/net/sctp/input.c linux-2.5.59.w1/net/sctp/input.c --- linux-2.5.59.w0/net/sctp/input.c Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/net/sctp/input.c Fri Feb 7 01:27:49 2003 @@ -159,6 +159,10 @@ if (!xfrm_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_release; + ret = sk_filter(sk, skb, 1); + if (ret) + goto discard_release; + /* Create an SCTP packet structure. 
*/ chunk = sctp_chunkify(skb, asoc, sk); if (!chunk) { diff -urN -X dontdiff linux-2.5.59.w0/security/dummy.c linux-2.5.59.w1/security/dummy.c --- linux-2.5.59.w0/security/dummy.c Fri Feb 7 01:29:51 2003 +++ linux-2.5.59.w1/security/dummy.c Fri Feb 7 01:16:22 2003 @@ -674,6 +674,10 @@ return 0; } +static int dummy_socket_sock_rcv_skb (struct sock *sk, struct sk_buff *skb) +{ + return 0; +} #endif /* CONFIG_SECURITY_NETWORK */ static int dummy_register_security (const char *name, struct security_operations *ops) @@ -819,6 +823,7 @@ set_to_dummy_if_null(ops, socket_setsockopt); set_to_dummy_if_null(ops, socket_getsockopt); set_to_dummy_if_null(ops, socket_shutdown); + set_to_dummy_if_null(ops, socket_sock_rcv_skb); #endif /* CONFIG_SECURITY_NETWORK */ } From jmorris@intercode.com.au Thu Feb 6 07:13:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:13:37 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16FDW3v027182 for ; Thu, 6 Feb 2003 07:13:33 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02397; Fri, 7 Feb 2003 02:21:02 +1100 Date: Fri, 7 Feb 2003 02:21:02 +1100 (EST) From: James Morris To: "David S. 
Miller" , cc: linux-security-module@wirex.com, Subject: [PATCH] LSM networking update: af_unix hooks (4/5) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1657 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev include/linux/security.h | 56 +++++++++++++++++++++++++++++++++++++++++++++++ net/unix/af_unix.c | 16 +++++++++++++ security/dummy.c | 15 ++++++++++++ 3 files changed, 87 insertions(+) diff -urN -X dontdiff linux-2.5.59.w0/include/linux/security.h linux-2.5.59.w1/include/linux/security.h --- linux-2.5.59.w0/include/linux/security.h Fri Feb 7 01:32:30 2003 +++ linux-2.5.59.w1/include/linux/security.h Fri Feb 7 01:33:34 2003 @@ -588,6 +588,31 @@ * is being reparented to the init task. * @p contains the task_struct for the kernel thread. * + * Security hooks for Unix domain networking. + * + * @unix_stream_connect: + * Check permissions before establishing a Unix domain stream connection + * between @sock and @other. + * @sock contains the socket structure. + * @other contains the peer socket structure. + * Return 0 if permission is granted. + * @unix_may_send: + * Check permissions before connecting or sending datagrams from @sock to + * @other. + * @sock contains the socket structure. + * @other contains the peer socket structure. + * Return 0 if permission is granted. + * + * The @unix_stream_connect and @unix_may_send hooks were necessary because + * Linux provides an alternative to the conventional file name space for Unix + * domain sockets. Whereas binding and connecting to sockets in the file name + * space is mediated by the typical file permissions (and caught by the mknod + * and permission hooks in inode_security_ops), binding and connecting to + * sockets in the abstract name space is completely unmediated.
Sufficient + * control of Unix domain sockets in the abstract name space isn't possible + * using only the socket layer hooks, since we need to know the actual target + * socket, which is not looked up until we are inside the af_unix code. + * * Security hooks for socket operations. * * @socket_create: @@ -1059,6 +1084,10 @@ struct security_operations *ops); #ifdef CONFIG_SECURITY_NETWORK + int (*unix_stream_connect) (struct socket * sock, + struct socket * other, struct sock * newsk); + int (*unix_may_send) (struct socket * sock, struct socket * other); + int (*socket_create) (int family, int type, int protocol); void (*socket_post_create) (struct socket * sock, int family, int type, int protocol); @@ -2236,6 +2265,20 @@ #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK +static inline int security_unix_stream_connect(struct socket * sock, + struct socket * other, + struct sock * newsk) +{ + return security_ops->unix_stream_connect(sock, other, newsk); +} + + +static inline int security_unix_may_send(struct socket * sock, + struct socket * other) +{ + return security_ops->unix_may_send(sock, other); +} + static inline int security_socket_create (int family, int type, int protocol) { return security_ops->socket_create(family, type, protocol); @@ -2326,6 +2369,19 @@ return security_ops->socket_sock_rcv_skb (sk, skb); } #else /* CONFIG_SECURITY_NETWORK */ +static inline int security_unix_stream_connect(struct socket * sock, + struct socket * other, + struct sock * newsk) +{ + return 0; +} + +static inline int security_unix_may_send(struct socket * sock, + struct socket * other) +{ + return 0; +} + static inline int security_socket_create (int family, int type, int protocol) { return 0; diff -urN -X dontdiff linux-2.5.59.w0/net/unix/af_unix.c linux-2.5.59.w1/net/unix/af_unix.c --- linux-2.5.59.w0/net/unix/af_unix.c Sat Jan 11 10:47:20 2003 +++ linux-2.5.59.w1/net/unix/af_unix.c Fri Feb 7 01:33:34 2003 @@ -115,6 +115,7 @@ #include #include #include +#include 
int sysctl_unix_max_dgram_qlen = 10; @@ -816,6 +817,11 @@ err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; + + err = security_unix_may_send(sk->socket, other->socket); + if (err) + goto out_unlock; + } else { /* * 1003.1g breaking connected state with AF_UNSPEC @@ -981,6 +987,12 @@ goto restart; } + err = security_unix_stream_connect(sock, other->socket, newsk); + if (err) { + unix_state_wunlock(sk); + goto out_unlock; + } + /* The way is open! Fastly set all the necessary fields... */ sock_hold(sk); @@ -1280,6 +1292,10 @@ if (other->shutdown&RCV_SHUTDOWN) goto out_unlock; + err = security_unix_may_send(sk->socket, other->socket); + if (err) + goto out_unlock; + if (unix_peer(other) != sk && skb_queue_len(&other->receive_queue) > other->max_ack_backlog) { if (!timeo) { diff -urN -X dontdiff linux-2.5.59.w0/security/dummy.c linux-2.5.59.w1/security/dummy.c --- linux-2.5.59.w0/security/dummy.c Fri Feb 7 01:32:30 2003 +++ linux-2.5.59.w1/security/dummy.c Fri Feb 7 01:33:34 2003 @@ -598,6 +598,19 @@ } #ifdef CONFIG_SECURITY_NETWORK +static int dummy_unix_stream_connect (struct socket *sock, + struct socket *other, + struct sock *newsk) +{ + return 0; +} + +static int dummy_unix_may_send (struct socket *sock, + struct socket *other) +{ + return 0; +} + static int dummy_socket_create (int family, int type, int protocol) { return 0; @@ -809,6 +822,8 @@ set_to_dummy_if_null(ops, register_security); set_to_dummy_if_null(ops, unregister_security); #ifdef CONFIG_SECURITY_NETWORK + set_to_dummy_if_null(ops, unix_stream_connect); + set_to_dummy_if_null(ops, unix_may_send); set_to_dummy_if_null(ops, socket_create); set_to_dummy_if_null(ops, socket_post_create); set_to_dummy_if_null(ops, socket_bind); From jmorris@intercode.com.au Thu Feb 6 07:16:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 07:16:36 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id 
h16FGT3v027668 for ; Thu, 6 Feb 2003 07:16:31 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id CAA02452; Fri, 7 Feb 2003 02:23:58 +1100 Date: Fri, 7 Feb 2003 02:23:58 +1100 (EST) From: James Morris To: "David S. Miller" , cc: linux-security-module@wirex.com, Subject: Re: [PATCH] LSM networking update: netlink hooks (5/5) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1658 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev include/linux/security.h | 65 +++++++++++++++++++++++++++++++++++++---- net/core/rtnetlink.c | 3 + net/ipv4/netfilter/ip_queue.c | 3 + net/ipv4/xfrm_user.c | 3 + net/ipv6/netfilter/ip6_queue.c | 6 +-- net/netlink/af_netlink.c | 8 ++++- security/capability.c | 2 + security/dummy.c | 18 +++++++++++ 8 files changed, 95 insertions(+), 13 deletions(-) diff -urN -X dontdiff linux-2.5.59.w0/include/linux/security.h linux-2.5.59.w1/include/linux/security.h --- linux-2.5.59.w0/include/linux/security.h Fri Feb 7 01:34:42 2003 +++ linux-2.5.59.w1/include/linux/security.h Fri Feb 7 01:34:49 2003 @@ -31,7 +31,8 @@ #include #include #include - +#include +#include /* * These functions are in security/capability.c and are used @@ -48,6 +49,20 @@ extern void cap_task_kmod_set_label (void); extern void cap_task_reparent_to_init (struct task_struct *p); +static inline int cap_netlink_send (struct sk_buff *skb) +{ + NETLINK_CB (skb).eff_cap = current->cap_effective; + return 0; +} + +static inline int cap_netlink_recv (struct sk_buff *skb) +{ + if (!cap_raised (NETLINK_CB (skb).eff_cap, CAP_NET_ADMIN)) + return -EPERM; + return 0; +} + + /* * Values used in the task_security_ops calls */ @@ -64,11 +79,6 @@ #define LSM_SETID_FS 8 /* forward declares to avoid warnings */ -struct sock; -struct socket; -struct sockaddr; 
-struct msghdr; -struct sk_buff; struct nfsctl_arg; struct sched_param; struct swap_info_struct; @@ -588,6 +598,21 @@ * is being reparented to the init task. * @p contains the task_struct for the kernel thread. * + * Security hooks for Netlink messaging. + * + * @netlink_send: + * Save security information for a netlink message so that permission + * checking can be performed when the message is processed. The security + * information can be saved using the eff_cap field of the + * netlink_skb_parms structure. + * @skb contains the sk_buff structure for the netlink message. + * Return 0 if the information was successfully saved. + * @netlink_recv: + * Check permission before processing the received netlink message in + * @skb. + * @skb contains the sk_buff structure for the netlink message. + * Return 0 if permission is granted. + * * Security hooks for Unix domain networking. * * @unix_stream_connect: @@ -1077,6 +1102,9 @@ int (*sem_semop) (struct sem_array * sma, struct sembuf * sops, unsigned nsops, int alter); + int (*netlink_send) (struct sk_buff * skb); + int (*netlink_recv) (struct sk_buff * skb); + /* allow module stacking */ int (*register_security) (const char *name, struct security_operations *ops); @@ -1701,6 +1729,16 @@ return security_ops->sem_semop(sma, sops, nsops, alter); } +static inline int security_netlink_send(struct sk_buff * skb) +{ + return security_ops->netlink_send(skb); +} + +static inline int security_netlink_recv(struct sk_buff * skb) +{ + return security_ops->netlink_recv(skb); +} + /* prototypes */ extern int security_scaffolding_startup (void); extern int register_security (struct security_operations *ops); @@ -2262,6 +2300,21 @@ return 0; } +/* + * The netlink capability defaults need to be used inline by default + * (rather than hooking into the capability module) to reduce overhead + * in the networking code. 
+ */ +static inline int security_netlink_send (struct sk_buff *skb) +{ + return cap_netlink_send (skb); +} + +static inline int security_netlink_recv (struct sk_buff *skb) +{ + return cap_netlink_recv (skb); +} + #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK diff -urN -X dontdiff linux-2.5.59.w0/net/core/rtnetlink.c linux-2.5.59.w1/net/core/rtnetlink.c --- linux-2.5.59.w0/net/core/rtnetlink.c Fri Jan 17 19:46:08 2003 +++ linux-2.5.59.w1/net/core/rtnetlink.c Fri Feb 7 01:34:49 2003 @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -363,7 +364,7 @@ sz_idx = type>>2; kind = type&3; - if (kind != 2 && !cap_raised(NETLINK_CB(skb).eff_cap, CAP_NET_ADMIN)) { + if (kind != 2 && security_netlink_recv(skb)) { *errp = -EPERM; return -1; } diff -urN -X dontdiff linux-2.5.59.w0/net/ipv4/netfilter/ip_queue.c linux-2.5.59.w1/net/ipv4/netfilter/ip_queue.c --- linux-2.5.59.w0/net/ipv4/netfilter/ip_queue.c Sun Aug 11 12:20:40 2002 +++ linux-2.5.59.w1/net/ipv4/netfilter/ip_queue.c Fri Feb 7 01:34:49 2003 @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -496,7 +497,7 @@ if (type <= IPQM_BASE) return; - if(!cap_raised(NETLINK_CB(skb).eff_cap, CAP_NET_ADMIN)) + if (security_netlink_recv(skb)) RCV_SKB_FAIL(-EPERM); write_lock_bh(&queue_lock); diff -urN -X dontdiff linux-2.5.59.w0/net/ipv4/xfrm_user.c linux-2.5.59.w1/net/ipv4/xfrm_user.c --- linux-2.5.59.w0/net/ipv4/xfrm_user.c Sun Nov 24 12:27:57 2002 +++ linux-2.5.59.w1/net/ipv4/xfrm_user.c Fri Feb 7 01:34:49 2003 @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -772,7 +773,7 @@ link = &xfrm_dispatch[type]; /* All operations require privileges, even GET */ - if (!cap_raised(NETLINK_CB(skb).eff_cap, CAP_NET_ADMIN)) { + if (security_netlink_recv(skb)) { *errp = -EPERM; return -1; } diff -urN -X dontdiff linux-2.5.59.w0/net/ipv6/netfilter/ip6_queue.c linux-2.5.59.w1/net/ipv6/netfilter/ip6_queue.c --- linux-2.5.59.w0/net/ipv6/netfilter/ip6_queue.c 
Wed Oct 9 22:39:39 2002 +++ linux-2.5.59.w1/net/ipv6/netfilter/ip6_queue.c Fri Feb 7 01:34:49 2003 @@ -538,10 +538,10 @@ if (type <= IPQM_BASE) return; - - if(!cap_raised(NETLINK_CB(skb).eff_cap, CAP_NET_ADMIN)) - RCV_SKB_FAIL(-EPERM); + if (security_netlink_recv(skb)) + RCV_SKB_FAIL(-EPERM); + write_lock_bh(&queue_lock); if (peer_pid) { diff -urN -X dontdiff linux-2.5.59.w0/net/netlink/af_netlink.c linux-2.5.59.w1/net/netlink/af_netlink.c --- linux-2.5.59.w0/net/netlink/af_netlink.c Tue Dec 10 15:02:03 2002 +++ linux-2.5.59.w1/net/netlink/af_netlink.c Fri Feb 7 01:34:49 2003 @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -636,7 +637,12 @@ check them, when this message will be delivered to corresponding kernel module. --ANK (980802) */ - NETLINK_CB(skb).eff_cap = current->cap_effective; + + err = security_netlink_send(skb); + if (err) { + kfree_skb(skb); + goto out; + } err = -EFAULT; if (memcpy_fromiovec(skb_put(skb,len), msg->msg_iov, len)) { diff -urN -X dontdiff linux-2.5.59.w0/security/capability.c linux-2.5.59.w1/security/capability.c --- linux-2.5.59.w0/security/capability.c Tue Dec 10 15:02:03 2002 +++ linux-2.5.59.w1/security/capability.c Fri Feb 7 01:34:49 2003 @@ -286,6 +286,8 @@ .capset_check = cap_capset_check, .capset_set = cap_capset_set, .capable = cap_capable, + .netlink_send = cap_netlink_send, + .netlink_recv = cap_netlink_recv, .bprm_compute_creds = cap_bprm_compute_creds, .bprm_set_security = cap_bprm_set_security, diff -urN -X dontdiff linux-2.5.59.w0/security/dummy.c linux-2.5.59.w1/security/dummy.c --- linux-2.5.59.w0/security/dummy.c Fri Feb 7 01:34:42 2003 +++ linux-2.5.59.w1/security/dummy.c Fri Feb 7 01:34:49 2003 @@ -597,6 +597,22 @@ return 0; } +static int dummy_netlink_send (struct sk_buff *skb) +{ + if (current->euid == 0) + cap_raise (NETLINK_CB (skb).eff_cap, CAP_NET_ADMIN); + else + NETLINK_CB (skb).eff_cap = 0; + return 0; +} + +static int dummy_netlink_recv (struct sk_buff *skb) +{ + if (!cap_raised 
(NETLINK_CB (skb).eff_cap, CAP_NET_ADMIN)) + return -EPERM; + return 0; +} + #ifdef CONFIG_SECURITY_NETWORK static int dummy_unix_stream_connect (struct socket *sock, struct socket *other, @@ -819,6 +835,8 @@ set_to_dummy_if_null(ops, sem_associate); set_to_dummy_if_null(ops, sem_semctl); set_to_dummy_if_null(ops, sem_semop); + set_to_dummy_if_null(ops, netlink_send); + set_to_dummy_if_null(ops, netlink_recv); set_to_dummy_if_null(ops, register_security); set_to_dummy_if_null(ops, unregister_security); #ifdef CONFIG_SECURITY_NETWORK From christopher.leech@intel.com Thu Feb 6 10:46:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 10:46:16 -0800 (PST) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16Ik53v010331 for ; Thu, 6 Feb 2003 10:46:06 -0800 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by caduceus.fm.intel.com (8.11.6/8.11.6/d: outer.mc,v 1.51 2002/09/23 20:43:23 dmccart Exp $) with ESMTP id h16IlvA18304 for ; Thu, 6 Feb 2003 18:47:57 GMT Received: from fmsmsxvs042.fm.intel.com (fmsmsxvs042.fm.intel.com [132.233.42.128]) by talaria.fm.intel.com (8.11.6/8.11.6/d: inner.mc,v 1.28 2003/01/13 19:44:39 dmccart Exp $) with SMTP id h16Itd228488 for ; Thu, 6 Feb 2003 18:55:39 GMT Received: from [134.134.177.102] ([134.134.177.102]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003020610555817533 ; Thu, 06 Feb 2003 10:55:58 -0800 Subject: Re: skb_padto and small fragmented transmits From: Chris Leech To: "David S. 
Miller"
Cc: netdev@oss.sgi.com, linux-kernel
In-Reply-To:
References:
Content-Type: text/plain
Organization:
Message-Id: <1044559370.4620.36.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.1
Date: 06 Feb 2003 11:22:51 -0800
Content-Transfer-Encoding: 7bit
X-archive-position: 1659
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: christopher.leech@intel.com
Precedence: bulk
X-list: netdev

On Thu, 2003-02-06 at 03:58, David S. Miller wrote:
> skb_padto() only works on linear skb. The result is always a linear skb.

Given that skb_padto() takes into account data_len (incorrectly, but
still) and skb_pad() contains a comment about non-linear skb always
having zero tailroom, it certainly looks like these were written with
the attempt to work for non-linear buffers.

I fail to see how the statement "skb->len + skb->data_len" has any
usable meaning, or how it can be anything other than a bug.

The checksum issue I mentioned is not as clear. I haven't looked at all
the callers of skb_copy_expand() and copy_skb_header() to see what
effect copying ip_summed in one of those calls might have elsewhere.

> And if you look at all the drivers where it is used, they
> do not enable things like scatter-gather.

So because the problem is not currently exposed, it's acceptable for the
code to be incorrect?
-- Chris

diff -aur a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h	2003-01-13 12:45:20.000000000 -0800
+++ b/include/linux/skbuff.h	2003-02-05 12:25:38.000000000 -0800
@@ -1102,7 +1102,7 @@
 static inline struct sk_buff *skb_padto(struct sk_buff *skb, unsigned int len)
 {
-	unsigned int size = skb->len + skb->data_len;
+	unsigned int size = skb->len;
 	if (likely(size >= len))
 		return skb;
 	return skb_pad(skb, len-size);

From davem@redhat.com Thu Feb 6 10:50:05 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 10:50:07 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16Io43v010718 for ; Thu, 6 Feb 2003 10:50:05 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id KAA11892; Thu, 6 Feb 2003 10:44:24 -0800
Date: Thu, 06 Feb 2003 10:44:24 -0800 (PST)
Message-Id: <20030206.104424.39167597.davem@redhat.com>
To: christopher.leech@intel.com
Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: skb_padto and small fragmented transmits
From: "David S. Miller"
In-Reply-To: <1044559370.4620.36.camel@localhost.localdomain>
References: <1044559370.4620.36.camel@localhost.localdomain>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1660
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Chris Leech
   Date: 06 Feb 2003 11:22:51 -0800

   I fail to see how the statement "skb->len + skb->data_len" has any
   usable meaning, or how it can be anything other than a bug.

This equation is the standard way to find the full length
on any skb. For linear skbs, data_len is always zero.
I asked Alan to use this formula so that greps on the source
tree would always show data_len being taken into account, and
thus usage would be consistent.

From christopher.leech@intel.com Thu Feb 6 11:13:48 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 11:13:56 -0800 (PST)
Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16JDl3v011305 for ; Thu, 6 Feb 2003 11:13:48 -0800
Received: from petasus.fm.intel.com (petasus.fm.intel.com [10.1.192.37]) by hermes.fm.intel.com (8.11.6/8.11.6/d: outer.mc,v 1.51 2002/09/23 20:43:23 dmccart Exp $) with ESMTP id h16JIf429142 for ; Thu, 6 Feb 2003 19:18:41 GMT
Received: from fmsmsxvs041.fm.intel.com (fmsmsxvs041.fm.intel.com [132.233.42.126]) by petasus.fm.intel.com (8.11.6/8.11.6/d: inner.mc,v 1.28 2003/01/13 19:44:39 dmccart Exp $) with SMTP id h16JGOi22641 for ; Thu, 6 Feb 2003 19:16:24 GMT
Received: from [134.134.177.102] ([134.134.177.102]) by fmsmsxvs041.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003020611192926389 ; Thu, 06 Feb 2003 11:19:29 -0800
Subject: Re: skb_padto and small fragmented transmits
From: Chris Leech
To: "David S. Miller"
Cc: netdev@oss.sgi.com, linux-kernel
In-Reply-To:
References:
Content-Type: text/plain
Organization:
Message-Id: <1044559328.4618.54.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.1
Date: 06 Feb 2003 11:22:08 -0800
Content-Transfer-Encoding: 7bit
X-archive-position: 1661
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: christopher.leech@intel.com
Precedence: bulk
X-list: netdev

On Thu, 2003-02-06 at 10:44, David S. Miller wrote:
> From: Chris Leech
> Date: 06 Feb 2003 11:22:51 -0800
>
> I fail to see how the statement "skb->len + skb->data_len" has any
> usable meaning, or how it can be anything other than a bug.
>
> This equation is the standard way to find the full length
> on any skb.
For linear skbs, data_len is always zero.
>
> I asked Alan to use this formula so that greps on the source
> tree would always show data_len being taken into account, and
> thus usage would be consistent.

OK, now I'm really getting confused. Every other example I can find in
the networking code, and every scatter-gather capable driver, uses
skb->len as the full length and skb->len - skb->data_len as the length
of the first or linear portion.

From davem@redhat.com Thu Feb 6 14:48:48 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 06 Feb 2003 14:48:54 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h16Mml3v014234 for ; Thu, 6 Feb 2003 14:48:48 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA12361; Thu, 6 Feb 2003 14:43:06 -0800
Date: Thu, 06 Feb 2003 14:43:06 -0800 (PST)
Message-Id: <20030206.144306.14966745.davem@redhat.com>
To: christopher.leech@intel.com
Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: skb_padto and small fragmented transmits
From: "David S. Miller"
In-Reply-To: <1044559328.4618.54.camel@localhost.localdomain>
References: <1044559328.4618.54.camel@localhost.localdomain>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1662
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Chris Leech
   Date: 06 Feb 2003 11:22:08 -0800

   OK, now I'm really getting confused. Every other example I can find in
   the networking code, and every scatter-gather capable driver, uses
   skb->len as the full length and skb->len - skb->data_len as the length
   of the first or linear portion.
Indeed, Alan you need to fix the skb_padto stuff to use
skb->len, ignore the skb->data_len as skb->len is the
full length.

Sorry for telling you to do the wrong thing Alan, my bad :)

From davem@redhat.com Fri Feb 7 01:44:48 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 01:44:51 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h179il3v025075 for ; Fri, 7 Feb 2003 01:44:47 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA13254; Fri, 7 Feb 2003 01:38:24 -0800
Date: Fri, 07 Feb 2003 01:38:23 -0800 (PST)
Message-Id: <20030207.013823.95513672.davem@redhat.com>
To: jmorris@intercode.com.au
Cc: kuznet@ms2.inr.ac.ru, linux-security-module@wirex.com, netdev@oss.sgi.com
Subject: Re: [PATCH] LSM networking update: netlink hooks (5/5)
From: "David S. Miller"
In-Reply-To:
References:
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1663
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

All LSM networking patches applied, thanks a lot James.
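[Archive note, not part of the original thread] The netlink hooks in
patch 5/5 split the privilege check in two: the send side records the
sender's effective capabilities in the message, and the receive side
rejects the message unless CAP_NET_ADMIN was recorded. The sketch below
models that flow in user space. The toy_ names, the plain bitmask type,
and passing the sender's capabilities as a parameter are assumptions
made for illustration; the kernel code uses NETLINK_CB(skb).eff_cap,
current->cap_effective and cap_raised(), as shown in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for CAP_NET_ADMIN (bit 12 in linux/capability.h). */
#define TOY_CAP_NET_ADMIN 12

/* Hypothetical capability set: one bit per capability. */
typedef unsigned int toy_cap_t;

/* Hypothetical stand-in for the per-message netlink control block. */
struct toy_netlink_msg {
	toy_cap_t eff_cap;	/* sender's effective capabilities at send time */
};

/* Non-zero if capability number cap is set in caps. */
static int toy_cap_raised(toy_cap_t caps, int cap)
{
	return (caps >> cap) & 1u;
}

/* Send side: save the sender's capabilities so the receiver can check them. */
static int toy_netlink_send(struct toy_netlink_msg *msg, toy_cap_t sender_caps)
{
	msg->eff_cap = sender_caps;
	return 0;
}

/* Receive side: permit processing only if CAP_NET_ADMIN was recorded. */
static int toy_netlink_recv(const struct toy_netlink_msg *msg)
{
	if (!toy_cap_raised(msg->eff_cap, TOY_CAP_NET_ADMIN))
		return -EPERM;
	return 0;
}
```

A caller would run toy_netlink_send() when the message is queued and
toy_netlink_recv() when it is processed, treating any non-zero return
from the latter as "drop with -EPERM", which mirrors how the patched
rtnetlink and ip_queue paths use security_netlink_recv().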
From AHERRMAN@de.ibm.com Fri Feb 7 02:13:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 02:13:32 -0800 (PST) Received: from d12lmsgate-4.de.ibm.com (d12lmsgate-4.de.ibm.com [194.196.100.237]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h17ADO3v025744 for ; Fri, 7 Feb 2003 02:13:25 -0800 Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23]) by d12lmsgate-4.de.ibm.com (8.12.3/8.12.3) with ESMTP id h17AL6bt040926 for ; Fri, 7 Feb 2003 11:21:08 +0100 Received: from d12ml033.de.ibm.com (d12ml033_cs0 [9.165.223.11]) by d12relay02.de.ibm.com (8.12.3/NCO/VER6.4) with ESMTP id h17AL6Rg256340 for ; Fri, 7 Feb 2003 11:21:06 +0100 To: netdev@oss.sgi.com Subject: X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: "Andreas Herrmann" Date: Fri, 7 Feb 2003 11:21:04 +0100 X-MIMETrack: Serialize by Router on D12ML033/12/M/IBM(Release 5.0.9a |January 7, 2002) at 07/02/2003 11:21:06, Serialize complete at 07/02/2003 11:21:06 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 1664 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: AHERRMAN@de.ibm.com Precedence: bulk X-list: netdev help -- Linux for eServer Development Tel : +49-7031-16-4640 Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE email : aherrman@de.ibm.com From alan@lxorguk.ukuu.org.uk Fri Feb 7 04:27:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 04:28:01 -0800 (PST) Received: from irongate.swansea.linux.org.uk (pc2-cwma1-4-cust86.swan.cable.ntl.com [213.105.254.86]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h17CRr3v010539 for ; Fri, 7 Feb 2003 04:27:54 -0800 Received: from irongate.swansea.linux.org.uk (localhost [127.0.0.1]) by irongate.swansea.linux.org.uk (8.12.6/8.12.6) with ESMTP id h17DXhYU014217; Fri, 7 Feb 2003 13:33:44 GMT Received: (from alan@localhost) by irongate.swansea.linux.org.uk (8.12.6/8.12.6/Submit) id 
h17DXg1S014215; Fri, 7 Feb 2003 13:33:42 GMT
X-Authentication-Warning: irongate.swansea.linux.org.uk: alan set sender to alan@lxorguk.ukuu.org.uk using -f
Subject: Re: skb_padto and small fragmented transmits
From: Alan Cox
To: "David S. Miller"
Cc: christopher.leech@intel.com, netdev@oss.sgi.com, Linux Kernel Mailing List
In-Reply-To: <20030206.144306.14966745.davem@redhat.com>
References: <1044559328.4618.54.camel@localhost.localdomain> <20030206.144306.14966745.davem@redhat.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1044624820.14026.7.camel@irongate.swansea.linux.org.uk>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.1 (1.2.1-2)
Date: 07 Feb 2003 13:33:41 +0000
X-archive-position: 1665
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: alan@lxorguk.ukuu.org.uk
Precedence: bulk
X-list: netdev

On Thu, 2003-02-06 at 22:43, David S. Miller wrote:
> From: Chris Leech
> Date: 06 Feb 2003 11:22:08 -0800
>
> OK, now I'm really getting confused. Every other example I can find in
> the networking code, and every scatter-gather capable driver, uses
> skb->len as the full length and skb->len - skb->data_len as the length
> of the first or linear portion.
>
> Indeed, Alan you need to fix the skb_padto stuff to use
> skb->len, ignore the skb->data_len as skb->len is the
> full length.

Dave just fix it next time you touch the code and push it to Marcelo.
It doesn't affect the 2.2 backport so that will be ok

From Makan.Pourzandi@ericsson.ca Fri Feb 7 08:52:29 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 08:52:33 -0800 (PST)
Received: from imr2.ericy.com (imr2.ericy.com [198.24.6.3]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h17GqS3v015477 for ; Fri, 7 Feb 2003 08:52:28 -0800
Received: from mr7.exu.ericsson.se (mr7att.ericy.com [138.85.224.158]) by imr2.ericy.com (8.11.3/8.11.3) with ESMTP id h17Gwrb29766; Fri, 7 Feb 2003 10:58:53 -0600 (CST)
Received: from noah.lmc.ericsson.se (noah.lmc.ericsson.se [142.133.1.1]) by mr7.exu.ericsson.se (8.11.3/8.11.3) with ESMTP id h17Gwqq29906; Fri, 7 Feb 2003 10:58:52 -0600 (CST)
Received: from EAMMLEX034.lmc.ericsson.se (eammlex034.lmc.ericsson.se [142.133.1.134]) by noah.lmc.ericsson.se (8.11.2/8.9.2) with ESMTP id h17Gwnh24933; Fri, 7 Feb 2003 11:58:50 -0500 (EST)
Received: by eammlex034.lmc.ericsson.se with Internet Mail Service (5.5.2655.55) id <1NQLRYZN>; Fri, 7 Feb 2003 11:58:49 -0500
Message-ID: <7B2A7784F4B7F0409947481F3F3FEF8305CC9531@eammlex037.lmc.ericsson.se>
From: "Makan Pourzandi (LMC)"
To: "'James Morris'" , "David S. Miller" , kuznet@ms2.inr.ac.ru
Cc: linux-security-module@wirex.com, netdev@oss.sgi.com
Subject: RE: [PATCH] LSM networking update: summary (0/5)
Date: Fri, 7 Feb 2003 11:58:49 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2655.55)
Content-Type: text/plain; charset="iso-8859-1"
X-archive-position: 1666
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Makan.Pourzandi@ericsson.ca
Precedence: bulk
X-list: netdev

Hi all,

My comments concern the (ip_decode_options, ip_encapsulate and
ip_decapsulate) hooks. Even if James has done much regarding this topic
and I'm sure that he knows much more than me about it, I wanted to give
my 2 cents on why we should keep these hooks in future releases.
Actually, we know that based on FIPS documents
(http://csrc.nist.gov/publications/fips/fips188/fips188.ps) we can use
ip options for security purposes. I believe for my part that this hook
can be useful if used to decode ip options and decide to drop or not the
ip packets. I don't believe that this level of control can be achieved
using other hooks at socket layer.

In DSI project (security in telecom clustered servers,
www.sourceforge.net/projects/disec), we use this hook for dropping ip
packets from processes in the cluster that do not have privileges to
communicate to defined nodes in the cluster. Mainly, this hook is useful
because it allows us to have a process-level control over communications
from a node to another node. I don't believe that this can be achieved
with firewalling rules as this means setting up rules for an ip address
(which means for all processes in a node in the cluster). Furthermore,
SSL does not cut it as it is based on developer's will and competence to
be used properly.

I believe that the use of ip_decode_options by DSI is only one example
and its use can be easily extended to various other cases.

Regards,
Makan Pourzandi

-------------------------------------------------------
Makan Pourzandi
Ericsson Research Canada
http://sourceforge.net/projects/disec/
-------------------------------------------------------
This email does not represent or express the opinions of Ericsson
Corporation.

> -----Original Message-----
> From: James Morris [mailto:jmorris@intercode.com.au]
> Sent: Thursday, February 06, 2003 10:11 AM
> To: David S. Miller; kuznet@ms2.inr.ac.ru
> Cc: linux-security-module@wirex.com; netdev@oss.sgi.com
> Subject: [PATCH] LSM networking update: summary (0/5)
>
>
> The following five patches are an updated version of the LSM (Linux
> Security Modules) networking support hooks, submitted for
> inclusion in 2.5
> mainline.
> > Since the post last week, the networking hooks have been > reworked so that > they are more generalized and do not poke as deeply into network > protocols. > > Change summary: > > o The netdevice, skb and ipv4 hooks are gone. > > o The sock_queue_rcv_skb() hook has been encapsulated within > sk_filter() as suggested by David Miller. > > o The sk->security field has been removed (use the socket > inode field > instead, if needed, or infer the value). > > o The sk_filter() calls for TCPv4 and TCPv6 have been > relocated so that > they are called before skb->dev is cleared (which also fixes a > mainline issue). > > o An sk_filter() call was added to SCTP. > > o The default Netlink capability hooks have been inlined so > that they do > not call out to a module when CONFIG_SECURITY is disabled, per > requirements from David Miller. > > o The Netlink hooks now also cover ip6_queue and xfrm_user. > > > Full diffstat: > > include/linux/security.h | 429 > ++++++++++++++++++++++++++++++++++++++++- > include/net/sock.h | 95 ++++++--- > net/core/rtnetlink.c | 3 > net/decnet/dn_nsp_in.c | 29 +- > net/ipv4/netfilter/ip_queue.c | 3 > net/ipv4/tcp_ipv4.c | 9 > net/ipv4/xfrm_user.c | 3 > net/ipv6/netfilter/ip6_queue.c | 6 > net/ipv6/tcp_ipv6.c | 15 - > net/netlink/af_netlink.c | 8 > net/sctp/input.c | 4 > net/socket.c | 72 ++++++ > net/unix/af_unix.c | 16 + > security/Kconfig | 9 > security/capability.c | 2 > security/dummy.c | 135 ++++++++++++ > 16 files changed, 760 insertions(+), 78 deletions(-) > > > - James > -- > James Morris > > > > _______________________________________________ > linux-security-module mailing list > linux-security-module@wirex.com > http://mail.wirex.com/mailman/listinfo/linux-security-module > From sri@us.ibm.com Fri Feb 7 09:46:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 09:46:43 -0800 (PST) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h17Hka3v016455 for ; 
Fri, 7 Feb 2003 09:46:37 -0800
Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e1.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h17HsMEs049402 for ; Fri, 7 Feb 2003 12:54:22 -0500
Received: from dyn9-47-18-86.beaverton.ibm.com (dyn9-47-18-86.beaverton.ibm.com [9.47.18.86]) by northrelay01.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h17HsJAn120004 for ; Fri, 7 Feb 2003 12:54:19 -0500
Date: Fri, 7 Feb 2003 09:55:35 -0800 (PST)
From: Sridhar Samudrala
X-X-Sender:
To:
Subject: [PATCH] cut'n' paste bug in fold_field()
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 1667
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: sri@us.ibm.com
Precedence: bulk
X-list: netdev

Attached is a small patch for a cut'n'paste bug in linux 2.5.59
net/ipv4/proc.c:fold_field()

This causes some /proc/net counters that are incremented in BH to be
counted twice and the counters incremented in user context not counted.

--- proc.c	Tue Jan 21 17:18:52 2003
+++ proc.c.new	Fri Feb 7 09:39:34 2003
@@ -99,7 +99,7 @@
 			*((unsigned long *) (((void *) per_cpu_ptr(mib[0], i)) +
 					     sizeof (unsigned long) * nr));
 		res +=
-			*((unsigned long *) (((void *) per_cpu_ptr(mib[0], i)) +
+			*((unsigned long *) (((void *) per_cpu_ptr(mib[1], i)) +
 					     sizeof (unsigned long) * nr));
 	}
 	return res;

The same routine(fold_field()) is duplicated in net/ipv6/proc.c, but it
doesn't have the bug. Could we convert this to a static inline and move
to a header file to avoid the duplication and 'sctp' too can use it.
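[Archive note, not part of the original message] The fix above makes
fold_field() add the counter from both banks, mib[0] (BH context) and
mib[1] (user context), on every CPU, instead of reading mib[0] twice.
The sketch below models that summation with plain arrays. The toy_
names, the fixed CPU and field counts, and the array layout are
assumptions for illustration only; the kernel version walks per-CPU
data with per_cpu_ptr().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS   2	/* hypothetical CPU count for the sketch */
#define TOY_NR_FIELDS 4	/* hypothetical number of counters per bank */

/* One bank of counters; bank 0 is bumped in BH, bank 1 in user context. */
struct toy_mib {
	unsigned long fields[TOY_NR_FIELDS];
};

/* Sum counter number nr over both banks and all CPUs. */
static unsigned long toy_fold_field(struct toy_mib mib[2][TOY_NR_CPUS], size_t nr)
{
	unsigned long res = 0;
	size_t cpu;

	assert(nr < TOY_NR_FIELDS);
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		res += mib[0][cpu].fields[nr];	/* BH-context bank */
		res += mib[1][cpu].fields[nr];	/* user-context bank (the buggy code read bank 0 here) */
	}
	return res;
}
```

With the buggy indexing, a counter bumped only in user context would
fold to zero and one bumped only in BH would fold to twice its value,
which matches the symptom described in the message.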
Thanks
Sridhar

From jmorris@intercode.com.au Fri Feb 7 15:03:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 07 Feb 2003 15:03:25 -0800 (PST)
Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h17N3E3v020142 for ; Fri, 7 Feb 2003 15:03:16 -0800
Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id KAA09485; Sat, 8 Feb 2003 10:10:44 +1100
Date: Sat, 8 Feb 2003 10:10:44 +1100 (EST)
From: James Morris
To: "Makan Pourzandi (LMC)"
cc: "David S. Miller" , , ,
Subject: RE: [PATCH] LSM networking update: summary (0/5)
In-Reply-To: <7B2A7784F4B7F0409947481F3F3FEF8305CC9531@eammlex037.lmc.ericsson.se>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 1668
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jmorris@intercode.com.au
Precedence: bulk
X-list: netdev

On Fri, 7 Feb 2003, Makan Pourzandi (LMC) wrote:

> Hi all,
>
> My comments concern the (ip_decode_options, ip_encapsulate and
> ip_decapsulate) hooks. Even if James has done much regarding this topic
> and I'm sure that he knows much more than me about it, I wanted to give
> my 2 cents on why we should keep these hooks in future releases.
>

As mentioned during the last week, the current set of network hooks will
not directly support explicitly labeled networking. It's not just the ip
hooks: you'd also need the skb and possibly other rejected hooks to make
it useful.

Possibilities moving forward include reworking the design of the
relevant LSM framework components so that they are acceptable to the
network maintainers in a future kernel release cycle, and investigating
other schemes such as implicit labeling (e.g. Ajaya Chitturi's work on
the Flask project).
- James
--
James Morris

From davem@redhat.com Sat Feb 8 00:30:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 08 Feb 2003 00:30:47 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h188Uc3v025539 for ; Sat, 8 Feb 2003 00:30:39 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA14945; Sat, 8 Feb 2003 00:24:08 -0800
Date: Sat, 08 Feb 2003 00:24:08 -0800 (PST)
Message-Id: <20030208.002408.52174985.davem@redhat.com>
To: Makan.Pourzandi@ericsson.ca
Cc: jmorris@intercode.com.au, kuznet@ms2.inr.ac.ru, linux-security-module@wirex.com, netdev@oss.sgi.com
Subject: Re: [PATCH] LSM networking update: summary (0/5)
From: "David S. Miller"
In-Reply-To: <7B2A7784F4B7F0409947481F3F3FEF8305CC9531@eammlex037.lmc.ericsson.se>
References: <7B2A7784F4B7F0409947481F3F3FEF8305CC9531@eammlex037.lmc.ericsson.se>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1669
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: "Makan Pourzandi (LMC)"
   Date: Fri, 7 Feb 2003 11:58:49 -0500

   Actually, we know that based on FIPS documents
   (http://csrc.nist.gov/publications/fips/fips188/fips188.ps) we can use
   ip options for security purposes. I believe for my part that this hook
   can be useful if used to decode ip options and decide to drop or not the
   ip packets. I don't believe that this level of control can be achieved
   using other hooks at socket layer.

James added a hook for SKB reception, you can do whatever you
want in analyzing incoming packet contents using that generic
hook.
From davem@redhat.com Sat Feb 8 00:51:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 08 Feb 2003 00:51:12 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h188p93v026126 for ; Sat, 8 Feb 2003 00:51:10 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA14994; Sat, 8 Feb 2003 00:44:41 -0800
Date: Sat, 08 Feb 2003 00:44:40 -0800 (PST)
Message-Id: <20030208.004440.91323755.davem@redhat.com>
To: jmorris@intercode.com.au
Cc: Makan.Pourzandi@ericsson.ca, kuznet@ms2.inr.ac.ru, linux-security-module@wirex.com, netdev@oss.sgi.com
Subject: Re: [PATCH] LSM networking update: summary (0/5)
From: "David S. Miller"
In-Reply-To:
References: <7B2A7784F4B7F0409947481F3F3FEF8305CC9531@eammlex037.lmc.ericsson.se>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1670
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: James Morris
   Date: Sat, 8 Feb 2003 10:10:44 +1100 (EST)

   As mentioned during the last week, the current set of network hooks will
   not directly support explicitly labeled networking.

Why not? I thought we had completely established that anything
the socket receive SKB hook could not handle would be implementable
via netfilter.
From davem@redhat.com Sat Feb 8 00:59:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 08 Feb 2003 00:59:03 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h188x03v026617 for ; Sat, 8 Feb 2003 00:59:00 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA15049; Sat, 8 Feb 2003 00:53:03 -0800 Date: Sat, 08 Feb 2003 00:53:03 -0800 (PST) Message-Id: <20030208.005303.80023391.davem@redhat.com> To: alan@lxorguk.ukuu.org.uk Cc: christopher.leech@intel.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: skb_padto and small fragmented transmits From: "David S. Miller" In-Reply-To: <1044624820.14026.7.camel@irongate.swansea.linux.org.uk> References: <1044559328.4618.54.camel@localhost.localdomain> <20030206.144306.14966745.davem@redhat.com> <1044624820.14026.7.camel@irongate.swansea.linux.org.uk> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1671 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Alan Cox Date: 07 Feb 2003 13:33:41 +0000 On Thu, 2003-02-06 at 22:43, David S. Miller wrote: > Indeed, Alan you need to fix the skb_padto stuff to use > skb->len, ignore the skb->data_len as skb->len is the > full length. Dave, just fix it next time you touch the code and push it to Marcelo. It doesn't affect the 2.2 backport, so that will be OK. Ok, I will take care of this.
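[Editorial sketch] The distinction drawn in the quoted text (skb->len is the full length, skb->data_len covers only the paged part) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's skb_padto: struct skb_lengths and both helpers are hypothetical names, and the constant 60 mirrors the kernel's ETH_ZLEN minimum frame length.

```c
#include <assert.h>

#define ETH_ZLEN_SKETCH 60  /* assumed minimum Ethernet frame length, excluding FCS */

/* Hypothetical stand-in for the two sk_buff length fields discussed above. */
struct skb_lengths {
    unsigned int len;       /* total bytes: linear area plus paged fragments */
    unsigned int data_len;  /* bytes held in paged fragments only */
};

/* Bytes in the linear area: the total minus the paged part. */
static unsigned int linear_len(const struct skb_lengths *s)
{
    assert(s->data_len <= s->len);
    return s->len - s->data_len;
}

/* Padding needed to reach the minimum frame size, judged on the full length. */
static unsigned int pad_needed(const struct skb_lengths *s)
{
    return s->len < ETH_ZLEN_SKETCH ? ETH_ZLEN_SKETCH - s->len : 0;
}
```

With len = 70 and data_len = 50 the linear area holds only 20 bytes, yet no padding is needed because the frame as a whole already exceeds the minimum. Judging the frame by its linear part alone would wrongly pad it.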
From jmorris@intercode.com.au Sat Feb 8 01:36:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 08 Feb 2003 01:36:33 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h189aN3v027269 for ; Sat, 8 Feb 2003 01:36:25 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id UAA10872; Sat, 8 Feb 2003 20:43:56 +1100 Date: Sat, 8 Feb 2003 20:43:55 +1100 (EST) From: James Morris To: "David S. Miller" cc: Makan.Pourzandi@ericsson.ca, , , Subject: Re: [PATCH] LSM networking update: summary (0/5) In-Reply-To: <20030208.004440.91323755.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1672 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Sat, 8 Feb 2003, David S. Miller wrote: > From: James Morris > Date: Sat, 8 Feb 2003 10:10:44 +1100 (EST) > > As mentioned during the last week, the current set of network hooks will > not directly support explicitly labeled networking. > > Why not? I thought we had completely established that anything > the socket receive SKB hook could not handle would be implementable > via netfilter. > By not directly supported, I mean by the LSM network hooks. 
- James -- James Morris From davej@suse.de Mon Feb 10 19:55:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 10 Feb 2003 19:55:57 -0800 (PST) Received: from noodles.internal (noodles.codemonkey.org.uk [213.152.47.19]) by 
oss.sgi.com (8.12.5/8.12.5) with SMTP id h1B3tp3v025520 for ; Mon, 10 Feb 2003 19:55:52 -0800 Received: from noodles.internal (localhost [127.0.0.1]) by noodles.internal (8.12.7/8.12.7/Debian-2) with ESMTP id h1B402BN028208 for ; Tue, 11 Feb 2003 04:00:02 GMT Received: (from davej@localhost) by noodles.internal (8.12.7/8.12.7/Debian-2) id h1B3xuvo028206; Tue, 11 Feb 2003 03:59:56 GMT Date: Tue, 11 Feb 2003 03:59:56 GMT Message-Id: <200302110359.h1B3xuvo028206@noodles.internal> To: netdev@oss.sgi.com From: davej@codemonkey.org.uk Subject: missing forward port. X-archive-position: 1674 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davej@codemonkey.org.uk Precedence: bulk X-list: netdev This patch showed up in 2.4 a while back, and still isn't in 2.5. Dave diff -urpN --exclude-from=/home/davej/.exclude bk-linus/net/ipv4/tcp.c linux-2.5/net/ipv4/tcp.c --- bk-linus/net/ipv4/tcp.c 2003-01-08 10:49:35.000000000 -0100 +++ linux-2.5/net/ipv4/tcp.c 2003-01-17 00:59:34.000000000 -0100 @@ -1192,7 +1192,8 @@ new_segment: from += copy; copied += copy; - seglen -= copy; + if ((seglen -= copy) == 0 && iovlen == 0) + goto out; if (skb->len != mss_now || (flags & MSG_OOB)) continue; From m@martinh.net Wed Feb 12 02:12:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 12 Feb 2003 02:12:48 -0800 (PST) Received: from mostly-harmless.lut.ac.uk (mostly-harmless.lut.ac.uk [131.231.80.12]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1CACi3v017828 for ; Wed, 12 Feb 2003 02:12:45 -0800 Received: from martin (helo=mostly-harmless.lut.ac.uk) by mostly-harmless.lut.ac.uk with local-esmtp (Exim 3.35 #2) id 18iu04-0005Ba-00 for netdev@oss.sgi.com; Wed, 12 Feb 2003 10:20:20 +0000 From: Martin Hamilton To: netdev@oss.sgi.com X-URI: Subject: NOARP and 2.4 kernels Date: Wed, 12 Feb 2003 10:20:20 +0000 Message-Id: X-archive-position: 1675 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com 
Errors-to: netdev-bounce@oss.sgi.com X-original-sender: m@martinh.net Precedence: bulk X-list: netdev Hi folks, what's the current thinking on this? Once upon a time (2.2 kernel) it was made possible to mark an interface as "hidden", in which case Linux wouldn't respond to ARP who-has broadcasts for any IP address associated with the hidden interface. This feature was squished in the 2.4 kernel, though Julian Anastasov has now implemented a number of alternate approaches to solving the problem... http://www.linuxvirtualserver.org/~julian/#hidden Is it likely that any of these will be making an appearance in (or return to ;-) the canonical (Linus) kernel in the 2.4 series? This feature is very useful for anyone trying to build server clusters, e.g. using L4 switching. Obviously one can always build a kernel which includes one of the ARP hiding patches, but it would be much less painful to have this feature back in the Linus kernel again and available in vendors' default distributions without any heavy lifting being required. If nothing else, loopback and dummy interfaces should surely not respond to ARP broadcasts, which they currently (2.4.20/2.4.21-pre) still appear to do. If the NOARP flag means nothing, then it would help to avoid confusion if the kernel was rigged so that attempts to set it result in an error message (ifconfig eth0:0 -arp, ip link set eth0:0 arp off). Likewise, NOARP should not be set on dummy interfaces if it has no effect... ? 
# ifconfig dummy0 inet 10.9.8.7 netmask 255.255.255.255 broadcast 10.9.8.7 up [root@mostly-harmless root]# ifconfig dummy0 Link encap:Ethernet HWaddr 00:00:00:00:00:00 inet addr:10.9.8.7 Bcast:10.9.8.7 Mask:255.255.255.255 UP BROADCAST RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Thanks in advance for any thoughts :-) Cheers, Martin From kaber@trash.net Wed Feb 12 08:57:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 12 Feb 2003 08:57:27 -0800 (PST) Received: from el-zoido.localnet (port-212-202-187-147.reverse.qdsl-home.de [212.202.187.147]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1CGvG3v010910 for ; Wed, 12 Feb 2003 08:57:17 -0800 Received: from trash.net (ws.localnet [192.168.0.23]) by el-zoido.localnet (8.11.6/linuxconf) with ESMTP id h1CH5L703194; Wed, 12 Feb 2003 18:05:21 +0100 Message-ID: <3E4A7ECB.1020207@trash.net> Date: Wed, 12 Feb 2003 18:05:15 +0100 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: Netfilter Development Mailinglist Subject: Troubles with NFS & ip_conntrack: packets go to wrong mac Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1676 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev I've been experiencing strange problems with nfs and ip_conntrack for a while now, unfortunately noone so far was able to help. The problem occurs when ip_conntrack is loaded on the nfs server. 
nfs reads hang and the clients start logging UDP: short packet: 192.168.0.1:0 0/120 to 192.168.0.23:0 UDP: short packet: 192.168.0.1:6439 28562/120 to 192.168.0.23:60558 There are two ways to make it work: 1. remove ip_conntrack or 2. set mtu to 1484 on the nfs server. One suspicion was ip_conntrack breaking udp path mtu discovery since it seems to defragment packets with DF|MF and refragment them (with possibly different mtu) at POSTROUTING. This doesn't seem to be the problem, but I noticed the nfs server sends out fragments with the wrong destination mac. This is a packet captured on 192.168.0.23: 0:e0:7d:74:ab:cc 0:e0:7d:74:ab:cd 0800 1514: 192.168.0.1 > 192.168.0.223: (frag 44777:1480@4440+) (ttl 64, len 1500, bad cksum 2294!) This happens every 1-30 seconds. I never saw more than one misdirected fragment per packet. Relevant parts from neighbour table: 192.168.0.223 dev eth0 lladdr 00:e0:29:3c:c1:c9 nud reachable 192.168.0.23 dev eth0 lladdr 00:e0:7d:74:ab:cd nud reachable On 192.168.0.223 packets for 192.168.0.23 show up. Both clients time out during reassembly. I placed some printks through the netfilter code and ip_output.c but couldn't find any further pointers. I looked for broken checksums (something seems to alter the ip after checksumming) in ip_finish_output2, but everything is ok there. I can't see anything netfilter related touching packets after that. Any help and/or pointers where to look further would be appreciated.
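[Editorial sketch] For readers decoding the capture above: tcpdump prints fragments as id:length@offset, with a trailing "+" when more fragments follow, so "44777:1480@4440+" is a 1480-byte fragment at byte offset 4440 of datagram 44777. The arithmetic behind those numbers can be sketched as below. It assumes an IPv4 header without options (20 bytes); the helper names are hypothetical, and the sketch says nothing about the misdirected-MAC problem itself.

```c
#include <assert.h>

#define IPV4_HDR_LEN 20  /* assumes an IPv4 header without options */

/* Largest payload a non-final fragment can carry: a multiple of 8 bytes. */
static unsigned int frag_payload(unsigned int mtu)
{
    assert(mtu > IPV4_HDR_LEN);
    return (mtu - IPV4_HDR_LEN) & ~7u;
}

/* Byte offset of fragment number `index` (0-based) within the datagram. */
static unsigned int frag_offset(unsigned int mtu, unsigned int index)
{
    return index * frag_payload(mtu);
}

/* Number of fragments needed for `payload_len` bytes of IP payload. */
static unsigned int frag_count(unsigned int mtu, unsigned int payload_len)
{
    unsigned int per = frag_payload(mtu);
    return (payload_len + per - 1) / per;
}
```

With an MTU of 1500 each non-final fragment carries 1480 bytes, so offset 4440 is the fourth fragment (index 3). With the MTU set to 1484, the same calculation gives 1464 bytes per fragment.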
Regards, Patrick From bwa@us.ibm.com Wed Feb 12 15:07:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 12 Feb 2003 15:07:43 -0800 (PST) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1CN7X3v019841 for ; Wed, 12 Feb 2003 15:07:34 -0800 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e2.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1CNFfl3068714; Wed, 12 Feb 2003 18:15:41 -0500 Received: from w-bwa1.beaverton.ibm.com (w-bwa1.beaverton.ibm.com [9.47.18.12]) by northrelay04.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1CNFbGH029920; Wed, 12 Feb 2003 18:15:38 -0500 Subject: [PATCH] subset of RFC2553 From: Bruce Allan To: davem@redhat.com Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 12 Feb 2003 15:15:39 -0800 Message-Id: <1045091741.8858.320.camel@w-bwa1.beaverton.ibm.com> Mime-Version: 1.0 X-archive-position: 1677 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwa@us.ibm.com Precedence: bulk X-list: netdev This patch against 2.5.59 adds bits of RFC 2553 Basic Socket Interface Extensions for IPv6; specifically the in6addr_any wildcard address and in6addr_loopback address structures, the respective IN6ADDR_*_INIT constants, and moves the definition of struct sockaddr_storage from the SCTP code to a common IPv6 header file. These changes will assist in making certain networking kernel code portable across multiple address families. 
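[Editorial sketch] The RFC 2553 names mentioned above (in6addr_any, in6addr_loopback and the IN6ADDR_*_INIT initializers) already exist in userspace via <netinet/in.h>, which makes it easy to show the kind of address-family-neutral code they enable. The following is a userspace illustration only, not part of the patch; the three helper functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 when `addr` equals the wildcard address (::). */
static int is_wildcard(const struct in6_addr *addr)
{
    return memcmp(addr, &in6addr_any, sizeof(*addr)) == 0;
}

/* Returns 1 when `addr` equals the loopback address (::1). */
static int is_loopback(const struct in6_addr *addr)
{
    return memcmp(addr, &in6addr_loopback, sizeof(*addr)) == 0;
}

/* Fill a sockaddr_in6 that would bind to every local address on `port`. */
static void make_wildcard_sockaddr(struct sockaddr_in6 *sa, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);   /* port is stored in network byte order */
    sa->sin6_addr = in6addr_any;   /* structure assignment from the extern */
}
```

The initializer macros serve the static case, for example `struct in6_addr lo = IN6ADDR_LOOPBACK_INIT;`, while the extern constants cover assignment and comparison at run time.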
-- Bruce Allan Linux Technology Center IBM Corporation, Beaverton OR ======================================================================== diff -Naur linux-2.5.59/include/linux/in6.h linux-2.5.59-RFC2553/include/linux/in6.h --- linux-2.5.59/include/linux/in6.h 2003-02-12 10:29:14.000000000 -0800 +++ linux-2.5.59-RFC2553/include/linux/in6.h 2003-02-12 10:09:23.000000000 -0800 @@ -40,6 +40,15 @@ #define s6_addr32 in6_u.u6_addr32 }; +/* IPv6 Wildcard Address (::) and Loopback Address (::1) defined in RFC2553 + * NOTE: Be aware the IN6ADDR_* constants and in6addr_* externals are defined + * in network byte order, not in host byte order as are the IPv4 equivalents + */ +extern const struct in6_addr in6addr_any; +#define IN6ADDR_ANY_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 } } } +extern const struct in6_addr in6addr_loopback; +#define IN6ADDR_LOOPBACK_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 } } } + struct sockaddr_in6 { unsigned short int sin6_family; /* AF_INET6 */ __u16 sin6_port; /* Transport layer port # */ diff -Naur linux-2.5.59/include/linux/socket.h linux-2.5.59-RFC2553/include/linux/socket.h --- linux-2.5.59/include/linux/socket.h 2003-02-12 10:29:14.000000000 -0800 +++ linux-2.5.59-RFC2553/include/linux/socket.h 2003-02-12 10:09:09.000000000 -0800 @@ -25,6 +25,34 @@ }; /* + * Desired design of maximum size and alignment + */ +#define _SS_MAXSIZE 128 /* Implementation specific max size */ +#define _SS_ALIGNSIZE (sizeof (int64_t)) + /* Implementation specific desired alignment */ +/* + * Definitions used for sockaddr_storage structure paddings design. 
+ */ +#define _SS_PAD1SIZE (_SS_ALIGNSIZE - sizeof (sa_family_t)) +#define _SS_PAD2SIZE (_SS_MAXSIZE - (sizeof (sa_family_t) + \ + _SS_PAD1SIZE + _SS_ALIGNSIZE)) + +struct sockaddr_storage { + sa_family_t ss_family; /* address family */ + /* Following fields are implementation specific */ + char __ss_pad1[_SS_PAD1SIZE]; + /* 6 byte pad, this is to make implementation */ + /* specific pad up to alignment field that */ + /* follows explicit in the data structure */ + int64_t __ss_align; /* field to force desired structure */ + /* storage alignment */ + char __ss_pad2[_SS_PAD2SIZE]; + /* 112 byte pad to achieve desired size, */ + /* _SS_MAXSIZE value minus size of ss_family */ + /* __ss_pad1, __ss_align fields is 112 */ +}; + +/* * As we do 4.4BSD message passing we use a 4.4BSD message passing * system, not 4.3. Thus msg_accrights(len) are now missing. They * belong in an obscure libc emulation or the bin. diff -Naur linux-2.5.59/include/net/sctp/structs.h linux-2.5.59-RFC2553/include/net/sctp/structs.h --- linux-2.5.59/include/net/sctp/structs.h 2003-02-12 10:29:14.000000000 -0800 +++ linux-2.5.59-RFC2553/include/net/sctp/structs.h 2003-02-12 08:35:07.000000000 -0800 @@ -61,38 +61,6 @@ #include /* We need tq_struct. */ #include /* We need sctp* header structs. */ -/* - * This is (almost) a direct quote from RFC 2553. - */ - -/* - * Desired design of maximum size and alignment - */ -#define _SS_MAXSIZE 128 /* Implementation specific max size */ -#define _SS_ALIGNSIZE (sizeof (__s64)) - /* Implementation specific desired alignment */ -/* - * Definitions used for sockaddr_storage structure paddings design. 
- */ -#define _SS_PAD1SIZE (_SS_ALIGNSIZE - sizeof (sa_family_t)) -#define _SS_PAD2SIZE (_SS_MAXSIZE - (sizeof (sa_family_t)+ \ - _SS_PAD1SIZE + _SS_ALIGNSIZE)) - -struct sockaddr_storage { - sa_family_t __ss_family; /* address family */ - /* Following fields are implementation specific */ - char __ss_pad1[_SS_PAD1SIZE]; - /* 6 byte pad, to make implementation */ - /* specific pad up to alignment field that */ - /* follows explicit in the data structure */ - __s64 __ss_align; /* field to force desired structure */ - /* storage alignment */ - char __ss_pad2[_SS_PAD2SIZE]; - /* 112 byte pad to achieve desired size, */ - /* _SS_MAXSIZE value minus size of ss_family */ - /* __ss_pad1, __ss_align fields is 112 */ -}; - /* A convenience structure for handling sockaddr structures. * We should wean ourselves off this. */ diff -Naur linux-2.5.59/net/ipv6/addrconf.c linux-2.5.59-RFC2553/net/ipv6/addrconf.c --- linux-2.5.59/net/ipv6/addrconf.c 2003-01-16 18:22:44.000000000 -0800 +++ linux-2.5.59-RFC2553/net/ipv6/addrconf.c 2003-02-12 13:55:03.000000000 -0800 @@ -136,6 +136,10 @@ MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ }; +/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ +const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; +const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; + int ipv6_addr_type(struct in6_addr *addr) { u32 st; From davem@redhat.com Wed Feb 12 21:49:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 12 Feb 2003 21:49:43 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1D5nc3v001733 for ; Wed, 12 Feb 2003 21:49:39 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA31906; Wed, 12 Feb 2003 21:43:06 -0800 Date: Wed, 12 Feb 2003 21:43:05 -0800 (PST) Message-Id: <20030212.214305.67672796.davem@redhat.com> To: bwa@us.ibm.com Cc: 
lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] subset of RFC2553 From: "David S. Miller" In-Reply-To: <1045091741.8858.320.camel@w-bwa1.beaverton.ibm.com> References: <1045091741.8858.320.camel@w-bwa1.beaverton.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1678 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Bruce Allan Date: 12 Feb 2003 15:15:39 -0800 I don't like how sockaddr_storage works, so you'll have to clean it up before we move it to a generic spot. +struct sockaddr_storage { + sa_family_t ss_family; /* address family */ + /* Following fields are implementation specific */ + char __ss_pad1[_SS_PAD1SIZE]; + /* 6 byte pad, this is to make implementation */ + /* specific pad up to alignment field that */ + /* follows explicit in the data structure */ + int64_t __ss_align; /* field to force desired structure */ + /* storage alignment */ + char __ss_pad2[_SS_PAD2SIZE]; + /* 112 byte pad to achieve desired size, */ + /* _SS_MAXSIZE value minus size of ss_family */ + /* __ss_pad1, __ss_align fields is 112 */ +}; All of this pad stuff is really unnecessary, just specify ss_family and then "stuff" where "stuff" can be something like "char __data[0];" Then you can add "attribute((aligned(64)))" or whatever to the declaration as well. And if you're going to put some 64-bit type in here, use "__u64" which actually makes you consistent with the rest of the kernel. You could also do something like: __u64 data[_SS_MAXSIZE / sizeof(__u64)]; Anything but this pad stuff... 
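[Editorial sketch] One way to read the suggestion above is a family field followed by a single 64-bit-aligned array, with no hand-computed pads. The layout below is a userspace sketch under stated assumptions, not the definition that was eventually merged: the type and macro names are hypothetical, uint64_t stands in for __u64, the array has one slot fewer than the quoted expression so that the total stays at 128 bytes, and the aligned attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS_MAXSIZE_SKETCH 128             /* same total size as in the patch */
typedef unsigned short sa_family_sketch;  /* hypothetical stand-in for sa_family_t */

/* Family field followed by 64-bit-aligned storage, with no explicit pad arrays. */
struct sockaddr_storage_sketch {
    sa_family_sketch ss_family;
    /* The attribute forces 8-byte alignment even on ABIs that align
     * 64-bit integers to 4 bytes inside structures (e.g. i386). */
    uint64_t data[SS_MAXSIZE_SKETCH / sizeof(uint64_t) - 1]
        __attribute__((aligned(8)));
};

/* The compiler inserts the padding after ss_family; the total stays 128. */
_Static_assert(sizeof(struct sockaddr_storage_sketch) == SS_MAXSIZE_SKETCH,
               "storage must stay at the advertised maximum size");
```

The compiler now derives the 6-byte gap and the trailing size that the original computed by hand in _SS_PAD1SIZE and _SS_PAD2SIZE.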
From ipv6_san@rediffmail.com Wed Feb 12 23:10:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 12 Feb 2003 23:10:06 -0800 (PST) Received: from rediffmail.com (webmail18.rediffmail.com [203.199.83.28] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1D79v3v006579 for ; Wed, 12 Feb 2003 23:09:59 -0800 Received: (qmail 17998 invoked by uid 510); 13 Feb 2003 07:22:49 -0000 Date: 13 Feb 2003 07:22:49 -0000 Message-ID: <20030213072249.17997.qmail@webmail18.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 13 feb 2003 07:22:48 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: netdev@oss.sgi.com Subject: IPv6 enabled Network services Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1679 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Hi everybody, I have installed Red Hat Linux 2.4.18-14 on two i386 machines, with kernel compiled for IPv6 support. These machines are connected through a LAN. when i try telnet> 3ffe::102:304:1234/64 i get 3ffe::102:304:1234/64 unknown host when doing ftp, same message is given. I would like to know whether i have to configure/modify any scripts, or whether these(ftp, telnet, etc) utilities are IPv6 enabled ??? I'm eager to know what is wrong. Please suggest me. 
thanx, San ----------------------------------------------- From ipv6_san@rediffmail.com Thu Feb 13 00:39:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 00:39:59 -0800 (PST) Received: from rediffmail.com (webmail18.rediffmail.com [203.199.83.28] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1D8dl3v011865 for ; Thu, 13 Feb 2003 00:39:50 -0800 Received: (qmail 4097 invoked by uid 510); 13 Feb 2003 08:52:38 -0000 Date: 13 Feb 2003 08:52:38 -0000 Message-ID: <20030213085238.4096.qmail@webmail18.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 13 feb 2003 08:52:38 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "=?iso-8859-1?Q?Marcin_Kami=F1ski?=" Cc: netdev@oss.sgi.com Subject: Re: Re: IPv6 enabled Network services Content-type: text/plain; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8bit Content-Disposition: inline X-archive-position: 1680 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Hi Marc i tried for telnet> 3ffe::102:304:1234 still i get 3ffe::102:304:1234: unknown host pls help. thanx, San ------------------------- On Thu, 13 Feb 2003 Marcin Kamiñski wrote : >On Thu, 13 Feb 2003, santosh kumar gowda wrote: > > > Hi everybody, > > > > I have installed Red Hat Linux 2.4.18-14 on two i386 >machines, > > with > > kernel compiled for IPv6 support. > > These machines are connected through a LAN. > > > > when i try > > telnet> 3ffe::102:304:1234/64 > > i get > > 3ffe::102:304:1234/64 unknown host > > > > when doing ftp, same message is given. > > > > I would like to know whether i have to configure/modify any > > scripts, > > or whether these(ftp, telnet, etc) utilities are IPv6 >enabled > > ??? > > > > I'm eager to know what is wrong. > > Please suggest me. > >Why do You use prefix-length in addreses? 
Cut everything after >'/' and '/' >itself. > >-- >- Marcin Kaminski --------------------------------- maxiu - >--- software developer ------------------- 6net project --- >----- network administrator -------- Best Group admin ----- >------- Poznañ Supercomputing and Networking Center ------- From pb@bieringer.de Thu Feb 13 01:32:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 01:32:16 -0800 (PST) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1D9Vv3v015428 for ; Thu, 13 Feb 2003 01:31:58 -0800 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 39BE513870 for ; Thu, 13 Feb 2003 09:59:36 +0100 (CET) X-AV-Checked: Thu Feb 13 09:59:36 2003 smtp2.aerasec.de Received: from pD950FA53.dip.t-dialin.net (pD950FA53.dip.t-dialin.net [217.80.250.83]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 90E201386F for ; Thu, 13 Feb 2003 09:59:35 +0100 (CET) Date: Thu, 13 Feb 2003 09:59:34 +0100 From: Peter Bieringer To: netdev@oss.sgi.com Subject: Re: Re: IPv6 enabled Network services Message-ID: <15100000.1045126774@gate.muc.bieringer.de> In-Reply-To: <20030213085238.4096.qmail@webmail18.rediffmail.com> References: <20030213085238.4096.qmail@webmail18.rediffmail.com> X-Mailer: Mulberry/3.0.1 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 1682 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, February 13, 2003 08:52:38 AM +0000 santosh kumar gowda wrote: > Hi Marc > i tried for > telnet> 3ffe::102:304:1234 > still i get > 3ffe::102:304:1234: unknown host Can you ping6 this address? Are the shown addresses configured in real or are they rewritten for public ML posting? They look a little bit strange to me. Also try $ telnet perhaps this kind will work. 
If not, telnet has perhaps no IPv6 support (which version/RPM do you use)? Check also $ which telnet to be sure you're using the right one. Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From laforge@gnumonks.org Thu Feb 13 02:08:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 02:08:52 -0800 (PST) Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DA8Y3v017661 for ; Thu, 13 Feb 2003 02:08:40 -0800 Received: from sunbeam-tap0.de.gnumonks.org ([192.168.200.2] helo=sunbeam.gnumonks.org) by coruscant.gnumonks.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 3.34 #1) id 18jGPt-00033S-00; Thu, 13 Feb 2003 11:16:33 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.35 #1) id 18jGLs-0004Fa-00; Thu, 13 Feb 2003 11:12:20 +0100 Date: Thu, 13 Feb 2003 11:12:20 +0100 From: Harald Welte To: Patrick McHardy Cc: netdev@oss.sgi.com Subject: Re: Troubles with NFS & ip_conntrack: packets go to wrong mac Message-ID: <20030213101220.GC14794@sunbeam.de.gnumonks.org> References: <3E4A7ECB.1020207@trash.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="kfjH4zxOES6UT95V" Content-Disposition: inline In-Reply-To: <3E4A7ECB.1020207@trash.net> User-Agent: Mutt/1.3.28i X-Operating-System: Linux sunbeam 2.4.20-nfpom X-Date: Today is Prickle-Prickle, the 44th day of Chaos in the YOLD 3169 X-archive-position: 1683 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@gnumonks.org Precedence: bulk X-list: netdev --kfjH4zxOES6UT95V Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Feb 12, 2003 at 06:05:15PM +0100, Patrick McHardy wrote: 
> I've been experiencing strange problems with nfs and ip_conntrack for a > while now, unfortunately no one so far was able to help. The problem > occurs when ip_conntrack is loaded on the nfs server. nfs reads hang > and the clients start logging Just for consistency: Can you please report this to http://bugzilla.netfilter.org/ ? Additional information like what kind of machine (UP/SMP), and network board might be interesting. > Regards, > Patrick -- - Harald Welte http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 --kfjH4zxOES6UT95V Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+S2+EXaXGVTD0i/8RAs+BAJ9LJCE7pzlqnfJBuEMYYvku+0RpmwCgoG6S MrzI6jBhSuyg9UO18VVNcLs= =Cu4h -----END PGP SIGNATURE----- --kfjH4zxOES6UT95V-- From ipv6_san@rediffmail.com Thu Feb 13 03:02:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 03:02:26 -0800 (PST) Received: from rediffmail.com (webmail27.rediffmail.com [203.199.83.37] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DB2K3v027842 for ; Thu, 13 Feb 2003 03:02:22 -0800 Received: (qmail 21922 invoked by uid 510); 13 Feb 2003 11:15:57 -0000 Date: 13 Feb 2003 11:15:57 -0000 Message-ID: <20030213111557.21921.qmail@webmail27.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 13 feb 2003 11:15:57 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "Peter Bieringer" Cc: netdev@oss.sgi.com Subject: Re: Re: Re: IPv6 enabled Network services Content-type: text/plain; 
format=flowed Content-Disposition: inline X-archive-position: 1684 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Dr.Peter, I'm able to ping (using ping6) from both the machines. But, when i perform telnet, in machine-1 with Linux version 2.4.18-3smp gcc version 2.96, Red Hat Linux 7.3, i get # telnet fe80::2b0:d0ff:fed2:65ff Trying fe80::2b0:d0ff:fed2:65ff telnet: connect to address fe80::2b0:d0ff:fed2:65ff: Invalid argument And the machine-2 with Linux version 2.4.18-14 (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7), i get #telnet fe80::250:daff:fed2:d90e fe80::250:daff:fed2:d90e: Unknown host In case, if i need to upgrade telnet, ftp, etc, suggest me where i can find them. Any docs for the same. looking forward for ur help -San --------------------------------------------- On Thu, 13 Feb 2003 Peter Bieringer wrote : > > >--On Thursday, February 13, 2003 08:52:38 AM +0000 santosh >kumar >gowda wrote: > > > Hi Marc > > i tried for > > telnet> 3ffe::102:304:1234 > > still i get > > 3ffe::102:304:1234: unknown host > >Can you ping6 this address? Are the shown addresses configured >in >real or are they rewritten for public ML posting? > >They look a little bit strange to me. > >Also try >$ telnet > >perhaps this kind will work. > >If not, telnet has perhaps no IPv6 support (which version/RPM do >you >use)? Check also >$ which telnet >to be sure you're using the right one. > > Peter >-- >Dr. 
Peter Bieringer >http://www.bieringer.de/pb/ >GPG/PGP Key 0x958F422D mailto: pb at bieringer dot >de >Deep Space 6 Co-Founder and Core Member >http://www.deepspace6.net/ > > From pb@bieringer.de Thu Feb 13 04:05:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 04:05:41 -0800 (PST) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DC5a3v001405 for ; Thu, 13 Feb 2003 04:05:37 -0800 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id 78F1813870; Thu, 13 Feb 2003 13:13:52 +0100 (CET) X-AV-Checked: Thu Feb 13 13:13:52 2003 smtp2.aerasec.de Received: from [192.168.1.2] (pD950F5CB.dip.t-dialin.net [217.80.245.203]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id B6EA31386F; Thu, 13 Feb 2003 13:13:51 +0100 (CET) Date: Thu, 13 Feb 2003 13:13:54 +0100 From: Peter Bieringer To: netdev@oss.sgi.com Cc: santosh kumar gowda Subject: Re: Re: Re: IPv6 enabled Network services Message-ID: <31010000.1045138434@worker.muc.bieringer.de> In-Reply-To: <20030213111557.21921.qmail@webmail27.rediffmail.com> References: <20030213111557.21921.qmail@webmail27.rediffmail.com> X-Mailer: Mulberry/3.0.1 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 1685 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Thursday, February 13, 2003 11:15:57 AM +0000 santosh kumar gowda wrote: > Dr.Peter, > > I'm able to ping (using ping6) from both the machines. 
> > But, when i perform telnet, in machine-1 with Linux version > 2.4.18-3smp gcc version 2.96, Red Hat Linux 7.3, i get > ># telnet fe80::2b0:d0ff:fed2:65ff > Trying fe80::2b0:d0ff:fed2:65ff > telnet: connect to address fe80::2b0:d0ff:fed2:65ff: Invalid > argument > > And the machine-2 with Linux version 2.4.18-14 (gcc version 3.2 > 20020903 (Red Hat Linux 8.0 3.2-7), i get > ># telnet fe80::250:daff:fed2:d90e > fe80::250:daff:fed2:d90e: Unknown host > > In case, if i need to upgrade telnet, ftp, etc, suggest me where i > can find them. Any docs for the same. looking forward for ur help Grmmml, read FAQ in HowTo: do not use link-local addresses for normal purposes, they won't work because of the scoping issue. Configure and use site-local instead. Peter -- Dr. Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From ipv6_san@rediffmail.com Thu Feb 13 04:50:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 04:50:33 -0800 (PST) Received: from rediffmail.com (webmail17.rediffmail.com [203.199.83.27] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DCoM3v002255 for ; Thu, 13 Feb 2003 04:50:24 -0800 Received: (qmail 7495 invoked by uid 510); 13 Feb 2003 13:05:14 -0000 Date: 13 Feb 2003 13:05:14 -0000 Message-ID: <20030213130514.7494.qmail@webmail17.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 13 feb 2003 13:05:14 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "Peter Bieringer" Cc: netdev@oss.sgi.com Subject: Re: Re: Re: Re: IPv6 enabled Network services Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1686 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Ooops...i still have 
some problem From machine-1 with Linux version 2.4.18-3smp gcc version 2.96, Red Hat Linux 7.3, telnet>feco::102:304:5678 Trying feco::102:304:5678 telnet: connect to address feco::102:304:5678: Connection refused why am i getting connection refused ?? and machine-2 with Linux version 2.4.18-14 (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7), telnet>feco::102:304:1357 feco::102:304:1357/28: Unknown host Response from machine-2 is quite differnet than machine-1. why ?? thanx, San ---------------------------------------- On Thu, 13 Feb 2003 Peter Bieringer wrote : > > >--On Thursday, February 13, 2003 11:15:57 AM +0000 santosh >kumar >gowda wrote: > > > Dr.Peter, > > > > I'm able to ping (using ping6) from both the machines. > > > > But, when i perform telnet, in machine-1 with Linux version > > 2.4.18-3smp gcc version 2.96, Red Hat Linux 7.3, i get > > > ># telnet fe80::2b0:d0ff:fed2:65ff > > Trying fe80::2b0:d0ff:fed2:65ff > > telnet: connect to address fe80::2b0:d0ff:fed2:65ff: Invalid > > argument > > > > And the machine-2 with Linux version 2.4.18-14 (gcc version >3.2 > > 20020903 (Red Hat Linux 8.0 3.2-7), i get > > > ># telnet fe80::250:daff:fed2:d90e > > fe80::250:daff:fed2:d90e: Unknown host > > > > In case, if i need to upgrade telnet, ftp, etc, suggest me >where i > > can find them. Any docs for the same. looking forward for ur >help > >Grmmml, read FAQ in HowTo: do not use link-local addresses for >normal >purposes, they won't work because of the scoping issue. Configure >and >use site-local instead. > > Peter >-- >Dr. 
Peter Bieringer >http://www.bieringer.de/pb/ >GPG/PGP Key 0x958F422D mailto: pb at bieringer dot >de >Deep Space 6 Co-Founder and Core Member >http://www.deepspace6.net/ From kaber@trash.net Thu Feb 13 08:32:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 08:32:38 -0800 (PST) Received: from el-zoido.localnet (port-212-202-185-113.reverse.qdsl-home.de [212.202.185.113]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DGWX3v032689 for ; Thu, 13 Feb 2003 08:32:35 -0800 Received: from trash.net (ws.localnet [192.168.0.23]) by el-zoido.localnet (8.11.6/linuxconf) with ESMTP id h1DGegG05832; Thu, 13 Feb 2003 17:40:42 +0100 Message-ID: <3E4BCA83.4060304@trash.net> Date: Thu, 13 Feb 2003 17:40:35 +0100 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: Netfilter Development Mailinglist Subject: Re: Troubles with NFS & ip_conntrack: packets go to wrong mac References: <3E4A7ECB.1020207@trash.net> In-Reply-To: <3E4A7ECB.1020207@trash.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1687 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev Some of the information in my last mail was wrong, i'll cut-and-paste the information entered into the netfilter bugtracking system, maybe someone has a suggestion. The important part is the packets checksum turns out to be right if the destination ip is replaced by the ip the mac belongs to, so the mac is correct. --cut-n-paste-- With ip_conntrack loaded on the nfs server, client reads time out. 
Clients start logging: UDP: short packet: 192.168.0.1:0 0/120 to 192.168.0.23:0 UDP: short packet: 192.168.0.1:6439 28562/120 to 192.168.0.23:60558 tcpdump shows corrupted packets: 0:e0:7d:74:ab:cc 0:e0:7d:74:ab:cd 0800 1514: 192.168.0.1 > 192.168.0.223: (frag 42878:1480@4440+) (ttl 64, len 1500, bad cksum 29ff!) 0:e0:7d:74:ab:cc 0:e0:7d:74:ab:cd 0800 1514: 192.168.0.1 > 192.168.0.223: (frag 42879:1480@4440+) (ttl 64, len 1500, bad cksum 29fe!) 0:e0:7d:74:ab:cd is not the mac of 192.168.0.223 but 192.168.0.23. Both are nfs-clients. If more nfs-clients show up more incorrect destination ips appear. The packets checksum is correct if the (incorrect) destination ip is replaced by the correct destination ip. Only single fragments have incorrect destination, the remaining fragments of a packet are fine. Packets verified (checksum) in ip_finish_output2 show no corruption. Neighbour table of nfs server: 192.168.0.223 dev eth0 lladdr 00:e0:29:3c:c1:c9 nud reachable 192.168.0.23 dev eth0 lladdr 00:e0:7d:74:ab:cd nud reachable The problem goes away as soon as ip_conntrack is unloaded. Another possibility is to set the interface mtu to 1486 on the nfs server. CPU: AMD-K6(tm) 3D processor, 256MB RAM Kernel: 2.4.21-pre3 with few netfilter patches applied, problem also exists in vanilla kernel, first noticed around 2.4.17 lspci: 00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04) 00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04) 00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev c3) 00:08.0 VGA compatible unclassified device: S3 Inc. 86c864 [Vision 864 DRAM] vers 0 00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10) 00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10) 00:0f.0 IDE interface: Acer Laboratories Inc. 
[ALi] M5229 IDE (rev c2) --end-cut-n-paste-- Bye, Patrick Patrick McHardy wrote: > I've been experiencing strange problems with nfs and ip_conntrack for > a while now, > unfortunately noone so far was able to help. The problem occurs when > ip_conntrack > is loaded on the nfs server. nfs reads hang and the clients start logging > > UDP: short packet: 192.168.0.1:0 0/120 to 192.168.0.23:0 > UDP: short packet: 192.168.0.1:6439 28562/120 to 192.168.0.23:60558 > > There are two ways to make it work: 1. remove ip_conntrack or 2. set > mtu to 1484 on > the nfs server. One suspicion was ip_conntrack breaking udp path mtu > discovery > since it seems to defragment packets with DF|MF and refragment them > (with possibly > different mtu) at POSTROUTING. This doesn't seem to be the problem, > but i noted the > nfs server sends out fragments with wrong destination mac. > > This is a packet captured on 192.168.0.23: > 0:e0:7d:74:ab:cc 0:e0:7d:74:ab:cd 0800 1514: 192.168.0.1 > > 192.168.0.223: (frag 44777:1480@4440+) (ttl 64, len 1500, bad cksum > 2294!) > > This happens every 1-30 seconds. I never saw more than one misdirected > fragment per packet. > > Relevent parts from neighbour table: > 192.168.0.223 dev eth0 lladdr 00:e0:29:3c:c1:c9 nud reachable > 192.168.0.23 dev eth0 lladdr 00:e0:7d:74:ab:cd nud reachable > > On 192.168.0.223 packets for 192.168.0.23 show up. Both clients time > out during reassembly. > I placed some printks though the netfilter code and ip_output.c but > couldn't find any further > pointers. I looked for broken checksums (something seems to alter the > ip after checksumming) > in ip_finish_output2, but everything is ok there. I can't see anything > netfilter related touching packets after that. > Any help and/or pointers where to look further would be appreciated. 
> > Regards, > Patrick > From wjhun@cisco.com Thu Feb 13 14:31:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 13 Feb 2003 14:31:22 -0800 (PST) Received: from sj-msg-core-3.cisco.com (sj-msg-core-3.cisco.com [171.70.157.152]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1DMVE3v017436 for ; Thu, 13 Feb 2003 14:31:15 -0800 Received: from vaf-lnx.cisco.com (vaf-lnx.cisco.com [128.107.165.26]) by sj-msg-core-3.cisco.com (8.12.2/8.12.6) with ESMTP id h1DMdDB6012999; Thu, 13 Feb 2003 14:39:13 -0800 (PST) Received: (from wjhun@localhost) by vaf-lnx.cisco.com (8.11.6/8.11.6) id h1DMdMW27882; Thu, 13 Feb 2003 14:39:22 -0800 Date: Thu, 13 Feb 2003 14:39:21 -0800 From: Will Jhun To: davem@redhat.com Cc: greearb@candelatech.com, netdev@oss.sgi.com Subject: Getting details about an 802.1q VLAN interface from userspace Message-ID: <20030213143921.A21977@cisco.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i X-archive-position: 1688 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wjhun@cisco.com Precedence: bulk X-list: netdev Dave, As stated below, I'm looking for a way to get specifics about an 802.1q VLAN interface from userspace. Currently, I listen to NEWLINK messages over a netlink socket and classify the different types of interfaces. I identify these VLAN devices by their name (e.g. "eth0.10", admittedly not a good idea since the vlan netdev name formats can vary) and existence in /proc/net/vlan/config. This seems kind of gross. Is there a better way to get this information via netlink or some ioctl()? Would it be useful if I (or someone) added an ioctl() type to get information about a VLAN interface? 
(vlan, ifindex of trunking (real_dev) interface, priority maps; basically the content of struct vlan_dev_info) Thanks, William ----- Forwarded message from Ben Greear ----- Date: Thu, 13 Feb 2003 14:09:41 -0800 From: Ben Greear To: Will Jhun Subject: Re: Determining if a new interface is an 802.1q VLAN interface Will Jhun wrote: > Ben, > > I have a program that listens for netlink NEWLINK updates and keeps > track of interfaces in a system. I need to know if a new interface is an > 802.1q VLAN interface and also what its trunking interface and vlan are. > Right now, I'm just deducing these from the netdevice name and > /proc/net/vlan/config, since it looks like the netdev's priv_flags is > never exported to userspace, either by ioctl() or via netlink. > > Is there a better way to do this? The flag that determines if an interface can handle VLAN is not currently exported to user-space through any ioctl that I know of. Maybe we could talk DaveM into adding such an ioctl? The code would be simple enough if you could get Dave to accept it. Please ask Dave, and CC me. (I didn't want to forward your email w/out permission...) Ben > > Thanks, > William > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear ----- End forwarded message ----- From gandalf@wlug.westbo.se Fri Feb 14 08:01:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 14 Feb 2003 08:01:54 -0800 (PST) Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1EG1Z3v024806 for ; Fri, 14 Feb 2003 08:01:36 -0800 Received: by tux.rsn.bth.se (Postfix, from userid 501) id 8394736FE7; Fri, 14 Feb 2003 16:55:03 +0100 (CET) Subject: [PATCH] zero rt_cache_stat statistics at init From: Martin Josefsson To: "David S. 
Miller" Cc: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Message-Id: <1045238103.23312.70.camel@tux.rsn.bth.se> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.1 Date: 14 Feb 2003 16:55:03 +0100 X-archive-position: 1689 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev

Hi,

I'm running 2.5.58-mm1 with slab-debugging enabled and noticed this in /proc/net/rt_cache_stat (long line):

000000ae 5b61f6cf 5ab13072 5a5a5a5a 5a5a5a5a 5a5d0c9a 5a5a5a7f 5a5a5aa6 5a61e025 5a5a961a 5a5a5aea 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a

Here's a patch against 2.5.58(-mm1) to memset() it at init, also includes a whitespace change to conform to the other sizeof's in ip_rt_init():

--- linux-2.5.58/net/ipv4/route.c.orig	2003-02-14 16:07:46.000000000 +0100
+++ linux-2.5.58/net/ipv4/route.c	2003-02-14 16:43:11.000000000 +0100
@@ -2652,11 +2652,18 @@
 	ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1);
 	ip_rt_max_size = (rt_hash_mask + 1) * 16;
 
-	rt_cache_stat = kmalloc_percpu(sizeof (struct rt_cache_stat),
+	rt_cache_stat = kmalloc_percpu(sizeof(struct rt_cache_stat),
 					GFP_KERNEL);
 	if (!rt_cache_stat)
 		goto out_enomem1;
 
+	for (i = 0; i < NR_CPUS; i++) {
+		if (!cpu_possible(i))
+			continue;
+		memset(per_cpu_ptr(rt_cache_stat, i), 0,
+		       sizeof(struct rt_cache_stat));
+	}
+
 	devinet_init();
 	ip_fib_init();
 
-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.
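
A rough sketch, not part of Martin's mail: the repeated 5a5a5a5a words in the rt_cache_stat line look like slab-debug poison (byte 0x5a) in memory that was never zeroed, with the real counters incremented on top of it. Assuming the counters are 32-bit and started from that poison value, subtracting it recovers the actual counts. Treating the first column as a separate, unpoisoned value is also an assumption; the helper name depoison is made up for illustration.

```python
POISON = 0x5A5A5A5A  # assumed start value: 32-bit word filled with poison byte 0x5a

# The line quoted in the mail above.
line = ("000000ae 5b61f6cf 5ab13072 5a5a5a5a 5a5a5a5a 5a5d0c9a 5a5a5a7f "
        "5a5a5aa6 5a61e025 5a5a961a 5a5a5aea 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a")


def depoison(hex_line, poison=POISON, skip=1):
    """Subtract the poison start value from each counter (mod 2**32).

    The first `skip` fields are passed through unchanged, on the
    assumption that they do not live in the poisoned allocation.
    """
    fields = [int(tok, 16) for tok in hex_line.split()]
    head = fields[:skip]
    tail = [(value - poison) & 0xFFFFFFFF for value in fields[skip:]]
    return head + tail


counts = depoison(line)
for raw, fixed in zip(line.split(), counts):
    print(raw, "->", fixed)
```

Read this way, the columns that still show exactly 5a5a5a5a are counters that were never incremented, which is what a memset() at init would make visible as plain zeros.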
From jgarzik@pobox.com Fri Feb 14 15:50:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 14 Feb 2003 15:50:46 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1ENoW3v028039 for ; Fri, 14 Feb 2003 15:50:34 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18jpjI-0005Mh-00; Fri, 14 Feb 2003 23:58:52 +0000 Message-ID: <3E4D8295.2050400@pobox.com> Date: Fri, 14 Feb 2003 18:58:13 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: Manfred Spraul CC: Pete Zaitcev , James Bourne , davem@redhat.com, netdev@oss.sgi.com Subject: NAPI note (was Re: lockups with 2.4.20 (tg3? net/core/dev.c|deliver_to_old_ones)) References: <3E4D66DF.3040800@colorfullife.com> In-Reply-To: <3E4D66DF.3040800@colorfullife.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1690 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Manfred Spraul wrote: > It seems to be a generic NAPI restriction: > The caller of netif_receive_skb() must not own a spinlock that is > acquired from an interrupt handler. Thanks much for noticing this, Manfred. tg3 is definitely buggy in this regard. I've CC'd netdev as an FYI... We should probably patch NAPI_HOWTO for this note. 
I note that David pointed this out as an area for improvement, so he was already thinking in this direction anyway :) Jeff From jgarzik@pobox.com Fri Feb 14 23:17:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 14 Feb 2003 23:17:06 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1F7Gx3v003211 for ; Fri, 14 Feb 2003 23:17:00 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18jwhP-0002vg-00; Sat, 15 Feb 2003 07:25:23 +0000 Message-ID: <3E4DEB47.1070909@pobox.com> Date: Sat, 15 Feb 2003 02:24:55 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: NAPI interrupt data References: <3E4DE95C.2050804@pobox.com> In-Reply-To: <3E4DE95C.2050804@pobox.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1691 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Jeff Garzik wrote: > "approximate packets per second": > > bash-2.05b$ ./x.pl data.crumb > 135 samples, 21578 avg > bash-2.05b$ ./x.pl data.hum > 130 samples, 11213 avg er. I meant _interrupts_ per second. 
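
A minimal sketch of how an average like the ones quoted above could be computed. This is not Jeff's x.pl, which is not reproduced in the thread; it assumes the data file is a series of /proc/interrupts dumps taken about one second apart, and that the counts of interest are on the line naming the device. The function names and the sample text are made up for illustration. On a shared IRQ line the result includes interrupts from the other devices on that line.

```python
def irq_totals(text, device):
    """Return one total (summed over CPUs) per snapshot line naming `device`."""
    totals = []
    for line in text.splitlines():
        fields = line.split()
        # IRQ lines look like " 11:  1000  2000  IO-APIC-level  aic7xxx, eth0"
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        counts = []
        for tok in fields[1:]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        names = [tok.strip(",") for tok in fields[1 + len(counts):]]
        if counts and device in names:
            totals.append(sum(counts))
    return totals


def average_rate(totals, interval=1.0):
    """Return (number of deltas, mean interrupts per second)."""
    if len(totals) < 2:
        raise ValueError("need at least two snapshots")
    deltas = [b - a for a, b in zip(totals, totals[1:])]
    return len(deltas), sum(deltas) / (len(deltas) * interval)


# Hypothetical three-snapshot capture, not data from the thread.
sample = """\
           CPU0       CPU1
 11:       1000       2000   IO-APIC-level  aic7xxx, eth0
           CPU0       CPU1
 11:       5000       8000   IO-APIC-level  aic7xxx, eth0
           CPU0       CPU1
 11:      10000      13000   IO-APIC-level  aic7xxx, eth0
"""

n, avg = average_rate(irq_totals(sample, "eth0"))
print(f"{n} samples, {avg:.0f} avg")
```

Since the capture loop sleeps for one second and then runs cat, the real interval is slightly longer than one second, so a figure computed this way slightly overstates the per-second rate.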
From jgarzik@pobox.com Sat Feb 15 00:03:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 15 Feb 2003 00:03:51 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1F8383v004164 for ; Sat, 15 Feb 2003 00:03:09 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18jwZT-0002sF-00; Sat, 15 Feb 2003 07:17:11 +0000 Message-ID: <3E4DE95C.2050804@pobox.com> Date: Sat, 15 Feb 2003 02:16:44 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: NAPI interrupt data Content-Type: multipart/mixed; boundary="------------050203000908090607000205" X-archive-position: 1692 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev

This is a multi-part message in MIME format.
--------------050203000908090607000205
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

I looked at my latest tg3 driver's activity in /proc/interrupts and was a bit surprised. Using "ttcp" to send 500,000 bursts from a uniprocessor P3 ("hum") to a dual athlon ("crumb"), I recorded the interrupts using the simple

while true
do
	cat /proc/interrupts >> data
	sleep 1
done

method. On hum, eth0 shared interrupts with acpi. On crumb, eth0 shared interrupts with the potentially-skewing aic7xxx.
The results of tg3[NAPI] one ttcp process on unloaded boxes are the following, in "approximate packets per second": bash-2.05b$ ./x.pl data.crumb 135 samples, 21578 avg bash-2.05b$ ./x.pl data.hum 130 samples, 11213 avg The raw sample data and compute-the-average perl script were so small that I simply attached them to this email. Feel free to check my math for something dumb. Jeff --------------050203000908090607000205 Content-Type: application/octet-stream; name="interrupt-data.tar.bz2" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="interrupt-data.tar.bz2" [base64 body of interrupt-data.tar.bz2 omitted] --------------050203000908090607000205-- From hadi@cyberus.ca Sat Feb 15 06:25:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 15 Feb 2003 06:26:04 -0800 (PST) Received: from mx01.cyberus.ca (mx01.cyberus.ca [216.191.240.22]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1FEPt3v009841 for ; Sat, 15 Feb 2003 06:25:56 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx01.cyberus.ca with esmtp (Exim 4.10) id 18k3OW-0001d8-00; Sat, 15 Feb 2003 09:34:20 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1FEY3YO016834; Sat, 15 Feb 2003 09:34:03 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1FEY29l016831; Sat, 15 Feb 2003 09:34:02 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 15 Feb 2003 09:34:02 -0500 (EST) From: jamal To: Jeff Garzik cc: netdev@oss.sgi.com, "" Subject: Re: NAPI interrupt data In-Reply-To: <3E4DE95C.2050804@pobox.com> Message-ID: <20030215092908.E16812@shell.cyberus.ca> References: <3E4DE95C.2050804@pobox.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1693 X-ecartis-version: Ecartis 
v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 15 Feb 2003, Jeff Garzik wrote: > bash-2.05b$ ./x.pl data.crumb > 135 samples, 21578 avg > bash-2.05b$ ./x.pl data.hum > 130 samples, 11213 avg > Probably the first 5-10 samples as well as the last 5-10 samples to get more accuracy. This data looks fine, no? Definitely the scsi device is skewing things (you are writing data to disk, for example). - The 500Kpps from ttcp doesn't sound right; tcp will slow you down. Perhaps use ttcp to send udp packets to get a more interesting view. cheers, jamal From jgarzik@pobox.com Sat Feb 15 10:47:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 15 Feb 2003 10:47:52 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1FIlm3v013605 for ; Sat, 15 Feb 2003 10:47:49 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18k7Tx-0008JT-00; Sat, 15 Feb 2003 18:56:14 +0000 Message-ID: <3E4E8D32.6090706@pobox.com> Date: Sat, 15 Feb 2003 13:55:46 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: NAPI interrupt data References: <3E4DE95C.2050804@pobox.com> <20030215092908.E16812@shell.cyberus.ca> In-Reply-To: <20030215092908.E16812@shell.cyberus.ca> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1694 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev jamal wrote: > > On Sat, 15 Feb 2003, Jeff Garzik wrote: > > >>bash-2.05b$ ./x.pl data.crumb 
>>135 samples, 21578 avg >>bash-2.05b$ ./x.pl data.hum >>130 samples, 11213 avg >> > > > Probably the first 5-10 samples as well as the last 5-10 amples to get > more accuracy. > > This data looks fine, no? Over 4000 interrupts per second was not something I was hoping for, to be honest. ttcp did not even report 50% CPU utilization, so I reach the conclusion that both machines can handle well in excess of 4,000 interrupts per second... but overall I do not like the unbounded nature of the interrupt rate. This data makes me lean towards a software[NAPI] + hardware mitigation solution, as opposed to totally depending on software interrupt mitigation. > definetly the scsi device is skewing things > (you are writting data to disk for example). Yes, though only once 5 seconds when ext3 flushes. With nothing else going on but "ttcp" and "cat /proc/interrupts >> data ; sleep 1" there should be very little disk I/O. I agree it is skewing by an unknown factor, however. > - The 500Kpps from ttcp doesnt sound right; tcp will slow you down. > perhaps use ttcp to send udp packets to get a more interesting view. No, I ran 500,000 buffer I/Os total from ttcp ("-n 500000"). That doesn't really say anything about packets per second. The only thing I measured was interrupts per second. 
It was my mistake to type "packets" in the first email :/ Jeff From hadi@cyberus.ca Sat Feb 15 14:06:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 15 Feb 2003 14:06:50 -0800 (PST) Received: from mx03.cyberus.ca (mx03.cyberus.ca [216.191.240.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1FM6j3v022248 for ; Sat, 15 Feb 2003 14:06:46 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx03.cyberus.ca with esmtp (Exim 4.10) id 18kAaW-0003yW-00; Sat, 15 Feb 2003 17:15:12 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1FMEtYO017428; Sat, 15 Feb 2003 17:14:55 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1FMEsjS017425; Sat, 15 Feb 2003 17:14:54 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 15 Feb 2003 17:14:53 -0500 (EST) From: jamal To: Jeff Garzik cc: netdev@oss.sgi.com, "" Subject: Re: NAPI interrupt data In-Reply-To: <3E4E8D32.6090706@pobox.com> Message-ID: <20030215164516.C16812@shell.cyberus.ca> References: <3E4DE95C.2050804@pobox.com> <20030215092908.E16812@shell.cyberus.ca> <3E4E8D32.6090706@pobox.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1695 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 15 Feb 2003, Jeff Garzik wrote: > jamal wrote: > > > > On Sat, 15 Feb 2003, Jeff Garzik wrote: > > > > > > Probably the first 5-10 samples as well as the last 5-10 amples to get > > more accuracy. > > I actually meant to say ignore those first 5-10 and last 5-10 samples -- looking at your data that wouldnt have made a big difference. > > This data looks fine, no? 
> > Over 4000 interrupts per second was not something I was hoping for, to > be honest. ttcp did not even report 50% CPU utilization, so I reach the > conclusion that both machines can handle well in excess of 4,000 > interrupts per second... but overall I do not like the unbounded nature > of the interrupt rate. This data makes me lean towards a software[NAPI] > + hardware mitigation solution, as opposed to totally depending on > software interrupt mitigation. >

Well, it is not "unbounded" per se. It scales according to the CPU capacity. For any CPU there is an upper limit input rate where the device would forever remain in polling mode. If this limit is exceeded, say on bootup, and a million packets are received in a burst, then you'll probably see only one interrupt for the million packets. If you remove that processor and add a faster one in the same motherboard you should see more than one interrupt being processed. Therefore there is an upper bound on the interrupt rate and it is dependent on the CPU capacity (not to ignore other factors like PCI bus speed, memory bandwidth etc; cpu capacity plays a much bigger role though).

Mitigation is valuable when the cost of PCI IO per packet is something that is bothersome. It becomes bothersome if the rate of input packets is such that you end up processing one packet per interrupt; as you yourself have pointed out in the past, the cost of PCI IO per packet is high with NAPI. Of course the cost of PCI IO per packet shows up in the CPU load observed. On slow CPUs this is clearly observed; Manfred's results, for example, demonstrated this. I also saw up to 8% more CPU with NAPI at a 10kpps input rate. On a fast CPU that will probably show up as 0.5% more load (so the question is who cares?). What mitigation would do in the above case is amortize the cost of PCI IO per packet. Instead of one packet, for the same PCI cost now it's 2, etc. Mitigation becomes useless on higher input rates.

In summary: adding mitigation helps in the low-rate case and doesn't harm
in the high-input-rate case.
BTW, 4k interrupts/sec is a very small rate. Try sending 5 or 6 ttcp flows
instead of one and observe.

>
> > definitely the scsi device is skewing things
> > (you are writing data to disk for example).
>
> Yes, though only once every 5 seconds when ext3 flushes. With nothing else
> going on but "ttcp" and "cat /proc/interrupts >> data ; sleep 1" there
> should be very little disk I/O. I agree it is skewing by an unknown
> factor, however.
>

There are not that many interrupts, so nothing to worry about there.
Of course, if you want cleaner results, don't share interrupts, or collect
the data from the driver instead.

>
> > - The 500Kpps from ttcp doesn't sound right; tcp will slow you down.
> > perhaps use ttcp to send udp packets to get a more interesting view.
>
> No, I ran 500,000 buffer I/Os total from ttcp ("-n 500000"). That
> doesn't really say anything about packets per second. The only thing I
> measured was interrupts per second. It was my mistake to type "packets"
> in the first email :/
>

Hit it with 10 ttcps instead, or send 2 or so UDP ttcp flows. It starts
getting interesting then ..
cheers, jamal From kazunori@miyazawa.org Sat Feb 15 22:43:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 15 Feb 2003 22:44:04 -0800 (PST) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1G6hw3v025261 for ; Sat, 15 Feb 2003 22:43:59 -0800 Received: from monza.miyazawa.org ([2001:200:1b0:1000:2d0:59ff:feab:4ac0]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Sun, 16 Feb 2003 15:36:01 +0900 Date: Sun, 16 Feb 2003 15:52:38 +0900 From: Kazunori MIyazawa To: netdev@oss.sgi.com, davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: usagi-core@linux-ipv6.org Subject: IPsec in linux-2.5.61 Message-Id: <20030216155238.5fb12777.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.8.9 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1696 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev Hello, I'm Miyazawa@USAGI Project. I'm sorry to sent mail from another address. I saw linux-2.5.61 and I think some codes is from my patch. If you think my opinion is proper, please add credits of changes MIYAZAWA Kazunori @USAGI : support IPv6 IPsec BTW, I'm trying to set the IPsec SA of IPv6, but the kernel returns ENOBUF. Is there any changes around pfkey interface? I'm using alexey's setkey in kametools. I succeeded to set IPsec SA of IPv4 and Policies. I prepare to send a patch to process the IPsec packet in IPv6 stack. 
Best Regards, --Kazunori Miyazawa (Yokogawa Electric Corporation) From laforge@netfilter.org Sun Feb 16 12:08:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 16 Feb 2003 12:09:07 -0800 (PST) Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1GK8t3v006234 for ; Sun, 16 Feb 2003 12:08:57 -0800 Received: from sunbeam-tap0.de.gnumonks.org ([192.168.200.2] helo=sunbeam.gnumonks.org) by coruscant.gnumonks.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 3.34 #1) id 18kVDr-0003Qp-00; Sun, 16 Feb 2003 21:17:22 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.35 #1) id 18kV8k-000847-00; Sun, 16 Feb 2003 21:11:54 +0100 Date: Sun, 16 Feb 2003 21:11:54 +0100 From: Harald Welte To: Patrick McHardy Cc: Don Cohen , netfilter-devel@lists.netfilter.org, netdev@oss.sgi.com Subject: Possible ip_defrag DoS ? Message-ID: <20030216201154.GA30787@sunbeam.de.gnumonks.org> Mail-Followup-To: Harald Welte , Patrick McHardy , Don Cohen , netfilter-devel@lists.netfilter.org, netdev@oss.sgi.com References: <20030215232635.25928.78900.Mailman@kashyyyk> <15950.60635.389199.836425@isis.cs3-inc.com> <3E4F0881.70302@trash.net> <15951.10496.914173.716313@isis.cs3-inc.com> <3E4F8660.5020409@trash.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="4Ckj6UjgE2iN1+kY" Content-Disposition: inline In-Reply-To: <3E4F8660.5020409@trash.net> User-Agent: Mutt/1.3.28i X-Operating-System: Linux sunbeam 2.4.20-nfpom X-Date: Today is Boomtime, the 47th day of Chaos in the YOLD 3169 X-archive-position: 1697 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@netfilter.org Precedence: bulk X-list: netdev --4Ckj6UjgE2iN1+kY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 16, 2003 at 
01:38:56PM +0100, Patrick McHardy wrote:
> Interestingly, it seems Linux defragmentation is vulnerable to a DoS attack.
> The evictor (called before defragmentation) just kills the oldest entry
> of each hash slot, starting with 0, until memory is below
> sysctl_ipfrag_low_thresh. By sending enough fragments
> (>sysctl_ipfrag_high_thresh) which hash to the highest bucket you can
> stop reassembly of valid packets.

I'm forwarding this (from netfilter-devel) to the Linux networking
developers at netdev@oss.sgi.com. If your assumption is valid, they might
want to have a look at this...

thanks.

> Patrick

--
- Harald Welte http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going on
while IP was being designed."
-- Paul Vixie --4Ckj6UjgE2iN1+kY Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+T/CKXaXGVTD0i/8RAk0YAKCHcxU3Z8IVQlfniFRJ46wDL+aKRACfaNf1 xUsNuY9XvXsdzUKesSfqERk= =2JId -----END PGP SIGNATURE----- --4Ckj6UjgE2iN1+kY-- From cloos@jhcloos.com Sun Feb 16 14:16:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 16 Feb 2003 14:16:59 -0800 (PST) Received: from ore.jhcloos.com (ore.jhcloos.com [64.240.156.239]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1GMGn3v007804 for ; Sun, 16 Feb 2003 14:16:51 -0800 Received: from lugabout.jhcloos.org (ppp40.pm3-5.buf-ch.ny.localnet.com [207.251.195.40]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client CN "lugabout.jhcloos.org", Issuer "ca.jhcloos.com" (verified OK)) by ore.jhcloos.com (Postfix) with ESMTP id 7EBA11C36F for ; Sun, 16 Feb 2003 16:25:16 -0600 (CST) Received: from lugabout.jhcloos.org (localhost [127.0.0.1]) by lugabout.jhcloos.org (Postfix on SuSE Linux 7.3 (i386)) with ESMTP id D570E343 for ; Sun, 16 Feb 2003 22:25:06 +0000 (GMT) To: netdev@oss.sgi.com Subject: e100 in 2.5.59 through 2.5.61 From: "James H. Cloos Jr." Date: 16 Feb 2003 17:25:06 -0500 Message-ID: Lines: 20 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 1698 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cloos@jhcloos.com Precedence: bulk X-list: netdev I have been unable to get the e100 on the (actiontec) mini-pci card in my laptop to work at 100baseT. It was fine when I last booted a 2.4 kernel, and from fiddling with ethtool I've found that it works fine at 10baseT half or full duplex, provided I configure autoneg off. The switch is fine with other 100 devices, such as the printer and the 802.11 gateway. 
It is also happy with the 3com card in the laptop's port duplicator. The cables say they are verified for gigabit ethernet, so I doubt they are the problem. Both the e100 and the eepro100 modules fail at 100baseT. Has anyone else seen this? Any thoughts on it? I've no real need right now for 100, but it is a bit of a pain.... -JimC From scott.feldman@intel.com Sun Feb 16 20:58:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 16 Feb 2003 20:58:37 -0800 (PST) Received: from caduceus.fm.intel.com (fmr02.intel.com [192.55.52.25]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1H4wX3v021990 for ; Sun, 16 Feb 2003 20:58:34 -0800 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by caduceus.fm.intel.com (8.11.6/8.11.6/d: outer.mc,v 1.51 2002/09/23 20:43:23 dmccart Exp $) with ESMTP id h1H510T09777 for ; Mon, 17 Feb 2003 05:01:00 GMT Received: from fmsmsxv040-1.fm.intel.com (fmsmsxvs040.fm.intel.com [132.233.42.124]) by talaria.fm.intel.com (8.11.6/8.11.6/d: inner.mc,v 1.28 2003/01/13 19:44:39 dmccart Exp $) with SMTP id h1H58lD04536 for ; Mon, 17 Feb 2003 05:08:47 GMT Received: from FMSMSX016.fm.intel.com ([132.233.42.195]) by fmsmsxv040-1.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003021621072219415 ; Sun, 16 Feb 2003 21:07:22 -0800 Received: by fmsmsx016.fm.intel.com with Internet Mail Service (5.5.2653.19) id <118BKL31>; Sun, 16 Feb 2003 21:07:04 -0800 Message-ID: From: "Feldman, Scott" To: "James H. Cloos Jr." 
, netdev@oss.sgi.com Subject: RE: e100 in 2.5.59 through 2.5.61 Date: Sun, 16 Feb 2003 21:06:59 -0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) content-class: urn:content-classes:message Content-Type: text/plain; charset="iso-8859-1" X-archive-position: 1699 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: scott.feldman@intel.com Precedence: bulk X-list: netdev > The switch is fine with other 100 devices, such as the > printer and the 802.11 gateway. It is also happy with the > 3com card in the laptop's port duplicator. Switch vendor/model? > Both the e100 and the eepro100 modules fail at 100baseT. Is the switch port forced or autoneg? -scott From xerox@foonet.net Sun Feb 16 21:16:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 16 Feb 2003 21:16:50 -0800 (PST) Received: from foonix.foonet.net (root@foonix.foonet.net [216.207.29.74]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1H5Gh3v022495 for ; Sun, 16 Feb 2003 21:16:44 -0800 Received: from badass (groovy.foonet.net [216.207.29.81]) by foonix.foonet.net (8.12.5/8.12.5) with ESMTP id h1H5PGrs022100 for ; Mon, 17 Feb 2003 00:25:16 -0500 From: "CIT/Paul" To: Subject: Bad problem with route cache using as a large router Date: Mon, 17 Feb 2003 00:15:22 -0500 Organization: CIT Message-ID: <004301c2d643$92567e10$4a00000a@badass> MIME-Version: 1.0 X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Importance: Normal Content-Disposition: inline Content-Type: text/plain Content-Transfer-Encoding: 7bit Content-length: 574 X-archive-position: 1700 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: xerox@foonet.net Precedence: bulk X-list: netdev I'm having a BAD cpu usage problem using linux routers for routing an extreme number of flows at high pps 
rates. Take for instance, random source ips, single destination, or random destinations at 100kpps and it destroys the route cache even on a dual p3 1.26. The k_softirqd processes spend their entire times updating the running gc on the cache and the machine can't do anything else. Is there a way to TURN OFF the route cache (Like Cisco) so that it doesn't exist any more, maybe use more of an 'adjacency cache' ? Thanks! Paul [[HTML alternate version deleted]] From erik@hensema.net Mon Feb 17 06:48:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 06:48:59 -0800 (PST) Received: from dexter.hensema.net (cc78409-a.hnglo1.ov.home.nl [212.120.97.185]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1HEms3v013665 for ; Mon, 17 Feb 2003 06:48:57 -0800 Received: from bender.home.hensema.net (bender.home.hensema.net [192.168.1.252]) by dexter.hensema.net (8.12.3/8.12.3) with ESMTP id h1HEvSer022086 for ; Mon, 17 Feb 2003 15:57:28 +0100 Received: from bender.home.hensema.net (localhost [127.0.0.1]) by bender.home.hensema.net (8.12.3/8.12.3) with ESMTP id h1HEvRcS003419 for ; Mon, 17 Feb 2003 15:57:27 +0100 Received: (from erik@localhost) by bender.home.hensema.net (8.12.3/8.12.3/Submit) id h1HEvRgQ003418 for netdev@oss.sgi.com; Mon, 17 Feb 2003 15:57:27 +0100 Date: Mon, 17 Feb 2003 15:57:27 +0100 From: Erik Hensema To: netdev@oss.sgi.com Subject: RFC: promote netfilter MARK value from IPv6 packets to sit packets Message-ID: <20030217145727.GA3413@hensema.net> Reply-To: erik@hensema.net Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.27i X-archive-position: 1701 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: erik@hensema.net Precedence: bulk X-list: netdev Hi, I'm new to the list, so be gentle with your flames ;-) On my outgoing connection to the internet I provide QoS using ratelimiting and prioritizing packets. 
However, it's a cable modem and my provider only speaks IPv4. Over the
connection I run multiple IPv6 tunnels, and I want to provide QoS on
these tunnels too. QoS, however, can only be provided on the physical
outgoing interface. At that level I can only see IPv4 packets, so it's
impossible for me to provide QoS on IPv6.

In order to be able to provide QoS on tunneled IPv6 connections, I've
created a simple patch (definitely not ready for inclusion in the kernel,
since it surely needs a configuration option) which promotes the netfilter
MARK value from the IPv6 packets to the sit packets.

Now I can mark packets using ip6tables, and on the IPv4 level I can still
differentiate between the priorities. Problem solved, I'm happy ;-)

Below is the patch, created on Linux 2.4.19:

--- net/ipv6/sit.c.orig	Mon Feb 17 15:30:41 2003
+++ net/ipv6/sit.c	Mon Feb 17 15:29:40 2003
@@ -571,6 +571,9 @@
 		}
 		if (skb->sk)
 			skb_set_owner_w(new_skb, skb->sk);
+#ifdef CONFIG_NETFILTER
+		new_skb->nfmark = skb->nfmark;
+#endif
 		dev_kfree_skb(skb);
 		skb = new_skb;
 	}

--
Erik Hensema (erik@hensema.net)

From cloos@jhcloos.com Mon Feb 17 07:55:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 07:55:16 -0800 (PST) Received: from ore.jhcloos.com (ore.jhcloos.com [64.240.156.239]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1HFt83v018622 for ; Mon, 17 Feb 2003 07:55:10 -0800 Received: from lugabout.jhcloos.org (ppp27.pm3-11.buf-ch.ny.localnet.com [207.251.195.91]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client CN "lugabout.jhcloos.org", Issuer "ca.jhcloos.com" (verified OK)) by ore.jhcloos.com (Postfix) with ESMTP id 5EC271C36F; Mon, 17 Feb 2003 10:03:33 -0600 (CST) Received: from lugabout.jhcloos.org (localhost [127.0.0.1]) by lugabout.jhcloos.org (Postfix on SuSE Linux 7.3 (i386)) with ESMTP id 017F8362; Mon, 17 Feb 2003 16:03:19 +0000 (GMT) To: "Feldman, Scott" Cc: netdev@oss.sgi.com Subject: Re: e100 in 2.5.59 through 2.5.61 References: From: "James
H. Cloos Jr." In-Reply-To: Date: 17 Feb 2003 11:03:18 -0500 Message-ID: Lines: 17 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 1702 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cloos@jhcloos.com Precedence: bulk X-list: netdev >>>>> "Scott" == Feldman, Scott writes: >> The switch is fine with other 100 devices, such as the printer and >> the 802.11 gateway. It is also happy with the 3com card in the >> laptop's port duplicator. Scott> Switch vendor/model? Good point. It is a Belkin F5D5130-8 8-port 10/100. >> Both the e100 and the eepro100 modules fail at 100baseT. Scott> Is the switch port forced or autoneg? Autoneg. -JimC From Eric.Lemoine@sun.com Mon Feb 17 08:38:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 08:38:13 -0800 (PST) Received: from s1.smtp.oleane.net (s1.smtp.oleane.net [195.25.12.3]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1HGc73v021338 for ; Mon, 17 Feb 2003 08:38:09 -0800 Received: from gbl-dhcp-198-194.europe.research.sun.com ([194.2.198.194]) by s1.smtp.oleane.net with ESMTP id h1HGkfvp007240 for ; Mon, 17 Feb 2003 17:46:41 +0100 Received: from eric by (null) with local (MasqMail 0.1.16) id 18koPh-0FG-00 for netdev@oss.sgi.com; Mon, 17 Feb 2003 17:46:41 +0100 Date: Mon, 17 Feb 2003 17:46:41 +0100 From: Eric Lemoine To: netdev@oss.sgi.com Subject: socket lock question Message-ID: <20030217164640.GC289@udine> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Warning: return path set from From: address X-archive-position: 1703 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Eric.Lemoine@ens-lyon.fr Precedence: bulk X-list: netdev Hi It seems to me that sk->slock.lock is grabbed twice successively in the same code path, 
namely in tcp_v4_rcv() and tcp_v4_do_rcv(). Does anyone know how this is possible?

--
Eric

From pavlic@de.ibm.com Mon Feb 17 10:51:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 10:51:19 -0800 (PST) Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [194.196.100.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1HIp83v024636 for ; Mon, 17 Feb 2003 10:51:09 -0800 Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23]) by d12lmsgate.de.ibm.com (8.12.3/8.12.3) with ESMTP id h1HIxbtY139268 for ; Mon, 17 Feb 2003 19:59:37 +0100 Received: from d12ml012.de.ibm.com (d12ml012_cs0 [9.165.223.54]) by d12relay02.de.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1HIxbV9283736 for ; Mon, 17 Feb 2003 19:59:37 +0100 Importance: Normal MIME-Version: 1.0 Sensitivity: To: netdev@oss.sgi.com Subject: [PATCH] : shared ipv6 cards X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001 From: "Frank Pavlic" Message-ID: Date: Mon, 17 Feb 2003 19:59:35 +0100 X-MIMETrack: Serialize by Router on D12ML012/12/M/IBM(Release 5.0.9a |January 7, 2002) at 17/02/2003 19:59:36 Content-Type: multipart/mixed; boundary="=_mixed 0067A116C1256CD0_=" X-archive-position: 1704 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavlic@de.ibm.com Precedence: bulk X-list: netdev

--=_mixed 0067A116C1256CD0_=
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi,

I ask for integration of the attached patch into the stock kernel.
The patch is against kernel version 2.5.59.

This patch allows network device drivers to write their own IPv6
address autoconfiguration function, which will then be called by the
IPv6 stack. The ipv6_generate_eui64(..) function is used as default.
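
For readers unfamiliar with the default derivation, the sketch below shows
how an Ethernet MAC address is conventionally turned into a modified EUI-64
interface identifier (RFC 2373, Appendix A), next to a variant along the
lines the rest of this message describes. It is an illustration only, not
the code from the attached patch: both function names are hypothetical, and
the byte order in which the 16-bit identifier is stored is an assumption.

```c
#include <stdint.h>
#include <string.h>

/*
 * Default case: insert 0xFFFE between the two halves of the 48-bit MAC
 * and invert the universal/local ("u") bit of the first byte.
 */
void eui64_from_mac(uint8_t eui[8], const uint8_t mac[6])
{
	memcpy(eui, mac, 3);
	eui[3] = 0xFF;
	eui[4] = 0xFE;
	memcpy(eui + 5, mac + 3, 3);
	eui[0] ^= 0x02;		/* invert the "u" bit */
}

/*
 * Hypothetical shared-card variant: a per-instance 16-bit dev_id takes
 * the place of 0xFFFE and the "u" bit is left as it is in the MAC.
 * Storing dev_id high byte first is an assumption made for this sketch.
 */
void eui64_with_dev_id(uint8_t eui[8], const uint8_t mac[6], uint16_t dev_id)
{
	memcpy(eui, mac, 3);
	eui[3] = (uint8_t)(dev_id >> 8);
	eui[4] = (uint8_t)(dev_id & 0xFF);
	memcpy(eui + 5, mac + 3, 3);
}
```

For the MAC 00:11:22:33:44:55 the default form gives 02:11:22:ff:fe:33:44:55,
while the variant with dev_id 0x0102 gives 00:11:22:01:02:33:44:55, so two
systems sharing one MAC but holding different dev_id values end up with
different interface identifiers.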
The new net_device function pointer as well as the new struct member
dev_id are needed to avoid duplicate address conflicts on Linux for zSeries
when shared OSA cards are used with IPv6, so that it is allowed to replace
the part 0xFFFE of an EUI-64 based interface identifier by another 16-bit
value.

The following changes are performed:

drivers/net/net_init.c
    initialization of dev_id and generate_eui64 in ether_setup,
    tr_setup and fddi_setup.
include/linux/netdevice.h
    introduced new function pointer generate_eui64 and new variable
    dev_id in net_device structure.
include/net/addrconf.h
    function prototype for ipv6_generate_eui64 function
net/ipv6/addrconf.c
    make ipv6_generate_eui64 non-static
    replace all ipv6_generate_eui64 calls with dev->generate_eui64
net/8021q/vlan.c
    Take it into account for VLAN devices in function
    register_802_1Q_vlan_device().

On Linux for zSeries, OSA network cards can be shared among various
Linuxes. The OSA card has only one MAC address. This leads to duplicate
address conflicts in conjunction with IPv6 and a vanilla kernel if more
than one Linux uses the same card. But the device driver for the card can
deliver a unique 16-bit identifier for each Linux sharing the same card.
This identifier is placed instead of the part 0xFFFE in the interface
identifier. The "u" bit of the interface identifier is not inverted when
the new feature is used. Hence the resulting interface identifier has
local scope according to RFC2373. Consequently this change of the
autoconfiguration does not violate any RFCs.

Mit freundlichen Grüssen / Best regards

Frank Pavlic
Linux for eServer Development
Schoenaicher Str. 220, 71032 Boeblingen
Phone: ext. +49-(0)7031/16-2463, int.
*120-2463 mailto: pavlic@de.ibm.com --=_mixed 0067A116C1256CD0_= Content-Type: application/octet-stream; name="shared-ipv6.diff" Content-Disposition: attachment; filename="shared-ipv6.diff" Content-Transfer-Encoding: base64 ZGlmZiAtTmF1ciBsaW51eC0yLjUuNTktb2xkL2RyaXZlcnMvbmV0L25ldF9p bml0LmMgbGludXgtMi41LjU5L2RyaXZlcnMvbmV0L25ldF9pbml0LmMKLS0t IGxpbnV4LTIuNS41OS1vbGQvZHJpdmVycy9uZXQvbmV0X2luaXQuYwkyMDAz LTAyLTE3IDE4OjQ4OjQwLjAwMDAwMDAwMCArMDEwMAorKysgbGludXgtMi41 LjU5L2RyaXZlcnMvbmV0L25ldF9pbml0LmMJMjAwMy0wMi0xNyAxODo1Mjo1 Ny4wMDAwMDAwMDAgKzAxMDAKQEAgLTQxMyw3ICs0MTMsOCBAQAogCWRldi0+ aGFyZF9oZWFkZXJfY2FjaGUJPSBldGhfaGVhZGVyX2NhY2hlOwogCWRldi0+ aGVhZGVyX2NhY2hlX3VwZGF0ZT0gZXRoX2hlYWRlcl9jYWNoZV91cGRhdGU7 CiAJZGV2LT5oYXJkX2hlYWRlcl9wYXJzZQk9IGV0aF9oZWFkZXJfcGFyc2U7 Ci0KKwlkZXYtPmdlbmVyYXRlX2V1aTY0ICAgICA9IE5VTEw7CisJZGV2LT5k ZXZfaWQgICAgICAgICAgICAgPSAwOwogCWRldi0+dHlwZQkJPSBBUlBIUkRf RVRIRVI7CiAJZGV2LT5oYXJkX2hlYWRlcl9sZW4gCT0gRVRIX0hMRU47CiAJ ZGV2LT5tdHUJCT0gMTUwMDsgLyogZXRoX210dSAqLwpAQCAtNDM5LDYgKzQ0 MCw4IEBACiAJZGV2LT5jaGFuZ2VfbXR1CQkJPSBmZGRpX2NoYW5nZV9tdHU7 CiAJZGV2LT5oYXJkX2hlYWRlcgkJPSBmZGRpX2hlYWRlcjsKIAlkZXYtPnJl YnVpbGRfaGVhZGVyCQk9IGZkZGlfcmVidWlsZF9oZWFkZXI7CisJZGV2LT5n ZW5lcmF0ZV9ldWk2NCAgICAgICAgICAgICA9IE5VTEw7CisJZGV2LT5kZXZf aWQgICAgICAgICAgICAgICAgICAgICA9IDA7CiAKIAlkZXYtPnR5cGUJCQkJ PSBBUlBIUkRfRkRESTsKIAlkZXYtPmhhcmRfaGVhZGVyX2xlbgk9IEZERElf S19TTkFQX0hMRU4rMzsJLyogQXNzdW1lIDgwMi4yIFNOQVAgaGRyIGxlbiAr IDMgcGFkIGJ5dGVzICovCkBAIC01ODQsNiArNTg3LDggQEAKIAkKIAlkZXYt PmhhcmRfaGVhZGVyCT0gdHJfaGVhZGVyOwogCWRldi0+cmVidWlsZF9oZWFk ZXIJPSB0cl9yZWJ1aWxkX2hlYWRlcjsKKwlkZXYtPmdlbmVyYXRlX2V1aTY0 ICAgICA9IE5VTEw7CisJZGV2LT5kZXZfaWQgICAgICAgICAgICAgPSAwOwog CiAJZGV2LT50eXBlCQk9IEFSUEhSRF9JRUVFODAyX1RSOwogCWRldi0+aGFy ZF9oZWFkZXJfbGVuCT0gVFJfSExFTjsKZGlmZiAtTmF1ciBsaW51eC0yLjUu NTktb2xkL2luY2x1ZGUvbGludXgvbmV0ZGV2aWNlLmggbGludXgtMi41LjU5 L2luY2x1ZGUvbGludXgvbmV0ZGV2aWNlLmgKLS0tIGxpbnV4LTIuNS41OS1v bGQvaW5jbHVkZS9saW51eC9uZXRkZXZpY2UuaAkyMDAzLTAyLTE3IDE4OjQ4 
OjU4LjAwMDAwMDAwMCArMDEwMAorKysgbGludXgtMi41LjU5L2luY2x1ZGUv bGludXgvbmV0ZGV2aWNlLmgJMjAwMy0wMi0xNyAxODo1MTo0My4wMDAwMDAw MDAgKzAxMDAKQEAgLTQyMiw3ICs0MjIsNyBAQAogCQkJCQkJICAgICB1bnNp Z25lZCBjaGFyICpoYWRkcik7CiAJaW50CQkJKCpuZWlnaF9zZXR1cCkoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldiwgc3RydWN0IG5laWdoX3Bhcm1zICopOwog CWludAkJCSgqYWNjZXB0X2Zhc3RwYXRoKShzdHJ1Y3QgbmV0X2RldmljZSAq LCBzdHJ1Y3QgZHN0X2VudHJ5Kik7Ci0KKwlpbnQgICAgICAgICAgICAgICAg ICAgICAoKmdlbmVyYXRlX2V1aTY0KSh1OCAqZXVpLCBzdHJ1Y3QgbmV0X2Rl dmljZSAqZGV2KTsKIAkvKiBvcGVuL3JlbGVhc2UgYW5kIHVzYWdlIG1hcmtp bmcgKi8KIAlzdHJ1Y3QgbW9kdWxlICpvd25lcjsKIApAQCAtNDQwLDYgKzQ0 MCw4IEBACiAJc3RydWN0IGRpdmVydF9ibGsJKmRpdmVydDsKICNlbmRpZiAv KiBDT05GSUdfTkVUX0RJVkVSVCAqLwogCisJLyogdXNlIGRldl9pZCBpbiBj b25qdW5jdGlvbiB3aXRoIHNoYXJlZCBuZXR3b3JrIGNhcmRzKi8KKwl1bnNp Z25lZCBzaG9ydCAgICAgICAgICAgZGV2X2lkOyAKIAkvKiBnZW5lcmljIG9i amVjdCByZXByZXNlbnRhdGlvbiAqLwogCXN0cnVjdCBrb2JqZWN0IGtvYmo7 CiB9OwpkaWZmIC1OYXVyIGxpbnV4LTIuNS41OS1vbGQvaW5jbHVkZS9uZXQv YWRkcmNvbmYuaCBsaW51eC0yLjUuNTkvaW5jbHVkZS9uZXQvYWRkcmNvbmYu aAotLS0gbGludXgtMi41LjU5LW9sZC9pbmNsdWRlL25ldC9hZGRyY29uZi5o CTIwMDMtMDItMTcgMTg6NDg6NTkuMDAwMDAwMDAwICswMTAwCisrKyBsaW51 eC0yLjUuNTkvaW5jbHVkZS9uZXQvYWRkcmNvbmYuaAkyMDAzLTAyLTE3IDE4 OjUyOjE0LjAwMDAwMDAwMCArMDEwMApAQCAtNTksNyArNTksNyBAQAogCQkJ CQkgICAgICAgc3RydWN0IGluNl9hZGRyICpkYWRkciwKIAkJCQkJICAgICAg IHN0cnVjdCBpbjZfYWRkciAqc2FkZHIpOwogZXh0ZXJuIGludAkJCWlwdjZf Z2V0X2xsYWRkcihzdHJ1Y3QgbmV0X2RldmljZSAqZGV2LCBzdHJ1Y3QgaW42 X2FkZHIgKik7Ci0KK2V4dGVybiBpbnQgICAgICAgICAgICAgICAgICAgICAg aXB2Nl9nZW5lcmF0ZV9ldWk2NCh1OCAqZXVpLCBzdHJ1Y3QgbmV0X2Rldmlj ZSAqZGV2KTsKIC8qCiAgKgltdWx0aWNhc3QgcHJvdG90eXBlcyAobWNhc3Qu YykKICAqLwpkaWZmIC1OYXVyIGxpbnV4LTIuNS41OS1vbGQvbmV0LzgwMjFx L3ZsYW4uYyBsaW51eC0yLjUuNTkvbmV0LzgwMjFxL3ZsYW4uYwotLS0gbGlu dXgtMi41LjU5LW9sZC9uZXQvODAyMXEvdmxhbi5jCTIwMDMtMDItMTcgMTg6 NDk6NDcuMDAwMDAwMDAwICswMTAwCisrKyBsaW51eC0yLjUuNTkvbmV0Lzgw MjFxL3ZsYW4uYwkyMDAzLTAyLTE3IDE4OjUzOjQ4LjAwMDAwMDAwMCArMDEw 
MApAQCAtNDQ0LDEwICs0NDQsMTQgQEAKIAkvKiBJRkZfQlJPQURDQVNUfElG Rl9NVUxUSUNBU1Q7ID8/PyAqLwogCW5ld19kZXYtPmZsYWdzID0gcmVhbF9k ZXYtPmZsYWdzOwogCW5ld19kZXYtPmZsYWdzICY9IH5JRkZfVVA7Ci0KKwkK IAkvKiBNYWtlIHRoaXMgdGhpbmcga25vd24gYXMgYSBWTEFOIGRldmljZSAq LwogCW5ld19kZXYtPnByaXZfZmxhZ3MgfD0gSUZGXzgwMl8xUV9WTEFOOwot CQkJCQorCQorCS8qIGlwdjYgc2hhcmVkIGNhcmQgcmVsYXRlZCBzdHVmZiAq LworCW5ld19kZXYtPmRldl9pZCA9IHJlYWxfZGV2LT5kZXZfaWQ7CisJbmV3 X2Rldi0+Z2VuZXJhdGVfZXVpNjQgPSByZWFsX2Rldi0+Z2VuZXJhdGVfZXVp NjQ7CisKIAkvKiBuZWVkIDQgYnl0ZXMgZm9yIGV4dHJhIFZMQU4gaGVhZGVy IGluZm8sCiAJICogaG9wZSB0aGUgdW5kZXJseWluZyBkZXZpY2UgY2FuIGhh bmRsZSBpdC4KIAkgKi8KZGlmZiAtTmF1ciBsaW51eC0yLjUuNTktb2xkL25l dC9pcHY2L2FkZHJjb25mLmMgbGludXgtMi41LjU5L25ldC9pcHY2L2FkZHJj b25mLmMKLS0tIGxpbnV4LTIuNS41OS1vbGQvbmV0L2lwdjYvYWRkcmNvbmYu YwkyMDAzLTAyLTE3IDE4OjQ5OjQ3LjAwMDAwMDAwMCArMDEwMAorKysgbGlu dXgtMi41LjU5L25ldC9pcHY2L2FkZHJjb25mLmMJMjAwMy0wMi0xNyAxODo1 NDoxOS4wMDAwMDAwMDAgKzAxMDAKQEAgLTY4NCw3ICs2ODQsNyBAQAogfQog CiAKLXN0YXRpYyBpbnQgaXB2Nl9nZW5lcmF0ZV9ldWk2NCh1OCAqZXVpLCBz dHJ1Y3QgbmV0X2RldmljZSAqZGV2KQoraW50IGlwdjZfZ2VuZXJhdGVfZXVp NjQodTggKmV1aSwgc3RydWN0IG5ldF9kZXZpY2UgKmRldikKIHsKIAlzd2l0 Y2ggKGRldi0+dHlwZSkgewogCWNhc2UgQVJQSFJEX0VUSEVSOgpAQCAtODk1 LDcgKzg5NSw3IEBACiAKIAkJaWYgKHBpbmZvLT5wcmVmaXhfbGVuID09IDY0 KSB7CiAJCQltZW1jcHkoJmFkZHIsICZwaW5mby0+cHJlZml4LCA4KTsKLQkJ CWlmIChpcHY2X2dlbmVyYXRlX2V1aTY0KGFkZHIuczZfYWRkciArIDgsIGRl dikgJiYKKwkJCWlmIChkZXYtPmdlbmVyYXRlX2V1aTY0KGFkZHIuczZfYWRk ciArIDgsIGRldikgJiYKIAkJCSAgICBpcHY2X2luaGVyaXRfZXVpNjQoYWRk ci5zNl9hZGRyICsgOCwgaW42X2RldikpIHsKIAkJCQlpbjZfZGV2X3B1dChp bjZfZGV2KTsKIAkJCQlyZXR1cm47CkBAIC0xMjQwLDggKzEyNDAsOSBAQAog CiAJbWVtc2V0KCZhZGRyLCAwLCBzaXplb2Yoc3RydWN0IGluNl9hZGRyKSk7 CiAJYWRkci5zNl9hZGRyMzJbMF0gPSBodG9ubCgweEZFODAwMDAwKTsKLQot CWlmIChpcHY2X2dlbmVyYXRlX2V1aTY0KGFkZHIuczZfYWRkciArIDgsIGRl dikgPT0gMCkKKwlpZiAoIWRldi0+Z2VuZXJhdGVfZXVpNjQpIAorCQlkZXYt PmdlbmVyYXRlX2V1aTY0ID0gaXB2Nl9nZW5lcmF0ZV9ldWk2NDsKKwlpZiAo 
ZGV2LT5nZW5lcmF0ZV9ldWk2NChhZGRyLnM2X2FkZHIgKyA4LCBkZXYpID09 IDApCiAJCWFkZHJjb25mX2FkZF9saW5rbG9jYWwoaWRldiwgJmFkZHIpOwog fQogCg== --=_mixed 0067A116C1256CD0_=-- From davem@redhat.com Mon Feb 17 18:36:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 18:36:11 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1I2a63v009751 for ; Mon, 17 Feb 2003 18:36:07 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA14284; Mon, 17 Feb 2003 18:29:15 -0800 Date: Mon, 17 Feb 2003 18:29:15 -0800 (PST) Message-Id: <20030217.182915.41644338.davem@redhat.com> To: wjhun@cisco.com Cc: greearb@candelatech.com, netdev@oss.sgi.com Subject: Re: Getting details about an 802.1q VLAN interface from userspace From: "David S. Miller" In-Reply-To: <20030213143921.A21977@cisco.com> References: <20030213143921.A21977@cisco.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1705 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Will Jhun Date: Thu, 13 Feb 2003 14:39:21 -0800 Is there a better way to get this information via netlink or some ioctl()? Would it be useful if I (or someone) added an ioctl() type to get information about a VLAN interface? (vlan, ifindex of trunking (real_dev) interface, priority maps; basically the content of struct vlan_dev_info) I would accept new netlink interfaces to get and set this information. 
ioctls are gross and will not be tolerated :) From davem@redhat.com Mon Feb 17 19:04:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 19:04:18 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1I34A3v010833 for ; Mon, 17 Feb 2003 19:04:11 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id SAA14393; Mon, 17 Feb 2003 18:57:19 -0800 Date: Mon, 17 Feb 2003 18:57:19 -0800 (PST) Message-Id: <20030217.185719.28797590.davem@redhat.com> To: jgarzik@pobox.com Cc: manfred@colorfullife.com, zaitcev@redhat.com, jbourne@mtroyal.ab.ca, netdev@oss.sgi.com Subject: Re: NAPI note From: "David S. Miller" In-Reply-To: <3E4D8295.2050400@pobox.com> References: <3E4D66DF.3040800@colorfullife.com> <3E4D8295.2050400@pobox.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1706 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Fri, 14 Feb 2003 18:58:13 -0500 Manfred Spraul wrote: > It seems to be a generic NAPI restriction: > The caller of netif_receive_skb() must not own a spinlock that is > acquired from an interrupt handler. Thanks much for noticing this, Manfred. I think this logic is buggy. In the example I've seen posted, only a NAPI implementation bug could cause the situation to occur. If cpu1 is in ->poll() for the driver, then by definition the device shall not cause interrupts. The device's interrupts are disabled before we enter the ->poll() handler, and as such the "cpu2 take device interrupt and takes driver->lock" cannot occur. 
If anything we've found a bug in interrupt disabling in the tg3 driver. From jgarzik@pobox.com Mon Feb 17 22:46:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 17 Feb 2003 22:46:55 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1I6ki3v014278 for ; Mon, 17 Feb 2003 22:46:46 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18l1ey-0008Rl-00; Tue, 18 Feb 2003 06:55:20 +0000 Message-ID: <3E51D8BF.1020804@pobox.com> Date: Tue, 18 Feb 2003 01:54:55 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [code] new NAPI helper functions Content-Type: multipart/mixed; boundary="------------020102020808000908050304" X-archive-position: 1707 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------020102020808000908050304 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Here are some functions in tg3, that I would like to eventually make available to all net drivers (and other net stack users). 
--------------020102020808000908050304
Content-Type: text/x-csrc; name="x.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="x.c"

/* these three netif_xxx funcs should be moved into generic net layer */
static void netif_poll_disable(struct net_device *dev)
{
	while (test_and_set_bit(__LINK_STATE_RX_SCHED, &dev->state)) {
		current->state = TASK_INTERRUPTIBLE;
		schedule_timeout(1);
	}
}

static inline void netif_poll_enable(struct net_device *dev)
{
	clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
}

/* same as netif_rx_complete, except that local_irq_save(flags)
 * has already been issued
 */
static inline void __netif_rx_complete(struct net_device *dev)
{
	if (!test_bit(__LINK_STATE_RX_SCHED, &dev->state))
		BUG();
	list_del(&dev->poll_list);
	clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
}

--------------020102020808000908050304--

From davem@redhat.com Tue Feb 18 00:00:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 00:01:28 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1I80o3v017655 for ; Tue, 18 Feb 2003 00:00:51 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA15137; Mon, 17 Feb 2003 23:53:58 -0800 Date: Mon, 17 Feb 2003 23:53:57 -0800 (PST) Message-Id: <20030217.235357.91322374.davem@redhat.com> To: jgarzik@pobox.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [code] new NAPI helper functions From: "David S. Miller" In-Reply-To: <3E51D8BF.1020804@pobox.com> References: <3E51D8BF.1020804@pobox.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1708 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 18 Feb 2003 01:54:55 -0500 Here are some functions in tg3, that I would like to eventually make available to all net drivers (and other net stack users). Looks fine by me. From erik@hensema.net Tue Feb 18 04:04:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 04:05:06 -0800 (PST) Received: from dexter.hensema.net (cc78409-a.hnglo1.ov.home.nl [212.120.97.185]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1IC4s3v025106 for ; Tue, 18 Feb 2003 04:04:56 -0800 Received: from bender.home.hensema.net (bender.home.hensema.net [192.168.1.252]) by dexter.hensema.net (8.12.3/8.12.3) with ESMTP id h1ICDWer017402 for ; Tue, 18 Feb 2003 13:13:32 +0100 Received: from bender.home.hensema.net (localhost [127.0.0.1]) by bender.home.hensema.net (8.12.3/8.12.3) with ESMTP id h1ICDW58005857 for ; Tue, 18 Feb 2003 13:13:32 +0100 Received: (from erik@localhost) by bender.home.hensema.net (8.12.3/8.12.3/Submit) id h1ICDVue005856 for netdev@oss.sgi.com; Tue, 18 Feb 2003 13:13:31 +0100 Date: Tue, 18 Feb 2003 13:13:31 +0100 From: Erik Hensema To: netdev@oss.sgi.com Subject: [Patch 2.4.21-pre4]: promote netfilter MARK values to sit packets Message-ID: <20030218121331.GA5848@hensema.net> Reply-To: erik@hensema.net Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="0OAP2g/MAC+5xKAE" Content-Disposition: inline User-Agent: Mutt/1.3.27i X-archive-position: 1709 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: erik@hensema.net Precedence: bulk X-list: netdev --0OAP2g/MAC+5xKAE Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline This is the patch I sent to the list yesterday, this time including config options and help. The code should apply to at least 2.4.19 and up, the Configure.help patch is made against 2.4.21-pre4. Is this material for lkml and/or Marcello yet? -- Erik Hensema (erik@hensema.net) --0OAP2g/MAC+5xKAE Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="sit-promote-mark-2.4.21-pre4.diff" diff -ur linux-2.4.21-pre4/Documentation/Configure.help linux-2.4.21-pre4.patched/Documentation/Configure.help --- linux-2.4.21-pre4/Documentation/Configure.help Tue Feb 18 12:54:18 2003 +++ linux-2.4.21-pre4.patched/Documentation/Configure.help Tue Feb 18 12:52:16 2003 @@ -5627,6 +5627,18 @@ It is safe to say N here for now. +IPv6: Promote netfilter MARK value to sit packets +CONFIG_IPV6_SIT_PROMOTE_MARK + If you use IPv6-in-IPv4 tunnels, you can use this option to mark + packets using ip6tables, and then match the sit (tunnel) packets + using iptables on the IPv4 level, or a tc fw match on the physical + outgoing interface. + + You need this if you want to provide QoS on a tunnelled IPv6 + connection. + + If unsure, say N. 
+ Kernel httpd acceleration CONFIG_KHTTPD The kernel httpd acceleration daemon (kHTTPd) is a (limited) web diff -ur linux-2.4.21-pre4/net/ipv6/Config.in linux-2.4.21-pre4.patched/net/ipv6/Config.in --- linux-2.4.21-pre4/net/ipv6/Config.in Fri Dec 21 18:42:05 2001 +++ linux-2.4.21-pre4.patched/net/ipv6/Config.in Tue Feb 18 12:48:50 2003 @@ -7,4 +7,8 @@ if [ "$CONFIG_NETFILTER" != "n" ]; then source net/ipv6/netfilter/Config.in + + if [ "$CONFIG_IP6_NF_IPTABLES" != "n" ] ; then + bool ' IPv6: Promote netfilter MARK value to sit packets' CONFIG_IPV6_SIT_PROMOTE_MARK + fi fi diff -ur linux-2.4.21-pre4/net/ipv6/sit.c linux-2.4.21-pre4.patched/net/ipv6/sit.c --- linux-2.4.21-pre4/net/ipv6/sit.c Fri Nov 29 00:53:15 2002 +++ linux-2.4.21-pre4.patched/net/ipv6/sit.c Tue Feb 18 12:48:23 2003 @@ -571,6 +571,9 @@ } if (skb->sk) skb_set_owner_w(new_skb, skb->sk); +#ifdef CONFIG_IPV6_SIT_PROMOTE_MARK + new_skb->nfmark = skb->nfmark; +#endif dev_kfree_skb(skb); skb = new_skb; } --0OAP2g/MAC+5xKAE-- From erik@hensema.net Tue Feb 18 04:06:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 04:07:04 -0800 (PST) Received: from dexter.hensema.net (cc78409-a.hnglo1.ov.home.nl [212.120.97.185]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1IC6r3v025453 for ; Tue, 18 Feb 2003 04:06:55 -0800 Received: from bender.home.hensema.net (bender.home.hensema.net [192.168.1.252]) by dexter.hensema.net (8.12.3/8.12.3) with ESMTP id h1ICFVer017584 for ; Tue, 18 Feb 2003 13:15:31 +0100 Received: from bender.home.hensema.net (localhost [127.0.0.1]) by bender.home.hensema.net (8.12.3/8.12.3) with ESMTP id h1ICFV58005883 for ; Tue, 18 Feb 2003 13:15:31 +0100 Received: (from erik@localhost) by bender.home.hensema.net (8.12.3/8.12.3/Submit) id h1ICFVw0005882 for netdev@oss.sgi.com; Tue, 18 Feb 2003 13:15:31 +0100 Date: Tue, 18 Feb 2003 13:15:31 +0100 From: Erik Hensema To: netdev@oss.sgi.com Subject: [Patch 2.4.21-pre4] Move khttpd config option Message-ID: 
<20030218121531.GB5848@hensema.net> Reply-To: erik@hensema.net Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="JP+T4n/bALQSJXh8" Content-Disposition: inline User-Agent: Mutt/1.3.27i X-archive-position: 1710 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: erik@hensema.net Precedence: bulk X-list: netdev --JP+T4n/bALQSJXh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Currently the configure option for khttpd is below IPv6. This is confusing when IPv6 has suboptions, like with the sit-promote-mark patch. This patch moves the option. -- Erik Hensema (erik@hensema.net) --JP+T4n/bALQSJXh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="move-khttpd.diff" --- net/Config.in.orig Sat Aug 3 02:39:46 2002 +++ net/Config.in Tue Feb 18 13:04:58 2003 @@ -20,14 +20,14 @@ if [ "$CONFIG_INET" = "y" ]; then source net/ipv4/Config.in if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then + source net/khttpd/Config.in + if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then # IPv6 as module will cause a CRASH if you try to unload it tristate ' The IPv6 protocol (EXPERIMENTAL)' CONFIG_IPV6 if [ "$CONFIG_IPV6" != "n" ]; then source net/ipv6/Config.in fi fi - if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then - source net/khttpd/Config.in fi fi if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then --JP+T4n/bALQSJXh8-- From gandalf@wlug.westbo.se Tue Feb 18 07:44:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 07:44:52 -0800 (PST) Received: from tux.rsn.bth.se (postfix@tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1IFid3v005111 for ; Tue, 18 Feb 2003 07:44:40 -0800 Received: by tux.rsn.bth.se (Postfix, from userid 501) id F363C36FE4; Tue, 18 Feb 2003 16:17:45 +0100 (CET) Subject: [PATCH resend] zero rt_cache_stat statistics at init From: Martin Josefsson To: "David S. 
Miller"
Cc: netdev@oss.sgi.com
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization:
Message-Id: <1045581465.18515.139.camel@tux.rsn.bth.se>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.1
Date: 18 Feb 2003 16:17:45 +0100
X-archive-position: 1711
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: gandalf@wlug.westbo.se
Precedence: bulk
X-list: netdev

Hi,

I'm running 2.5.58-mm1 with slab-debugging enabled and noticed this in
/proc/net/rt_cache_stat (long line):

000000ae 5b61f6cf 5ab13072 5a5a5a5a 5a5a5a5a 5a5d0c9a 5a5a5a7f 5a5a5aa6 5a61e025 5a5a961a 5a5a5aea 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a

Here's a patch against 2.5.58(-mm1) to memset() it at init, I believe
this was missed when it was converted to percpu. It also includes a
whitespace change to conform to the other sizeof's in ip_rt_init():

--- linux-2.5.58/net/ipv4/route.c.orig	2003-02-14 16:07:46.000000000 +0100
+++ linux-2.5.58/net/ipv4/route.c	2003-02-14 16:43:11.000000000 +0100
@@ -2652,11 +2652,18 @@
 	ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1);
 	ip_rt_max_size = (rt_hash_mask + 1) * 16;
 
-	rt_cache_stat = kmalloc_percpu(sizeof (struct rt_cache_stat),
+	rt_cache_stat = kmalloc_percpu(sizeof(struct rt_cache_stat),
 					GFP_KERNEL);
 	if (!rt_cache_stat)
 		goto out_enomem1;
 
+	for (i = 0; i < NR_CPUS; i++) {
+		if (!cpu_possible(i))
+			continue;
+		memset(per_cpu_ptr(rt_cache_stat, i), 0,
+			sizeof(struct rt_cache_stat));
+	}
+
 	devinet_init();
 	ip_fib_init();
 
-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.
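
[Editorial sketch, not part of the original post.] The effect described in the message above (slab debugging fills fresh allocations with 0x5a, so counters that are never zeroed read back as 5a5a5a5a) can be modelled in userspace roughly as follows. This is only an illustration under stated assumptions: the sketch_* names, the three counter fields, the fixed CPU count, and the "CPU 3 is not possible" rule are all hypothetical stand-ins, and plain malloc() plus memset() stands in for kmalloc_percpu() with poisoning.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NR_CPUS 4     /* hypothetical CPU count for the model */
#define SKETCH_POISON  0x5a  /* poison byte, as seen in the report above */

/* Illustrative subset of counters; not the real struct rt_cache_stat. */
struct sketch_rt_cache_stat {
	unsigned int in_hit;
	unsigned int in_slow_tot;
	unsigned int out_hit;
};

/* Stand-in for kmalloc_percpu() under slab debugging: memory comes back
 * poisoned, not zeroed. */
struct sketch_rt_cache_stat *sketch_alloc_percpu(void)
{
	size_t len = SKETCH_NR_CPUS * sizeof(struct sketch_rt_cache_stat);
	struct sketch_rt_cache_stat *p = malloc(len);

	if (p)
		memset(p, SKETCH_POISON, len);
	return p;
}

/* Stand-in for cpu_possible(): pretend the last CPU slot is not possible. */
int sketch_cpu_possible(int cpu)
{
	return cpu != SKETCH_NR_CPUS - 1;
}

/* Mirrors the loop added by the patch: zero the slot of every possible CPU. */
struct sketch_rt_cache_stat *sketch_stats_init(void)
{
	struct sketch_rt_cache_stat *stats = sketch_alloc_percpu();
	int i;

	if (!stats)
		return NULL;

	for (i = 0; i < SKETCH_NR_CPUS; i++) {
		if (!sketch_cpu_possible(i))
			continue;
		memset(&stats[i], 0, sizeof(stats[i]));
	}
	return stats;
}
```

In this model the slots of possible CPUs read back as zero after sketch_stats_init(), while the skipped slot still holds the poison pattern, which is the symptom the original report shows for the unpatched kernel.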
From manfred@colorfullife.com Tue Feb 18 08:22:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 08:22:57 -0800 (PST) Received: from dbl.q-ag.de (IDENT:root@dbl.q-ag.de [80.146.160.66]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1IGMp3v006992 for ; Tue, 18 Feb 2003 08:22:53 -0800 Received: from colorfullife.com (localhost.localdomain [127.0.0.1]) by dbl.q-ag.de (8.12.5/8.12.5) with ESMTP id h1IGVKn1016101; Tue, 18 Feb 2003 17:31:21 +0100 Message-ID: <3E525FD8.1060009@colorfullife.com> Date: Tue, 18 Feb 2003 17:31:20 +0100 From: Manfred Spraul User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021202 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: jgarzik@pobox.com, zaitcev@redhat.com, jbourne@mtroyal.ab.ca, netdev@oss.sgi.com Subject: Re: NAPI note References: <3E4D66DF.3040800@colorfullife.com> <3E4D8295.2050400@pobox.com> <20030217.185719.28797590.davem@redhat.com> In-Reply-To: <20030217.185719.28797590.davem@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1712 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: manfred@colorfullife.com Precedence: bulk X-list: netdev David S. Miller wrote: > From: Jeff Garzik > Date: Fri, 14 Feb 2003 18:58:13 -0500 > > Manfred Spraul wrote: > > It seems to be a generic NAPI restriction: > > The caller of netif_receive_skb() must not own a spinlock that is > > acquired from an interrupt handler. > > Thanks much for noticing this, Manfred. > >I think this logic is buggy. > >In the example I've seen posted, only a NAPI implementation bug >could cause the situation to occur. > >If cpu1 is in ->poll() for the driver, then by definition the >device shall not cause interrupts. 
The device's interrupts
>are disabled before we enter the ->poll() handler, and as such
>the "cpu2 take device interrupt and takes driver->lock" cannot
>occur.
>
>

No. I think the rule is that drivers that use the NAPI interface must
not cause interrupts for packet receive and out-of-rx-buffers conditions.
But what about media error interrupts, or tx interrupts? Or MIB counter
overflow, etc. What about shared pci interrupts?
All of them could occur, and if they take a spinlock that is held across
netif_receive_skb(), then it can deadlock.

OTOH, if it's guaranteed that no interrupt occurs, then the nic should
not take a spinlock at all and should rely on the synchronization
provided by NAPI (->poll is single-threaded).

--
    Manfred

From garzik@gtf.org Tue Feb 18 10:24:33 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 10:24:41 -0800 (PST)
Received: from havoc.gtf.org (havoc.daloft.com [64.213.145.173]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1IIOW3v010649 for ; Tue, 18 Feb 2003 10:24:33 -0800
Received: by havoc.gtf.org (Postfix, from userid 500) id 510156648; Tue, 18 Feb 2003 13:33:06 -0500 (EST)
Date: Tue, 18 Feb 2003 13:33:06 -0500
From: Jeff Garzik
To: linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: dev->xmit_lock_owner?
Message-ID: <20030218183306.GA31478@gtf.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.28i
X-archive-position: 1713
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

hum. It seems that dev_watchdog, among other places, does not assign a value to dev->xmit_lock_owner, when it takes the lock.
I think this is a bug, but could be wrong ;-) From davem@redhat.com Tue Feb 18 15:51:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 15:51:13 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1INp43v020054 for ; Tue, 18 Feb 2003 15:51:05 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA16824; Tue, 18 Feb 2003 15:44:12 -0800 Date: Tue, 18 Feb 2003 15:44:11 -0800 (PST) Message-Id: <20030218.154411.39653430.davem@redhat.com> To: gandalf@wlug.westbo.se Cc: netdev@oss.sgi.com Subject: Re: [PATCH resend] zero rt_cache_stat statistics at init From: "David S. Miller" In-Reply-To: <1045581465.18515.139.camel@tux.rsn.bth.se> References: <1045581465.18515.139.camel@tux.rsn.bth.se> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1714 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev I got the patch I've just been away since the middle of last week. From davem@redhat.com Tue Feb 18 16:09:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 16:09:43 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J09Z3v020617 for ; Tue, 18 Feb 2003 16:09:36 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA16902; Tue, 18 Feb 2003 16:02:42 -0800 Date: Tue, 18 Feb 2003 16:02:41 -0800 (PST) Message-Id: <20030218.160241.10142341.davem@redhat.com> To: jgarzik@pobox.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: dev->xmit_lock_owner? 
From: "David S. Miller" In-Reply-To: <20030218183306.GA31478@gtf.org> References: <20030218183306.GA31478@gtf.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1715 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jeff Garzik Date: Tue, 18 Feb 2003 13:33:06 -0500 It seems that dev_watchdog, among other places, does not assign a value to dev->xmit_lock_owner, when it takes the lock. I think this is a bug, but could be wrong ;-) Not a bug, only pieces of the transmit path need to follow this rule. It's only meant to detect devices which have been chained in a way which forms a loop during transmit. From bwa@us.ibm.com Tue Feb 18 18:23:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 18:23:46 -0800 (PST) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J2Nf3v022973 for ; Tue, 18 Feb 2003 18:23:43 -0800 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.194.22]) by e31.co.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1J2WMmf020384; Tue, 18 Feb 2003 21:32:22 -0500 Received: from w-bwa1.beaverton.ibm.com (w-bwa1.beaverton.ibm.com [9.47.18.12]) by westrelay01.boulder.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1J2WL7E168444; Tue, 18 Feb 2003 19:32:21 -0700 Subject: Re: [PATCH] subset of RFC2553 From: Bruce Allan To: davem@redhat.com Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 18 Feb 2003 18:32:20 -0800 Message-Id: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> Mime-Version: 1.0 X-archive-position: 1716 X-ecartis-version: 
Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwa@us.ibm.com Precedence: bulk X-list: netdev [This message was originally sent as a private response to Dave and others. I am resending it to the original audience and apologize if the private response was the incorrect thing to do - just trying to reduce bandwidth ;-] Hi Dave, Thanks for your comments on the patch. As for the definition of struct sockaddr_storage, it was taken verbatim from the RFC and if I understand it correctly the use of the two pad fields and the __ss_align field are to force the structure to be aligned on a 64-bit boundary, be guaranteed large enough to hold anything up to the size specified by __SS_MAXSIZE, _and_ make it obvious what fills up the entire structure (i.e. no surprise padding put in by the compiler). There are two requirements for this structure described in the RFC that it be: "- Large enough to accommodate all supported protocol-specific address structures. - Aligned at an appropriate boundary so that pointers to it can be cast as pointers to protocol specific address structures and used to access the fields of those structures without alignment problems." See below for more comments/questions... On Wed, 2003-02-12 at 21:43, David S. Miller wrote: > From: Bruce Allan > Date: 12 Feb 2003 15:15:39 -0800 > > I don't like how sockaddr_storage works, so you'll have to clean > it up before we move it to a generic spot. 
>
> +struct sockaddr_storage {
> +	sa_family_t	ss_family;	/* address family */
> +	/* Following fields are implementation specific */
> +	char		__ss_pad1[_SS_PAD1SIZE];
> +		/* 6 byte pad, this is to make implementation */
> +		/* specific pad up to alignment field that */
> +		/* follows explicit in the data structure */
> +	int64_t		__ss_align;	/* field to force desired structure */
> +		/* storage alignment */
> +	char		__ss_pad2[_SS_PAD2SIZE];
> +		/* 112 byte pad to achieve desired size, */
> +		/* _SS_MAXSIZE value minus size of ss_family */
> +		/* __ss_pad1, __ss_align fields is 112 */
> +};
>
> All of this pad stuff is really unnecessary, just specify ss_family
> and then "stuff" where "stuff" can be something like "char __data[0];"
> Then you can add "attribute((aligned(64)))" or whatever to the
> declaration as well.

If you mean something like:

struct sockaddr_storage {
	sa_family_t	ss_family;
	char		__data[0] __attribute__ ((aligned(128)));
};

This will provide a 128-byte structure, but it is also aligned on a
128-byte boundary. I don't think it should necessarily have that
constraint.

How about this instead (a combination of your comment above and glibc's
definition of sockaddr_storage):

#define _SS_MAXSIZE	128
#define _ALIGNSIZE	(sizeof(struct sockaddr *))

#if ULONG_MAX > 0xffffffff
#define __ss_aligntype	__u64
#else
#define __ss_aligntype	__u32
#endif

struct sockaddr_storage {
	sa_family_t	ss_family;
	__ss_aligntype	__data[(_SS_MAXSIZE/sizeof(__ss_aligntype))-1];
} __attribute__ ((aligned(_ALIGNSIZE)));

This will provide a _SS_MAXSIZE byte structure aligned properly for any
32- or 64-bit architecture (eg. on a 4-byte boundary on i386) which
satisfies both requirements from the RFC mentioned above. Of course,
there will be hidden padding between ss_family and __data introduced by
the compiler.

>
> And if you're going to put some 64-bit type in here, use "__u64"
> which actually makes you consistent with the rest of the kernel.

Done.
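
[Editorial sketch, not part of the original message.] The size and alignment claims made for the proposed definition can be checked in userspace with something like the following. It is only a sketch under assumptions: the sketch_* names are hypothetical, <stdint.h> types replace the kernel's __u64/__u32, unsigned short stands in for sa_family_t, pointer size is assumed equal for all pointer types, and the GNU aligned attribute (GCC or Clang) is required.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SS_MAXSIZE 128
#define SKETCH_ALIGNSIZE  (sizeof(void *))  /* stand-in for sizeof(struct sockaddr *) */

/* Same selection rule as in the proposal: the widest "natural" word. */
#if ULONG_MAX > 0xffffffffUL
typedef uint64_t sketch_aligntype;
#else
typedef uint32_t sketch_aligntype;
#endif

typedef unsigned short sketch_family_t;  /* stand-in for sa_family_t */

struct sketch_sockaddr_storage {
	sketch_family_t  ss_family;
	/* One array slot is given up to ss_family plus the compiler's padding. */
	sketch_aligntype data[(SKETCH_SS_MAXSIZE / sizeof(sketch_aligntype)) - 1];
} __attribute__((aligned(SKETCH_ALIGNSIZE)));

/* Total size of the structure, expected to equal SKETCH_SS_MAXSIZE. */
size_t sketch_storage_size(void)
{
	return sizeof(struct sketch_sockaddr_storage);
}

/* Alignment requirement of the structure. */
size_t sketch_storage_align(void)
{
	return _Alignof(struct sketch_sockaddr_storage);
}

/* Offset of the payload; anything beyond sizeof(ss_family) is hidden padding. */
size_t sketch_data_offset(void)
{
	return offsetof(struct sketch_sockaddr_storage, data);
}
```

On common 32- and 64-bit ABIs the structure should come out at exactly 128 bytes, with the payload starting one alignment word in; the gap between the 2-byte ss_family and that offset is the hidden compiler padding mentioned in the message.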
> > You could also do something like: > __u64 data[_SS_MAXSIZE / sizeof(__u64)]; This wouldn't allow for use of ss_family. > > Anything but this pad stuff... We also thought of using a union such as: struct sockaddr_storage { union { struct sockaddr sa; struct sockaddr_in sin; struct sockaddr_in6 sin6; struct sockaddr_un sun; /* etc.Should include all protocol specific * address structures */ } ss; } __attribute__ ((aligned(_ALIGNSIZE))); which would only be as large as the biggest protocol specific address structure and aligned properly, but would make for some unusual syntax during it's use not to mention it doesn't follow the RFC all that closely (doesn't provide for ss_family for instance). Any preferences, additional thoughts or comments? Thanks again, -- Bruce Allan Linux Technology Center IBM Corporation, Beaverton OR From hadi@cyberus.ca Tue Feb 18 18:40:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 18:40:23 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J2eK3v023896 for ; Tue, 18 Feb 2003 18:40:21 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18lKI9-000IAo-00; Tue, 18 Feb 2003 21:49:01 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1J2mgYO025347; Tue, 18 Feb 2003 21:48:42 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1J2mYta025344; Tue, 18 Feb 2003 21:48:34 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Tue, 18 Feb 2003 21:48:34 -0500 (EST) From: jamal To: "David S. 
Miller" cc: wjhun@cisco.com, "" , "" Subject: Re: Getting details about an 802.1q VLAN interface from userspace In-Reply-To: <20030217.182915.41644338.davem@redhat.com> Message-ID: <20030218214714.J25195@shell.cyberus.ca> References: <20030213143921.A21977@cisco.com> <20030217.182915.41644338.davem@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1717 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev I think these should be new attributes to RTM_*LINK. No need to define a new message. cheers, jamal On Mon, 17 Feb 2003, David S. Miller wrote: > From: Will Jhun > Date: Thu, 13 Feb 2003 14:39:21 -0800 > > Is there a better way to get this information via netlink or some > ioctl()? Would it be useful if I (or someone) added an ioctl() type to > get information about a VLAN interface? (vlan, ifindex of trunking > (real_dev) interface, priority maps; basically the content of struct > vlan_dev_info) > > I would accept new netlink interfaces to get and set this > information. 
ioctls are gross and will not be tolerated :) > > > From hadi@cyberus.ca Tue Feb 18 18:45:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 18:45:05 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J2j23v024314 for ; Tue, 18 Feb 2003 18:45:03 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18lKMh-000IiG-00; Tue, 18 Feb 2003 21:53:43 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1J2rLYO025356; Tue, 18 Feb 2003 21:53:21 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1J2rCoV025353; Tue, 18 Feb 2003 21:53:20 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Tue, 18 Feb 2003 21:53:12 -0500 (EST) From: jamal To: Manfred Spraul cc: "David S. Miller" , Jeff Garzik , "" , "" , "" Subject: Re: NAPI note In-Reply-To: <3E525FD8.1060009@colorfullife.com> Message-ID: <20030218212441.M25195@shell.cyberus.ca> References: <3E4D66DF.3040800@colorfullife.com> <3E4D8295.2050400@pobox.com> <20030217.185719.28797590.davem@redhat.com> <3E525FD8.1060009@colorfullife.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1718 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 18 Feb 2003, Manfred Spraul wrote: > David S. Miller wrote: > > > From: Jeff Garzik > > Date: Fri, 14 Feb 2003 18:58:13 -0500 > > > > Manfred Spraul wrote: > > > It seems to be a generic NAPI restriction: > > > The caller of netif_receive_skb() must not own a spinlock that is > > > acquired from an interrupt handler. > > > > Thanks much for noticing this, Manfred. 
> >
> >I think this logic is buggy.
> >
> >In the example I've seen posted, only a NAPI implementation bug
> >could cause the situation to occur.
> >
> >If cpu1 is in ->poll() for the driver, then by definition the
> >device shall not cause interrupts. The device's interrupts
> >are disabled before we enter the ->poll() handler, and as such
> >the "cpu2 take device interrupt and takes driver->lock" cannot
> >occur.
> >
> >
> No. I think the rule is that drivers that use the NAPI interface must
> not cause interrupts for packet receive and out-of-rx-buffers conditions.

Ah, but that is only one of two rules.
There are other drivers which don't follow this rule and just shut down
all interrupt sources. I know that the e1000, for example, does this.
I am not sure about the tg3. I think the doc says this but may not
emphasize it as strongly.
So if tg3 uses method 2 then it's as Dave says - a bug.

> But what about media error interrupts, or tx interrupts? Or MIB counter
> overflow, etc. What about shared pci interrupts?

Shared interrupts should be interesting actually.
However, if you are in poll mode and you receive an interrupt, you should be
able to quickly determine it's not yours without much effect on shared
locks, no?

> All of them could occur, and if they take a spinlock that is held across
> netif_receive_skb(), then it can deadlock.
>

Yes, this could happen with method 1 of programming the driver; however,
tx, receive, and link are essentially separate threads and would hardly share
locks.

> OTHO if it's guaranteed that no interrupt occurs, then the nic should
> not take a spinlock at all and rely on the synchronization provided by
> NAPI. (->poll is single-threaded).
>

I haven't studied the e1000; there's a lot of this happening already.
I don't think you need, say, to protect the tx ring from tx completion
interrupts vs the regular softirq path.
cheers, jamal From jgarzik@pobox.com Tue Feb 18 19:06:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 19:07:03 -0800 (PST) Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J36r3v024913 for ; Tue, 18 Feb 2003 19:06:54 -0800 Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 18lKhm-0001Ne-00; Wed, 19 Feb 2003 03:15:30 +0000 Message-ID: <3E52F6AF.7000004@pobox.com> Date: Tue, 18 Feb 2003 22:14:55 -0500 From: Jeff Garzik Organization: none User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: Manfred Spraul , "David S. Miller" , zaitcev@redhat.com, jbourne@mtroyal.ab.ca, netdev@oss.sgi.com Subject: Re: NAPI note References: <3E4D66DF.3040800@colorfullife.com> <3E4D8295.2050400@pobox.com> <20030217.185719.28797590.davem@redhat.com> <3E525FD8.1060009@colorfullife.com> <20030218212441.M25195@shell.cyberus.ca> In-Reply-To: <20030218212441.M25195@shell.cyberus.ca> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1719 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev jamal wrote: > > On Tue, 18 Feb 2003, Manfred Spraul wrote: > > >>David S. Miller wrote: >> >> >>> From: Jeff Garzik >>> Date: Fri, 14 Feb 2003 18:58:13 -0500 >>> >>> Manfred Spraul wrote: >>> > It seems to be a generic NAPI restriction: >>> > The caller of netif_receive_skb() must not own a spinlock that is >>> > acquired from an interrupt handler. >>> >>> Thanks much for noticing this, Manfred. >>> >>>I think this logic is buggy. >>> >>>In the example I've seen posted, only a NAPI implementation bug >>>could cause the situation to occur. 
>>>
>>>If cpu1 is in ->poll() for the driver, then by definition the
>>>device shall not cause interrupts. The device's interrupts
>>>are disabled before we enter the ->poll() handler, and as such
>>>the "cpu2 take device interrupt and takes driver->lock" cannot
>>>occur.
>>>
>>>
>>
>>No. I think the rule is that drivers that use the NAPI interface must
>>not cause interrupts for packet receive and out-of-rx-buffers conditions.
>
>
> Ah, but that is only one of two rules.
> Theres other drivers which dont follow this rule and just shutdown
> all interupt sources. I know that the e1000 for example does this.
> I am not sure about the tg3. I think the doc says this but may not
> emphasize it as strongly.
> So if tg3 uses method 2 then its as Dave says - a bug.

tg3 shuts down all interrupt sources, and handles all interrupt events
in dev->poll().

David and I hashed it out a bit on IRC. The problem is that
deliver_to_old_ones() waits, hence the deadlock that Manfred
described. For 2.4.x, the solution is simply to avoid the deadlock in
the driver. For 2.5.x, David hinted that deliver_to_old_ones() may be
going away.

>>But what about media error interrupts, or tx interrupts? Or MIB counter
>>overflow, etc. What about shared pci interrupts?
>
>
> Shared interupts should be interesting actually.
> However if you are in poll mode and you receive an interupt you should be
> able to quickly determine its not yours without much effect on shared
> locks, no?

Normally, yes. However, tg3 grabs a lock just about any time it does
anything. ;-) A long-term project of mine is to slowly remove these
locks, but that must wait until the driver stabilizes, and is overall a
long process. Most of the locks _are_ removable, but we keep hitting
deadlock bugs like this, and hardware bugs which need workarounds, so
those come first.

>>All of them could occur, and if they take a spinlock that is held across
>>netif_receive_skb(), then it can deadlock.
>> > > > yes this could happen with method 1 of programming the driver; however, > tx, receive, link are essentially separate threads and would hardly share > locks. They do in tg3's case. The locks can be removed eventually, but such is the state of life right now. Jeff From davem@redhat.com Tue Feb 18 19:20:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 19:20:42 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J3Kd3v025483 for ; Tue, 18 Feb 2003 19:20:40 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA17360; Tue, 18 Feb 2003 19:13:39 -0800 Date: Tue, 18 Feb 2003 19:13:39 -0800 (PST) Message-Id: <20030218.191339.23709806.davem@redhat.com> To: hadi@cyberus.ca Cc: wjhun@cisco.com, greearb@candelatech.com, netdev@oss.sgi.com Subject: Re: Getting details about an 802.1q VLAN interface from userspace From: "David S. Miller" In-Reply-To: <20030218214714.J25195@shell.cyberus.ca> References: <20030213143921.A21977@cisco.com> <20030217.182915.41644338.davem@redhat.com> <20030218214714.J25195@shell.cyberus.ca> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1720 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: jamal Date: Tue, 18 Feb 2003 21:48:34 -0500 (EST) I think these should be new attributes to RTM_*LINK. No need to define a new message. I had overlooked this, you're certainly right. 
From davem@redhat.com Tue Feb 18 19:25:43 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 19:25:45 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J3Pg3v025936
	for ; Tue, 18 Feb 2003 19:25:43 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id TAA17432;
	Tue, 18 Feb 2003 19:18:42 -0800
Date: Tue, 18 Feb 2003 19:18:41 -0800 (PST)
Message-Id: <20030218.191841.132918432.davem@redhat.com>
To: hadi@cyberus.ca
Cc: manfred@colorfullife.com, jgarzik@pobox.com, zaitcev@redhat.com,
	jbourne@mtroyal.ab.ca, netdev@oss.sgi.com
Subject: Re: NAPI note
From: "David S. Miller"
In-Reply-To: <20030218212441.M25195@shell.cyberus.ca>
References: <20030217.185719.28797590.davem@redhat.com>
	<3E525FD8.1060009@colorfullife.com>
	<20030218212441.M25195@shell.cyberus.ca>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1721
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: jamal
   Date: Tue, 18 Feb 2003 21:53:12 -0500 (EST)

   There are other drivers which don't follow this rule and just shut down
   all interrupt sources. I know that the e1000, for example, does this.
   I am not sure about the tg3. I think the doc says this but may not
   emphasize it as strongly.
   So if tg3 uses method 2 then it's as Dave says - a bug.

Right, but I forgot that it's not a bug in the shared interrupt
case where we need to grab a lock to access the hardware and fetch
the interrupt status.

   Shared interrupts should be interesting actually.
   However if you are in poll mode and you receive an interrupt you should be
   able to quickly determine it's not yours without much effect on shared
   locks, no?

As Jeff has responded already, often you do need locks to do this
sanely.

In tg3, it's really complicated even though the chip writes the
interrupt status to a piece of memory shared with the cpu.  Any time
you want to enable/disable tg3 chip interrupts you must flip and/or
check a status bit in this piece of memory.  So it really needs a
lock.

I personally think Jeff is overly optimistic about lock removal in the
driver. :-)  The last time I attempted to be clever here, we ended up
with all sorts of deadlocks in tg3.  It requires real brain time and
heavy testing to make any kind of change in this area.

I also don't want to accomplish this by splitting up into separate
lines of development of tg3; that's nuts, as it would split up our
testing and make both lines get less testing than a unified mainline
driver would (which is what we happily do now).

From jgarzik@pobox.com Tue Feb 18 20:25:53 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 20:26:00 -0800 (PST)
Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J4Pq3v026762
	for ; Tue, 18 Feb 2003 20:25:53 -0800
Received: from rdu57-8-131.nc.rr.com ([66.57.8.131] helo=pobox.com)
	by www.linux.org.uk with esmtp (Exim 3.33 #5)
	id 18lLwF-0001xs-00; Wed, 19 Feb 2003 04:34:32 +0000
Message-ID: <3E53093F.5050502@pobox.com>
Date: Tue, 18 Feb 2003 23:34:07 -0500
From: Jeff Garzik
Organization: none
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk
X-Accept-Language: en
MIME-Version: 1.0
To: lkml , netdev@oss.sgi.com
Subject: netdevices.txt update
Content-Type: multipart/mixed;
 boundary="------------040403030109020202050704"
X-archive-position: 1722
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev

This is a multi-part message in MIME format.
--------------040403030109020202050704
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Just made a minor update to Documentation/networking/netdevices.txt, and
thought I would take the opportunity to pass it around once again.

Even though this doc has existed for quite a while now, I still come
across code that loves to violate these locking rules in various ways.

Comments and additions welcome

	Jeff

--------------040403030109020202050704
Content-Type: text/plain;
 name="netdevices.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="netdevices.txt"


Network Devices, the Kernel, and You!


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device synchronization rules
=======================================
dev->open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note1: netif_running() is guaranteed false
	Note2: dev->poll() is guaranteed to be stopped

dev->do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

dev->hard_start_xmit:
	Synchronization: dev->xmit_lock spinlock.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed false

dev->tx_timeout:
	Synchronization: dev->xmit_lock spinlock.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

dev->set_multicast_list:
	Synchronization: dev->xmit_lock spinlock.
	Context: BHs disabled

dev->poll:
	Synchronization: __LINK_STATE_RX_SCHED bit in dev->state.  See
		dev_close code and comments in net/core/dev.c for more info.
	Context: softirq

--------------040403030109020202050704--

From Kazunori.Miyazawa@jp.yokogawa.com Tue Feb 18 20:40:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 20:40:16 -0800 (PST)
Received: from zns001-0m9001.yokogawa.co.jp (zns001-0m9001.yokogawa.co.jp [203.174.79.138])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J4e73v027249
	for ; Tue, 18 Feb 2003 20:40:08 -0800
Received: from zns001-0m9001.yokogawa.co.jp (localhost [127.0.0.1])
	by zns001-0m9001.yokogawa.co.jp (8.11.6+Sun/8.11.6) with ESMTP id h1J4mkJ04047
	for ; Wed, 19 Feb 2003 13:48:46 +0900 (JST)
Received: from EXCHANGE03.jp.ykgw.net (zex001-0m9003.jp.ykgw.net [10.0.10.54])
	by zns001-0m9001.yokogawa.co.jp (8.11.6+Sun/8.11.6) with ESMTP id h1J4mT003856;
	Wed, 19 Feb 2003 13:48:29 +0900 (JST)
Received: from monza.miyazawa.org ([10.0.68.208]) by EXCHANGE03.jp.ykgw.net
	with Microsoft SMTPSVC(5.0.2195.5329);
	Wed, 19 Feb 2003 13:48:28 +0900
Date: Wed, 19 Feb 2003 13:48:50 +0900
From: Kazunori MIyazawa
To:
 linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org
Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, Kazunori.Miyazawa@jp.yokogawa.com
Subject: [PATCH] IPv6 IPsec support
Message-Id: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com>
X-Mailer: Sylpheed version 0.8.9 (GTK+ 1.2.10; i386-debian-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 19 Feb 2003 04:48:28.0573 (UTC) FILETIME=[251FA4D0:01C2D7D2]
X-archive-position: 1723
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Kazunori.Miyazawa@jp.yokogawa.com
Precedence: bulk
X-list: netdev

Hello,

I'm MIYAZAWA@USAGI.
This is a patch to support IPv6 IPsec on linux-2.5.62. It works well.

I assume that skb->h.raw points to the proper place in the skb on both
the inbound and outbound paths.
IPv6 has extension headers, which are not as simple as IPv4 options.
AH, however, needs to zero the mutable options, and skb->h.raw is used
to mark where processing of the mutable options ends.

There is a crude trick for IPsec on Neighbor Discovery: the kernel
needs a dst to do IPsec, but there is no dst when doing ND.
At the moment I have no idea how to avoid this other than creating a
dummy route. Do you have a better idea?

Please let me know if you have any ideas and/or comments.

Thanks in advance,

--Kazunori Miyazawa (Yokogawa Electric Corporation)


diff -urN linux-2.5.62/include/linux/ipv6.h linux25_for_patch/include/linux/ipv6.h
--- linux-2.5.62/include/linux/ipv6.h	2003-02-18 07:56:25.000000000 +0900
+++ linux25_for_patch/include/linux/ipv6.h	2003-02-19 02:37:58.000000000 +0900
@@ -74,6 +74,21 @@
 #define rt0_type		rt_hdr.type;
 };
 
+struct ipv6_auth_hdr {
+	__u8  nexthdr;
+	__u8  hdrlen;		/* This one is measured in 32 bit units! */
+	__u16 reserved;
+	__u32 spi;
+	__u32 seq_no;		/* Sequence number */
+	__u8  auth_data[4];	/* Length variable but >=4. Mind the 64 bit alignment! */
+};
+
+struct ipv6_esp_hdr {
+	__u32 spi;
+	__u32 seq_no;		/* Sequence number */
+	__u8  enc_data[8];	/* Length variable but >=8. Mind the 64 bit alignment! */
+};
+
 /*
  *	IPv6 fixed header
  *
diff -urN linux-2.5.62/include/net/dst.h linux25_for_patch/include/net/dst.h
--- linux-2.5.62/include/net/dst.h	2003-02-18 07:56:58.000000000 +0900
+++ linux25_for_patch/include/net/dst.h	2003-02-19 02:37:57.000000000 +0900
@@ -248,6 +248,9 @@
 extern int xfrm_lookup(struct dst_entry **dst_p, struct flowi *fl,
 		       struct sock *sk, int flags);
 extern void xfrm_init(void);
+extern int xfrm6_lookup(struct dst_entry **dst_p, struct flowi *fl,
+			struct sock *sk, int flags);
+extern void xfrm6_init(void);
 
 #endif
 
diff -urN linux-2.5.62/include/net/ip6_route.h linux25_for_patch/include/net/ip6_route.h
--- linux-2.5.62/include/net/ip6_route.h	2003-02-18 07:56:02.000000000 +0900
+++ linux25_for_patch/include/net/ip6_route.h	2003-02-19 02:37:57.000000000 +0900
@@ -55,6 +55,8 @@
 					   struct in6_addr *saddr,
 					   int oif, int flags);
 
+extern struct rt6_info		*ndisc_get_dummy_rt(void);
+
 /*
  *	support functions for ND
  *
diff -urN linux-2.5.62/include/net/xfrm.h linux25_for_patch/include/net/xfrm.h
--- linux-2.5.62/include/net/xfrm.h	2003-02-18 07:56:49.000000000 +0900
+++ linux25_for_patch/include/net/xfrm.h	2003-02-19 02:37:57.000000000 +0900
@@ -12,6 +12,7 @@
 #include
 #include
+#include
 
 #define XFRM_ALIGN8(len)	(((len) + 7) & ~7)
 
@@ -229,6 +230,8 @@
 extern int xfrm_register_km(struct xfrm_mgr *km);
 extern int xfrm_unregister_km(struct xfrm_mgr *km);
 
+extern u32 xfrm_policy_genid;
+extern rwlock_t xfrm_policy_lock;
 
 extern struct xfrm_policy *xfrm_policy_list[XFRM_POLICY_MAX*2];
 
@@ -282,9 +285,11 @@
 		struct xfrm_dst		*next;
 		struct dst_entry	dst;
 		struct rtable		rt;
+		struct rt6_info		rt6;
 	} u;
 };
 
+extern kmem_cache_t *secpath_cachep;
 struct sec_path
 {
 	atomic_t		refcnt;
@@ -308,7 +313,6 @@
 	if (sp && atomic_dec_and_test(&sp->refcnt))
 		__secpath_destroy(sp);
 }
-
 extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb);
 
 static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb)
@@ -321,6 +325,18 @@
 		__xfrm_policy_check(sk, dir, skb);
 }
 
+extern int __xfrm6_policy_check(struct sock *, int dir, struct sk_buff *skb);
+
+static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb)
+{
+	if (sk && sk->policy[XFRM_POLICY_IN])
+		return __xfrm6_policy_check(sk, dir, skb);
+
+	return	!xfrm_policy_list[dir] ||
+		(skb->dst->flags & DST_NOPOLICY) ||
+		__xfrm6_policy_check(sk, dir, skb);
+}
+
 extern int __xfrm_route_forward(struct sk_buff *skb);
 
 static inline int xfrm_route_forward(struct sk_buff *skb)
@@ -378,14 +394,19 @@
 
 extern void xfrm_state_init(void);
 extern void xfrm_input_init(void);
+extern void xfrm6_input_init(void);
 extern int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*), void *);
 extern struct xfrm_state *xfrm_state_alloc(void);
 extern struct xfrm_state *xfrm_state_find(u32 daddr, u32 saddr, struct flowi *fl, struct xfrm_tmpl *tmpl,
 					  struct xfrm_policy *pol, int *err);
+extern struct xfrm_state *xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr,
+					   struct flowi *fl, struct xfrm_tmpl *tmpl,
+					   struct xfrm_policy *pol, int *err);
 extern int xfrm_state_check_expire(struct xfrm_state *x);
 extern void xfrm_state_insert(struct xfrm_state *x);
 extern int xfrm_state_check_space(struct xfrm_state *x, struct sk_buff *skb);
 extern struct xfrm_state *xfrm_state_lookup(u32 daddr, u32 spi, u8 proto);
+extern struct xfrm_state *xfrm6_state_lookup(struct in6_addr *daddr, u32 spi, u8 proto);
 extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq);
 extern void xfrm_state_delete(struct xfrm_state *x);
 extern void xfrm_state_flush(u8 proto);
@@ -393,6 +414,8 @@
 extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq);
 extern int xfrm_check_selectors(struct xfrm_state **x, int n, struct flowi *fl);
 extern int xfrm4_rcv(struct sk_buff *skb);
+extern int xfrm6_rcv(struct sk_buff *skb);
+extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir);
 extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen);
 
 struct xfrm_policy *xfrm_policy_alloc(int gfp);
@@ -403,12 +426,16 @@
 struct xfrm_policy *xfrm_policy_byid(int dir, u32 id, int delete);
 void xfrm_policy_flush(void);
 void xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
+void xfrm6_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
 struct xfrm_state * xfrm_find_acq(u8 mode, u16 reqid, u8 proto, u32 daddr, u32 saddr, int create);
+struct xfrm_state * xfrm6_find_acq(u8 mode, u16 reqid, u8 proto, struct in6_addr *daddr,
+				   struct in6_addr *saddr, int create);
 extern void xfrm_policy_flush(void);
 extern void xfrm_policy_kill(struct xfrm_policy *);
 extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol);
 extern struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl);
 extern int xfrm_flush_bundles(struct xfrm_state *x);
+extern int xfrm6_flush_bundles(struct xfrm_state *x);
 
 extern wait_queue_head_t km_waitq;
 extern void km_warn_expired(struct xfrm_state *x);
@@ -428,20 +455,79 @@
 
 static inline int xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl)
 {
-	return !memcmp(fl->fl6_dst, sel->daddr.a6, sizeof(struct in6_addr)) &&
-	       !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) &&
-	       !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) &&
-	       (fl->proto == sel->proto || !sel->proto) &&
-	       (fl->oif == sel->ifindex || !sel->ifindex) &&
-	       !memcmp(fl->fl6_src, sel->saddr.a6, sizeof(struct in6_addr));
+	return	!memcmp(fl->fl6_dst, &sel->daddr, (sel->prefixlen_d)/8) &&
+		!memcmp(fl->fl6_src, &sel->saddr, (sel->prefixlen_s)/8) &&
+		!((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) &&
+		!((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) &&
+		(fl->proto == sel->proto || !sel->proto) &&
+		(fl->oif == sel->ifindex || !sel->ifindex);
 }
 
 extern int xfrm6_register_type(struct xfrm_type *type);
 extern int xfrm6_unregister_type(struct xfrm_type *type);
 extern struct xfrm_type *xfrm6_get_type(u8 proto);
+extern void xfrm6_put_type(struct xfrm_type *type);
 extern struct xfrm_state *xfrm6_state_lookup(struct in6_addr *daddr, u32 spi, u8 proto);
 struct xfrm_state * xfrm6_find_acq(u8 mode, u16 reqid, u8 proto, struct in6_addr *daddr, struct in6_addr *saddr, int create);
 void xfrm6_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
 
+typedef void (icv_update_fn_t)(struct crypto_tfm *,
+			       struct scatterlist *, unsigned int);
+
+struct ah_data
+{
+	u8			*key;
+	int			key_len;
+	u8			*work_icv;
+	int			icv_full_len;
+	int			icv_trunc_len;
+
+	void			(*icv)(struct ah_data*,
+				       struct sk_buff *skb, u8 *icv);
+
+	struct crypto_tfm	*tfm;
+};
+
+struct esp_data
+{
+	/* Confidentiality */
+	struct {
+		u8			*key;		/* Key */
+		int			key_len;	/* Key length */
+		u8			*ivec;		/* ivec buffer */
+		/* ivlen is offset from enc_data, where encrypted data start.
+		 * It is logically different of crypto_tfm_alg_ivsize(tfm).
+		 * We assume that it is either zero (no ivec), or
+		 * >= crypto_tfm_alg_ivsize(tfm). */
+		int			ivlen;
+		int			padlen;		/* 0..255 */
+		struct crypto_tfm	*tfm;		/* crypto handle */
+	} conf;
+
+	/* Integrity. It is active when icv_full_len != 0 */
+	struct {
+		u8			*key;		/* Key */
+		int			key_len;	/* Length of the key */
+		u8			*work_icv;
+		int			icv_full_len;
+		int			icv_trunc_len;
+		void			(*icv)(struct esp_data*,
+					       struct sk_buff *skb,
+					       int offset, int len, u8 *icv);
+		struct crypto_tfm	*tfm;
+	} auth;
+};
+
+void skb_ah_walk(const struct sk_buff *skb, struct crypto_tfm *tfm, icv_update_fn_t icv_update);
+void ah_hmac_digest(struct ah_data *ahp, struct sk_buff *skb, u8 *auth_data);
+#if 0
+void skb_icv_walk(const struct sk_buff *skb, struct crypto_tfm *tfm,
+		  int offset, int len, icv_update_fn_t icv_update);
+#endif
+void esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset, int len, u8 *auth_data);
+int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len);
+int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
+void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
+
 #endif	/* _NET_XFRM_H */
diff -urN linux-2.5.62/net/ipv4/ah.c linux25_for_patch/net/ipv4/ah.c
--- linux-2.5.62/net/ipv4/ah.c	2003-02-18 07:56:54.000000000 +0900
+++ linux25_for_patch/net/ipv4/ah.c	2003-02-19 02:36:53.000000000 +0900
@@ -9,24 +9,6 @@
 
 #define AH_HLEN_NOICV	12
 
-typedef void (icv_update_fn_t)(struct crypto_tfm *,
-			       struct scatterlist *, unsigned int);
-
-struct ah_data
-{
-	u8			*key;
-	int			key_len;
-	u8			*work_icv;
-	int			icv_full_len;
-	int			icv_trunc_len;
-
-	void			(*icv)(struct ah_data*,
-				       struct sk_buff *skb, u8 *icv);
-
-	struct crypto_tfm	*tfm;
-};
-
-
 /* Clear mutable options and find final destination to substitute
  * into IP header for icv calculation. Options are already checked
  * for validity, so paranoia is not required. */
@@ -71,7 +53,7 @@
 	return 0;
 }
 
-static void skb_ah_walk(const struct sk_buff *skb,
+void skb_ah_walk(const struct sk_buff *skb,
 			struct crypto_tfm *tfm, icv_update_fn_t icv_update)
 {
 	int offset = 0;
@@ -145,7 +127,7 @@
 		BUG();
 }
 
-static void
+void
 ah_hmac_digest(struct ah_data *ahp, struct sk_buff *skb, u8 *auth_data)
 {
 	struct crypto_tfm *tfm = ahp->tfm;
diff -urN linux-2.5.62/net/ipv4/esp.c linux25_for_patch/net/ipv4/esp.c
--- linux-2.5.62/net/ipv4/esp.c	2003-02-18 07:56:17.000000000 +0900
+++ linux25_for_patch/net/ipv4/esp.c	2003-02-19 02:36:53.000000000 +0900
@@ -10,43 +10,10 @@
 
 #define MAX_SG_ONSTACK 4
 
-typedef void (icv_update_fn_t)(struct crypto_tfm *,
-			       struct scatterlist *, unsigned int);
-
 /* BUGS:
  * - we assume replay seqno is always present.
  */
 
-struct esp_data
-{
-	/* Confidentiality */
-	struct {
-		u8			*key;		/* Key */
-		int			key_len;	/* Key length */
-		u8			*ivec;		/* ivec buffer */
-		/* ivlen is offset from enc_data, where encrypted data start.
-		 * It is logically different of crypto_tfm_alg_ivsize(tfm).
-		 * We assume that it is either zero (no ivec), or
-		 * >= crypto_tfm_alg_ivsize(tfm). */
-		int			ivlen;
-		int			padlen;		/* 0..255 */
-		struct crypto_tfm	*tfm;		/* crypto handle */
-	} conf;
-
-	/* Integrity. It is active when icv_full_len != 0 */
-	struct {
-		u8			*key;		/* Key */
-		int			key_len;	/* Length of the key */
-		u8			*work_icv;
-		int			icv_full_len;
-		int			icv_trunc_len;
-		void			(*icv)(struct esp_data*,
-					       struct sk_buff *skb,
-					       int offset, int len, u8 *icv);
-		struct crypto_tfm	*tfm;
-	} auth;
-};
-
 /* Move to common area: it is shared with AH. */
 
 void skb_icv_walk(const struct sk_buff *skb, struct crypto_tfm *tfm,
@@ -192,7 +159,7 @@
 
 
 /* Common with AH after some work on arguments. */
 
-static void
+void
 esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset,
 		int len, u8 *auth_data)
 {
diff -urN linux-2.5.62/net/ipv4/xfrm_input.c linux25_for_patch/net/ipv4/xfrm_input.c
--- linux-2.5.62/net/ipv4/xfrm_input.c	2003-02-18 07:55:50.000000000 +0900
+++ linux25_for_patch/net/ipv4/xfrm_input.c	2003-02-19 02:36:53.000000000 +0900
@@ -1,7 +1,7 @@
 #include
 #include
 
-static kmem_cache_t *secpath_cachep;
+kmem_cache_t *secpath_cachep;
 
 void __secpath_destroy(struct sec_path *sp)
 {
diff -urN linux-2.5.62/net/ipv4/xfrm_policy.c linux25_for_patch/net/ipv4/xfrm_policy.c
--- linux-2.5.62/net/ipv4/xfrm_policy.c	2003-02-18 07:56:15.000000000 +0900
+++ linux25_for_patch/net/ipv4/xfrm_policy.c	2003-02-19 02:36:53.000000000 +0900
@@ -3,8 +3,8 @@
 
 DECLARE_MUTEX(xfrm_cfg_sem);
 
-static u32      xfrm_policy_genid;
-static rwlock_t xfrm_policy_lock = RW_LOCK_UNLOCKED;
+u32      xfrm_policy_genid;
+rwlock_t xfrm_policy_lock = RW_LOCK_UNLOCKED;
 
 struct xfrm_policy *xfrm_policy_list[XFRM_POLICY_MAX*2];
 
@@ -469,7 +469,7 @@
 	read_lock_bh(&xfrm_policy_lock);
 	for (pol = xfrm_policy_list[dir]; pol; pol = pol->next) {
 		struct xfrm_selector *sel = &pol->selector;
-
+		if (pol->family != AF_INET) continue;
 		if (xfrm4_selector_match(sel, fl)) {
 			atomic_inc(&pol->refcnt);
 			break;
diff -urN linux-2.5.62/net/ipv4/xfrm_state.c linux25_for_patch/net/ipv4/xfrm_state.c
--- linux-2.5.62/net/ipv4/xfrm_state.c	2003-02-18 07:56:29.000000000 +0900
+++ linux25_for_patch/net/ipv4/xfrm_state.c	2003-02-19 02:36:53.000000000 +0900
@@ -165,8 +165,20 @@
 		spin_unlock(&xfrm_state_lock);
 		if (del_timer(&x->timer))
 			atomic_dec(&x->refcnt);
-		if (atomic_read(&x->refcnt) != 1)
-			xfrm_flush_bundles(x);
+		if (atomic_read(&x->refcnt) != 1) {
+			switch (x->props.family) {
+			case AF_INET:
+				xfrm_flush_bundles(x);
+				break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+			case AF_INET6:
+				xfrm6_flush_bundles(x);
+				break;
+#endif
+			default:
+				break;
+			}
+		}
 	}
 
 	if (kill && x->type)
@@ -290,6 +302,7 @@
 		x->props.saddr.xfrm4_addr = saddr;
 		x->props.mode = tmpl->mode;
 		x->props.reqid = tmpl->reqid;
+		x->props.family = AF_INET;
 
 		if (km_query(x, tmpl, pol) == 0) {
 			x->km.state = XFRM_STATE_ACQ;
@@ -322,10 +335,18 @@
 {
 	unsigned h = 0;
 
-	if (x->props.family == AF_INET)
+	switch (x->props.family) {
+	case AF_INET:
 		h = ntohl(x->id.daddr.xfrm4_addr);
-	else if (x->props.family == AF_INET6)
+		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
 		h = ntohl(x->id.daddr.a6[2]^x->id.daddr.a6[3]);
+		break;
+#endif
+	default:
+		return;
+	}
 
 	h = (h ^ (h>>16)) % XFRM_DST_HSIZE;
 
@@ -448,6 +469,7 @@
 		x0->props.family = AF_INET;
 		x0->props.mode = mode;
 		x0->props.reqid = reqid;
+		x0->props.family = AF_INET;
 		x0->lft.hard_add_expires_seconds = ACQ_EXPIRES;
 		atomic_inc(&x0->refcnt);
 		mod_timer(&x0->timer, jiffies + ACQ_EXPIRES*HZ);
@@ -836,4 +858,114 @@
 		wake_up(&km_waitq);
 	}
 }
+
+struct xfrm_state *
+xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr, struct flowi *fl, struct xfrm_tmpl *tmpl,
+		 struct xfrm_policy *pol, int *err)
+{
+	unsigned h = ntohl(daddr->s6_addr32[2]^daddr->s6_addr32[3]);
+	struct xfrm_state *x = NULL;
+	int acquire_in_progress = 0;
+	int error = 0;
+	struct xfrm_state *best = NULL;
+
+	h = (h ^ (h>>16)) % XFRM_DST_HSIZE;
+
+	spin_lock_bh(&xfrm_state_lock);
+	list_for_each_entry(x, xfrm_state_bydst+h, bydst) {
+		if (x->props.family == AF_INET6&&
+		    !memcmp(daddr, &x->id.daddr, sizeof(*daddr)) &&
+		    x->props.reqid == tmpl->reqid &&
+		    (!memcmp(saddr, &x->props.saddr, sizeof(*saddr))|| ipv6_addr_any(saddr)) &&
+		    tmpl->mode == x->props.mode &&
+		    tmpl->id.proto == x->id.proto) {
+			/* Resolution logic:
+			   1. There is a valid state with matching selector.
+			      Done.
+			   2. Valid state with inappropriate selector. Skip.
+
+			   Entering area of "sysdeps".
+
+			   3. If state is not valid, selector is temporary,
+			      it selects only session which triggered
+			      previous resolution. Key manager will do
+			      something to install a state with proper
+			      selector.
+			 */
+			if (x->km.state == XFRM_STATE_VALID) {
+				if (!xfrm6_selector_match(&x->sel, fl))
+					continue;
+				if (!best ||
+				    best->km.dying > x->km.dying ||
+				    (best->km.dying == x->km.dying &&
+				     best->curlft.add_time < x->curlft.add_time))
+					best = x;
+			} else if (x->km.state == XFRM_STATE_ACQ) {
+				acquire_in_progress = 1;
+			} else if (x->km.state == XFRM_STATE_ERROR ||
+				   x->km.state == XFRM_STATE_EXPIRED) {
+				if (xfrm6_selector_match(&x->sel, fl))
+					error = 1;
+			}
+		}
+	}
+
+	if (best) {
+		atomic_inc(&best->refcnt);
+		spin_unlock_bh(&xfrm_state_lock);
+		return best;
+	}
+	x = NULL;
+	if (!error && !acquire_in_progress &&
+	    ((x = xfrm_state_alloc()) != NULL)) {
+		/* Initialize temporary selector matching only
+		 * to current session. */
+		memcpy(&x->sel.daddr, fl->fl6_dst, sizeof(struct in6_addr));
+		memcpy(&x->sel.saddr, fl->fl6_src, sizeof(struct in6_addr));
+		x->sel.dport = fl->uli_u.ports.dport;
+		x->sel.dport_mask = ~0;
+		x->sel.sport = fl->uli_u.ports.sport;
+		x->sel.sport_mask = ~0;
+		x->sel.prefixlen_d = 128;
+		x->sel.prefixlen_s = 128;
+		x->sel.proto = fl->proto;
+		x->sel.ifindex = fl->oif;
+		x->id = tmpl->id;
+		if (ipv6_addr_any((struct in6_addr*)&x->id.daddr))
+			memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr));
+		memcpy(&x->props.saddr, &tmpl->saddr, sizeof(x->props.saddr));
+		if (ipv6_addr_any((struct in6_addr*)&x->props.saddr))
+			memcpy(&x->props.saddr, &saddr, sizeof(x->sel.saddr));
+		x->props.mode = tmpl->mode;
+		x->props.reqid = tmpl->reqid;
+		x->props.family = AF_INET6;
+
+		if (km_query(x, tmpl, pol) == 0) {
+			x->km.state = XFRM_STATE_ACQ;
+			list_add_tail(&x->bydst, xfrm_state_bydst+h);
+			atomic_inc(&x->refcnt);
+			if (x->id.spi) {
+				struct in6_addr *addr = (struct in6_addr*)&x->id.daddr;
+				h = ntohl((addr->s6_addr32[2]^addr->s6_addr32[3])^x->id.spi^x->id.proto);
+				h = (h ^ (h>>10) ^ (h>>20)) % XFRM_DST_HSIZE;
+				list_add(&x->byspi, xfrm_state_byspi+h);
+				atomic_inc(&x->refcnt);
+			}
+			x->lft.hard_add_expires_seconds = ACQ_EXPIRES;
+			atomic_inc(&x->refcnt);
+			mod_timer(&x->timer, ACQ_EXPIRES*HZ);
+		} else {
+			x->km.state = XFRM_STATE_DEAD;
+			xfrm_state_put(x);
+			x = NULL;
+			error = 1;
+		}
+	}
+	spin_unlock_bh(&xfrm_state_lock);
+	if (!x)
+		*err = acquire_in_progress ? -EAGAIN :
+			(error ? -ESRCH : -ENOMEM);
+	return x;
+}
+
 #endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */
diff -urN linux-2.5.62/net/ipv4/xfrm_user.c linux25_for_patch/net/ipv4/xfrm_user.c
--- linux-2.5.62/net/ipv4/xfrm_user.c	2003-02-18 07:56:17.000000000 +0900
+++ linux25_for_patch/net/ipv4/xfrm_user.c	2003-02-19 02:36:53.000000000 +0900
@@ -1,6 +1,11 @@
 /* xfrm_user.c: User interface to configure xfrm engine.
  *
  * Copyright (C) 2002 David S. Miller (davem@redhat.com)
+ *
+ * Changes
+ *
+ *	KANDA Mitsuru and
+ *	MIYAZAWA Kazunori @USAGI :	IPv6 Support
  */
 
 #include
@@ -17,6 +22,9 @@
 #include
 #include
 #include
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#include
+#endif
 #include
 
 #include
@@ -63,11 +71,13 @@
 	case AF_INET:
 		break;
 
-	case AF_INET6:	/* XXX */
-		err = -EAFNOSUPPORT;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+		break;
+#endif
 
-		/* fallthru */
 	default:
+		err = -EAFNOSUPPORT;
 		goto out;
 	};
 
@@ -206,8 +216,21 @@
 	if (!x)
 		return err;
 
-	x1 = xfrm_state_lookup(x->props.saddr.xfrm4_addr,
-			       x->id.spi, x->id.proto);
+	switch (p->family) {
+	case AF_INET:
+		x1 = xfrm_state_lookup(x->props.saddr.xfrm4_addr,
+				       x->id.spi, x->id.proto);
+		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+		x1 = xfrm6_state_lookup((struct in6_addr *)&x->props.saddr,
+					x->id.spi,x->id.proto);
+		break;
+#endif
+	default:
+		return -EAFNOSUPPORT;
+	}
+
 	if (x1) {
 		xfrm_state_put(x);
 		xfrm_state_put(x1);
@@ -224,7 +247,19 @@
 	struct xfrm_state *x;
 	struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
 
-	x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto);
+	switch (p->family) {
+	case AF_INET:
+		x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto);
+		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+		x = xfrm6_state_lookup((struct in6_addr *)&p->saddr, p->spi, p->proto);
+		break;
+#endif
+	default:
+		return -EAFNOSUPPORT;
+	}
+
 	if (x == NULL)
 		return -ESRCH;
 
@@ -342,7 +377,19 @@
 	struct sk_buff *resp_skb;
 	int err;
 
-	x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto);
+	switch (p->family) {
+	case AF_INET:
+		x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto);
+		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+		x = xfrm6_state_lookup((struct in6_addr *)&p->saddr, p->spi, p->proto);
+		break;
+#endif
+	default:
+		return -EAFNOSUPPORT;
+	}
+
 	err = -ESRCH;
 	if (x == NULL)
 		goto out_noput;
@@ -393,9 +440,25 @@
 	err = verify_userspi_info(p);
 	if (err)
 		goto out_noput;
-	x = xfrm_find_acq(p->info.mode, p->info.reqid, p->info.id.proto,
-			  p->info.sel.daddr.xfrm4_addr,
-			  p->info.sel.saddr.xfrm4_addr, 1);
+
+	switch (p->info.family) {
+	case AF_INET:
+		x = xfrm_find_acq(p->info.mode, p->info.reqid, p->info.id.proto,
+				  p->info.sel.daddr.xfrm4_addr,
+				  p->info.sel.saddr.xfrm4_addr, 1);
+		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+		x = xfrm6_find_acq(p->info.mode, p->info.reqid, p->info.id.proto,
+				   (struct in6_addr *)&p->info.sel.daddr,
+				   (struct in6_addr *)&p->info.sel.saddr, 1);
+		break;
+#endif
+	default:
+		err = -EAFNOSUPPORT;
+		goto out_noput;
+	}
+
 	err = -ENOENT;
 	if (x == NULL)
 		goto out_noput;
diff -urN linux-2.5.62/net/ipv6/Makefile linux25_for_patch/net/ipv6/Makefile
--- linux-2.5.62/net/ipv6/Makefile	2003-02-18 07:56:44.000000000 +0900
+++ linux25_for_patch/net/ipv6/Makefile	2003-02-19 02:36:53.000000000 +0900
@@ -13,3 +13,6 @@
 obj-$(CONFIG_NETFILTER)	+= netfilter/
 
 obj-y += xfrm_policy.o
+obj-y += xfrm_policy.o xfrm_input.o
+obj-$(CONFIG_INET_AH) += ah.o
+obj-$(CONFIG_INET_ESP) += esp.o
diff -urN linux-2.5.62/net/ipv6/ah.c linux25_for_patch/net/ipv6/ah.c
--- linux-2.5.62/net/ipv6/ah.c	1970-01-01 09:00:00.000000000 +0900
+++ linux25_for_patch/net/ipv6/ah.c	2003-02-19 02:36:53.000000000 +0900
@@ -0,0 +1,353 @@
+/*
+ * Copyright (C)2002 USAGI/WIDE Project
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *
+ * Authors	KANDA Mitsuru@USAGI
+ *		MIYAZAWA Kazunori@USAGI
+ *
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define AH_HLEN_NOICV	12
+
+/* XXX no ipv6 ah specific */
+#define NIP6(addr) \
+	ntohs((addr).s6_addr16[0]),\
+	ntohs((addr).s6_addr16[1]),\
+	ntohs((addr).s6_addr16[2]),\
+	ntohs((addr).s6_addr16[3]),\
+	ntohs((addr).s6_addr16[4]),\
+	ntohs((addr).s6_addr16[5]),\
+	ntohs((addr).s6_addr16[6]),\
+	ntohs((addr).s6_addr16[7])
+
+int ah6_output(struct sk_buff *skb)
+{
+	int err;
+	int hdr_len = sizeof(struct ipv6hdr);
+	struct dst_entry *dst = skb->dst;
+	struct xfrm_state *x  = dst->xfrm;
+	struct ipv6hdr *iph = NULL;
+	struct ip_auth_hdr *ah;
+	struct ah_data *ahp;
+	u16 nh_offset = 0;
+	u8 nexthdr;
+
+	if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL)
+		return -EINVAL;
+
+	spin_lock_bh(&x->lock);
+	if ((err = xfrm_state_check_expire(x)) != 0)
+		goto error;
+	if ((err = xfrm_state_check_space(x, skb)) != 0)
+		goto error;
+
+	if (x->props.mode) {
+		iph = skb->nh.ipv6h;
+		skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len);
+		skb->nh.ipv6h->version = 6;
+		skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+		skb->nh.ipv6h->nexthdr = IPPROTO_AH;
+		memcpy(&skb->nh.ipv6h->saddr, &x->props.saddr, sizeof(struct in6_addr));
+		memcpy(&skb->nh.ipv6h->daddr, &x->id.daddr, sizeof(struct in6_addr));
+		ah = (struct ip_auth_hdr*)(skb->nh.ipv6h+1);
+		ah->nexthdr = IPPROTO_IPV6;
+	} else {
+		hdr_len = skb->h.raw - skb->nh.raw;
+		iph = kmalloc(hdr_len, GFP_ATOMIC);
+		if (!iph) {
+			err = -ENOMEM;
+			goto error;
+		}
+		memcpy(iph, skb->data, hdr_len);
+		skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len);
+		memcpy(skb->nh.ipv6h, iph, hdr_len);
+		nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_OUT);
+		if (nexthdr == 0)
+			goto error;
+
+		skb->nh.raw[nh_offset] = IPPROTO_AH;
+		skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+		ah = (struct ip_auth_hdr*)(skb->nh.raw+hdr_len);
+		ah->nexthdr = nexthdr;
+	}
+
+	skb->nh.ipv6h->priority    = 0;
+	skb->nh.ipv6h->flow_lbl[0] = 0;
+	skb->nh.ipv6h->flow_lbl[1] = 0;
+	skb->nh.ipv6h->flow_lbl[2] = 0;
+	skb->nh.ipv6h->hop_limit    = 0;
+
+	ahp = x->data;
+	ah->hdrlen  = (XFRM_ALIGN8(ahp->icv_trunc_len +
+				   AH_HLEN_NOICV) >> 2) - 2;
+
+	ah->reserved = 0;
+	ah->spi = x->id.spi;
+	ah->seq_no = htonl(++x->replay.oseq);
+	ahp->icv(ahp, skb, ah->auth_data);
+
+	if (x->props.mode) {
+		skb->nh.ipv6h->hop_limit   = iph->hop_limit;
+		skb->nh.ipv6h->priority    = iph->priority;
+		skb->nh.ipv6h->flow_lbl[0] = iph->flow_lbl[0];
+		skb->nh.ipv6h->flow_lbl[1] = iph->flow_lbl[1];
+		skb->nh.ipv6h->flow_lbl[2] = iph->flow_lbl[2];
+	} else {
+		memcpy(skb->nh.ipv6h, iph, hdr_len);
+		skb->nh.raw[nh_offset] = IPPROTO_AH;
+		skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+		kfree (iph);
+	}
+
+	skb->nh.raw = skb->data;
+
+	x->curlft.bytes += skb->len;
+	x->curlft.packets++;
+	spin_unlock_bh(&x->lock);
+	if ((skb->dst = dst_pop(dst)) == NULL)
+		goto error_nolock;
+	return NET_XMIT_BYPASS;
+error:
+	spin_unlock_bh(&x->lock);
+error_nolock:
+	kfree_skb(skb);
+	return err;
+}
+
+int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
+{
+	int ah_hlen;
+	struct ipv6hdr *iph;
+	struct ipv6_auth_hdr *ah;
+	struct ah_data *ahp;
+	unsigned char *tmp_hdr = NULL;
+	int hdr_len = skb->h.raw - skb->nh.raw;
+	u8 nexthdr = 0;
+
+	if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr)))
+		goto out;
+
+	ah = (struct ipv6_auth_hdr*)skb->data;
+
+	ahp = x->data;
+	ah_hlen = (ah->hdrlen + 2) << 2;
+
+	if (ah_hlen != XFRM_ALIGN8(ahp->icv_full_len + AH_HLEN_NOICV) &&
+	    ah_hlen != XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV))
+		goto out;
+
+	if (!pskb_may_pull(skb, (ah->hdrlen+2)<<2))
+		goto out;
+
+	/* We are going to _remove_ AH header to keep sockets happy,
+	 * so... Later this can change. */
+	if (skb_cloned(skb) &&
+	    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+		goto out;
+	tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC);
+	if (!tmp_hdr)
+		goto out;
+	memcpy(tmp_hdr, skb->nh.raw, hdr_len);
+	ah = (struct ipv6_auth_hdr*)skb->data;
+	iph = skb->nh.ipv6h;
+
+	{
+		u8 auth_data[ahp->icv_trunc_len];
+		memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len);
+		memset(ah->auth_data, 0, ahp->icv_trunc_len);
+		skb_push(skb, skb->data - skb->nh.raw);
+		ahp->icv(ahp, skb, ah->auth_data);
+		if (memcmp(ah->auth_data, auth_data, ahp->icv_trunc_len)) {
+			if (net_ratelimit())
+				printk(KERN_WARNING "ipsec ah authentication error\n");
+			x->stats.integrity_failed++;
+			goto free_out;
+		}
+	}
+
+	nexthdr = ah->nexthdr;
+	skb->nh.raw = skb_pull(skb, (ah->hdrlen+2)<<2);
+	memcpy(skb->nh.raw, tmp_hdr, hdr_len);
+	skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+	skb_pull(skb, hdr_len);
+	skb->h.raw = skb->data;
+
+
+	kfree(tmp_hdr);
+
+	return nexthdr;
+
+free_out:
+	kfree(tmp_hdr);
+out:
+	return -EINVAL;
+}
+
+void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+	     int type, int code, int offset, __u32 info)
+{
+	struct ipv6hdr *iph = (struct ipv6hdr*)skb->data;
+	struct ip_auth_hdr
*ah = (struct ip_auth_hdr*)(skb->data+offset); + struct xfrm_state *x; + + if (type != ICMPV6_DEST_UNREACH || + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, ah->spi, IPPROTO_AH); + if (!x) + return; + + printk(KERN_DEBUG "pmtu discvovery on SA AH/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(ah->spi), NIP6(iph->daddr)); + + xfrm_state_put(x); +} + +static int ah6_init_state(struct xfrm_state *x, void *args) +{ + struct ah_data *ahp = NULL; + struct xfrm_algo_desc *aalg_desc; + + /* null auth can use a zero length key */ + if (x->aalg->alg_key_len > 512) + goto error; + + ahp = kmalloc(sizeof(*ahp), GFP_KERNEL); + if (ahp == NULL) + return -ENOMEM; + + memset(ahp, 0, sizeof(*ahp)); + + ahp->key = x->aalg->alg_key; + ahp->key_len = (x->aalg->alg_key_len+7)/8; + ahp->tfm = crypto_alloc_tfm(x->aalg->alg_name, 0); + if (!ahp->tfm) + goto error; + ahp->icv = ah_hmac_digest; + + /* + * Lookup the algorithm description maintained by xfrm_algo, + * verify crypto transform properties, and store information + * we need for AH processing. This lookup cannot fail here + * after a successful crypto_alloc_tfm(). 
+ */ + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + crypto_tfm_alg_digestsize(ahp->tfm)) { + printk(KERN_INFO "AH: %s digestsize %u != %hu\n", + x->aalg->alg_name, crypto_tfm_alg_digestsize(ahp->tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + ahp->icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + ahp->work_icv = kmalloc(ahp->icv_full_len, GFP_KERNEL); + if (!ahp->work_icv) + goto error; + + x->props.header_len = XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV); + if (x->props.mode) + x->props.header_len += 20; + x->data = ahp; + + return 0; + +error: + if (ahp) { + if (ahp->work_icv) + kfree(ahp->work_icv); + if (ahp->tfm) + crypto_free_tfm(ahp->tfm); + kfree(ahp); + } + return -EINVAL; +} + +static void ah6_destroy(struct xfrm_state *x) +{ + struct ah_data *ahp = x->data; + + if (ahp->work_icv) { + kfree(ahp->work_icv); + ahp->work_icv = NULL; + } + if (ahp->tfm) { + crypto_free_tfm(ahp->tfm); + ahp->tfm = NULL; + } +} + +static struct xfrm_type ah6_type = +{ + .description = "AH6", + .proto = IPPROTO_AH, + .init_state = ah6_init_state, + .destructor = ah6_destroy, + .input = ah6_input, + .output = ah6_output +}; + +static struct inet6_protocol ah6_protocol = { + .handler = xfrm6_rcv, + .err_handler = ah6_err, +}; + +int __init ah6_init(void) +{ + SET_MODULE_OWNER(&ah6_type); + + if (xfrm6_register_type(&ah6_type) < 0) { + printk(KERN_INFO "ipv6 ah init: can't add xfrm type\n"); + return -EAGAIN; + } + + if (inet6_add_protocol(&ah6_protocol, IPPROTO_AH) < 0) { + printk(KERN_INFO "ipv6 ah init: can't add protocol\n"); + xfrm6_unregister_type(&ah6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit ah6_fini(void) +{ + if (inet6_del_protocol(&ah6_protocol, IPPROTO_AH) < 0) + printk(KERN_INFO "ipv6 ah close: can't remove protocol\n"); + + if (xfrm6_unregister_type(&ah6_type) < 0) + 
printk(KERN_INFO "ipv6 ah close: can't remove xfrm type\n"); + +} + +module_init(ah6_init); +module_exit(ah6_fini); +MODULE_LICENSE("GPL"); diff -urN linux-2.5.62/net/ipv6/esp.c linux25_for_patch/net/ipv6/esp.c --- linux-2.5.62/net/ipv6/esp.c 1970-01-01 09:00:00.000000000 +0900 +++ linux25_for_patch/net/ipv6/esp.c 2003-02-19 02:36:53.000000000 +0900 @@ -0,0 +1,579 @@ +/* + * Copyright (C)2002 USAGI/WIDE Project + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Authors: KANDA Mitsuru@USAGI + * MIYAZAWA Kazunori@USAGI + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAX_SG_ONSTACK 4 +#if 0 +typedef void (icv_update_fn_t)(struct crypto_tfm *, + struct scatterlist *, unsigned int); +#endif + +/* XXX no ipv6 esp specific */ +#define NIP6(addr) \ + ntohs((addr).s6_addr16[0]),\ + ntohs((addr).s6_addr16[1]),\ + ntohs((addr).s6_addr16[2]),\ + ntohs((addr).s6_addr16[3]),\ + ntohs((addr).s6_addr16[4]),\ + ntohs((addr).s6_addr16[5]),\ + ntohs((addr).s6_addr16[6]),\ + ntohs((addr).s6_addr16[7]) + +/* BUGS: + * - we assume replay seqno is always present. 
+ */ +#if 0 +struct esp_data +{ + /* Confidentiality */ + struct { + u8 *key; /* Key */ + int key_len; /* Key length */ + u8 *ivec; /* ivec buffer */ + /* ivlen is offset from enc_data, where encrypted data start. + * It is logically different of crypto_tfm_alg_ivsize(tfm). + * We assume that it is either zero (no ivec), or + * >= crypto_tfm_alg_ivsize(tfm). */ + int ivlen; + int padlen; /* 0..255 */ + struct crypto_tfm *tfm; /* crypto handle */ + } conf; + + /* Integrity. It is active when authlen != 0 */ + struct { + u8 *key; /* Key */ + int key_len; /* Length of the key */ + u8 *work_icv; + int icv_full_len; + int icv_trunc_len; + void (*icv)(struct esp_data*, + struct sk_buff *skb, + int offset, int len, u8 *icv); + + struct crypto_tfm *tfm; + } auth; +}; + +/* XXX Following functions are same as IPv4, but not exported */ +extern void skb_icv_walk(const struct sk_buff *skb, struct crypto_tfm *tfm, + int offset, int len, icv_update_fn_t icv_update); +extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len); +extern int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len); +extern void esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset, int len, u8 *auth_data); +extern int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer); +extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len); +#endif +/* Common with AH after some work on arguments. 
*/ +#if 0 +static void +esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset, + int len, u8 *auth_data) +{ + struct crypto_tfm *tfm = esp->auth.tfm; + char *icv = esp->auth.work_icv; + + memset(auth_data, 0, esp->auth.icv_trunc_len); + crypto_hmac_init(tfm, esp->auth.key, &esp->auth.key_len); + skb_icv_walk(skb, tfm, offset, len, crypto_hmac_update); + crypto_hmac_final(tfm, esp->auth.key, &esp->auth.key_len, icv); + memcpy(auth_data, icv, esp->auth.icv_trunc_len); +} +#endif +static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, struct ipv6_opt_hdr **prevhdr) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); + u8 nextnexthdr; + + *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; + + while (offset + 1 < packet_len) { + + switch (*nexthdr) { + + case NEXTHDR_HOP: + case NEXTHDR_ROUTING: + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + case NEXTHDR_DEST: + nextnexthdr = + ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; + /* XXX We know the option is inner dest opt + with next next header check. 
*/ + if (nextnexthdr != NEXTHDR_HOP && + nextnexthdr != NEXTHDR_ROUTING && + nextnexthdr != NEXTHDR_DEST) { + return offset; + } + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + default : + return offset; + } + } + + return offset; +} + +int esp6_output(struct sk_buff *skb) +{ + int err; + int hdr_len = 0; + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph = NULL, *top_iph; + struct ip_esp_hdr *esph; + struct crypto_tfm *tfm; + struct esp_data *esp; + struct sk_buff *trailer; + struct ipv6_opt_hdr *prevhdr = NULL; + int blksize; + int clen; + int alen; + int nfrags; + u8 nexthdr; + + /* First, if the skb is not checksummed, complete checksum. */ + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) + return -EINVAL; + + spin_lock_bh(&x->lock); + if ((err = xfrm_state_check_expire(x)) != 0) + goto error; + if ((err = xfrm_state_check_space(x, skb)) != 0) + goto error; + + err = -ENOMEM; + + /* Strip IP header in transport mode. Save it. */ + + if (!x->props.mode) { + hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &prevhdr); + iph = kmalloc(hdr_len, GFP_ATOMIC); + if (!iph) { + err = -ENOMEM; + goto error; + } + memcpy(iph, skb->nh.raw, hdr_len); + __skb_pull(skb, hdr_len); + } + + /* Now skb is pure payload to encrypt */ + + /* Round to block size */ + clen = skb->len; + + esp = x->data; + alen = esp->auth.icv_trunc_len; + tfm = esp->conf.tfm; + blksize = crypto_tfm_alg_blocksize(tfm); + clen = (clen + 2 + blksize-1)&~(blksize-1); + if (esp->conf.padlen) + clen = (clen + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + if ((nfrags = skb_cow_data(skb, clen-skb->len+alen, &trailer)) < 0) { + if (!x->props.mode && iph) kfree(iph); + goto error; + } + + /* Fill padding... 
 */ + do { + int i; + for (i=0; i<clen-skb->len - 2; i++) + *(u8*)(trailer->tail + i) = i+1; + } while (0); + *(u8*)(trailer->tail + clen-skb->len - 2) = (clen - skb->len)-2; + pskb_put(skb, trailer, clen - skb->len); + + if (x->props.mode) { + iph = skb->nh.ipv6h; + top_iph = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + esph = (struct ip_esp_hdr*)(top_iph+1); + *(u8*)(trailer->tail - 1) = IPPROTO_IPV6; + top_iph->version = 6; + top_iph->priority = iph->priority; + top_iph->flow_lbl[0] = iph->flow_lbl[0]; + top_iph->flow_lbl[1] = iph->flow_lbl[1]; + top_iph->flow_lbl[2] = iph->flow_lbl[2]; + top_iph->nexthdr = IPPROTO_ESP; + top_iph->payload_len = htons(skb->len + alen); + top_iph->hop_limit = iph->hop_limit; + memcpy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr, sizeof(struct ipv6hdr)); + memcpy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr, sizeof(struct ipv6hdr)); + } else { + /* XXX exthdr */ + esph = (struct ip_esp_hdr*)skb_push(skb, x->props.header_len); + top_iph = (struct ipv6hdr*)skb_push(skb, hdr_len); + memcpy(top_iph, iph, hdr_len); + kfree(iph); + top_iph->payload_len = htons(skb->len + alen - sizeof(struct ipv6hdr)); + if (prevhdr) { + prevhdr->nexthdr = IPPROTO_ESP; + } else { + top_iph->nexthdr = IPPROTO_ESP; + } + *(u8*)(trailer->tail - 1) = nexthdr; + } + + esph->spi = x->id.spi; + esph->seq_no = htonl(++x->replay.oseq); + + if (esp->conf.ivlen) + crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + + do { + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 
0 : nfrags]; + struct scatterlist *sg = sgbuf; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto error; + } + skb_to_sgvec(skb, sg, esph->enc_data+esp->conf.ivlen-skb->data, clen); + crypto_cipher_encrypt(tfm, sg, sg, clen); + if (unlikely(sg != sgbuf)) + kfree(sg); + } while (0); + + if (esp->conf.ivlen) { + memcpy(esph->enc_data, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + crypto_cipher_get_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + } + + if (esp->auth.icv_full_len) { + esp->auth.icv(esp, skb, (u8*)esph-skb->data, + 8+esp->conf.ivlen+clen, trailer->tail); + pskb_put(skb, trailer, alen); + } + + skb->nh.raw = skb->data; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock_bh(&x->lock); + if ((skb->dst = dst_pop(dst)) == NULL) + goto error_nolock; + return NET_XMIT_BYPASS; + +error: + spin_unlock_bh(&x->lock); +error_nolock: + kfree_skb(skb); + return err; +} + +int esp6_input(struct xfrm_state *x, struct sk_buff *skb) +{ + struct ipv6hdr *iph; + struct ip_esp_hdr *esph; + struct esp_data *esp = x->data; + struct sk_buff *trailer; + int blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + int alen = esp->auth.icv_trunc_len; + int elen = skb->len - 8 - esp->conf.ivlen - alen; + + int hdr_len = skb->h.raw - skb->nh.raw; + int nfrags; + u8 ret_nexthdr = 0; + unsigned char *tmp_hdr = NULL; + + if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr))) + goto out; + + if (elen <= 0 || (elen & (blksize-1))) + goto out; + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto out; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + /* If integrity check is required, do this. 
*/ + if (esp->auth.icv_full_len) { + u8 sum[esp->auth.icv_full_len]; + u8 sum1[alen]; + + esp->auth.icv(esp, skb, 0, skb->len-alen, sum); + + if (skb_copy_bits(skb, skb->len-alen, sum1, alen)) + BUG(); + + if (unlikely(memcmp(sum, sum1, alen))) { + x->stats.integrity_failed++; + goto out; + } + } + + if ((nfrags = skb_cow_data(skb, 0, &trailer)) < 0) + goto out; + + skb->ip_summed = CHECKSUM_NONE; + + esph = (struct ip_esp_hdr*)skb->data; + iph = skb->nh.ipv6h; + + /* Get ivec. This can be wrong, check against another impls. */ + if (esp->conf.ivlen) + crypto_cipher_set_iv(esp->conf.tfm, esph->enc_data, crypto_tfm_alg_ivsize(esp->conf.tfm)); + + { + u8 nexthdr[2]; + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; + struct scatterlist *sg = sgbuf; + u8 padlen; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto out; + } + skb_to_sgvec(skb, sg, 8+esp->conf.ivlen, elen); + crypto_cipher_decrypt(esp->conf.tfm, sg, sg, elen); + if (unlikely(sg != sgbuf)) + kfree(sg); + + if (skb_copy_bits(skb, skb->len-alen-2, nexthdr, 2)) + BUG(); + + padlen = nexthdr[0]; + if (padlen+2 >= elen) { + if (net_ratelimit()) { + printk(KERN_WARNING "ipsec esp packet is garbage padlen=%d, elen=%d\n", padlen+2, elen); + } + goto out; + } + /* ... check padding bits here. Silly. :-) */ + + ret_nexthdr = nexthdr[1]; + pskb_trim(skb, skb->len - alen - padlen - 2); + skb->h.raw = skb_pull(skb, 8 + esp->conf.ivlen); + skb->nh.raw += 8 + esp->conf.ivlen; + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + } + kfree(tmp_hdr); + return ret_nexthdr; + +out: + return -EINVAL; +} + +static u32 esp6_get_max_size(struct xfrm_state *x, int mtu) +{ + struct esp_data *esp = x->data; + u32 blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + + if (x->props.mode) { + mtu = (mtu + 2 + blksize-1)&~(blksize-1); + } else { + /* The worst case. 
*/ + mtu += 2 + blksize; + } + if (esp->conf.padlen) + mtu = (mtu + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + return mtu + x->props.header_len + esp->auth.icv_full_len; +} + +void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ip_esp_hdr *esph = (struct ip_esp_hdr*)(skb->data+offset); + struct xfrm_state *x; + + if (type != ICMPV6_DEST_UNREACH || + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, esph->spi, IPPROTO_ESP); + if (!x) + return; + printk(KERN_DEBUG "pmtu discvovery on SA ESP/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(esph->spi), NIP6(iph->daddr)); + xfrm_state_put(x); +} + +void esp6_destroy(struct xfrm_state *x) +{ + struct esp_data *esp = x->data; + + if (esp->conf.tfm) { + crypto_free_tfm(esp->conf.tfm); + esp->conf.tfm = NULL; + } + if (esp->conf.ivec) { + kfree(esp->conf.ivec); + esp->conf.ivec = NULL; + } + if (esp->auth.tfm) { + crypto_free_tfm(esp->auth.tfm); + esp->auth.tfm = NULL; + } + if (esp->auth.work_icv) { + kfree(esp->auth.work_icv); + esp->auth.work_icv = NULL; + } +} + +int esp6_init_state(struct xfrm_state *x, void *args) +{ + struct esp_data *esp = NULL; + + if (x->aalg) { + if (x->aalg->alg_key_len == 0 || x->aalg->alg_key_len > 512) + goto error; + } + if (x->ealg == NULL || x->ealg->alg_key_len == 0) + goto error; + + esp = kmalloc(sizeof(*esp), GFP_KERNEL); + if (esp == NULL) + return -ENOMEM; + + memset(esp, 0, sizeof(*esp)); + + if (x->aalg) { + struct xfrm_algo_desc *aalg_desc; + + esp->auth.key = x->aalg->alg_key; + esp->auth.key_len = (x->aalg->alg_key_len+7)/8; + esp->auth.tfm = crypto_alloc_tfm(x->aalg->alg_name, 0); + if (esp->auth.tfm == NULL) + goto error; + esp->auth.icv = esp_hmac_digest; + + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + 
crypto_tfm_alg_digestsize(esp->auth.tfm)) { + printk(KERN_INFO "ESP: %s digestsize %u != %hu\n", + x->aalg->alg_name, + crypto_tfm_alg_digestsize(esp->auth.tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + esp->auth.icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + esp->auth.icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + esp->auth.work_icv = kmalloc(esp->auth.icv_full_len, GFP_KERNEL); + if (!esp->auth.work_icv) + goto error; + } + esp->conf.key = x->ealg->alg_key; + esp->conf.key_len = (x->ealg->alg_key_len+7)/8; + esp->conf.tfm = crypto_alloc_tfm(x->ealg->alg_name, CRYPTO_TFM_MODE_CBC); + if (esp->conf.tfm == NULL) + goto error; + esp->conf.ivlen = crypto_tfm_alg_ivsize(esp->conf.tfm); + esp->conf.padlen = 0; + if (esp->conf.ivlen) { + esp->conf.ivec = kmalloc(esp->conf.ivlen, GFP_KERNEL); + get_random_bytes(esp->conf.ivec, esp->conf.ivlen); + } + crypto_cipher_setkey(esp->conf.tfm, esp->conf.key, esp->conf.key_len); + x->props.header_len = 8 + esp->conf.ivlen; + if (x->props.mode) + x->props.header_len += 40; /* XXX ext hdr */ + x->data = esp; + return 0; + +error: + if (esp) { + if (esp->auth.tfm) + crypto_free_tfm(esp->auth.tfm); + if (esp->auth.work_icv) + kfree(esp->auth.work_icv); + if (esp->conf.tfm) + crypto_free_tfm(esp->conf.tfm); + kfree(esp); + } + return -EINVAL; +} + +static struct xfrm_type esp6_type = +{ + .description = "ESP6", + .proto = IPPROTO_ESP, + .init_state = esp6_init_state, + .destructor = esp6_destroy, + .get_max_size = esp6_get_max_size, + .input = esp6_input, + .output = esp6_output +}; + +static struct inet6_protocol esp6_protocol = { + .handler = xfrm6_rcv, + .err_handler = esp6_err, +}; + +int __init esp6_init(void) +{ + SET_MODULE_OWNER(&esp6_type); + if (xfrm6_register_type(&esp6_type) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add xfrm type\n"); + return -EAGAIN; + } + if (inet6_add_protocol(&esp6_protocol, IPPROTO_ESP) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add protocol\n"); + 
xfrm6_unregister_type(&esp6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit esp6_fini(void) +{ + if (inet6_del_protocol(&esp6_protocol, IPPROTO_ESP) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove protocol\n"); + if (xfrm6_unregister_type(&esp6_type) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove xfrm type\n"); +} + +module_init(esp6_init); +module_exit(esp6_fini); +MODULE_LICENSE("GPL"); diff -urN linux-2.5.62/net/ipv6/exthdrs.c linux25_for_patch/net/ipv6/exthdrs.c --- linux-2.5.62/net/ipv6/exthdrs.c 2003-02-18 07:55:50.000000000 +0900 +++ linux25_for_patch/net/ipv6/exthdrs.c 2003-02-19 02:36:53.000000000 +0900 @@ -392,7 +392,7 @@ cpu ticks, checking that sender did not something stupid and opt->hdrlen is even. Shit! --ANK (980730) */ - +#if 0 static int ipv6_auth_hdr(struct sk_buff **skb_ptr, int nhoff) { struct sk_buff *skb=*skb_ptr; @@ -424,7 +424,7 @@ kfree_skb(skb); return -1; } - +#endif /* This list MUST NOT contain entry for NEXTHDR_HOP. It is parsed immediately after packet received and if it occurs somewhere in another place we must @@ -436,7 +436,9 @@ {NEXTHDR_ROUTING, ipv6_routing_header}, {NEXTHDR_DEST, ipv6_dest_opt}, {NEXTHDR_NONE, ipv6_nodata}, + /* {NEXTHDR_AUTH, ipv6_auth_hdr}, + */ /* {NEXTHDR_ESP, ipv6_esp_hdr}, */ @@ -627,6 +629,8 @@ { if (opt->auth) prev_hdr = ipv6_build_authhdr(skb, prev_hdr, opt->auth); + + skb->h.raw = skb->tail; if (opt->dst1opt) prev_hdr = ipv6_build_exthdr(skb, prev_hdr, NEXTHDR_DEST, opt->dst1opt); return prev_hdr; @@ -689,8 +693,10 @@ void ipv6_push_frag_opts(struct sk_buff *skb, struct ipv6_txoptions *opt, u8 *proto) { - if (opt->dst1opt) + if (opt->dst1opt) { ipv6_push_exthdr(skb, proto, NEXTHDR_DEST, opt->dst1opt); + skb->h.raw = skb->data; + } if (opt->auth) ipv6_push_authhdr(skb, proto, opt->auth); } diff -urN linux-2.5.62/net/ipv6/ip6_input.c linux25_for_patch/net/ipv6/ip6_input.c --- linux-2.5.62/net/ipv6/ip6_input.c 2003-02-18 07:56:42.000000000 +0900 +++ 
linux25_for_patch/net/ipv6/ip6_input.c 2003-02-19 02:36:53.000000000 +0900 @@ -150,7 +150,8 @@ It would be stupid to detect for optional headers, which are missing with probability of 200% */ - if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP) { + if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP && + nexthdr != NEXTHDR_AUTH && nexthdr != NEXTHDR_ESP) { nhoff = ipv6_parse_exthdrs(&skb, nhoff); if (nhoff < 0) return 0; diff -urN linux-2.5.62/net/ipv6/ip6_output.c linux25_for_patch/net/ipv6/ip6_output.c --- linux-2.5.62/net/ipv6/ip6_output.c 2003-02-18 07:55:51.000000000 +0900 +++ linux25_for_patch/net/ipv6/ip6_output.c 2003-02-19 02:36:53.000000000 +0900 @@ -23,6 +23,7 @@ * * H. von Brand : Added missing #include * Imran Patel : frag id should be in NBO + * MIYAZAWA, K. @USAGI: IPsec support */ #include @@ -174,7 +175,8 @@ } } #endif /* CONFIG_NETFILTER */ - return skb->dst->output(skb); + /* we don't use skb->dst->output() directly because of IPsec */ + return dst_output(skb); } /* @@ -192,6 +194,11 @@ int seg_len = skb->len; int hlimit; u32 mtu; + int err = 0; + + if ((err = xfrm6_lookup(&skb->dst, fl, sk, 0)) < 0) { + return err; + } if (opt) { int head_room; @@ -576,6 +583,13 @@ } pktlength = length; + if (dst) { + if ((err = xfrm6_lookup(&dst, fl, sk, 0)) < 0) { + dst_release(dst); + return -ENETUNREACH; + } + } + if (hlimit < 0) { if (ipv6_addr_is_multicast(fl->fl6_dst)) hlimit = np->mcast_hops; @@ -631,9 +645,8 @@ if (flags&MSG_PROBE) goto out; - skb = sock_alloc_send_skb(sk, pktlength + 15 + - dev->hard_header_len, - flags & MSG_DONTWAIT, &err); + /* XXX: alloc skb with as we do in the IPv4 stack for IPsec */ + skb = sock_alloc_send_skb(sk, mtu, flags & MSG_DONTWAIT, &err); if (skb == NULL) { IP6_INC_STATS(Ip6OutDiscards); @@ -663,9 +676,12 @@ err = getfrag(data, &hdr->saddr, ((char *) hdr) + (pktlength - length), 0, length); + if (!opt || !opt->dst1opt) + skb->h.raw = ((char *) hdr) + (pktlength - length); if (!err) { IP6_INC_STATS(Ip6OutRequests); + 
err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, ip6_maybe_reroute); } else { err = -EFAULT; diff -urN linux-2.5.62/net/ipv6/ndisc.c linux25_for_patch/net/ipv6/ndisc.c --- linux-2.5.62/net/ipv6/ndisc.c 2003-02-18 07:56:18.000000000 +0900 +++ linux25_for_patch/net/ipv6/ndisc.c 2003-02-19 02:36:53.000000000 +0900 @@ -23,6 +23,7 @@ * and moved to net/core. * Pekka Savola : RFC2461 validation * YOSHIFUJI Hideaki @USAGI : Verify ND options properly + * MIYAZAWA Kazunoro @USAGI : IPsec support */ /* Set to 3 to get tracing... */ @@ -71,6 +72,8 @@ #include #include #include +#include +#include #include #include @@ -336,8 +339,6 @@ unsigned char ha[MAX_ADDR_LEN]; unsigned char *h_dest = NULL; - skb_reserve(skb, (dev->hard_header_len + 15) & ~15); - if (dev->hard_header) { if (ipv6_addr_type(daddr) & IPV6_ADDR_MULTICAST) { ndisc_mc_map(daddr, ha, dev, 1); @@ -373,11 +374,51 @@ /* * Send a Neighbour Advertisement */ +int ndisc_output(struct sk_buff *skb) +{ + if (skb) { + struct neighbour *neigh = (skb->dst ? 
skb->dst->neighbour : NULL); + if (ndisc_build_ll_hdr(skb, skb->dev, &skb->nh.ipv6h->daddr, neigh, skb->len) == 0) { + kfree_skb(skb); + return -EINVAL; + } + dev_queue_xmit(skb); + return 0; + } + return -EINVAL; +} + +static inline void ndisc_rt_init(struct rt6_info *rt, struct net_device *dev, + struct neighbour *neigh) +{ + rt->rt6i_dev = dev; + rt->rt6i_nexthop = neigh; + rt->rt6i_expires = 0; + rt->rt6i_flags = RTF_LOCAL; + rt->rt6i_metric = 0; + rt->rt6i_hoplimit = 255; + rt->u.dst.output = ndisc_output; +} + +static inline void ndisc_flow_init(struct flowi *fl, u8 type, + struct in6_addr *saddr, struct in6_addr *daddr) +{ + memset(fl, 0, sizeof(*fl)); + fl->fl6_src = saddr; + fl->fl6_dst = daddr; + fl->proto = IPPROTO_ICMPV6; + fl->uli_u.icmpt.type = type; + fl->uli_u.icmpt.code = 0; +} + static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh, - struct in6_addr *daddr, struct in6_addr *solicited_addr, - int router, int solicited, int override, int inc_opt) + struct in6_addr *daddr, struct in6_addr *solicited_addr, + int router, int solicited, int override, int inc_opt) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct nd_msg *msg; int len; @@ -386,6 +427,22 @@ len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_ADVERTISEMENT, solicited_addr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm6_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + if (inc_opt) { if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -401,14 +458,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, solicited_addr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *) 
skb_put(skb, len); + skb->h.raw = (unsigned char *)msg = (struct nd_msg *) skb_put(skb, len); msg->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT; msg->icmph.icmp6_code = 0; @@ -431,7 +484,9 @@ csum_partial((__u8 *) msg, len, 0)); - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborAdvertisements); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -441,6 +496,9 @@ struct in6_addr *solicit, struct in6_addr *daddr, struct in6_addr *saddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct nd_msg *msg; @@ -455,6 +513,22 @@ saddr = &addr_buf; } + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm6_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); send_llinfo = dev->addr_len && ipv6_addr_type(saddr) != IPV6_ADDR_ANY; if (send_llinfo) @@ -467,14 +541,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *)skb_put(skb, len); + skb->h.raw = (unsigned char *)msg = (struct nd_msg *)skb_put(skb, len); msg->icmph.icmp6_type = NDISC_NEIGHBOUR_SOLICITATION; msg->icmph.icmp6_code = 0; msg->icmph.icmp6_cksum = 0; @@ -493,7 +563,9 @@ csum_partial((__u8 *) msg, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -502,6 +574,9 @@ void ndisc_send_rs(struct net_device *dev, struct in6_addr *saddr, struct in6_addr *daddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct icmp6hdr *hdr; @@ -509,6 +584,22 @@ int len; int err; + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_ROUTER_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, NULL); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm6_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr); if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -520,14 +611,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, NULL, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - hdr = (struct icmp6hdr *) skb_put(skb, len); + skb->h.raw = (unsigned char *)hdr = (struct icmp6hdr *) skb_put(skb, len); hdr->icmp6_type = NDISC_ROUTER_SOLICITATION; hdr->icmp6_code = 0; hdr->icmp6_cksum = 0; @@ -544,13 +631,14 @@ csum_partial((__u8 *) hdr, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRouterSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); } - static void ndisc_error_report(struct neighbour *neigh, struct sk_buff *skb) { /* @@ -1126,6 +1214,8 @@ struct in6_addr *addrp; struct net_device *dev; struct rt6_info *rt; + struct dst_entry *dst; + struct flowi fl; u8 *opt; int rd_len; int err; @@ -1137,6 +1227,22 @@ if (rt == NULL) return; + dst = (struct dst_entry*)rt; + + if (ipv6_get_lladdr(dev, &saddr_buf)) { + ND_PRINTK1("redirect: no link_local addr for dev\n"); + return; + } + + ndisc_flow_init(&fl, NDISC_REDIRECT, &saddr_buf, &skb->nh.ipv6h->saddr); + + dst_clone(dst); + err = xfrm6_lookup(&dst, &fl, NULL, 0); + if (err) { + dst_release(dst); + return; + } + if (rt->rt6i_flags & RTF_GATEWAY) { ND_PRINTK1("ndisc_send_redirect: not a neighbour\n"); dst_release(&rt->u.dst); @@ -1165,11 +1271,6 @@ rd_len &= ~0x7; len += rd_len; - if (ipv6_get_lladdr(dev, &saddr_buf)) { - ND_PRINTK1("redirect: no link_local addr for dev\n"); - return; - } - buff = sock_alloc_send_skb(sk, MAX_HEADER + len + dev->hard_header_len + 15, 0, &err); if (buff == NULL) { @@ -1179,15 +1280,11 @@ hlen = 0; - if (ndisc_build_ll_hdr(buff, dev, &skb->nh.ipv6h->saddr, NULL, len) == 0) { - kfree_skb(buff); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, buff, dev, &saddr_buf, &skb->nh.ipv6h->saddr, IPPROTO_ICMPV6, len); - icmph = (struct icmp6hdr *) skb_put(buff, len); + skb->h.raw = (unsigned char *)icmph = (struct icmp6hdr *) skb_put(buff, len); memset(icmph, 0, sizeof(struct icmp6hdr)); icmph->icmp6_type = NDISC_REDIRECT; @@ -1225,7 +1322,8 @@ len, IPPROTO_ICMPV6, csum_partial((u8 *) icmph, len, 0)); - dev_queue_xmit(buff); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRedirects); ICMP6_INC_STATS(Icmp6OutMsgs); diff -urN linux-2.5.62/net/ipv6/raw.c linux25_for_patch/net/ipv6/raw.c --- linux-2.5.62/net/ipv6/raw.c 2003-02-18 
07:56:13.000000000 +0900 +++ linux25_for_patch/net/ipv6/raw.c 2003-02-19 02:36:53.000000000 +0900 @@ -45,6 +45,7 @@ #include #include +#include struct sock *raw_v6_htable[RAWV6_HTABLE_SIZE]; rwlock_t raw_v6_lock = RW_LOCK_UNLOCKED; @@ -304,6 +305,11 @@ struct inet_opt *inet = inet_sk(sk); struct raw6_opt *raw_opt = raw6_sk(sk); + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return NET_RX_DROP; + } + if (!raw_opt->checksum) skb->ip_summed = CHECKSUM_UNNECESSARY; diff -urN linux-2.5.62/net/ipv6/route.c linux25_for_patch/net/ipv6/route.c --- linux-2.5.62/net/ipv6/route.c 2003-02-18 07:56:43.000000000 +0900 +++ linux25_for_patch/net/ipv6/route.c 2003-02-19 02:36:53.000000000 +0900 @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -67,7 +68,6 @@ #define RT6_TRACE(x...) do { ; } while (0) #endif - static int ip6_rt_max_size = 4096; static int ip6_rt_gc_min_interval = 5*HZ; static int ip6_rt_gc_timeout = 60*HZ; @@ -128,6 +128,12 @@ rwlock_t rt6_lock = RW_LOCK_UNLOCKED; +/* Dummy rt for ndisc */ +struct rt6_info *ndisc_get_dummy_rt() +{ + return dst_alloc(&ip6_dst_ops); +} + /* * Route lookup. Any rt6_lock is implied. 
*/ @@ -1815,6 +1821,8 @@ 0, SLAB_HWCACHE_ALIGN, NULL, NULL); fib6_init(); + xfrm6_init(); + #ifdef CONFIG_PROC_FS proc_net_create("ipv6_route", 0, rt6_proc_info); proc_net_create("rt6_stats", 0, rt6_proc_stats); diff -urN linux-2.5.62/net/ipv6/tcp_ipv6.c linux25_for_patch/net/ipv6/tcp_ipv6.c --- linux-2.5.62/net/ipv6/tcp_ipv6.c 2003-02-18 07:56:16.000000000 +0900 +++ linux25_for_patch/net/ipv6/tcp_ipv6.c 2003-02-19 02:36:53.000000000 +0900 @@ -51,6 +51,7 @@ #include #include #include +#include #include @@ -678,6 +679,9 @@ fl.nl_u.ip6_u.daddr = rt0->addr; } + if (!fl.fl6_src) + fl.fl6_src = &np->saddr; + dst = ip6_route_output(sk, &fl); if ((err = dst->error) != 0) { @@ -1638,6 +1642,9 @@ if (sk_filter(sk, skb, 0)) goto discard_and_relse; + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) + goto discard_it; + skb->dev = NULL; bh_lock_sock(sk); @@ -1653,6 +1660,10 @@ return ret; no_tcp_socket: + + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { bad_packet: TCP_INC_STATS_BH(TcpInErrs); @@ -1674,6 +1685,9 @@ goto discard_it; do_time_wait: + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(TcpInErrs); sock_put(sk); diff -urN linux-2.5.62/net/ipv6/udp.c linux25_for_patch/net/ipv6/udp.c --- linux-2.5.62/net/ipv6/udp.c 2003-02-18 07:56:49.000000000 +0900 +++ linux25_for_patch/net/ipv6/udp.c 2003-02-19 02:36:53.000000000 +0900 @@ -50,6 +50,7 @@ #include #include +#include DEFINE_SNMP_STAT(struct udp_mib, udp_stats_in6); @@ -541,6 +542,11 @@ static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb) { + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return -1; + } + #if defined(CONFIG_FILTER) if (sk->filter && skb->ip_summed != CHECKSUM_UNNECESSARY) { if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) { @@ 
-646,6 +652,9 @@ if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto short_packet; + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard; + saddr = &skb->nh.ipv6h->saddr; daddr = &skb->nh.ipv6h->daddr; uh = skb->h.uh; diff -urN linux-2.5.62/net/ipv6/xfrm_input.c linux25_for_patch/net/ipv6/xfrm_input.c --- linux-2.5.62/net/ipv6/xfrm_input.c 1970-01-01 09:00:00.000000000 +0900 +++ linux25_for_patch/net/ipv6/xfrm_input.c 2003-02-19 02:36:53.000000000 +0900 @@ -0,0 +1,316 @@ +/* + * xfrm_input.c - IPv6 IPsec Processing + * Based on net/ipv4/xfrm_input.c + * + * Copyright (C)2003 USAGI/WIDE Project + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Authors: + * KANDA Mitsuru @ USAGI Project + * MIYAZAWA Kazunori @ USAGI Project + * + */ + +#include +#include + +/* Fetch spi and seq frpm ipsec header */ + +static int xfrm6_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq) +{ + int offset, offset_seq; + + switch (nexthdr) { + case IPPROTO_AH: + offset = offsetof(struct ip_auth_hdr, spi); + offset_seq = offsetof(struct ip_auth_hdr, seq_no); + break; + case IPPROTO_ESP: + offset = offsetof(struct ip_esp_hdr, spi); + offset_seq = offsetof(struct ip_esp_hdr, seq_no); + break; + case IPPROTO_COMP: + if (!pskb_may_pull(skb, 4)) + return -EINVAL; + *spi = *(u16*)(skb->h.raw + 2); + *seq = 0; + return 0; + default: + return 1; + } + + if (!pskb_may_pull(skb, 16)) + return -EINVAL; + + *spi = *(u32*)(skb->h.raw + offset); + *seq = *(u32*)(skb->h.raw + offset_seq); + return 0; +} + +static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) +{ + u8 *opt = (u8 *)opthdr; + int len = ipv6_optlen(opthdr); + int off = 0; + int optlen = 0; + + off += 2; + len -= 2; + + while (len > 0) { + + switch (opt[off]) { + + case IPV6_TLV_PAD0: + optlen = 1; + break; + default: + if (len < 2) + goto bad; + optlen = opt[off+1]+2; + if (len < optlen) + goto bad; + if (opt[off] & 0x20) + memset(&opt[off+2], 0, opt[off+1]); + break; + } + + off += optlen; + len -= optlen; + } + if (len == 0) + return 1; + +bad: + return 0; +} + +int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + unsigned int packet_len = skb->tail - skb->nh.raw; + u8 nexthdr = skb->nh.ipv6h->nexthdr; + u8 nextnexthdr = 0; + + *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - 
skb->nh.raw; + + while (offset + 1 <= packet_len) { + + switch (nexthdr) { + + case NEXTHDR_HOP: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun hopopts\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_ROUTING: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_DEST: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_AUTH: + if (dir == XFRM_POLICY_OUT) { + memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, + (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); + } + if (exthdr->nexthdr == NEXTHDR_DEST) { + offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + nextnexthdr = exthdr->nexthdr; + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + } + return nexthdr; + default: + return nexthdr; + } + } + + return nexthdr; +} + +int xfrm6_rcv(struct sk_buff *skb) +{ + int err; + u32 spi, seq; + struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH]; + struct xfrm_state *x; + int xfrm_nr = 0; + int decaps = 0; + struct ipv6hdr *hdr = skb->nh.ipv6h; + unsigned char *tmp_hdr = NULL; + int hdr_len = 0; + u16 nh_offset = 0; + u8 nexthdr = 0; + + if (hdr->nexthdr == IPPROTO_AH || hdr->nexthdr == IPPROTO_ESP) { + nh_offset = ((unsigned char*)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + hdr_len = sizeof(struct ipv6hdr); + } else { + hdr_len = skb->h.raw - skb->nh.raw; + 
} + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto drop; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); + hdr->priority = 0; + hdr->flow_lbl[0] = 0; + hdr->flow_lbl[1] = 0; + hdr->flow_lbl[2] = 0; + hdr->hop_limit = 0; + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) != 0) + goto drop; + + do { + struct ipv6hdr *iph = skb->nh.ipv6h; + + if (xfrm_nr == XFRM_MAX_DEPTH) + goto drop; + + x = xfrm6_state_lookup(&iph->daddr, spi, nexthdr); + if (x == NULL) + goto drop; + spin_lock(&x->lock); + if (unlikely(x->km.state != XFRM_STATE_VALID)) + goto drop_unlock; + + if (x->props.replay_window && xfrm_replay_check(x, seq)) + goto drop_unlock; + + nexthdr = x->type->input(x, skb); + if (nexthdr <= 0) + goto drop_unlock; + + if (x->props.replay_window) + xfrm_replay_advance(x, seq); + + x->curlft.bytes += skb->len; + x->curlft.packets++; + + spin_unlock(&x->lock); + + xfrm_vec[xfrm_nr++] = x; + + iph = skb->nh.ipv6h; /* ??? */ + + if (nexthdr == NEXTHDR_DEST) { + if (!pskb_may_pull(skb, (skb->h.raw-skb->data)+8) || + !pskb_may_pull(skb, (skb->h.raw-skb->data)+((skb->h.raw[1]+1)<<3))) { + err = -EINVAL; + goto drop; + } + nexthdr = skb->h.raw[0]; + nh_offset = skb->h.raw - skb->nh.raw; + skb_pull(skb, (skb->h.raw[1]+1)<<3); + skb->h.raw = skb->data; + } + + if (x->props.mode) { /* XXX */ + if (iph->nexthdr != IPPROTO_IPV6) + goto drop; + skb->nh.raw = skb->data; + iph = skb->nh.ipv6h; + decaps = 1; + break; + } + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) < 0) + goto drop; + } while (!err); + + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.raw[nh_offset] = nexthdr; + skb->nh.ipv6h->payload_len = htons(hdr_len + skb->len - sizeof(struct ipv6hdr)); + + /* Allocate new secpath or COW existing one. 
*/ + if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) { + struct sec_path *sp; + sp = kmem_cache_alloc(secpath_cachep, SLAB_ATOMIC); + if (!sp) + goto drop; + if (skb->sp) { + memcpy(sp, skb->sp, sizeof(struct sec_path)); + secpath_put(skb->sp); + } else + sp->len = 0; + atomic_set(&sp->refcnt, 1); + skb->sp = sp; + } + + if (xfrm_nr + skb->sp->len > XFRM_MAX_DEPTH) + goto drop; + + memcpy(skb->sp->xvec+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(void*)); + skb->sp->len += xfrm_nr; + + if (decaps) { + if (!(skb->dev->flags&IFF_LOOPBACK)) { + dst_release(skb->dst); + skb->dst = NULL; + } + netif_rx(skb); + return 0; + } else { + return -nexthdr; + } + +drop_unlock: + spin_unlock(&x->lock); + xfrm_state_put(x); +drop: + if (tmp_hdr) kfree(tmp_hdr); + while (--xfrm_nr >= 0) + xfrm_state_put(xfrm_vec[xfrm_nr]); + kfree_skb(skb); + return 0; +} + + +void __init xfrm6_input_init(void) +{ + /* do nothing */ +} diff -urN linux-2.5.62/net/ipv6/xfrm_policy.c linux25_for_patch/net/ipv6/xfrm_policy.c --- linux-2.5.62/net/ipv6/xfrm_policy.c 2003-02-18 07:56:43.000000000 +0900 +++ linux25_for_patch/net/ipv6/xfrm_policy.c 2003-02-19 02:36:53.000000000 +0900 @@ -1,5 +1,180 @@ +/* + * + * Changes: + * MIYAZAWA Kazunori @USAGI :IPv6 IPsec Policy Database. + * + */ + + #include #include +#include +#include + +extern struct dst_ops xfrm6_dst_ops; + +/* Limited flow cache. Its function now is to accelerate search for + * policy rules. + * + * Flow cache is private to cpus, at the moment this is important + * mostly for flows which do not match any rule, so that flow lookups + * are absolultely cpu-local. When a rule exists we do some updates + * to rule (refcnt, stats), so that locality is broken. Later this + * can be repaired. 
+ */ + +struct flow_entry +{ + struct flow_entry *next; + struct flowi fl; + u8 dir; + u32 genid; + struct xfrm_policy *pol; +}; + +static kmem_cache_t *flow6_cachep; + +struct flow_entry **flow6_table; + +#define FLOW6CACHE_HASH_SIZE 1024 + +struct xfrm_policy *xfrm6_policy_lookup(int dir, struct flowi *fl); + +static inline u32 flow_hash(struct flowi *fl) +{ + u32 hash = fl->fl6_src->s6_addr32[2] ^ + fl->fl6_src->s6_addr32[3] ^ + fl->uli_u.ports.sport; + + hash = ((hash & 0xF0F0F0F0) >> 4) | ((hash & 0x0F0F0F0F) << 4); + hash ^= fl->fl6_dst->s6_addr32[2] ^ + fl->fl6_dst->s6_addr32[3] ^ + fl->uli_u.ports.dport; + + hash ^= (hash >> 10); + hash ^= (hash >> 20); + return hash & (FLOW6CACHE_HASH_SIZE-1); +} + +static int flow_lwm = 2*FLOW6CACHE_HASH_SIZE; +static int flow_hwm = 4*FLOW6CACHE_HASH_SIZE; + +static int flow_number[NR_CPUS] __cacheline_aligned; + +#define flow_count(cpu) (flow_number[cpu]) + +static void flow6_cache_shrink(int cpu) +{ + int i; + struct flow_entry *fle, **flp; + int shrink_to = flow_lwm/FLOW6CACHE_HASH_SIZE; + + for (i=0; inext; + } + while ((fle=*flp) != NULL) { + *flp = fle->next; + if (fle->pol) + xfrm_pol_put(fle->pol); + kmem_cache_free(flow6_cachep, fle); + } + } +} + +static inline int compare_flowi(struct flowi *fl1, struct flowi *fl2) +{ + return (fl1 == fl2) ? 
1 : + fl1->oif == fl2->oif && + fl1->iif == fl2->iif && + fl1->fl6_flowlabel == fl2->fl6_flowlabel && + !memcmp(fl1->fl6_src, fl2->fl6_src, sizeof(struct in6_addr)) && + !memcmp(fl1->fl6_dst, fl2->fl6_dst, sizeof(struct in6_addr)) && + fl1->proto == fl2->proto && + fl1->flags == fl2->flags && + !memcmp(&fl1->uli_u, &fl2->uli_u, sizeof(fl1->uli_u)); +} + +struct xfrm_policy *flow6_lookup(int dir, struct flowi *fl) +{ + struct xfrm_policy *pol; + struct flow_entry *fle; + u32 hash = flow_hash(fl); + int cpu; + + local_bh_disable(); + cpu = smp_processor_id(); + + for (fle = flow6_table[cpu*FLOW6CACHE_HASH_SIZE+hash]; + fle; fle = fle->next) { + if (compare_flowi(fl, &fle->fl) && fle->dir == dir) { + if (fle->genid == xfrm_policy_genid) { + if ((pol = fle->pol) != NULL) + atomic_inc(&pol->refcnt); + local_bh_enable(); + return pol; + } + break; + } + } + + pol = xfrm6_policy_lookup(dir, fl); + + if (fle) { + /* Stale flow entry found. Update it. */ + fle->genid = xfrm_policy_genid; + + if (fle->pol) + xfrm_pol_put(fle->pol); + fle->pol = pol; + if (pol) + atomic_inc(&pol->refcnt); + } else { + if (flow_count(cpu) > flow_hwm) + flow6_cache_shrink(cpu); + fle = kmem_cache_alloc(flow6_cachep, SLAB_ATOMIC); + if (fle) { + flow_count(cpu)++; + fle->fl = *fl; + fle->genid = xfrm_policy_genid; + fle->dir = dir; + fle->pol = pol; + if (pol) + atomic_inc(&pol->refcnt); + fle->next = flow6_table[cpu*FLOW6CACHE_HASH_SIZE+hash]; + flow6_table[cpu*FLOW6CACHE_HASH_SIZE+hash] = fle; + } + } + + local_bh_enable(); + return pol; +} + +void __init flow6_cache_init(void) +{ + int order; + + flow6_cachep = kmem_cache_create("flow6_cache", + sizeof(struct flow_entry), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!flow6_cachep) + panic("NET: failed to allocate flow cache slab\n"); + + for (order = 0; + (PAGE_SIZE<owner); +} +/* Find policy to apply to this flow. 
*/ + +struct xfrm_policy *xfrm6_policy_lookup(int dir, struct flowi *fl) +{ + struct xfrm_policy *pol; + read_lock_bh(&xfrm_policy_lock); + for (pol = xfrm_policy_list[dir]; pol; pol = pol->next) { + struct xfrm_selector *sel = &pol->selector; + if (pol->family != AF_INET6) continue; + if (xfrm6_selector_match(sel, fl)) { + atomic_inc(&pol->refcnt); + break; + } + } + read_unlock_bh(&xfrm_policy_lock); + return pol; +} + +struct xfrm_policy *xfrm6_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl) +{ + struct xfrm_policy *pol; + + read_lock_bh(&xfrm_policy_lock); + if ((pol = sk->policy[dir]) != NULL) { + if (xfrm6_selector_match(&pol->selector, fl)) + atomic_inc(&pol->refcnt); + else + pol = NULL; + } + read_unlock_bh(&xfrm_policy_lock); + return pol; +} + +/* Resolve list of templates for the flow, given policy. */ + +static int +xfrm6_tmpl_resolve(struct xfrm_policy *policy, struct flowi *fl, + struct xfrm_state **xfrm) +{ + int nx; + int i, error; + struct in6_addr *daddr = fl->fl6_dst; + struct in6_addr *saddr = fl->fl6_src; + + for (nx=0, i = 0; i < policy->xfrm_nr; i++) { + struct xfrm_state *x=NULL; + struct in6_addr *remote = daddr; + struct in6_addr *local = saddr; + struct xfrm_tmpl *tmpl = &policy->xfrm_vec[i]; + + if (tmpl->mode) { + remote = (struct in6_addr*)&tmpl->id.daddr; + local = (struct in6_addr*)&tmpl->saddr; + } + + x = xfrm6_state_find(remote, local, fl, tmpl, policy, &error); + + if (x && x->km.state == XFRM_STATE_VALID) { + xfrm[nx++] = x; + daddr = remote; + saddr = local; + continue; + } + + if (x) { + error = (x->km.state == XFRM_STATE_ERROR ? + -EINVAL : -EAGAIN); + xfrm_state_put(x); + } + + if (!tmpl->optional) + goto fail; + } + return nx; + +fail: + for (nx--; nx>=0; nx--) + xfrm_state_put(xfrm[nx]); + return error; +} + +/* Check that the bundle accepts the flow and its components are + * still valid.
+ */ + +static int xfrm6_bundle_ok(struct xfrm_dst *xdst, struct flowi *fl) +{ + do { + if (xdst->u.dst.ops != &xfrm6_dst_ops) + return 1; + + if (!xfrm6_selector_match(&xdst->u.dst.xfrm->sel, fl)) + return 0; + if (xdst->u.dst.xfrm->km.state != XFRM_STATE_VALID || + xdst->u.dst.path->obsolete > 0) + return 0; + xdst = (struct xfrm_dst*)xdst->u.dst.child; + } while (xdst); + return 0; +} + + +/* Allocate chain of dst_entry's, attach known xfrm's, calculate + * all the metrics... Shortly, bundle a bundle. + */ + +static int +xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int nx, + struct flowi *fl, struct dst_entry **dst_p) +{ + struct dst_entry *dst, *dst_prev; + struct rt6_info *rt0 = (struct rt6_info*)(*dst_p); + struct rt6_info *rt = rt0; + struct in6_addr *remote = fl->fl6_dst; + struct in6_addr *local = fl->fl6_src; + int i; + int err = 0; + int header_len = 0; + + dst = dst_prev = NULL; + + for (i = 0; i < nx; i++) { + struct dst_entry *dst1 = dst_alloc(&xfrm6_dst_ops); + + if (unlikely(dst1 == NULL)) { + err = -ENOBUFS; + goto error; + } + + dst1->xfrm = xfrm[i]; + if (!dst) + dst = dst1; + else { + dst_prev->child = dst1; + dst1->flags |= DST_NOHASH; + dst_clone(dst1); + } + dst_prev = dst1; + if (xfrm[i]->props.mode) { + remote = (struct in6_addr*)&xfrm[i]->id.daddr; + local = (struct in6_addr*)&xfrm[i]->props.saddr; + } + header_len += xfrm[i]->props.header_len; + } + + if (remote != fl->fl6_dst) { + struct flowi fl_tunnel; + memset(&fl_tunnel, 0, sizeof(fl_tunnel)); + fl_tunnel.fl6_dst = remote; + fl_tunnel.fl6_src = local; + + rt = (struct rt6_info *)ip6_route_output(NULL, &fl_tunnel); + if (err) + goto error; + } else { + dst_clone(&rt->u.dst); + } + + dst_prev->child = &rt->u.dst; + for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { + struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; + x->u.rt.fl = *fl; + + dst_prev->dev = rt->u.dst.dev; + if (rt->u.dst.dev) + dev_hold(rt->u.dst.dev); + 
dst_prev->obsolete = -1; + dst_prev->flags |= DST_HOST; + dst_prev->lastuse = jiffies; + dst_prev->header_len = header_len; + memcpy(&dst_prev->metrics, &rt->u.dst.metrics, sizeof(dst_prev->metrics)); + dst_prev->path = &rt->u.dst; + + /* Copy neighbout for reachability confirmation */ + dst_prev->neighbour = neigh_clone(rt->u.dst.neighbour); + dst_prev->input = rt->u.dst.input; + dst_prev->output = dst_prev->xfrm->type->output; + /* Sheit... I remember I did this right. Apparently, + * it was magically lost, so this code needs audit */ + x->u.rt6.rt6i_flags = rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL); + x->u.rt6.rt6i_metric = rt0->rt6i_metric; + x->u.rt6.rt6i_node = rt0->rt6i_node; + x->u.rt6.rt6i_hoplimit = rt0->rt6i_hoplimit; + x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; + memcpy(&x->u.rt6.rt6i_gateway, &rt0->rt6i_gateway, sizeof(x->u.rt6.rt6i_gateway)); + header_len -= x->u.dst.xfrm->props.header_len; + } + *dst_p = dst; + return 0; + +error: + if (dst) + dst_free(dst); + return err; +} + +/* Main function: finds/creates a bundle for given flow. + * + * At the moment we eat a raw IP route. Mostly to speed up lookups + * on interfaces with disabled IPsec. + */ +int xfrm6_lookup(struct dst_entry **dst_p, struct flowi *fl, + struct sock *sk, int flags) +{ + struct xfrm_policy *policy; + struct xfrm_state *xfrm[XFRM_MAX_DEPTH]; + struct rt6_info *rt = (struct rt6_info*)*dst_p; + struct dst_entry *dst; + int nx = 0; + int err; + u32 genid; + + fl->oif = rt->u.dst.dev->ifindex; +restart: + genid = xfrm_policy_genid; + policy = NULL; + + if (sk && sk->policy[1]) + policy = xfrm6_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl); + + if (!policy) { + /* To accelerate a bit... 
*/ + if ((rt->u.dst.flags & DST_NOXFRM) || !xfrm_policy_list[XFRM_POLICY_OUT]) + return 0; + + policy = flow6_lookup(XFRM_POLICY_OUT, fl); + if (!policy) + return 0; + } + + policy->curlft.use_time = (unsigned long)xtime.tv_sec; + + switch (policy->action) { + case XFRM_POLICY_BLOCK: + /* Prohibit the flow */ + xfrm_pol_put(policy); + return -EPERM; + + case XFRM_POLICY_ALLOW: + if (policy->xfrm_nr == 0) { + /* Flow passes not transformed. */ + xfrm_pol_put(policy); + return 0; + } + + /* Try to find matching bundle. + * + * LATER: help from flow cache. It is optional, this + * is required only for output policy. + */ + read_lock_bh(&policy->lock); + for (dst = policy->bundles; dst; dst = dst->next) { + struct xfrm_dst *xdst = (struct xfrm_dst*)dst; + if (!memcmp(&xdst->u.rt6.rt6i_dst, &fl->fl6_dst, sizeof(struct in6_addr)) && + !memcmp(&xdst->u.rt6.rt6i_src, &fl->fl6_src, sizeof(struct in6_addr)) && + xfrm6_bundle_ok(xdst, fl)) { + dst_clone(dst); + break; + } + } + read_unlock_bh(&policy->lock); + + if (dst) + break; + + nx = xfrm6_tmpl_resolve(policy, fl, xfrm); + if (unlikely(nx<0)) { + err = nx; + if (err == -EAGAIN) { + struct task_struct *tsk = current; + DECLARE_WAITQUEUE(wait, tsk); + if (!flags) + goto error; + __set_task_state(tsk, TASK_INTERRUPTIBLE); + add_wait_queue(&km_waitq, &wait); + err = xfrm6_tmpl_resolve(policy, fl, xfrm); + if (err == -EAGAIN) + schedule(); + __set_task_state(tsk, TASK_RUNNING); + remove_wait_queue(&km_waitq, &wait); + if (err == -EAGAIN && signal_pending(current)) { + err = -ERESTART; + goto error; + } + if (err == -EAGAIN || + genid != xfrm_policy_genid) + goto restart; + } + if (err) + goto error; + } else if (nx == 0) { + /* Flow passes not transformed. */ + xfrm_pol_put(policy); + return 0; + } + + dst = &rt->u.dst; + err = xfrm6_bundle_create(policy, xfrm, nx, fl, &dst); + if (unlikely(err)) { + int i; + for (i=0; ilock); + if (unlikely(policy->dead)) { + /* Wow! While we worked on resolving, this + * policy has gone. 
Retry. It is not paranoia, + * we just cannot enlist new bundle to dead object. + */ + write_unlock_bh(&policy->lock); + + xfrm_pol_put(policy); + if (dst) + dst_free(dst); + goto restart; + } + dst->next = policy->bundles; + policy->bundles = dst; + dst_clone(dst); + write_unlock_bh(&policy->lock); + } + + *dst_p = dst; + xfrm_pol_put(policy); + return 0; + +error: + xfrm_pol_put(policy); + return err; +} + +/* When skb is transformed back to its "native" form, we have to + * check policy restrictions. At the moment we make this in maximally + * stupid way. Shame on me. :-) Of course, connected sockets must + * have policy cached at them. + */ + +static inline int +xfrm_state_ok(struct xfrm_tmpl *tmpl, struct xfrm_state *x) +{ + return x->id.proto == tmpl->id.proto && + (x->id.spi == tmpl->id.spi || !tmpl->id.spi) && + x->props.mode == tmpl->mode && + (tmpl->aalgos & (1<props.aalgo)) && + (!x->props.mode || !ipv6_addr_any((struct in6_addr*)&x->props.saddr) || + !memcmp(&tmpl->saddr, &x->props.saddr, sizeof(struct in6_addr))); +} + +static inline int +xfrm_policy_ok(struct xfrm_tmpl *tmpl, struct sec_path *sp, int idx) +{ + for (; idx < sp->len; idx++) { + if (xfrm_state_ok(tmpl, sp->xvec[idx])) + return ++idx; + } + return -1; +} + +static inline void +_decode_session(struct sk_buff *skb, struct flowi *fl) +{ + struct ipv6hdr *hdr = skb->nh.ipv6h; + u8 *xprth = skb->nh.raw + sizeof(struct ipv6hdr); + + switch (hdr->nexthdr) { + case IPPROTO_UDP: + case IPPROTO_TCP: + case IPPROTO_SCTP: + if (pskb_may_pull(skb, xprth + 4 - skb->data)) { + u16 *ports = (u16 *)xprth; + + fl->uli_u.ports.sport = ports[0]; + fl->uli_u.ports.dport = ports[1]; + } + break; + + case IPPROTO_ESP: + if (pskb_may_pull(skb, xprth + 4 - skb->data)) { + u32 *ehdr = (u32 *)xprth; + + fl->uli_u.spi = ehdr[0]; + } + break; + + case IPPROTO_AH: + if (pskb_may_pull(skb, xprth + 8 - skb->data)) { + u32 *ah_hdr = (u32*)xprth; + + fl->uli_u.spi = ah_hdr[1]; + } + break; + + default: + fl->uli_u.spi = 
0; + break; + }; + fl->fl6_dst = &hdr->daddr; + fl->fl6_src = &hdr->saddr; +} + +int __xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +{ + struct xfrm_policy *pol; + struct flowi fl; + + _decode_session(skb, &fl); + + /* First, check used SA against their selectors. */ + if (skb->sp) { + int i; + for (i=skb->sp->len-1; i>=0; i--) { + if (!xfrm6_selector_match(&skb->sp->xvec[i]->sel, &fl)) + return 0; + } + } + + pol = NULL; + + if (sk && sk->policy[dir]) + pol = xfrm6_sk_policy_lookup(sk, dir, &fl); + + if (!pol) + pol = flow6_lookup(dir, &fl); + + if (!pol) + return 1; + + pol->curlft.use_time = (unsigned long)xtime.tv_sec; + + if (pol->action == XFRM_POLICY_ALLOW) { + if (pol->xfrm_nr != 0) { + struct sec_path *sp; + int i, k; + + if ((sp = skb->sp) == NULL) + goto reject; + + /* For each tmpl search corresponding xfrm. + * Order is _important_. Later we will implement + * some barriers, but at the moment barriers + * are implied between each two transformations. + */ + for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { + k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); + if (k < 0) + goto reject; + } + } + xfrm_pol_put(pol); + return 1; + } + +reject: + xfrm_pol_put(pol); + return 0; +} + +int __xfrm6_route_forward(struct sk_buff *skb) +{ + struct flowi fl; + + _decode_session(skb, &fl); + + return xfrm6_lookup(&skb->dst, &fl, NULL, 0) == 0; +} + +/* Optimize later using cookies and generation ids. */ + +static struct dst_entry *xfrm6_dst_check(struct dst_entry *dst, u32 cookie) +{ + struct dst_entry *child = dst; + + while (child) { + if (child->obsolete > 0 || + (child->xfrm && child->xfrm->km.state != XFRM_STATE_VALID)) { + dst_release(dst); + return NULL; + } + child = child->child; + } + + return dst; +} + +static void xfrm6_dst_destroy(struct dst_entry *dst) +{ + xfrm_state_put(dst->xfrm); + dst->xfrm = NULL; +} + +static void xfrm6_link_failure(struct sk_buff *skb) +{ + /* Impossible. Such dst must be popped before reaches point of failure. 
*/ + return; +} + +static struct dst_entry *xfrm6_negative_advice(struct dst_entry *dst) +{ + if (dst) { + if (dst->obsolete) { + dst_release(dst); + dst = NULL; + } + } + return dst; +} + + +static int xfrm6_garbage_collect(void) +{ + int i; + struct xfrm_policy *pol; + struct dst_entry *dst, **dstp, *gc_list = NULL; + + read_lock_bh(&xfrm_policy_lock); + for (i=0; i<2*XFRM_POLICY_MAX; i++) { + for (pol = xfrm_policy_list[i]; pol; pol = pol->next) { + write_lock(&pol->lock); + dstp = &pol->bundles; + while ((dst=*dstp) != NULL) { + if (atomic_read(&dst->__refcnt) == 0) { + *dstp = dst->next; + dst->next = gc_list; + gc_list = dst; + } else { + dstp = &dst->next; + } + } + write_unlock(&pol->lock); + } + } + read_unlock_bh(&xfrm_policy_lock); + + while (gc_list) { + dst = gc_list; + gc_list = dst->next; + dst_free(dst); + } + + return (atomic_read(&xfrm6_dst_ops.entries) > xfrm6_dst_ops.gc_thresh*2); +} + +static int bundle_depends_on(struct dst_entry *dst, struct xfrm_state *x) +{ + do { + if (dst->xfrm == x) + return 1; + } while ((dst = dst->child) != NULL); + return 0; +} + +int xfrm6_flush_bundles(struct xfrm_state *x) +{ + int i; + struct xfrm_policy *pol; + struct dst_entry *dst, **dstp, *gc_list = NULL; + + read_lock_bh(&xfrm_policy_lock); + for (i=0; i<2*XFRM_POLICY_MAX; i++) { + for (pol = xfrm_policy_list[i]; pol; pol = pol->next) { + write_lock(&pol->lock); + dstp = &pol->bundles; + while ((dst=*dstp) != NULL) { + if (bundle_depends_on(dst, x)) { + *dstp = dst->next; + dst->next = gc_list; + gc_list = dst; + } else { + dstp = &dst->next; + } + } + write_unlock(&pol->lock); + } + } + read_unlock_bh(&xfrm_policy_lock); + + while (gc_list) { + dst = gc_list; + gc_list = dst->next; + dst_free(dst); + } + + return 0; +} + +static void xfrm6_update_pmtu(struct dst_entry *dst, u32 mtu) +{ + struct dst_entry *path = dst->path; + + if (mtu < 68 + dst->header_len) + return; + + path->ops->update_pmtu(path, mtu); +} + +/* Well... that's _TASK_. 
We need to scan through transformation + * list and figure out what mss tcp should generate in order to + * final datagram fit to mtu. Mama mia... :-) + * + * Apparently, some easy way exists, but we used to choose the most + * bizarre ones. :-) So, raising Kalashnikov... tra-ta-ta. + * + * Consider this function as something like dark humour. :-) + */ +static int xfrm6_get_mss(struct dst_entry *dst, u32 mtu) +{ + int res = mtu - dst->header_len; + + for (;;) { + struct dst_entry *d = dst; + int m = res; + + do { + struct xfrm_state *x = d->xfrm; + if (x) { + spin_lock_bh(&x->lock); + if (x->km.state == XFRM_STATE_VALID && + x->type && x->type->get_max_size) + m = x->type->get_max_size(d->xfrm, m); + else + m += x->props.header_len; + spin_unlock_bh(&x->lock); + } + } while ((d = d->child) != NULL); + + if (m <= mtu) + break; + res -= (m - mtu); + if (res < 88) + return mtu; + } + + return res + dst->header_len; +} + +struct dst_ops xfrm6_dst_ops = { + .family = AF_INET6, + .protocol = __constant_htons(ETH_P_IPV6), + .gc = xfrm6_garbage_collect, + .check = xfrm6_dst_check, + .destroy = xfrm6_dst_destroy, + .negative_advice = xfrm6_negative_advice, + .link_failure = xfrm6_link_failure, + .update_pmtu = xfrm6_update_pmtu, + .get_mss = xfrm6_get_mss, + .gc_thresh = 1024, + .entry_size = sizeof(struct xfrm_dst), +}; + +void __init xfrm6_init(void) +{ + xfrm6_dst_ops.kmem_cachep = kmem_cache_create("xfrm6_dst_cache", + sizeof(struct xfrm_dst), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!xfrm6_dst_ops.kmem_cachep) + panic("IP: failed to allocate xfrm6_dst_cache\n"); + + flow6_cache_init(); + xfrm6_input_init(); + +} From davem@redhat.com Tue Feb 18 20:57:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 20:57:45 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J4vg3v027826 for ; Tue, 18 Feb 2003 20:57:43 -0800 Received: from localhost 
(IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA17633; Tue, 18 Feb 2003 20:50:38 -0800 Date: Tue, 18 Feb 2003 20:50:37 -0800 (PST) Message-Id: <20030218.205037.133906611.davem@redhat.com> To: Kazunori.Miyazawa@jp.yokogawa.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1724 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kazunori MIyazawa Date: Wed, 19 Feb 2003 13:48:50 +0900 I'm MIYAZAWA@USAGI. Hello Miyazawa-san, This is a patch to support IPv6 IPsec on linux-2.5.62. It work well. Thank you for this work. Alexey and I will review and work with your patch. I must ask, have you been working together with Kunihiro Ishiguro ? Or are you separately doing the same work? It would be great if these two teams worked together. There is no reason to duplicate effort. All people doing work will get full credit. The only thing necessary is to send me patches to add credits to the comments. So nobody needs to fear that their contribution will go unnoticed.
From kunihiro@ipinfusion.com Tue Feb 18 21:07:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 21:07:57 -0800 (PST) Received: from localhost.localdomain (ip-64-139-11-202.dsl.sca.megapath.net [64.139.11.202]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J57s3v028309 for ; Tue, 18 Feb 2003 21:07:54 -0800 Received: from localhost.localdomain (vprmatrix [127.0.0.1]) by localhost.localdomain (8.12.5/8.12.5) with ESMTP id h1J5AXUX004082; Wed, 19 Feb 2003 14:10:34 +0900 Date: Tue, 18 Feb 2003 21:10:33 -0800 Message-ID: <871y244zzq.wl@ipinfusion.com> From: Kunihiro Ishiguro To: "David S. Miller" Cc: Kazunori.Miyazawa@jp.yokogawa.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support In-Reply-To: <20030218.205037.133906611.davem@redhat.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.205037.133906611.davem@redhat.com> User-Agent: Wanderlust/2.10.0 (Venus) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.2.92 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII X-archive-position: 1725 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kunihiro@ipinfusion.com Precedence: bulk X-list: netdev >Thank you for this work. Alexey and I will review and work with your >patch. > >I must ask, have you been working together with Kunihiro Ishiguro >? Or are you seperately doing the same work? We are doing the same work separately. >It would be great if these two teams worked together. There is no >reason to duplicate effort. I agree. >All people doing work will get full credit. The only thing >necessary is to send me patches to add credits to the comments. >So nobody needs to fear that their contribution will go unnoticed. Yes. 
-- Kunihiro Ishiguro From mk@karaba.org Tue Feb 18 21:13:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 21:13:40 -0800 (PST) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J5Db3v028740 for ; Tue, 18 Feb 2003 21:13:38 -0800 Received: from [3ffe:501:1057:710::1] (helo=hyakusiki.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 18lMbF-00071v-00; Wed, 19 Feb 2003 14:16:53 +0900 Date: Wed, 19 Feb 2003 14:17:55 +0900 Message-ID: From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: Kunihiro Ishiguro , "David S. Miller" , kuznet@ms2.inr.ac.ru Cc: Kazunori.Miyazawa@jp.yokogawa.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org Subject: Re: [PATCH] IPv6 IPsec support In-Reply-To: <871y244zzq.wl@ipinfusion.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.205037.133906611.davem@redhat.com> <871y244zzq.wl@ipinfusion.com> User-Agent: User-Agent: Wanderlust/2.10.0 (Venus) Emacs/21.2 Mule/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 1726 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev At Tue, 18 Feb 2003 21:10:33 -0800, Kunihiro Ishiguro wrote: > > >Thank you for this work. Alexey and I will review and work with your > >patch. > > > >I must ask, have you been working together with Kunihiro Ishiguro > >? Or are you seperately doing the same work? > > We are doing the same work separately. Yes, it's a matter for this... > >It would be great if these two teams worked together. There is no > >reason to duplicate effort. > > I agree. me too. So we should list up next ToDos. > >All people doing work will get full credit. 
The only thing > >necessary is to send me patches to add credits to the comments. > >So nobody needs to fear that their contribution will go unnoticed. > > Yes. I agree. ---------------------------------------- Mitsuru KANDA (mk@karaba.org) Toshiba Reseach & Development Center Communication Platform Laboratory (mk@isl.rdc.toshiba.co.jp) USAGI Project (mk@linux-ipv6.org) From yoshfuji@linux-ipv6.org Tue Feb 18 21:26:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 21:26:42 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J5Qc3v029658 for ; Tue, 18 Feb 2003 21:26:39 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1J5U5GR031996; Wed, 19 Feb 2003 14:30:05 +0900 Date: Wed, 19 Feb 2003 14:30:04 +0900 (JST) Message-Id: <20030219.143004.26675950.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: Kazunori.Miyazawa@jp.yokogawa.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030218.205037.133906611.davem@redhat.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.205037.133906611.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1727 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com 
Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030218.205037.133906611.davem@redhat.com> (at Tue, 18 Feb 2003 20:50:37 -0800 (PST)), "David S. Miller" says: > I must ask, have you been working together with Kunihiro Ishiguro > ? Or are you seperately doing the same work? Unfortunately, we're doing seperately. :-p > It would be great if these two teams worked together. There is no > reason to duplicate effort. Aggreed, but we couldn't seek his code while our repositories are open and we sent you our patch in public. > All people doing work will get full credit. The only thing > necessary is to send me patches to add credits to the comments. > So nobody needs to fear that their contribution will go unnoticed. Thanks. We'll do it later. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From kazunori@miyazawa.org Tue Feb 18 21:40:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 21:40:34 -0800 (PST) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J5eU3v030676 for ; Tue, 18 Feb 2003 21:40:31 -0800 Received: from sanmarino (softdnserr [3ffe:501:41c:3:202:b3ff:fe05:cc74]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Wed, 19 Feb 2003 14:32:29 +0900 Date: Wed, 19 Feb 2003 14:58:04 +0900 From: Kazunori Miyazawa To: "Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQiI=?= "@miyazawa.org Cc: kunihiro@ipinfusion.com, davem@redhat.com, kuznet@ms2.inr.ac.ru, Kazunori.Miyazawa@jp.yokogawa.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org Subject: Re: [PATCH] IPv6 IPsec support Message-Id: <20030219145804.5669ebee.kazunori@miyazawa.org> In-Reply-To: References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.205037.133906611.davem@redhat.com> <871y244zzq.wl@ipinfusion.com> X-Mailer: Sylpheed 
version 0.8.6 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-archive-position: 1728 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev On Wed, 19 Feb 2003 14:17:55 +0900 "Mitsuru KANDA / $B?@ED(B $B=<(B" wrote: > At Tue, 18 Feb 2003 21:10:33 -0800, > Kunihiro Ishiguro wrote: > > > > >Thank you for this work. Alexey and I will review and work with your > > >patch. > > > > > >I must ask, have you been working together with Kunihiro Ishiguro > > >? Or are you seperately doing the same work? > > > > We are doing the same work separately. > Yes, it's a matter for this... > We are developing separately. Yes, we should work together. > > >It would be great if these two teams worked together. There is no > > >reason to duplicate effort. > > > > I agree. > me too. > So we should list up next ToDos. > I knew he work on IPv6 IPsec. I had not known his status. But I knew his status with linux-2.5.62 first. > > >All people doing work will get full credit. The only thing > > >necessary is to send me patches to add credits to the comments. > > >So nobody needs to fear that their contribution will go unnoticed. > > > > Yes. > I agree. > OK. I suggested beause his work was similar to mine, which I sent you on 1/7. But I don't appeal anymore. Please forget it. I'm sorry if you felt discomfort. 
--Kazunori Miyazawa (Yokogawa Electric Corporation) From kunihiro@ipinfusion.com Tue Feb 18 21:54:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 21:54:59 -0800 (PST) Received: from localhost.localdomain (ip-64-139-11-202.dsl.sca.megapath.net [64.139.11.202]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J5su3v031271 for ; Tue, 18 Feb 2003 21:54:56 -0800 Received: from localhost.localdomain (vprmatrix [127.0.0.1]) by localhost.localdomain (8.12.5/8.12.5) with ESMTP id h1J5vdUX004144; Wed, 19 Feb 2003 14:57:39 +0900 Date: Tue, 18 Feb 2003 21:57:39 -0800 Message-ID: <87znos3j8s.wl@ipinfusion.com> From: Kunihiro Ishiguro To: Kazunori MIyazawa Cc: netdev@oss.sgi.com, usagi-core@linux-ipv6.org, davem@redhat.com, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support In-Reply-To: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> User-Agent: Wanderlust/2.10.0 (Venus) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.2.92 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII X-archive-position: 1729 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kunihiro@ipinfusion.com Precedence: bulk X-list: netdev I just look through the patch. Here is my quick comments. I think no need of broadcasting my comments to kernel ML, so I took it of from CC:. netdev guys will be interested in right? So I kept it. 1. Do we really need to remove AH header from skb? In case of IPv4 we modify iph->protocol for further processing thus AH header is removed. But in case of IPv6, we just simply authenticate the packet then process next header. So do we really need to remove AH header in IPv6? Remaining AH header does not harm... 2. Easy kmalloc()... It seems there are some easy kmalloc(). 
Yes I'm stingy with memory. Let's say there is no AH mutable option field in the IPv6 extension headers (actually that is a very common case). We just need char work_buf[8] to retain the IPv6 header mutable field. But every time, the patch allocates a complete copy of the header including the extension headers and then keeps it in the chamber....

+	int hdr_len = skb->h.raw - skb->nh.raw;
...
+	tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC);

I think we should first check whether mutation is needed and then allocate exactly the required memory. If there is no need for allocation, that's good news. Let me provide code for it.

3. xfrm6_policy_lookup()

+	if (pol->family != AF_INET6) continue);

Last paren ;-).

Well, I'll find more. Maybe we should be offline and come up with a single patch.

--
Kunihiro Ishiguro

From davem@redhat.com Tue Feb 18 23:09:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 23:09:11 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J7963v001847 for ; Tue, 18 Feb 2003 23:09:07 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA18149; Tue, 18 Feb 2003 23:02:11 -0800 Date: Tue, 18 Feb 2003 23:02:11 -0800 (PST) Message-Id: <20030218.230211.89243941.davem@redhat.com> To: kunihiro@ipinfusion.com Cc: Kazunori.Miyazawa@jp.yokogawa.com, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <87znos3j8s.wl@ipinfusion.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <87znos3j8s.wl@ipinfusion.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1730 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kunihiro Ishiguro Date: Tue, 18 Feb 2003 21:57:39 -0800 I think no need of broadcasting my comments to kernel ML, so I took it of from CC:. netdev guys will be interested in right? So I kept it. Yes, this is fine. 1. Do we really need to remove AH header from skb? In case of IPv4 we modify iph->protocol for further processing thus AH header is removed. But in case of IPv6, we just simply authenticate the packet then process next header. So do we really need to remove AH header in IPv6? Remaining AH header does not harm... This is an interesting topic. Actually, there is no reason to prefer one way or another. Remember, if anyone else is interested in SKB contents (such as tcpdump), that entity has clone of skb and can still see full contents. 2. Easy kmalloc()... It seems there are some easy kmalloc(). Yes I'm stingy with memory. It is another fun topic. These are great long term improvements. But for now, please consider something important when evaluating "overhead". This is the fact that we are performing full encryption or hash function. Such operation is quite massively more expensive than kmalloc here and there. Some day we will have hw acceleration support both at IPSEC and at crypto library level. At that time cost analysis will change. Well, I'll find more. Maybe we should be offline and come up with a single patch. I would ask that Alexey and myself stay on the CC: list. 
It would not hurt to keep netdev as well, perhaps we can breed some new experts in our ipsec code :-) From davem@redhat.com Tue Feb 18 23:20:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 23:20:25 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J7KM3v002564 for ; Tue, 18 Feb 2003 23:20:23 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA18201; Tue, 18 Feb 2003 23:13:25 -0800 Date: Tue, 18 Feb 2003 23:13:24 -0800 (PST) Message-Id: <20030218.231324.44469350.davem@redhat.com> To: Kazunori.Miyazawa@jp.yokogawa.com Cc: netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1731 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kazunori MIyazawa Date: Wed, 19 Feb 2003 13:48:50 +0900 Please let me know if you have some ideas and/or comments. Hello again Miyazawa-san, I give you my initial comment. I see very quickly that the ipv6 side of implementation will give lots of opportunity for code sharing. But we must plan it correctly :-) I also wish to avoid exporting internal xfrm objects to ipv6 module. 
So, let us discuss one example:

diff -urN linux-2.5.62/net/ipv4/xfrm_input.c linux25_for_patch/net/ipv4/xfrm_input.c
--- linux-2.5.62/net/ipv4/xfrm_input.c	2003-02-18 07:55:50.000000000 +0900
+++ linux25_for_patch/net/ipv4/xfrm_input.c	2003-02-19 02:36:53.000000000 +0900
@@ -1,7 +1,7 @@
 #include
 #include

-static kmem_cache_t *secpath_cachep;
+kmem_cache_t *secpath_cachep;

 void __secpath_destroy(struct sec_path *sp)
 {

I understand why you need this, for xfrm6_rcv(). This is fine.

However, it would be even better to put xfrm6_rcv() into net/ipv4/xfrm_input.c, protected by a CONFIG_IPV6 || CONFIG_IPV6_MODULE ifdef. In this way we may split out the identical pieces of code which occur in xfrm4_rcv() and xfrm6_rcv(). Then we merely need to export the xfrm6_rcv symbol for the sake of ipv6 as a module.

In fact, nearly 90% of these two functions are identical.

From davem@redhat.com Tue Feb 18 23:39:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 18 Feb 2003 23:39:59 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J7dt3v003316 for ; Tue, 18 Feb 2003 23:39:56 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA18229; Tue, 18 Feb 2003 23:33:01 -0800 Date: Tue, 18 Feb 2003 23:33:01 -0800 (PST) Message-Id: <20030218.233301.98333082.davem@redhat.com> To: Kazunori.Miyazawa@jp.yokogawa.com Cc: netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1732 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

As promised, some more comments:

1) Please, can you split out a separate patch for the changes
   to net/ipv4/xfrm_user.c? They are independent.

   Kunihiro sent me an identical patch, so could you please
   add him to the credits in the comment? Thank you.

2) I believe that net/ipv6/xfrm_policy.c is another area
   for more code sharing.

   Any time that I see removal of 'static', it is a clue to
   me :-)

   Short term you can do as I suggested for the secpath_cachep
   issue, that is, move this new code to net/ipv4/xfrm_policy.c
   as it is, conditionalized by CONFIG_IPV6 || CONFIG_IPV6_MODULE.

   Later we can work on increased code sharing here.

3) I noticed the comment above the transformation from an
   explicit dst->output() call to dst_output().

   It is not an IPSEC issue; rather, I believe that the entire tree
   should have this conversion eventually. The concept of stackable
   destination cache entries is a generic one.

4) I believe some module symbol exports are missing to handle
   ipv6 as a module.

   For example, for skb_ah_walk and skb_esp_walk.

The rest of the code looks fine to me. Now is not the time to get picky about small details; let us only get the first draft basically correct.
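
Point 3 can be pictured with a toy userspace model of stacked destination entries. This is only a sketch under stated assumptions: every toy_ name and the TOY_XMIT_BYPASS value are invented here for illustration, and the loop only mimics the general idea of dst_output() calling skb->dst->output() again after a transform layer has handed the packet to the next entry in the stack. It is not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "call the next output routine" return code. */
#define TOY_XMIT_BYPASS 4

struct toy_skb;

/* One entry in a stack of destination cache entries (toy model). */
struct toy_dst {
	const char *name;
	int (*output)(struct toy_skb *skb);
	struct toy_dst *child;		/* next layer down, NULL at the bottom */
};

struct toy_skb {
	struct toy_dst *dst;		/* entry that will handle the packet next */
	int calls;			/* how many output routines have run */
};

/* A transform layer (think AH or ESP): do its work, pop to the child
 * entry, and ask the caller to invoke the next output routine. */
static int toy_xfrm_output(struct toy_skb *skb)
{
	skb->calls++;
	skb->dst = skb->dst->child;
	return TOY_XMIT_BYPASS;
}

/* The bottom layer: the packet is finally "sent". */
static int toy_dev_output(struct toy_skb *skb)
{
	skb->calls++;
	return 0;
}

/* Generic wrapper: keep calling the current entry's output routine
 * for as long as a layer returns the bypass code. */
static int toy_dst_output(struct toy_skb *skb)
{
	for (;;) {
		int err = skb->dst->output(skb);

		if (err != TOY_XMIT_BYPASS)
			return err;
	}
}
```

With a stack ah -> esp -> dev, a single toy_dst_output() call runs all three output routines in order. A caller that invoked skb->dst->output() directly would stop after the first layer and see only the bypass code, which is why the wrapper is useful outside IPsec as well.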
From kunihiro@ipinfusion.com Wed Feb 19 01:06:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 01:06:13 -0800 (PST) Received: from localhost.localdomain (ip-64-139-11-202.dsl.sca.megapath.net [64.139.11.202]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1J9643v004875 for ; Wed, 19 Feb 2003 01:06:04 -0800 Received: from localhost.localdomain (vprmatrix [127.0.0.1]) by localhost.localdomain (8.12.5/8.12.5) with ESMTP id h1J9Dpan001112; Wed, 19 Feb 2003 18:13:52 +0900 Date: Wed, 19 Feb 2003 01:13:51 -0800 Message-ID: <87of58hbu8.wl@ipinfusion.com> From: Kunihiro Ishiguro To: "David S. Miller" Cc: Kazunori.Miyazawa@jp.yokogawa.com, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] IPv6 IPsec support In-Reply-To: <20030218.230211.89243941.davem@redhat.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <87znos3j8s.wl@ipinfusion.com> <20030218.230211.89243941.davem@redhat.com> User-Agent: Wanderlust/2.10.0 (Venus) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.2.92 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII X-archive-position: 1733 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kunihiro@ipinfusion.com Precedence: bulk X-list: netdev >I would ask that Alexey and myself stay on the CC: list. > >It would not hurt to keep netdev as well, perhaps we can >breed some new experts in our ipsec code :-) I believe many ipsec experts on this list ;-). 
>@@ -428,20 +455,79 @@
> static inline int
> xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl)
> {
>-	return !memcmp(fl->fl6_dst, sel->daddr.a6, sizeof(struct in6_addr)) &&
>-	       !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) &&
>-	       !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) &&
>-	       (fl->proto == sel->proto || !sel->proto) &&
>-	       (fl->oif == sel->ifindex || !sel->ifindex) &&
>-	       !memcmp(fl->fl6_src, sel->saddr.a6, sizeof(struct in6_addr));
>+	return !memcmp(fl->fl6_dst, &sel->daddr, (sel->prefixlen_d)/8) &&
>+	       !memcmp(fl->fl6_src, &sel->saddr, (sel->prefixlen_s)/8) &&
>+	       !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) &&
>+	       !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) &&
>+	       (fl->proto == sel->proto || !sel->proto) &&
>+	       (fl->oif == sel->ifindex || !sel->ifindex);
> }

memcmp with prefixlen/8 is too generous: it ignores the trailing prefixlen%8 bits of the prefix. The original non-masked comparison is much worse (maybe my code...). We need bit comparison here. Poor xfrm6_selector_match()...

I only have the idea below... addr_match() is taken from ip6_fib.c...
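
To make the "too generous" point concrete, here is a small userspace sketch (not kernel code). It compares a byte-granular check, which is what memcmp(a, b, prefixlen/8) performs, with a bit-granular one. The helper names byte_match() and bit_match() and the two sample addresses are made up for illustration. bit_match() works on bytes rather than on __u32 words, so it is a simplified variant and not the addr_match() from ip6_fib.c.

```c
#include <stdint.h>
#include <string.h>

/* Byte-granular check: what memcmp(a, b, prefixlen / 8) tests.
 * The trailing prefixlen % 8 bits are never looked at. */
static int byte_match(const uint8_t *a, const uint8_t *b, int prefixlen)
{
	return memcmp(a, b, (size_t)(prefixlen / 8)) == 0;
}

/* Bit-granular check (hypothetical helper): compare the whole bytes
 * with memcmp, then the leftover high-order bits of the next byte
 * through a mask. */
static int bit_match(const uint8_t *a, const uint8_t *b, int prefixlen)
{
	int whole = prefixlen / 8;	/* complete bytes in the prefix */
	int rest = prefixlen % 8;	/* bits in the incomplete byte */

	if (whole && memcmp(a, b, (size_t)whole))
		return 0;
	if (rest) {
		uint8_t mask = (uint8_t)(0xff << (8 - rest));

		if ((a[whole] ^ b[whole]) & mask)
			return 0;
	}
	return 1;
}

/* Sample addresses; they differ only in the 61st bit. */
static const uint8_t addr_a[16] = { 0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0x08,
				    0, 0, 0, 0, 0, 0, 0, 1 };	/* 2001:db8:0:8::1 */
static const uint8_t addr_b[16] = { 0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0x00,
				    0, 0, 0, 0, 0, 0, 0, 1 };	/* 2001:db8::1 */
```

For a /61 selector, byte_match(addr_a, addr_b, 61) compares only the first 7 bytes and reports a match, while bit_match(addr_a, addr_b, 61) also checks the 5 leftover bits and reports a mismatch. For a /60 selector both report a match, because the differing bit lies outside the prefix.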
static __inline__ int addr_match(void *token1, void *token2, int prefixlen)
{
	__u32 *a1 = token1;
	__u32 *a2 = token2;
	int pdw;
	int pbi;

	pdw = prefixlen >> 5;	  /* num of whole __u32 in prefix */
	pbi = prefixlen & 0x1f;	  /* num of bits in incomplete u32 in prefix */

	if (pdw)
		if (memcmp(a1, a2, pdw << 2))
			return 0;

	if (pbi) {
		__u32 mask;

		mask = htonl((0xffffffff) << (32 - pbi));

		if ((a1[pdw] ^ a2[pdw]) & mask)
			return 0;
	}

	return 1;
}

static inline int
xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl)
{
	return addr_match(fl->fl6_dst, &sel->daddr, sel->prefixlen_d) &&
	       addr_match(fl->fl6_src, &sel->saddr, sel->prefixlen_s) &&
	       !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) &&
	       !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) &&
	       (fl->proto == sel->proto || !sel->proto) &&
	       (fl->oif == sel->ifindex || !sel->ifindex);
}

From kazunori@miyazawa.org Wed Feb 19 06:30:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 06:30:37 -0800 (PST) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JEUF3v022578 for ; Wed, 19 Feb 2003 06:30:16 -0800 Received: from monza.miyazawa.org ([::ffff:192.168.0.3]) (IDENT: miyazawa, AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Wed, 19 Feb 2003 23:22:14 +0900 Date: Wed, 19 Feb 2003 23:39:15 +0900 From: Kazunori MIyazawa To: usagi-core@linux-ipv6.org Cc: davem@redhat.com, Kazunori.Miyazawa@jp.yokogawa.com, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: (usagi-core 11926) Re: [PATCH] IPv6 IPsec support Message-Id: <20030219233915.130a26e3.kazunori@miyazawa.org> In-Reply-To: <20030218.233301.98333082.davem@redhat.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.233301.98333082.davem@redhat.com> X-Mailer: Sylpheed version 0.8.9 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1734
X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev Thank you, David. On Tue, 18 Feb 2003 23:33:01 -0800 (PST) "David S. Miller" wrote: > > As promised, some more comments: > > 1) Please, can you split out seperate patch for changes > to net/ipv4/xfrm_user.c? They are independant. > > Kunihiro sent me identical patch, so please could you > add him to credits in comment? Thank you. > OK. We will do it. > 2) I believe that net/ipv6/xfrm_policy.c is another area > for more code sharing. > > Any time that I see removal of 'static', it is clue to > me :-) > > Short term you can do as I suggested for secpath_cachep > issue, that is to move this new code to net/ipv4/xfrm_policy.c > as it is, conditionalized by CONFIG_IPV6 || CONFIG_IPV6_MODULE. > > Later we can work on increased code sharing here. > I see, I will move ours into net/ipv4/xfrm_policy.c > 3) I noticed comment above transformation from > explicit dst->output() call to dst_output(). > > It is not IPSEC issue, rather I believe that entire tree should > have this conversion eventually. The concept of stackable > destination cache entries is a generic one. > Please let me understand. I think dst->output calls each dst output routine chains but those could not process the return value NET_XMIT_BYPASS returned from ah and/or esp. Is this out of scope of IPsec? > 4) I believe some module symbol exports are missing to handle > ipv6 as module. > > For example, for skb_ah_walk and skb_esp_walk. > Thank you, I will check them. 
--Kazunori Miyazawa (Yokogawa Electric Corporation) From mk@karaba.org Wed Feb 19 08:47:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 08:47:24 -0800 (PST) Received: from zanzibar.karaba.org (karaba.org [218.219.152.88] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JGlE3v028542 for ; Wed, 19 Feb 2003 08:47:15 -0800 Received: from [3ffe:501:1057:710::1] (helo=hyakusiki.karaba.org) by zanzibar.karaba.org with esmtp (Exim 3.35 #1 (Debian)) id 18lXVX-0004QE-00; Thu, 20 Feb 2003 01:55:43 +0900 Date: Thu, 20 Feb 2003 01:56:48 +0900 Message-ID: From: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: "David S. Miller" Cc: kunihiro@ipinfusion.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi-core@linux-ipv6.org Subject: Re: [PATCH] IPv6 IPsec support In-Reply-To: <20030218.233301.98333082.davem@redhat.com> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.233301.98333082.davem@redhat.com> User-Agent: User-Agent: Wanderlust/2.10.0 (Venus) Emacs/21.2 Mule/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII X-archive-position: 1735 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mk@karaba.org Precedence: bulk X-list: netdev Hello David, > 1) Please, can you split out seperate patch for changes > to net/ipv4/xfrm_user.c? They are independant. > > Kunihiro sent me identical patch, so please could you > add him to credits in comment? Thank you. I attached xfrm_user.c patch below. Just FYI, the IPv6 part of this patch depends xfrm6_state_lookup(). 
Sincerely, Mitsuru KANDA (mk@karaba.org) USAGI Project (mk@linux-ipv6.org) diff -uNr linux-2.5.62.org/net/ipv4/xfrm_user.c linux-2.5.62/net/ipv4/xfrm_user.c --- linux-2.5.62.org/net/ipv4/xfrm_user.c 2003-02-18 07:56:17.000000000 +0900 +++ linux-2.5.62/net/ipv4/xfrm_user.c 2003-02-20 00:00:57.000000000 +0900 @@ -1,6 +1,13 @@ /* xfrm_user.c: User interface to configure xfrm engine. * * Copyright (C) 2002 David S. Miller (davem@redhat.com) + * + * Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * */ #include @@ -17,6 +24,9 @@ #include #include #include +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) +#include +#endif #include #include @@ -63,11 +73,13 @@ case AF_INET: break; - case AF_INET6: /* XXX */ - err = -EAFNOSUPPORT; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + break; +#endif - /* fallthru */ default: + err = -EAFNOSUPPORT; goto out; }; @@ -206,8 +218,21 @@ if (!x) return err; - x1 = xfrm_state_lookup(x->props.saddr.xfrm4_addr, - x->id.spi, x->id.proto); + switch (p->family) { + case AF_INET: + x1 = xfrm_state_lookup(x->props.saddr.xfrm4_addr, + x->id.spi, x->id.proto); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + x1 = xfrm6_state_lookup((struct in6_addr *)&x->props.saddr, + x->id.spi,x->id.proto); + break; +#endif + default: + return -EAFNOSUPPORT; + } + if (x1) { xfrm_state_put(x); xfrm_state_put(x1); @@ -224,7 +249,19 @@ struct xfrm_state *x; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); - x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto); + switch (p->family) { + case AF_INET: + x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + x = xfrm6_state_lookup((struct in6_addr *)&p->saddr, p->spi, p->proto); + break; +#endif + default: + return -EAFNOSUPPORT; + } + if (x == NULL) return -ESRCH; @@ -342,7 +379,19 @@ struct 
sk_buff *resp_skb; int err; - x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto); + switch (p->family) { + case AF_INET: + x = xfrm_state_lookup(p->saddr.xfrm4_addr, p->spi, p->proto); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + x = xfrm6_state_lookup((struct in6_addr *)&p->saddr, p->spi, p->proto); + break; +#endif + default: + return -EAFNOSUPPORT; + } + err = -ESRCH; if (x == NULL) goto out_noput; @@ -393,9 +442,25 @@ err = verify_userspi_info(p); if (err) goto out_noput; - x = xfrm_find_acq(p->info.mode, p->info.reqid, p->info.id.proto, - p->info.sel.daddr.xfrm4_addr, - p->info.sel.saddr.xfrm4_addr, 1); + + switch (p->info.family) { + case AF_INET: + x = xfrm_find_acq(p->info.mode, p->info.reqid, p->info.id.proto, + p->info.sel.daddr.xfrm4_addr, + p->info.sel.saddr.xfrm4_addr, 1); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + x = xfrm6_find_acq(p->info.mode, p->info.reqid, p->info.id.proto, + (struct in6_addr *)&p->info.sel.daddr, + (struct in6_addr *)&p->info.sel.saddr, 1); + break; +#endif + default: + err = -EAFNOSUPPORT; + goto out_noput; + } + err = -ENOENT; if (x == NULL) goto out_noput; From yoshfuji@linux-ipv6.org Wed Feb 19 09:20:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 09:20:11 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JHK03v029268 for ; Wed, 19 Feb 2003 09:20:02 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1JHSefV003784; Thu, 20 Feb 2003 02:28:40 +0900 Date: Thu, 20 Feb 2003 02:28:40 +0900 (JST) Message-Id: <20030220.022840.48274187.yoshfuji@linux-ipv6.org> To: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com CC: usagi@linux-ipv6.org Subject: [PATCH] dst->{in,out}put() clean-up From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= 
Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1736 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. This patch removes dst->input() and dst->output(), and use dst_input() and dst_output() instead. Patch is against linux-2.5.62. Thank you in advance. ------------------------------------------------------------------- Patch-Name: dst->{in,out}put() clean-up. Patch-Id: FIX_2_5_62_CLEANUP-20030219 Patch-Author: YOSHIFUJI Hideaki / USAGI Project Credit: YOSHIFUJI Hideaki / USAGI Project , David Miller ------------------------------------------------------------------- Index: include/net/dn_route.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/dn_route.h,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.18.1 diff -u -r1.1.1.2 -r1.1.1.2.18.1 --- include/net/dn_route.h 14 Oct 2002 13:07:48 -0000 1.1.1.2 +++ include/net/dn_route.h 19 Feb 2003 13:46:37 -0000 1.1.1.2.18.1 @@ -122,7 +122,7 @@ if ((dst = sk->dst_cache) && !dst->obsolete) { try_again: skb->dst = dst_clone(dst); - dst->output(skb); + dst_output(skb); return; } Index: net/decnet/dn_nsp_out.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/decnet/dn_nsp_out.c,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.24.1 diff -u -r1.1.1.1 -r1.1.1.1.24.1 
--- net/decnet/dn_nsp_out.c 7 Oct 2002 10:20:39 -0000 1.1.1.1 +++ net/decnet/dn_nsp_out.c 19 Feb 2003 13:44:54 -0000 1.1.1.1.24.1 @@ -593,7 +593,7 @@ * associations. */ skb->dst = dst_clone(dst); - skb->dst->output(skb); + dst_output(skb); } Index: net/decnet/dn_route.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/decnet/dn_route.c,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.8.1 diff -u -r1.1.1.4 -r1.1.1.4.8.1 --- net/decnet/dn_route.c 11 Nov 2002 04:09:01 -0000 1.1.1.4 +++ net/decnet/dn_route.c 19 Feb 2003 13:44:54 -0000 1.1.1.4.8.1 @@ -389,7 +389,7 @@ int err; if ((err = dn_route_input(skb)) == 0) - return skb->dst->input(skb); + return dst_input(skb); if (decnet_debug_level & 4) { char *devname = skb->dev ? skb->dev->name : "???"; Index: net/ipv4/igmp.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv4/igmp.c,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.14.1 diff -u -r1.1.1.4 -r1.1.1.4.14.1 --- net/ipv4/igmp.c 30 Oct 2002 09:43:15 -0000 1.1.1.4 +++ net/ipv4/igmp.c 19 Feb 2003 13:44:54 -0000 1.1.1.4.14.1 @@ -184,12 +184,12 @@ #define IGMP_SIZE (sizeof(struct igmphdr)+sizeof(struct iphdr)+4) -/* Don't just hand NF_HOOK skb->dst->output, in case netfilter hook +/* Don't just hand NF_HOOK dst_output, in case netfilter hook changes route */ static inline int output_maybe_reroute(struct sk_buff *skb) { - return skb->dst->output(skb); + return dst_output(skb); } static int igmp_send_report(struct net_device *dev, u32 group, int type) Index: net/ipv4/ip_input.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv4/ip_input.c,v retrieving revision 1.1.1.6 retrieving revision 1.1.1.6.6.1 diff -u -r1.1.1.6 -r1.1.1.6.6.1 --- net/ipv4/ip_input.c 9 Jan 2003 11:14:34 -0000 1.1.1.6 +++ net/ipv4/ip_input.c 19 Feb 2003 13:44:54 -0000 
1.1.1.6.6.1 @@ -344,7 +344,7 @@ } } - return skb->dst->input(skb); + return dst_input(skb); inhdr_error: IP_INC_STATS_BH(IpInHdrErrors); Index: net/ipv4/ipmr.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv4/ipmr.c,v retrieving revision 1.1.1.6 retrieving revision 1.1.1.6.8.1 diff -u -r1.1.1.6 -r1.1.1.6.8.1 --- net/ipv4/ipmr.c 11 Nov 2002 04:08:51 -0000 1.1.1.6 +++ net/ipv4/ipmr.c 19 Feb 2003 13:44:54 -0000 1.1.1.6.8.1 @@ -1112,9 +1112,9 @@ struct dst_entry *dst = skb->dst; if (skb->len <= dst_pmtu(dst)) - return dst->output(skb); + return dst_output(skb); else - return ip_fragment(skb, dst->output); + return ip_fragment(skb, dst_output); } /* Index: net/ipv6/exthdrs.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/exthdrs.c,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.6.1 diff -u -r1.1.1.2 -r1.1.1.2.6.1 --- net/ipv6/exthdrs.c 9 Jan 2003 11:14:36 -0000 1.1.1.2 +++ net/ipv6/exthdrs.c 19 Feb 2003 09:44:56 -0000 1.1.1.2.6.1 @@ -288,7 +288,7 @@ dst_release(xchg(&skb->dst, NULL)); ip6_route_input(skb); if (skb->dst->error) { - skb->dst->input(skb); + dst_input(skb); return -1; } if (skb->dst->dev->flags&IFF_LOOPBACK) { @@ -302,7 +302,7 @@ goto looped_back; } - skb->dst->input(skb); + dst_input(skb); return -1; } Index: net/ipv6/ip6_input.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_input.c,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.8.2 diff -u -r1.1.1.4 -r1.1.1.4.8.2 --- net/ipv6/ip6_input.c 23 Nov 2002 11:09:43 -0000 1.1.1.4 +++ net/ipv6/ip6_input.c 19 Feb 2003 13:44:54 -0000 1.1.1.4.8.2 @@ -47,7 +47,7 @@ if (skb->dst == NULL) ip6_route_input(skb); - return skb->dst->input(skb); + return dst_input(skb); } int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt) @@ -235,7 +235,7 @@ 
skb2 = skb; } - dst->output(skb2); + dst_output(skb2); } } #endif Index: net/ipv6/ip6_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 ip6_output.c --- net/ipv6/ip6_output.c 30 Oct 2002 09:43:18 -0000 1.1.1.3 +++ net/ipv6/ip6_output.c 19 Feb 2003 13:47:12 -0000 @@ -174,7 +174,7 @@ } } #endif /* CONFIG_NETFILTER */ - return skb->dst->output(skb); + return dst_output(skb); } /* @@ -722,7 +722,7 @@ static inline int ip6_forward_finish(struct sk_buff *skb) { - return skb->dst->output(skb); + return dst_output(skb); } int ip6_forward(struct sk_buff *skb) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Wed Feb 19 13:34:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 13:34:51 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JLYj3v007104 for ; Wed, 19 Feb 2003 13:34:46 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA19657; Wed, 19 Feb 2003 13:27:44 -0800 Date: Wed, 19 Feb 2003 13:27:44 -0800 (PST) Message-Id: <20030219.132744.105180654.davem@redhat.com> To: kazunori@miyazawa.org Cc: usagi-core@linux-ipv6.org, Kazunori.Miyazawa@jp.yokogawa.com, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: (usagi-core 11926) Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <20030219233915.130a26e3.kazunori@miyazawa.org> References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com> <20030218.233301.98333082.davem@redhat.com> <20030219233915.130a26e3.kazunori@miyazawa.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1737
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Kazunori Miyazawa
   Date: Wed, 19 Feb 2003 23:39:15 +0900

   > 3) I noticed comment above transformation from
   >    explicit dst->output() call to dst_output().
   >
   >    It is not IPSEC issue, rather I believe that entire tree should
   >    have this conversion eventually. The concept of stackable
   >    destination cache entries is a generic one.
   >

   Please let me understand. I think dst->output calls each dst
   output routine in the chain, but those could not process the return
   value NET_XMIT_BYPASS returned from ah and/or esp.
   Is this out of scope of IPsec?

Not really. Stackable destinations are a powerful concept.
For example, we could reimplement IPIP processing using this.
In this way, IP tunnels can become stacked destinations.

Another application of stackable destinations could be something
like CIPE.

Please understand what NET_XMIT_BYPASS means, which is "please
continue to invoke input/output method, I have placed new dst in skb".

I will apply the patch from Yoshfuji which makes the transformations.
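
[Note: the "continue to invoke the output method" behaviour described
above can be illustrated with a small userspace sketch. This is not the
kernel's code. struct pkt, struct dst, XMIT_BYPASS and run_output() are
hypothetical stand-ins for sk_buff, dst_entry, NET_XMIT_BYPASS and
dst_output(), and the numeric value of XMIT_BYPASS is an assumption made
only for this example.]

```c
#include <assert.h>
#include <stddef.h>

#define XMIT_OK     0
#define XMIT_BYPASS 4	/* stand-in for NET_XMIT_BYPASS; value assumed */

struct pkt;

/* Stand-in for a destination cache entry that can be stacked. */
struct dst {
	int (*output)(struct pkt *p);
	struct dst *child;	/* next destination in the stack */
};

/* Stand-in for an sk_buff: only the fields the sketch needs. */
struct pkt {
	struct dst *dst;
	int transforms;		/* how many stacked transforms ran */
	int sent;		/* set once the final output ran */
};

/* A transform (think AH or ESP): do the work, install the child
 * destination in the packet, and ask the caller to keep going. */
static int xform_output(struct pkt *p)
{
	p->transforms++;
	p->dst = p->dst->child;
	return XMIT_BYPASS;
}

/* The bottom of the stack: actually "transmit" the packet. */
static int final_output(struct pkt *p)
{
	p->sent = 1;
	return XMIT_OK;
}

/* The wrapper: re-invoke output for as long as a layer reports that
 * it has placed a new destination in the packet. A bare
 * p->dst->output(p) call would stop after the first layer. */
static int run_output(struct pkt *p)
{
	for (;;) {
		int err = p->dst->output(p);

		if (err != XMIT_BYPASS)
			return err;
	}
}

/* Build an AH -> ESP -> device stack and push one packet through it.
 * Returns 0 if both transforms ran and the packet was sent. */
static int demo_stacked_output(void)
{
	struct dst dev = { final_output, NULL };
	struct dst esp = { xform_output, &dev };
	struct dst ah  = { xform_output, &esp };
	struct pkt p   = { &ah, 0, 0 };
	int err = run_output(&p);

	return (err == XMIT_OK && p.sent && p.transforms == 2) ? 0 : -1;
}
```

[In this model a caller that invoked p->dst->output(p) directly would see
XMIT_BYPASS from the first transform and never reach the device, which is
the situation the question above is about.]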
From davem@redhat.com Wed Feb 19 13:44:31 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 13:44:38 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JLiU3v007600
	for ; Wed, 19 Feb 2003 13:44:30 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA19709;
	Wed, 19 Feb 2003 13:37:00 -0800
Date: Wed, 19 Feb 2003 13:36:59 -0800 (PST)
Message-Id: <20030219.133659.71995717.davem@redhat.com>
To: yoshfuji@linux-ipv6.org
Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi@linux-ipv6.org
Subject: Re: [PATCH] dst->{in,out}put() clean-up
From: "David S. Miller"
In-Reply-To: <20030220.022840.48274187.yoshfuji@linux-ipv6.org>
References: <20030220.022840.48274187.yoshfuji@linux-ipv6.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 1738
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Thu, 20 Feb 2003 02:28:40 +0900 (JST)

   This patch removes dst->input() and dst->output(),
   and use dst_input() and dst_output() instead.
   Patch is against linux-2.5.62.

   Thank you in advance.

Thank you, it is applied.

I have made one tiny improvement, in ipv4/igmp.c case we can totally
eliminate output_maybe_reroute() function and pass directly
dst_output() to NF_HOOK.

Once transformation from dst->output() to dst_output() is applied,
these old inline functions for NF_HOOK no longer have purpose.

Thank you again.
From davem@redhat.com Wed Feb 19 13:51:17 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 13:51:20 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JLpG3v008074
	for ; Wed, 19 Feb 2003 13:51:17 -0800
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA19744;
	Wed, 19 Feb 2003 13:43:45 -0800
Date: Wed, 19 Feb 2003 13:43:45 -0800 (PST)
Message-Id: <20030219.134345.124058678.davem@redhat.com>
To: mk@karaba.org
Cc: kunihiro@ipinfusion.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com,
	usagi-core@linux-ipv6.org
Subject: Re: [PATCH] IPv6 IPsec support
From: "David S. Miller"
In-Reply-To:
References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com>
	<20030218.233301.98333082.davem@redhat.com>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 1739
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Mitsuru KANDA / 神田 充
   Date: Thu, 20 Feb 2003 01:56:48 +0900

   I attached xfrm_user.c patch below.

Thank you, I will apply this.

   Just FYI, the IPv6 part of this patch depends xfrm6_state_lookup().

It is ok, it exists in 2.5.x tree already. But thank you for
this reminder.
From kunihiro@ipinfusion.com Wed Feb 19 15:03:06 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 15:03:16 -0800 (PST)
Received: from localhost.localdomain (mail.ipinfusion.com [65.223.109.2])
	by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JN353v009564
	for ; Wed, 19 Feb 2003 15:03:06 -0800
Received: from localhost.localdomain (vprmatrix [127.0.0.1])
	by localhost.localdomain (8.12.5/8.12.5) with ESMTP id h1JNArrr021247;
	Thu, 20 Feb 2003 08:10:53 +0900
Date: Wed, 19 Feb 2003 15:10:53 -0800
Message-ID: <87bs17hnnm.wl@ipinfusion.com>
From: Kunihiro Ishiguro
To: Mitsuru KANDA / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?=
Cc: "David S. Miller" , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com,
	usagi-core@linux-ipv6.org
Subject: Re: [PATCH] IPv6 IPsec support
In-Reply-To:
References: <20030219134850.5f203ea7.Kazunori.Miyazawa@jp.yokogawa.com>
	<20030218.233301.98333082.davem@redhat.com>
User-Agent: Wanderlust/2.10.0 (Venus) SEMI/1.14.3 (Ushinoya)
	FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.2.92
	(i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-archive-position: 1740
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kunihiro@ipinfusion.com
Precedence: bulk
X-list: netdev

This will not be useful to anyone other than Miyazaki/Kanda.

I've applied miyazaki's patch and then tried to diff it against my
local code.

 o xfrm6_selector_match() fix
 o No need for option field mutation in xfrm6_rcv(). It is moved to
   ah.c.
 o Setting the Routing Header's segments_left to 0 is wrong. Let's
   let it be.
 o xfrm6_rcv() tries to figure out whether it is processing AH or ESP
   from ipv6hdr->nexthdr... But when other extension headers exist
   this could be wrong. The initial protocol value is now passed from
   the caller.
 o Some cosmetic changes.

And this patch includes

 o Not removing AH header
 o Mutation field provisioning

But these changes are not needed.
Miyazaki, would you mind to take a look into this? Have fun ;-). -- Kunihiro Ishiguro diff -ruN linux-2.5.62.orig/include/net/ipv6.h linux-2.5.62/include/net/ipv6.h --- linux-2.5.62.orig/include/net/ipv6.h 2003-02-14 15:52:28.000000000 -0800 +++ linux-2.5.62/include/net/ipv6.h 2003-02-19 13:19:38.000000000 -0800 @@ -41,7 +41,7 @@ #define NEXTHDR_MAX 255 - +#define IP6OPT_MUTABLE 0x20 #define IPV6_DEFAULT_HOPLIMIT 64 #define IPV6_DEFAULT_MCASTHOPS 1 diff -ruN linux-2.5.62.orig/include/net/xfrm.h linux-2.5.62/include/net/xfrm.h --- linux-2.5.62.orig/include/net/xfrm.h 2003-02-19 14:24:53.000000000 -0800 +++ linux-2.5.62/include/net/xfrm.h 2003-02-19 14:03:20.000000000 -0800 @@ -414,7 +414,7 @@ extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); extern int xfrm_check_selectors(struct xfrm_state **x, int n, struct flowi *fl); extern int xfrm4_rcv(struct sk_buff *skb); -extern int xfrm6_rcv(struct sk_buff *skb); +extern int xfrm6_rcv(struct sk_buff *skb, u8 proto); extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); @@ -452,11 +452,37 @@ extern struct xfrm_algo_desc *xfrm_aalg_get_byname(char *name); extern struct xfrm_algo_desc *xfrm_ealg_get_byname(char *name); +static __inline__ int addr_match(void *token1, void *token2, int prefixlen) +{ + __u32 *a1 = token1; + __u32 *a2 = token2; + int pdw; + int pbi; + + pdw = prefixlen >> 5; /* num of whole __u32 in prefix */ + pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */ + + if (pdw) + if (memcmp(a1, a2, pdw << 2)) + return 0; + + if (pbi) { + __u32 mask; + + mask = htonl((0xffffffff) << (32 - pbi)); + + if ((a1[pdw] ^ a2[pdw]) & mask) + return 0; + } + + return 1; +} + static inline int xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl) { - return !memcmp(fl->fl6_dst, &sel->daddr, (sel->prefixlen_d)/8) && - !memcmp(fl->fl6_src, &sel->saddr, (sel->prefixlen_s)/8) 
&& + return !addr_match(fl->fl6_dst, &sel->daddr, sel->prefixlen_d) && + !addr_match(fl->fl6_src, &sel->saddr, sel->prefixlen_s) && !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) && !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) && (fl->proto == sel->proto || !sel->proto) && diff -ruN -x '*.o' -x '*.cmd' -x '*.ko' -x '*.mod.c' linux-2.5.62.orig/net/ipv6/ah.c linux-2.5.62/net/ipv6/ah.c --- linux-2.5.62.orig/net/ipv6/ah.c 2003-02-19 14:24:53.000000000 -0800 +++ linux-2.5.62/net/ipv6/ah.c 2003-02-19 14:31:12.000000000 -0800 @@ -32,7 +32,6 @@ #define AH_HLEN_NOICV 12 -/* XXX no ipv6 ah specific */ #define NIP6(addr) \ ntohs((addr).s6_addr16[0]),\ ntohs((addr).s6_addr16[1]),\ @@ -43,6 +42,214 @@ ntohs((addr).s6_addr16[6]),\ ntohs((addr).s6_addr16[7]) +static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) +{ + u8 *opt = (u8 *)opthdr; + int len = ipv6_optlen(opthdr); + int off = 0; + int optlen = 0; + + off += 2; + len -= 2; + + while (len > 0) { + + switch (opt[off]) { + + case IPV6_TLV_PAD0: + optlen = 1; + break; + default: + if (len < 2) + goto bad; + optlen = opt[off+1]+2; + if (len < optlen) + goto bad; + if (opt[off] & 0x20) + memset(&opt[off+2], 0, opt[off+1]); + break; + } + + off += optlen; + len -= optlen; + } + if (len == 0) + return 1; + +bad: + return 0; +} + +int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + unsigned int packet_len = skb->tail - skb->nh.raw; + u8 nexthdr = skb->nh.ipv6h->nexthdr; + u8 nextnexthdr = 0; + + *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + + while (offset + 1 <= packet_len) { + + switch (nexthdr) { + + case NEXTHDR_HOP: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun hopopts\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = 
(struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_ROUTING: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_DEST: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_AUTH: + if (dir == XFRM_POLICY_OUT) { + memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, + (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); + } + if (exthdr->nexthdr == NEXTHDR_DEST) { + offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + nextnexthdr = exthdr->nexthdr; + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + } + return nexthdr; + default: + return nexthdr; + } + } + + return nexthdr; +} + +static int ipv6_check_mutable_options(struct sk_buff *skb, struct ipv6hdr *hdr, + struct inet6_skb_parm *opt) +{ + int lim; + u8 *optpnt; + u8 nexthdr = hdr->nexthdr; + int datalen = 0; + + optpnt = (u8*)(hdr+1); + lim = ntohs(hdr->payload_len); + + while (lim > 0) { + struct ipv6_opt_hdr *opthdr; + int hdrlen; + + opthdr = (struct ipv6_opt_hdr*)optpnt; + + switch(nexthdr) { + case NEXTHDR_HOP: + opt->hop = optpnt - skb->nh.raw; + hdrlen = ipv6_optlen(opthdr); + datalen += (hdrlen - 2); + break; + case NEXTHDR_DEST: + opt->dst1 = optpnt - skb->nh.raw; + hdrlen = ipv6_optlen(opthdr); + datalen += (hdrlen - 2); + break; + case NEXTHDR_ROUTING: + case NEXTHDR_FRAGMENT: + case NEXTHDR_NONE: + hdrlen = ipv6_optlen(opthdr); + break; + case NEXTHDR_AUTH: + hdrlen = (opthdr->hdrlen + 2) << 2; + break; + default: + goto out; + } + nexthdr = opthdr->nexthdr; + optpnt += hdrlen; + lim -= hdrlen; 
+ } +out: + return datalen; +} + +static int ah6_set_option(u8 *opthdr, u8 **opt_data, int erase) +{ + u8 *optpnt = opthdr; + int len = ipv6_optlen((struct ipv6_opt_hdr*)opthdr); + int datalen; + int optlen; + + optpnt += 2; + len -= 2; + datalen = len; + + if (erase) { + memcpy(*opt_data, optpnt, datalen); + + while (len > 0) { + if (optpnt[0] == IPV6_TLV_PAD0) { + optlen = 1; + } else { + if (len < 2) + return -1; + optlen = optpnt[1] + 2; + if (optlen > len) + return -1; + if (optpnt[0] & IP6OPT_MUTABLE) + memset(optpnt+2, 0, optpnt[1]); + } + optpnt += optlen; + len -= optlen; + } + } else { + memcpy(optpnt, *opt_data, datalen); + } + + *opt_data += datalen; + + return 0; +} + +static inline void ah6_clear_mutable_options(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 *opt_data) +{ + if (opt->hop) + ah6_set_option(skb->nh.raw+opt->hop, &opt_data, 1); + if (opt->dst1) + ah6_set_option(skb->nh.raw+opt->dst1, &opt_data, 1); +} + +static inline void ah6_restore_mutable_options(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 *opt_data) +{ + if (opt->hop) + ah6_set_option(skb->nh.raw+opt->hop, &opt_data, 0); + if (opt->dst1) + ah6_set_option(skb->nh.raw+opt->dst1, &opt_data, 0); +} + int ah6_output(struct sk_buff *skb) { int err; @@ -50,6 +257,7 @@ struct dst_entry *dst = skb->dst; struct xfrm_state *x = dst->xfrm; struct ipv6hdr *iph = NULL; + struct ipv6hdr *top_hdr; struct ip_auth_hdr *ah; struct ah_data *ahp; u16 nh_offset = 0; @@ -66,13 +274,13 @@ if (x->props.mode) { iph = skb->nh.ipv6h; - skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); - skb->nh.ipv6h->version = 6; - skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); - skb->nh.ipv6h->nexthdr = IPPROTO_AH; - memcpy(&skb->nh.ipv6h->saddr, &x->props.saddr, sizeof(struct in6_addr)); - memcpy(&skb->nh.ipv6h->daddr, &x->id.daddr, sizeof(struct in6_addr)); - ah = (struct ip_auth_hdr*)(skb->nh.ipv6h+1); + top_hdr = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + 
top_hdr->version = 6; + top_hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + top_hdr->nexthdr = IPPROTO_AH; + memcpy(&top_hdr->saddr, &x->props.saddr, sizeof(struct in6_addr)); + memcpy(&top_hdr->daddr, &x->id.daddr, sizeof(struct in6_addr)); + ah = (struct ip_auth_hdr*)(top_hdr+1); ah->nexthdr = IPPROTO_IPV6; } else { hdr_len = skb->h.raw - skb->nh.raw; @@ -82,42 +290,40 @@ goto error; } memcpy(iph, skb->data, hdr_len); - skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); - memcpy(skb->nh.ipv6h, iph, hdr_len); + top_hdr = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + memcpy(top_hdr, iph, hdr_len); nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_OUT); if (nexthdr == 0) goto error; skb->nh.raw[nh_offset] = IPPROTO_AH; - skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + top_hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); ah = (struct ip_auth_hdr*)(skb->nh.raw+hdr_len); ah->nexthdr = nexthdr; } - skb->nh.ipv6h->priority = 0; - skb->nh.ipv6h->flow_lbl[0] = 0; - skb->nh.ipv6h->flow_lbl[1] = 0; - skb->nh.ipv6h->flow_lbl[2] = 0; - skb->nh.ipv6h->hop_limit = 0; + skb->nh.ipv6h = top_hdr; + top_hdr->priority = 0; + memset(top_hdr->flow_lbl, 0, 3); + top_hdr->hop_limit = 0; ahp = x->data; ah->hdrlen = (XFRM_ALIGN8(ahp->icv_trunc_len + - AH_HLEN_NOICV) >> 2) - 2; + AH_HLEN_NOICV) >> 2) - 2; + ah->reserved = 0; ah->spi = x->id.spi; ah->seq_no = htonl(++x->replay.oseq); ahp->icv(ahp, skb, ah->auth_data); if (x->props.mode) { - skb->nh.ipv6h->hop_limit = iph->hop_limit; - skb->nh.ipv6h->priority = iph->priority; - skb->nh.ipv6h->flow_lbl[0] = iph->flow_lbl[0]; - skb->nh.ipv6h->flow_lbl[1] = iph->flow_lbl[1]; - skb->nh.ipv6h->flow_lbl[2] = iph->flow_lbl[2]; + top_hdr->priority = iph->priority; + memcpy(top_hdr->flow_lbl, iph->flow_lbl, 3); + top_hdr->hop_limit = iph->hop_limit; } else { - memcpy(skb->nh.ipv6h, iph, hdr_len); + memcpy(top_hdr, iph, hdr_len); skb->nh.raw[nh_offset] = 
IPPROTO_AH; - skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + top_hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); kfree (iph); } @@ -139,42 +345,48 @@ int ah6_input(struct xfrm_state *x, struct sk_buff *skb) { int ah_hlen; - struct ipv6hdr *iph; + struct ipv6hdr *hdr; struct ipv6_auth_hdr *ah; struct ah_data *ahp; - unsigned char *tmp_hdr = NULL; - int hdr_len = skb->h.raw - skb->nh.raw; - u8 nexthdr = 0; + struct inet6_skb_parm opt; + char work_buf[8]; + int optlen; + u8 *opt_data = NULL; if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr))) goto out; ah = (struct ipv6_auth_hdr*)skb->data; - ahp = x->data; - ah_hlen = (ah->hdrlen + 2) << 2; + ah_hlen = (ah->hdrlen + 2) << 2; - if (ah_hlen != XFRM_ALIGN8(ahp->icv_full_len + AH_HLEN_NOICV) && - ah_hlen != XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV)) - goto out; + if (ah_hlen != XFRM_ALIGN8(ahp->icv_full_len + AH_HLEN_NOICV) && + ah_hlen != XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV)) + goto out; if (!pskb_may_pull(skb, (ah->hdrlen+2)<<2)) goto out; - /* We are going to _remove_ AH header to keep sockets happy, - * so... Later this can change. 
*/ - if (skb_cloned(skb) && - pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) - goto out; - tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); - if (!tmp_hdr) - goto out; - memcpy(tmp_hdr, skb->nh.raw, hdr_len); - ah = (struct ipv6_auth_hdr*)skb->data; - iph = skb->nh.ipv6h; + hdr = skb->nh.ipv6h; + memcpy(work_buf, hdr, 8); + hdr->priority = 0; + memset(hdr->flow_lbl, 0, 3); + hdr->hop_limit = 0; + + memset(&opt, 0, sizeof(struct inet6_skb_parm)); + optlen = ipv6_check_mutable_options(skb, hdr, &opt); + if (optlen < 0) + goto out; { u8 auth_data[ahp->icv_trunc_len]; + + if (optlen) { + opt_data = kmalloc(optlen, GFP_ATOMIC); + if (!opt_data) + goto out; + ah6_clear_mutable_options(skb, &opt, opt_data); + } memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len); memset(ah->auth_data, 0, ahp->icv_trunc_len); skb_push(skb, skb->data - skb->nh.raw); @@ -185,22 +397,19 @@ x->stats.integrity_failed++; goto free_out; } + if (optlen) { + ah6_restore_mutable_options(skb, &opt, opt_data); + kfree (opt_data); + } } + memcpy(hdr, work_buf, 8); + skb->h.raw += (ah->hdrlen+2)<<2; + skb->data = skb->h.raw; - nexthdr = ah->nexthdr; - skb->nh.raw = skb_pull(skb, (ah->hdrlen+2)<<2); - memcpy(skb->nh.raw, tmp_hdr, hdr_len); - skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); - skb_pull(skb, hdr_len); - skb->h.raw = skb->data; - - - kfree(tmp_hdr); - - return nexthdr; - + return ah->nexthdr; free_out: - kfree(tmp_hdr); + if (optlen) + kfree(opt_data); out: return -EINVAL; } @@ -208,22 +417,19 @@ void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, int type, int code, int offset, __u32 info) { - struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ipv6hdr *hdr = (struct ipv6hdr*)skb->data; struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+offset); struct xfrm_state *x; - if (type != ICMPV6_DEST_UNREACH || - type != ICMPV6_PKT_TOOBIG) + if (type != ICMPV6_DEST_UNREACH || type != ICMPV6_PKT_TOOBIG) return; - x = xfrm6_state_lookup(&iph->daddr, ah->spi, 
IPPROTO_AH); + x = xfrm6_state_lookup(&hdr->daddr, ah->spi, IPPROTO_AH); if (!x) return; - printk(KERN_DEBUG "pmtu discvovery on SA AH/%08x/" "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", - ntohl(ah->spi), NIP6(iph->daddr)); - + ntohl(ah->spi), NIP6(hdr->daddr)); xfrm_state_put(x); } @@ -315,26 +521,29 @@ .output = ah6_output }; +static inline int +xfrm6_ah_rcv(struct sk_buff *skb) +{ + return xfrm6_rcv(skb, IPPROTO_AH); +} + static struct inet6_protocol ah6_protocol = { - .handler = xfrm6_rcv, + .handler = xfrm6_ah_rcv, .err_handler = ah6_err, }; int __init ah6_init(void) { SET_MODULE_OWNER(&ah6_type); - if (xfrm6_register_type(&ah6_type) < 0) { printk(KERN_INFO "ipv6 ah init: can't add xfrm type\n"); return -EAGAIN; } - if (inet6_add_protocol(&ah6_protocol, IPPROTO_AH) < 0) { printk(KERN_INFO "ipv6 ah init: can't add protocol\n"); xfrm6_unregister_type(&ah6_type); return -EAGAIN; } - return 0; } @@ -342,10 +551,8 @@ { if (inet6_del_protocol(&ah6_protocol, IPPROTO_AH) < 0) printk(KERN_INFO "ipv6 ah close: can't remove protocol\n"); - if (xfrm6_unregister_type(&ah6_type) < 0) printk(KERN_INFO "ipv6 ah close: can't remove xfrm type\n"); - } module_init(ah6_init); diff -ruN -x '*.o' -x '*.cmd' -x '*.ko' -x '*.mod.c' linux-2.5.62.orig/net/ipv6/esp.c linux-2.5.62/net/ipv6/esp.c --- linux-2.5.62.orig/net/ipv6/esp.c 2003-02-19 14:24:53.000000000 -0800 +++ linux-2.5.62/net/ipv6/esp.c 2003-02-19 14:20:43.000000000 -0800 @@ -35,10 +35,6 @@ #include #define MAX_SG_ONSTACK 4 -#if 0 -typedef void (icv_update_fn_t)(struct crypto_tfm *, - struct scatterlist *, unsigned int); -#endif /* XXX no ipv6 esp specific */ #define NIP6(addr) \ @@ -545,8 +541,14 @@ .output = esp6_output }; +static inline int +xfrm6_esp_rcv(struct sk_buff *skb) +{ + return xfrm6_rcv(skb, IPPROTO_ESP); +} + static struct inet6_protocol esp6_protocol = { - .handler = xfrm6_rcv, + .handler = xfrm6_esp_rcv, .err_handler = esp6_err, }; diff -ruN -x '*.o' -x '*.cmd' -x '*.ko' -x '*.mod.c' 
linux-2.5.62.orig/net/ipv6/xfrm_input.c linux-2.5.62/net/ipv6/xfrm_input.c --- linux-2.5.62.orig/net/ipv6/xfrm_input.c 2003-02-19 14:24:53.000000000 -0800 +++ linux-2.5.62/net/ipv6/xfrm_input.c 2003-02-19 14:06:49.000000000 -0800 @@ -30,11 +30,11 @@ /* Fetch spi and seq frpm ipsec header */ -static int xfrm6_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq) +static int xfrm6_parse_spi(struct sk_buff *skb, u8 proto, u32 *spi, u32 *seq) { int offset, offset_seq; - switch (nexthdr) { + switch (proto) { case IPPROTO_AH: offset = offsetof(struct ip_auth_hdr, spi); offset_seq = offsetof(struct ip_auth_hdr, seq_no); @@ -61,115 +61,7 @@ return 0; } -static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) -{ - u8 *opt = (u8 *)opthdr; - int len = ipv6_optlen(opthdr); - int off = 0; - int optlen = 0; - - off += 2; - len -= 2; - - while (len > 0) { - - switch (opt[off]) { - - case IPV6_TLV_PAD0: - optlen = 1; - break; - default: - if (len < 2) - goto bad; - optlen = opt[off+1]+2; - if (len < optlen) - goto bad; - if (opt[off] & 0x20) - memset(&opt[off+2], 0, opt[off+1]); - break; - } - - off += optlen; - len -= optlen; - } - if (len == 0) - return 1; - -bad: - return 0; -} - -int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) -{ - u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - unsigned int packet_len = skb->tail - skb->nh.raw; - u8 nexthdr = skb->nh.ipv6h->nexthdr; - u8 nextnexthdr = 0; - - *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; - - while (offset + 1 <= packet_len) { - - switch (nexthdr) { - - case NEXTHDR_HOP: - *nh_offset = offset; - offset += ipv6_optlen(exthdr); - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun hopopts\n"); - return 0; - } - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_ROUTING: - *nh_offset = offset; - 
offset += ipv6_optlen(exthdr); - ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_DEST: - *nh_offset = offset; - offset += ipv6_optlen(exthdr); - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun destopt\n"); - return 0; - } - nexthdr = exthdr->nexthdr; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - break; - - case NEXTHDR_AUTH: - if (dir == XFRM_POLICY_OUT) { - memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, - (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); - } - if (exthdr->nexthdr == NEXTHDR_DEST) { - offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; - exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); - nextnexthdr = exthdr->nexthdr; - if (!zero_out_mutable_opts(exthdr)) { - if (net_ratelimit()) - printk(KERN_WARNING "overrun destopt\n"); - return 0; - } - } - return nexthdr; - default: - return nexthdr; - } - } - - return nexthdr; -} - -int xfrm6_rcv(struct sk_buff *skb) +int xfrm6_rcv(struct sk_buff *skb, u8 proto) { int err; u32 spi, seq; @@ -177,32 +69,10 @@ struct xfrm_state *x; int xfrm_nr = 0; int decaps = 0; - struct ipv6hdr *hdr = skb->nh.ipv6h; - unsigned char *tmp_hdr = NULL; - int hdr_len = 0; u16 nh_offset = 0; u8 nexthdr = 0; - if (hdr->nexthdr == IPPROTO_AH || hdr->nexthdr == IPPROTO_ESP) { - nh_offset = ((unsigned char*)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; - hdr_len = sizeof(struct ipv6hdr); - } else { - hdr_len = skb->h.raw - skb->nh.raw; - } - - tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); - if (!tmp_hdr) - goto drop; - memcpy(tmp_hdr, skb->nh.raw, hdr_len); - - nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); - hdr->priority = 0; - hdr->flow_lbl[0] = 0; - hdr->flow_lbl[1] = 0; - hdr->flow_lbl[2] = 0; - hdr->hop_limit = 0; - - if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) != 0) + if ((err = xfrm6_parse_spi(skb, proto, &spi, 
&seq)) != 0) goto drop; do { @@ -211,9 +81,10 @@ if (xfrm_nr == XFRM_MAX_DEPTH) goto drop; - x = xfrm6_state_lookup(&iph->daddr, spi, nexthdr); + x = xfrm6_state_lookup(&iph->daddr, spi, proto); if (x == NULL) goto drop; + spin_lock(&x->lock); if (unlikely(x->km.state != XFRM_STATE_VALID)) goto drop_unlock; @@ -221,8 +92,8 @@ if (x->props.replay_window && xfrm_replay_check(x, seq)) goto drop_unlock; - nexthdr = x->type->input(x, skb); - if (nexthdr <= 0) + proto = x->type->input(x, skb); + if (proto <= 0) goto drop_unlock; if (x->props.replay_window) @@ -237,13 +108,13 @@ iph = skb->nh.ipv6h; /* ??? */ - if (nexthdr == NEXTHDR_DEST) { + if (proto == NEXTHDR_DEST) { if (!pskb_may_pull(skb, (skb->h.raw-skb->data)+8) || !pskb_may_pull(skb, (skb->h.raw-skb->data)+((skb->h.raw[1]+1)<<3))) { err = -EINVAL; goto drop; } - nexthdr = skb->h.raw[0]; + proto = skb->h.raw[0]; nh_offset = skb->h.raw - skb->nh.raw; skb_pull(skb, (skb->h.raw[1]+1)<<3); skb->h.raw = skb->data; @@ -258,14 +129,10 @@ break; } - if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) < 0) + if ((err = xfrm6_parse_spi(skb, proto, &spi, &seq)) < 0) goto drop; } while (!err); - memcpy(skb->nh.raw, tmp_hdr, hdr_len); - skb->nh.raw[nh_offset] = nexthdr; - skb->nh.ipv6h->payload_len = htons(hdr_len + skb->len - sizeof(struct ipv6hdr)); - /* Allocate new secpath or COW existing one. 
*/ if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) { struct sec_path *sp; @@ -295,14 +162,13 @@ netif_rx(skb); return 0; } else { - return -nexthdr; + return -proto; } drop_unlock: spin_unlock(&x->lock); xfrm_state_put(x); drop: - if (tmp_hdr) kfree(tmp_hdr); while (--xfrm_nr >= 0) xfrm_state_put(xfrm_vec[xfrm_nr]); kfree_skb(skb); diff -ruN -x '*.o' -x '*.cmd' -x '*.ko' -x '*.mod.c' linux-2.5.62.orig/net/ipv6/xfrm_policy.c linux-2.5.62/net/ipv6/xfrm_policy.c --- linux-2.5.62.orig/net/ipv6/xfrm_policy.c 2003-02-19 14:24:53.000000000 -0800 +++ linux-2.5.62/net/ipv6/xfrm_policy.c 2003-02-19 02:50:41.000000000 -0800 @@ -229,7 +229,7 @@ read_lock_bh(&xfrm_policy_lock); for (pol = xfrm_policy_list[dir]; pol; pol = pol->next) { struct xfrm_selector *sel = &pol->selector; - if (pol->family != AF_INET6) continue); + if (pol->family != AF_INET6) continue; if (xfrm6_selector_match(sel, fl)) { atomic_inc(&pol->refcnt); break; From jgrimm2@us.ibm.com Wed Feb 19 15:31:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 15:31:11 -0800 (PST) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1JNV73v010597 for ; Wed, 19 Feb 2003 15:31:08 -0800 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e2.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1JNdkl3066170; Wed, 19 Feb 2003 18:39:46 -0500 Received: from us.ibm.com (touki.austin.ibm.com [9.41.94.47]) by northrelay04.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1JNdhIc079424; Wed, 19 Feb 2003 18:39:43 -0500 Message-ID: <3E54128C.327D7759@us.ibm.com> Date: Wed, 19 Feb 2003 17:26:04 -0600 From: Jon Grimm X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.62 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bruce Allan CC: davem@redhat.com, lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: [PATCH] subset of RFC2553 References: 
<1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1741 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgrimm2@us.ibm.com Precedence: bulk X-list: netdev Bruce Allan wrote: > > How about this instead (a combination of your comment above and glibc's > definition of sockaddr_storage): > #define _SS_MAXSIZE 128 > #define _ALIGNSIZE (sizeof(struct sockaddr *)) > #if ULONG_MAX > 0xffffffff > #define __ss_aligntype __u64 > #else > #define __ss_aligntype __u32 > #endif > struct sockaddr_storage { > sa_family_t ss_family; > __ss_aligntype __data[(_SS_MAXSIZE/sizeof(__ss_aligntype))-1]; > } __attribute__ ((aligned(_ALIGNSIZE))); > Hmmm... this seemed to generate a 124-byte struct instead of the stated intent of 128. Maybe instead: #define _SS_MAXSIZE 128 #if ULONG_MAX > 0xffffffff #define __ss_aligntype __u64 #else #define __ss_aligntype __u32 #endif #define _ALIGNSIZE (sizeof(__ss_aligntype)) struct sockaddr_storage { sa_family_t ss_family; __ss_aligntype __data[_SS_MAXSIZE/_ALIGNSIZE-1] __attribute__ ((aligned(_ALIGNSIZE))); } __attribute ((aligned(_ALIGNSIZE))); Align the struct on _ALIGNSIZE; align _data on _ALIGNSIZE to to generate padding between ss_family and __data. 
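The size and padding argument above can be checked at compile time in user space. The following is only a sketch under stated assumptions: the `demo_` and `SS_` names are hypothetical stand-ins (not kernel identifiers), `unsigned short` stands in for `sa_family_t`, `uint64_t` is used unconditionally instead of the `ULONG_MAX` test, and the GCC/Clang `aligned` attribute is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS_MAXSIZE 128

/* Assumption: always use the 64-bit align type; the proposal above
 * picks __u64 or __u32 depending on ULONG_MAX. */
typedef uint64_t ss_aligntype;
#define SS_ALIGNSIZE (sizeof(ss_aligntype))

/* Hypothetical user-space stand-in for the kernel's sa_family_t. */
typedef unsigned short demo_sa_family_t;

struct demo_sockaddr_storage {
	demo_sa_family_t ss_family;
	/* Aligning the member forces padding after ss_family. */
	ss_aligntype ss_data[SS_MAXSIZE / SS_ALIGNSIZE - 1]
		__attribute__((aligned(SS_ALIGNSIZE)));
} __attribute__((aligned(SS_ALIGNSIZE)));

/* Compilation fails if the layout is not the intended one. */
_Static_assert(sizeof(struct demo_sockaddr_storage) == SS_MAXSIZE,
	       "storage must be exactly SS_MAXSIZE bytes");
_Static_assert(offsetof(struct demo_sockaddr_storage, ss_data) == SS_ALIGNSIZE,
	       "data must start on an aligned boundary");

size_t demo_storage_size(void)
{
	return sizeof(struct demo_sockaddr_storage);
}

size_t demo_data_offset(void)
{
	return offsetof(struct demo_sockaddr_storage, ss_data);
}
```

With the member attribute removed, the offset of the data array depends on the natural alignment of the align type on the target ABI, which is the kind of difference that can make the total come out smaller than intended.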
Best Regards, jon From sri@us.ibm.com Wed Feb 19 16:20:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 16:20:51 -0800 (PST) Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1K0Kl3v011757 for ; Wed, 19 Feb 2003 16:20:48 -0800 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e1.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1K0TQEs145150; Wed, 19 Feb 2003 19:29:26 -0500 Received: from dyn9-47-18-140.beaverton.ibm.com (dyn9-47-18-140.beaverton.ibm.com [9.47.18.140]) by northrelay04.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1K0TMId192612; Wed, 19 Feb 2003 19:29:22 -0500 Date: Wed, 19 Feb 2003 16:14:48 -0800 (PST) From: Sridhar Samudrala X-X-Sender: sridhar@dyn9-47-18-140.beaverton.ibm.com To: davem@redhat.com, cc: netdev@oss.sgi.com, Subject: [PATCH] Fix to avoid overriding TCP/UDP with a new protocol of same type. Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1742 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sri@us.ibm.com Precedence: bulk X-list: netdev Dave, Alexey, I think I found a bug in inet_register_protosw() which results in a behavior that is not expected. Registering a new protocol of type SOCK_STREAM with a protocol value other than IPPROTO_TCP will override TCP if the application passes 0 as the protocol to the socket() call. socket(AF_INET, SOCK_STREAM, 0) I guess many applications follow this syntax as they assume TCP is the default protocol for SOCK_STREAM type. The same holds true for SOCK_DGRAM type sockets assuming UDP as the default. This is due to the insertion of a new inet_protosw entry into the inetsw list of a particular type at the head of the list. inet_create() uses the first entry in the list if a wild-card protocol is passed.
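The effect of head insertion on a wild-card lookup can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel code: the `demo_` names are hypothetical, a plain singly linked list replaces `struct list_head`, and the lookup follows only the description above (protocol 0 takes the first entry). It contrasts head insertion with insertion after the last permanent entry.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PERMANENT 0x01	/* stand-in for INET_PROTOSW_PERMANENT */

struct demo_protosw {
	int protocol;
	int flags;
	struct demo_protosw *next;
};

/* Head insertion: the new entry becomes the first entry of its type. */
static void demo_register_at_head(struct demo_protosw **head,
				  struct demo_protosw *p)
{
	p->next = *head;
	*head = p;
}

/* Insert after the last permanent entry, or at the head if none exists. */
static void demo_register_after_permanent(struct demo_protosw **head,
					  struct demo_protosw *p)
{
	struct demo_protosw **link = head;
	struct demo_protosw *e;

	for (e = *head; e; e = e->next)
		if (e->flags & DEMO_PERMANENT)
			link = &e->next;
	p->next = *link;
	*link = p;
}

/* Protocol 0 is the wild card and matches the first entry in the list. */
static int demo_lookup(const struct demo_protosw *head, int protocol)
{
	const struct demo_protosw *e;

	for (e = head; e; e = e->next)
		if (protocol == 0 || e->protocol == protocol)
			return e->protocol;
	return -1;
}

/* TCP (6) is permanent; SCTP (132) is registered later at the head. */
int demo_wildcard_after_head_insert(void)
{
	struct demo_protosw tcp = { 6, DEMO_PERMANENT, NULL };
	struct demo_protosw sctp = { 132, 0, NULL };
	struct demo_protosw *list = NULL;

	demo_register_at_head(&list, &tcp);
	demo_register_at_head(&list, &sctp);
	return demo_lookup(list, 0);
}

/* Same registration order, but SCTP goes after the permanent entry. */
int demo_wildcard_after_permanent_insert(void)
{
	struct demo_protosw tcp = { 6, DEMO_PERMANENT, NULL };
	struct demo_protosw sctp = { 132, 0, NULL };
	struct demo_protosw *list = NULL;

	demo_register_at_head(&list, &tcp);
	demo_register_after_permanent(&list, &sctp);
	return demo_lookup(list, 0);
}
```

In this model the first helper resolves the wild card to 132 and the second to 6, while an exact lookup for 132 would still find the new entry in either case.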
The following patch fixes the insertion of a new entry so that it is added after the last permanent entry in the list. This makes sure that the new entries do not override any existing permanent entries. I found this problem when i registered SOCK_STREAM/IPPROTO_SCTP to support tcp-style SCTP sockets and surprisingly noticed 'ssh' using SCTP as the transport protocol. Thanks Sridhar Patch against linux 2.5.62 ------------------------------------------------------------------------------- diff -Nru a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c --- a/net/ipv4/af_inet.c Wed Feb 19 15:37:34 2003 +++ b/net/ipv4/af_inet.c Wed Feb 19 15:37:34 2003 @@ -976,6 +976,7 @@ struct list_head *lh; struct inet_protosw *answer; int protocol = p->protocol; + struct list_head *last_perm; br_write_lock_bh(BR_NETPROTO_LOCK); @@ -984,24 +985,29 @@ /* If we are trying to override a permanent protocol, bail. */ answer = NULL; + last_perm = &inetsw[p->type]; list_for_each(lh, &inetsw[p->type]) { answer = list_entry(lh, struct inet_protosw, list); /* Check only the non-wild match. */ - if (protocol == answer->protocol && - (INET_PROTOSW_PERMANENT & answer->flags)) - break; + if (INET_PROTOSW_PERMANENT & answer->flags) { + if (protocol == answer->protocol) + break; + last_perm = lh; + } answer = NULL; } if (answer) goto out_permanent; - /* Add to the BEGINNING so that we override any existing - * entry. This means that when we remove this entry, the + /* Add the new entry after the last permanent entry if any, so that + * the new entry does not override a permanent entry when matched with + * a wild-card protocol. But it is allowed to override any existing + * non-permanent entry. This means that when we remove this entry, the * system automatically returns to the old behavior. 
*/ - list_add(&p->list, &inetsw[p->type]); + list_add(&p->list, last_perm); out: br_write_unlock_bh(BR_NETPROTO_LOCK); return; diff -Nru a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c --- a/net/ipv6/af_inet6.c Wed Feb 19 15:37:34 2003 +++ b/net/ipv6/af_inet6.c Wed Feb 19 15:37:34 2003 @@ -571,6 +571,7 @@ struct list_head *lh; struct inet_protosw *answer; int protocol = p->protocol; + struct list_head *last_perm; br_write_lock_bh(BR_NETPROTO_LOCK); @@ -579,24 +580,29 @@ /* If we are trying to override a permanent protocol, bail. */ answer = NULL; + last_perm = &inetsw6[p->type]; list_for_each(lh, &inetsw6[p->type]) { answer = list_entry(lh, struct inet_protosw, list); /* Check only the non-wild match. */ - if (protocol == answer->protocol && - (INET_PROTOSW_PERMANENT & answer->flags)) - break; + if (INET_PROTOSW_PERMANENT & answer->flags) { + if (protocol == answer->protocol) + break; + last_perm = lh; + } answer = NULL; } if (answer) goto out_permanent; - /* Add to the BEGINNING so that we override any existing - * entry. This means that when we remove this entry, the + /* Add the new entry after the last permanent entry if any, so that + * the new entry does not override a permanent entry when matched with + * a wild-card protocol. But it is allowed to override any existing + * non-permanent entry. This means that when we remove this entry, the * system automatically returns to the old behavior. 
*/ - list_add(&p->list, &inetsw6[p->type]); + list_add(&p->list, last_perm); out: br_write_unlock_bh(BR_NETPROTO_LOCK); return; From davem@redhat.com Wed Feb 19 16:24:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 16:24:46 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1K0Og3v012186 for ; Wed, 19 Feb 2003 16:24:43 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA20124; Wed, 19 Feb 2003 16:17:01 -0800 Date: Wed, 19 Feb 2003 16:17:01 -0800 (PST) Message-Id: <20030219.161701.122322786.davem@redhat.com> To: sri@us.ibm.com Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: [PATCH] Fix to avoid overriding TCP/UDP with a new protocol of same type. From: "David S. Miller" In-Reply-To: References: X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1743 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Sridhar Samudrala Date: Wed, 19 Feb 2003 16:14:48 -0800 (PST) I think i found a bug in inet_register_protosw() which results in a behavior that is not expected. Thanks Sridhar, this looks like a good fix. 
From jgrimm2@us.ibm.com Wed Feb 19 16:28:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 16:28:12 -0800 (PST) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1K0S53v012603 for ; Wed, 19 Feb 2003 16:28:06 -0800 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e2.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1K0ail3087682; Wed, 19 Feb 2003 19:36:44 -0500 Received: from us.ibm.com (touki.austin.ibm.com [9.41.94.47]) by northrelay01.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1K0aeFa090804; Wed, 19 Feb 2003 19:36:41 -0500 Message-ID: <3E541FEA.C34BEEB7@us.ibm.com> Date: Wed, 19 Feb 2003 18:23:06 -0600 From: Jon Grimm X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.62 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bruce Allan , davem@redhat.com, lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: [PATCH] subset of RFC2553 References: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> <3E54128C.327D7759@us.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1744 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgrimm2@us.ibm.com Precedence: bulk X-list: netdev Or if you don't care about the alignment of the __data field at all: #define _SS_MAXSIZE 128 #if ULONG_MAX > 0xffffffff #define _ALIGNSIZE ((sizeof(__u64))) #else #define _ALIGNSIZE ((sizeof(__u32))) #endif struct sockaddr_storage { sa_family_t ss_family; char __data[_SS_MAXSIZE-sizeof(sa_family_t)*2 + _ALIGNSIZE]; } __attribute ((aligned(_ALIGNSIZE))); jon Jon Grimm wrote: > > Bruce Allan wrote: > > > > How about this instead (a combination of your comment above and glibc's > > definition of sockaddr_storage): > > #define _SS_MAXSIZE 128 > > #define _ALIGNSIZE (sizeof(struct sockaddr *)) > > 
#if ULONG_MAX > 0xffffffff > > #define __ss_aligntype __u64 > > #else > > #define __ss_aligntype __u32 > > #endif > > struct sockaddr_storage { > > sa_family_t ss_family; > > __ss_aligntype __data[(_SS_MAXSIZE/sizeof(__ss_aligntype))-1]; > > } __attribute__ ((aligned(_ALIGNSIZE))); > > > > Hmmm... this seemed to generate a 124-byte struct instead of the stated > intent of 128. > > Maybe instead: > > #define _SS_MAXSIZE 128 > #if ULONG_MAX > 0xffffffff > #define __ss_aligntype __u64 > #else > #define __ss_aligntype __u32 > #endif > #define _ALIGNSIZE (sizeof(__ss_aligntype)) > struct sockaddr_storage { > sa_family_t ss_family; > __ss_aligntype __data[_SS_MAXSIZE/_ALIGNSIZE-1] __attribute__ > ((aligned(_ALIGNSIZE))); > } __attribute ((aligned(_ALIGNSIZE))); > > Align the struct on _ALIGNSIZE; align _data on _ALIGNSIZE to to generate > padding between ss_family and __data. > > Best Regards, > jon > > ------------------------------------------------------- > This SF.net email is sponsored by: SlickEdit Inc. Develop an edge. > The most comprehensive and flexible code editor you can use. > Code faster. C/C++, C#, Java, HTML, XML, many more. FREE 30-Day Trial. 
> www.slickedit.com/sourceforge > _______________________________________________ > Lksctp-developers mailing list > Lksctp-developers@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/lksctp-developers From davem@redhat.com Wed Feb 19 16:28:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 16:28:44 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1K0Sf3v012849 for ; Wed, 19 Feb 2003 16:28:41 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA20153; Wed, 19 Feb 2003 16:21:30 -0800 Date: Wed, 19 Feb 2003 16:21:29 -0800 (PST) Message-Id: <20030219.162129.11584427.davem@redhat.com> To: jgrimm2@us.ibm.com Cc: bwa@us.ibm.com, lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: [PATCH] subset of RFC2553 From: "David S. Miller" In-Reply-To: <3E54128C.327D7759@us.ibm.com> References: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> <3E54128C.327D7759@us.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1745 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Jon Grimm Date: Wed, 19 Feb 2003 17:26:04 -0600 Hmmm... this seemed to generate a 124-byte struct instead of the stated intent of 128. Once this is all resolved, can you guys make sure to send me a new patch? Thanks. 
From davem@redhat.com Wed Feb 19 16:45:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 19 Feb 2003 16:45:28 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1K0jL3v013581 for ; Wed, 19 Feb 2003 16:45:21 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id QAA20218; Wed, 19 Feb 2003 16:37:49 -0800 Date: Wed, 19 Feb 2003 16:37:48 -0800 (PST) Message-Id: <20030219.163748.123408970.davem@redhat.com> To: kunihiro@ipinfusion.com Cc: mk@karaba.org, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, usagi-core@linux-ipv6.org Subject: Re: [PATCH] IPv6 IPsec support From: "David S. Miller" In-Reply-To: <87bs17hnnm.wl@ipinfusion.com> References: <20030218.233301.98333082.davem@redhat.com> <87bs17hnnm.wl@ipinfusion.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1746 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kunihiro Ishiguro Date: Wed, 19 Feb 2003 15:10:53 -0800 But there changes are no needed. Miyazaki, would you mind to take a look into this? Have fun ;-). Hello, I do not comment on your patch but on related issue which must be remembered. ipv6/xfrm_policy.c will go away, and we will move this code into ipv4/xfrm_policy.c inside of an ipv4 ifdef protected area. I will be taking all the ipv6 ipsec work I have merged and push it to Linus right now. Once he takes this, we may work from common source base and begin to merge in new work. 
From ipv6_san@rediffmail.com Thu Feb 20 02:38:45 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 02:38:53 -0800 (PST) Received: from rediffmail.com (webmail32.rediffmail.com [203.199.83.32] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KAch3v028368 for ; Thu, 20 Feb 2003 02:38:44 -0800 Received: (qmail 836 invoked by uid 510); 20 Feb 2003 10:47:19 -0000 Date: 20 Feb 2003 10:47:19 -0000 Message-ID: <20030220104719.835.qmail@webmail32.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 20 feb 2003 10:47:19 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: netdev@oss.sgi.com Cc: ipv6_san@rediffmail.com Subject: connecting ipv6 nodes throu' ipv4 network .... Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1747 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Hi, I have two Linux nodes( hosts ) enabled with IPv6. IPv6 addr of machine-1 is 3ff3:1234::1 IPv6 addr of machine-2 is 3ff3:1234::2 My LAN is based on IPv4. To make these IPv6-hosts to communicate on the LAN, what configuration do i need to make ??? I tried connecting only these two machines using a cross cable and sending IPv6 packets. But it failed. Is it correct to do so ??? thanx in advance. 
-San ------------------------------------------------------ From yoshfuji@wide.ad.jp Thu Feb 20 03:04:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 03:04:40 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KB4a3v031241 for ; Thu, 20 Feb 2003 03:04:37 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1KBDLjF002838; Thu, 20 Feb 2003 20:13:21 +0900 Date: Thu, 20 Feb 2003 20:13:20 +0900 (JST) Message-Id: <20030220.201320.73408217.yoshfuji@wide.ad.jp> To: ipv6_san@rediffmail.com Cc: netdev@oss.sgi.com Subject: Re: connecting ipv6 nodes throu' ipv4 network .... From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030220104719.835.qmail@webmail32.rediffmail.com> References: <20030220104719.835.qmail@webmail32.rediffmail.com> X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1748 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@wide.ad.jp Precedence: bulk X-list: netdev In article <20030220104719.835.qmail@webmail32.rediffmail.com> (at 20 Feb 2003 10:47:19 -0000), "santosh kumar gowda" says: > I tried connecting only these two machines using a cross cable and > sending IPv6 packets. But it failed. Is it correct to do so ??? Please show your exact configuration / command / output, etc. What do you mean by "it failed"? Any messages? 
-- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From ipv6_san@rediffmail.com Thu Feb 20 03:38:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 03:38:22 -0800 (PST) Received: from rediffmail.com (webmail29.rediffmail.com [203.199.83.39] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KBc73v031845 for ; Thu, 20 Feb 2003 03:38:08 -0800 Received: (qmail 21898 invoked by uid 510); 20 Feb 2003 11:46:36 -0000 Date: 20 Feb 2003 11:46:36 -0000 Message-ID: <20030220114636.21895.qmail@webmail29.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 20 feb 2003 11:46:36 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "YOSHIFUJI Hideaki / $B5HF#1QL@\(B" Cc: netdev@oss.sgi.com Subject: Re: Re: connecting ipv6 nodes throu' ipv4 network .... Content-type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h1KBc73v031845 X-archive-position: 1749 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev My configuration..... For Machine-1 # ip -6 addr 1: lo: mtu 16436 qdisc noqueue inet6 ::1/128 scope host 8: epmac1: mtu 1500 qdisc pfifo_fast qlen 100 inet6 3ffe:1234::2/128 scope global inet6 fe80::a00:6ff:fe2b:9654/10 scope link Machine-2 #ip -6 addr 1: lo: mtu 16436 qdisc noqueue inet6 ::1/128 scope host 2: eth0: mtu 1500 qdisc pfifo_fast qlen 100 inet6 3ffe:1234::1/128 scope global inet6 fe80::2b0:d0ff:fed2:65ff/10 scope link I connected both the machines (directly) using cross cable and using sendip utility(ipv6 packet generator) usage: sendip -p ipv6 -p tcp destaddr i sent few packets with ipv6. I used ethereal to capture the data(in promiscous mode). Here source=3ffe:1234::1 and dest=3ffe:1234::2. 
when i send an ipv6 packet using sendip, the source sends ICMPv6 Neighbor solicitation, but i dont see a reply from the dest,which is expected. Is the method correct ?? Or other alternative method ?? Advice pls. Regds, -San --------------------------------------------- On Thu, 20 Feb 2003 YOSHIFUJI Hideaki / $B5HF#1QL@(B wrote : >In article <20030220104719.835.qmail@webmail32.rediffmail.com> (at 20 Feb 2003 10:47:19 -0000), "santosh kumar gowda" says: > > > I tried connecting only these two machines using a cross cable and > > sending IPv6 packets. But it failed. Is it correct to do so ??? > >Please show your exact configuration / command / output, etc. >What do you mean by "it failed"? >Any messages? > >-- >Hideaki YOSHIFUJI @ USAGI Project >GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA > > From yoshfuji@linux-ipv6.org Thu Feb 20 03:44:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 03:44:27 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KBiG3v032280 for ; Thu, 20 Feb 2003 03:44:17 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1KBr3jF003033; Thu, 20 Feb 2003 20:53:03 +0900 Date: Thu, 20 Feb 2003 20:53:03 +0900 (JST) Message-Id: <20030220.205303.112795862.yoshfuji@linux-ipv6.org> To: ipv6_san@rediffmail.com Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: connecting ipv6 nodes throu' ipv4 network .... 
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030220114636.21895.qmail@webmail29.rediffmail.com> References: <20030220114636.21895.qmail@webmail29.rediffmail.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1750 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030220114636.21895.qmail@webmail29.rediffmail.com> (at 20 Feb 2003 11:46:36 -0000), "santosh kumar gowda" says: > 8: epmac1: mtu 1500 qdisc pfifo_fast qlen 100 > inet6 3ffe:1234::2/128 scope global : > inet6 3ffe:1234::1/128 scope global use /64 instead. --yoshfuji From solt@dns.toxicfilms.tv Thu Feb 20 03:44:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 03:44:31 -0800 (PST) Received: from dns.toxicfilms.tv (dns.toxicfilms.tv [150.254.37.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KBiG3v032279 for ; Thu, 20 Feb 2003 03:44:28 -0800 Received: by dns.toxicfilms.tv (Postfix, from userid 1000) id 5D4CD1CD17; Thu, 20 Feb 2003 12:52:55 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by dns.toxicfilms.tv (Postfix) with ESMTP id BE41D8024B; Thu, 20 Feb 2003 12:52:55 +0100 (CET) Date: Thu, 20 Feb 2003 12:52:55 +0100 (CET) From: Maciej Soltysiak To: santosh kumar gowda Cc: netdev@oss.sgi.com Subject: Re: connecting ipv6 nodes throu' ipv4 network .... 
In-Reply-To: <20030220104719.835.qmail@webmail32.rediffmail.com> Message-ID: References: <20030220104719.835.qmail@webmail32.rediffmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1751 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: solt@dns.toxicfilms.tv Precedence: bulk X-list: netdev > IPv6 addr of machine-1 is 3ff3:1234::1 > IPv6 addr of machine-2 is 3ff3:1234::2 > My LAN is based on IPv4. > To make these IPv6-hosts to communicate on the LAN, what > configuration do i need to make ??? Well, as in ipv4 the hosts need an address and routing: machine-1: # ip addr add 3ff3:1234::1 dev eth0 # ip route add 3ffe:1234/mask dev eth0 machine-2: # ip addr add 3ff3:1234::1 dev eth0 # ip route add 3ff3:1234/mask dev eth0 If you have a route to the ip6 world, then add to them: # ip route add 2001::/3 via (your:ip6:router:address) dev eth0 > I tried connecting only these two machines using a cross cable and > sending IPv6 packets. But it failed. Is it correct to do so ??? Should work, maybe the cables are bad? Regards, Maciej From ipv6_san@rediffmail.com Thu Feb 20 04:07:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 04:07:22 -0800 (PST) Received: from rediffmail.com (webmail28.rediffmail.com [203.199.83.38] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KC7C3v002146 for ; Thu, 20 Feb 2003 04:07:13 -0800 Received: (qmail 27111 invoked by uid 510); 20 Feb 2003 12:15:46 -0000 Date: 20 Feb 2003 12:15:46 -0000 Message-ID: <20030220121546.27110.qmail@webmail28.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 20 feb 2003 12:15:46 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "Maciej Soltysiak" Cc: netdev@oss.sgi.com Subject: Re: Re: connecting ipv6 nodes throu' ipv4 network .... 
Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1752 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev when i connect my two IPv6 hosts using a cross cable, i'm unable to ping and the ethernet link seems to be down. Also, one machine has bridged-ethernet. Is it correct to do so ?? pls help. -san ----------------------------- On Thu, 20 Feb 2003 Maciej Soltysiak wrote : > > IPv6 addr of machine-1 is 3ff3:1234::1 > > IPv6 addr of machine-2 is 3ff3:1234::2 > > My LAN is based on IPv4. > > To make these IPv6-hosts to communicate on the LAN, what > > configuration do i need to make ??? > >Well, as in ipv4 the hosts need an address and routing: >machine-1: ># ip addr add 3ff3:1234::1 dev eth0 ># ip route add 3ffe:1234/mask dev eth0 > >machine-2: ># ip addr add 3ff3:1234::1 dev eth0 ># ip route add 3ff3:1234/mask dev eth0 > >If you have a route to the ip6 world, then add to them: ># ip route add 2001::/3 via (your:ip6:router:address) dev eth0 > > > I tried connecting only these two machines using a cross cable >and > > sending IPv6 packets. But it failed. Is it correct to do so >??? >Should work, maybe the cables are bad? 
> >Regards, >Maciej > From ipv6_san@rediffmail.com Thu Feb 20 04:11:47 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 04:11:53 -0800 (PST) Received: from rediffmail.com (webmail28.rediffmail.com [203.199.83.38] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KCBg3v002605 for ; Thu, 20 Feb 2003 04:11:45 -0800 Received: (qmail 22269 invoked by uid 510); 20 Feb 2003 12:13:35 -0000 Date: 20 Feb 2003 12:13:35 -0000 Message-ID: <20030220121335.22268.qmail@webmail28.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 20 feb 2003 12:13:35 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "YOSHIFUJI Hideaki / $B5HF#1QL@\(B" Cc: netdev@oss.sgi.com Subject: Re: Re: connecting ipv6 nodes throu' ipv4 network .... Content-type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h1KCBg3v002605 X-archive-position: 1753 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev as iam in a IPv4 LAN, do i need to setup a tunnel for these IPv6 hosts to communicate ?? any configuration required ?? -San ------------------------------ On Thu, 20 Feb 2003 YOSHIFUJI Hideaki / $B5HF#1QL@(B wrote : >In article <20030220114636.21895.qmail@webmail29.rediffmail.com> (at 20 Feb 2003 11:46:36 -0000), "santosh kumar gowda" says: > > > 8: epmac1: mtu 1500 qdisc pfifo_fast qlen 100 > > inet6 3ffe:1234::2/128 scope global >: > > inet6 3ffe:1234::1/128 scope global > >use /64 instead. 
> >--yoshfuji > > From solt@dns.toxicfilms.tv Thu Feb 20 04:20:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 04:20:18 -0800 (PST) Received: from dns.toxicfilms.tv (dns.toxicfilms.tv [150.254.37.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KCKD3v003065 for ; Thu, 20 Feb 2003 04:20:15 -0800 Received: by dns.toxicfilms.tv (Postfix, from userid 1000) id E2A101CD18; Thu, 20 Feb 2003 13:28:59 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by dns.toxicfilms.tv (Postfix) with ESMTP id C08208024B; Thu, 20 Feb 2003 13:28:59 +0100 (CET) Date: Thu, 20 Feb 2003 13:28:59 +0100 (CET) From: Maciej Soltysiak To: santosh kumar gowda Cc: =?iso-8859-2?Q?YOSHIFUJI_Hideaki_=2F_=1B$B5HF#1QL=40=1B=28B?= , netdev@oss.sgi.com Subject: Re: Re: connecting ipv6 nodes throu' ipv4 network .... In-Reply-To: <20030220121335.22268.qmail@webmail28.rediffmail.com> Message-ID: References: <20030220121335.22268.qmail@webmail28.rediffmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1754 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: solt@dns.toxicfilms.tv Precedence: bulk X-list: netdev > as iam in a IPv4 LAN, do i need to setup a tunnel for these IPv6 hosts > to communicate ?? No. > any configuration required ?? 
Just the basic: address and route to the network > -San Maciej From hadi@cyberus.ca Thu Feb 20 04:30:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 04:30:59 -0800 (PST) Received: from mx03.cyberus.ca (mx03.cyberus.ca [216.191.240.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KCUq3v003526 for ; Thu, 20 Feb 2003 04:30:53 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx03.cyberus.ca with esmtp (Exim 4.10) id 18lpzH-000HY2-00; Thu, 20 Feb 2003 07:39:39 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1KCdKYO029602; Thu, 20 Feb 2003 07:39:20 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1KCdJtq029599; Thu, 20 Feb 2003 07:39:19 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Thu, 20 Feb 2003 07:39:19 -0500 (EST) From: jamal To: Simon Kirby cc: netdev@oss.sgi.com Subject: Re: Longstanding networking / SMP issue? (duplextest) In-Reply-To: <20030220020527.GA12748@netnation.com> Message-ID: <20030220073513.E28230@shell.cyberus.ca> References: <20030219203342.P28230@shell.cyberus.ca> <20030220020527.GA12748@netnation.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1755 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev I got you. I think its a clever idea. You have better chances if you make your packets larger. Netdev is the network developers mailing list at netdev@oss.sgi.com Linux-net is network users mailing list; cheers, jamal On Wed, 19 Feb 2003, Simon Kirby wrote: > On Wed, Feb 19, 2003 at 08:40:47PM -0500, jamal wrote: > > > Hi, could you please either posting or ccing netdev on network related > > issues? 
There are a lot of people who could help you but are not > > subscribed to lk. > > Is netdev separate from linux-net? If so, where is it? :) > > > I dont have an answer for you. I think testing a different card like > > Dave says would be a good start. I am curious about the theory of > > operation of your program; by duplex mismatch i take it you mean one end > > has one speed setting but the other has a conflicting one? > > If thats so i am not sure i understand why sending a burst of packets > > and waiting for responses catches this problem. Are you saying > > you could successfully send but fail to receive? I dont see the connection > > and i am curious. > > Here's the story... > > Once upon a time, Donald wrote (or helped write, I don't know) the eepro100 > driver. Because Intel had so many variants of the eepro100, it was > difficult to implement link autonegotiation because different cards had > the link state detection wired to different pins. Even today, the eepro100 > driver doesn't seem to autonegotiate with any of the switches I've tried > it with. (10/100 autonegotiates fine, but the duplex does not.) > > To get around this problem, we had to force our switch ports to 100/full > and force the eepro100 driver to 100/full also. This works well, but > because we have hundreds of boxes running different OSs and we tend to > move things around fairly often, it's easy to forget to set up the port > properly. When this happens, everything appears to work, but depending > on timing, some packets are dropped due to one side believing there is a > collision while the other side sees no problem. (This assumes you understand > how Ethernet works -- if not, let me know!) > > The timing required to hit the packet loss is actually quite important.
> Web site visitors can often download at fairly high speeds over the > Internet (600 kB/sec, for example) from a machine with a duplex mismatch, > but internally (on the same LAN), transfer rates can be as low as 20 > kB/sec, although usually about 60 kB/sec, depending on the number of > switches being traversed, luck, etc. > > The duplextest program attempts to set up a timing situation that > exploits the half-collision scenario by sending multiple echo packets at > once, so that the first will be on the way back while the latter are > still being sent. "ping -f" doesn't work for this because it always uses > a timer and only sends one packet. > > I hope this explains things... > > Simon- > > [ Simon Kirby ][ Network Operations ] > [ sim@netnation.com ][ NetNation Communications ] > [ Opinions expressed are not necessarily those of my employer. ] > From hadi@cyberus.ca Thu Feb 20 04:35:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 04:35:32 -0800 (PST) Received: from mx03.cyberus.ca (mx03.cyberus.ca [216.191.240.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KCZT3v008411 for ; Thu, 20 Feb 2003 04:35:30 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx03.cyberus.ca with esmtp (Exim 4.10) id 18lq3k-000I0X-00; Thu, 20 Feb 2003 07:44:16 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1KChvYO029611; Thu, 20 Feb 2003 07:43:57 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1KChvF0029608; Thu, 20 Feb 2003 07:43:57 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Thu, 20 Feb 2003 07:43:57 -0500 (EST) From: jamal To: Simon Kirby cc: netdev@oss.sgi.com Subject: Re: Longstanding networking / SMP issue? 
(duplextest) In-Reply-To: <20030220073513.E28230@shell.cyberus.ca> Message-ID: <20030220074148.U28230@shell.cyberus.ca> References: <20030219203342.P28230@shell.cyberus.ca> <20030220020527.GA12748@netnation.com> <20030220073513.E28230@shell.cyberus.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1756 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Pls ignore this. Wasnt meant to go to netdev - this is what happens when i get too lazy (i invoked it from pine so i could cunpaste it and then forgot to delete it ;->). cheers, jamal On Thu, 20 Feb 2003, jamal wrote: > > I got you. I think its a clever idea. You have better chances if you > make your packets larger. > Netdev is the network developers mailing list at netdev@oss.sgi.com > Linux-net is network users mailing list; > > cheers, > jamal > > On Wed, 19 Feb 2003, Simon Kirby wrote: > > > On Wed, Feb 19, 2003 at 08:40:47PM -0500, jamal wrote: > > > > > Hi, could you please either posting or ccing netdev on network related > > > issues? There are a lot of people who could help you but are not > > > subscribed to lk. > > > > Is netdev separate from linux-net? If so, where is it? :) > > > > > I dont have an answer for you. I think testing a different card like > > > Dave says would be a good start. I am curious about the theory of > > > operation of your program; by duplex mismatch i take it you mean one end > > > has one speed setting but the other has a conflicting one? > > > If thats so i am not sure i understand why sending a burst of packets > > > and waiting for responses catches this problem. Are you saying > > > you could successfully send but fail to receive? I dont see the connection > > > and i am curious. > > > > Here's the story... > > > > Once upon a time, Donald wrote (or helped write, I don't know) the eepro100 > > driver. 
Because Intel had so many variants of the eepro100, it was > > difficult to implement link autonegotiation because different cards had > > the link state detection wired to different pins. Even today, the eepro100 > > driver doesn't seem to autonegotiate with any of the switches I've tried > > it with. (10/100 autonegotiates fine, but the duplex does not.) > > > > To get around this problem, we had to force our switch ports to 100/full > > and force the eepro100 driver to 100/full also. This works well, but > > because we have hundreds of boxes running different OSs and we tend to > > move things around fairly often, it's easy to forget to set up the port > > properly. When this happens, everything appears to work, but depending > > on timing, some packets are dropped due to one side believing there is a > > collision while the other side sees no problem. (This assumes you understand > > how Ethernet works -- if not, let me know!) > > > > The timing required to hit the packet loss is actually quite important. > > Web site visitors can often download at fairly high speeds over the > > Internet (600 kB/sec, for example) from a machine with a duplex mismatch, > > but internally (on the same LAN), transfer rates can be as low as 20 > > kB/sec, although usually about 60 kB/sec, depending on the number of > > switches being traversed, luck, etc. > > > > The duplextest program attempts to set up a timing situation that > > exploits the half-collision scenario by sending multiple echo packets at > > once, so that the first will be on the way back while the latter are > > still being sent. "ping -f" doesn't work for this because it always uses > > a timer and only sends one packet. > > > > I hope this explains things... > > > > Simon- > > > > [ Simon Kirby ][ Network Operations ] > > [ sim@netnation.com ][ NetNation Communications ] > > [ Opinions expressed are not necessarily those of my employer. 
] > > > > > From bwa@us.ibm.com Thu Feb 20 12:44:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 12:44:18 -0800 (PST) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KKi93v018679 for ; Thu, 20 Feb 2003 12:44:10 -0800 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e4.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1KKqZx7084452; Thu, 20 Feb 2003 15:52:35 -0500 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1KKqVDK055844; Thu, 20 Feb 2003 15:52:32 -0500 Message-ID: <3E554063.3010008@us.ibm.com> Date: Thu, 20 Feb 2003 12:53:55 -0800 From: Bruce Allan User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jon Grimm CC: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: [PATCH] subset of RFC2553 References: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> <3E54128C.327D7759@us.ibm.com> <3E541FEA.C34BEEB7@us.ibm.com> <3E542F52.50001@us.ibm.com> In-Reply-To: <3E542F52.50001@us.ibm.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1757 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwa@us.ibm.com Precedence: bulk X-list: netdev After re-reading this thread, I think what Dave didn't like about the original definition of sockaddr_storage was the unnecessary use of _more_than_1_ pad field and the align field. 
So, the simplest definition that Dave might approve would be: #define _SS_MAXSIZE 128 /* Implementation specific max size */ struct sockaddr_storage { sa_family_t ss_family; char __data[_SS_MAXSIZE - sizeof(sa_family_t)]; } __attribute__ ((aligned(sizeof(struct sockaddr *)))); The use of the 'sizeof(struct sockaddr *)' for specifying the required alignment is just to illustrate the second criterion for this structure as documented in the RFC, i.e. "It is aligned at an appropriate boundary so protocol specific socket address data structure pointers can be cast to it and access their fields without alignment problems...". I'll resubmit a patch tomorrow with the above definition if there are no objections. Bruce Allan Jon Grimm wrote: > Bruce, > I removed Dave from the cc list until we get something that > works for us (or anyone else that wants to chime in). > My second proposal doesn't look quite right when I reread it. > See below. > > Thanks, > Jon > > Jon Grimm wrote: > >> Or if you don't care about the alignment of the __data field at all: >> >> #define _SS_MAXSIZE 128 >> #if ULONG_MAX > 0xffffffff >> #define _ALIGNSIZE ((sizeof(__u64))) >> #else >> #define _ALIGNSIZE ((sizeof(__u32))) >> #endif >> struct sockaddr_storage { >> sa_family_t ss_family; >> char __data[_SS_MAXSIZE-sizeof(sa_family_t)*2 + >> _ALIGNSIZE]; >> > > Should be > char __data[__SS_MAXSIZE-ALIGNSIZE]; > >> } __attribute ((aligned(_ALIGNSIZE))); >> >> >> jon > From jgrimm2@us.ibm.com Thu Feb 20 13:17:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 20 Feb 2003 13:17:43 -0800 (PST) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1KLHZ3v020461 for ; Thu, 20 Feb 2003 13:17:36 -0800 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e3.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1KLQ55X017076; Thu, 20 Feb 2003 16:26:05 -0500 Received: from us.ibm.com ([9.53.216.106]) by
northrelay01.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1KLPs94061438; Thu, 20 Feb 2003 16:25:55 -0500 Message-ID: <3E554771.5020104@us.ibm.com> Date: Thu, 20 Feb 2003 15:24:01 -0600 From: Jon Grimm Organization: IBM User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Bruce Allan CC: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [Lksctp-developers] Re: [PATCH] subset of RFC2553 References: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> <3E54128C.327D7759@us.ibm.com> <3E541FEA.C34BEEB7@us.ibm.com> <3E542F52.50001@us.ibm.com> <3E554063.3010008@us.ibm.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1758 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgrimm2@us.ibm.com Precedence: bulk X-list: netdev Bruce Allan wrote: > > #define _SS_MAXSIZE 128 /* Implementation specific max size */ > struct sockaddr_storage { > sa_family_t ss_family; > char __data[_SS_MAXSIZE - sizeof(sa_family_t)]; > } __attribute__ ((aligned(sizeof(struct sockaddr *)))); > This works for me. I really just needed the struct to come out to 128 bytes. > The use of the 'sizeof(struct sockaddr *)' for specifying the required > alignment is just to illustrate the second criterion for this structure > as documented in the RFC, i.e. "It is aligned at an appropriate boundary > so protocol specific socket address data structure pointers can be cast > to it and access their fields without alignment problems...". > Yes. I agree. > I'll resubmit a patch tomorrow with the above definition if there are no > objections. > OK. I compiled this on 4 and 8 byte alignments.
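The two properties discussed in this thread (a total size of 128 bytes, and an alignment suitable for casting protocol-specific socket address pointers) can be checked at compile time. What follows is only a sketch, assuming a userspace build with GCC or Clang in C11 mode or later. The demo_* names and the unsigned short stand-in for sa_family_t are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's sa_family_t (assumption). */
typedef unsigned short demo_sa_family_t;

/* Only a pointer to this type is used, so it may stay incomplete. */
struct demo_sockaddr;

#define DEMO_SS_MAXSIZE   128
#define DEMO_SS_ALIGNSIZE (__alignof__(struct demo_sockaddr *))

/* Same shape as the proposed definition: family field plus padding. */
struct demo_sockaddr_storage {
    demo_sa_family_t ss_family;
    char data[DEMO_SS_MAXSIZE - sizeof(demo_sa_family_t)];
} __attribute__((aligned(DEMO_SS_ALIGNSIZE)));

/* Fail the build if the size or alignment is not what was intended. */
_Static_assert(sizeof(struct demo_sockaddr_storage) == DEMO_SS_MAXSIZE,
               "storage struct must be exactly DEMO_SS_MAXSIZE bytes");
_Static_assert(__alignof__(struct demo_sockaddr_storage) == DEMO_SS_ALIGNSIZE,
               "storage struct must be pointer-aligned");

/* Runtime accessors, handy for printing the values on a given target. */
size_t demo_ss_size(void)
{
    return sizeof(struct demo_sockaddr_storage);
}

size_t demo_ss_align(void)
{
    return __alignof__(struct demo_sockaddr_storage);
}
```

Because 128 is a multiple of both 4 and 8, the aligned attribute adds no tail padding on targets with either pointer alignment, so the size check should hold on both.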
jon From ipv6_san@rediffmail.com Fri Feb 21 03:30:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 03:30:46 -0800 (PST) Received: from rediffmail.com (webmail17.rediffmail.com [203.199.83.27] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LBUP3v006776 for ; Fri, 21 Feb 2003 03:30:27 -0800 Received: (qmail 18122 invoked by uid 510); 21 Feb 2003 11:38:54 -0000 Date: 21 Feb 2003 11:38:54 -0000 Message-ID: <20030221113854.18121.qmail@webmail17.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 21 feb 2003 11:38:54 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: netdev@oss.sgi.com Cc: linux-mips@linux-mips.org Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1759 Subject: (no subject) X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Well, i have a Linux machine i686 and an IAD based on MIPS 32-bit arch, both enabled with IPv6. Linux with 2.4.18-14 based on i686 configured as... # ip -6 addr add 3ff3:1234::1/64 dev eth0 # ip -6 route add 3ffe:1234/64 dev eth0 IAD with 2.4.5-pre1 kernel based on MIPS 32-bit core configured as... # ip -6 addr add 3ff3:1234::2/64 dev epmac1 # ip -6 route add 3ff3:1234/64dev epmac1 -------------------------------------------------- My IAD has a Flash, where the Linux kernel and the filesystem images are present. Flash size = 16MB Filesystem is jffs2 Generated partitions are... yamon partition size:2048 Kb kernel partition size: 1024 Kb rw image size: 10624 Kb env partition size:128 Kb Total: 13824 Kb ------------------------------ These two are connected in a IPv4 based LAN. When i try to ping6 from Linux machine to the IAD, my IAD hangs and generates a kernel OOps message. Below is the snap shot of the message... 
Following message is produced at the IAD terminal..... # Unable to handle kernel paging request at virtual address 00000000, epc == 802 4ce74, ra == 802592a8 Oops in fault.c:do_page_fault, line 172: $0 : 00000000 1000fc00 8024ce70 8032bbbc $4 : 00000000 83cbf120 00000000 83cbf17c $8 : 00000001 0000000f 8030e000 00000003 $12: 00000416 83f69e18 00000000 8030914b $16: fffffffd 00000000 00000018 c0026d20 $20: 81120dc0 83cbf17c 83cbf17c 00000000 $24: 00000001 2ac1d440 $28: 80106000 80107d78 83cbf120 802592a8 epc : 8024ce74 Status: 1000fc03 Cause : 00800008 Process swapper (pid: 0, stackpage=80106000) Stack: 00000008 00000000 83a00000 00000000 fd010018 83a11860 8031fd50 00000000 00000000 8024ca40 00000000 0000002c 8e8e1dac 00000000 fffffffd 83cbf120 00000000 83cbf17c c0026d20 00000000 00000000 00000000 800e1d38 80259b0c 83a53346 801efd1c 83a53346 839e92a0 00000000 83a118e0 83a53346 00000000 8022a1ec 80229e90 00000001 83cbf120 0000000e 8008a0e0 04000000 00000000 801f9384 ... Call Trace: [<8024ca40>] [] [<80259b0c>] [<801efd1c>] [<8022a1ec>] [<8 0229e90>] [<801f9384>] [<8024c97c>] [<8011f6b0>] [<8011f214>] [<801f5674>] [<801 1af14>] [<8011f73c>] [<8011ad88>] [<8011aacc>] [<8011aacc>] [<801117cc>] [<80111 7cc>] [<8010a478>] [<8010dee8>] [<80107fe0>] [<80125700>] [<801117cc>] [<8010600 0>] [<80107f60>] [<8010870c>] [<801086f0>] [<80108a78>] [<80115470>] [<802bfffc> ] [<802b7d30>] [<802c0b48>] [<801005e8>] Code: 03e00008 27bd0030 00803021 <8cc50000> 30a400e0 10800003 240300e0 1483 0034 24020001 Suggestions/Tips are welcome. 
-San ------------------------------------- From davidsen@tmr.com Fri Feb 21 06:51:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 06:51:14 -0800 (PST) Received: from gatekeeper.tmr.com (tmr-02.dsl.thebiz.net [216.238.38.204]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LEp63v011139 for ; Fri, 21 Feb 2003 06:51:08 -0800 Received: from localhost (davidsen@localhost) by gatekeeper.tmr.com (8.9.0/8.9.0) with SMTP id JAA21517; Fri, 21 Feb 2003 09:56:22 -0500 Date: Fri, 21 Feb 2003 09:56:21 -0500 (EST) From: Bill Davidsen To: Jeff Garzik cc: lkml , netdev@oss.sgi.com Subject: Re: netdevices.txt update In-Reply-To: <3E53093F.5050502@pobox.com> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY=------------040403030109020202050704 Content-ID: X-archive-position: 1760 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidsen@tmr.com Precedence: bulk X-list: netdev This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --------------040403030109020202050704 Content-Type: TEXT/PLAIN; CHARSET=us-ascii; FORMAT=flowed Content-ID: On Tue, 18 Feb 2003, Jeff Garzik wrote: > Just made a minor update to Documentation/networking/netdevices.txt, and > thought I would take the opportunity to pass it around once again. > > Even though this doc has existed for quite a while now, I still come > across code that loves to violate these locking rules in various ways. > > Comments and additions welcome I wish other kernel interfaces were as well documented, or documented at all other than the source code using them. Thank you so much! -- bill davidsen CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. 
--------------040403030109020202050704-- From bunk@fs.tum.de Fri Feb 21 07:49:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 07:49:14 -0800 (PST) Received: from hermes.fachschaften.tu-muenchen.de (hermes.fachschaften.tu-muenchen.de [129.187.202.12]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LFn43v015146 for ; Fri, 21 Feb 2003 07:49:05 -0800 Received: (qmail 25478 invoked from network); 21 Feb 2003 15:57:52 -0000 Received: from mimas.fachschaften.tu-muenchen.de (129.187.202.58) by hermes.fachschaften.tu-muenchen.de with QMQP; 21 Feb 2003 15:57:52 -0000 Date: Fri, 21 Feb 2003 16:57:49 +0100 From: Adrian Bunk To: Linus Torvalds Cc: netdev@oss.sgi.com, trivial@rustcorp.com.au Subject: [2.5 patch] remove an unneeded #if from net/ipv6/af_inet6.c (fwd) Message-ID: <20030221155749.GR531@fs.tum.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-archive-position: 1761 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@fs.tum.de Precedence: bulk X-list: netdev Hi Linus, the trivial patch forwarded below still applies against and compiles in 2.5.62. Please apply Adrian ----- Forwarded message from Adrian Bunk ----- Date: Sat, 18 Jan 2003 18:19:22 +0100 From: Adrian Bunk To: Pedro Roque Cc: netdev@oss.sgi.com Subject: [2.5 patch] remove an unneeded #if from net/ipv6/af_inet6.c The patch below removes an unneeded #if from net/ipv6/af_inet6.c: - kernel 2.0 is too ancient to check for - the MODULE_* macros have empty definitions #if !MODULE I've tested the compilation with 2.5.59. 
Please apply Adrian --- linux-2.5.59-full/net/ipv6/af_inet6.c.old 2003-01-18 18:11:08.000000000 +0100 +++ linux-2.5.59-full/net/ipv6/af_inet6.c 2003-01-18 18:11:38.000000000 +0100 @@ -67,11 +67,9 @@ module for allowing unload */ #endif -#if defined(MODULE) && LINUX_VERSION_CODE > 0x20115 MODULE_AUTHOR("Cast of dozens"); MODULE_DESCRIPTION("IPv6 protocol stack for Linux"); MODULE_PARM(unloadable, "i"); -#endif /* IPv6 procfs goodies... */ ----- End forwarded message ----- From bwa@us.ibm.com Fri Feb 21 08:57:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 08:57:37 -0800 (PST) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LGvQ3v017948 for ; Fri, 21 Feb 2003 08:57:27 -0800 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e2.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1LH6Dl3051816; Fri, 21 Feb 2003 12:06:13 -0500 Received: from w-bwa1.beaverton.ibm.com (w-bwa1.beaverton.ibm.com [9.47.18.12]) by northrelay01.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1LH68QC108822; Fri, 21 Feb 2003 12:06:08 -0500 Subject: Re: [PATCH] subset of RFC2553 From: Bruce Allan To: "David S. 
Miller" Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com, bwa@us.ibm.com In-Reply-To: <20030219.162129.11584427.davem@redhat.com> References: <1045621941.1253.21.camel@w-bwa1.beaverton.ibm.com> <3E54128C.327D7759@us.ibm.com> <20030219.162129.11584427.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 21 Feb 2003 09:06:08 -0800 Message-Id: <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> Mime-Version: 1.0 X-archive-position: 1762 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwa@us.ibm.com Precedence: bulk X-list: netdev Below is the rework of the original patch sent out last week with a version of sockaddr_storage that I hope is acceptable. It applies against 2.5.59. Thanks again, Bruce. diff -Naur linux-2.5.59/include/linux/in6.h linux-2.5.59-RFC2553/include/linux/in6.h --- linux-2.5.59/include/linux/in6.h 2003-02-12 14:05:59.000000000 -0800 +++ linux-2.5.59-RFC2553/include/linux/in6.h 2003-02-12 10:09:23.000000000 -0800 @@ -40,6 +40,15 @@ #define s6_addr32 in6_u.u6_addr32 }; +/* IPv6 Wildcard Address (::) and Loopback Address (::1) defined in RFC2553 + * NOTE: Be aware the IN6ADDR_* constants and in6addr_* externals are defined + * in network byte order, not in host byte order as are the IPv4 equivalents + */ +extern const struct in6_addr in6addr_any; +#define IN6ADDR_ANY_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 } } } +extern const struct in6_addr in6addr_loopback; +#define IN6ADDR_LOOPBACK_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 } } } + struct sockaddr_in6 { unsigned short int sin6_family; /* AF_INET6 */ __u16 sin6_port; /* Transport layer port # */ diff -Naur linux-2.5.59/include/linux/socket.h linux-2.5.59-RFC2553/include/linux/socket.h --- linux-2.5.59/include/linux/socket.h 2003-02-12 14:05:59.000000000 -0800 +++ linux-2.5.59-RFC2553/include/linux/socket.h 
2003-02-20 15:08:44.000000000 -0800 @@ -25,6 +25,21 @@ }; /* + * Desired design of maximum size and alignment (see RFC2553) + */ +#define _SS_MAXSIZE 128 /* Implementation specific max size */ +#define _SS_ALIGNSIZE (__alignof__ (struct sockaddr *)) + /* Implementation specific desired alignment */ + +struct sockaddr_storage { + sa_family_t ss_family; /* address family */ + /* Following field(s) are implementation specific */ + char __data[_SS_MAXSIZE - sizeof(sa_family_t)]; + /* space to achieve desired size, */ + /* _SS_MAXSIZE value minus size of ss_family */ +} __attribute__ ((aligned(_SS_ALIGNSIZE))); /* force desired alignment */ + +/* * As we do 4.4BSD message passing we use a 4.4BSD message passing * system, not 4.3. Thus msg_accrights(len) are now missing. They * belong in an obscure libc emulation or the bin. diff -Naur linux-2.5.59/include/net/sctp/structs.h linux-2.5.59-RFC2553/include/net/sctp/structs.h --- linux-2.5.59/include/net/sctp/structs.h 2003-02-12 14:05:59.000000000 -0800 +++ linux-2.5.59-RFC2553/include/net/sctp/structs.h 2003-02-12 08:35:07.000000000 -0800 @@ -61,38 +61,6 @@ #include /* We need tq_struct. */ #include /* We need sctp* header structs. */ -/* - * This is (almost) a direct quote from RFC 2553. - */ - -/* - * Desired design of maximum size and alignment - */ -#define _SS_MAXSIZE 128 /* Implementation specific max size */ -#define _SS_ALIGNSIZE (sizeof (__s64)) - /* Implementation specific desired alignment */ -/* - * Definitions used for sockaddr_storage structure paddings design. 
- */ -#define _SS_PAD1SIZE (_SS_ALIGNSIZE - sizeof (sa_family_t)) -#define _SS_PAD2SIZE (_SS_MAXSIZE - (sizeof (sa_family_t)+ \ - _SS_PAD1SIZE + _SS_ALIGNSIZE)) - -struct sockaddr_storage { - sa_family_t __ss_family; /* address family */ - /* Following fields are implementation specific */ - char __ss_pad1[_SS_PAD1SIZE]; - /* 6 byte pad, to make implementation */ - /* specific pad up to alignment field that */ - /* follows explicit in the data structure */ - __s64 __ss_align; /* field to force desired structure */ - /* storage alignment */ - char __ss_pad2[_SS_PAD2SIZE]; - /* 112 byte pad to achieve desired size, */ - /* _SS_MAXSIZE value minus size of ss_family */ - /* __ss_pad1, __ss_align fields is 112 */ -}; - /* A convenience structure for handling sockaddr structures. * We should wean ourselves off this. */ diff -Naur linux-2.5.59/net/ipv6/addrconf.c linux-2.5.59-RFC2553/net/ipv6/addrconf.c --- linux-2.5.59/net/ipv6/addrconf.c 2003-02-12 14:05:59.000000000 -0800 +++ linux-2.5.59-RFC2553/net/ipv6/addrconf.c 2003-02-12 13:55:03.000000000 -0800 @@ -136,6 +136,10 @@ MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ }; +/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ +const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; +const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; + int ipv6_addr_type(struct in6_addr *addr) { u32 st; -- Bruce Allan Linux Technology Center IBM Corporation, Beaverton OR From macro@ds2.pg.gda.pl Fri Feb 21 10:32:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 10:32:54 -0800 (PST) Received: from delta.ds2.pg.gda.pl (macro@delta.ds2.pg.gda.pl [213.192.72.1]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LIWh3v020866 for ; Fri, 21 Feb 2003 10:32:45 -0800 Received: from localhost by delta.ds2.pg.gda.pl (8.9.3/8.9.3) with SMTP id TAA18031; Fri, 21 Feb 2003 19:37:09 +0100 (MET) Date: Fri, 21 Feb 2003 19:37:08 +0100 (MET) From: "Maciej W. 
Rozycki" To: santosh kumar gowda cc: netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: (no subject) In-Reply-To: <20030221113854.18121.qmail@webmail17.rediffmail.com> Message-ID: Organization: Technical University of Gdansk MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1763 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: macro@ds2.pg.gda.pl Precedence: bulk X-list: netdev On 21 Feb 2003, santosh kumar gowda wrote: > Following message is produced at the IAD terminal..... > > # Unable to handle kernel paging request at virtual address > 00000000, epc == 802 > 4ce74, ra == 802592a8 > Oops in fault.c:do_page_fault, line 172: [...] > Suggestions/Tips are welcome. Decode the oops first or nobody will be able to give any help. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--------------------------------------------------------------+ + e-mail: macro@ds2.pg.gda.pl, PGP key available + From greearb@candelatech.com Fri Feb 21 15:03:15 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 15:03:52 -0800 (PST) Received: from grok.yi.org (IDENT:hdoKG1+0/kLxg0IpwChCZeJoS3C0l/s5@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1LN3D3v026306 for ; Fri, 21 Feb 2003 15:03:14 -0800 Received: from candelatech.com (IDENT:Ku/MJF4PsisYzPlNjIsIiIgnsWZfBjn8@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id h1LNC6130711 for ; Fri, 21 Feb 2003 15:12:06 -0800 Message-ID: <3E56B245.9040704@candelatech.com> Date: Fri, 21 Feb 2003 15:12:05 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "'netdev@oss.sgi.com'" Subject: locked up my rtl8139C chip, connected to a 10bt hub. 
Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1764 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev I can repeatedly lock up my rtl8139C NIC. So far, it's always been the one connected to a 10bt hub. I have been running it as fast as it can go, and it generally dies in 2-4 hours.... Driver: stock RH 8.0 kernel, seems to be rtl8139too 0.9.25 Here is the output from rtldiag as found on Becker's site. There is also some damning critique of the rtl8139too...any truth to the accusations? :) I have found bouncing the link does not necessarily fix it as long as I keep trying to generate traffic. However, bouncing the link and waiting about 20 seconds to start traffic again seems to un-jam it. Let me know if I can offer some more info. It's the last chip down that seems to be the main culprit.... rtl8139-diag.c:v2.10 9/18/2002 Donald Becker (becker@scyld.com) http://www.scyld.com/diag/index.html Index #1: Found a RealTek RTL8139 adapter at 0xd800. RealTek chip registers at 0xd800 0x000: 03f31000 000065a6 80000000 00000000 000ea0ba 000ea0ba 000ea0b6 000ea0b6 0x020: 06b0e000 06b0e600 06b0ec00 06b0f200 069f0000 0d0a0000 173c172c 0000c07f 0x040: 74000680 0000f78e 09c6002b 00000000 008d1000 00000000 0088c510 00100000 0x060: 1100f00f 01e1782d 000145e1 00000000 00000004 000207c8 b0f243b9 8a36df43. Realtek station address 00:10:f3:03:a6:65, chip type 'rtl8139C'. Receiver configuration: Normal unicast and hashed multicast Rx FIFO threshold 2048 bytes, maximum burst 2048 bytes, 32KB ring Transmitter enabled with NONSTANDARD! settings, maximum burst 1024 bytes. Tx entry #0 status 000ea0ba complete, 186 bytes. Tx entry #1 status 000ea0ba complete, 186 bytes. Tx entry #2 status 000ea0b6 complete, 182 bytes. Tx entry #3 status 000ea0b6 complete, 182 bytes. Flow control: Tx disabled Rx disabled. 
The chip configuration is 0x10 0x8d, MII half-duplex mode. No interrupt sources are pending. The RTL8139 does not use a MII transceiver. It does have internal MII-compatible registers: Basic mode control register 0x1100. Basic mode status register 0x782d. Autonegotiation Advertisement 0x01e1. Link Partner Ability register 0x45e1. Autonegotiation expansion 0x0001. Disconnects 0x0000. False carrier sense counter 0x0000. NWay test register 0x0004. Receive frame error count 0x0000. MII PHY #32 transceiver registers: 1100 782d 0000 0000 01e1 45e1 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. Basic mode control register 0x1100: Auto-negotiation enabled. Basic mode status register 0x782d ... 782d. Link status: established. Capable of 100baseTx-FD 100baseTx 10baseT-FD 10baseT. Able to perform Auto-negotiation, negotiation complete. This transceiver has no vendor identification. I'm advertising 01e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT Advertising no additional info pages. IEEE 802.3 CSMA/CD protocol. Link partner capability is 45e1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT. Negotiation completed. Index #2: Found a RealTek RTL8139 adapter at 0xdc00. RealTek chip registers at 0xdc00 0x000: 03f31000 000064a6 ffffffff ffffffff 000aa03c 000aa03c 000aa03c 000aa03c 0x020: 06d50000 06d50600 06d50c00 06d51200 00420000 0d0a0000 389c388c 0000c07f 0x040: 74000680 0000f78f 09c62a83 00000000 008d1000 00000000 0088c510 00100000 0x060: 1100f00f 01e1782d 000141e1 00000000 00000004 000417c8 b0f243b9 8a36df43. Realtek station address 00:10:f3:03:a6:64, chip type 'rtl8139C'. Receiver configuration: Promiscuous Rx FIFO threshold 2048 bytes, maximum burst 2048 bytes, 32KB ring Transmitter enabled with NONSTANDARD! settings, maximum burst 1024 bytes. Tx entry #0 status 000aa03c complete, 60 bytes. Tx entry #1 status 000aa03c complete, 60 bytes. Tx entry #2 status 000aa03c complete, 60 bytes. 
Tx entry #3 status 000aa03c complete, 60 bytes. Flow control: Tx disabled Rx disabled. The chip configuration is 0x10 0x8d, MII half-duplex mode. No interrupt sources are pending. The RTL8139 does not use a MII transceiver. It does have internal MII-compatible registers: Basic mode control register 0x1100. Basic mode status register 0x782d. Autonegotiation Advertisement 0x01e1. Link Partner Ability register 0x41e1. Autonegotiation expansion 0x0001. Disconnects 0x0000. False carrier sense counter 0x0000. NWay test register 0x0004. Receive frame error count 0x0000. MII PHY #32 transceiver registers: 1100 782d 0000 0000 01e1 41e1 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. Basic mode control register 0x1100: Auto-negotiation enabled. Basic mode status register 0x782d ... 782d. Link status: established. Capable of 100baseTx-FD 100baseTx 10baseT-FD 10baseT. Able to perform Auto-negotiation, negotiation complete. This transceiver has no vendor identification. I'm advertising 01e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT Advertising no additional info pages. IEEE 802.3 CSMA/CD protocol. Link partner capability is 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT. Negotiation completed. Index #3: Found a RealTek RTL8139 adapter at 0xe000. RealTek chip registers at 0xe000 0x000: 03f31000 000063a6 ffffffff ffffffff 9008a5ea 9008a042 9008a5ea 9008a5ea 0x020: 06022000 06022600 06022c00 06023200 02980000 0d0a0000 44e844d8 0000c07f 0x040: 74000680 0000f78f 09c6771d 00000000 008d1000 00000000 0088c518 00100000 0x060: 1000f00f 01e1782d 00000000 00000000 00000005 000f77c0 b0f243b9 8a36df43. Realtek station address 00:10:f3:03:a6:63, chip type 'rtl8139C'. Receiver configuration: Promiscuous Rx FIFO threshold 2048 bytes, maximum burst 2048 bytes, 32KB ring Transmitter enabled with NONSTANDARD! settings, maximum burst 1024 bytes. Tx entry #0 status 9008a5ea complete, 1514 bytes. 
Tx carrier lost Tx entry #1 status 9008a042 complete, 66 bytes. Tx carrier lost Tx entry #2 status 9008a5ea complete, 1514 bytes. Tx carrier lost Tx entry #3 status 9008a5ea complete, 1514 bytes. Tx carrier lost Flow control: Tx disabled Rx disabled. The chip configuration is 0x10 0x8d, MII half-duplex mode. No interrupt sources are pending. The RTL8139 does not use a MII transceiver. It does have internal MII-compatible registers: Basic mode control register 0x1000. Basic mode status register 0x782d. Autonegotiation Advertisement 0x01e1. Link Partner Ability register 0x0000. Autonegotiation expansion 0x0000. Disconnects 0x0000. False carrier sense counter 0x0000. NWay test register 0x0005. Receive frame error count 0x0000. MII PHY #32 transceiver registers: 1000 782d 0000 0000 01e1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. Basic mode control register 0x1000: Auto-negotiation enabled. Basic mode status register 0x782d ... 782d. Link status: established. Capable of 100baseTx-FD 100baseTx 10baseT-FD 10baseT. Able to perform Auto-negotiation, negotiation complete. This transceiver has no vendor identification. I'm advertising 01e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT Advertising no additional info pages. IEEE 802.3 CSMA/CD protocol. Link partner capability is 0000:. Negotiation did not complete. 
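For anyone reading the dump above by hand: the advertisement and link partner values (01e1, 45e1, 41e1, 0000) are plain bitmasks, so the difference between the adapters can be checked mechanically. What follows is only a minimal sketch, not code from mii-diag or any driver. The macro and function names are made up for illustration; the bit values are the standard auto-negotiation base-page bits, and 100baseT4 is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Base-page bits shared by the MII advertisement (register 4) and
 * link partner ability (register 5) registers. The macro names are
 * hypothetical; only the bit values are standard. */
#define ANEG_10HALF   0x0020u  /* 10baseT */
#define ANEG_10FULL   0x0040u  /* 10baseT-FD */
#define ANEG_100HALF  0x0080u  /* 100baseTx */
#define ANEG_100FULL  0x0100u  /* 100baseTx-FD */
#define ANEG_PAUSE    0x0400u  /* shown as "Flow-control" in the dump */
#define ANEG_ACK      0x4000u  /* partner acknowledged our page */

/* Non-zero if the link partner advertises pause (flow control). */
static int partner_has_flow_control(uint16_t lpa)
{
    return (lpa & ANEG_PAUSE) != 0;
}

/* Best mode advertised by both ends, returned as a single ANEG_* bit,
 * or 0 if the two masks share no mode. 100baseT4 is not considered. */
static uint16_t best_common_mode(uint16_t adv, uint16_t lpa)
{
    static const uint16_t priority[] = {
        ANEG_100FULL, ANEG_100HALF, ANEG_10FULL, ANEG_10HALF
    };
    uint16_t common = adv & lpa;
    unsigned int i;

    for (i = 0; i < sizeof(priority) / sizeof(priority[0]); i++)
        if (common & priority[i])
            return priority[i];
    return 0;
}

/* Checks against the register values quoted in the dump above. */
static void check_dump_values(void)
{
    assert(partner_has_flow_control(0x45e1));   /* partner with Flow-control */
    assert(!partner_has_flow_control(0x41e1));  /* partner without it */
    assert(best_common_mode(0x01e1, 0x45e1) == ANEG_100FULL);
    assert(best_common_mode(0x01e1, 0x0000) == 0);  /* negotiation not complete */
}
```

With these masks, 45e1 and 41e1 differ only in the pause bit, and a partner value of 0000 leaves no common mode.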
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From davem@redhat.com Fri Feb 21 23:31:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 23:31:59 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1M7VM3v000891 for ; Fri, 21 Feb 2003 23:31:23 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA26554; Fri, 21 Feb 2003 23:23:58 -0800 Date: Fri, 21 Feb 2003 23:23:57 -0800 (PST) Message-Id: <20030221.232357.01023911.davem@redhat.com> To: bwa@us.ibm.com Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] subset of RFC2553 From: "David S. Miller" In-Reply-To: <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> References: <3E54128C.327D7759@us.ibm.com> <20030219.162129.11584427.davem@redhat.com> <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1765 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Bruce Allan Date: 21 Feb 2003 09:06:08 -0800 Below is the rework of the original patch sent out last week with a version of sockaddr_storage that I hope is acceptable. It applies against 2.5.59. I will apply this patch, thanks for resolving all of the issues Bruce. 
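As a rough illustration of what sockaddr_storage is for, here is a userspace sketch (not part of the patch, and not kernel code). It assumes a libc that already provides struct sockaddr_storage in <sys/socket.h>; the helper name store_addr is hypothetical. The idea is that the structure is large enough and aligned well enough to hold any supported address, with ss_family sitting where sa_family would be.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Copy an address of any supported family into protocol-independent
 * storage. Returns 0 on success, -1 if the address does not fit.
 * store_addr() is a made-up helper for this sketch. */
static int store_addr(struct sockaddr_storage *ss,
                      const struct sockaddr *sa, socklen_t len)
{
    if (len > sizeof(*ss))
        return -1;
    memset(ss, 0, sizeof(*ss));
    memcpy(ss, sa, len);
    return 0;
}

/* The storage must be able to hold both IPv4 and IPv6 addresses. */
static void check_storage_fits(void)
{
    assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in));
    assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6));
}
```

After store_addr(), the caller can look at ss_family and cast the storage to struct sockaddr_in or struct sockaddr_in6 as appropriate.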
From davem@redhat.com Fri Feb 21 23:34:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 21 Feb 2003 23:34:44 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1M7Y93v001100 for ; Fri, 21 Feb 2003 23:34:10 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA26565; Fri, 21 Feb 2003 23:26:40 -0800 Date: Fri, 21 Feb 2003 23:26:39 -0800 (PST) Message-Id: <20030221.232639.129509431.davem@redhat.com> To: bwa@us.ibm.com Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] subset of RFC2553 From: "David S. Miller" In-Reply-To: <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> References: <3E54128C.327D7759@us.ibm.com> <20030219.162129.11584427.davem@redhat.com> <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1766 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Bruce, while applying this I noticed that in6addr_{any,loopback} are not exported by modules. Please send me a small patch to add the exports if this will be needed by SCTP and friends. Thanks. 
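For reference, the userspace counterparts of these two constants can be exercised as below. This is only a sketch of how in6addr_any and in6addr_loopback are typically used, assuming a libc that declares them in <netinet/in.h>. It is not the kernel export patch requested above, and the helper names are made up.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Non-zero if addr is the IPv6 wildcard address (::). */
static int is_wildcard(const struct in6_addr *addr)
{
    return memcmp(addr, &in6addr_any, sizeof(*addr)) == 0;
}

/* Non-zero if addr is the IPv6 loopback address (::1). */
static int is_loopback(const struct in6_addr *addr)
{
    return memcmp(addr, &in6addr_loopback, sizeof(*addr)) == 0;
}

/* The initializers are byte arrays in network byte order, so the
 * final byte of ::1 is 1 regardless of host endianness. */
static void check_initializers(void)
{
    struct in6_addr lo = IN6ADDR_LOOPBACK_INIT;
    struct in6_addr any = IN6ADDR_ANY_INIT;

    assert(is_loopback(&lo) && !is_wildcard(&lo));
    assert(is_wildcard(&any) && !is_loopback(&any));
    assert(lo.s6_addr[15] == 1);
}
```

Because the constants are stored as bytes in network order, no htonl()-style conversion is needed, unlike INADDR_ANY and INADDR_LOOPBACK for IPv4.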
From pekkas@netcore.fi Sat Feb 22 01:17:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 01:18:26 -0800 (PST) Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1M9Hn3v002541 for ; Sat, 22 Feb 2003 01:17:51 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id h1M9QAT29165; Sat, 22 Feb 2003 11:26:11 +0200 Date: Sat, 22 Feb 2003 11:26:10 +0200 (EET) From: Pekka Savola To: Bruce Allan cc: "David S. Miller" , , , Subject: Re: [PATCH] subset of RFC2553 In-Reply-To: <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1767 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pekkas@netcore.fi Precedence: bulk X-list: netdev Btw, FYI, RFC2553bis has been accepted for publication for Informational RFC some time ago. I think these definitions are the same there, though. It's available at: http://www.ietf.org/internet-drafts/draft-ietf-ipngwg-rfc2553bis-10.txt (there are a few minor editorial nits which will probably be fixed prior to publication.) On 21 Feb 2003, Bruce Allan wrote: > Below is the rework of the original patch sent out last week with a > version of sockaddr_storage that I hope is acceptable. It applies > against 2.5.59. > > Thanks again, > Bruce. 
> > diff -Naur linux-2.5.59/include/linux/in6.h linux-2.5.59-RFC2553/include/linux/in6.h > --- linux-2.5.59/include/linux/in6.h 2003-02-12 14:05:59.000000000 -0800 > +++ linux-2.5.59-RFC2553/include/linux/in6.h 2003-02-12 10:09:23.000000000 -0800 > @@ -40,6 +40,15 @@ > #define s6_addr32 in6_u.u6_addr32 > }; > > +/* IPv6 Wildcard Address (::) and Loopback Address (::1) defined in RFC2553 > + * NOTE: Be aware the IN6ADDR_* constants and in6addr_* externals are defined > + * in network byte order, not in host byte order as are the IPv4 equivalents > + */ > +extern const struct in6_addr in6addr_any; > +#define IN6ADDR_ANY_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 } } } > +extern const struct in6_addr in6addr_loopback; > +#define IN6ADDR_LOOPBACK_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 } } } > + > struct sockaddr_in6 { > unsigned short int sin6_family; /* AF_INET6 */ > __u16 sin6_port; /* Transport layer port # */ > diff -Naur linux-2.5.59/include/linux/socket.h linux-2.5.59-RFC2553/include/linux/socket.h > --- linux-2.5.59/include/linux/socket.h 2003-02-12 14:05:59.000000000 -0800 > +++ linux-2.5.59-RFC2553/include/linux/socket.h 2003-02-20 15:08:44.000000000 -0800 > @@ -25,6 +25,21 @@ > }; > > /* > + * Desired design of maximum size and alignment (see RFC2553) > + */ > +#define _SS_MAXSIZE 128 /* Implementation specific max size */ > +#define _SS_ALIGNSIZE (__alignof__ (struct sockaddr *)) > + /* Implementation specific desired alignment */ > + > +struct sockaddr_storage { > + sa_family_t ss_family; /* address family */ > + /* Following field(s) are implementation specific */ > + char __data[_SS_MAXSIZE - sizeof(sa_family_t)]; > + /* space to achieve desired size, */ > + /* _SS_MAXSIZE value minus size of ss_family */ > +} __attribute__ ((aligned(_SS_ALIGNSIZE))); /* force desired alignment */ > + > +/* > * As we do 4.4BSD message passing we use a 4.4BSD message passing > * system, not 4.3. Thus msg_accrights(len) are now missing. 
They > * belong in an obscure libc emulation or the bin. > diff -Naur linux-2.5.59/include/net/sctp/structs.h linux-2.5.59-RFC2553/include/net/sctp/structs.h > --- linux-2.5.59/include/net/sctp/structs.h 2003-02-12 14:05:59.000000000 -0800 > +++ linux-2.5.59-RFC2553/include/net/sctp/structs.h 2003-02-12 08:35:07.000000000 -0800 > @@ -61,38 +61,6 @@ > #include /* We need tq_struct. */ > #include /* We need sctp* header structs. */ > > -/* > - * This is (almost) a direct quote from RFC 2553. > - */ > - > -/* > - * Desired design of maximum size and alignment > - */ > -#define _SS_MAXSIZE 128 /* Implementation specific max size */ > -#define _SS_ALIGNSIZE (sizeof (__s64)) > - /* Implementation specific desired alignment */ > -/* > - * Definitions used for sockaddr_storage structure paddings design. > - */ > -#define _SS_PAD1SIZE (_SS_ALIGNSIZE - sizeof (sa_family_t)) > -#define _SS_PAD2SIZE (_SS_MAXSIZE - (sizeof (sa_family_t)+ \ > - _SS_PAD1SIZE + _SS_ALIGNSIZE)) > - > -struct sockaddr_storage { > - sa_family_t __ss_family; /* address family */ > - /* Following fields are implementation specific */ > - char __ss_pad1[_SS_PAD1SIZE]; > - /* 6 byte pad, to make implementation */ > - /* specific pad up to alignment field that */ > - /* follows explicit in the data structure */ > - __s64 __ss_align; /* field to force desired structure */ > - /* storage alignment */ > - char __ss_pad2[_SS_PAD2SIZE]; > - /* 112 byte pad to achieve desired size, */ > - /* _SS_MAXSIZE value minus size of ss_family */ > - /* __ss_pad1, __ss_align fields is 112 */ > -}; > - > /* A convenience structure for handling sockaddr structures. > * We should wean ourselves off this. 
> */ > diff -Naur linux-2.5.59/net/ipv6/addrconf.c linux-2.5.59-RFC2553/net/ipv6/addrconf.c > --- linux-2.5.59/net/ipv6/addrconf.c 2003-02-12 14:05:59.000000000 -0800 > +++ linux-2.5.59-RFC2553/net/ipv6/addrconf.c 2003-02-12 13:55:03.000000000 -0800 > @@ -136,6 +136,10 @@ > MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ > }; > > +/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ > +const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; > +const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; > + > int ipv6_addr_type(struct in6_addr *addr) > { > u32 st; > > -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings From kazunori@miyazawa.org Sat Feb 22 03:17:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 03:17:43 -0800 (PST) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1MBH53v006291 for ; Sat, 22 Feb 2003 03:17:06 -0800 Received: from monza.miyazawa.org ([::ffff:192.168.0.3]) (IDENT: miyazawa, AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Sat, 22 Feb 2003 20:09:01 +0900 Date: Sat, 22 Feb 2003 20:26:23 +0900 From: Kazunori Miyazawa To: davem@redhat.com, kuznet@ms2.inr.ac.ru Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kunihiro@ipinfusion.com Subject: [PATCH] IPv6 IPSEC support Message-Id: <20030222202623.38d41d8a.kazunori@miyazawa.org> X-Mailer: Sylpheed version 0.8.9 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1768 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev Hello, I resubmit a patch for IPv6 IPsec. 
I moved the functions in net/ipv6/xfrm_policy and net/ipv6/xfrm_input to net/ipv4/xfrm_policy and net/ipv4/xfrm_input with ifdefs. I also unified the cache into a single one. I also moved the functions for ah and esp. These are big changes relative to the patch I sent before. I merged Ishiguro-san's patch for xfrm6_selector_match. As a result of moving the IPv6 IPsec functions to net/ipv4, IPv6 currently cannot be built as a module. This patch is against linux-2.5.62 + CS1_1002 Best regards, --Kazunori Miyazawa (Yokogawa Electric Corporation) diff -ruN -x CVS linux-2.5.62+cs1_1002/include/linux/ipv6.h linux25/include/linux/ipv6.h --- linux-2.5.62+cs1_1002/include/linux/ipv6.h 2003-02-18 20:33:56.000000000 +0900 +++ linux25/include/linux/ipv6.h 2003-02-22 02:10:46.000000000 +0900 @@ -74,6 +74,21 @@ #define rt0_type rt_hdr.type; }; +struct ipv6_auth_hdr { + __u8 nexthdr; + __u8 hdrlen; /* This one is measured in 32 bit units! */ + __u16 reserved; + __u32 spi; + __u32 seq_no; /* Sequence number */ + __u8 auth_data[4]; /* Length variable but >=4. Mind the 64 bit alignment! */ +}; + +struct ipv6_esp_hdr { + __u32 spi; + __u32 seq_no; /* Sequence number */ + __u8 enc_data[8]; /* Length variable but >=8. Mind the 64 bit alignment! 
*/ +}; + /* * IPv6 fixed header * diff -ruN -x CVS linux-2.5.62+cs1_1002/include/net/dst.h linux25/include/net/dst.h --- linux-2.5.62+cs1_1002/include/net/dst.h 2003-02-18 20:33:52.000000000 +0900 +++ linux25/include/net/dst.h 2003-02-22 02:10:46.000000000 +0900 @@ -248,6 +248,9 @@ extern int xfrm_lookup(struct dst_entry **dst_p, struct flowi *fl, struct sock *sk, int flags); extern void xfrm_init(void); +extern int xfrm6_lookup(struct dst_entry **dst_p, struct flowi *fl, + struct sock *sk, int flags); +extern void xfrm6_init(void); #endif diff -ruN -x CVS linux-2.5.62+cs1_1002/include/net/ip6_route.h linux25/include/net/ip6_route.h --- linux-2.5.62+cs1_1002/include/net/ip6_route.h 2003-02-18 20:33:52.000000000 +0900 +++ linux25/include/net/ip6_route.h 2003-02-22 02:10:46.000000000 +0900 @@ -55,6 +55,8 @@ struct in6_addr *saddr, int oif, int flags); +extern struct rt6_info *ndisc_get_dummy_rt(void); + /* * support functions for ND * diff -ruN -x CVS linux-2.5.62+cs1_1002/include/net/xfrm.h linux25/include/net/xfrm.h --- linux-2.5.62+cs1_1002/include/net/xfrm.h 2003-02-18 20:33:52.000000000 +0900 +++ linux25/include/net/xfrm.h 2003-02-22 18:13:16.000000000 +0900 @@ -12,6 +12,7 @@ #include #include +#include #define XFRM_ALIGN8(len) (((len) + 7) & ~7) @@ -282,6 +283,7 @@ struct xfrm_dst *next; struct dst_entry dst; struct rtable rt; + struct rt6_info rt6; } u; }; @@ -308,26 +310,42 @@ if (sp && atomic_dec_and_test(&sp->refcnt)) __secpath_destroy(sp); } - -extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb); +extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family); static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb) { if (sk && sk->policy[XFRM_POLICY_IN]) - return __xfrm_policy_check(sk, dir, skb); + return __xfrm_policy_check(sk, dir, skb, AF_INET); + + return !xfrm_policy_list[dir] || + (skb->dst->flags & DST_NOPOLICY) || + __xfrm_policy_check(sk, dir, skb, AF_INET); +} 
+ +static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +{ + if (sk && sk->policy[XFRM_POLICY_IN]) + return __xfrm_policy_check(sk, dir, skb, AF_INET6); return !xfrm_policy_list[dir] || (skb->dst->flags & DST_NOPOLICY) || - __xfrm_policy_check(sk, dir, skb); + __xfrm_policy_check(sk, dir, skb, AF_INET6); } -extern int __xfrm_route_forward(struct sk_buff *skb); +extern int __xfrm_route_forward(struct sk_buff *skb, unsigned short family); static inline int xfrm_route_forward(struct sk_buff *skb) { return !xfrm_policy_list[XFRM_POLICY_OUT] || (skb->dst->flags & DST_NOXFRM) || - __xfrm_route_forward(skb); + __xfrm_route_forward(skb, AF_INET); +} + +static inline int xfrm6_route_forward(struct sk_buff *skb) +{ + return !xfrm_policy_list[XFRM_POLICY_OUT] || + (skb->dst->flags & DST_NOXFRM) || + __xfrm_route_forward(skb, AF_INET6); } extern int __xfrm_sk_clone_policy(struct sock *sk); @@ -382,10 +400,14 @@ extern struct xfrm_state *xfrm_state_alloc(void); extern struct xfrm_state *xfrm_state_find(u32 daddr, u32 saddr, struct flowi *fl, struct xfrm_tmpl *tmpl, struct xfrm_policy *pol, int *err); +extern struct xfrm_state *xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr, + struct flowi *fl, struct xfrm_tmpl *tmpl, + struct xfrm_policy *pol, int *err); extern int xfrm_state_check_expire(struct xfrm_state *x); extern void xfrm_state_insert(struct xfrm_state *x); extern int xfrm_state_check_space(struct xfrm_state *x, struct sk_buff *skb); extern struct xfrm_state *xfrm_state_lookup(u32 daddr, u32 spi, u8 proto); +extern struct xfrm_state *xfrm6_state_lookup(struct in6_addr *daddr, u32 spi, u8 proto); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); extern void xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); @@ -393,22 +415,27 @@ extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); extern int xfrm_check_selectors(struct xfrm_state **x, int n, struct flowi *fl); extern int 
xfrm4_rcv(struct sk_buff *skb); +extern int xfrm6_rcv(struct sk_buff *skb); +extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); struct xfrm_policy *xfrm_policy_alloc(int gfp); extern int xfrm_policy_walk(int (*func)(struct xfrm_policy *, int, int, void*), void *); -struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl); +struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl, unsigned short family); int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl); struct xfrm_policy *xfrm_policy_delete(int dir, struct xfrm_selector *sel); struct xfrm_policy *xfrm_policy_byid(int dir, u32 id, int delete); void xfrm_policy_flush(void); void xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi); struct xfrm_state * xfrm_find_acq(u8 mode, u16 reqid, u8 proto, u32 daddr, u32 saddr, int create); +struct xfrm_state * xfrm6_find_acq(u8 mode, u16 reqid, u8 proto, struct in6_addr *daddr, + struct in6_addr *saddr, int create); extern void xfrm_policy_flush(void); extern void xfrm_policy_kill(struct xfrm_policy *); extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol); extern struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl); extern int xfrm_flush_bundles(struct xfrm_state *x); +extern int xfrm6_flush_bundles(struct xfrm_state *x); extern wait_queue_head_t km_waitq; extern void km_warn_expired(struct xfrm_state *x); @@ -425,15 +452,41 @@ extern struct xfrm_algo_desc *xfrm_aalg_get_byname(char *name); extern struct xfrm_algo_desc *xfrm_ealg_get_byname(char *name); +static __inline__ int addr_match(void *token1, void *token2, int prefixlen) +{ + __u32 *a1 = token1; + __u32 *a2 = token2; + int pdw; + int pbi; + + pdw = prefixlen >> 5; /* num of whole __u32 in prefix */ + pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */ + + if (pdw) + if 
(memcmp(a1, a2, pdw << 2)) + return 0; + + if (pbi) { + __u32 mask; + + mask = htonl((0xffffffff) << (32 - pbi)); + + if ((a1[pdw] ^ a2[pdw]) & mask) + return 0; + } + + return 1; +} + static inline int xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl) { - return !memcmp(fl->fl6_dst, sel->daddr.a6, sizeof(struct in6_addr)) && - !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) && - !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) && - (fl->proto == sel->proto || !sel->proto) && - (fl->oif == sel->ifindex || !sel->ifindex) && - !memcmp(fl->fl6_src, sel->saddr.a6, sizeof(struct in6_addr)); + return addr_match(fl->fl6_dst, &sel->daddr, sel->prefixlen_d) && + addr_match(fl->fl6_src, &sel->saddr, sel->prefixlen_s) && + !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) && + !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) && + (fl->proto == sel->proto || !sel->proto) && + (fl->oif == sel->ifindex || !sel->ifindex); } extern int xfrm6_register_type(struct xfrm_type *type); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv4/ah.c linux25/net/ipv4/ah.c --- linux-2.5.62+cs1_1002/net/ipv4/ah.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv4/ah.c 2003-02-22 14:32:40.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -7,7 +15,12 @@ #include #include -#define AH_HLEN_NOICV 12 +#include +#include + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + #include +#endif typedef void (icv_update_fn_t)(struct crypto_tfm *, struct scatterlist *, unsigned int); @@ -26,6 +39,7 @@ struct crypto_tfm *tfm; }; +#define AH_HLEN_NOICV 12 /* Clear mutable options and find final destination to substitute * into IP header for icv calculation. 
Options are already checked @@ -458,4 +472,329 @@ module_init(ah4_init); module_exit(ah4_fini); + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* XXX no ipv6 ah specific */ +#define NIP6(addr) \ + ntohs((addr).s6_addr16[0]),\ + ntohs((addr).s6_addr16[1]),\ + ntohs((addr).s6_addr16[2]),\ + ntohs((addr).s6_addr16[3]),\ + ntohs((addr).s6_addr16[4]),\ + ntohs((addr).s6_addr16[5]),\ + ntohs((addr).s6_addr16[6]),\ + ntohs((addr).s6_addr16[7]) + +int ah6_output(struct sk_buff *skb) +{ + int err; + int hdr_len = sizeof(struct ipv6hdr); + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph = NULL; + struct ip_auth_hdr *ah; + struct ah_data *ahp; + u16 nh_offset = 0; + u8 nexthdr; + + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) + return -EINVAL; + + spin_lock_bh(&x->lock); + if ((err = xfrm_state_check_expire(x)) != 0) + goto error; + if ((err = xfrm_state_check_space(x, skb)) != 0) + goto error; + + if (x->props.mode) { + iph = skb->nh.ipv6h; + skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + skb->nh.ipv6h->version = 6; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + skb->nh.ipv6h->nexthdr = IPPROTO_AH; + memcpy(&skb->nh.ipv6h->saddr, &x->props.saddr, sizeof(struct in6_addr)); + memcpy(&skb->nh.ipv6h->daddr, &x->id.daddr, sizeof(struct in6_addr)); + ah = (struct ip_auth_hdr*)(skb->nh.ipv6h+1); + ah->nexthdr = IPPROTO_IPV6; + } else { + hdr_len = skb->h.raw - skb->nh.raw; + iph = kmalloc(hdr_len, GFP_ATOMIC); + if (!iph) { + err = -ENOMEM; + goto error; + } + memcpy(iph, skb->data, hdr_len); + skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + memcpy(skb->nh.ipv6h, iph, hdr_len); + nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_OUT); + if (nexthdr == 0) + goto error; + + skb->nh.raw[nh_offset] = IPPROTO_AH; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + ah = (struct 
ip_auth_hdr*)(skb->nh.raw+hdr_len); + ah->nexthdr = nexthdr; + } + + skb->nh.ipv6h->priority = 0; + skb->nh.ipv6h->flow_lbl[0] = 0; + skb->nh.ipv6h->flow_lbl[1] = 0; + skb->nh.ipv6h->flow_lbl[2] = 0; + skb->nh.ipv6h->hop_limit = 0; + + ahp = x->data; + ah->hdrlen = (XFRM_ALIGN8(ahp->icv_trunc_len + + AH_HLEN_NOICV) >> 2) - 2; + + ah->reserved = 0; + ah->spi = x->id.spi; + ah->seq_no = htonl(++x->replay.oseq); + ahp->icv(ahp, skb, ah->auth_data); + + if (x->props.mode) { + skb->nh.ipv6h->hop_limit = iph->hop_limit; + skb->nh.ipv6h->priority = iph->priority; + skb->nh.ipv6h->flow_lbl[0] = iph->flow_lbl[0]; + skb->nh.ipv6h->flow_lbl[1] = iph->flow_lbl[1]; + skb->nh.ipv6h->flow_lbl[2] = iph->flow_lbl[2]; + } else { + memcpy(skb->nh.ipv6h, iph, hdr_len); + skb->nh.raw[nh_offset] = IPPROTO_AH; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + kfree (iph); + } + + skb->nh.raw = skb->data; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock_bh(&x->lock); + if ((skb->dst = dst_pop(dst)) == NULL) + goto error_nolock; + return NET_XMIT_BYPASS; +error: + spin_unlock_bh(&x->lock); +error_nolock: + kfree_skb(skb); + return err; +} + +int ah6_input(struct xfrm_state *x, struct sk_buff *skb) +{ + int ah_hlen; + struct ipv6hdr *iph; + struct ipv6_auth_hdr *ah; + struct ah_data *ahp; + unsigned char *tmp_hdr = NULL; + int hdr_len = skb->h.raw - skb->nh.raw; + u8 nexthdr = 0; + + if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr))) + goto out; + + ah = (struct ipv6_auth_hdr*)skb->data; + ahp = x->data; + ah_hlen = (ah->hdrlen + 2) << 2; + + if (ah_hlen != XFRM_ALIGN8(ahp->icv_full_len + AH_HLEN_NOICV) && + ah_hlen != XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV)) + goto out; + + if (!pskb_may_pull(skb, ah_hlen)) + goto out; + + /* We are going to _remove_ AH header to keep sockets happy, + * so... Later this can change. 
*/ + if (skb_cloned(skb) && + pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) + goto out; + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto out; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + ah = (struct ipv6_auth_hdr*)skb->data; + iph = skb->nh.ipv6h; + + { + u8 auth_data[ahp->icv_trunc_len]; + + memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len); + skb_push(skb, skb->data - skb->nh.raw); + ahp->icv(ahp, skb, ah->auth_data); + if (memcmp(ah->auth_data, auth_data, ahp->icv_trunc_len)) { + if (net_ratelimit()) + printk(KERN_WARNING "ipsec ah authentication error\n"); + x->stats.integrity_failed++; + goto free_out; + } + } + + nexthdr = ah->nexthdr; + skb->nh.raw = skb_pull(skb, (ah->hdrlen+2)<<2); + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + skb_pull(skb, hdr_len); + skb->h.raw = skb->data; + + + kfree(tmp_hdr); + + return nexthdr; + +free_out: + kfree(tmp_hdr); +out: + return -EINVAL; +} + +void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+offset); + struct xfrm_state *x; + + /* Ignore everything except dest-unreachable and packet-too-big */ + if (type != ICMPV6_DEST_UNREACH && + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, ah->spi, IPPROTO_AH); + if (!x) + return; + + printk(KERN_DEBUG "pmtu discovery on SA AH/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(ah->spi), NIP6(iph->daddr)); + + xfrm_state_put(x); +} + +static int ah6_init_state(struct xfrm_state *x, void *args) +{ + struct ah_data *ahp = NULL; + struct xfrm_algo_desc *aalg_desc; + + /* null auth can use a zero length key */ + if (x->aalg->alg_key_len > 512) + goto error; + + ahp = kmalloc(sizeof(*ahp), GFP_KERNEL); + if (ahp == NULL) + return -ENOMEM; + + memset(ahp, 0, sizeof(*ahp)); + + ahp->key = x->aalg->alg_key; + ahp->key_len = (x->aalg->alg_key_len+7)/8; + ahp->tfm = 
crypto_alloc_tfm(x->aalg->alg_name, 0); + if (!ahp->tfm) + goto error; + ahp->icv = ah_hmac_digest; + + /* + * Lookup the algorithm description maintained by xfrm_algo, + * verify crypto transform properties, and store information + * we need for AH processing. This lookup cannot fail here + * after a successful crypto_alloc_tfm(). + */ + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + crypto_tfm_alg_digestsize(ahp->tfm)) { + printk(KERN_INFO "AH: %s digestsize %u != %hu\n", + x->aalg->alg_name, crypto_tfm_alg_digestsize(ahp->tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + ahp->icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + ahp->work_icv = kmalloc(ahp->icv_full_len, GFP_KERNEL); + if (!ahp->work_icv) + goto error; + + x->props.header_len = XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV); + if (x->props.mode) + x->props.header_len += 20; + x->data = ahp; + + return 0; + +error: + if (ahp) { + if (ahp->work_icv) + kfree(ahp->work_icv); + if (ahp->tfm) + crypto_free_tfm(ahp->tfm); + kfree(ahp); + } + return -EINVAL; +} + +static void ah6_destroy(struct xfrm_state *x) +{ + struct ah_data *ahp = x->data; + + if (ahp->work_icv) { + kfree(ahp->work_icv); + ahp->work_icv = NULL; + } + if (ahp->tfm) { + crypto_free_tfm(ahp->tfm); + ahp->tfm = NULL; + } +} + +static struct xfrm_type ah6_type = +{ + .description = "AH6", + .proto = IPPROTO_AH, + .init_state = ah6_init_state, + .destructor = ah6_destroy, + .input = ah6_input, + .output = ah6_output +}; + +static struct inet6_protocol ah6_protocol = { + .handler = xfrm6_rcv, + .err_handler = ah6_err, +}; + +int __init ah6_init(void) +{ + SET_MODULE_OWNER(&ah6_type); + + if (xfrm6_register_type(&ah6_type) < 0) { + printk(KERN_INFO "ipv6 ah init: can't add xfrm type\n"); + return -EAGAIN; + } + + if (inet6_add_protocol(&ah6_protocol, IPPROTO_AH) < 0) { + 
printk(KERN_INFO "ipv6 ah init: can't add protocol\n"); + xfrm6_unregister_type(&ah6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit ah6_fini(void) +{ + if (inet6_del_protocol(&ah6_protocol, IPPROTO_AH) < 0) + printk(KERN_INFO "ipv6 ah close: can't remove protocol\n"); + + if (xfrm6_unregister_type(&ah6_type) < 0) + printk(KERN_INFO "ipv6 ah close: can't remove xfrm type\n"); + +} + +module_init(ah6_init); +module_exit(ah6_fini); + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ + MODULE_LICENSE("GPL"); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv4/esp.c linux25/net/ipv4/esp.c --- linux-2.5.62+cs1_1002/net/ipv4/esp.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv4/esp.c 2003-02-22 14:29:32.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -8,8 +16,13 @@ #include #include -#define MAX_SG_ONSTACK 4 +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) +#include +#include +#endif + +#define MAX_SG_ONSTACK 4 typedef void (icv_update_fn_t)(struct crypto_tfm *, struct scatterlist *, unsigned int); @@ -725,4 +738,487 @@ module_init(esp4_init); module_exit(esp4_fini); + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* XXX no ipv6 esp specific */ +#define NIP6(addr) \ + ntohs((addr).s6_addr16[0]),\ + ntohs((addr).s6_addr16[1]),\ + ntohs((addr).s6_addr16[2]),\ + ntohs((addr).s6_addr16[3]),\ + ntohs((addr).s6_addr16[4]),\ + ntohs((addr).s6_addr16[5]),\ + ntohs((addr).s6_addr16[6]),\ + ntohs((addr).s6_addr16[7]) + +static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, struct ipv6_opt_hdr **prevhdr) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); + u8 nextnexthdr; + + *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; + + while (offset + 1 < packet_len) { + + switch (*nexthdr) { + + case NEXTHDR_HOP: + case NEXTHDR_ROUTING: 
+ offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + case NEXTHDR_DEST: + nextnexthdr = + ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; + /* XXX We know the option is inner dest opt + with next next header check. */ + if (nextnexthdr != NEXTHDR_HOP && + nextnexthdr != NEXTHDR_ROUTING && + nextnexthdr != NEXTHDR_DEST) { + return offset; + } + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + default : + return offset; + } + } + + return offset; +} + +int esp6_output(struct sk_buff *skb) +{ + int err; + int hdr_len = 0; + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph = NULL, *top_iph; + struct ip_esp_hdr *esph; + struct crypto_tfm *tfm; + struct esp_data *esp; + struct sk_buff *trailer; + struct ipv6_opt_hdr *prevhdr = NULL; + int blksize; + int clen; + int alen; + int nfrags; + u8 nexthdr; + + /* First, if the skb is not checksummed, complete checksum. */ + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) + return -EINVAL; + + spin_lock_bh(&x->lock); + if ((err = xfrm_state_check_expire(x)) != 0) + goto error; + if ((err = xfrm_state_check_space(x, skb)) != 0) + goto error; + + err = -ENOMEM; + + /* Strip IP header in transport mode. Save it. 
*/ + + if (!x->props.mode) { + hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &prevhdr); + iph = kmalloc(hdr_len, GFP_ATOMIC); + if (!iph) { + err = -ENOMEM; + goto error; + } + memcpy(iph, skb->nh.raw, hdr_len); + __skb_pull(skb, hdr_len); + } + + /* Now skb is pure payload to encrypt */ + + /* Round to block size */ + clen = skb->len; + + esp = x->data; + alen = esp->auth.icv_trunc_len; + tfm = esp->conf.tfm; + blksize = crypto_tfm_alg_blocksize(tfm); + clen = (clen + 2 + blksize-1)&~(blksize-1); + if (esp->conf.padlen) + clen = (clen + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + if ((nfrags = skb_cow_data(skb, clen-skb->len+alen, &trailer)) < 0) { + if (!x->props.mode && iph) kfree(iph); + goto error; + } + + /* Fill padding... */ + do { + int i; + for (i=0; i<clen-skb->len - 2; i++) + *(u8*)(trailer->tail + i) = i+1; + } while (0); + *(u8*)(trailer->tail + clen-skb->len - 2) = (clen - skb->len)-2; + pskb_put(skb, trailer, clen - skb->len); + + if (x->props.mode) { + iph = skb->nh.ipv6h; + top_iph = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + esph = (struct ip_esp_hdr*)(top_iph+1); + *(u8*)(trailer->tail - 1) = IPPROTO_IPV6; + top_iph->version = 6; + top_iph->priority = iph->priority; + top_iph->flow_lbl[0] = iph->flow_lbl[0]; + top_iph->flow_lbl[1] = iph->flow_lbl[1]; + top_iph->flow_lbl[2] = iph->flow_lbl[2]; + top_iph->nexthdr = IPPROTO_ESP; + top_iph->payload_len = htons(skb->len + alen); + top_iph->hop_limit = iph->hop_limit; + memcpy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr, sizeof(struct in6_addr)); + memcpy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr, sizeof(struct in6_addr)); + } else { + /* XXX exthdr */ + esph = (struct ip_esp_hdr*)skb_push(skb, x->props.header_len); + top_iph = (struct ipv6hdr*)skb_push(skb, hdr_len); + memcpy(top_iph, iph, hdr_len); + kfree(iph); + top_iph->payload_len = htons(skb->len + alen - sizeof(struct ipv6hdr)); + if (prevhdr) { + prevhdr->nexthdr = IPPROTO_ESP; + } else { + top_iph->nexthdr = 
IPPROTO_ESP; + } + *(u8*)(trailer->tail - 1) = nexthdr; + } + + esph->spi = x->id.spi; + esph->seq_no = htonl(++x->replay.oseq); + + if (esp->conf.ivlen) + crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + + do { + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; + struct scatterlist *sg = sgbuf; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto error; + } + skb_to_sgvec(skb, sg, esph->enc_data+esp->conf.ivlen-skb->data, clen); + crypto_cipher_encrypt(tfm, sg, sg, clen); + if (unlikely(sg != sgbuf)) + kfree(sg); + } while (0); + + if (esp->conf.ivlen) { + memcpy(esph->enc_data, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + crypto_cipher_get_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + } + + if (esp->auth.icv_full_len) { + esp->auth.icv(esp, skb, (u8*)esph-skb->data, + 8+esp->conf.ivlen+clen, trailer->tail); + pskb_put(skb, trailer, alen); + } + + skb->nh.raw = skb->data; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock_bh(&x->lock); + if ((skb->dst = dst_pop(dst)) == NULL) + goto error_nolock; + return NET_XMIT_BYPASS; + +error: + spin_unlock_bh(&x->lock); +error_nolock: + kfree_skb(skb); + return err; +} + +int esp6_input(struct xfrm_state *x, struct sk_buff *skb) +{ + struct ipv6hdr *iph; + struct ip_esp_hdr *esph; + struct esp_data *esp = x->data; + struct sk_buff *trailer; + int blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + int alen = esp->auth.icv_trunc_len; + int elen = skb->len - 8 - esp->conf.ivlen - alen; + + int hdr_len = skb->h.raw - skb->nh.raw; + int nfrags; + u8 ret_nexthdr = 0; + unsigned char *tmp_hdr = NULL; + + if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr))) + goto out; + + if (elen <= 0 || (elen & (blksize-1))) + goto out; + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto out; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + /* If integrity check is required, do this. 
*/ + if (esp->auth.icv_full_len) { + u8 sum[esp->auth.icv_full_len]; + u8 sum1[alen]; + + esp->auth.icv(esp, skb, 0, skb->len-alen, sum); + + if (skb_copy_bits(skb, skb->len-alen, sum1, alen)) + BUG(); + + if (unlikely(memcmp(sum, sum1, alen))) { + x->stats.integrity_failed++; + goto out; + } + } + + if ((nfrags = skb_cow_data(skb, 0, &trailer)) < 0) + goto out; + + skb->ip_summed = CHECKSUM_NONE; + + esph = (struct ip_esp_hdr*)skb->data; + iph = skb->nh.ipv6h; + + /* Get ivec. This can be wrong, check against another impls. */ + if (esp->conf.ivlen) + crypto_cipher_set_iv(esp->conf.tfm, esph->enc_data, crypto_tfm_alg_ivsize(esp->conf.tfm)); + + { + u8 nexthdr[2]; + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; + struct scatterlist *sg = sgbuf; + u8 padlen; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto out; + } + skb_to_sgvec(skb, sg, 8+esp->conf.ivlen, elen); + crypto_cipher_decrypt(esp->conf.tfm, sg, sg, elen); + if (unlikely(sg != sgbuf)) + kfree(sg); + + if (skb_copy_bits(skb, skb->len-alen-2, nexthdr, 2)) + BUG(); + + padlen = nexthdr[0]; + if (padlen+2 >= elen) { + if (net_ratelimit()) { + printk(KERN_WARNING "ipsec esp packet is garbage padlen=%d, elen=%d\n", padlen+2, elen); + } + goto out; + } + /* ... check padding bits here. Silly. :-) */ + + ret_nexthdr = nexthdr[1]; + pskb_trim(skb, skb->len - alen - padlen - 2); + skb->h.raw = skb_pull(skb, 8 + esp->conf.ivlen); + skb->nh.raw += 8 + esp->conf.ivlen; + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + } + kfree(tmp_hdr); + return ret_nexthdr; + +out: + return -EINVAL; +} + +static u32 esp6_get_max_size(struct xfrm_state *x, int mtu) +{ + struct esp_data *esp = x->data; + u32 blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + + if (x->props.mode) { + mtu = (mtu + 2 + blksize-1)&~(blksize-1); + } else { + /* The worst case. 
*/ + mtu += 2 + blksize; + } + if (esp->conf.padlen) + mtu = (mtu + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + return mtu + x->props.header_len + esp->auth.icv_full_len; +} + +void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ip_esp_hdr *esph = (struct ip_esp_hdr*)(skb->data+offset); + struct xfrm_state *x; + + if (type != ICMPV6_DEST_UNREACH && + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, esph->spi, IPPROTO_ESP); + if (!x) + return; + printk(KERN_DEBUG "pmtu discovery on SA ESP/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(esph->spi), NIP6(iph->daddr)); + xfrm_state_put(x); +} + +void esp6_destroy(struct xfrm_state *x) +{ + struct esp_data *esp = x->data; + + if (esp->conf.tfm) { + crypto_free_tfm(esp->conf.tfm); + esp->conf.tfm = NULL; + } + if (esp->conf.ivec) { + kfree(esp->conf.ivec); + esp->conf.ivec = NULL; + } + if (esp->auth.tfm) { + crypto_free_tfm(esp->auth.tfm); + esp->auth.tfm = NULL; + } + if (esp->auth.work_icv) { + kfree(esp->auth.work_icv); + esp->auth.work_icv = NULL; + } +} + +int esp6_init_state(struct xfrm_state *x, void *args) +{ + struct esp_data *esp = NULL; + + if (x->aalg) { + if (x->aalg->alg_key_len == 0 || x->aalg->alg_key_len > 512) + goto error; + } + if (x->ealg == NULL || x->ealg->alg_key_len == 0) + goto error; + + esp = kmalloc(sizeof(*esp), GFP_KERNEL); + if (esp == NULL) + return -ENOMEM; + + memset(esp, 0, sizeof(*esp)); + + if (x->aalg) { + struct xfrm_algo_desc *aalg_desc; + + esp->auth.key = x->aalg->alg_key; + esp->auth.key_len = (x->aalg->alg_key_len+7)/8; + esp->auth.tfm = crypto_alloc_tfm(x->aalg->alg_name, 0); + if (esp->auth.tfm == NULL) + goto error; + esp->auth.icv = esp_hmac_digest; + + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + 
crypto_tfm_alg_digestsize(esp->auth.tfm)) { + printk(KERN_INFO "ESP: %s digestsize %u != %hu\n", + x->aalg->alg_name, + crypto_tfm_alg_digestsize(esp->auth.tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + esp->auth.icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + esp->auth.icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + esp->auth.work_icv = kmalloc(esp->auth.icv_full_len, GFP_KERNEL); + if (!esp->auth.work_icv) + goto error; + } + esp->conf.key = x->ealg->alg_key; + esp->conf.key_len = (x->ealg->alg_key_len+7)/8; + esp->conf.tfm = crypto_alloc_tfm(x->ealg->alg_name, CRYPTO_TFM_MODE_CBC); + if (esp->conf.tfm == NULL) + goto error; + esp->conf.ivlen = crypto_tfm_alg_ivsize(esp->conf.tfm); + esp->conf.padlen = 0; + if (esp->conf.ivlen) { + esp->conf.ivec = kmalloc(esp->conf.ivlen, GFP_KERNEL); + get_random_bytes(esp->conf.ivec, esp->conf.ivlen); + } + crypto_cipher_setkey(esp->conf.tfm, esp->conf.key, esp->conf.key_len); + x->props.header_len = 8 + esp->conf.ivlen; + if (x->props.mode) + x->props.header_len += 40; /* XXX ext hdr */ + x->data = esp; + return 0; + +error: + if (esp) { + if (esp->auth.tfm) + crypto_free_tfm(esp->auth.tfm); + if (esp->auth.work_icv) + kfree(esp->auth.work_icv); + if (esp->conf.tfm) + crypto_free_tfm(esp->conf.tfm); + kfree(esp); + } + return -EINVAL; +} + +static struct xfrm_type esp6_type = +{ + .description = "ESP6", + .proto = IPPROTO_ESP, + .init_state = esp6_init_state, + .destructor = esp6_destroy, + .get_max_size = esp6_get_max_size, + .input = esp6_input, + .output = esp6_output +}; + +static struct inet6_protocol esp6_protocol = { + .handler = xfrm6_rcv, + .err_handler = esp6_err, +}; + +int __init esp6_init(void) +{ + SET_MODULE_OWNER(&esp6_type); + if (xfrm6_register_type(&esp6_type) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add xfrm type\n"); + return -EAGAIN; + } + if (inet6_add_protocol(&esp6_protocol, IPPROTO_ESP) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add protocol\n"); + 
xfrm6_unregister_type(&esp6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit esp6_fini(void) +{ + if (inet6_del_protocol(&esp6_protocol, IPPROTO_ESP) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove protocol\n"); + if (xfrm6_unregister_type(&esp6_type) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove xfrm type\n"); +} + +module_init(esp6_init); +module_exit(esp6_fini); + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ + + MODULE_LICENSE("GPL"); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv4/xfrm_input.c linux25/net/ipv4/xfrm_input.c --- linux-2.5.62+cs1_1002/net/ipv4/xfrm_input.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv4/xfrm_input.c 2003-02-22 14:29:32.000000000 +0900 @@ -1,4 +1,13 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include +#include #include static kmem_cache_t *secpath_cachep; @@ -157,3 +166,288 @@ if (!secpath_cachep) panic("IP: failed to allocate secpath_cache\n"); } + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* Fetch spi and seq frpm ipsec header */ + +static int xfrm6_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq) +{ + int offset, offset_seq; + + switch (nexthdr) { + case IPPROTO_AH: + offset = offsetof(struct ip_auth_hdr, spi); + offset_seq = offsetof(struct ip_auth_hdr, seq_no); + break; + case IPPROTO_ESP: + offset = offsetof(struct ip_esp_hdr, spi); + offset_seq = offsetof(struct ip_esp_hdr, seq_no); + break; + case IPPROTO_COMP: + if (!pskb_may_pull(skb, 4)) + return -EINVAL; + *spi = *(u16*)(skb->h.raw + 2); + *seq = 0; + return 0; + default: + return 1; + } + + if (!pskb_may_pull(skb, 16)) + return -EINVAL; + + *spi = *(u32*)(skb->h.raw + offset); + *seq = *(u32*)(skb->h.raw + offset_seq); + return 0; +} + +static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) +{ + u8 *opt = (u8 *)opthdr; + int len = ipv6_optlen(opthdr); + int off = 0; + int optlen = 0; + + off += 2; + len 
-= 2; + + while (len > 0) { + + switch (opt[off]) { + + case IPV6_TLV_PAD0: + optlen = 1; + break; + default: + if (len < 2) + goto bad; + optlen = opt[off+1]+2; + if (len < optlen) + goto bad; + if (opt[off] & 0x20) + memset(&opt[off+2], 0, opt[off+1]); + break; + } + + off += optlen; + len -= optlen; + } + if (len == 0) + return 1; + +bad: + return 0; +} + +int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + unsigned int packet_len = skb->tail - skb->nh.raw; + u8 nexthdr = skb->nh.ipv6h->nexthdr; + u8 nextnexthdr = 0; + + *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + + while (offset + 1 <= packet_len) { + + switch (nexthdr) { + + case NEXTHDR_HOP: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun hopopts\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_ROUTING: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_DEST: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_AUTH: + if (dir == XFRM_POLICY_OUT) { + memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, + (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); + } + if (exthdr->nexthdr == NEXTHDR_DEST) { + offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + nextnexthdr = exthdr->nexthdr; + 
if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + } + return nexthdr; + default : + return nexthdr; + } + } + + return nexthdr; +} + +int xfrm6_rcv(struct sk_buff *skb) +{ + int err; + u32 spi, seq; + struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH]; + struct xfrm_state *x; + int xfrm_nr = 0; + int decaps = 0; + struct ipv6hdr *hdr = skb->nh.ipv6h; + unsigned char *tmp_hdr = NULL; + int hdr_len = 0; + u16 nh_offset = 0; + u8 nexthdr = 0; + + if (hdr->nexthdr == IPPROTO_AH || hdr->nexthdr == IPPROTO_ESP) { + nh_offset = ((unsigned char*)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + hdr_len = sizeof(struct ipv6hdr); + } else { + hdr_len = skb->h.raw - skb->nh.raw; + } + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto drop; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); + hdr->priority = 0; + hdr->flow_lbl[0] = 0; + hdr->flow_lbl[1] = 0; + hdr->flow_lbl[2] = 0; + hdr->hop_limit = 0; + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) != 0) + goto drop; + + do { + struct ipv6hdr *iph = skb->nh.ipv6h; + + if (xfrm_nr == XFRM_MAX_DEPTH) + goto drop; + + x = xfrm6_state_lookup(&iph->daddr, spi, nexthdr); + if (x == NULL) + goto drop; + spin_lock(&x->lock); + if (unlikely(x->km.state != XFRM_STATE_VALID)) + goto drop_unlock; + + if (x->props.replay_window && xfrm_replay_check(x, seq)) + goto drop_unlock; + + nexthdr = x->type->input(x, skb); + if (nexthdr <= 0) + goto drop_unlock; + + if (x->props.replay_window) + xfrm_replay_advance(x, seq); + + x->curlft.bytes += skb->len; + x->curlft.packets++; + + spin_unlock(&x->lock); + + xfrm_vec[xfrm_nr++] = x; + + iph = skb->nh.ipv6h; /* ??? 
*/ + + if (nexthdr == NEXTHDR_DEST) { + if (!pskb_may_pull(skb, (skb->h.raw-skb->data)+8) || + !pskb_may_pull(skb, (skb->h.raw-skb->data)+((skb->h.raw[1]+1)<<3))) { + err = -EINVAL; + goto drop; + } + nexthdr = skb->h.raw[0]; + nh_offset = skb->h.raw - skb->nh.raw; + skb_pull(skb, (skb->h.raw[1]+1)<<3); + skb->h.raw = skb->data; + } + + if (x->props.mode) { /* XXX */ + if (iph->nexthdr != IPPROTO_IPV6) + goto drop; + skb->nh.raw = skb->data; + iph = skb->nh.ipv6h; + decaps = 1; + break; + } + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) < 0) + goto drop; + } while (!err); + + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.raw[nh_offset] = nexthdr; + skb->nh.ipv6h->payload_len = htons(hdr_len + skb->len - sizeof(struct ipv6hdr)); + + /* Allocate new secpath or COW existing one. */ + if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) { + struct sec_path *sp; + sp = kmem_cache_alloc(secpath_cachep, SLAB_ATOMIC); + if (!sp) + goto drop; + if (skb->sp) { + memcpy(sp, skb->sp, sizeof(struct sec_path)); + secpath_put(skb->sp); + } else + sp->len = 0; + atomic_set(&sp->refcnt, 1); + skb->sp = sp; + } + + if (xfrm_nr + skb->sp->len > XFRM_MAX_DEPTH) + goto drop; + + memcpy(skb->sp->xvec+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(void*)); + skb->sp->len += xfrm_nr; + + if (decaps) { + if (!(skb->dev->flags&IFF_LOOPBACK)) { + dst_release(skb->dst); + skb->dst = NULL; + } + netif_rx(skb); + return 0; + } else { + return -nexthdr; + } + +drop_unlock: + spin_unlock(&x->lock); + xfrm_state_put(x); +drop: + if (tmp_hdr) kfree(tmp_hdr); + while (--xfrm_nr >= 0) + xfrm_state_put(xfrm_vec[xfrm_nr]); + kfree_skb(skb); + return 0; +} + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv4/xfrm_policy.c linux25/net/ipv4/xfrm_policy.c --- linux-2.5.62+cs1_1002/net/ipv4/xfrm_policy.c 2003-02-22 14:44:24.000000000 +0900 +++ linux25/net/ipv4/xfrm_policy.c 2003-02-22 19:40:11.000000000 +0900 @@ -1,6 +1,16 @@ +/* Changes + * + * Mitsuru 
KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include +#include +#include DECLARE_MUTEX(xfrm_cfg_sem); @@ -55,6 +65,34 @@ #define flow_count(cpu) (flow_number[cpu]) +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +static int xfrm6_bundle_ok(struct xfrm_dst *xdst, struct flowi *fl); +static int xfrm6_bundle_create(struct xfrm_policy *policy, + struct xfrm_state **xfrm, int nx, + struct flowi *fl, struct dst_entry **dst_p); +static int xfrm6_tmpl_resolve(struct xfrm_policy *policy, struct flowi *fl, + struct xfrm_state **xfrm); + +static inline u32 flow_hash6(struct flowi *fl) +{ + u32 hash = fl->fl6_src->s6_addr32[2] ^ + fl->fl6_src->s6_addr32[3] ^ + fl->uli_u.ports.sport; + + hash = ((hash & 0xF0F0F0F0) >> 4) | ((hash & 0x0F0F0F0F) << 4); + + hash ^= fl->fl6_dst->s6_addr32[2] ^ + fl->fl6_dst->s6_addr32[3] ^ + fl->uli_u.ports.dport; + hash ^= (hash >> 10); + hash ^= (hash >> 20); + return hash & (FLOWCACHE_HASH_SIZE-1); +} + +extern struct dst_ops xfrm6_dst_ops; +#endif + static void flow_cache_shrink(int cpu) { int i; @@ -77,13 +115,27 @@ } } -struct xfrm_policy *flow_lookup(int dir, struct flowi *fl) +struct xfrm_policy *flow_lookup(int dir, struct flowi *fl, + unsigned short family) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; struct flow_entry *fle; - u32 hash = flow_hash(fl); + u32 hash; int cpu; + switch (family) { + case AF_INET: + hash = flow_hash(fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + hash = flow_hash6(fl); + break; +#endif + default: + return NULL; + } + local_bh_disable(); cpu = smp_processor_id(); @@ -101,7 +153,7 @@ } } - pol = xfrm_policy_lookup(dir, fl); + pol = xfrm_policy_lookup(dir, fl, family); if (fle) { /* Stale flow entry found. Update it. */ @@ -506,33 +558,63 @@ /* Find policy to apply to this flow. 
*/ -struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl) +struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl, unsigned short family) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; read_lock_bh(&xfrm_policy_lock); for (pol = xfrm_policy_list[dir]; pol; pol = pol->next) { struct xfrm_selector *sel = &pol->selector; - - if (xfrm4_selector_match(sel, fl)) { - atomic_inc(&pol->refcnt); + switch (family) { + case AF_INET: + if (pol->family != AF_INET) break; + if (xfrm4_selector_match(sel, fl)) { + atomic_inc(&pol->refcnt); + goto unlock_out; + } break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + if (pol->family != AF_INET6) break; + if (xfrm6_selector_match(sel, fl)) { + atomic_inc(&pol->refcnt); + goto unlock_out; + } + break; +#endif + default: + goto unlock_out; } } +unlock_out: read_unlock_bh(&xfrm_policy_lock); return pol; } struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; read_lock_bh(&xfrm_policy_lock); if ((pol = sk->policy[dir]) != NULL) { - if (xfrm4_selector_match(&pol->selector, fl)) - atomic_inc(&pol->refcnt); - else + switch (sk->family) { + case AF_INET: + if (xfrm4_selector_match(&pol->selector, fl)) + atomic_inc(&pol->refcnt); + else + pol = NULL; + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + if (xfrm6_selector_match(&pol->selector, fl)) + atomic_inc(&pol->refcnt); + else + pol = NULL; + break; +#endif + default: pol = NULL; + } } read_unlock_bh(&xfrm_policy_lock); return pol; @@ -806,9 +888,7 @@ int nx = 0; int err; u32 genid; - - fl->oif = rt->u.dst.dev->ifindex; - fl->fl4_src = rt->rt_src; + u16 family = (*dst_p)->ops->family; restart: genid = xfrm_policy_genid; @@ -821,7 +901,16 @@ if ((rt->u.dst.flags & DST_NOXFRM) || !xfrm_policy_list[XFRM_POLICY_OUT]) return 0; - policy = flow_lookup(XFRM_POLICY_OUT, fl); + switch (family) { + case 
AF_INET: + policy = flow_lookup(XFRM_POLICY_OUT, fl, AF_INET); + break; + case AF_INET6: + policy = flow_lookup(XFRM_POLICY_OUT, fl, AF_INET6); + break; + default: + return 0; + } if (!policy) return 0; } @@ -846,23 +935,48 @@ * LATER: help from flow cache. It is optional, this * is required only for output policy. */ - read_lock_bh(&policy->lock); - for (dst = policy->bundles; dst; dst = dst->next) { - struct xfrm_dst *xdst = (struct xfrm_dst*)dst; - if (xdst->u.rt.fl.fl4_dst == fl->fl4_dst && - xdst->u.rt.fl.fl4_src == fl->fl4_src && - xdst->u.rt.fl.oif == fl->oif && - xfrm_bundle_ok(xdst, fl)) { - dst_clone(dst); + if (family == AF_INET) { + fl->oif = rt->u.dst.dev->ifindex; + fl->fl4_src = rt->rt_src; + read_lock_bh(&policy->lock); + for (dst = policy->bundles; dst; dst = dst->next) { + struct xfrm_dst *xdst = (struct xfrm_dst*)dst; + if (xdst->u.rt.fl.fl4_dst == fl->fl4_dst && + xdst->u.rt.fl.fl4_src == fl->fl4_src && + xdst->u.rt.fl.oif == fl->oif && + xfrm_bundle_ok(xdst, fl)) { + dst_clone(dst); + break; + } + } + read_unlock_bh(&policy->lock); + if (dst) break; + nx = xfrm_tmpl_resolve(policy, fl, xfrm); +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + } else if (family == AF_INET6) { + read_lock_bh(&policy->lock); + for (dst = policy->bundles; dst; dst = dst->next) { + struct xfrm_dst *xdst = (struct xfrm_dst*)dst; + if (!memcmp(&xdst->u.rt6.rt6i_dst, &fl->fl6_dst, sizeof(struct in6_addr)) && + !memcmp(&xdst->u.rt6.rt6i_src, &fl->fl6_src, sizeof(struct in6_addr)) && + xfrm6_bundle_ok(xdst, fl)) { + dst_clone(dst); + break; + } } + read_unlock_bh(&policy->lock); + if (dst) + break; + nx = xfrm6_tmpl_resolve(policy, fl, xfrm); +#endif + } else { + return -EINVAL; } - read_unlock_bh(&policy->lock); if (dst) break; - nx = xfrm_tmpl_resolve(policy, fl, xfrm); if (unlikely(nx<0)) { err = nx; if (err == -EAGAIN) { @@ -873,7 +987,18 @@ __set_task_state(tsk, TASK_INTERRUPTIBLE); add_wait_queue(&km_waitq, &wait); - err = xfrm_tmpl_resolve(policy, fl, 
xfrm); + switch (family) { + case AF_INET: + err = xfrm_tmpl_resolve(policy, fl, xfrm); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + err = xfrm6_tmpl_resolve(policy, fl, xfrm); + break; +#endif + default: + err = -EINVAL; + } if (err == -EAGAIN) schedule(); __set_task_state(tsk, TASK_RUNNING); @@ -896,7 +1021,19 @@ } dst = &rt->u.dst; - err = xfrm_bundle_create(policy, xfrm, nx, fl, &dst); + switch (family) { + case AF_INET: + err = xfrm_bundle_create(policy, xfrm, nx, fl, &dst); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + err = xfrm6_bundle_create(policy, xfrm, nx, fl, &dst); + break; +#endif + default: + err = -EINVAL; + } + if (unlikely(err)) { int i; for (i=0; ifl4_src = iph->saddr; } -int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) +static inline int +xfrm6_state_ok(struct xfrm_tmpl *tmpl, struct xfrm_state *x) +{ + return x->id.proto == tmpl->id.proto && + (x->id.spi == tmpl->id.spi || !tmpl->id.spi) && + x->props.mode == tmpl->mode && + (tmpl->aalgos & (1<<x->props.aalgo)) && + (!x->props.mode || !ipv6_addr_any((struct in6_addr*)&x->props.saddr) || + !memcmp(&tmpl->saddr, &x->props.saddr, sizeof(struct in6_addr))); +} + +static inline int +xfrm6_policy_ok(struct xfrm_tmpl *tmpl, struct sec_path *sp, int idx) +{ + for (; idx < sp->len; idx++) { + if (xfrm6_state_ok(tmpl, sp->xvec[idx])) + return ++idx; + } + return -1; +} + +static inline void +_decode_session6(struct sk_buff *skb, struct flowi *fl) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6hdr *hdr = skb->nh.ipv6h; + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + u8 nexthdr = skb->nh.ipv6h->nexthdr; + + fl->fl6_dst = &hdr->daddr; + fl->fl6_src = &hdr->saddr; + + while (pskb_may_pull(skb, skb->nh.raw + offset + 1 - skb->data)) { + switch (nexthdr) { + case NEXTHDR_ROUTING: + case NEXTHDR_HOP: + case 
NEXTHDR_DEST: + offset += ipv6_optlen(exthdr); + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case IPPROTO_UDP: + case IPPROTO_TCP: + case IPPROTO_SCTP: + if (pskb_may_pull(skb, skb->nh.raw + offset + 4 - skb->data)) { + u16 *ports = (u16 *)exthdr; + + fl->uli_u.ports.sport = ports[0]; + fl->uli_u.ports.dport = ports[1]; + } + return; + + /* XXX Why are there these headers? */ + case IPPROTO_AH: + case IPPROTO_ESP: + default: + fl->uli_u.spi = 0; + return; + }; + } +} +#endif + +int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) { struct xfrm_policy *pol; struct flowi fl; - _decode_session(skb, &fl); + switch (family) { + case AF_INET: + _decode_session(skb, &fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + _decode_session6(skb, &fl); + break; +#endif + default : + return 0; + } /* First, check used SA against their selectors. */ if (skb->sp) { int i; - for (i=skb->sp->len-1; i>=0; i--) { - if (!xfrm4_selector_match(&skb->sp->xvec[i]->sel, &fl)) + switch (family) { + case AF_INET: + for (i=skb->sp->len-1; i>=0; i--) { + if (!xfrm4_selector_match(&skb->sp->xvec[i]->sel, &fl)) + return 0; + } + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + for (i=skb->sp->len-1; i>=0; i--) { + if (family == AF_INET6 && !xfrm6_selector_match(&skb->sp->xvec[i]->sel, &fl)) + return 0; + } + break; +#endif + default : return 0; } } @@ -1029,7 +1256,7 @@ pol = xfrm_sk_policy_lookup(sk, dir, &fl); if (!pol) - pol = flow_lookup(dir, &fl); + pol = flow_lookup(dir, &fl, family); if (!pol) return 1; @@ -1049,10 +1276,25 @@ * some barriers, but at the moment barriers * are implied between each two transformations. 
*/ - for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { - k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); - if (k < 0) - goto reject; + switch (family) { + case AF_INET: + for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { + k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); + if (k < 0) + goto reject; + } + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { + k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); + if (k < 0) + goto reject; + } + break; +#endif + default : + return 0; } } xfrm_pol_put(pol); @@ -1064,18 +1306,29 @@ return 0; } -int __xfrm_route_forward(struct sk_buff *skb) +int __xfrm_route_forward(struct sk_buff *skb, unsigned short family) { struct flowi fl; - _decode_session(skb, &fl); + switch (family) { + case AF_INET: + _decode_session(skb, &fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + _decode_session6(skb, &fl); + break; +#endif + default: + return 0; + } return xfrm_lookup(&skb->dst, &fl, NULL, 0) == 0; } /* Optimize later using cookies and generation ids. */ -static struct dst_entry *xfrm4_dst_check(struct dst_entry *dst, u32 cookie) +static struct dst_entry *xfrm_dst_check(struct dst_entry *dst, u32 cookie) { struct dst_entry *child = dst; @@ -1091,19 +1344,19 @@ return dst; } -static void xfrm4_dst_destroy(struct dst_entry *dst) +static void xfrm_dst_destroy(struct dst_entry *dst) { xfrm_state_put(dst->xfrm); dst->xfrm = NULL; } -static void xfrm4_link_failure(struct sk_buff *skb) +static void xfrm_link_failure(struct sk_buff *skb) { /* Impossible. Such dst must be popped before reaches point of failure. 
*/ return; } -static struct dst_entry *xfrm4_negative_advice(struct dst_entry *dst) +static struct dst_entry *xfrm_negative_advice(struct dst_entry *dst) { if (dst) { if (dst->obsolete) { @@ -1114,8 +1367,7 @@ return dst; } - -static int xfrm4_garbage_collect(void) +static void __xfrm_garbage_collect(void) { int i; struct xfrm_policy *pol; @@ -1145,7 +1397,11 @@ gc_list = dst->next; dst_free(dst); } +} +static inline int xfrm4_garbage_collect(void) +{ + __xfrm_garbage_collect(); return (atomic_read(&xfrm4_dst_ops.entries) > xfrm4_dst_ops.gc_thresh*2); } @@ -1247,10 +1503,10 @@ .family = AF_INET, .protocol = __constant_htons(ETH_P_IP), .gc = xfrm4_garbage_collect, - .check = xfrm4_dst_check, - .destroy = xfrm4_dst_destroy, - .negative_advice = xfrm4_negative_advice, - .link_failure = xfrm4_link_failure, + .check = xfrm_dst_check, + .destroy = xfrm_dst_destroy, + .negative_advice = xfrm_negative_advice, + .link_failure = xfrm_link_failure, .update_pmtu = xfrm4_update_pmtu, .get_mss = xfrm4_get_mss, .gc_thresh = 1024, @@ -1267,8 +1523,301 @@ if (!xfrm4_dst_ops.kmem_cachep) panic("IP: failed to allocate xfrm4_dst_cache\n"); - flow_cache_init(); +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + xfrm6_dst_ops.kmem_cachep = xfrm4_dst_ops.kmem_cachep; +#endif + flow_cache_init(); xfrm_state_init(); xfrm_input_init(); } + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* Limited flow cache. Its function now is to accelerate search for + * policy rules. + * + * Flow cache is private to cpus, at the moment this is important + * mostly for flows which do not match any rule, so that flow lookups + * are absolultely cpu-local. When a rule exists we do some updates + * to rule (refcnt, stats), so that locality is broken. Later this + * can be repaired. + */ + +/* Resolve list of templates for the flow, given policy. 
*/ + +static int +xfrm6_tmpl_resolve(struct xfrm_policy *policy, struct flowi *fl, + struct xfrm_state **xfrm) +{ + int nx; + int i, error; + struct in6_addr *daddr = fl->fl6_dst; + struct in6_addr *saddr = fl->fl6_src; + + for (nx=0, i = 0; i < policy->xfrm_nr; i++) { + struct xfrm_state *x=NULL; + struct in6_addr *remote = daddr; + struct in6_addr *local = saddr; + struct xfrm_tmpl *tmpl = &policy->xfrm_vec[i]; + + if (tmpl->mode) { + remote = (struct in6_addr*)&tmpl->id.daddr; + local = (struct in6_addr*)&tmpl->saddr; + } + + x = xfrm6_state_find(remote, local, fl, tmpl, policy, &error); + + if (x && x->km.state == XFRM_STATE_VALID) { + xfrm[nx++] = x; + daddr = remote; + saddr = local; + continue; + } + + if (x) { + error = (x->km.state == XFRM_STATE_ERROR ? + -EINVAL : -EAGAIN); + xfrm_state_put(x); + } + + if (!tmpl->optional) + goto fail; + } + return nx; + +fail: + for (nx--; nx>=0; nx--) + xfrm_state_put(xfrm[nx]); + return error; +} + +/* Check that the bundle accepts the flow and its components are + * still valid. + */ + +static int xfrm6_bundle_ok(struct xfrm_dst *xdst, struct flowi *fl) +{ + do { + if (xdst->u.dst.ops != &xfrm6_dst_ops) + return 1; + + if (!xfrm6_selector_match(&xdst->u.dst.xfrm->sel, fl)) + return 0; + if (xdst->u.dst.xfrm->km.state != XFRM_STATE_VALID || + xdst->u.dst.path->obsolete > 0) + return 0; + xdst = (struct xfrm_dst*)xdst->u.dst.child; + } while (xdst); + return 0; +} + + +/* Allocate chain of dst_entry's, attach known xfrm's, calculate + * all the metrics... Shortly, bundle a bundle. 
+ */ + +static int +xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int nx, + struct flowi *fl, struct dst_entry **dst_p) +{ + struct dst_entry *dst, *dst_prev; + struct rt6_info *rt0 = (struct rt6_info*)(*dst_p); + struct rt6_info *rt = rt0; + struct in6_addr *remote = fl->fl6_dst; + struct in6_addr *local = fl->fl6_src; + int i; + int err = 0; + int header_len = 0; + + dst = dst_prev = NULL; + + for (i = 0; i < nx; i++) { + struct dst_entry *dst1 = dst_alloc(&xfrm6_dst_ops); + + if (unlikely(dst1 == NULL)) { + err = -ENOBUFS; + goto error; + } + + dst1->xfrm = xfrm[i]; + if (!dst) + dst = dst1; + else { + dst_prev->child = dst1; + dst1->flags |= DST_NOHASH; + dst_clone(dst1); + } + dst_prev = dst1; + if (xfrm[i]->props.mode) { + remote = (struct in6_addr*)&xfrm[i]->id.daddr; + local = (struct in6_addr*)&xfrm[i]->props.saddr; + } + header_len += xfrm[i]->props.header_len; + } + + if (remote != fl->fl6_dst) { + struct flowi fl_tunnel; + memset(&fl_tunnel, 0, sizeof(fl_tunnel)); + fl_tunnel.fl6_dst = remote; + fl_tunnel.fl6_src = local; + + rt = (struct rt6_info*)ip6_route_output(NULL, &fl_tunnel); + if (err) + goto error; + } else { + dst_clone(&rt->u.dst); + } + + dst_prev->child = &rt->u.dst; + for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { + struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; + x->u.rt.fl = *fl; + + dst_prev->dev = rt->u.dst.dev; + if (rt->u.dst.dev) + dev_hold(rt->u.dst.dev); + dst_prev->obsolete = -1; + dst_prev->flags |= DST_HOST; + dst_prev->lastuse = jiffies; + dst_prev->header_len = header_len; + memcpy(&dst_prev->metrics, &rt->u.dst.metrics, sizeof(dst_prev->metrics)); + dst_prev->path = &rt->u.dst; + + /* Copy neighbout for reachability confirmation */ + dst_prev->neighbour = neigh_clone(rt->u.dst.neighbour); + dst_prev->input = rt->u.dst.input; + dst_prev->output = dst_prev->xfrm->type->output; + /* Sheit... I remember I did this right. 
Apparently, + * it was magically lost, so this code needs audit */ + x->u.rt6.rt6i_flags = rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL); + x->u.rt6.rt6i_metric = rt0->rt6i_metric; + x->u.rt6.rt6i_node = rt0->rt6i_node; + x->u.rt6.rt6i_hoplimit = rt0->rt6i_hoplimit; + x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; + memcpy(&x->u.rt6.rt6i_gateway, &rt0->rt6i_gateway, sizeof(x->u.rt6.rt6i_gateway)); + header_len -= x->u.dst.xfrm->props.header_len; + } + *dst_p = dst; + return 0; + +error: + if (dst) + dst_free(dst); + return err; +} + +static inline int xfrm6_garbage_collect(void) +{ + __xfrm_garbage_collect(); + return (atomic_read(&xfrm6_dst_ops.entries) > xfrm6_dst_ops.gc_thresh*2); +} + +static int bundle6_depends_on(struct dst_entry *dst, struct xfrm_state *x) +{ + do { + if (dst->xfrm == x) + return 1; + } while ((dst = dst->child) != NULL); + return 0; +} + +int xfrm6_flush_bundles(struct xfrm_state *x) +{ + int i; + struct xfrm_policy *pol; + struct dst_entry *dst, **dstp, *gc_list = NULL; + + read_lock_bh(&xfrm_policy_lock); + for (i=0; i<2*XFRM_POLICY_MAX; i++) { + for (pol = xfrm_policy_list[i]; pol; pol = pol->next) { + write_lock(&pol->lock); + dstp = &pol->bundles; + while ((dst=*dstp) != NULL) { + if (bundle6_depends_on(dst, x)) { + *dstp = dst->next; + dst->next = gc_list; + gc_list = dst; + } else { + dstp = &dst->next; + } + } + write_unlock(&pol->lock); + } + } + read_unlock_bh(&xfrm_policy_lock); + + while (gc_list) { + dst = gc_list; + gc_list = dst->next; + dst_free(dst); + } + + return 0; +} + +static void xfrm6_update_pmtu(struct dst_entry *dst, u32 mtu) +{ + struct dst_entry *path = dst->path; + + if (mtu >= 1280 && mtu < dst_pmtu(dst)) + return; + + path->ops->update_pmtu(path, mtu); +} + +/* Well... that's _TASK_. We need to scan through transformation + * list and figure out what mss tcp should generate in order to + * final datagram fit to mtu. Mama mia... 
:-) + * + * Apparently, some easy way exists, but we used to choose the most + * bizarre ones. :-) So, raising Kalashnikov... tra-ta-ta. + * + * Consider this function as something like dark humour. :-) + */ +static int xfrm6_get_mss(struct dst_entry *dst, u32 mtu) +{ + int res = mtu - dst->header_len; + + for (;;) { + struct dst_entry *d = dst; + int m = res; + + do { + struct xfrm_state *x = d->xfrm; + if (x) { + spin_lock_bh(&x->lock); + if (x->km.state == XFRM_STATE_VALID && + x->type && x->type->get_max_size) + m = x->type->get_max_size(d->xfrm, m); + else + m += x->props.header_len; + spin_unlock_bh(&x->lock); + } + } while ((d = d->child) != NULL); + + if (m <= mtu) + break; + res -= (m - mtu); + if (res < 88) + return mtu; + } + + return res + dst->header_len; +} + +struct dst_ops xfrm6_dst_ops = { + .family = AF_INET6, + .protocol = __constant_htons(ETH_P_IPV6), + .gc = xfrm6_garbage_collect, + .check = xfrm_dst_check, + .destroy = xfrm_dst_destroy, + .negative_advice = xfrm_negative_advice, + .link_failure = xfrm_link_failure, + .update_pmtu = xfrm6_update_pmtu, + .get_mss = xfrm6_get_mss, + .gc_thresh = 1024, + .entry_size = sizeof(struct xfrm_dst), +}; + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv4/xfrm_state.c linux25/net/ipv4/xfrm_state.c --- linux-2.5.62+cs1_1002/net/ipv4/xfrm_state.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv4/xfrm_state.c 2003-02-22 14:29:32.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -165,8 +173,19 @@ spin_unlock(&xfrm_state_lock); if (del_timer(&x->timer)) atomic_dec(&x->refcnt); - if (atomic_read(&x->refcnt) != 1) - xfrm_flush_bundles(x); + if (atomic_read(&x->refcnt) != 1) { + switch (x->props.family) { + case AF_INET: + xfrm_flush_bundles(x); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + 
xfrm6_flush_bundles(x); + break; +#endif + default:; + } + } } if (kill && x->type) @@ -290,6 +309,7 @@ x->props.saddr.xfrm4_addr = saddr; x->props.mode = tmpl->mode; x->props.reqid = tmpl->reqid; + x->props.family = AF_INET; if (km_query(x, tmpl, pol) == 0) { x->km.state = XFRM_STATE_ACQ; @@ -322,10 +342,18 @@ { unsigned h = 0; - if (x->props.family == AF_INET) + switch (x->props.family) { + case AF_INET: h = ntohl(x->id.daddr.xfrm4_addr); - else if (x->props.family == AF_INET6) + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: h = ntohl(x->id.daddr.a6[2]^x->id.daddr.a6[3]); + break; +#endif + default: + return; + } h = (h ^ (h>>16)) % XFRM_DST_HSIZE; @@ -448,6 +476,7 @@ x0->props.family = AF_INET; x0->props.mode = mode; x0->props.reqid = reqid; + x0->props.family = AF_INET; x0->lft.hard_add_expires_seconds = ACQ_EXPIRES; atomic_inc(&x0->refcnt); mod_timer(&x0->timer, jiffies + ACQ_EXPIRES*HZ); @@ -836,4 +865,114 @@ wake_up(&km_waitq); } } + +struct xfrm_state * +xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr, struct flowi *fl, struct xfrm_tmpl *tmpl, + struct xfrm_policy *pol, int *err) +{ + unsigned h = ntohl(daddr->s6_addr32[2]^daddr->s6_addr32[3]); + struct xfrm_state *x = NULL; + int acquire_in_progress = 0; + int error = 0; + struct xfrm_state *best = NULL; + + h = (h ^ (h>>16)) % XFRM_DST_HSIZE; + + spin_lock_bh(&xfrm_state_lock); + list_for_each_entry(x, xfrm_state_bydst+h, bydst) { + if (x->props.family == AF_INET6&& + !memcmp(daddr, &x->id.daddr, sizeof(*daddr)) && + x->props.reqid == tmpl->reqid && + (!memcmp(saddr, &x->props.saddr, sizeof(*saddr))|| ipv6_addr_any(saddr)) && + tmpl->mode == x->props.mode && + tmpl->id.proto == x->id.proto) { + /* Resolution logic: + 1. There is a valid state with matching selector. + Done. + 2. Valid state with inappropriate selector. Skip. + + Entering area of "sysdeps". + + 3. 
If state is not valid, selector is temporary, + it selects only session which triggered + previous resolution. Key manager will do + something to install a state with proper + selector. + */ + if (x->km.state == XFRM_STATE_VALID) { + if (!xfrm6_selector_match(&x->sel, fl)) + continue; + if (!best || + best->km.dying > x->km.dying || + (best->km.dying == x->km.dying && + best->curlft.add_time < x->curlft.add_time)) + best = x; + } else if (x->km.state == XFRM_STATE_ACQ) { + acquire_in_progress = 1; + } else if (x->km.state == XFRM_STATE_ERROR || + x->km.state == XFRM_STATE_EXPIRED) { + if (xfrm6_selector_match(&x->sel, fl)) + error = 1; + } + } + } + + if (best) { + atomic_inc(&best->refcnt); + spin_unlock_bh(&xfrm_state_lock); + return best; + } + x = NULL; + if (!error && !acquire_in_progress && + ((x = xfrm_state_alloc()) != NULL)) { + /* Initialize temporary selector matching only + * to current session. */ + memcpy(&x->sel.daddr, fl->fl6_dst, sizeof(struct in6_addr)); + memcpy(&x->sel.saddr, fl->fl6_src, sizeof(struct in6_addr)); + x->sel.dport = fl->uli_u.ports.dport; + x->sel.dport_mask = ~0; + x->sel.sport = fl->uli_u.ports.sport; + x->sel.sport_mask = ~0; + x->sel.prefixlen_d = 128; + x->sel.prefixlen_s = 128; + x->sel.proto = fl->proto; + x->sel.ifindex = fl->oif; + x->id = tmpl->id; + if (ipv6_addr_any((struct in6_addr*)&x->id.daddr)) + memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr)); + memcpy(&x->props.saddr, &tmpl->saddr, sizeof(x->props.saddr)); + if (ipv6_addr_any((struct in6_addr*)&x->props.saddr)) + memcpy(&x->props.saddr, &saddr, sizeof(x->sel.saddr)); + x->props.mode = tmpl->mode; + x->props.reqid = tmpl->reqid; + x->props.family = AF_INET6; + + if (km_query(x, tmpl, pol) == 0) { + x->km.state = XFRM_STATE_ACQ; + list_add_tail(&x->bydst, xfrm_state_bydst+h); + atomic_inc(&x->refcnt); + if (x->id.spi) { + struct in6_addr *addr = (struct in6_addr*)&x->id.daddr; + h = ntohl((addr->s6_addr32[2]^addr->s6_addr32[3])^x->id.spi^x->id.proto); + h = (h ^ 
(h>>10) ^ (h>>20)) % XFRM_DST_HSIZE; + list_add(&x->byspi, xfrm_state_byspi+h); + atomic_inc(&x->refcnt); + } + x->lft.hard_add_expires_seconds = ACQ_EXPIRES; + atomic_inc(&x->refcnt); + mod_timer(&x->timer, jiffies + ACQ_EXPIRES*HZ); + } else { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + x = NULL; + error = 1; + } + } + spin_unlock_bh(&xfrm_state_lock); + if (!x) + *err = acquire_in_progress ? -EAGAIN : + (error ? -ESRCH : -ENOMEM); + return x; +} + #endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/exthdrs.c linux25/net/ipv6/exthdrs.c --- linux-2.5.62+cs1_1002/net/ipv6/exthdrs.c 2003-02-22 14:44:24.000000000 +0900 +++ linux25/net/ipv6/exthdrs.c 2003-02-22 02:10:46.000000000 +0900 @@ -392,7 +392,7 @@ cpu ticks, checking that sender did not something stupid and opt->hdrlen is even. Shit! --ANK (980730) */ - +#if 0 static int ipv6_auth_hdr(struct sk_buff **skb_ptr, int nhoff) { struct sk_buff *skb=*skb_ptr; @@ -424,6 +424,7 @@ kfree_skb(skb); return -1; } +#endif /* This list MUST NOT contain entry for NEXTHDR_HOP.
It is parsed immediately after packet received @@ -436,7 +437,9 @@ {NEXTHDR_ROUTING, ipv6_routing_header}, {NEXTHDR_DEST, ipv6_dest_opt}, {NEXTHDR_NONE, ipv6_nodata}, + /* {NEXTHDR_AUTH, ipv6_auth_hdr}, + */ /* {NEXTHDR_ESP, ipv6_esp_hdr}, */ @@ -627,6 +630,8 @@ { if (opt->auth) prev_hdr = ipv6_build_authhdr(skb, prev_hdr, opt->auth); + + skb->h.raw = skb->tail; if (opt->dst1opt) prev_hdr = ipv6_build_exthdr(skb, prev_hdr, NEXTHDR_DEST, opt->dst1opt); return prev_hdr; @@ -689,8 +694,10 @@ void ipv6_push_frag_opts(struct sk_buff *skb, struct ipv6_txoptions *opt, u8 *proto) { - if (opt->dst1opt) + if (opt->dst1opt) { ipv6_push_exthdr(skb, proto, NEXTHDR_DEST, opt->dst1opt); + skb->h.raw = skb->data; + } if (opt->auth) ipv6_push_authhdr(skb, proto, opt->auth); } diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/ip6_input.c linux25/net/ipv6/ip6_input.c --- linux-2.5.62+cs1_1002/net/ipv6/ip6_input.c 2003-02-22 14:44:24.000000000 +0900 +++ linux25/net/ipv6/ip6_input.c 2003-02-22 02:10:46.000000000 +0900 @@ -150,7 +150,8 @@ It would be stupid to detect for optional headers, which are missing with probability of 200% */ - if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP) { + if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP && + nexthdr != NEXTHDR_AUTH && nexthdr != NEXTHDR_ESP) { nhoff = ipv6_parse_exthdrs(&skb, nhoff); if (nhoff < 0) return 0; diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/ip6_output.c linux25/net/ipv6/ip6_output.c --- linux-2.5.62+cs1_1002/net/ipv6/ip6_output.c 2003-02-22 14:44:24.000000000 +0900 +++ linux25/net/ipv6/ip6_output.c 2003-02-22 13:55:47.000000000 +0900 @@ -192,6 +192,11 @@ int seg_len = skb->len; int hlimit; u32 mtu; + int err = 0; + + if ((err = xfrm_lookup(&skb->dst, fl, sk, 0)) < 0) { + return err; + } if (opt) { int head_room; @@ -576,6 +581,13 @@ } pktlength = length; + if (dst) { + if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0) { + dst_release(dst); + return -ENETUNREACH; + } + } + if (hlimit < 0) { if 
(ipv6_addr_is_multicast(fl->fl6_dst)) hlimit = np->mcast_hops; @@ -630,10 +642,8 @@ err = 0; if (flags&MSG_PROBE) goto out; - - skb = sock_alloc_send_skb(sk, pktlength + 15 + - dev->hard_header_len, - flags & MSG_DONTWAIT, &err); + /* alloc skb with mtu as we do in the IPv4 stack for IPsec */ + skb = sock_alloc_send_skb(sk, mtu, flags & MSG_DONTWAIT, &err); if (skb == NULL) { IP6_INC_STATS(Ip6OutDiscards); @@ -663,6 +673,8 @@ err = getfrag(data, &hdr->saddr, ((char *) hdr) + (pktlength - length), 0, length); + if (!opt || !opt->dst1opt) + skb->h.raw = ((char *) hdr) + (pktlength - length); if (!err) { IP6_INC_STATS(Ip6OutRequests); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/ndisc.c linux25/net/ipv6/ndisc.c --- linux-2.5.62+cs1_1002/net/ipv6/ndisc.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv6/ndisc.c 2003-02-22 12:58:25.000000000 +0900 @@ -72,6 +72,7 @@ #include #include +#include #include #include @@ -336,8 +337,6 @@ unsigned char ha[MAX_ADDR_LEN]; unsigned char *h_dest = NULL; - skb_reserve(skb, (dev->hard_header_len + 15) & ~15); - if (dev->hard_header) { if (ipv6_addr_type(daddr) & IPV6_ADDR_MULTICAST) { ndisc_mc_map(daddr, ha, dev, 1); @@ -374,10 +373,50 @@ * Send a Neighbour Advertisement */ +int ndisc_output(struct sk_buff *skb) +{ + if (skb) { + struct neighbour *neigh = (skb->dst ? 
skb->dst->neighbour : NULL); + if (ndisc_build_ll_hdr(skb, skb->dev, &skb->nh.ipv6h->daddr, neigh, skb->len) == 0) { + kfree_skb(skb); + return -EINVAL; + } + dev_queue_xmit(skb); + return 0; + } + return -EINVAL; +} + +static inline void ndisc_rt_init(struct rt6_info *rt, struct net_device *dev, + struct neighbour *neigh) +{ + rt->rt6i_dev = dev; + rt->rt6i_nexthop = neigh; + rt->rt6i_expires = 0; + rt->rt6i_flags = RTF_LOCAL; + rt->rt6i_metric = 0; + rt->rt6i_hoplimit = 255; + rt->u.dst.output = ndisc_output; +} + +static inline void ndisc_flow_init(struct flowi *fl, u8 type, + struct in6_addr *saddr, struct in6_addr *daddr) +{ + memset(fl, 0, sizeof(*fl)); + fl->fl6_src = saddr; + fl->fl6_dst = daddr; + fl->proto = IPPROTO_ICMPV6; + fl->uli_u.icmpt.type = type; + fl->uli_u.icmpt.code = 0; +} + static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh, struct in6_addr *daddr, struct in6_addr *solicited_addr, int router, int solicited, int override, int inc_opt) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct nd_msg *msg; int len; @@ -386,6 +425,22 @@ len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_ADVERTISEMENT, solicited_addr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + if (inc_opt) { if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -401,14 +456,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, solicited_addr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *) skb_put(skb, len); + skb->h.raw = (unsigned char*) msg = (struct nd_msg *) skb_put(skb, len); msg->icmph.icmp6_type = 
NDISC_NEIGHBOUR_ADVERTISEMENT; msg->icmph.icmp6_code = 0; @@ -431,7 +482,9 @@ csum_partial((__u8 *) msg, len, 0)); - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborAdvertisements); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -441,6 +494,9 @@ struct in6_addr *solicit, struct in6_addr *daddr, struct in6_addr *saddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct nd_msg *msg; @@ -455,6 +511,22 @@ saddr = &addr_buf; } + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); send_llinfo = dev->addr_len && ipv6_addr_type(saddr) != IPV6_ADDR_ANY; if (send_llinfo) @@ -467,14 +539,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *)skb_put(skb, len); + skb->h.raw = (unsigned char*) msg = (struct nd_msg *)skb_put(skb, len); msg->icmph.icmp6_type = NDISC_NEIGHBOUR_SOLICITATION; msg->icmph.icmp6_code = 0; msg->icmph.icmp6_cksum = 0; @@ -493,7 +561,9 @@ csum_partial((__u8 *) msg, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -502,6 +572,9 @@ void ndisc_send_rs(struct net_device *dev, struct in6_addr *saddr, struct in6_addr *daddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct icmp6hdr *hdr; @@ -509,6 +582,22 @@ int len; int err; + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_ROUTER_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, NULL); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr); if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -520,14 +609,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, NULL, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - hdr = (struct icmp6hdr *) skb_put(skb, len); + skb->h.raw = (unsigned char*) hdr = (struct icmp6hdr *) skb_put(skb, len); hdr->icmp6_type = NDISC_ROUTER_SOLICITATION; hdr->icmp6_code = 0; hdr->icmp6_cksum = 0; @@ -544,7 +629,9 @@ csum_partial((__u8 *) hdr, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRouterSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -1126,6 +1213,8 @@ struct in6_addr *addrp; struct net_device *dev; struct rt6_info *rt; + struct dst_entry *dst; + struct flowi fl; u8 *opt; int rd_len; int err; @@ -1137,6 +1226,22 @@ if (rt == NULL) return; + dst = (struct dst_entry*)rt; + + if (ipv6_get_lladdr(dev, &saddr_buf)) { + ND_PRINTK1("redirect: no link_local addr for dev\n"); + return; + } + + ndisc_flow_init(&fl, NDISC_REDIRECT, &saddr_buf, &skb->nh.ipv6h->saddr); + + dst_clone(dst); + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err) { + dst_release(dst); + return; + } + if (rt->rt6i_flags & RTF_GATEWAY) { ND_PRINTK1("ndisc_send_redirect: not a neighbour\n"); dst_release(&rt->u.dst); @@ -1165,11 +1270,6 @@ rd_len &= ~0x7; len += rd_len; - if (ipv6_get_lladdr(dev, &saddr_buf)) { - ND_PRINTK1("redirect: no link_local addr for dev\n"); - return; - } - buff = sock_alloc_send_skb(sk, MAX_HEADER + len + dev->hard_header_len + 15, 0, &err); if (buff == NULL) { @@ -1179,15 +1279,11 @@ hlen = 0; - if (ndisc_build_ll_hdr(buff, dev, &skb->nh.ipv6h->saddr, NULL, len) == 0) { - kfree_skb(buff); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, buff, dev, &saddr_buf, &skb->nh.ipv6h->saddr, IPPROTO_ICMPV6, len); - icmph = (struct icmp6hdr *) skb_put(buff, len); + skb->h.raw = (unsigned char*) icmph = (struct icmp6hdr *) skb_put(buff, len); memset(icmph, 0, sizeof(struct icmp6hdr)); icmph->icmp6_type = NDISC_REDIRECT; @@ -1225,7 +1321,8 @@ len, IPPROTO_ICMPV6, csum_partial((u8 *) icmph, len, 0)); - dev_queue_xmit(buff); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRedirects); ICMP6_INC_STATS(Icmp6OutMsgs); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/raw.c linux25/net/ipv6/raw.c --- linux-2.5.62+cs1_1002/net/ipv6/raw.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv6/raw.c 2003-02-22 02:10:46.000000000 
+0900 @@ -45,6 +45,7 @@ #include #include +#include struct sock *raw_v6_htable[RAWV6_HTABLE_SIZE]; rwlock_t raw_v6_lock = RW_LOCK_UNLOCKED; @@ -304,6 +305,11 @@ struct inet_opt *inet = inet_sk(sk); struct raw6_opt *raw_opt = raw6_sk(sk); + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return NET_RX_DROP; + } + if (!raw_opt->checksum) skb->ip_summed = CHECKSUM_UNNECESSARY; diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/route.c linux25/net/ipv6/route.c --- linux-2.5.62+cs1_1002/net/ipv6/route.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv6/route.c 2003-02-22 14:06:40.000000000 +0900 @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -128,6 +129,12 @@ rwlock_t rt6_lock = RW_LOCK_UNLOCKED; +/* Dummy rt for ndisc */ +struct rt6_info *ndisc_get_dummy_rt() +{ + return dst_alloc(&ip6_dst_ops); +} + /* * Route lookup. Any rt6_lock is implied. */ diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/tcp_ipv6.c linux25/net/ipv6/tcp_ipv6.c --- linux-2.5.62+cs1_1002/net/ipv6/tcp_ipv6.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv6/tcp_ipv6.c 2003-02-22 14:58:24.000000000 +0900 @@ -51,6 +51,7 @@ #include #include #include +#include #include @@ -678,6 +679,9 @@ fl.nl_u.ip6_u.daddr = rt0->addr; } + if (!fl.fl6_src) + fl.fl6_src = &np->saddr; + dst = ip6_route_output(sk, &fl); if ((err = dst->error) != 0) { @@ -1638,6 +1642,9 @@ if (sk_filter(sk, skb, 0)) goto discard_and_relse; + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) + goto discard_it; + skb->dev = NULL; bh_lock_sock(sk); @@ -1653,6 +1660,9 @@ return ret; no_tcp_socket: + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { bad_packet: TCP_INC_STATS_BH(TcpInErrs); @@ -1672,8 +1682,11 @@ discard_and_relse: sock_put(sk); goto discard_it; - + do_time_wait: + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || 
tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(TcpInErrs); sock_put(sk); diff -ruN -x CVS linux-2.5.62+cs1_1002/net/ipv6/udp.c linux25/net/ipv6/udp.c --- linux-2.5.62+cs1_1002/net/ipv6/udp.c 2003-02-18 20:32:55.000000000 +0900 +++ linux25/net/ipv6/udp.c 2003-02-22 02:10:46.000000000 +0900 @@ -50,6 +50,7 @@ #include #include +#include DEFINE_SNMP_STAT(struct udp_mib, udp_stats_in6); @@ -541,6 +542,11 @@ static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb) { + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return -1; + } + #if defined(CONFIG_FILTER) if (sk->filter && skb->ip_summed != CHECKSUM_UNNECESSARY) { if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) { @@ -646,6 +652,9 @@ if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto short_packet; + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard; + saddr = &skb->nh.ipv6h->saddr; daddr = &skb->nh.ipv6h->daddr; uh = skb->h.uh; diff -ruN -x CVS linux-2.5.62+cs1_1002/net/netsyms.c linux25/net/netsyms.c --- linux-2.5.62+cs1_1002/net/netsyms.c 2003-02-22 14:44:23.000000000 +0900 +++ linux25/net/netsyms.c 2003-02-22 02:15:11.000000000 +0900 @@ -325,12 +325,15 @@ EXPORT_SYMBOL(xfrm_policy_byid); EXPORT_SYMBOL(xfrm_policy_list); #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) +EXPORT_SYMBOL(xfrm6_state_find); +EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_state_lookup); EXPORT_SYMBOL(xfrm6_find_acq); EXPORT_SYMBOL(xfrm6_alloc_spi); EXPORT_SYMBOL(xfrm6_register_type); EXPORT_SYMBOL(xfrm6_unregister_type); EXPORT_SYMBOL(xfrm6_get_type); +EXPORT_SYMBOL(xfrm6_clear_mutable_options); #endif EXPORT_SYMBOL_GPL(xfrm_probe_algs); From davem@redhat.com Sat Feb 22 03:21:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 03:21:46 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1MBLC3v006709 for ; Sat, 22 Feb 2003 03:21:12 -0800 
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id DAA26956; Sat, 22 Feb 2003 03:13:26 -0800 Date: Sat, 22 Feb 2003 03:13:26 -0800 (PST) Message-Id: <20030222.031326.103246837.davem@redhat.com> To: kazunori@miyazawa.org Cc: kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kunihiro@ipinfusion.com Subject: Re: [PATCH] IPv6 IPSEC support From: "David S. Miller" In-Reply-To: <20030222202623.38d41d8a.kazunori@miyazawa.org> References: <20030222202623.38d41d8a.kazunori@miyazawa.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1769 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Kazunori Miyazawa Date: Sat, 22 Feb 2003 20:26:23 +0900 I also moved the functions for ah, and esp. I don't think this is so good idea... As a result of moving IPv6 IPsec functions to net/ipv4, it currently prevents to make IPv6 as a module. This is one of the reasons why ah/esp ipv6 should stay under ipv6. Nothing in xfrm routines really need to reference ipv6 module functions, please eliminate this dependency. Breaking ipv6 as module is ok for temporary development, but eventually it must be solved. 
From kazunori@miyazawa.org Sat Feb 22 04:06:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 04:06:48 -0800 (PST) Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1MC693v007878 for ; Sat, 22 Feb 2003 04:06:10 -0800 Received: from monza.miyazawa.org ([::ffff:192.168.0.3]) (IDENT: miyazawa, AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Sat, 22 Feb 2003 20:58:06 +0900 Date: Sat, 22 Feb 2003 21:15:26 +0900 From: Kazunori Miyazawa To: "David S. Miller" Cc: kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org, kunihiro@ipinfusion.com Subject: Re: [PATCH] IPv6 IPSEC support Message-Id: <20030222211526.2884077a.kazunori@miyazawa.org> In-Reply-To: <20030222.031326.103246837.davem@redhat.com> References: <20030222202623.38d41d8a.kazunori@miyazawa.org> <20030222.031326.103246837.davem@redhat.com> X-Mailer: Sylpheed version 0.8.9 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1770 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev On Sat, 22 Feb 2003 03:13:26 -0800 (PST) "David S. Miller" wrote: > From: Kazunori Miyazawa > Date: Sat, 22 Feb 2003 20:26:23 +0900 > > I also moved the functions for ah, and esp. > > I don't think this is so good idea... > > As a result of moving IPv6 IPsec functions to net/ipv4, it currently prevents to > make IPv6 as a module. > > This is one of the reasons why ah/esp ipv6 should stay under ipv6. > I will fix them and submit patch again. 
Thank you, --Kazunori Miyazawa (Yokogawa Electric Corporation) From yoshfuji@linux-ipv6.org Sat Feb 22 04:40:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 04:41:30 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1MCem3v008494 for ; Sat, 22 Feb 2003 04:40:51 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1MCnZBF009414; Sat, 22 Feb 2003 21:49:36 +0900 Date: Sat, 22 Feb 2003 21:49:35 +0900 (JST) Message-Id: <20030222.214935.134101784.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: kazunori@miyazawa.org, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi@linux-ipv6.org, kunihiro@ipinfusion.com Subject: Re: [PATCH] IPv6 IPSEC support From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030222.031326.103246837.davem@redhat.com> References: <20030222202623.38d41d8a.kazunori@miyazawa.org> <20030222.031326.103246837.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1771 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. In article <20030222.031326.103246837.davem@redhat.com> (at Sat, 22 Feb 2003 03:13:26 -0800 (PST)), "David S.
Miller" says: > Nothing in xfrm routines really need to reference ipv6 module > functions, please eliminate this dependency. Breaking ipv6 as module > is ok for temporary development, but eventually it must be solved. xfrm_policy.c:xfrm6_bundle_create() seems to depend on ip6_route_output() as xfrm_bundle_create() depends on __ip_route_output_key(). How do we solve this dependency? inter-module? -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Sat Feb 22 15:55:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 15:55:50 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1MNtj3v017115 for ; Sat, 22 Feb 2003 15:55:46 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id PAA28228; Sat, 22 Feb 2003 15:47:54 -0800 Date: Sat, 22 Feb 2003 15:47:53 -0800 (PST) Message-Id: <20030222.154753.133994666.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: kazunori@miyazawa.org, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi@linux-ipv6.org, kunihiro@ipinfusion.com Subject: Re: [PATCH] IPv6 IPSEC support From: "David S. Miller" In-Reply-To: <20030222.214935.134101784.yoshfuji@linux-ipv6.org> References: <20030222202623.38d41d8a.kazunori@miyazawa.org> <20030222.031326.103246837.davem@redhat.com> <20030222.214935.134101784.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 1772
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Sat, 22 Feb 2003 21:49:35 +0900 (JST)

   xfrm_policy.c:xfrm6_bundle_create() seems to depend on ip6_route_output() as xfrm_bundle_create() depends on __ip_route_output_key(). How do we solve this dependency? inter-module?

Good question.

Maybe we can pass around a structure to xfrm_lookup() which contains information on how to look up routes for tunnels. It can just be a function pointer right now.

It may be possible to generalize this technique even more, making more xfrm_*() routines address-family independent.

One example: xfrm_lookup() gets this xfrm_afinfo pointer, and it can use it to learn how to compare addresses. The xfrm_afinfo pointer is also passed to xfrm_bundle_create(), which uses it to learn how to look up tunnel routes.

A small net/ipv6/xfrm_ipv6.c module is created, which registers an xfrm_afinfo structure with the generic xfrm engine; it teaches the engine how to do these operations for AF_INET6 xfrm objects.

Do you think this can work?

We have several conflicting desires, all of which arise from the ability to build many things as modules. The only reliable aspect is that ipv4 cannot be modular. Because of this we can allow xfrm_user and af_key to be either modular or non-modular.
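
[Archive note] The registration idea described in the message above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the interface that was later merged: only the name xfrm_afinfo comes from the message. The struct layout, the afinfo_register()/afinfo_unregister()/afinfo_get() helpers, the SKETCH_* constants and the stub callbacks are all hypothetical, and the sketch omits the locking and reference counting a real kernel table would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants for this sketch; not taken from kernel headers. */
#define SKETCH_AF_INET6 10
#define SKETCH_AF_MAX   32

/* Hypothetical per-address-family operations table. Only the name
 * "xfrm_afinfo" appears in the thread; the fields are assumptions. */
struct xfrm_afinfo_sketch {
	unsigned short family;
	int (*addr_equal)(const void *a, const void *b);  /* 1 if equal */
	int (*tunnel_route_lookup)(const void *daddr);    /* 0 on success */
};

/* The generic engine only ever sees this table, never ipv6 symbols. */
static const struct xfrm_afinfo_sketch *afinfo_table[SKETCH_AF_MAX];

/* Called by a family module at load time. Returns 0 on success,
 * -1 if the family is out of range or already registered. */
int afinfo_register(const struct xfrm_afinfo_sketch *info)
{
	if (info == NULL || info->family >= SKETCH_AF_MAX)
		return -1;
	if (afinfo_table[info->family] != NULL)
		return -1;
	afinfo_table[info->family] = info;
	return 0;
}

/* Called by a family module at unload time. */
void afinfo_unregister(const struct xfrm_afinfo_sketch *info)
{
	if (info != NULL && info->family < SKETCH_AF_MAX &&
	    afinfo_table[info->family] == info)
		afinfo_table[info->family] = NULL;
}

const struct xfrm_afinfo_sketch *afinfo_get(unsigned short family)
{
	return family < SKETCH_AF_MAX ? afinfo_table[family] : NULL;
}

/* Generic-side helper: compares two addresses through the registered
 * callback. Returns 1 on match, 0 on mismatch, -1 if the family has
 * not been registered (for example, the module is not loaded). */
int sketch_addrs_match(unsigned short family, const void *a, const void *b)
{
	const struct xfrm_afinfo_sketch *info = afinfo_get(family);

	if (info == NULL)
		return -1;
	assert(info->addr_equal != NULL);
	return info->addr_equal(a, b);
}

/* What a small per-family module might provide (stubs for illustration). */
static int addr6_equal(const void *a, const void *b)
{
	return memcmp(a, b, 16) == 0;  /* an IPv6 address is 16 bytes */
}

static int route6_lookup_stub(const void *daddr)
{
	(void)daddr;  /* a real module would call its own route lookup here */
	return 0;
}

static const struct xfrm_afinfo_sketch afinfo6_sketch = {
	.family = SKETCH_AF_INET6,
	.addr_equal = addr6_equal,
	.tunnel_route_lookup = route6_lookup_stub,
};
```

The point of the indirection is that the generic code links only against the table, so the family-specific module can be loaded or absent without creating a symbol dependency in either direction.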
From yoshfuji@linux-ipv6.org Sat Feb 22 16:35:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 22 Feb 2003 16:35:55 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1N0Zo3v017751 for ; Sat, 22 Feb 2003 16:35:51 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1N0igBF012861; Sun, 23 Feb 2003 09:44:43 +0900 Date: Sun, 23 Feb 2003 09:44:42 +0900 (JST) Message-Id: <20030223.094442.107718872.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: kazunori@miyazawa.org, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi@linux-ipv6.org, kunihiro@ipinfusion.com Subject: Re: [PATCH] IPv6 IPSEC support From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030222.154753.133994666.davem@redhat.com> References: <20030222.031326.103246837.davem@redhat.com> <20030222.214935.134101784.yoshfuji@linux-ipv6.org> <20030222.154753.133994666.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1773 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030222.154753.133994666.davem@redhat.com> (at Sat, 22 Feb 2003 15:47:53 -0800 (PST)), "David S. 
Miller" says: > One example, xfrm_lookup() gets this xfrm_afinfo pointer, and it can > use it to learn how to compare addresses. The xfrm_afinfo pointer > is also passed to xfrm_bundle_create() which uses it to learn how > to lookup tunnel routes. > > A small net/ipv6/xfrm_ipv6.c module is created, which registers > a xfrm_afinfo structure to the generic xfrm engine, it teaches > how to do these operations for AF_INET6 xfrm objects. > > Do you think this can work? I suppose so. We'll try to work on it. -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Sun Feb 23 01:25:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 01:25:50 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1N9Pf3v021718 for ; Sun, 23 Feb 2003 01:25:42 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id BAA28860; Sun, 23 Feb 2003 01:18:17 -0800 Date: Sun, 23 Feb 2003 01:18:16 -0800 (PST) Message-Id: <20030223.011816.108201183.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: "David S. Miller" In-Reply-To: <20021103.115427.104445233.yoshfuji@linux-ipv6.org> References: <20021103.115427.104445233.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-archive-position: 1774
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Sun, 03 Nov 2002 11:54:27 +0900 (JST)

   This patch cleans up functions in ipv6 stack:
   - export route6_me_harder() as ip6_route_harder() and use it from net/ipv6/netfilter/ip6_queue.c.
   - make ip6_addr_prefix() to generate prefix of given address and prefix length, instead of doing "ipv6_copy_addr() then ipv6_wash_prefix()."

Please change the new name to ip6_route_me_harder(). When one says "something me harder" it has amusing implications when heard by most English speakers, and I'd like to keep this :-)

I will apply this patch once you make the change. Would you like me to add it to 2.4.x as well?

We really need to revisit the USAGI patch backlog. I have the privacy extension 2.5.x patch you sent and will apply it. For all the others please feel free to "patch bomb" me :-) Please indicate with each patch whether it is desired in 2.4.x as well.

If you wish to concentrate on ipv6 ipsec first, that is ok too. :-)

From kazunori@miyazawa.org Sun Feb 23 07:26:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 07:26:31 -0800 (PST)
Received: from miyazawa.org (usen-43x235x12x234.ap-USEN.usen.ad.jp [43.235.12.234]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1NFQK3v032301 for ; Sun, 23 Feb 2003 07:26:21 -0800
Received: from monza.miyazawa.org ([::ffff:192.168.0.3]) (AUTH: LOGIN kazunori, ) by miyazawa.org with esmtp; Mon, 24 Feb 2003 00:18:16 +0900
Date: Mon, 24 Feb 2003 00:35:40 +0900
From: Kazunori Miyazawa
To: "David S.
Miller" Cc: kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, usagi-core@linux-ipv6.org Subject: Re: [PATCH] IPv6 IPSEC support Message-Id: <20030224003540.53e4cda5.kazunori@miyazawa.org> In-Reply-To: <20030222.031326.103246837.davem@redhat.com> References: <20030222202623.38d41d8a.kazunori@miyazawa.org> <20030222.031326.103246837.davem@redhat.com> X-Mailer: Sylpheed version 0.8.10 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1775 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kazunori@miyazawa.org Precedence: bulk X-list: netdev Hello, On Sat, 22 Feb 2003 03:13:26 -0800 (PST) "David S. Miller" wrote: > From: Kazunori Miyazawa > Date: Sat, 22 Feb 2003 20:26:23 +0900 > > I also moved the functions for ah, and esp. > > I don't think this is so good idea... > > As a result of moving IPv6 IPsec functions to net/ipv4, it currently prevents to > make IPv6 as a module. > > This is one of the reasons why ah/esp ipv6 should stay under ipv6. > > Nothing in xfrm routines really need to reference ipv6 module > functions, please eliminate this dependency. Breaking ipv6 as module > is ok for temporary development, but eventually it must be solved. I just moved ipv6 ah/esp functions to under net/ipv6. Thank you, --Kazunori Miyazawa (Yokogawa Electric Corporation) diff -ruN -x CVS linux-2.5.62+cs1.1002/include/linux/ipv6.h linux25/include/linux/ipv6.h --- linux-2.5.62+cs1.1002/include/linux/ipv6.h 2003-02-23 17:56:54.000000000 +0900 +++ linux25/include/linux/ipv6.h 2003-02-23 13:24:59.000000000 +0900 @@ -74,6 +74,21 @@ #define rt0_type rt_hdr.type; }; +struct ipv6_auth_hdr { + __u8 nexthdr; + __u8 hdrlen; /* This one is measured in 32 bit units! */ + __u16 reserved; + __u32 spi; + __u32 seq_no; /* Sequence number */ + __u8 auth_data[4]; /* Length variable but >=4. 
Mind the 64 bit alignment! */ +}; + +struct ipv6_esp_hdr { + __u32 spi; + __u32 seq_no; /* Sequence number */ + __u8 enc_data[8]; /* Length variable but >=8. Mind the 64 bit alignment! */ +}; + /* * IPv6 fixed header * diff -ruN -x CVS linux-2.5.62+cs1.1002/include/net/dst.h linux25/include/net/dst.h --- linux-2.5.62+cs1.1002/include/net/dst.h 2003-02-23 17:56:43.000000000 +0900 +++ linux25/include/net/dst.h 2003-02-23 13:24:59.000000000 +0900 @@ -248,6 +248,9 @@ extern int xfrm_lookup(struct dst_entry **dst_p, struct flowi *fl, struct sock *sk, int flags); extern void xfrm_init(void); +extern int xfrm6_lookup(struct dst_entry **dst_p, struct flowi *fl, + struct sock *sk, int flags); +extern void xfrm6_init(void); #endif diff -ruN -x CVS linux-2.5.62+cs1.1002/include/net/ip6_route.h linux25/include/net/ip6_route.h --- linux-2.5.62+cs1.1002/include/net/ip6_route.h 2003-02-23 17:56:43.000000000 +0900 +++ linux25/include/net/ip6_route.h 2003-02-23 13:24:59.000000000 +0900 @@ -55,6 +55,8 @@ struct in6_addr *saddr, int oif, int flags); +extern struct rt6_info *ndisc_get_dummy_rt(void); + /* * support functions for ND * diff -ruN -x CVS linux-2.5.62+cs1.1002/include/net/xfrm.h linux25/include/net/xfrm.h --- linux-2.5.62+cs1.1002/include/net/xfrm.h 2003-02-23 17:56:44.000000000 +0900 +++ linux25/include/net/xfrm.h 2003-02-23 19:57:44.000000000 +0900 @@ -12,6 +12,7 @@ #include #include +#include #define XFRM_ALIGN8(len) (((len) + 7) & ~7) @@ -282,6 +283,7 @@ struct xfrm_dst *next; struct dst_entry dst; struct rtable rt; + struct rt6_info rt6; } u; }; @@ -308,26 +310,42 @@ if (sp && atomic_dec_and_test(&sp->refcnt)) __secpath_destroy(sp); } - -extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb); +extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family); static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb) { if (sk && sk->policy[XFRM_POLICY_IN]) - return __xfrm_policy_check(sk, 
dir, skb); + return __xfrm_policy_check(sk, dir, skb, AF_INET); + + return !xfrm_policy_list[dir] || + (skb->dst->flags & DST_NOPOLICY) || + __xfrm_policy_check(sk, dir, skb, AF_INET); +} + +static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +{ + if (sk && sk->policy[XFRM_POLICY_IN]) + return __xfrm_policy_check(sk, dir, skb, AF_INET6); return !xfrm_policy_list[dir] || (skb->dst->flags & DST_NOPOLICY) || - __xfrm_policy_check(sk, dir, skb); + __xfrm_policy_check(sk, dir, skb, AF_INET6); } -extern int __xfrm_route_forward(struct sk_buff *skb); +extern int __xfrm_route_forward(struct sk_buff *skb, unsigned short family); static inline int xfrm_route_forward(struct sk_buff *skb) { return !xfrm_policy_list[XFRM_POLICY_OUT] || (skb->dst->flags & DST_NOXFRM) || - __xfrm_route_forward(skb); + __xfrm_route_forward(skb, AF_INET); +} + +static inline int xfrm6_route_forward(struct sk_buff *skb) +{ + return !xfrm_policy_list[XFRM_POLICY_OUT] || + (skb->dst->flags & DST_NOXFRM) || + __xfrm_route_forward(skb, AF_INET6); } extern int __xfrm_sk_clone_policy(struct sock *sk); @@ -382,10 +400,14 @@ extern struct xfrm_state *xfrm_state_alloc(void); extern struct xfrm_state *xfrm_state_find(u32 daddr, u32 saddr, struct flowi *fl, struct xfrm_tmpl *tmpl, struct xfrm_policy *pol, int *err); +extern struct xfrm_state *xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr, + struct flowi *fl, struct xfrm_tmpl *tmpl, + struct xfrm_policy *pol, int *err); extern int xfrm_state_check_expire(struct xfrm_state *x); extern void xfrm_state_insert(struct xfrm_state *x); extern int xfrm_state_check_space(struct xfrm_state *x, struct sk_buff *skb); extern struct xfrm_state *xfrm_state_lookup(u32 daddr, u32 spi, u8 proto); +extern struct xfrm_state *xfrm6_state_lookup(struct in6_addr *daddr, u32 spi, u8 proto); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); extern void xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 
proto); @@ -393,22 +415,27 @@ extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); extern int xfrm_check_selectors(struct xfrm_state **x, int n, struct flowi *fl); extern int xfrm4_rcv(struct sk_buff *skb); +extern int xfrm6_rcv(struct sk_buff *skb); +extern int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir); extern int xfrm_user_policy(struct sock *sk, int optname, u8 *optval, int optlen); struct xfrm_policy *xfrm_policy_alloc(int gfp); extern int xfrm_policy_walk(int (*func)(struct xfrm_policy *, int, int, void*), void *); -struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl); +struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl, unsigned short family); int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl); struct xfrm_policy *xfrm_policy_delete(int dir, struct xfrm_selector *sel); struct xfrm_policy *xfrm_policy_byid(int dir, u32 id, int delete); void xfrm_policy_flush(void); void xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi); struct xfrm_state * xfrm_find_acq(u8 mode, u16 reqid, u8 proto, u32 daddr, u32 saddr, int create); +struct xfrm_state * xfrm6_find_acq(u8 mode, u16 reqid, u8 proto, struct in6_addr *daddr, + struct in6_addr *saddr, int create); extern void xfrm_policy_flush(void); extern void xfrm_policy_kill(struct xfrm_policy *); extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol); extern struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl); extern int xfrm_flush_bundles(struct xfrm_state *x); +extern int xfrm6_flush_bundles(struct xfrm_state *x); extern wait_queue_head_t km_waitq; extern void km_warn_expired(struct xfrm_state *x); @@ -425,15 +452,41 @@ extern struct xfrm_algo_desc *xfrm_aalg_get_byname(char *name); extern struct xfrm_algo_desc *xfrm_ealg_get_byname(char *name); +static __inline__ int addr_match(void *token1, void *token2, int prefixlen) +{ + __u32 *a1 = token1; + __u32 *a2 = 
token2; + int pdw; + int pbi; + + pdw = prefixlen >> 5; /* num of whole __u32 in prefix */ + pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */ + + if (pdw) + if (memcmp(a1, a2, pdw << 2)) + return 0; + + if (pbi) { + __u32 mask; + + mask = htonl((0xffffffff) << (32 - pbi)); + + if ((a1[pdw] ^ a2[pdw]) & mask) + return 0; + } + + return 1; +} + static inline int xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl) { - return !memcmp(fl->fl6_dst, sel->daddr.a6, sizeof(struct in6_addr)) && - !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) && - !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) && - (fl->proto == sel->proto || !sel->proto) && - (fl->oif == sel->ifindex || !sel->ifindex) && - !memcmp(fl->fl6_src, sel->saddr.a6, sizeof(struct in6_addr)); + return addr_match(fl->fl6_dst, &sel->daddr, sel->prefixlen_d) && + addr_match(fl->fl6_src, &sel->saddr, sel->prefixlen_s) && + !((fl->uli_u.ports.dport^sel->dport)&sel->dport_mask) && + !((fl->uli_u.ports.sport^sel->sport)&sel->sport_mask) && + (fl->proto == sel->proto || !sel->proto) && + (fl->oif == sel->ifindex || !sel->ifindex); } extern int xfrm6_register_type(struct xfrm_type *type); @@ -444,4 +497,83 @@ struct xfrm_state * xfrm6_find_acq(u8 mode, u16 reqid, u8 proto, struct in6_addr *daddr, struct in6_addr *saddr, int create); void xfrm6_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi); +struct ah_data +{ + u8 *key; + int key_len; + u8 *work_icv; + int icv_full_len; + int icv_trunc_len; + + void (*icv)(struct ah_data*, + struct sk_buff *skb, u8 *icv); + + struct crypto_tfm *tfm; +}; + +struct esp_data +{ + /* Confidentiality */ + struct { + u8 *key; /* Key */ + int key_len; /* Key length */ + u8 *ivec; /* ivec buffer */ + /* ivlen is offset from enc_data, where encrypted data start. + * It is logically different of crypto_tfm_alg_ivsize(tfm). + * We assume that it is either zero (no ivec), or + * >= crypto_tfm_alg_ivsize(tfm). 
*/ + int ivlen; + int padlen; /* 0..255 */ + struct crypto_tfm *tfm; /* crypto handle */ + } conf; + + /* Integrity. It is active when icv_full_len != 0 */ + struct { + u8 *key; /* Key */ + int key_len; /* Length of the key */ + u8 *work_icv; + int icv_full_len; + int icv_trunc_len; + void (*icv)(struct esp_data*, + struct sk_buff *skb, + int offset, int len, u8 *icv); + struct crypto_tfm *tfm; + } auth; +}; + +typedef void (icv_update_fn_t)(struct crypto_tfm *, struct scatterlist *, unsigned int); +void skb_ah_walk(const struct sk_buff *skb, + struct crypto_tfm *tfm, icv_update_fn_t icv_update); +void skb_icv_walk(const struct sk_buff *skb, struct crypto_tfm *tfm, + int offset, int len, icv_update_fn_t icv_update); +int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len); +int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer); +void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len); + +static inline void +ah_hmac_digest(struct ah_data *ahp, struct sk_buff *skb, u8 *auth_data) +{ + struct crypto_tfm *tfm = ahp->tfm; + + memset(auth_data, 0, ahp->icv_trunc_len); + crypto_hmac_init(tfm, ahp->key, &ahp->key_len); + skb_ah_walk(skb, tfm, crypto_hmac_update); + crypto_hmac_final(tfm, ahp->key, &ahp->key_len, ahp->work_icv); + memcpy(auth_data, ahp->work_icv, ahp->icv_trunc_len); +} + +static inline void +esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset, + int len, u8 *auth_data) +{ + struct crypto_tfm *tfm = esp->auth.tfm; + char *icv = esp->auth.work_icv; + + memset(auth_data, 0, esp->auth.icv_trunc_len); + crypto_hmac_init(tfm, esp->auth.key, &esp->auth.key_len); + skb_icv_walk(skb, tfm, offset, len, crypto_hmac_update); + crypto_hmac_final(tfm, esp->auth.key, &esp->auth.key_len, icv); + memcpy(auth_data, icv, esp->auth.icv_trunc_len); +} + #endif /* _NET_XFRM_H */ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv4/ah.c linux25/net/ipv4/ah.c --- linux-2.5.62+cs1.1002/net/ipv4/ah.c 
2003-02-23 17:53:46.000000000 +0900 +++ linux25/net/ipv4/ah.c 2003-02-23 18:17:13.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -7,25 +15,10 @@ #include #include -#define AH_HLEN_NOICV 12 - -typedef void (icv_update_fn_t)(struct crypto_tfm *, - struct scatterlist *, unsigned int); - -struct ah_data -{ - u8 *key; - int key_len; - u8 *work_icv; - int icv_full_len; - int icv_trunc_len; - - void (*icv)(struct ah_data*, - struct sk_buff *skb, u8 *icv); - - struct crypto_tfm *tfm; -}; +#include +#include +#define AH_HLEN_NOICV 12 /* Clear mutable options and find final destination to substitute * into IP header for icv calculation. Options are already checked @@ -71,7 +64,7 @@ return 0; } -static void skb_ah_walk(const struct sk_buff *skb, +void skb_ah_walk(const struct sk_buff *skb, struct crypto_tfm *tfm, icv_update_fn_t icv_update) { int offset = 0; @@ -145,18 +138,6 @@ BUG(); } -static void -ah_hmac_digest(struct ah_data *ahp, struct sk_buff *skb, u8 *auth_data) -{ - struct crypto_tfm *tfm = ahp->tfm; - - memset(auth_data, 0, ahp->icv_trunc_len); - crypto_hmac_init(tfm, ahp->key, &ahp->key_len); - skb_ah_walk(skb, tfm, crypto_hmac_update); - crypto_hmac_final(tfm, ahp->key, &ahp->key_len, ahp->work_icv); - memcpy(auth_data, ahp->work_icv, ahp->icv_trunc_len); -} - static int ah_output(struct sk_buff *skb) { int err; diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv4/esp.c linux25/net/ipv4/esp.c --- linux-2.5.62+cs1.1002/net/ipv4/esp.c 2003-02-23 17:53:46.000000000 +0900 +++ linux25/net/ipv4/esp.c 2003-02-23 18:17:39.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -8,45 +16,13 @@ #include #include -#define MAX_SG_ONSTACK 4 -typedef void (icv_update_fn_t)(struct crypto_tfm *, - struct scatterlist *, 
unsigned int); +#define MAX_SG_ONSTACK 4 /* BUGS: * - we assume replay seqno is always present. */ -struct esp_data -{ - /* Confidentiality */ - struct { - u8 *key; /* Key */ - int key_len; /* Key length */ - u8 *ivec; /* ivec buffer */ - /* ivlen is offset from enc_data, where encrypted data start. - * It is logically different of crypto_tfm_alg_ivsize(tfm). - * We assume that it is either zero (no ivec), or - * >= crypto_tfm_alg_ivsize(tfm). */ - int ivlen; - int padlen; /* 0..255 */ - struct crypto_tfm *tfm; /* crypto handle */ - } conf; - - /* Integrity. It is active when icv_full_len != 0 */ - struct { - u8 *key; /* Key */ - int key_len; /* Length of the key */ - u8 *work_icv; - int icv_full_len; - int icv_trunc_len; - void (*icv)(struct esp_data*, - struct sk_buff *skb, - int offset, int len, u8 *icv); - struct crypto_tfm *tfm; - } auth; -}; - /* Move to common area: it is shared with AH. */ void skb_icv_walk(const struct sk_buff *skb, struct crypto_tfm *tfm, @@ -190,22 +166,6 @@ return elt; } -/* Common with AH after some work on arguments. */ - -static void -esp_hmac_digest(struct esp_data *esp, struct sk_buff *skb, int offset, - int len, u8 *auth_data) -{ - struct crypto_tfm *tfm = esp->auth.tfm; - char *icv = esp->auth.work_icv; - - memset(auth_data, 0, esp->auth.icv_trunc_len); - crypto_hmac_init(tfm, esp->auth.key, &esp->auth.key_len); - skb_icv_walk(skb, tfm, offset, len, crypto_hmac_update); - crypto_hmac_final(tfm, esp->auth.key, &esp->auth.key_len, icv); - memcpy(auth_data, icv, esp->auth.icv_trunc_len); -} - /* Check that skb data bits are writable. If they are not, copy data * to newly created private area. If "tailbits" is given, make sure that * tailbits bytes beyond current end of skb are writable. 
diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv4/xfrm_input.c linux25/net/ipv4/xfrm_input.c --- linux-2.5.62+cs1.1002/net/ipv4/xfrm_input.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv4/xfrm_input.c 2003-02-23 13:25:00.000000000 +0900 @@ -1,4 +1,13 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include +#include #include static kmem_cache_t *secpath_cachep; @@ -157,3 +166,288 @@ if (!secpath_cachep) panic("IP: failed to allocate secpath_cache\n"); } + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* Fetch spi and seq frpm ipsec header */ + +static int xfrm6_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq) +{ + int offset, offset_seq; + + switch (nexthdr) { + case IPPROTO_AH: + offset = offsetof(struct ip_auth_hdr, spi); + offset_seq = offsetof(struct ip_auth_hdr, seq_no); + break; + case IPPROTO_ESP: + offset = offsetof(struct ip_esp_hdr, spi); + offset_seq = offsetof(struct ip_esp_hdr, seq_no); + break; + case IPPROTO_COMP: + if (!pskb_may_pull(skb, 4)) + return -EINVAL; + *spi = *(u16*)(skb->h.raw + 2); + *seq = 0; + return 0; + default: + return 1; + } + + if (!pskb_may_pull(skb, 16)) + return -EINVAL; + + *spi = *(u32*)(skb->h.raw + offset); + *seq = *(u32*)(skb->h.raw + offset_seq); + return 0; +} + +static int zero_out_mutable_opts(struct ipv6_opt_hdr *opthdr) +{ + u8 *opt = (u8 *)opthdr; + int len = ipv6_optlen(opthdr); + int off = 0; + int optlen = 0; + + off += 2; + len -= 2; + + while (len > 0) { + + switch (opt[off]) { + + case IPV6_TLV_PAD0: + optlen = 1; + break; + default: + if (len < 2) + goto bad; + optlen = opt[off+1]+2; + if (len < optlen) + goto bad; + if (opt[off] & 0x20) + memset(&opt[off+2], 0, opt[off+1]); + break; + } + + off += optlen; + len -= optlen; + } + if (len == 0) + return 1; + +bad: + return 0; +} + +int xfrm6_clear_mutable_options(struct sk_buff *skb, u16 *nh_offset, int dir) +{ + u16 offset = sizeof(struct 
ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + unsigned int packet_len = skb->tail - skb->nh.raw; + u8 nexthdr = skb->nh.ipv6h->nexthdr; + u8 nextnexthdr = 0; + + *nh_offset = ((unsigned char *)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + + while (offset + 1 <= packet_len) { + + switch (nexthdr) { + + case NEXTHDR_HOP: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun hopopts\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_ROUTING: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + ((struct ipv6_rt_hdr*)exthdr)->segments_left = 0; + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_DEST: + *nh_offset = offset; + offset += ipv6_optlen(exthdr); + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case NEXTHDR_AUTH: + if (dir == XFRM_POLICY_OUT) { + memset(((struct ipv6_auth_hdr*)exthdr)->auth_data, 0, + (((struct ipv6_auth_hdr*)exthdr)->hdrlen - 1) << 2); + } + if (exthdr->nexthdr == NEXTHDR_DEST) { + offset += (((struct ipv6_auth_hdr*)exthdr)->hdrlen + 2) << 2; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + nextnexthdr = exthdr->nexthdr; + if (!zero_out_mutable_opts(exthdr)) { + if (net_ratelimit()) + printk(KERN_WARNING "overrun destopt\n"); + return 0; + } + } + return nexthdr; + default : + return nexthdr; + } + } + + return nexthdr; +} + +int xfrm6_rcv(struct sk_buff *skb) +{ + int err; + u32 spi, seq; + struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH]; + struct xfrm_state *x; + int xfrm_nr = 0; + int decaps = 0; + struct ipv6hdr *hdr = skb->nh.ipv6h; + unsigned char *tmp_hdr = NULL; + int hdr_len = 0; 
+ u16 nh_offset = 0; + u8 nexthdr = 0; + + if (hdr->nexthdr == IPPROTO_AH || hdr->nexthdr == IPPROTO_ESP) { + nh_offset = ((unsigned char*)&skb->nh.ipv6h->nexthdr) - skb->nh.raw; + hdr_len = sizeof(struct ipv6hdr); + } else { + hdr_len = skb->h.raw - skb->nh.raw; + } + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto drop; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_IN); + hdr->priority = 0; + hdr->flow_lbl[0] = 0; + hdr->flow_lbl[1] = 0; + hdr->flow_lbl[2] = 0; + hdr->hop_limit = 0; + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) != 0) + goto drop; + + do { + struct ipv6hdr *iph = skb->nh.ipv6h; + + if (xfrm_nr == XFRM_MAX_DEPTH) + goto drop; + + x = xfrm6_state_lookup(&iph->daddr, spi, nexthdr); + if (x == NULL) + goto drop; + spin_lock(&x->lock); + if (unlikely(x->km.state != XFRM_STATE_VALID)) + goto drop_unlock; + + if (x->props.replay_window && xfrm_replay_check(x, seq)) + goto drop_unlock; + + nexthdr = x->type->input(x, skb); + if (nexthdr <= 0) + goto drop_unlock; + + if (x->props.replay_window) + xfrm_replay_advance(x, seq); + + x->curlft.bytes += skb->len; + x->curlft.packets++; + + spin_unlock(&x->lock); + + xfrm_vec[xfrm_nr++] = x; + + iph = skb->nh.ipv6h; /* ??? 
*/ + + if (nexthdr == NEXTHDR_DEST) { + if (!pskb_may_pull(skb, (skb->h.raw-skb->data)+8) || + !pskb_may_pull(skb, (skb->h.raw-skb->data)+((skb->h.raw[1]+1)<<3))) { + err = -EINVAL; + goto drop; + } + nexthdr = skb->h.raw[0]; + nh_offset = skb->h.raw - skb->nh.raw; + skb_pull(skb, (skb->h.raw[1]+1)<<3); + skb->h.raw = skb->data; + } + + if (x->props.mode) { /* XXX */ + if (iph->nexthdr != IPPROTO_IPV6) + goto drop; + skb->nh.raw = skb->data; + iph = skb->nh.ipv6h; + decaps = 1; + break; + } + + if ((err = xfrm6_parse_spi(skb, nexthdr, &spi, &seq)) < 0) + goto drop; + } while (!err); + + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.raw[nh_offset] = nexthdr; + skb->nh.ipv6h->payload_len = htons(hdr_len + skb->len - sizeof(struct ipv6hdr)); + + /* Allocate new secpath or COW existing one. */ + if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) { + struct sec_path *sp; + sp = kmem_cache_alloc(secpath_cachep, SLAB_ATOMIC); + if (!sp) + goto drop; + if (skb->sp) { + memcpy(sp, skb->sp, sizeof(struct sec_path)); + secpath_put(skb->sp); + } else + sp->len = 0; + atomic_set(&sp->refcnt, 1); + skb->sp = sp; + } + + if (xfrm_nr + skb->sp->len > XFRM_MAX_DEPTH) + goto drop; + + memcpy(skb->sp->xvec+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(void*)); + skb->sp->len += xfrm_nr; + + if (decaps) { + if (!(skb->dev->flags&IFF_LOOPBACK)) { + dst_release(skb->dst); + skb->dst = NULL; + } + netif_rx(skb); + return 0; + } else { + return -nexthdr; + } + +drop_unlock: + spin_unlock(&x->lock); + xfrm_state_put(x); +drop: + if (tmp_hdr) kfree(tmp_hdr); + while (--xfrm_nr >= 0) + xfrm_state_put(xfrm_vec[xfrm_nr]); + kfree_skb(skb); + return 0; +} + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv4/xfrm_policy.c linux25/net/ipv4/xfrm_policy.c --- linux-2.5.62+cs1.1002/net/ipv4/xfrm_policy.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv4/xfrm_policy.c 2003-02-23 13:25:00.000000000 +0900 @@ -1,6 +1,16 @@ +/* Changes + * + * Mitsuru 
KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include +#include +#include DECLARE_MUTEX(xfrm_cfg_sem); @@ -55,6 +65,34 @@ #define flow_count(cpu) (flow_number[cpu]) +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +static int xfrm6_bundle_ok(struct xfrm_dst *xdst, struct flowi *fl); +static int xfrm6_bundle_create(struct xfrm_policy *policy, + struct xfrm_state **xfrm, int nx, + struct flowi *fl, struct dst_entry **dst_p); +static int xfrm6_tmpl_resolve(struct xfrm_policy *policy, struct flowi *fl, + struct xfrm_state **xfrm); + +static inline u32 flow_hash6(struct flowi *fl) +{ + u32 hash = fl->fl6_src->s6_addr32[2] ^ + fl->fl6_src->s6_addr32[3] ^ + fl->uli_u.ports.sport; + + hash = ((hash & 0xF0F0F0F0) >> 4) | ((hash & 0x0F0F0F0F) << 4); + + hash ^= fl->fl6_dst->s6_addr32[2] ^ + fl->fl6_dst->s6_addr32[3] ^ + fl->uli_u.ports.dport; + hash ^= (hash >> 10); + hash ^= (hash >> 20); + return hash & (FLOWCACHE_HASH_SIZE-1); +} + +extern struct dst_ops xfrm6_dst_ops; +#endif + static void flow_cache_shrink(int cpu) { int i; @@ -77,13 +115,27 @@ } } -struct xfrm_policy *flow_lookup(int dir, struct flowi *fl) +struct xfrm_policy *flow_lookup(int dir, struct flowi *fl, + unsigned short family) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; struct flow_entry *fle; - u32 hash = flow_hash(fl); + u32 hash; int cpu; + switch (family) { + case AF_INET: + hash = flow_hash(fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + hash = flow_hash6(fl); + break; +#endif + default: + return NULL; + } + local_bh_disable(); cpu = smp_processor_id(); @@ -101,7 +153,7 @@ } } - pol = xfrm_policy_lookup(dir, fl); + pol = xfrm_policy_lookup(dir, fl, family); if (fle) { /* Stale flow entry found. Update it. */ @@ -506,33 +558,63 @@ /* Find policy to apply to this flow. 
*/ -struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl) +struct xfrm_policy *xfrm_policy_lookup(int dir, struct flowi *fl, unsigned short family) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; read_lock_bh(&xfrm_policy_lock); for (pol = xfrm_policy_list[dir]; pol; pol = pol->next) { struct xfrm_selector *sel = &pol->selector; - - if (xfrm4_selector_match(sel, fl)) { - atomic_inc(&pol->refcnt); + switch (family) { + case AF_INET: + if (pol->family != AF_INET) break; + if (xfrm4_selector_match(sel, fl)) { + atomic_inc(&pol->refcnt); + goto unlock_out; + } break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + if (pol->family != AF_INET6) break; + if (xfrm6_selector_match(sel, fl)) { + atomic_inc(&pol->refcnt); + goto unlock_out; + } + break; +#endif + default: + goto unlock_out; } } +unlock_out: read_unlock_bh(&xfrm_policy_lock); return pol; } struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl) { - struct xfrm_policy *pol; + struct xfrm_policy *pol = NULL; read_lock_bh(&xfrm_policy_lock); if ((pol = sk->policy[dir]) != NULL) { - if (xfrm4_selector_match(&pol->selector, fl)) - atomic_inc(&pol->refcnt); - else + switch (sk->family) { + case AF_INET: + if (xfrm4_selector_match(&pol->selector, fl)) + atomic_inc(&pol->refcnt); + else + pol = NULL; + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + if (xfrm6_selector_match(&pol->selector, fl)) + atomic_inc(&pol->refcnt); + else + pol = NULL; + break; +#endif + default: pol = NULL; + } } read_unlock_bh(&xfrm_policy_lock); return pol; @@ -806,9 +888,7 @@ int nx = 0; int err; u32 genid; - - fl->oif = rt->u.dst.dev->ifindex; - fl->fl4_src = rt->rt_src; + u16 family = (*dst_p)->ops->family; restart: genid = xfrm_policy_genid; @@ -821,7 +901,16 @@ if ((rt->u.dst.flags & DST_NOXFRM) || !xfrm_policy_list[XFRM_POLICY_OUT]) return 0; - policy = flow_lookup(XFRM_POLICY_OUT, fl); + switch (family) { + case 
AF_INET: + policy = flow_lookup(XFRM_POLICY_OUT, fl, AF_INET); + break; + case AF_INET6: + policy = flow_lookup(XFRM_POLICY_OUT, fl, AF_INET6); + break; + default: + return 0; + } if (!policy) return 0; } @@ -846,23 +935,48 @@ * LATER: help from flow cache. It is optional, this * is required only for output policy. */ - read_lock_bh(&policy->lock); - for (dst = policy->bundles; dst; dst = dst->next) { - struct xfrm_dst *xdst = (struct xfrm_dst*)dst; - if (xdst->u.rt.fl.fl4_dst == fl->fl4_dst && - xdst->u.rt.fl.fl4_src == fl->fl4_src && - xdst->u.rt.fl.oif == fl->oif && - xfrm_bundle_ok(xdst, fl)) { - dst_clone(dst); + if (family == AF_INET) { + fl->oif = rt->u.dst.dev->ifindex; + fl->fl4_src = rt->rt_src; + read_lock_bh(&policy->lock); + for (dst = policy->bundles; dst; dst = dst->next) { + struct xfrm_dst *xdst = (struct xfrm_dst*)dst; + if (xdst->u.rt.fl.fl4_dst == fl->fl4_dst && + xdst->u.rt.fl.fl4_src == fl->fl4_src && + xdst->u.rt.fl.oif == fl->oif && + xfrm_bundle_ok(xdst, fl)) { + dst_clone(dst); + break; + } + } + read_unlock_bh(&policy->lock); + if (dst) break; + nx = xfrm_tmpl_resolve(policy, fl, xfrm); +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + } else if (family == AF_INET6) { + read_lock_bh(&policy->lock); + for (dst = policy->bundles; dst; dst = dst->next) { + struct xfrm_dst *xdst = (struct xfrm_dst*)dst; + if (!memcmp(&xdst->u.rt6.rt6i_dst, &fl->fl6_dst, sizeof(struct in6_addr)) && + !memcmp(&xdst->u.rt6.rt6i_src, &fl->fl6_src, sizeof(struct in6_addr)) && + xfrm6_bundle_ok(xdst, fl)) { + dst_clone(dst); + break; + } } + read_unlock_bh(&policy->lock); + if (dst) + break; + nx = xfrm6_tmpl_resolve(policy, fl, xfrm); +#endif + } else { + return -EINVAL; } - read_unlock_bh(&policy->lock); if (dst) break; - nx = xfrm_tmpl_resolve(policy, fl, xfrm); if (unlikely(nx<0)) { err = nx; if (err == -EAGAIN) { @@ -873,7 +987,18 @@ __set_task_state(tsk, TASK_INTERRUPTIBLE); add_wait_queue(&km_waitq, &wait); - err = xfrm_tmpl_resolve(policy, fl, 
xfrm); + switch (family) { + case AF_INET: + err = xfrm_tmpl_resolve(policy, fl, xfrm); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + err = xfrm6_tmpl_resolve(policy, fl, xfrm); + break; +#endif + default: + err = -EINVAL; + } if (err == -EAGAIN) schedule(); __set_task_state(tsk, TASK_RUNNING); @@ -896,7 +1021,19 @@ } dst = &rt->u.dst; - err = xfrm_bundle_create(policy, xfrm, nx, fl, &dst); + switch (family) { + case AF_INET: + err = xfrm_bundle_create(policy, xfrm, nx, fl, &dst); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + err = xfrm6_bundle_create(policy, xfrm, nx, fl, &dst); + break; +#endif + default: + err = -EINVAL; + } + if (unlikely(err)) { int i; for (i=0; ifl4_src = iph->saddr; } -int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb) +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) +static inline int +xfrm6_state_ok(struct xfrm_tmpl *tmpl, struct xfrm_state *x) +{ + return x->id.proto == tmpl->id.proto && + (x->id.spi == tmpl->id.spi || !tmpl->id.spi) && + x->props.mode == tmpl->mode && + (tmpl->aalgos & (1<<x->props.aalgo)) && + (!x->props.mode || !ipv6_addr_any((struct in6_addr*)&x->props.saddr) || + !memcmp(&tmpl->saddr, &x->props.saddr, sizeof(struct in6_addr))); +} + +static inline int +xfrm6_policy_ok(struct xfrm_tmpl *tmpl, struct sec_path *sp, int idx) +{ + for (; idx < sp->len; idx++) { + if (xfrm6_state_ok(tmpl, sp->xvec[idx])) + return ++idx; + } + return -1; +} + +static inline void +_decode_session6(struct sk_buff *skb, struct flowi *fl) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6hdr *hdr = skb->nh.ipv6h; + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + u8 nexthdr = skb->nh.ipv6h->nexthdr; + + fl->fl6_dst = &hdr->daddr; + fl->fl6_src = &hdr->saddr; + + while (pskb_may_pull(skb, skb->nh.raw + offset + 1 - skb->data)) { + switch (nexthdr) { + case NEXTHDR_ROUTING: + case NEXTHDR_HOP: + case 
NEXTHDR_DEST: + offset += ipv6_optlen(exthdr); + nexthdr = exthdr->nexthdr; + exthdr = (struct ipv6_opt_hdr*)(skb->nh.raw + offset); + break; + + case IPPROTO_UDP: + case IPPROTO_TCP: + case IPPROTO_SCTP: + if (pskb_may_pull(skb, skb->nh.raw + offset + 4 - skb->data)) { + u16 *ports = (u16 *)exthdr; + + fl->uli_u.ports.sport = ports[0]; + fl->uli_u.ports.dport = ports[1]; + } + return; + + /* XXX Why are there these headers? */ + case IPPROTO_AH: + case IPPROTO_ESP: + default: + fl->uli_u.spi = 0; + return; + }; + } +} +#endif + +int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) { struct xfrm_policy *pol; struct flowi fl; - _decode_session(skb, &fl); + switch (family) { + case AF_INET: + _decode_session(skb, &fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + _decode_session6(skb, &fl); + break; +#endif + default : + return 0; + } /* First, check used SA against their selectors. */ if (skb->sp) { int i; - for (i=skb->sp->len-1; i>=0; i--) { - if (!xfrm4_selector_match(&skb->sp->xvec[i]->sel, &fl)) + switch (family) { + case AF_INET: + for (i=skb->sp->len-1; i>=0; i--) { + if (!xfrm4_selector_match(&skb->sp->xvec[i]->sel, &fl)) + return 0; + } + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + for (i=skb->sp->len-1; i>=0; i--) { + if (family == AF_INET6 && !xfrm6_selector_match(&skb->sp->xvec[i]->sel, &fl)) + return 0; + } + break; +#endif + default : return 0; } } @@ -1029,7 +1256,7 @@ pol = xfrm_sk_policy_lookup(sk, dir, &fl); if (!pol) - pol = flow_lookup(dir, &fl); + pol = flow_lookup(dir, &fl, family); if (!pol) return 1; @@ -1049,10 +1276,25 @@ * some barriers, but at the moment barriers * are implied between each two transformations. 
*/ - for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { - k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); - if (k < 0) - goto reject; + switch (family) { + case AF_INET: + for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { + k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); + if (k < 0) + goto reject; + } + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + for (i = pol->xfrm_nr-1, k = 0; i >= 0; i--) { + k = xfrm_policy_ok(pol->xfrm_vec+i, sp, k); + if (k < 0) + goto reject; + } + break; +#endif + default : + return 0; } } xfrm_pol_put(pol); @@ -1064,18 +1306,29 @@ return 0; } -int __xfrm_route_forward(struct sk_buff *skb) +int __xfrm_route_forward(struct sk_buff *skb, unsigned short family) { struct flowi fl; - _decode_session(skb, &fl); + switch (family) { + case AF_INET: + _decode_session(skb, &fl); + break; +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + case AF_INET6: + _decode_session6(skb, &fl); + break; +#endif + default: + return 0; + } return xfrm_lookup(&skb->dst, &fl, NULL, 0) == 0; } /* Optimize later using cookies and generation ids. */ -static struct dst_entry *xfrm4_dst_check(struct dst_entry *dst, u32 cookie) +static struct dst_entry *xfrm_dst_check(struct dst_entry *dst, u32 cookie) { struct dst_entry *child = dst; @@ -1091,19 +1344,19 @@ return dst; } -static void xfrm4_dst_destroy(struct dst_entry *dst) +static void xfrm_dst_destroy(struct dst_entry *dst) { xfrm_state_put(dst->xfrm); dst->xfrm = NULL; } -static void xfrm4_link_failure(struct sk_buff *skb) +static void xfrm_link_failure(struct sk_buff *skb) { /* Impossible. Such dst must be popped before reaches point of failure. 
*/ return; } -static struct dst_entry *xfrm4_negative_advice(struct dst_entry *dst) +static struct dst_entry *xfrm_negative_advice(struct dst_entry *dst) { if (dst) { if (dst->obsolete) { @@ -1114,8 +1367,7 @@ return dst; } - -static int xfrm4_garbage_collect(void) +static void __xfrm_garbage_collect(void) { int i; struct xfrm_policy *pol; @@ -1145,7 +1397,11 @@ gc_list = dst->next; dst_free(dst); } +} +static inline int xfrm4_garbage_collect(void) +{ + __xfrm_garbage_collect(); return (atomic_read(&xfrm4_dst_ops.entries) > xfrm4_dst_ops.gc_thresh*2); } @@ -1247,10 +1503,10 @@ .family = AF_INET, .protocol = __constant_htons(ETH_P_IP), .gc = xfrm4_garbage_collect, - .check = xfrm4_dst_check, - .destroy = xfrm4_dst_destroy, - .negative_advice = xfrm4_negative_advice, - .link_failure = xfrm4_link_failure, + .check = xfrm_dst_check, + .destroy = xfrm_dst_destroy, + .negative_advice = xfrm_negative_advice, + .link_failure = xfrm_link_failure, .update_pmtu = xfrm4_update_pmtu, .get_mss = xfrm4_get_mss, .gc_thresh = 1024, @@ -1267,8 +1523,301 @@ if (!xfrm4_dst_ops.kmem_cachep) panic("IP: failed to allocate xfrm4_dst_cache\n"); - flow_cache_init(); +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + xfrm6_dst_ops.kmem_cachep = xfrm4_dst_ops.kmem_cachep; +#endif + flow_cache_init(); xfrm_state_init(); xfrm_input_init(); } + +#if defined (CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + +/* Limited flow cache. Its function now is to accelerate search for + * policy rules. + * + * Flow cache is private to cpus, at the moment this is important + * mostly for flows which do not match any rule, so that flow lookups + * are absolultely cpu-local. When a rule exists we do some updates + * to rule (refcnt, stats), so that locality is broken. Later this + * can be repaired. + */ + +/* Resolve list of templates for the flow, given policy. 
*/ + +static int +xfrm6_tmpl_resolve(struct xfrm_policy *policy, struct flowi *fl, + struct xfrm_state **xfrm) +{ + int nx; + int i, error; + struct in6_addr *daddr = fl->fl6_dst; + struct in6_addr *saddr = fl->fl6_src; + + for (nx=0, i = 0; i < policy->xfrm_nr; i++) { + struct xfrm_state *x=NULL; + struct in6_addr *remote = daddr; + struct in6_addr *local = saddr; + struct xfrm_tmpl *tmpl = &policy->xfrm_vec[i]; + + if (tmpl->mode) { + remote = (struct in6_addr*)&tmpl->id.daddr; + local = (struct in6_addr*)&tmpl->saddr; + } + + x = xfrm6_state_find(remote, local, fl, tmpl, policy, &error); + + if (x && x->km.state == XFRM_STATE_VALID) { + xfrm[nx++] = x; + daddr = remote; + saddr = local; + continue; + } + + if (x) { + error = (x->km.state == XFRM_STATE_ERROR ? + -EINVAL : -EAGAIN); + xfrm_state_put(x); + } + + if (!tmpl->optional) + goto fail; + } + return nx; + +fail: + for (nx--; nx>=0; nx--) + xfrm_state_put(xfrm[nx]); + return error; +} + +/* Check that the bundle accepts the flow and its components are + * still valid. + */ + +static int xfrm6_bundle_ok(struct xfrm_dst *xdst, struct flowi *fl) +{ + do { + if (xdst->u.dst.ops != &xfrm6_dst_ops) + return 1; + + if (!xfrm6_selector_match(&xdst->u.dst.xfrm->sel, fl)) + return 0; + if (xdst->u.dst.xfrm->km.state != XFRM_STATE_VALID || + xdst->u.dst.path->obsolete > 0) + return 0; + xdst = (struct xfrm_dst*)xdst->u.dst.child; + } while (xdst); + return 0; +} + + +/* Allocate chain of dst_entry's, attach known xfrm's, calculate + * all the metrics... Shortly, bundle a bundle. 
+ */ + +static int +xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int nx, + struct flowi *fl, struct dst_entry **dst_p) +{ + struct dst_entry *dst, *dst_prev; + struct rt6_info *rt0 = (struct rt6_info*)(*dst_p); + struct rt6_info *rt = rt0; + struct in6_addr *remote = fl->fl6_dst; + struct in6_addr *local = fl->fl6_src; + int i; + int err = 0; + int header_len = 0; + + dst = dst_prev = NULL; + + for (i = 0; i < nx; i++) { + struct dst_entry *dst1 = dst_alloc(&xfrm6_dst_ops); + + if (unlikely(dst1 == NULL)) { + err = -ENOBUFS; + goto error; + } + + dst1->xfrm = xfrm[i]; + if (!dst) + dst = dst1; + else { + dst_prev->child = dst1; + dst1->flags |= DST_NOHASH; + dst_clone(dst1); + } + dst_prev = dst1; + if (xfrm[i]->props.mode) { + remote = (struct in6_addr*)&xfrm[i]->id.daddr; + local = (struct in6_addr*)&xfrm[i]->props.saddr; + } + header_len += xfrm[i]->props.header_len; + } + + if (remote != fl->fl6_dst) { + struct flowi fl_tunnel; + memset(&fl_tunnel, 0, sizeof(fl_tunnel)); + fl_tunnel.fl6_dst = remote; + fl_tunnel.fl6_src = local; + + rt = (struct rt6_info*)ip6_route_output(NULL, &fl_tunnel); + if (err) + goto error; + } else { + dst_clone(&rt->u.dst); + } + + dst_prev->child = &rt->u.dst; + for (dst_prev = dst; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) { + struct xfrm_dst *x = (struct xfrm_dst*)dst_prev; + x->u.rt.fl = *fl; + + dst_prev->dev = rt->u.dst.dev; + if (rt->u.dst.dev) + dev_hold(rt->u.dst.dev); + dst_prev->obsolete = -1; + dst_prev->flags |= DST_HOST; + dst_prev->lastuse = jiffies; + dst_prev->header_len = header_len; + memcpy(&dst_prev->metrics, &rt->u.dst.metrics, sizeof(dst_prev->metrics)); + dst_prev->path = &rt->u.dst; + + /* Copy neighbout for reachability confirmation */ + dst_prev->neighbour = neigh_clone(rt->u.dst.neighbour); + dst_prev->input = rt->u.dst.input; + dst_prev->output = dst_prev->xfrm->type->output; + /* Sheit... I remember I did this right. 
Apparently, + * it was magically lost, so this code needs audit */ + x->u.rt6.rt6i_flags = rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL); + x->u.rt6.rt6i_metric = rt0->rt6i_metric; + x->u.rt6.rt6i_node = rt0->rt6i_node; + x->u.rt6.rt6i_hoplimit = rt0->rt6i_hoplimit; + x->u.rt6.rt6i_gateway = rt0->rt6i_gateway; + memcpy(&x->u.rt6.rt6i_gateway, &rt0->rt6i_gateway, sizeof(x->u.rt6.rt6i_gateway)); + header_len -= x->u.dst.xfrm->props.header_len; + } + *dst_p = dst; + return 0; + +error: + if (dst) + dst_free(dst); + return err; +} + +static inline int xfrm6_garbage_collect(void) +{ + __xfrm_garbage_collect(); + return (atomic_read(&xfrm6_dst_ops.entries) > xfrm6_dst_ops.gc_thresh*2); +} + +static int bundle6_depends_on(struct dst_entry *dst, struct xfrm_state *x) +{ + do { + if (dst->xfrm == x) + return 1; + } while ((dst = dst->child) != NULL); + return 0; +} + +int xfrm6_flush_bundles(struct xfrm_state *x) +{ + int i; + struct xfrm_policy *pol; + struct dst_entry *dst, **dstp, *gc_list = NULL; + + read_lock_bh(&xfrm_policy_lock); + for (i=0; i<2*XFRM_POLICY_MAX; i++) { + for (pol = xfrm_policy_list[i]; pol; pol = pol->next) { + write_lock(&pol->lock); + dstp = &pol->bundles; + while ((dst=*dstp) != NULL) { + if (bundle6_depends_on(dst, x)) { + *dstp = dst->next; + dst->next = gc_list; + gc_list = dst; + } else { + dstp = &dst->next; + } + } + write_unlock(&pol->lock); + } + } + read_unlock_bh(&xfrm_policy_lock); + + while (gc_list) { + dst = gc_list; + gc_list = dst->next; + dst_free(dst); + } + + return 0; +} + +static void xfrm6_update_pmtu(struct dst_entry *dst, u32 mtu) +{ + struct dst_entry *path = dst->path; + + if (mtu >= 1280 && mtu < dst_pmtu(dst)) + return; + + path->ops->update_pmtu(path, mtu); +} + +/* Well... that's _TASK_. We need to scan through transformation + * list and figure out what mss tcp should generate in order to + * final datagram fit to mtu. Mama mia... 
:-) + * + * Apparently, some easy way exists, but we used to choose the most + * bizarre ones. :-) So, raising Kalashnikov... tra-ta-ta. + * + * Consider this function as something like dark humour. :-) + */ +static int xfrm6_get_mss(struct dst_entry *dst, u32 mtu) +{ + int res = mtu - dst->header_len; + + for (;;) { + struct dst_entry *d = dst; + int m = res; + + do { + struct xfrm_state *x = d->xfrm; + if (x) { + spin_lock_bh(&x->lock); + if (x->km.state == XFRM_STATE_VALID && + x->type && x->type->get_max_size) + m = x->type->get_max_size(d->xfrm, m); + else + m += x->props.header_len; + spin_unlock_bh(&x->lock); + } + } while ((d = d->child) != NULL); + + if (m <= mtu) + break; + res -= (m - mtu); + if (res < 88) + return mtu; + } + + return res + dst->header_len; +} + +struct dst_ops xfrm6_dst_ops = { + .family = AF_INET6, + .protocol = __constant_htons(ETH_P_IPV6), + .gc = xfrm6_garbage_collect, + .check = xfrm_dst_check, + .destroy = xfrm_dst_destroy, + .negative_advice = xfrm_negative_advice, + .link_failure = xfrm_link_failure, + .update_pmtu = xfrm6_update_pmtu, + .get_mss = xfrm6_get_mss, + .gc_thresh = 1024, + .entry_size = sizeof(struct xfrm_dst), +}; + +#endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv4/xfrm_state.c linux25/net/ipv4/xfrm_state.c --- linux-2.5.62+cs1.1002/net/ipv4/xfrm_state.c 2003-02-23 17:53:46.000000000 +0900 +++ linux25/net/ipv4/xfrm_state.c 2003-02-23 13:25:00.000000000 +0900 @@ -1,3 +1,11 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + #include #include #include @@ -165,8 +173,19 @@ spin_unlock(&xfrm_state_lock); if (del_timer(&x->timer)) atomic_dec(&x->refcnt); - if (atomic_read(&x->refcnt) != 1) - xfrm_flush_bundles(x); + if (atomic_read(&x->refcnt) != 1) { + switch (x->props.family) { + case AF_INET: + xfrm_flush_bundles(x); + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: + 
xfrm6_flush_bundles(x); + break; +#endif + default:; + } + } } if (kill && x->type) @@ -290,6 +309,7 @@ x->props.saddr.xfrm4_addr = saddr; x->props.mode = tmpl->mode; x->props.reqid = tmpl->reqid; + x->props.family = AF_INET; if (km_query(x, tmpl, pol) == 0) { x->km.state = XFRM_STATE_ACQ; @@ -322,10 +342,18 @@ { unsigned h = 0; - if (x->props.family == AF_INET) + switch (x->props.family) { + case AF_INET: h = ntohl(x->id.daddr.xfrm4_addr); - else if (x->props.family == AF_INET6) + break; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + case AF_INET6: h = ntohl(x->id.daddr.a6[2]^x->id.daddr.a6[3]); + break; +#endif + default: + return; + } h = (h ^ (h>>16)) % XFRM_DST_HSIZE; @@ -448,6 +476,7 @@ x0->props.family = AF_INET; x0->props.mode = mode; x0->props.reqid = reqid; + x0->props.family = AF_INET; x0->lft.hard_add_expires_seconds = ACQ_EXPIRES; atomic_inc(&x0->refcnt); mod_timer(&x0->timer, jiffies + ACQ_EXPIRES*HZ); @@ -836,4 +865,114 @@ wake_up(&km_waitq); } } + +struct xfrm_state * +xfrm6_state_find(struct in6_addr *daddr, struct in6_addr *saddr, struct flowi *fl, struct xfrm_tmpl *tmpl, + struct xfrm_policy *pol, int *err) +{ + unsigned h = ntohl(daddr->s6_addr32[2]^daddr->s6_addr32[3]); + struct xfrm_state *x = NULL; + int acquire_in_progress = 0; + int error = 0; + struct xfrm_state *best = NULL; + + h = (h ^ (h>>16)) % XFRM_DST_HSIZE; + + spin_lock_bh(&xfrm_state_lock); + list_for_each_entry(x, xfrm_state_bydst+h, bydst) { + if (x->props.family == AF_INET6&& + !memcmp(daddr, &x->id.daddr, sizeof(*daddr)) && + x->props.reqid == tmpl->reqid && + (!memcmp(saddr, &x->props.saddr, sizeof(*saddr))|| ipv6_addr_any(saddr)) && + tmpl->mode == x->props.mode && + tmpl->id.proto == x->id.proto) { + /* Resolution logic: + 1. There is a valid state with matching selector. + Done. + 2. Valid state with inappropriate selector. Skip. + + Entering area of "sysdeps". + + 3. 
If state is not valid, selector is temporary, + it selects only session which triggered + previous resolution. Key manager will do + something to install a state with proper + selector. + */ + if (x->km.state == XFRM_STATE_VALID) { + if (!xfrm6_selector_match(&x->sel, fl)) + continue; + if (!best || + best->km.dying > x->km.dying || + (best->km.dying == x->km.dying && + best->curlft.add_time < x->curlft.add_time)) + best = x; + } else if (x->km.state == XFRM_STATE_ACQ) { + acquire_in_progress = 1; + } else if (x->km.state == XFRM_STATE_ERROR || + x->km.state == XFRM_STATE_EXPIRED) { + if (xfrm6_selector_match(&x->sel, fl)) + error = 1; + } + } + } + + if (best) { + atomic_inc(&best->refcnt); + spin_unlock_bh(&xfrm_state_lock); + return best; + } + x = NULL; + if (!error && !acquire_in_progress && + ((x = xfrm_state_alloc()) != NULL)) { + /* Initialize temporary selector matching only + * to current session. */ + memcpy(&x->sel.daddr, fl->fl6_dst, sizeof(struct in6_addr)); + memcpy(&x->sel.saddr, fl->fl6_src, sizeof(struct in6_addr)); + x->sel.dport = fl->uli_u.ports.dport; + x->sel.dport_mask = ~0; + x->sel.sport = fl->uli_u.ports.sport; + x->sel.sport_mask = ~0; + x->sel.prefixlen_d = 128; + x->sel.prefixlen_s = 128; + x->sel.proto = fl->proto; + x->sel.ifindex = fl->oif; + x->id = tmpl->id; + if (ipv6_addr_any((struct in6_addr*)&x->id.daddr)) + memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr)); + memcpy(&x->props.saddr, &tmpl->saddr, sizeof(x->props.saddr)); + if (ipv6_addr_any((struct in6_addr*)&x->props.saddr)) + memcpy(&x->props.saddr, &saddr, sizeof(x->sel.saddr)); + x->props.mode = tmpl->mode; + x->props.reqid = tmpl->reqid; + x->props.family = AF_INET6; + + if (km_query(x, tmpl, pol) == 0) { + x->km.state = XFRM_STATE_ACQ; + list_add_tail(&x->bydst, xfrm_state_bydst+h); + atomic_inc(&x->refcnt); + if (x->id.spi) { + struct in6_addr *addr = (struct in6_addr*)&x->id.daddr; + h = ntohl((addr->s6_addr32[2]^addr->s6_addr32[3])^x->id.spi^x->id.proto); + h = (h ^ 
(h>>10) ^ (h>>20)) % XFRM_DST_HSIZE; + list_add(&x->byspi, xfrm_state_byspi+h); + atomic_inc(&x->refcnt); + } + x->lft.hard_add_expires_seconds = ACQ_EXPIRES; + atomic_inc(&x->refcnt); + mod_timer(&x->timer, ACQ_EXPIRES*HZ); + } else { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + x = NULL; + error = 1; + } + } + spin_unlock_bh(&xfrm_state_lock); + if (!x) + *err = acquire_in_progress ? -EAGAIN : + (error ? -ESRCH : -ENOMEM); + return x; +} + #endif /* CONFIG_IPV6 || CONFIG_IPV6_MODULE */ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/Makefile linux25/net/ipv6/Makefile --- linux-2.5.62+cs1.1002/net/ipv6/Makefile 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/Makefile 2003-02-23 15:26:11.000000000 +0900 @@ -10,4 +10,6 @@ exthdrs.o sysctl_net_ipv6.o datagram.o proc.o \ ip6_flowlabel.o ipv6_syms.o +obj-$(CONFIG_INET_AH) += ah.o +obj-$(CONFIG_INET_ESP) += esp.o obj-$(CONFIG_NETFILTER) += netfilter/ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/ah.c linux25/net/ipv6/ah.c --- linux-2.5.62+cs1.1002/net/ipv6/ah.c 1970-01-01 09:00:00.000000000 +0900 +++ linux25/net/ipv6/ah.c 2003-02-23 20:52:24.000000000 +0900 @@ -0,0 +1,345 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define AH_HLEN_NOICV 12 + +/* XXX no ipv6 ah specific */ +#define NIP6(addr) \ + ntohs((addr).s6_addr16[0]),\ + ntohs((addr).s6_addr16[1]),\ + ntohs((addr).s6_addr16[2]),\ + ntohs((addr).s6_addr16[3]),\ + ntohs((addr).s6_addr16[4]),\ + ntohs((addr).s6_addr16[5]),\ + ntohs((addr).s6_addr16[6]),\ + ntohs((addr).s6_addr16[7]) + +int ah6_output(struct sk_buff *skb) +{ + int err; + int hdr_len = sizeof(struct ipv6hdr); + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph = NULL; + struct ip_auth_hdr *ah; + struct ah_data *ahp; + u16 nh_offset = 0; + u8 
nexthdr; +printk(KERN_DEBUG "%s\n", __FUNCTION__); + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) + return -EINVAL; + + spin_lock_bh(&x->lock); + if ((err = xfrm_state_check_expire(x)) != 0) + goto error; + if ((err = xfrm_state_check_space(x, skb)) != 0) + goto error; + + if (x->props.mode) { + iph = skb->nh.ipv6h; + skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + skb->nh.ipv6h->version = 6; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + skb->nh.ipv6h->nexthdr = IPPROTO_AH; + memcpy(&skb->nh.ipv6h->saddr, &x->props.saddr, sizeof(struct in6_addr)); + memcpy(&skb->nh.ipv6h->daddr, &x->id.daddr, sizeof(struct in6_addr)); + ah = (struct ip_auth_hdr*)(skb->nh.ipv6h+1); + ah->nexthdr = IPPROTO_IPV6; + } else { + hdr_len = skb->h.raw - skb->nh.raw; + iph = kmalloc(hdr_len, GFP_ATOMIC); + if (!iph) { + err = -ENOMEM; + goto error; + } + memcpy(iph, skb->data, hdr_len); + skb->nh.ipv6h = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + memcpy(skb->nh.ipv6h, iph, hdr_len); + nexthdr = xfrm6_clear_mutable_options(skb, &nh_offset, XFRM_POLICY_OUT); + if (nexthdr == 0) + goto error; + + skb->nh.raw[nh_offset] = IPPROTO_AH; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + ah = (struct ip_auth_hdr*)(skb->nh.raw+hdr_len); + skb->h.raw = (unsigned char*) ah; + ah->nexthdr = nexthdr; + } + + skb->nh.ipv6h->priority = 0; + skb->nh.ipv6h->flow_lbl[0] = 0; + skb->nh.ipv6h->flow_lbl[1] = 0; + skb->nh.ipv6h->flow_lbl[2] = 0; + skb->nh.ipv6h->hop_limit = 0; + + ahp = x->data; + ah->hdrlen = (XFRM_ALIGN8(ahp->icv_trunc_len + + AH_HLEN_NOICV) >> 2) - 2; + + ah->reserved = 0; + ah->spi = x->id.spi; + ah->seq_no = htonl(++x->replay.oseq); + ahp->icv(ahp, skb, ah->auth_data); + + if (x->props.mode) { + skb->nh.ipv6h->hop_limit = iph->hop_limit; + skb->nh.ipv6h->priority = iph->priority; + skb->nh.ipv6h->flow_lbl[0] = iph->flow_lbl[0]; + skb->nh.ipv6h->flow_lbl[1] = iph->flow_lbl[1]; + 
skb->nh.ipv6h->flow_lbl[2] = iph->flow_lbl[2]; + } else { + memcpy(skb->nh.ipv6h, iph, hdr_len); + skb->nh.raw[nh_offset] = IPPROTO_AH; + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + kfree (iph); + } + + skb->nh.raw = skb->data; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock_bh(&x->lock); + if ((skb->dst = dst_pop(dst)) == NULL) + goto error_nolock; + return NET_XMIT_BYPASS; +error: + spin_unlock_bh(&x->lock); +error_nolock: + kfree_skb(skb); + return err; +} + +int ah6_input(struct xfrm_state *x, struct sk_buff *skb) +{ + int ah_hlen; + struct ipv6hdr *iph; + struct ipv6_auth_hdr *ah; + struct ah_data *ahp; + unsigned char *tmp_hdr = NULL; + int hdr_len = skb->h.raw - skb->nh.raw; + u8 nexthdr = 0; + + if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr))) + goto out; + + ah = (struct ipv6_auth_hdr*)skb->data; + ahp = x->data; + ah_hlen = (ah->hdrlen + 2) << 2; + + if (ah_hlen != XFRM_ALIGN8(ahp->icv_full_len + AH_HLEN_NOICV) && + ah_hlen != XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV)) + goto out; + + if (!pskb_may_pull(skb, ah_hlen)) + goto out; + + /* We are going to _remove_ AH header to keep sockets happy, + * so... Later this can change. 
*/ + if (skb_cloned(skb) && + pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) + goto out; + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto out; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + ah = (struct ipv6_auth_hdr*)skb->data; + iph = skb->nh.ipv6h; + + { + u8 auth_data[ahp->icv_trunc_len]; + + memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len); + skb_push(skb, skb->data - skb->nh.raw); + ahp->icv(ahp, skb, ah->auth_data); + if (memcmp(ah->auth_data, auth_data, ahp->icv_trunc_len)) { + if (net_ratelimit()) + printk(KERN_WARNING "ipsec ah authentication error\n"); + x->stats.integrity_failed++; + goto free_out; + } + } + + nexthdr = ah->nexthdr; + skb->nh.raw = skb_pull(skb, (ah->hdrlen+2)<<2); + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + skb->nh.ipv6h->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + skb_pull(skb, hdr_len); + skb->h.raw = skb->data; + + + kfree(tmp_hdr); + + return nexthdr; + +free_out: + kfree(tmp_hdr); +out: + return -EINVAL; +} + +void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+offset); + struct xfrm_state *x; + + if (type != ICMPV6_DEST_UNREACH || + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, ah->spi, IPPROTO_AH); + if (!x) + return; + + printk(KERN_DEBUG "pmtu discvovery on SA AH/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(ah->spi), NIP6(iph->daddr)); + + xfrm_state_put(x); +} + +static int ah6_init_state(struct xfrm_state *x, void *args) +{ + struct ah_data *ahp = NULL; + struct xfrm_algo_desc *aalg_desc; + + /* null auth can use a zero length key */ + if (x->aalg->alg_key_len > 512) + goto error; + + ahp = kmalloc(sizeof(*ahp), GFP_KERNEL); + if (ahp == NULL) + return -ENOMEM; + + memset(ahp, 0, sizeof(*ahp)); + + ahp->key = x->aalg->alg_key; + ahp->key_len = (x->aalg->alg_key_len+7)/8; + ahp->tfm = 
crypto_alloc_tfm(x->aalg->alg_name, 0); + if (!ahp->tfm) + goto error; + ahp->icv = ah_hmac_digest; + + /* + * Lookup the algorithm description maintained by xfrm_algo, + * verify crypto transform properties, and store information + * we need for AH processing. This lookup cannot fail here + * after a successful crypto_alloc_tfm(). + */ + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + crypto_tfm_alg_digestsize(ahp->tfm)) { + printk(KERN_INFO "AH: %s digestsize %u != %hu\n", + x->aalg->alg_name, crypto_tfm_alg_digestsize(ahp->tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + ahp->icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + ahp->work_icv = kmalloc(ahp->icv_full_len, GFP_KERNEL); + if (!ahp->work_icv) + goto error; + + x->props.header_len = XFRM_ALIGN8(ahp->icv_trunc_len + AH_HLEN_NOICV); + if (x->props.mode) + x->props.header_len += 20; + x->data = ahp; + + return 0; + +error: + if (ahp) { + if (ahp->work_icv) + kfree(ahp->work_icv); + if (ahp->tfm) + crypto_free_tfm(ahp->tfm); + kfree(ahp); + } + return -EINVAL; +} + +static void ah6_destroy(struct xfrm_state *x) +{ + struct ah_data *ahp = x->data; + + if (ahp->work_icv) { + kfree(ahp->work_icv); + ahp->work_icv = NULL; + } + if (ahp->tfm) { + crypto_free_tfm(ahp->tfm); + ahp->tfm = NULL; + } +} + +static struct xfrm_type ah6_type = +{ + .description = "AH6", + .proto = IPPROTO_AH, + .init_state = ah6_init_state, + .destructor = ah6_destroy, + .input = ah6_input, + .output = ah6_output +}; + +static struct inet6_protocol ah6_protocol = { + .handler = xfrm6_rcv, + .err_handler = ah6_err, +}; + +int __init ah6_init(void) +{ + SET_MODULE_OWNER(&ah6_type); + + if (xfrm6_register_type(&ah6_type) < 0) { + printk(KERN_INFO "ipv6 ah init: can't add xfrm type\n"); + return -EAGAIN; + } + + if (inet6_add_protocol(&ah6_protocol, IPPROTO_AH) < 0) { + 
printk(KERN_INFO "ipv6 ah init: can't add protocol\n"); + xfrm6_unregister_type(&ah6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit ah6_fini(void) +{ + if (inet6_del_protocol(&ah6_protocol, IPPROTO_AH) < 0) + printk(KERN_INFO "ipv6 ah close: can't remove protocol\n"); + + if (xfrm6_unregister_type(&ah6_type) < 0) + printk(KERN_INFO "ipv6 ah close: can't remove xfrm type\n"); + +} + +module_init(ah6_init); +module_exit(ah6_fini); + +MODULE_LICENSE("GPL"); diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/esp.c linux25/net/ipv6/esp.c --- linux-2.5.62+cs1.1002/net/ipv6/esp.c 1970-01-01 09:00:00.000000000 +0900 +++ linux25/net/ipv6/esp.c 2003-02-23 20:52:24.000000000 +0900 @@ -0,0 +1,508 @@ +/* Changes + * + * Mitsuru KANDA @USAGI : IPv6 Support + * Kazunori MIYAZAWA @USAGI : + * Kunihiro Ishiguro : + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAX_SG_ONSTACK 4 + +/* BUGS: + * - we assume replay seqno is always present. + */ + +/* Move to common area: it is shared with AH. */ +/* Common with AH after some work on arguments. 
*/ + +/* XXX no ipv6 esp specific */ +#define NIP6(addr) \ + ntohs((addr).s6_addr16[0]),\ + ntohs((addr).s6_addr16[1]),\ + ntohs((addr).s6_addr16[2]),\ + ntohs((addr).s6_addr16[3]),\ + ntohs((addr).s6_addr16[4]),\ + ntohs((addr).s6_addr16[5]),\ + ntohs((addr).s6_addr16[6]),\ + ntohs((addr).s6_addr16[7]) + +static int get_offset(u8 *packet, u32 packet_len, u8 *nexthdr, struct ipv6_opt_hdr **prevhdr) +{ + u16 offset = sizeof(struct ipv6hdr); + struct ipv6_opt_hdr *exthdr = (struct ipv6_opt_hdr*)(packet + offset); + u8 nextnexthdr; + + *nexthdr = ((struct ipv6hdr*)packet)->nexthdr; + + while (offset + 1 < packet_len) { + + switch (*nexthdr) { + + case NEXTHDR_HOP: + case NEXTHDR_ROUTING: + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + case NEXTHDR_DEST: + nextnexthdr = + ((struct ipv6_opt_hdr*)(packet + offset + ipv6_optlen(exthdr)))->nexthdr; + /* XXX We know the option is inner dest opt + with next next header check. */ + if (nextnexthdr != NEXTHDR_HOP && + nextnexthdr != NEXTHDR_ROUTING && + nextnexthdr != NEXTHDR_DEST) { + return offset; + } + offset += ipv6_optlen(exthdr); + *nexthdr = exthdr->nexthdr; + *prevhdr = exthdr; + exthdr = (struct ipv6_opt_hdr*)(packet + offset); + break; + + default : + return offset; + } + } + + return offset; +} + +int esp6_output(struct sk_buff *skb) +{ + int err; + int hdr_len = 0; + struct dst_entry *dst = skb->dst; + struct xfrm_state *x = dst->xfrm; + struct ipv6hdr *iph = NULL, *top_iph; + struct ip_esp_hdr *esph; + struct crypto_tfm *tfm; + struct esp_data *esp; + struct sk_buff *trailer; + struct ipv6_opt_hdr *prevhdr = NULL; + int blksize; + int clen; + int alen; + int nfrags; + u8 nexthdr; +printk(KERN_DEBUG "%s\n", __FUNCTION__); + /* First, if the skb is not checksummed, complete checksum. 
*/ + if (skb->ip_summed == CHECKSUM_HW && skb_checksum_help(skb) == NULL) + return -EINVAL; + + spin_lock_bh(&x->lock); + if ((err = xfrm_state_check_expire(x)) != 0) + goto error; + if ((err = xfrm_state_check_space(x, skb)) != 0) + goto error; + + err = -ENOMEM; + + /* Strip IP header in transport mode. Save it. */ + + if (!x->props.mode) { + hdr_len = get_offset(skb->nh.raw, skb->len, &nexthdr, &prevhdr); + iph = kmalloc(hdr_len, GFP_ATOMIC); + if (!iph) { + err = -ENOMEM; + goto error; + } + memcpy(iph, skb->nh.raw, hdr_len); + __skb_pull(skb, hdr_len); + } + + /* Now skb is pure payload to encrypt */ + + /* Round to block size */ + clen = skb->len; + + esp = x->data; + alen = esp->auth.icv_trunc_len; + tfm = esp->conf.tfm; + blksize = crypto_tfm_alg_blocksize(tfm); + clen = (clen + 2 + blksize-1)&~(blksize-1); + if (esp->conf.padlen) + clen = (clen + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + if ((nfrags = skb_cow_data(skb, clen-skb->len+alen, &trailer)) < 0) { + if (!x->props.mode && iph) kfree(iph); + goto error; + } + + /* Fill padding... 
*/ + do { + int i; + for (i=0; i<clen-skb->len - 2; i++) + *(u8*)(trailer->tail + i) = i+1; + } while (0); + *(u8*)(trailer->tail + clen-skb->len - 2) = (clen - skb->len)-2; + pskb_put(skb, trailer, clen - skb->len); + + if (x->props.mode) { + iph = skb->nh.ipv6h; + top_iph = (struct ipv6hdr*)skb_push(skb, x->props.header_len); + esph = (struct ip_esp_hdr*)(top_iph+1); + *(u8*)(trailer->tail - 1) = IPPROTO_IPV6; + top_iph->version = 6; + top_iph->priority = iph->priority; + top_iph->flow_lbl[0] = iph->flow_lbl[0]; + top_iph->flow_lbl[1] = iph->flow_lbl[1]; + top_iph->flow_lbl[2] = iph->flow_lbl[2]; + top_iph->nexthdr = IPPROTO_ESP; + top_iph->payload_len = htons(skb->len + alen); + top_iph->hop_limit = iph->hop_limit; + memcpy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr, sizeof(struct ipv6hdr)); + memcpy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr, sizeof(struct ipv6hdr)); + } else { + /* XXX exthdr */ + esph = (struct ip_esp_hdr*)skb_push(skb, x->props.header_len); + skb->h.raw = (unsigned char*)esph; + top_iph = (struct ipv6hdr*)skb_push(skb, hdr_len); + memcpy(top_iph, iph, hdr_len); + kfree(iph); + top_iph->payload_len = htons(skb->len + alen - sizeof(struct ipv6hdr)); + if (prevhdr) { + prevhdr->nexthdr = IPPROTO_ESP; + } else { + top_iph->nexthdr = IPPROTO_ESP; + } + *(u8*)(trailer->tail - 1) = nexthdr; + } + + esph->spi = x->id.spi; + esph->seq_no = htonl(++x->replay.oseq); + + if (esp->conf.ivlen) + crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + + do { + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 
0 : nfrags]; + struct scatterlist *sg = sgbuf; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto error; + } + skb_to_sgvec(skb, sg, esph->enc_data+esp->conf.ivlen-skb->data, clen); + crypto_cipher_encrypt(tfm, sg, sg, clen); + if (unlikely(sg != sgbuf)) + kfree(sg); + } while (0); + + if (esp->conf.ivlen) { + memcpy(esph->enc_data, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + crypto_cipher_get_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + } + + if (esp->auth.icv_full_len) { + esp->auth.icv(esp, skb, (u8*)esph-skb->data, + 8+esp->conf.ivlen+clen, trailer->tail); + pskb_put(skb, trailer, alen); + } + + skb->nh.raw = skb->data; + + x->curlft.bytes += skb->len; + x->curlft.packets++; + spin_unlock_bh(&x->lock); + if ((skb->dst = dst_pop(dst)) == NULL) + goto error_nolock; + return NET_XMIT_BYPASS; + +error: + spin_unlock_bh(&x->lock); +error_nolock: + kfree_skb(skb); + return err; +} + +int esp6_input(struct xfrm_state *x, struct sk_buff *skb) +{ + struct ipv6hdr *iph; + struct ip_esp_hdr *esph; + struct esp_data *esp = x->data; + struct sk_buff *trailer; + int blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + int alen = esp->auth.icv_trunc_len; + int elen = skb->len - 8 - esp->conf.ivlen - alen; + + int hdr_len = skb->h.raw - skb->nh.raw; + int nfrags; + u8 ret_nexthdr = 0; + unsigned char *tmp_hdr = NULL; + + if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr))) + goto out; + + if (elen <= 0 || (elen & (blksize-1))) + goto out; + + tmp_hdr = kmalloc(hdr_len, GFP_ATOMIC); + if (!tmp_hdr) + goto out; + memcpy(tmp_hdr, skb->nh.raw, hdr_len); + + /* If integrity check is required, do this. 
*/ + if (esp->auth.icv_full_len) { + u8 sum[esp->auth.icv_full_len]; + u8 sum1[alen]; + + esp->auth.icv(esp, skb, 0, skb->len-alen, sum); + + if (skb_copy_bits(skb, skb->len-alen, sum1, alen)) + BUG(); + + if (unlikely(memcmp(sum, sum1, alen))) { + x->stats.integrity_failed++; + goto out; + } + } + + if ((nfrags = skb_cow_data(skb, 0, &trailer)) < 0) + goto out; + + skb->ip_summed = CHECKSUM_NONE; + + esph = (struct ip_esp_hdr*)skb->data; + iph = skb->nh.ipv6h; + + /* Get ivec. This can be wrong, check against another impls. */ + if (esp->conf.ivlen) + crypto_cipher_set_iv(esp->conf.tfm, esph->enc_data, crypto_tfm_alg_ivsize(esp->conf.tfm)); + + { + u8 nexthdr[2]; + struct scatterlist sgbuf[nfrags>MAX_SG_ONSTACK ? 0 : nfrags]; + struct scatterlist *sg = sgbuf; + u8 padlen; + + if (unlikely(nfrags > MAX_SG_ONSTACK)) { + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto out; + } + skb_to_sgvec(skb, sg, 8+esp->conf.ivlen, elen); + crypto_cipher_decrypt(esp->conf.tfm, sg, sg, elen); + if (unlikely(sg != sgbuf)) + kfree(sg); + + if (skb_copy_bits(skb, skb->len-alen-2, nexthdr, 2)) + BUG(); + + padlen = nexthdr[0]; + if (padlen+2 >= elen) { + if (net_ratelimit()) { + printk(KERN_WARNING "ipsec esp packet is garbage padlen=%d, elen=%d\n", padlen+2, elen); + } + goto out; + } + /* ... check padding bits here. Silly. :-) */ + + ret_nexthdr = nexthdr[1]; + pskb_trim(skb, skb->len - alen - padlen - 2); + skb->h.raw = skb_pull(skb, 8 + esp->conf.ivlen); + skb->nh.raw += 8 + esp->conf.ivlen; + memcpy(skb->nh.raw, tmp_hdr, hdr_len); + } + kfree(tmp_hdr); + return ret_nexthdr; + +out: + return -EINVAL; +} + +static u32 esp6_get_max_size(struct xfrm_state *x, int mtu) +{ + struct esp_data *esp = x->data; + u32 blksize = crypto_tfm_alg_blocksize(esp->conf.tfm); + + if (x->props.mode) { + mtu = (mtu + 2 + blksize-1)&~(blksize-1); + } else { + /* The worst case. 
*/ + mtu += 2 + blksize; + } + if (esp->conf.padlen) + mtu = (mtu + esp->conf.padlen-1)&~(esp->conf.padlen-1); + + return mtu + x->props.header_len + esp->auth.icv_full_len; +} + +void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + int type, int code, int offset, __u32 info) +{ + struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; + struct ip_esp_hdr *esph = (struct ip_esp_hdr*)(skb->data+offset); + struct xfrm_state *x; + + if (type != ICMPV6_DEST_UNREACH || + type != ICMPV6_PKT_TOOBIG) + return; + + x = xfrm6_state_lookup(&iph->daddr, esph->spi, IPPROTO_ESP); + if (!x) + return; + printk(KERN_DEBUG "pmtu discvovery on SA ESP/%08x/" + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + ntohl(esph->spi), NIP6(iph->daddr)); + xfrm_state_put(x); +} + +void esp6_destroy(struct xfrm_state *x) +{ + struct esp_data *esp = x->data; + + if (esp->conf.tfm) { + crypto_free_tfm(esp->conf.tfm); + esp->conf.tfm = NULL; + } + if (esp->conf.ivec) { + kfree(esp->conf.ivec); + esp->conf.ivec = NULL; + } + if (esp->auth.tfm) { + crypto_free_tfm(esp->auth.tfm); + esp->auth.tfm = NULL; + } + if (esp->auth.work_icv) { + kfree(esp->auth.work_icv); + esp->auth.work_icv = NULL; + } +} + +int esp6_init_state(struct xfrm_state *x, void *args) +{ + struct esp_data *esp = NULL; + + if (x->aalg) { + if (x->aalg->alg_key_len == 0 || x->aalg->alg_key_len > 512) + goto error; + } + if (x->ealg == NULL || x->ealg->alg_key_len == 0) + goto error; + + esp = kmalloc(sizeof(*esp), GFP_KERNEL); + if (esp == NULL) + return -ENOMEM; + + memset(esp, 0, sizeof(*esp)); + + if (x->aalg) { + struct xfrm_algo_desc *aalg_desc; + + esp->auth.key = x->aalg->alg_key; + esp->auth.key_len = (x->aalg->alg_key_len+7)/8; + esp->auth.tfm = crypto_alloc_tfm(x->aalg->alg_name, 0); + if (esp->auth.tfm == NULL) + goto error; + esp->auth.icv = esp_hmac_digest; + + aalg_desc = xfrm_aalg_get_byname(x->aalg->alg_name); + BUG_ON(!aalg_desc); + + if (aalg_desc->uinfo.auth.icv_fullbits/8 != + 
crypto_tfm_alg_digestsize(esp->auth.tfm)) { + printk(KERN_INFO "ESP: %s digestsize %u != %hu\n", + x->aalg->alg_name, + crypto_tfm_alg_digestsize(esp->auth.tfm), + aalg_desc->uinfo.auth.icv_fullbits/8); + goto error; + } + + esp->auth.icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; + esp->auth.icv_trunc_len = aalg_desc->uinfo.auth.icv_truncbits/8; + + esp->auth.work_icv = kmalloc(esp->auth.icv_full_len, GFP_KERNEL); + if (!esp->auth.work_icv) + goto error; + } + esp->conf.key = x->ealg->alg_key; + esp->conf.key_len = (x->ealg->alg_key_len+7)/8; + esp->conf.tfm = crypto_alloc_tfm(x->ealg->alg_name, CRYPTO_TFM_MODE_CBC); + if (esp->conf.tfm == NULL) + goto error; + esp->conf.ivlen = crypto_tfm_alg_ivsize(esp->conf.tfm); + esp->conf.padlen = 0; + if (esp->conf.ivlen) { + esp->conf.ivec = kmalloc(esp->conf.ivlen, GFP_KERNEL); + get_random_bytes(esp->conf.ivec, esp->conf.ivlen); + } + crypto_cipher_setkey(esp->conf.tfm, esp->conf.key, esp->conf.key_len); + x->props.header_len = 8 + esp->conf.ivlen; + if (x->props.mode) + x->props.header_len += 40; /* XXX ext hdr */ + x->data = esp; + return 0; + +error: + if (esp) { + if (esp->auth.tfm) + crypto_free_tfm(esp->auth.tfm); + if (esp->auth.work_icv) + kfree(esp->auth.work_icv); + if (esp->conf.tfm) + crypto_free_tfm(esp->conf.tfm); + kfree(esp); + } + return -EINVAL; +} + +static struct xfrm_type esp6_type = +{ + .description = "ESP6", + .proto = IPPROTO_ESP, + .init_state = esp6_init_state, + .destructor = esp6_destroy, + .get_max_size = esp6_get_max_size, + .input = esp6_input, + .output = esp6_output +}; + +static struct inet6_protocol esp6_protocol = { + .handler = xfrm6_rcv, + .err_handler = esp6_err, +}; + +int __init esp6_init(void) +{ + SET_MODULE_OWNER(&esp6_type); + if (xfrm6_register_type(&esp6_type) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add xfrm type\n"); + return -EAGAIN; + } + if (inet6_add_protocol(&esp6_protocol, IPPROTO_ESP) < 0) { + printk(KERN_INFO "ipv6 esp init: can't add protocol\n"); + 
xfrm6_unregister_type(&esp6_type); + return -EAGAIN; + } + + return 0; +} + +static void __exit esp6_fini(void) +{ + if (inet6_del_protocol(&esp6_protocol, IPPROTO_ESP) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove protocol\n"); + if (xfrm6_unregister_type(&esp6_type) < 0) + printk(KERN_INFO "ipv6 esp close: can't remove xfrm type\n"); +} + +module_init(esp6_init); +module_exit(esp6_fini); + +MODULE_LICENSE("GPL"); diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/exthdrs.c linux25/net/ipv6/exthdrs.c --- linux-2.5.62+cs1.1002/net/ipv6/exthdrs.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/exthdrs.c 2003-02-23 13:25:00.000000000 +0900 @@ -392,7 +392,7 @@ cpu ticks, checking that sender did not something stupid and opt->hdrlen is even. Shit! --ANK (980730) */ - +#if 0 static int ipv6_auth_hdr(struct sk_buff **skb_ptr, int nhoff) { struct sk_buff *skb=*skb_ptr; @@ -424,6 +424,7 @@ kfree_skb(skb); return -1; } +#endif /* This list MUST NOT contain entry for NEXTHDR_HOP. It is parsed immediately after packet received @@ -436,7 +437,9 @@ {NEXTHDR_ROUTING, ipv6_routing_header}, {NEXTHDR_DEST, ipv6_dest_opt}, {NEXTHDR_NONE, ipv6_nodata}, + /* {NEXTHDR_AUTH, ipv6_auth_hdr}, + */ /* {NEXTHDR_ESP, ipv6_esp_hdr}, */ @@ -627,6 +630,8 @@ { if (opt->auth) prev_hdr = ipv6_build_authhdr(skb, prev_hdr, opt->auth); + + skb->h.raw = skb->tail; if (opt->dst1opt) prev_hdr = ipv6_build_exthdr(skb, prev_hdr, NEXTHDR_DEST, opt->dst1opt); return prev_hdr; @@ -689,8 +694,10 @@ void ipv6_push_frag_opts(struct sk_buff *skb, struct ipv6_txoptions *opt, u8 *proto) { - if (opt->dst1opt) + if (opt->dst1opt) { ipv6_push_exthdr(skb, proto, NEXTHDR_DEST, opt->dst1opt); + skb->h.raw = skb->data; + } if (opt->auth) ipv6_push_authhdr(skb, proto, opt->auth); } diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/ip6_input.c linux25/net/ipv6/ip6_input.c --- linux-2.5.62+cs1.1002/net/ipv6/ip6_input.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/ip6_input.c 2003-02-23 
13:25:00.000000000 +0900 @@ -150,7 +150,8 @@ It would be stupid to detect for optional headers, which are missing with probability of 200% */ - if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP) { + if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP && + nexthdr != NEXTHDR_AUTH && nexthdr != NEXTHDR_ESP) { nhoff = ipv6_parse_exthdrs(&skb, nhoff); if (nhoff < 0) return 0; diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/ip6_output.c linux25/net/ipv6/ip6_output.c --- linux-2.5.62+cs1.1002/net/ipv6/ip6_output.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/ip6_output.c 2003-02-23 13:25:00.000000000 +0900 @@ -192,6 +192,11 @@ int seg_len = skb->len; int hlimit; u32 mtu; + int err = 0; + + if ((err = xfrm_lookup(&skb->dst, fl, sk, 0)) < 0) { + return err; + } if (opt) { int head_room; @@ -576,6 +581,13 @@ } pktlength = length; + if (dst) { + if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0) { + dst_release(dst); + return -ENETUNREACH; + } + } + if (hlimit < 0) { if (ipv6_addr_is_multicast(fl->fl6_dst)) hlimit = np->mcast_hops; @@ -630,10 +642,8 @@ err = 0; if (flags&MSG_PROBE) goto out; - - skb = sock_alloc_send_skb(sk, pktlength + 15 + - dev->hard_header_len, - flags & MSG_DONTWAIT, &err); + /* alloc skb with mtu as we do in the IPv4 stack for IPsec */ + skb = sock_alloc_send_skb(sk, mtu, flags & MSG_DONTWAIT, &err); if (skb == NULL) { IP6_INC_STATS(Ip6OutDiscards); @@ -663,6 +673,8 @@ err = getfrag(data, &hdr->saddr, ((char *) hdr) + (pktlength - length), 0, length); + if (!opt || !opt->dst1opt) + skb->h.raw = ((char *) hdr) + (pktlength - length); if (!err) { IP6_INC_STATS(Ip6OutRequests); diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/ndisc.c linux25/net/ipv6/ndisc.c --- linux-2.5.62+cs1.1002/net/ipv6/ndisc.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/ndisc.c 2003-02-23 13:25:00.000000000 +0900 @@ -72,6 +72,7 @@ #include #include +#include #include #include @@ -336,8 +337,6 @@ unsigned char ha[MAX_ADDR_LEN]; unsigned char *h_dest = NULL; - 
skb_reserve(skb, (dev->hard_header_len + 15) & ~15); - if (dev->hard_header) { if (ipv6_addr_type(daddr) & IPV6_ADDR_MULTICAST) { ndisc_mc_map(daddr, ha, dev, 1); @@ -374,10 +373,50 @@ * Send a Neighbour Advertisement */ +int ndisc_output(struct sk_buff *skb) +{ + if (skb) { + struct neighbour *neigh = (skb->dst ? skb->dst->neighbour : NULL); + if (ndisc_build_ll_hdr(skb, skb->dev, &skb->nh.ipv6h->daddr, neigh, skb->len) == 0) { + kfree_skb(skb); + return -EINVAL; + } + dev_queue_xmit(skb); + return 0; + } + return -EINVAL; +} + +static inline void ndisc_rt_init(struct rt6_info *rt, struct net_device *dev, + struct neighbour *neigh) +{ + rt->rt6i_dev = dev; + rt->rt6i_nexthop = neigh; + rt->rt6i_expires = 0; + rt->rt6i_flags = RTF_LOCAL; + rt->rt6i_metric = 0; + rt->rt6i_hoplimit = 255; + rt->u.dst.output = ndisc_output; +} + +static inline void ndisc_flow_init(struct flowi *fl, u8 type, + struct in6_addr *saddr, struct in6_addr *daddr) +{ + memset(fl, 0, sizeof(*fl)); + fl->fl6_src = saddr; + fl->fl6_dst = daddr; + fl->proto = IPPROTO_ICMPV6; + fl->uli_u.icmpt.type = type; + fl->uli_u.icmpt.code = 0; +} + static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh, struct in6_addr *daddr, struct in6_addr *solicited_addr, int router, int solicited, int override, int inc_opt) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct nd_msg *msg; int len; @@ -386,6 +425,22 @@ len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_ADVERTISEMENT, solicited_addr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + if (inc_opt) { if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -401,14 +456,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) 
== 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, solicited_addr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *) skb_put(skb, len); + skb->h.raw = (unsigned char*) msg = (struct nd_msg *) skb_put(skb, len); msg->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT; msg->icmph.icmp6_code = 0; @@ -431,7 +482,9 @@ csum_partial((__u8 *) msg, len, 0)); - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborAdvertisements); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -441,6 +494,9 @@ struct in6_addr *solicit, struct in6_addr *daddr, struct in6_addr *saddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct nd_msg *msg; @@ -455,6 +511,22 @@ saddr = &addr_buf; } + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_NEIGHBOUR_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, neigh); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr); send_llinfo = dev->addr_len && ipv6_addr_type(saddr) != IPV6_ADDR_ANY; if (send_llinfo) @@ -467,14 +539,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - msg = (struct nd_msg *)skb_put(skb, len); + skb->h.raw = (unsigned char*) msg = (struct nd_msg *)skb_put(skb, len); msg->icmph.icmp6_type = NDISC_NEIGHBOUR_SOLICITATION; msg->icmph.icmp6_code = 0; msg->icmph.icmp6_cksum = 0; @@ -493,7 +561,9 @@ csum_partial((__u8 *) msg, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutNeighborSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -502,6 +572,9 @@ void ndisc_send_rs(struct net_device *dev, struct in6_addr *saddr, struct in6_addr *daddr) { + struct flowi fl; + struct rt6_info *rt = NULL; + struct dst_entry* dst; struct sock *sk = ndisc_socket->sk; struct sk_buff *skb; struct icmp6hdr *hdr; @@ -509,6 +582,22 @@ int len; int err; + rt = ndisc_get_dummy_rt(); + if (!rt) + return; + + ndisc_flow_init(&fl, NDISC_ROUTER_SOLICITATION, saddr, daddr); + ndisc_rt_init(rt, dev, NULL); + + dst = (struct dst_entry*)rt; + dst_clone(dst); + + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err < 0) { + dst_release(dst); + return; + } + len = sizeof(struct icmp6hdr); if (dev->addr_len) len += NDISC_OPT_SPACE(dev->addr_len); @@ -520,14 +609,10 @@ return; } - if (ndisc_build_ll_hdr(skb, dev, daddr, NULL, len) == 0) { - kfree_skb(skb); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); - hdr = (struct icmp6hdr *) skb_put(skb, len); + skb->h.raw = (unsigned char*) hdr = (struct icmp6hdr *) skb_put(skb, len); hdr->icmp6_type = NDISC_ROUTER_SOLICITATION; hdr->icmp6_code = 0; hdr->icmp6_cksum = 0; @@ -544,7 +629,9 @@ csum_partial((__u8 *) hdr, len, 0)); /* send it! 
*/ - dev_queue_xmit(skb); + dst_clone(dst); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRouterSolicits); ICMP6_INC_STATS(Icmp6OutMsgs); @@ -1126,6 +1213,8 @@ struct in6_addr *addrp; struct net_device *dev; struct rt6_info *rt; + struct dst_entry *dst; + struct flowi fl; u8 *opt; int rd_len; int err; @@ -1137,6 +1226,22 @@ if (rt == NULL) return; + dst = (struct dst_entry*)rt; + + if (ipv6_get_lladdr(dev, &saddr_buf)) { + ND_PRINTK1("redirect: no link_local addr for dev\n"); + return; + } + + ndisc_flow_init(&fl, NDISC_REDIRECT, &saddr_buf, &skb->nh.ipv6h->saddr); + + dst_clone(dst); + err = xfrm_lookup(&dst, &fl, NULL, 0); + if (err) { + dst_release(dst); + return; + } + if (rt->rt6i_flags & RTF_GATEWAY) { ND_PRINTK1("ndisc_send_redirect: not a neighbour\n"); dst_release(&rt->u.dst); @@ -1165,11 +1270,6 @@ rd_len &= ~0x7; len += rd_len; - if (ipv6_get_lladdr(dev, &saddr_buf)) { - ND_PRINTK1("redirect: no link_local addr for dev\n"); - return; - } - buff = sock_alloc_send_skb(sk, MAX_HEADER + len + dev->hard_header_len + 15, 0, &err); if (buff == NULL) { @@ -1179,15 +1279,11 @@ hlen = 0; - if (ndisc_build_ll_hdr(buff, dev, &skb->nh.ipv6h->saddr, NULL, len) == 0) { - kfree_skb(buff); - return; - } - + skb_reserve(skb, (dev->hard_header_len + 15) & ~15); ip6_nd_hdr(sk, buff, dev, &saddr_buf, &skb->nh.ipv6h->saddr, IPPROTO_ICMPV6, len); - icmph = (struct icmp6hdr *) skb_put(buff, len); + skb->h.raw = (unsigned char*) icmph = (struct icmp6hdr *) skb_put(buff, len); memset(icmph, 0, sizeof(struct icmp6hdr)); icmph->icmp6_type = NDISC_REDIRECT; @@ -1225,7 +1321,8 @@ len, IPPROTO_ICMPV6, csum_partial((u8 *) icmph, len, 0)); - dev_queue_xmit(buff); + skb->dst = dst; + dst_output(skb); ICMP6_INC_STATS(Icmp6OutRedirects); ICMP6_INC_STATS(Icmp6OutMsgs); diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/raw.c linux25/net/ipv6/raw.c --- linux-2.5.62+cs1.1002/net/ipv6/raw.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/raw.c 2003-02-23 13:25:00.000000000 
+0900 @@ -45,6 +45,7 @@ #include #include +#include struct sock *raw_v6_htable[RAWV6_HTABLE_SIZE]; rwlock_t raw_v6_lock = RW_LOCK_UNLOCKED; @@ -304,6 +305,11 @@ struct inet_opt *inet = inet_sk(sk); struct raw6_opt *raw_opt = raw6_sk(sk); + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return NET_RX_DROP; + } + if (!raw_opt->checksum) skb->ip_summed = CHECKSUM_UNNECESSARY; diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/route.c linux25/net/ipv6/route.c --- linux-2.5.62+cs1.1002/net/ipv6/route.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/route.c 2003-02-23 13:25:00.000000000 +0900 @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -128,6 +129,12 @@ rwlock_t rt6_lock = RW_LOCK_UNLOCKED; +/* Dummy rt for ndisc */ +struct rt6_info *ndisc_get_dummy_rt() +{ + return dst_alloc(&ip6_dst_ops); +} + /* * Route lookup. Any rt6_lock is implied. */ diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/tcp_ipv6.c linux25/net/ipv6/tcp_ipv6.c --- linux-2.5.62+cs1.1002/net/ipv6/tcp_ipv6.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/tcp_ipv6.c 2003-02-23 13:25:00.000000000 +0900 @@ -51,6 +51,7 @@ #include #include #include +#include #include @@ -678,6 +679,9 @@ fl.nl_u.ip6_u.daddr = rt0->addr; } + if (!fl.fl6_src) + fl.fl6_src = &np->saddr; + dst = ip6_route_output(sk, &fl); if ((err = dst->error) != 0) { @@ -1638,6 +1642,9 @@ if (sk_filter(sk, skb, 0)) goto discard_and_relse; + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) + goto discard_it; + skb->dev = NULL; bh_lock_sock(sk); @@ -1653,6 +1660,9 @@ return ret; no_tcp_socket: + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { bad_packet: TCP_INC_STATS_BH(TcpInErrs); @@ -1672,8 +1682,11 @@ discard_and_relse: sock_put(sk); goto discard_it; - + do_time_wait: + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard_and_relse; + if (skb->len < (th->doff<<2) || 
tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(TcpInErrs); sock_put(sk); diff -ruN -x CVS linux-2.5.62+cs1.1002/net/ipv6/udp.c linux25/net/ipv6/udp.c --- linux-2.5.62+cs1.1002/net/ipv6/udp.c 2003-02-23 17:53:47.000000000 +0900 +++ linux25/net/ipv6/udp.c 2003-02-23 13:25:01.000000000 +0900 @@ -50,6 +50,7 @@ #include #include +#include DEFINE_SNMP_STAT(struct udp_mib, udp_stats_in6); @@ -541,6 +542,11 @@ static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb) { + if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + return -1; + } + #if defined(CONFIG_FILTER) if (sk->filter && skb->ip_summed != CHECKSUM_UNNECESSARY) { if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) { @@ -646,6 +652,9 @@ if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto short_packet; + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) + goto discard; + saddr = &skb->nh.ipv6h->saddr; daddr = &skb->nh.ipv6h->daddr; uh = skb->h.uh; diff -ruN -x CVS linux-2.5.62+cs1.1002/net/netsyms.c linux25/net/netsyms.c --- linux-2.5.62+cs1.1002/net/netsyms.c 2003-02-23 17:53:50.000000000 +0900 +++ linux25/net/netsyms.c 2003-02-23 13:24:59.000000000 +0900 @@ -325,12 +325,15 @@ EXPORT_SYMBOL(xfrm_policy_byid); EXPORT_SYMBOL(xfrm_policy_list); #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) +EXPORT_SYMBOL(xfrm6_state_find); +EXPORT_SYMBOL(xfrm6_rcv); EXPORT_SYMBOL(xfrm6_state_lookup); EXPORT_SYMBOL(xfrm6_find_acq); EXPORT_SYMBOL(xfrm6_alloc_spi); EXPORT_SYMBOL(xfrm6_register_type); EXPORT_SYMBOL(xfrm6_unregister_type); EXPORT_SYMBOL(xfrm6_get_type); +EXPORT_SYMBOL(xfrm6_clear_mutable_options); #endif EXPORT_SYMBOL_GPL(xfrm_probe_algs); From laforge@netfilter.org Sun Feb 23 11:30:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 11:30:46 -0800 (PST) Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1NJU63v002174 for ; Sun, 23 Feb 2003 11:30:08 
-0800 Received: from sunbeam-tap0.de.gnumonks.org ([192.168.200.2] helo=sunbeam.gnumonks.org) by coruscant.gnumonks.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 3.34 #1) id 18n1xo-0002vo-00; Sun, 23 Feb 2003 20:39:04 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.35 #1) id 18n1sZ-0006SA-00; Sun, 23 Feb 2003 20:33:39 +0100 Date: Sun, 23 Feb 2003 20:33:39 +0100 From: Harald Welte To: Erik Hensema Cc: netdev@oss.sgi.com, Netfilter Development Mailinglist Subject: Re: RFC: promote netfilter MARK value from IPv6 packets to sit packets Message-ID: <20030223193339.GD15385@sunbeam.de.gnumonks.org> Mail-Followup-To: Harald Welte , Erik Hensema , netdev@oss.sgi.com, Netfilter Development Mailinglist References: <20030217145727.GA3413@hensema.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="HWvPVVuAAfuRc6SZ" Content-Disposition: inline In-Reply-To: <20030217145727.GA3413@hensema.net> User-Agent: Mutt/1.3.28i X-Operating-System: Linux sunbeam 2.4.20-nfpom X-Date: Today is Prickle-Prickle, the 54th day of Chaos in the YOLD 3169 X-archive-position: 1776 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@netfilter.org Precedence: bulk X-list: netdev --HWvPVVuAAfuRc6SZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 17, 2003 at 03:57:27PM +0100, Erik Hensema wrote: > In order to be able to provide QoS on tunneled IPv6 connections, I've > created a simple patch (definately not ready for inclusion in the kernel, > since it surely needs a configuration option) which promotes the netfilter > MARK value from the IPv6 packets to the sit packets. > Now I can mark packets using ip6tables, and on the ipv4 level I can still > differentiate between the priorities. Problem solved, I'm happy ;-) I like this patch. 
I think we should make it a kernel configuration option, but for all kinds of tunnel interfaces. Something like 'propagate NFMARK while tunneling' (or maybe 'preserve' instead of 'propagate' is better language?) DaveM: Would this be acceptable? > Erik Hensema (erik@hensema.net) -- - Harald Welte http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie --HWvPVVuAAfuRc6SZ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+WSITXaXGVTD0i/8RApb+AKCt4eyvqKsmQmXWic+xzOQvyHlJ3ACeOPeQ 3Nb7Q43Lp+sURhPJT7mQR/0= =a+Jm -----END PGP SIGNATURE----- --HWvPVVuAAfuRc6SZ-- From erik@hensema.net Sun Feb 23 15:33:38 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 15:33:42 -0800 (PST) Received: from dexter.hensema.net (cc78409-a.hnglo1.ov.home.nl [212.120.97.185]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1NNXZ3v006957 for ; Sun, 23 Feb 2003 15:33:36 -0800 Received: from dexter.hensema.net (localhost [127.0.0.1]) by dexter.hensema.net (8.12.3/8.12.3) with ESMTP id h1NNgRmp024638; Mon, 24 Feb 2003 00:42:27 +0100 Received: (from erik@localhost) by dexter.hensema.net (8.12.3/8.12.3/Submit) id h1NNgP4w024637; Mon, 24 Feb 2003 00:42:25 +0100 Date: Mon, 24 Feb 2003 00:42:25 +0100 From: Erik Hensema To: Harald Welte , netdev@oss.sgi.com, Netfilter Development Mailinglist Subject: Re: RFC: promote netfilter MARK value from IPv6 packets to sit packets Message-ID: <20030223234225.GA23556@hensema.net> Reply-To: erik@hensema.net References: 
<20030217145727.GA3413@hensema.net> <20030223193339.GD15385@sunbeam.de.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030223193339.GD15385@sunbeam.de.gnumonks.org> User-Agent: Mutt/1.3.27i X-archive-position: 1777 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: erik@hensema.net Precedence: bulk X-list: netdev On Sun, Feb 23, 2003 at 08:33:39PM +0100, Harald Welte wrote: > On Mon, Feb 17, 2003 at 03:57:27PM +0100, Erik Hensema wrote: > > > In order to be able to provide QoS on tunneled IPv6 connections, I've > > created a simple patch (definately not ready for inclusion in the kernel, > > since it surely needs a configuration option) which promotes the netfilter > > MARK value from the IPv6 packets to the sit packets. > > Now I can mark packets using ip6tables, and on the ipv4 level I can still > > differentiate between the priorities. Problem solved, I'm happy ;-) > > I like this patch. I think we should make it a kernel configuration > option, but for all kind of tunnel interfaces. Something like > 'propagate NFMARK while tunneling' (or maybe 'preserve' instead of > 'propagate' is better language?) It certainly should be configurable. I've already sent it to the list, but you can also download it from http://dexter.hensema.net/~erik/patches/sit-promote-mark-2.4.21-pre4.diff It should be easy to port this patch to gre and maybe ipip (don't know the code of the latter, but I assume it's similar to gre and sit). I'll work on that tomorrow, when I've got access to my development machine again. In my current patch the configuration option is called 'IPv6: Promote netfilter MARK value to sit packets'. I don't think we should call it 'preserve', because technically that's not what is happening. The tunnel interface creates a fresh new packet, with a fresh new nfmark. 
Propagate seems to be the right term to me (as a non-native english speaker). -- Erik Hensema (erik@hensema.net) From yoshfuji@linux-ipv6.org Sun Feb 23 19:48:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 19:48:14 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O3m93v012291 for ; Sun, 23 Feb 2003 19:48:11 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1O3v3BF018987; Mon, 24 Feb 2003 12:57:03 +0900 Date: Mon, 24 Feb 2003 12:57:02 +0900 (JST) Message-Id: <20030224.125702.13403857.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030223.011816.108201183.davem@redhat.com> References: <20021103.115427.104445233.yoshfuji@linux-ipv6.org> <20030223.011816.108201183.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1778 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. In article <20030223.011816.108201183.davem@redhat.com> (at Sun, 23 Feb 2003 01:18:16 -0800 (PST)), "David S. 
Miller" says: > Please change new name to ip6_route_me_harder(). When one > says "something me harder" is has amusing implications when > heard by most english speakers and I'd like to keep this :-) ok. :-) > I will apply this patch once you make the change. Would you > like me to add it to 2.4.x as well? yes. Do I need to send a patch for linux-2.4.xx, too? Here's the patch for linux-2.5.62. Thanks. Index: include/net/ip6_route.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/ip6_route.h,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.30.2 diff -u -r1.1.1.1 -r1.1.1.1.30.2 --- include/net/ip6_route.h 7 Oct 2002 10:22:46 -0000 1.1.1.1 +++ include/net/ip6_route.h 23 Feb 2003 17:46:50 -0000 1.1.1.1.30.2 @@ -30,6 +30,8 @@ extern struct dst_entry * ip6_route_output(struct sock *sk, struct flowi *fl); +extern int ip6_route_me_harder(struct sk_buff *skb); + extern void ip6_route_init(void); extern void ip6_route_cleanup(void); Index: net/ipv6/ip6_output.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v retrieving revision 1.1.1.3 retrieving revision 1.1.1.3.16.2 diff -u -r1.1.1.3 -r1.1.1.3.16.2 --- net/ipv6/ip6_output.c 30 Oct 2002 09:43:18 -0000 1.1.1.3 +++ net/ipv6/ip6_output.c 23 Feb 2003 17:46:50 -0000 1.1.1.3.16.2 @@ -134,7 +134,7 @@ #ifdef CONFIG_NETFILTER -static int route6_me_harder(struct sk_buff *skb) +int ip6_route_me_harder(struct sk_buff *skb) { struct ipv6hdr *iph = skb->nh.ipv6h; struct dst_entry *dst; @@ -152,7 +152,7 @@ if (dst->error) { if (net_ratelimit()) - printk(KERN_DEBUG "route6_me_harder: No more route.\n"); + printk(KERN_DEBUG "ip6_route_me_harder: No more route.\n"); return -EINVAL; } @@ -168,7 +168,7 @@ { #ifdef CONFIG_NETFILTER if (skb->nfcache & NFC_ALTERED){ - if (route6_me_harder(skb) != 0){ + if (ip6_route_me_harder(skb) != 0){ kfree_skb(skb); return -EINVAL; 
} Index: net/ipv6/ipv6_syms.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ipv6_syms.c,v retrieving revision 1.1.1.3 retrieving revision 1.1.1.3.8.4 diff -u -r1.1.1.3 -r1.1.1.3.8.4 --- net/ipv6/ipv6_syms.c 16 Feb 2003 04:09:28 -0000 1.1.1.3 +++ net/ipv6/ipv6_syms.c 24 Feb 2003 03:40:55 -0000 1.1.1.3.8.4 @@ -1,4 +1,5 @@ +#include #include #include #include @@ -12,6 +13,9 @@ EXPORT_SYMBOL(register_inet6addr_notifier); EXPORT_SYMBOL(unregister_inet6addr_notifier); EXPORT_SYMBOL(ip6_route_output); +#ifdef CONFIG_NETFILTER +EXPORT_SYMBOL(ip6_route_me_harder); +#endif EXPORT_SYMBOL(addrconf_lock); EXPORT_SYMBOL(ipv6_setsockopt); EXPORT_SYMBOL(ipv6_getsockopt); Index: net/ipv6/route.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/route.c,v retrieving revision 1.1.1.6 diff -u -r1.1.1.6 route.c --- net/ipv6/route.c 10 Feb 2003 19:40:49 -0000 1.1.1.6 +++ net/ipv6/route.c 24 Feb 2003 03:41:15 -0000 @@ -574,15 +574,17 @@ Remove it only when all the things will work! 
*/ -static void ipv6_wash_prefix(struct in6_addr *pfx, int plen) +static void ipv6_addr_prefix(struct in6_addr *pfx, + const struct in6_addr *addr, int plen) { int b = plen&0x7; - int o = (plen + 7)>>3; + int o = plen>>3; + memcpy(prefix, addr, o); if (o < 16) memset(pfx->s6_addr + o, 0, 16 - o); if (b != 0) - pfx->s6_addr[plen>>3] &= (0xFF<<(8-b)); + pfx->s6_addr[o] = addr->s6_addr[o]&(0xff00 >> b); } static int ipv6_get_mtu(struct net_device *dev) @@ -655,16 +657,16 @@ goto out; } - ipv6_addr_copy(&rt->rt6i_dst.addr, &rtmsg->rtmsg_dst); + ipv6_addr_prefix(&rt->rt6i_dst.addr, + &rtmsg->rtmsg_dst, rtmsg->rtmsg_dst_len); rt->rt6i_dst.plen = rtmsg->rtmsg_dst_len; if (rt->rt6i_dst.plen == 128) rt->u.dst.flags = DST_HOST; - ipv6_wash_prefix(&rt->rt6i_dst.addr, rt->rt6i_dst.plen); #ifdef CONFIG_IPV6_SUBTREES - ipv6_addr_copy(&rt->rt6i_src.addr, &rtmsg->rtmsg_src); + ipv6_addr_prefix(&rt->rt6i_src.addr, + &rtmsg->rtmsg_src, rtmsg->rtmsg_src_len); rt->rt6i_src.plen = rtmsg->rtmsg_src_len; - ipv6_wash_prefix(&rt->rt6i_src.addr, rt->rt6i_src.plen); #endif rt->rt6i_metric = rtmsg->rtmsg_metric; Index: net/ipv6/netfilter/ip6_queue.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/netfilter/ip6_queue.c,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.12.2 diff -u -r1.1.1.4 -r1.1.1.4.12.2 --- net/ipv6/netfilter/ip6_queue.c 16 Feb 2003 04:09:30 -0000 1.1.1.4 +++ net/ipv6/netfilter/ip6_queue.c 23 Feb 2003 17:46:50 -0000 1.1.1.4.12.2 @@ -326,45 +326,6 @@ return status; } -/* - * Taken from net/ipv6/ip6_output.c - * - * We should use the one there, but is defined static - * so we put this just here and let the things as - * they are now. - * - * If that one is modified, this one should be modified too. 
- */ -static int -route6_me_harder(struct sk_buff *skb) -{ - struct ipv6hdr *iph = skb->nh.ipv6h; - struct dst_entry *dst; - struct flowi fl; - - fl.proto = iph->nexthdr; - fl.fl6_dst = &iph->daddr; - fl.fl6_src = &iph->saddr; - fl.oif = skb->sk ? skb->sk->bound_dev_if : 0; - fl.fl6_flowlabel = 0; - fl.uli_u.ports.dport = 0; - fl.uli_u.ports.sport = 0; - - dst = ip6_route_output(skb->sk, &fl); - - if (dst->error) { - if (net_ratelimit()) - printk(KERN_DEBUG "route6_me_harder: No more route.\n"); - return -EINVAL; - } - - /* Drop old route. */ - dst_release(skb->dst); - - skb->dst = dst; - return 0; -} - static int ipq_mangle_ipv6(ipq_verdict_msg_t *v, struct ipq_queue_entry *e) { @@ -410,7 +371,7 @@ struct ipv6hdr *iph = e->skb->nh.ipv6h; if (ipv6_addr_cmp(&iph->daddr, &e->rt_info.daddr) || ipv6_addr_cmp(&iph->saddr, &e->rt_info.saddr)) - return route6_me_harder(e->skb); + return ip6_route_me_harder(e->skb); } return 0; } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From ipv6_san@rediffmail.com Sun Feb 23 19:58:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 19:58:35 -0800 (PST) Received: from rediffmail.com (webmail17.rediffmail.com [203.199.83.27] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O3wQ3v012781 for ; Sun, 23 Feb 2003 19:58:27 -0800 Received: (qmail 612 invoked by uid 510); 24 Feb 2003 04:06:47 -0000 Date: 24 Feb 2003 04:06:47 -0000 Message-ID: <20030224040647.611.qmail@webmail17.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 24 feb 2003 04:06:47 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "Maciej W.Rozycki" Cc: netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: Re: (no subject) Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1779 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev On Sat, 22 Feb 2003 Maciej W. Rozycki wrote : >On 21 Feb 2003, santosh kumar gowda wrote: > > > Following message is produced at the IAD terminal..... > > > > # Unable to handle kernel paging request at virtual address > > 00000000, epc == 802 > > 4ce74, ra == 802592a8 > > Oops in fault.c:do_page_fault, line 172: >[...] > > Suggestions/Tips are welcome. > > Decode the oops first or nobody will be able to give any >help. how do i decode the oops ??? help pls. -San -------------------------------------- From rddunlap@osdl.org Sun Feb 23 20:09:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 20:10:02 -0800 (PST) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O49s3v013293 for ; Sun, 23 Feb 2003 20:09:54 -0800 Received: from fire-2.osdl.org (air2.pdx.osdl.net [172.20.0.6]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h1O4Iow15415; Sun, 23 Feb 2003 20:18:50 -0800 Received: from osdl.org (fire.osdl.org [65.172.181.4]) by fire-2.osdl.org (8.11.6/8.11.6) with SMTP id h1O4IoQ29515; Sun, 23 Feb 2003 20:18:50 -0800 Received: from 4.64.238.61 (SquirrelMail authenticated user rddunlap) by www.osdl.org with HTTP; Sun, 23 Feb 2003 20:18:50 -0800 (PST) Message-ID: <32869.4.64.238.61.1046060330.squirrel@www.osdl.org> Date: Sun, 23 Feb 2003 20:18:50 -0800 (PST) Subject: Re: Re: (no subject) From: "Randy.Dunlap" To: In-Reply-To: <20030224040647.611.qmail@webmail17.rediffmail.com> References: <20030224040647.611.qmail@webmail17.rediffmail.com> X-Priority: 3 Importance: Normal Cc: , , X-Mailer: SquirrelMail (version 1.2.8) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 1780 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev > > > On Sat, 22 
Feb 2003 Maciej W. Rozycki wrote : >>On 21 Feb 2003, santosh kumar gowda wrote: >> >> > Following message is produced at the IAD terminal..... >> > >> > # Unable to handle kernel paging request at virtual address >> > 00000000, epc == 802 >> > 4ce74, ra == 802592a8 >> > Oops in fault.c:do_page_fault, line 172: >>[...] >> > Suggestions/Tips are welcome. >> >> Decode the oops first or nobody will be able to give any >>help. > > how do i decode the oops ??? help pls. Please see linux/REPORTING-BUGS and linux/Documentation/oops-tracing.txt . The latter will tell you how to use 'ksymoops' to decode an oops message. ~Randy From ipv6_san@rediffmail.com Sun Feb 23 21:18:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 21:18:56 -0800 (PST) Received: from rediffmail.com (webmail29.rediffmail.com [203.199.83.39] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O5Im3v014572 for ; Sun, 23 Feb 2003 21:18:49 -0800 Received: (qmail 30151 invoked by uid 510); 24 Feb 2003 05:27:04 -0000 Date: 24 Feb 2003 05:27:04 -0000 Message-ID: <20030224052704.30149.qmail@webmail29.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 24 feb 2003 05:27:04 -0000 MIME-Version: 1.0 From: "santosh kumar gowda" Reply-To: "santosh kumar gowda" To: "Randy.Dunlap" Cc: macro@ds2.pg.gda.pl, netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: Re: Re: (no subject) Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1781 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev On Mon, 24 Feb 2003 Randy.Dunlap wrote : > > > > > > On Sat, 22 Feb 2003 Maciej W. Rozycki wrote : > >>On 21 Feb 2003, santosh kumar gowda wrote: > >> > >> > Following message is produced at the IAD terminal.....
> >> > > >> > # Unable to handle kernel paging request at virtual >address > >> > 00000000, epc == 802 > >> > 4ce74, ra == 802592a8 > >> > Oops in fault.c:do_page_fault, line 172: > >>[...] > >> > Suggestions/Tips are welcome. > >> > >> Decode the oops first or nobody will be able to give any > >>help. > > > > how do i decode the oops ??? help pls. > >Please see linux/REPORTING-BUGS and >linux/Documentation/oops-tracing.txt . >The latter will tell you how to use use 'ksymoops' >to decode an oops message. The Embedded Linux running on my MIPS based device has following cmds... kallsyms kill killall klogd ksyms Also, Flash ROM of the device is loaded with kernel and filesystem images. so it not possible for me to browse through the source code. -San --------------------------------- From davem@redhat.com Sun Feb 23 22:33:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 22:33:26 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O6XL3v023389 for ; Sun, 23 Feb 2003 22:33:22 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA04171; Sun, 23 Feb 2003 22:25:57 -0800 Date: Sun, 23 Feb 2003 22:25:57 -0800 (PST) Message-Id: <20030223.222557.132586554.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: "David S. Miller" In-Reply-To: <20030224.125702.13403857.yoshfuji@linux-ipv6.org> References: <20021103.115427.104445233.yoshfuji@linux-ipv6.org> <20030223.011816.108201183.davem@redhat.com> <20030224.125702.13403857.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. 
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1782 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Mon, 24 Feb 2003 12:57:02 +0900 (JST) > I will apply this patch once you make the change. Would you > like me to add it to 2.4.x as well? yes. Do I need to send a patch for linux-2.4.xx, too? Not necessary, I know what is different in the networking between these two trees. Here's the patch for linux-2.5.62. Applied, thanks. From davem@redhat.com Sun Feb 23 22:38:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 22:38:42 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O6cd3v028801 for ; Sun, 23 Feb 2003 22:38:40 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA04185; Sun, 23 Feb 2003 22:31:15 -0800 Date: Sun, 23 Feb 2003 22:31:14 -0800 (PST) Message-Id: <20030223.223114.65976206.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: "David S. Miller" In-Reply-To: <20021101.174832.44646503.yoshfuji@linux-ipv6.org> References: <20021031.164940.672083668.yoshfuji@linux-ipv6.org> <20021101.174832.44646503.yoshfuji@linux-ipv6.org> X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1783 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Fri, 01 Nov 2002 17:48:32 +0900 (JST) Ok, here's revised one. - sync with linux-2.5.45. - change default value for use_tempaddr sysctl to 0 (don't generate and use temporary addresses by default) It is applied. Hmmm, some thinking is needed in order to backport this to 2.4.x due to lack of crypto library. I guess USAGI 2.4.x version of this patch uses crypto library from USAGI 2.4.x ipsec? From yoshfuji@linux-ipv6.org Sun Feb 23 22:49:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 22:49:55 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O6nq3v029719 for ; Sun, 23 Feb 2003 22:49:53 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1O6wqBF020307; Mon, 24 Feb 2003 15:58:52 +0900 Date: Mon, 24 Feb 2003 15:58:52 +0900 (JST) Message-Id: <20030224.155852.611429637.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030223.223114.65976206.davem@redhat.com> References: <20021101.174832.44646503.yoshfuji@linux-ipv6.org> <20030223.223114.65976206.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA
X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on XEmacs 21.4.6 (Common Lisp) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1784 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030223.223114.65976206.davem@redhat.com> (at Sun, 23 Feb 2003 22:31:14 -0800 (PST)), "David S. Miller" says: > Hmmm, some thinking is needed in order to backport this to > 2.4.x due to lack of crypto library. I guess USAGI 2.4.x > version of this patch uses crypto library from USAGI 2.4.x ipsec? MD5 code in linux-2.4.x patch what I sent you was taken from Colin Plumb's public domain implementation. (USAGI itself uses KAME implementation.) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From davem@redhat.com Sun Feb 23 23:00:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 23:00:18 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O70F3v030211 for ; Sun, 23 Feb 2003 23:00:16 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA04253; Sun, 23 Feb 2003 22:52:51 -0800 Date: Sun, 23 Feb 2003 22:52:51 -0800 (PST) Message-Id: <20030223.225251.119557134.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: "David S. 
Miller" In-Reply-To: <20030224.155852.611429637.yoshfuji@linux-ipv6.org> References: <20021101.174832.44646503.yoshfuji@linux-ipv6.org> <20030223.223114.65976206.davem@redhat.com> <20030224.155852.611429637.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1785 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Mon, 24 Feb 2003 15:58:52 +0900 (JST) MD5 code in linux-2.4.x patch what I sent you was taken from Colin Plumb's public domain implementation. (USAGI itself uses KAME implementation.) Please send me the 2.4.x version of the privacy extension patch so that I may have a look. Thank you. From davem@redhat.com Sun Feb 23 23:01:51 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 23:01:53 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O71o3v030500 for ; Sun, 23 Feb 2003 23:01:51 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id WAA04266; Sun, 23 Feb 2003 22:54:26 -0800 Date: Sun, 23 Feb 2003 22:54:26 -0800 (PST) Message-Id: <20030223.225426.28829614.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: "David S.
Miller" In-Reply-To: <20030224.125702.13403857.yoshfuji@linux-ipv6.org> References: <20021103.115427.104445233.yoshfuji@linux-ipv6.org> <20030223.011816.108201183.davem@redhat.com> <20030224.125702.13403857.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1786 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

   From: YOSHIFUJI Hideaki / 吉藤英明
   Date: Mon, 24 Feb 2003 12:57:02 +0900 (JST)

   Here's the patch for linux-2.5.62.

Hideaki-san, do you try to compile the patches you send
to me? :-)

-static void ipv6_wash_prefix(struct in6_addr *pfx, int plen)
+static void ipv6_addr_prefix(struct in6_addr *pfx,
+			     const struct in6_addr *addr, int plen)
 {
 	int b = plen&0x7;
-	int o = (plen + 7)>>3;
+	int o = plen>>3;
+	memcpy(prefix, addr, o);

Where is the variable 'prefix' declared? You probably mean
'pfx->s6_addr' and that is the change I will make in my tree.

Thanks.
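
[Editorial sketch, not part of the original message.] The behaviour the renamed ipv6_addr_prefix() appears to be aiming for (copy the first plen bits of addr into pfx and zero everything after them) can be illustrated in userspace C roughly as follows. This is only a sketch under assumptions: it uses struct in6_addr from <netinet/in.h> rather than the kernel definition, the name addr_prefix_sketch is made up for this note, it applies the pfx->s6_addr correction suggested above, and the clamping of plen is an addition that the patch itself does not contain.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical userspace illustration of the prefix copy discussed in
 * this thread; it is not the kernel function. pfx and addr must not
 * overlap, because memcpy() is used.
 */
static void addr_prefix_sketch(struct in6_addr *pfx,
                               const struct in6_addr *addr, int plen)
{
    int b, o;

    /* Assumption of this sketch only: clamp plen to 0..128. */
    if (plen < 0)
        plen = 0;
    if (plen > 128)
        plen = 128;

    b = plen & 0x7;   /* bits left over after the whole bytes */
    o = plen >> 3;    /* number of whole bytes in the prefix  */

    /* Copy the whole bytes, then clear the remainder of the address. */
    memcpy(pfx->s6_addr, addr->s6_addr, o);
    if (o < 16)
        memset(pfx->s6_addr + o, 0, 16 - o);

    /* Keep only the top b bits of the partially covered byte. */
    if (b != 0)
        pfx->s6_addr[o] = addr->s6_addr[o] & (0xff00 >> b);
}
```

With an all-ones source address and plen = 10, for instance, the result starts with the bytes ff c0 and is zero from the third byte on.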
From yoshfuji@linux-ipv6.org Sun Feb 23 23:09:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 23:09:21 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O79I3v031506 for ; Sun, 23 Feb 2003 23:09:19 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1O7IFBF020426; Mon, 24 Feb 2003 16:18:15 +0900 Date: Mon, 24 Feb 2003 16:18:15 +0900 (JST) Message-Id: <20030224.161815.511623971.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030223.225426.28829614.davem@redhat.com> References: <20030223.011816.108201183.davem@redhat.com> <20030224.125702.13403857.yoshfuji@linux-ipv6.org> <20030223.225426.28829614.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on XEmacs 21.4.6 (Common Lisp) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1787 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030223.225426.28829614.davem@redhat.com> (at Sun, 23 Feb 2003 22:54:26 -0800 (PST)), "David S. Miller" says: > Hideaki-san, do you try to compile the patches you send > to me? 
:-) sorry, I had compiled with wrong options... :-p just a moment, please... -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From yoshfuji@linux-ipv6.org Sun Feb 23 23:20:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 23:20:21 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O7KH3v032291 for ; Sun, 23 Feb 2003 23:20:18 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1O7TCBF020496; Mon, 24 Feb 2003 16:29:12 +0900 Date: Mon, 24 Feb 2003 16:29:11 +0900 (JST) Message-Id: <20030224.162911.826686204.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030224.161815.511623971.yoshfuji@linux-ipv6.org> References: <20030224.125702.13403857.yoshfuji@linux-ipv6.org> <20030223.225426.28829614.davem@redhat.com> <20030224.161815.511623971.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on XEmacs 21.4.6 (Common Lisp) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1788 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article 
<20030224.161815.511623971.yoshfuji@linux-ipv6.org> (at Mon, 24 Feb 2003 16:18:15 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says:

> In article <20030223.225426.28829614.davem@redhat.com> (at Sun, 23 Feb 2003 22:54:26 -0800 (PST)), "David S. Miller" says:
>
> > Hideaki-san, do you try to compile the patches you send
> > to me? :-)
>
> sorry, I had compiled with wrong options... :-p
> just a moment, please...

Please apply this patch on top of the previous patch.
Sorry for the mess.

Index: net/ipv6/route.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/route.c,v
retrieving revision 1.1.1.6.12.1
retrieving revision 1.1.1.6.12.2
diff -u -r1.1.1.6.12.1 -r1.1.1.6.12.2
--- net/ipv6/route.c	23 Feb 2003 17:40:42 -0000	1.1.1.6.12.1
+++ net/ipv6/route.c	24 Feb 2003 07:10:02 -0000	1.1.1.6.12.2
@@ -580,7 +580,7 @@
 	int b = plen&0x7;
 	int o = plen>>3;
 
-	memcpy(prefix, addr, o);
+	memcpy(pfx, addr, o);
 	if (o < 16)
 		memset(pfx->s6_addr + o, 0, 16 - o);
 	if (b != 0)

--
Hideaki YOSHIFUJI @ USAGI Project
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

From davem@redhat.com Sun Feb 23 23:26:12 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 23 Feb 2003 23:26:14 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O7QB3v000365 for ; Sun, 23 Feb 2003 23:26:12 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id XAA04311; Sun, 23 Feb 2003 23:18:47 -0800 Date: Sun, 23 Feb 2003 23:18:47 -0800 (PST) Message-Id: <20030223.231847.58640988.davem@redhat.com> To: yoshfuji@linux-ipv6.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, netfilter-devel@lists.netfilter.org, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Functions Clean-up From: "David S.
Miller" In-Reply-To: <20030224.162911.826686204.yoshfuji@linux-ipv6.org> References: <20030223.225426.28829614.davem@redhat.com> <20030224.161815.511623971.yoshfuji@linux-ipv6.org> <20030224.162911.826686204.yoshfuji@linux-ipv6.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1789 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: YOSHIFUJI Hideaki / 吉藤英明 Date: Mon, 24 Feb 2003 16:29:11 +0900 (JST) In article <20030224.161815.511623971.yoshfuji@linux-ipv6.org> (at Mon, 24 Feb 2003 16:18:15 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 says: > sorry, I had compiled with wrong options... :-p > just a moment, please... Please apply this patch on top of the previous patch. Sorry for the mess. As I said, I fixed it already by using pfx->s6_addr.
No problems and no need to apologize :) From Geert.Uytterhoeven@sonycom.com Mon Feb 24 00:32:18 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 00:32:22 -0800 (PST) Received: from mail.sonytel.be (mail2.sonytel.be [195.0.45.172]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O8WF3v002802 for ; Mon, 24 Feb 2003 00:32:17 -0800 Received: from vervain.sonytel.be (mail.sonytel.be [10.17.0.27]) by mail.sonytel.be (8.9.0/8.8.6) with ESMTP id JAA26225; Mon, 24 Feb 2003 09:38:32 +0100 (MET) Date: Mon, 24 Feb 2003 09:38:38 +0100 (MET) From: Geert Uytterhoeven To: santosh kumar gowda cc: "Randy.Dunlap" , macro@ds2.pg.gda.pl, netdev@oss.sgi.com, Linux/MIPS Development Subject: Re: Re: Re: (no subject) In-Reply-To: <20030224052704.30149.qmail@webmail29.rediffmail.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1790 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: geert@linux-m68k.org Precedence: bulk X-list: netdev On 24 Feb 2003, santosh kumar gowda wrote: > Also, Flash ROM of the device is loaded with kernel and > filesystem > images. so it not possible for me to browse through the source > code. You do not have access to the Linux kernel sources? Sounds like a violation of the GPL! Please contact your vendor and ask them for the sources. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. 
-- Linus Torvalds From mksarav@comp.nus.edu.sg Mon Feb 24 01:16:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 01:16:38 -0800 (PST) Received: from x86unx3.comp.nus.edu.sg (x86unx3.comp.nus.edu.sg [137.132.90.3]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1O9GL3v004485 for ; Mon, 24 Feb 2003 01:16:29 -0800 Received: from e500b.comp.nus.edu.sg (e500b.comp.nus.edu.sg [137.132.90.26]) by x86unx3.comp.nus.edu.sg (8.9.1/8.9.1) with SMTP id RAA11684; Mon, 24 Feb 2003 17:24:54 +0800 (GMT-8) Received: from se11.comp.nus.edu.sg(137.132.80.19) by e500b.comp.nus.edu.sg via csmap id 15922; Mon, 24 Feb 2003 17:18:27 +0800 (SGT) Received: (from http@localhost) by se11.comp.nus.edu.sg (8.12.2+Sun/8.12.5) id h1O9OrBl009929; Mon, 24 Feb 2003 17:24:53 +0800 (SGT) X-Authentication-Warning: se11.comp.nus.edu.sg: http set sender to mksarav@comp.nus.edu.sg using -f Received: from noc.comp.nus.edu.sg ([137.132.80.35]) (proxying for 138.198.100.38) (SquirrelMail authenticated user mksarav) by mysoc.nus.edu.sg with HTTP; Mon, 24 Feb 2003 17:24:53 +0800 (SGT) Message-ID: <2083.137.132.80.35.1046078693.squirrel@mysoc.nus.edu.sg> Date: Mon, 24 Feb 2003 17:24:53 +0800 (SGT) Subject: Please use appropriate subject line [was (Re: Re: Re: (no subject))] From: "M K Saravanan" To: In-Reply-To: References: <20030224052704.30149.qmail@webmail29.rediffmail.com> X-Priority: 3 Importance: Normal Cc: , , , , Reply-To: mksarav@comp.nus.edu.sg X-Mailer: SquirrelMail (version 1.2.8) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 1791 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mksarav@comp.nus.edu.sg Precedence: bulk X-list: netdev I don't remember who started this thread. Kindly use appropriate title in the "Subject" when posting queries. It will also help when somebody follow the archive on a particular topic. 
-- mks -- -- M K Saravanan http://www.comp.nus.edu.sg/~mksarav From hadi@cyberus.ca Mon Feb 24 05:32:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 05:32:43 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1ODWX3v016593 for ; Mon, 24 Feb 2003 05:32:35 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18nIrR-000IIG-00; Mon, 24 Feb 2003 08:41:37 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1ODfBYO043471; Mon, 24 Feb 2003 08:41:11 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1ODfAev043468; Mon, 24 Feb 2003 08:41:10 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 24 Feb 2003 08:41:10 -0500 (EST) From: jamal To: Erik Hensema cc: Harald Welte , "" , Netfilter Development Mailinglist Subject: Re: RFC: promote netfilter MARK value from IPv6 packets to sit packets In-Reply-To: <20030223234225.GA23556@hensema.net> Message-ID: <20030224083946.H34066@shell.cyberus.ca> References: <20030217145727.GA3413@hensema.net> <20030223193339.GD15385@sunbeam.de.gnumonks.org> <20030223234225.GA23556@hensema.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1792 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev If this is to be a config option, it should not be restricted to netfilter specifics but rather skb specifics. Example the tcindex (maybe even the cb) etc. 
cheers, jamal From laforge@netfilter.org Mon Feb 24 06:37:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 06:37:56 -0800 (PST) Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1OEbi3v021785 for ; Mon, 24 Feb 2003 06:37:46 -0800 Received: from sunbeam-tap0.de.gnumonks.org ([192.168.200.2] helo=sunbeam.gnumonks.org) by coruscant.gnumonks.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 3.34 #1) id 18nJsU-0005hx-00; Mon, 24 Feb 2003 15:46:47 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.35 #1) id 18nJnA-0002Lo-00; Mon, 24 Feb 2003 15:41:16 +0100 Date: Mon, 24 Feb 2003 15:41:16 +0100 From: Harald Welte To: jamal Cc: Erik Hensema , Harald Welte , netdev@oss.sgi.com, Netfilter Development Mailinglist Subject: Re: RFC: promote netfilter MARK value from IPv6 packets to sit packets Message-ID: <20030224144116.GN24960@sunbeam.de.gnumonks.org> Mail-Followup-To: Harald Welte , jamal , Erik Hensema , netdev@oss.sgi.com, Netfilter Development Mailinglist References: <20030217145727.GA3413@hensema.net> <20030223193339.GD15385@sunbeam.de.gnumonks.org> <20030223234225.GA23556@hensema.net> <20030224083946.H34066@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="lYetfuAxy9ic4HK3" Content-Disposition: inline In-Reply-To: <20030224083946.H34066@shell.cyberus.ca> User-Agent: Mutt/1.3.28i X-Operating-System: Linux sunbeam 2.4.20-nfpom X-Date: Today is Prickle-Prickle, the 54th day of Chaos in the YOLD 3169 X-archive-position: 1793 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@netfilter.org Precedence: bulk X-list: netdev --lYetfuAxy9ic4HK3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 24, 2003 at 08:41:10AM -0500, 
jamal wrote: > If this is to be a config option, it should not be restricted to > netfilter specifics but rather skb specifics. Example the tcindex > (maybe even the cb) etc. No problem with me. I do understand the usefulness of tcindex, but what would a totally different protocol (or the user) do with the cb of a different protocol? > cheers, > jamal -- - Harald Welte http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie --lYetfuAxy9ic4HK3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+Wi8MXaXGVTD0i/8RAmfCAJsGL4Ga6R62Yezoj1HCJWC58wWR5gCfcEsv 1Bm/rI31ua78Rs9d20eNLfw= =mrzP -----END PGP SIGNATURE----- --lYetfuAxy9ic4HK3-- From bwa@us.ibm.com Mon Feb 24 09:55:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 09:55:12 -0800 (PST) Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1OHt63v028890 for ; Mon, 24 Feb 2003 09:55:06 -0800 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e2.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1OHt0Xm063924; Mon, 24 Feb 2003 12:55:00 -0500 Received: from w-bwa1.beaverton.ibm.com (w-bwa1.beaverton.ibm.com [9.47.18.12]) by northrelay01.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1OHsure115876; Mon, 24 Feb 2003 12:54:57 -0500 Subject: Re: [PATCH] subset of RFC2553 From: Bruce Allan To: "David S.
Miller" Cc: lksctp-developers@lists.sourceforge.net, linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: <20030221.232639.129509431.davem@redhat.com> References: <3E54128C.327D7759@us.ibm.com> <20030219.162129.11584427.davem@redhat.com> <1045847170.3104.7.camel@w-bwa1.beaverton.ibm.com> <20030221.232639.129509431.davem@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 24 Feb 2003 09:54:57 -0800 Message-Id: <1046109300.3503.12.camel@w-bwa1.beaverton.ibm.com> Mime-Version: 1.0 X-archive-position: 1794 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwa@us.ibm.com Precedence: bulk X-list: netdev Doh! Sorry, here (see below) it is against 2.5.59. On Fri, 2003-02-21 at 23:26, David S. Miller wrote: > > Bruce, while applying this I noticed that in6addr_{any,loopback} > are not exported by modules. > > Please send me a small patch to add the exports if this will be > needed by SCTP and friends. > > Thanks.
> --- linux-2.5.59/net/ipv6/ipv6_syms.c 2003-01-16 18:22:25.000000000 -0800 +++ linux-2.5.59-RFC2553/net/ipv6/ipv6_syms.c 2003-02-24 09:02:41.000000000 -0800 @@ -25,3 +25,5 @@ EXPORT_SYMBOL(inet6_ioctl); EXPORT_SYMBOL(ipv6_get_saddr); EXPORT_SYMBOL(ipv6_chk_addr); +EXPORT_SYMBOL(in6addr_any); +EXPORT_SYMBOL(in6addr_loopback); -- Bruce Allan Linux Technology Center IBM Corporation, Beaverton OR From ilya@ns2.total-knowledge.com Mon Feb 24 10:34:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 10:34:42 -0800 (PST) Received: from gateway.total-knowledge.com (12-234-207-60.client.attbi.com [12.234.207.60]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1OIYU3v030464 for ; Mon, 24 Feb 2003 10:34:31 -0800 Received: (qmail 26394 invoked by uid 502); 24 Feb 2003 18:34:29 -0000 Date: Mon, 24 Feb 2003 10:34:29 -0800 From: ilya@theIlya.com To: santosh kumar gowda Cc: "Maciej W.Rozycki" , netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: Re: (no subject) Message-ID: <20030224183429.GA26310@gateway.total-knowledge.com> References: <20030224040647.611.qmail@webmail17.rediffmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="nFreZHaLTZJo0R7j" Content-Disposition: inline In-Reply-To: <20030224040647.611.qmail@webmail17.rediffmail.com> User-Agent: Mutt/1.4i X-archive-position: 1795 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ilya@theIlya.com Precedence: bulk X-list: netdev --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable RTFM (Hint $LINUX_SOURCE_TREE/Documentation/) On Mon, Feb 24, 2003 at 04:06:47AM -0000, santosh kumar gowda wrote: > > > On Sat, 22 Feb 2003 Maciej W. Rozycki wrote : > >On 21 Feb 2003, santosh kumar gowda wrote: > > > >> Following message is produced at the IAD terminal.....
> >> > >> # Unable to handle kernel paging request at virtual address > >> 00000000, epc == 802 > >> 4ce74, ra == 802592a8 > >> Oops in fault.c:do_page_fault, line 172: > >[...] > >> Suggestions/Tips are welcome. > > > > Decode the oops first or nobody will be able to give any > >help. > > how do i decode the oops ??? help pls. > > -San > -------------------------------------- > > --nFreZHaLTZJo0R7j Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (GNU/Linux) iD8DBQE+WmW17sVBmHZT8w8RAhPLAJ9DuvnHC0LwOkknMd3oJHBnL/LqnACdHOrP aGFaW7cg9ZlQaZO2A9djftQ= =zmNV -----END PGP SIGNATURE----- --nFreZHaLTZJo0R7j-- From hadi@cyberus.ca Mon Feb 24 18:30:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 24 Feb 2003 18:31:20 -0800 (PST) Received: from mx03.cyberus.ca (mx03.cyberus.ca [216.191.240.24]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1P2Ug3v017683 for ; Mon, 24 Feb 2003 18:30:43 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx03.cyberus.ca with esmtp (Exim 4.10) id 18nUri-000I8G-00; Mon, 24 Feb 2003 21:30:42 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1P2UJYO045188; Mon, 24 Feb 2003 21:30:19 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1P2UBNM045185; Mon, 24 Feb 2003 21:30:15 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 24 Feb 2003 21:30:11 -0500 (EST) From: jamal To: Harald Welte cc: Erik Hensema , "" , Netfilter Development Mailinglist Subject: Re: RFC: promote netfilter MARK value from IPv6 packets to sit packets In-Reply-To: <20030224144116.GN24960@sunbeam.de.gnumonks.org> Message-ID: <20030224212312.Y44654@shell.cyberus.ca> References: <20030217145727.GA3413@hensema.net>
<20030223193339.GD15385@sunbeam.de.gnumonks.org> <20030223234225.GA23556@hensema.net> <20030224083946.H34066@shell.cyberus.ca> <20030224144116.GN24960@sunbeam.de.gnumonks.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1796 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Mon, 24 Feb 2003, Harald Welte wrote: > On Mon, Feb 24, 2003 at 08:41:10AM -0500, jamal wrote: > > If this is to be a config option, it should not be restricted to > > netfilter specifics but rather skb specifics. Example the tcindex > > (maybe even the cb) etc. > > No problem with me. I do understand the usefulness of tcindex, but what > would a totally different protocol (or the user) do with the cb of a > different protocol? > cb is a maybe - it could be useful i think since the inner and outer headers may be closely related and so share the same state. I gave tcindex as an example; others are: priority and some of the other netfilter stuff (is nfcache still used?) etc.
cheers, jamal From ipv6_san@rediffmail.com Tue Feb 25 02:18:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 02:18:25 -0800 (PST) Received: from rediffmail.com (webmail14.rediffmail.com [203.199.83.24] (may be forged)) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PAIG3v027674 for ; Tue, 25 Feb 2003 02:18:18 -0800 Received: (qmail 9290 invoked by uid 510); 25 Feb 2003 10:17:31 -0000 Date: 25 Feb 2003 10:17:31 -0000 Message-ID: <20030225101731.9289.qmail@webmail14.rediffmail.com> Received: from unknown (194.175.117.86) by rediffmail.com via HTTP; 25 feb 2003 10:17:31 -0000 MIME-Version: 1.0 From: "Santosh " Reply-To: "Santosh " To: yoshfuji@wide.ad.jp Cc: usagi-users@linux-ipv6.org, netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: USAGI Kernel for MIPS based device Content-type: text/plain; format=flowed Content-Disposition: inline X-archive-position: 1797 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ipv6_san@rediffmail.com Precedence: bulk X-list: netdev Hello, I have a MIPS processor based device, currently running with Linux-2.4.5-pre1. I have a BSP for my device from Lineo Inc., which incorporates Linux-2.4.5-pre1 source code. Now i want to port latest USAGI kernel code onto my device. How to step further ?? Is the USAGI code, platform independent ?? Do i need to make any changes in the kernel source code ?? Need to change any configuration files ?? -San --------------------------------------------------------------------- This is Linux Country. On a quiet night, you can hear Windows reboot. 
--------------------------------------------------------------------- From yoshfuji@linux-ipv6.org Tue Feb 25 02:55:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 02:55:17 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PAtD3v031829 for ; Tue, 25 Feb 2003 02:55:14 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1PAtDBF028906; Tue, 25 Feb 2003 19:55:13 +0900 Date: Tue, 25 Feb 2003 19:55:12 +0900 (JST) Message-Id: <20030225.195512.18953246.yoshfuji@linux-ipv6.org> To: ipv6_san@rediffmail.com Cc: usagi-users@linux-ipv6.org, netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: USAGI Kernel for MIPS based device From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030225101731.9289.qmail@webmail14.rediffmail.com> References: <20030225101731.9289.qmail@webmail14.rediffmail.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1798 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev In article <20030225101731.9289.qmail@webmail14.rediffmail.com> (at 25 Feb 2003 10:17:31 -0000), "Santosh " says: > Now i want to port latest USAGI kernel code onto my device. : > Is the USAGI code, platform independent ?? It should be. 
We run our code on our - ix86 - Ultra SPARC - Power PC - MIPS - ARM machines. (Unfortunately, we don't have x86-64, ia64 or alpha machines...) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From marketing@euro5.org Tue Feb 25 03:06:39 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 03:06:43 -0800 (PST) Received: from terra (16.Red-80-33-132.pooles.rima-tde.net [80.33.132.16]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PB6b3v000370 for ; Tue, 25 Feb 2003 03:06:38 -0800 Message-Id: <200302251106.h1PB6b3v000370@oss.sgi.com> From: "Idiomas en pocas horas" To: Subject: Euro5 le ofrece idiomas Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Tue, 25 Feb 2003 11:47:05 X-archive-position: 1799 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: marketing@euro5.org Precedence: bulk X-list: netdev We offer quality language teaching with bilingual or native teachers. Learn or improve your languages. We build conversational habits. We remedy earlier failures. Full guarantee. Only the classes actually taught are charged, by the hour, invoiced at the end of the month. Propose small groups at your company. Our teachers travel to your company or home at scheduled times. Learning languages is considered a professional asset today. Request information and availability by completing the form on our website. www.euro5.org We comply with current data-protection law.
To unsubscribe, reply to the address in the header or e-mail bajas@euro5.org From ralf@linux-mips.org Tue Feb 25 04:44:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 04:44:50 -0800 (PST) Received: from dea.linux-mips.net (p508B7D68.dip.t-dialin.net [80.139.125.104]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PCie3v005292 for ; Tue, 25 Feb 2003 04:44:43 -0800 Received: (from ralf@localhost) by dea.linux-mips.net (8.11.6/8.11.6) id h1PCecX14470; Tue, 25 Feb 2003 13:40:38 +0100 Date: Tue, 25 Feb 2003 13:40:38 +0100 From: Ralf Baechle To: Santosh Cc: yoshfuji@wide.ad.jp, usagi-users@linux-ipv6.org, netdev@oss.sgi.com, linux-mips@linux-mips.org Subject: Re: USAGI Kernel for MIPS based device Message-ID: <20030225134038.A14292@linux-mips.org> References: <20030225101731.9289.qmail@webmail14.rediffmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030225101731.9289.qmail@webmail14.rediffmail.com>; from ipv6_san@rediffmail.com on Tue, Feb 25, 2003 at 10:17:31AM -0000 X-archive-position: 1800 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev On Tue, Feb 25, 2003 at 10:17:31AM -0000, Santosh wrote: > I have a MIPS processor based device, currently running with > Linux-2.4.5-pre1. > I have a BSP for my device from Lineo Inc., which incorporates > Linux-2.4.5-pre1 source code. 2.4.5 is almost two years old by now. You're missing a huge pile of bug fixes, among them MIPS IPv6 fixes.
Ralf From yoshfuji@linux-ipv6.org Tue Feb 25 07:41:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 07:42:01 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PFfu3v015334 for ; Tue, 25 Feb 2003 07:41:57 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1PFftBF030314; Wed, 26 Feb 2003 00:41:56 +0900 Date: Wed, 26 Feb 2003 00:41:55 +0900 (JST) Message-Id: <20030226.004155.71903869.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030223.223114.65976206.davem@redhat.com> References: <20021101.174832.44646503.yoshfuji@linux-ipv6.org> <20030223.223114.65976206.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 1801 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hi, In article <20030223.223114.65976206.davem@redhat.com> (at Sun, 23 Feb 2003 22:31:14 -0800 (PST)), "David S. 
Miller" says: > From: YOSHIFUJI Hideaki / 吉藤英明 > Date: Fri, 01 Nov 2002 17:48:32 +0900 (JST) > > Ok, here's revised one. > > - sync with linux-2.5.45. > - change default value for use_tempaddr sysctl to 0 > (don't generate and use temporary addresses by default) > > It is applied. Thanks. Well, I've found a bug where temporary addresses were not regenerated properly. Here's the patch for linux-2.5.63. (The patch I've sent for linux-2.4.x contains this change.) Thanks in advance. Index: net/ipv6/addrconf.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/addrconf.c,v retrieving revision 1.1.1.6 retrieving revision 1.1.1.6.2.1 diff -u -r1.1.1.6 -r1.1.1.6.2.1 --- net/ipv6/addrconf.c 25 Feb 2003 05:33:26 -0000 1.1.1.6 +++ net/ipv6/addrconf.c 25 Feb 2003 07:30:32 -0000 1.1.1.6.2.1 @@ -2015,6 +2015,9 @@ write_lock(&addrconf_hash_lock); for (ifp=inet6_addr_lst[i]; ifp; ifp=ifp->lst_next) { unsigned long age; +#ifdef CONFIG_IPV6_PRIVACY + unsigned long regen_advance; +#endif if (ifp->flags & IFA_F_PERMANENT) continue; @@ -2022,6 +2025,12 @@ spin_lock(&ifp->lock); age = (now - ifp->tstamp) / HZ; +#ifdef CONFIG_IPV6_PRIVACY + regen_advance = ifp->idev->cnf.regen_max_retry * + ifp->idev->cnf.dad_transmits * + ifp->idev->nd_parms->retrans_time / HZ; +#endif + if (age >= ifp->valid_lft) { spin_unlock(&ifp->lock); in6_ifa_hold(ifp); @@ -2050,6 +2059,28 @@ in6_ifa_put(ifp); goto restart; } +#ifdef CONFIG_IPV6_PRIVACY + } else if ((ifp->flags&IFA_F_TEMPORARY) && + !(ifp->flags&IFA_F_TENTATIVE)) { + if (age >= ifp->prefered_lft - regen_advance) { + struct inet6_ifaddr *ifpub = ifp->ifpub; + if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) + next = ifp->tstamp + ifp->prefered_lft * HZ; + if (!ifp->regen_count && ifpub) { + ifp->regen_count++; + in6_ifa_hold(ifp); + in6_ifa_hold(ifpub); + spin_unlock(&ifp->lock); + write_unlock(&addrconf_hash_lock); + ipv6_create_tempaddr(ifpub, ifp); +
in6_ifa_put(ifpub); + in6_ifa_put(ifp); + goto restart; + } + } else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next)) + next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ; + spin_unlock(&ifp->lock); +#endif } else { /* ifp->prefered_lft <= ifp->valid_lft */ if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From hch@infradead.org Tue Feb 25 08:06:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 08:06:47 -0800 (PST) Received: from phoenix.infradead.org (phoenix.mvhi.com [195.224.96.167]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PG6e3v017750 for ; Tue, 25 Feb 2003 08:06:41 -0800 Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 18nhbG-0001Ca-00; Tue, 25 Feb 2003 16:06:34 +0000 Date: Tue, 25 Feb 2003 16:06:34 +0000 From: Christoph Hellwig To: "YOSHIFUJI Hideaki / 吉藤英明" Cc: davem@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 Message-ID: <20030225160634.A4525@infradead.org> Mail-Followup-To: Christoph Hellwig , "YOSHIFUJI Hideaki / 吉藤英明" , davem@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org References: <20030223.223114.65976206.davem@redhat.com> <20030224.155852.611429637.yoshfuji@linux-ipv6.org> <20030223.225251.119557134.davem@redhat.com> <20030226.003625.90530451.yoshfuji@linux-ipv6.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030226.003625.90530451.yoshfuji@linux-ipv6.org>; from yoshfuji@linux-ipv6.org on Wed, Feb 26, 2003 at 12:36:25AM +0900 X-archive-position: 1802 X-ecartis-version: Ecartis v1.0.0 Sender:
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev On Wed, Feb 26, 2003 at 12:36:25AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote: > +# > +if [ "$CONFIG_IPV6_PRIVACY" = "y" ]; then > + if [ "$CONFIG_IPV6" = "y" ]; then > + define_tristate CONFIG_MD5 y > + else > + define_tristate CONFIG_MD5 m > + fi > +else > + tristate 'MD5 digest support' CONFIG_MD5 > +fi Config.in files use three-space indents. > +obj-$(CONFIG_MD5) += md5.o > +ifeq ($(CONFIG_MD5),y) > + export-objs += md5.o > +endif This is wrong; objects are added to export-objs unconditionally. > + > +#ifdef CONFIG_MD5 > +EXPORT_SYMBOL(MD5Init); > +EXPORT_SYMBOL(MD5Update); > +EXPORT_SYMBOL(MD5Final); > +EXPORT_SYMBOL(MD5Transform); > +#endif Please remove the ifdef, it doesn't make any sense. Also I really wonder whether we want to add just md5.c to 2.4 or backport the cryptoapi core with md5 as the only algorithm so far. From yoshfuji@linux-ipv6.org Tue Feb 25 08:32:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 08:32:18 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PGW63v018355 for ; Tue, 25 Feb 2003 08:32:07 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1PFadBF030281; Wed, 26 Feb 2003 00:36:39 +0900 Date: Wed, 26 Feb 2003 00:36:25 +0900 (JST) Message-Id: <20030226.003625.90530451.yoshfuji@linux-ipv6.org> To: davem@redhat.com Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030223.225251.119557134.davem@redhat.com> References: <20030223.223114.65976206.davem@redhat.com>
<20030224.155852.611429637.yoshfuji@linux-ipv6.org> <20030223.225251.119557134.davem@redhat.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1803 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hi, In article <20030223.225251.119557134.davem@redhat.com> (at Sun, 23 Feb 2003 22:52:51 -0800 (PST)), "David S. Miller" says: > MD5 code in linux-2.4.x patch what I sent you was taken from > Colin Plumb's public domain implementation. > (USAGI itself uses KAME implementation.) > > Please send me the 2.4.x version of the privacy > extension patch so that I may have a look. Here's the patch for linux-2.4.21-pre4. Thanks. Index: Documentation/Configure.help =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/Documentation/Configure.help,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.2.1 diff -u -r1.1.1.4 -r1.1.1.4.2.1 --- Documentation/Configure.help 24 Feb 2003 09:48:53 -0000 1.1.1.4 +++ Documentation/Configure.help 24 Feb 2003 10:40:59 -0000 1.1.1.4.2.1 @@ -5627,6 +5627,19 @@ It is safe to say N here for now. +IPv6: Privacy Extensions (RFC 3041) support +CONFIG_IPV6_PRIVACY + Privacy Extensions for Stateless Address Autoconfiguration in IPv6 + support. With this option, additional periodically-alter + pseudo-random global-scope unicast address(es) will assigned to + your interface(s). 
+ + By default, kernel generates temporary addresses but it won't use + them unless application explicitly bind them. To prefer temporary + address, do + + echo 2 >/proc/sys/net/ipv6/conf/all/use_tempaddr + Kernel httpd acceleration CONFIG_KHTTPD The kernel httpd acceleration daemon (kHTTPd) is a (limited) web Index: Documentation/networking/ip-sysctl.txt =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/Documentation/networking/ip-sysctl.txt,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.2.1 diff -u -r1.1.1.2 -r1.1.1.2.2.1 --- Documentation/networking/ip-sysctl.txt 24 Feb 2003 09:48:56 -0000 1.1.1.2 +++ Documentation/networking/ip-sysctl.txt 24 Feb 2003 10:40:59 -0000 1.1.1.2.2.1 @@ -613,6 +613,36 @@ routers are present. Default: 3 +use_tempaddr - INTEGER + Preference for Privacy Extensions (RFC3041). + <= 0 : disable Privacy Extensions + == 1 : enable Privacy Extensions, but prefer public + addresses over temporary addresses. + > 1 : enable Privacy Extensions and prefer temporary + addresses over public addresses. + Default: 1 (for most devices) + 0 (for point-to-point devices and loopback devices) + +temp_valid_lft - INTEGER + valid lifetime (in seconds) for temporary addresses. + Default: 604800 (7 days) + +temp_prefered_lft - INTEGER + Preferred lifetime (in seconds) for temorary addresses. + Default: 86400 (1 day) + +max_desync_factor - INTEGER + Maximum value for DESYNC_FACTOR, which is a random value + that ensures that clients don't synchronize with each + other and generage new addresses at exactly the same time. + value is in seconds. + Default: 600 + +regen_max_retry - INTEGER + Number of attempts before give up attempting to generate + valid temporary addresses. + Default: 5 + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. 
Index: include/linux/md5.h =================================================================== RCS file: include/linux/md5.h diff -N include/linux/md5.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ include/linux/md5.h 24 Feb 2003 10:40:59 -0000 1.1.6.1 @@ -0,0 +1,23 @@ +/* + * md5.h - for lib/md5.c + * + * $USAGI: md5.h,v 1.1.6.1 2003/02/24 10:40:59 yoshfuji Exp $ + */ + +#ifndef _LINUX_MD5_H +#define _LINUX_MD5_H + +typedef struct MD5Context { + __u32 buf[4]; + __u32 bits[2]; + __u8 in[64]; +} MD5_CTX; + +void MD5Init(struct MD5Context *context); +void MD5Update(struct MD5Context *context, + __u8 const *buf, unsigned int len); +void MD5Final(__u8 digest[16], + struct MD5Context *context); +void MD5Transform(__u32 buf[4], __u32 const in[16]); + +#endif /* !_LINUX_MD5_H */ Index: include/linux/rtnetlink.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/include/linux/rtnetlink.h,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.30.1 diff -u -r1.1.1.2 -r1.1.1.2.30.1 --- include/linux/rtnetlink.h 9 Oct 2002 01:35:37 -0000 1.1.1.2 +++ include/linux/rtnetlink.h 24 Feb 2003 10:40:59 -0000 1.1.1.2.30.1 @@ -315,6 +315,7 @@ /* ifa_flags */ #define IFA_F_SECONDARY 0x01 +#define IFA_F_TEMPORARY IFA_F_SECONDARY #define IFA_F_DEPRECATED 0x20 #define IFA_F_TENTATIVE 0x40 Index: include/linux/sysctl.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/include/linux/sysctl.h,v retrieving revision 1.1.1.3 retrieving revision 1.1.1.3.2.1 diff -u -r1.1.1.3 -r1.1.1.3.2.1 --- include/linux/sysctl.h 24 Feb 2003 09:47:45 -0000 1.1.1.3 +++ include/linux/sysctl.h 24 Feb 2003 10:40:59 -0000 1.1.1.3.2.1 @@ -375,7 +375,12 @@ NET_IPV6_DAD_TRANSMITS=7, NET_IPV6_RTR_SOLICITS=8, NET_IPV6_RTR_SOLICIT_INTERVAL=9, - NET_IPV6_RTR_SOLICIT_DELAY=10 + NET_IPV6_RTR_SOLICIT_DELAY=10, + NET_IPV6_USE_TEMPADDR=11, + NET_IPV6_TEMP_VALID_LFT=12, + 
NET_IPV6_TEMP_PREFERED_LFT=13, + NET_IPV6_REGEN_MAX_RETRY=14, + NET_IPV6_MAX_DESYNC_FACTOR=15 }; /* /proc/sys/net/ipv6/icmp */ Index: include/net/addrconf.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/include/net/addrconf.h,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.2.1 diff -u -r1.1.1.2 -r1.1.1.2.2.1 --- include/net/addrconf.h 24 Feb 2003 09:47:49 -0000 1.1.1.2 +++ include/net/addrconf.h 24 Feb 2003 10:40:59 -0000 1.1.1.2.2.1 @@ -6,6 +6,11 @@ #define MAX_RTR_SOLICITATIONS 3 #define RTR_SOLICITATION_INTERVAL (4*HZ) +#define TEMP_VALID_LIFETIME (7*86400) +#define TEMP_PREFERRED_LIFETIME (86400) +#define REGEN_MAX_RETRY (5) +#define MAX_DESYNC_FACTOR (600) + #define ADDR_CHECK_FREQUENCY (120*HZ) struct prefix_info { Index: include/net/if_inet6.h =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/include/net/if_inet6.h,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.54.1 diff -u -r1.1.1.1 -r1.1.1.1.54.1 --- include/net/if_inet6.h 20 Aug 2002 09:46:45 -0000 1.1.1.1 +++ include/net/if_inet6.h 24 Feb 2003 10:40:59 -0000 1.1.1.1.54.1 @@ -43,6 +43,12 @@ struct inet6_ifaddr *lst_next; /* next addr in addr_lst */ struct inet6_ifaddr *if_next; /* next addr in inet6_dev */ +#ifdef CONFIG_IPV6_PRIVACY + struct inet6_ifaddr *tmp_next; /* next addr in tempaddr_lst */ + struct inet6_ifaddr *ifpub; + int regen_count; +#endif + int dead; }; @@ -86,7 +92,13 @@ int rtr_solicits; int rtr_solicit_interval; int rtr_solicit_delay; - +#ifdef CONFIG_IPV6_PRIVACY + int use_tempaddr; + int temp_valid_lft; + int temp_prefered_lft; + int regen_max_retry; + int max_desync_factor; +#endif void *sysctl; }; @@ -100,6 +112,13 @@ atomic_t refcnt; __u32 if_flags; int dead; + +#ifdef CONFIG_IPV6_PRIVACY + u8 rndid[8]; + u8 entropy[8]; + struct timer_list regen_timer; + struct inet6_ifaddr *tempaddr_list; +#endif struct neigh_parms *nd_parms; 
struct inet6_dev *next; Index: lib/Config.in =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/lib/Config.in,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.26.1 diff -u -r1.1.1.1 -r1.1.1.1.26.1 --- lib/Config.in 9 Oct 2002 01:35:37 -0000 1.1.1.1 +++ lib/Config.in 24 Feb 2003 10:40:59 -0000 1.1.1.1.26.1 @@ -5,6 +5,19 @@ comment 'Library routines' # +# MD5 digest +# +if [ "$CONFIG_IPV6_PRIVACY" = "y" ]; then + if [ "$CONFIG_IPV6" = "y" ]; then + define_tristate CONFIG_MD5 y + else + define_tristate CONFIG_MD5 m + fi +else + tristate 'MD5 digest support' CONFIG_MD5 +fi + +# # Do we need the compression support? # if [ "$CONFIG_CRAMFS" = "y" -o \ Index: lib/Makefile =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/lib/Makefile,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.30.1 diff -u -r1.1.1.2 -r1.1.1.2.30.1 --- lib/Makefile 9 Oct 2002 01:35:37 -0000 1.1.1.2 +++ lib/Makefile 24 Feb 2003 10:40:59 -0000 1.1.1.2.30.1 @@ -20,6 +20,11 @@ obj-y += dec_and_lock.o endif +obj-$(CONFIG_MD5) += md5.o +ifeq ($(CONFIG_MD5),y) + export-objs += md5.o +endif + subdir-$(CONFIG_ZLIB_INFLATE) += zlib_inflate subdir-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate Index: lib/md5.c =================================================================== RCS file: lib/md5.c diff -N lib/md5.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ lib/md5.c 24 Feb 2003 10:40:59 -0000 1.1.6.1 @@ -0,0 +1,252 @@ +/* + * This code implements the MD5 message-digest algorithm. + * The algorithm is due to Ron Rivest. This code was + * written by Colin Plumb in 1993, no copyright is claimed. + * This code is in the public domain; do with it what you wish. + * + * Equivalent code is available from RSA Data Security, Inc. + * This code has been tested against that, and is equivalent, + * except that you don't need to include two pages of legalese + * with every copy. 
+ * + * To compute the message digest of a chunk of bytes, declare an + * MD5Context structure, pass it to MD5Init, call MD5Update as + * needed on buffers full of bytes, and then call MD5Final, which + * will fill a supplied 16-byte array with the digest. + * + * Modified for Linux kernel by YOSHIFUJI Hideaki / USAGI Project. + * $USAGI: md5.c,v 1.1.6.1 2003/02/24 10:40:59 yoshfuji Exp $ + */ +#include +#include +#include /* for memcpy() */ +#include + +#ifndef __LITTLE_ENDIAN +#define byteReverse(buf, len) do { } while(0) +#else +static inline void byteReverse(u32 *p, int longs) +{ + do { + *p = cpu_to_le32p(p); + p++; + } while (--longs); +} +#endif + +/* + * Start MD5 accumulation. Set bit count to 0 and buffer to mysterious + * initialization constants. + */ +void MD5Init(struct MD5Context *ctx) +{ + ctx->buf[0] = 0x67452301; + ctx->buf[1] = 0xefcdab89; + ctx->buf[2] = 0x98badcfe; + ctx->buf[3] = 0x10325476; + + ctx->bits[0] = 0; + ctx->bits[1] = 0; +} + +/* + * Update context to reflect the concatenation of another buffer full + * of bytes. + */ +void +MD5Update(struct MD5Context *ctx, u8 const *buf, unsigned int len) +{ + unsigned int t; + + /* Update bitcount */ + + t = ctx->bits[0]; + if ((ctx->bits[0] = t + ((u32) len << 3)) < t) + ctx->bits[1]++; /* Carry from low to high */ + ctx->bits[1] += len >> 29; + + t = (t >> 3) & 0x3f; /* Bytes already in shsInfo->data */ + + /* Handle any leading odd-sized chunks */ + + if (t) { + u8 *p = (u8 *) ctx->in + t; + + t = 64 - t; + if (len < t) { + memcpy(p, buf, len); + return; + } + memcpy(p, buf, t); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + buf += t; + len -= t; + } + /* Process data in 64-byte chunks */ + + while (len >= 64) { + memcpy(ctx->in, buf, 64); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + buf += 64; + len -= 64; + } + + /* Handle any remaining bytes of data. 
*/ + + memcpy(ctx->in, buf, len); +} + +/* + * Final wrapup - pad to 64-byte boundary with the bit pattern + * 1 0* (64-bit count of bits processed, MSB-first) + */ +void MD5Final(u8 digest[16], struct MD5Context *ctx) +{ + u32 count; + u8 *p; + + /* Compute number of bytes mod 64 */ + count = (ctx->bits[0] >> 3) & 0x3F; + + /* Set the first char of padding to 0x80. This is safe since there is + * always at least one byte free */ + p = ctx->in + count; + *p++ = 0x80; + + /* Bytes of padding needed to make 64 bytes */ + count = 64 - 1 - count; + + /* Pad out to 56 mod 64 */ + if (count < 8) { + /* Two lots of padding: Pad the first block to 64 bytes */ + memset(p, 0, count); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + + /* Now fill the next block with 56 bytes */ + memset(ctx->in, 0, 56); + } else { + /* Pad block to 56 bytes */ + memset(p, 0, count - 8); + } + byteReverse((u32*)ctx->in, 14); + + /* Append length in bits and transform */ + ((u32 *) ctx->in)[14] = ctx->bits[0]; + ((u32 *) ctx->in)[15] = ctx->bits[1]; + + MD5Transform(ctx->buf, (u32 *) ctx->in); + byteReverse(ctx->buf, 4); + memcpy(digest, ctx->buf, 16); + memset((char *) ctx, 0, sizeof(ctx)); /* In case it's sensitive */ +} + +/* The four core functions - F1 is optimized somewhat */ + +/* #define F1(x, y, z) (x & y | ~x & z) */ +#define F1(x, y, z) (z ^ (x & (y ^ z))) +#define F2(x, y, z) F1(z, x, y) +#define F3(x, y, z) (x ^ y ^ z) +#define F4(x, y, z) (y ^ (x | ~z)) + +/* This is the central step in the MD5 algorithm. */ +#define MD5STEP(f, w, x, y, z, data, s) \ + ( w += f(x, y, z) + data, w = w<<s | w>>(32-s), w += x ) + +/* + * The core of the MD5 algorithm, this alters an existing MD5 hash to + * reflect the addition of 16 longwords of new data. MD5Update blocks + * the data and converts bytes into longwords for this routine.
+ */ +void MD5Transform(__u32 buf[4], __u32 const in[16]) +{ + register u32 a, b, c, d; + + a = buf[0]; + b = buf[1]; + c = buf[2]; + d = buf[3]; + + MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7); + MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12); + MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17); + MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22); + MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7); + MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12); + MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17); + MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22); + MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7); + MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12); + MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17); + MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22); + MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7); + MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12); + MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17); + MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22); + + MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5); + MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9); + MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14); + MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20); + MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5); + MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9); + MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14); + MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20); + MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5); + MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9); + MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14); + MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20); + MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5); + MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9); + MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14); + MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20); + + MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4); + MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11); + MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16); + MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23); + 
MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4); + MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11); + MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16); + MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23); + MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4); + MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11); + MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16); + MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23); + MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4); + MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11); + MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16); + MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23); + + MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6); + MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10); + MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15); + MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21); + MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6); + MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10); + MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15); + MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21); + MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6); + MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10); + MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15); + MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21); + MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6); + MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10); + MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15); + MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21); + + buf[0] += a; + buf[1] += b; + buf[2] += c; + buf[3] += d; +} + +#ifdef CONFIG_MD5 +EXPORT_SYMBOL(MD5Init); +EXPORT_SYMBOL(MD5Update); +EXPORT_SYMBOL(MD5Final); +EXPORT_SYMBOL(MD5Transform); +#endif + Index: net/ipv6/Config.in =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/net/ipv6/Config.in,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.50.1 diff -u -r1.1.1.1 -r1.1.1.1.50.1 --- net/ipv6/Config.in 20 Aug 2002 09:47:02 -0000 1.1.1.1 +++ net/ipv6/Config.in 24 Feb 2003 10:40:59 -0000 
1.1.1.1.50.1 @@ -5,6 +5,8 @@ #bool ' IPv6: flow policy support' CONFIG_RT6_POLICY #bool ' IPv6: firewall support' CONFIG_IPV6_FIREWALL +bool ' IPv6: Privacy Extentions (RFC 3041) support' CONFIG_IPV6_PRIVACY + if [ "$CONFIG_NETFILTER" != "n" ]; then source net/ipv6/netfilter/Config.in fi Index: net/ipv6/addrconf.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/net/ipv6/addrconf.c,v retrieving revision 1.1.1.4 retrieving revision 1.1.1.4.2.3 diff -u -r1.1.1.4 -r1.1.1.4.2.3 --- net/ipv6/addrconf.c 24 Feb 2003 09:47:55 -0000 1.1.1.4 +++ net/ipv6/addrconf.c 25 Feb 2003 07:45:16 -0000 1.1.1.4.2.3 @@ -28,6 +28,8 @@ * packets. * YOSHIFUJI Hideaki @USAGI : improved accuracy of * address validation timer. + * YOSHIFUJI Hideaki @USAGI : Privacy Extensions (RFC3041) + * support. */ #include @@ -62,6 +64,11 @@ #include #include +#ifdef CONFIG_IPV6_PRIVACY +#include +#include +#endif + #include #define IPV6_MAX_ADDRESSES 16 @@ -83,6 +90,16 @@ int inet6_dev_count; int inet6_ifa_count; +#ifdef CONFIG_IPV6_PRIVACY +static int __ipv6_regen_rndid(struct inet6_dev *idev); +static int __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpaddr); +static void ipv6_regen_rndid(unsigned long data); + +static int desync_factor = MAX_DESYNC_FACTOR * HZ; +#endif + +int ipv6_count_addresses(struct inet6_dev *idev); + /* * Configured unicast address hash table */ @@ -119,6 +136,13 @@ MAX_RTR_SOLICITATIONS, /* router solicits */ RTR_SOLICITATION_INTERVAL, /* rtr solicit interval */ MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ +#ifdef CONFIG_IPV6_PRIVACY + .use_tempaddr = 1, + .temp_valid_lft = TEMP_VALID_LIFETIME, + .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, + .regen_max_retry = REGEN_MAX_RETRY, + .max_desync_factor = MAX_DESYNC_FACTOR, +#endif }; static struct ipv6_devconf ipv6_devconf_dflt = @@ -133,6 +157,13 @@ MAX_RTR_SOLICITATIONS, /* router solicits */ RTR_SOLICITATION_INTERVAL, /* rtr solicit 
interval */ MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ +#ifdef CONFIG_IPV6_PRIVACY + .use_tempaddr = 1, + .temp_valid_lft = TEMP_VALID_LIFETIME, + .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, + .regen_max_retry = REGEN_MAX_RETRY, + .max_desync_factor = MAX_DESYNC_FACTOR, +#endif }; int ipv6_addr_type(struct in6_addr *addr) @@ -272,6 +303,24 @@ /* We refer to the device */ dev_hold(dev); +#ifdef CONFIG_IPV6_PRIVACY + get_random_bytes(ndev->rndid, sizeof(ndev->rndid)); + get_random_bytes(ndev->entropy, sizeof(ndev->entropy)); + init_timer(&ndev->regen_timer); + ndev->regen_timer.function = ipv6_regen_rndid; + ndev->regen_timer.data = (unsigned long) ndev; + if ((dev->flags&IFF_LOOPBACK) || + dev->type == ARPHRD_TUNNEL || + dev->type == ARPHRD_SIT) { + printk(KERN_INFO + "Disabled Privacy Extensions on device %p(%s)\n", + dev, dev->name); + ndev->cnf.use_tempaddr = -1; + } else { + __ipv6_regen_rndid(ndev); + } +#endif + write_lock_bh(&addrconf_lock); dev->ip6_ptr = ndev; /* One reference from device */ @@ -396,6 +445,18 @@ /* Add to inet6_dev unicast addr list. 
*/ ifa->if_next = idev->addr_list; idev->addr_list = ifa; + +#ifdef CONFIG_IPV6_PRIVACY + ifa->regen_count = 0; + if (ifa->flags&IFA_F_TEMPORARY) { + ifa->tmp_next = idev->tempaddr_list; + idev->tempaddr_list = ifa; + in6_ifa_hold(ifa); + } else { + ifa->tmp_next = NULL; + } +#endif + in6_ifa_hold(ifa); write_unlock_bh(&idev->lock); read_unlock(&addrconf_lock); @@ -417,6 +478,15 @@ ifp->dead = 1; +#ifdef CONFIG_IPV6_PRIVACY + spin_lock_bh(&ifp->lock); + if (ifp->ifpub) { + __in6_ifa_put(ifp->ifpub); + ifp->ifpub = NULL; + } + spin_unlock_bh(&ifp->lock); +#endif + write_lock_bh(&addrconf_hash_lock); for (ifap = &inet6_addr_lst[hash]; (ifa=*ifap) != NULL; ifap = &ifa->lst_next) { @@ -430,6 +500,24 @@ write_unlock_bh(&addrconf_hash_lock); write_lock_bh(&idev->lock); +#ifdef CONFIG_IPV6_PRIVACY + if (ifp->flags&IFA_F_TEMPORARY) { + for (ifap = &idev->tempaddr_list; (ifa=*ifap) != NULL; + ifap = &ifa->tmp_next) { + if (ifa == ifp) { + *ifap = ifa->tmp_next; + if (ifp->ifpub) { + __in6_ifa_put(ifp->ifpub); + ifp->ifpub = NULL; + } + __in6_ifa_put(ifp); + ifa->tmp_next = NULL; + break; + } + } + } +#endif + for (ifap = &idev->addr_list; (ifa=*ifap) != NULL; ifap = &ifa->if_next) { if (ifa == ifp) { @@ -450,6 +538,96 @@ in6_ifa_put(ifp); } +#ifdef CONFIG_IPV6_PRIVACY +static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, struct inet6_ifaddr *ift) +{ + struct inet6_dev *idev; + struct in6_addr addr, *tmpaddr; + unsigned long tmp_prefered_lft, tmp_valid_lft; + int tmp_plen; + int ret = 0; + + if (ift) { + spin_lock_bh(&ift->lock); + memcpy(&addr.s6_addr[8], &ift->addr.s6_addr[8], 8); + spin_unlock_bh(&ift->lock); + tmpaddr = &addr; + } else { + tmpaddr = NULL; + } +retry: + spin_lock_bh(&ifp->lock); + in6_ifa_hold(ifp); + idev = ifp->idev; + in6_dev_hold(idev); + memcpy(addr.s6_addr, ifp->addr.s6_addr, 8); + write_lock(&idev->lock); + if (idev->cnf.use_tempaddr <= 0) { + write_unlock(&idev->lock); + spin_unlock_bh(&ifp->lock); + printk(KERN_INFO + 
"ipv6_create_tempaddr(): use_tempaddr is disabled.\n"); + in6_dev_put(idev); + in6_ifa_put(ifp); + ret = -1; + goto out; + } + if (ifp->regen_count++ >= idev->cnf.regen_max_retry) { + idev->cnf.use_tempaddr = -1; /*XXX*/ + write_unlock(&idev->lock); + spin_unlock_bh(&ifp->lock); + printk(KERN_WARNING + "ipv6_create_tempaddr(): regeneration time exceeded. disabled temporary address support.\n"); + in6_dev_put(idev); + in6_ifa_put(ifp); + ret = -1; + goto out; + } + if (__ipv6_try_regen_rndid(idev, tmpaddr) < 0) { + write_unlock(&idev->lock); + spin_unlock_bh(&ifp->lock); + printk(KERN_WARNING + "ipv6_create_tempaddr(): regeneration of randomized interface id failed.\n"); + in6_dev_put(idev); + in6_ifa_put(ifp); + ret = -1; + goto out; + } + memcpy(&addr.s6_addr[8], idev->rndid, 8); + tmp_valid_lft = min_t(__u32, + ifp->valid_lft, + idev->cnf.temp_valid_lft); + tmp_prefered_lft = min_t(__u32, + ifp->prefered_lft, + idev->cnf.temp_prefered_lft - desync_factor / HZ); + tmp_plen = ifp->prefix_len; + write_unlock(&idev->lock); + spin_unlock_bh(&ifp->lock); + ift = ipv6_count_addresses(idev) < IPV6_MAX_ADDRESSES ? + ipv6_add_addr(idev, &addr, tmp_plen, + ipv6_addr_type(&addr)&IPV6_ADDR_SCOPE_MASK, IFA_F_TEMPORARY) : 0; + if (!ift) { + in6_dev_put(idev); + in6_ifa_put(ifp); + printk(KERN_INFO + "ipv6_create_tempaddr(): retry temporary address regeneration.\n"); + tmpaddr = &addr; + goto retry; + } + spin_lock_bh(&ift->lock); + ift->ifpub = ifp; + ift->valid_lft = tmp_valid_lft; + ift->prefered_lft = tmp_prefered_lft; + ift->tstamp = ifp->tstamp; + spin_unlock_bh(&ift->lock); + addrconf_dad_start(ift); + in6_ifa_put(ift); + in6_dev_put(idev); +out: + return ret; +} +#endif + /* * Choose an apropriate source address * should do: @@ -458,6 +636,22 @@ * an address of the attached interface * iii) don't use deprecated addresses */ +static int inline ipv6_saddr_pref(const struct inet6_ifaddr *ifp, u8 invpref) +{ + int pref; + pref = ifp->flags&IFA_F_DEPRECATED ? 
0 : 2; +#ifdef CONFIG_IPV6_PRIVACY + pref |= (ifp->flags^invpref)&IFA_F_TEMPORARY ? 0 : 1; +#endif + return pref; +} + +#ifdef CONFIG_IPV6_PRIVACY +#define IPV6_GET_SADDR_MAXSCORE(score) ((score) == 3) +#else +#define IPV6_GET_SADDR_MAXSCORE(score) (score) +#endif + int ipv6_get_saddr(struct dst_entry *dst, struct in6_addr *daddr, struct in6_addr *saddr) { @@ -468,6 +662,7 @@ struct inet6_dev *idev; struct rt6_info *rt; int err; + int hiscore = -1, score; rt = (struct rt6_info *) dst; if (rt) @@ -497,17 +692,27 @@ read_lock_bh(&idev->lock); for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) { if (ifp->scope == scope) { - if (!(ifp->flags & (IFA_F_DEPRECATED|IFA_F_TENTATIVE))) { - in6_ifa_hold(ifp); + if (ifp->flags&IFA_F_TENTATIVE) + continue; +#ifdef CONFIG_IPV6_PRIVACY + score = ipv6_saddr_pref(ifp, idev->cnf.use_tempaddr > 1 ? IFA_F_TEMPORARY : 0); +#else + score = ipv6_saddr_pref(ifp, 0); +#endif + if (score <= hiscore) + continue; + + if (match) + in6_ifa_put(match); + match = ifp; + hiscore = score; + in6_ifa_hold(ifp); + + if (IPV6_GET_SADDR_MAXSCORE(score)) { read_unlock_bh(&idev->lock); read_unlock(&addrconf_lock); goto out; } - - if (!match && !(ifp->flags & IFA_F_TENTATIVE)) { - match = ifp; - in6_ifa_hold(ifp); - } } } read_unlock_bh(&idev->lock); @@ -530,16 +735,26 @@ read_lock_bh(&idev->lock); for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) { if (ifp->scope == scope) { - if (!(ifp->flags&(IFA_F_DEPRECATED|IFA_F_TENTATIVE))) { - in6_ifa_hold(ifp); + if (ifp->flags&IFA_F_TENTATIVE) + continue; +#ifdef CONFIG_IPV6_PRIVACY + score = ipv6_saddr_pref(ifp, idev->cnf.use_tempaddr > 1 ? 
IFA_F_TEMPORARY : 0); +#else + score = ipv6_saddr_pref(ifp, 0); +#endif + if (score <= hiscore) + continue; + + if (match) + in6_ifa_put(match); + match = ifp; + hiscore = score; + in6_ifa_hold(ifp); + + if (IPV6_GET_SADDR_MAXSCORE(score)) { read_unlock_bh(&idev->lock); goto out_unlock_base; } - - if (!match && !(ifp->flags&IFA_F_TENTATIVE)) { - match = ifp; - in6_ifa_hold(ifp); - } } } read_unlock_bh(&idev->lock); @@ -551,19 +766,12 @@ read_unlock(&dev_base_lock); out: - if (ifp == NULL) { - ifp = match; - match = NULL; - } - err = -EADDRNOTAVAIL; - if (ifp) { - ipv6_addr_copy(saddr, &ifp->addr); + if (match) { + ipv6_addr_copy(saddr, &match->addr); err = 0; - in6_ifa_put(ifp); - } - if (match) in6_ifa_put(match); + } return err; } @@ -653,6 +861,21 @@ ifp->flags |= IFA_F_TENTATIVE; spin_unlock_bh(&ifp->lock); in6_ifa_put(ifp); +#ifdef CONFIG_IPV6_PRIVACY + } else if (ifp->flags&IFA_F_TEMPORARY) { + struct inet6_ifaddr *ifpub; + spin_lock_bh(&ifp->lock); + ifpub = ifp->ifpub; + if (ifpub) { + in6_ifa_hold(ifpub); + spin_unlock_bh(&ifp->lock); + ipv6_create_tempaddr(ifpub, ifp); + in6_ifa_put(ifpub); + } else { + spin_unlock_bh(&ifp->lock); + } + ipv6_del_addr(ifp); +#endif } else ipv6_del_addr(ifp); } @@ -718,6 +941,91 @@ return err; } +#ifdef CONFIG_IPV6_PRIVACY +/* (re)generation of randomized interface identifier (RFC 3041 3.2, 3.5) */ +static int __ipv6_regen_rndid(struct inet6_dev *idev) +{ + struct net_device *dev; + u8 eui64[8]; + u8 digest[16]; + MD5_CTX ctx; + + if (!del_timer(&idev->regen_timer)) + in6_dev_hold(idev); + + dev = idev->dev; + + if (ipv6_generate_eui64(eui64, dev)) { + printk(KERN_INFO + "__ipv6_regen_rndid(idev=%p): cannot get EUI64 identifier; use random bytes.\n", + idev); + get_random_bytes(eui64, sizeof(eui64)); + } +regen: + MD5Init(&ctx); + MD5Update(&ctx, idev->entropy, 8); + MD5Update(&ctx, eui64, 8); + MD5Final(digest, &ctx); + memcpy(idev->rndid, &digest[0], 8); + idev->rndid[0] &= ~0x02; + memcpy(idev->entropy, &digest[8], 8); + 
+ /* + * : + * check if generated address is not inappropriate + * + * - Reserved subnet anycast (RFC 2526) + * 11111101 11....11 1xxxxxxx + * - ISATAP (draft-ietf-ngtrans-isatap-01.txt) 4.3 + * 00-00-5E-FE-xx-xx-xx-xx + * - value 0 + * - XXX: already assigned to an address on the device + */ + if (idev->rndid[0] == 0xfd && + (idev->rndid[1]&idev->rndid[2]&idev->rndid[3]&idev->rndid[4]&idev->rndid[5]&idev->rndid[6]) && + (idev->rndid[7]&0x80)) + goto regen; + if ((idev->rndid[0]|idev->rndid[1]) == 0) { + if (idev->rndid[2] == 0x5e && idev->rndid[3] == 0xfe) + goto regen; + if ((idev->rndid[2]|idev->rndid[3]|idev->rndid[4]|idev->rndid[5]|idev->rndid[6]|idev->rndid[7]) == 0x00) + goto regen; + } + + if (time_before(idev->regen_timer.expires, jiffies)) { + idev->regen_timer.expires = 0; + printk(KERN_WARNING + "__ipv6_regen_rndid(): too short regeneration interval; timer diabled for %s.\n", + idev->dev->name); + in6_dev_put(idev); + return -1; + } + + add_timer(&idev->regen_timer); + return 0; +} + +static void ipv6_regen_rndid(unsigned long data) +{ + struct inet6_dev *idev = (struct inet6_dev *) data; + + read_lock_bh(&addrconf_lock); + write_lock_bh(&idev->lock); + if (!idev->dead) + __ipv6_regen_rndid(idev); + write_unlock_bh(&idev->lock); + read_unlock_bh(&addrconf_lock); +} + +static int __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpaddr) { + int ret = 0; + + if (tmpaddr && memcmp(idev->rndid, &tmpaddr->s6_addr[8], 8) == 0) + ret = __ipv6_regen_rndid(idev); + return ret; +} +#endif + /* * Add prefix route. 
*/ @@ -889,6 +1197,7 @@ struct inet6_ifaddr * ifp; struct in6_addr addr; int plen; + int create = 0; plen = pinfo->prefix_len >> 3; @@ -924,6 +1233,7 @@ return; } + create = 1; addrconf_dad_start(ifp); } @@ -934,6 +1244,9 @@ if (ifp) { int flags; +#ifdef CONFIG_IPV6_PRIVACY + struct inet6_ifaddr *ift; +#endif spin_lock(&ifp->lock); ifp->valid_lft = valid_lft; @@ -946,6 +1259,42 @@ if (!(flags&IFA_F_TENTATIVE)) ipv6_ifa_notify((flags&IFA_F_DEPRECATED) ? 0 : RTM_NEWADDR, ifp); + +#ifdef CONFIG_IPV6_PRIVACY + read_lock_bh(&in6_dev->lock); + /* update all temporary addresses in the list */ + for (ift=in6_dev->tempaddr_list; ift; ift=ift->tmp_next) { + /* + * When adjusting the lifetimes of an existing + * temporary address, only lower the lifetimes. + * Implementations must not increase the + * lifetimes of an existing temporary address + * when processing a Prefix Information Option. + */ + spin_lock(&ift->lock); + flags = ift->flags; + if (ift->valid_lft > valid_lft && + ift->valid_lft - valid_lft > (jiffies - ift->tstamp) / HZ) + ift->valid_lft = valid_lft + (jiffies - ift->tstamp) / HZ; + if (ift->prefered_lft > prefered_lft && + ift->prefered_lft - prefered_lft > (jiffies - ift->tstamp) / HZ) + ift->prefered_lft = prefered_lft + (jiffies - ift->tstamp) / HZ; + spin_unlock(&ift->lock); + if (!(flags&IFA_F_TENTATIVE)) + ipv6_ifa_notify(0, ift); + } + + if (create && in6_dev->cnf.use_tempaddr > 0) { + /* + * When a new public address is created as described in [ADDRCONF], + * also create a new temporary address. 
+ */ + read_unlock_bh(&in6_dev->lock); + ipv6_create_tempaddr(ifp, NULL); + } else { + read_unlock_bh(&in6_dev->lock); + } +#endif in6_ifa_put(ifp); addrconf_verify(0); } @@ -1643,6 +1992,9 @@ write_lock(&addrconf_hash_lock); for (ifp=inet6_addr_lst[i]; ifp; ifp=ifp->lst_next) { unsigned long age; +#ifdef CONFIG_IPV6_PRIVACY + unsigned long regen_advance; +#endif if (ifp->flags & IFA_F_PERMANENT) continue; @@ -1650,6 +2002,12 @@ spin_lock(&ifp->lock); age = (now - ifp->tstamp) / HZ; +#ifdef CONFIG_IPV6_PRIVACY + regen_advance = ifp->idev->cnf.regen_max_retry * + ifp->idev->cnf.dad_transmits * + ifp->idev->nd_parms->retrans_time / HZ; +#endif + if (age >= ifp->valid_lft) { spin_unlock(&ifp->lock); in6_ifa_hold(ifp); @@ -1678,6 +2036,28 @@ in6_ifa_put(ifp); goto restart; } +#ifdef CONFIG_IPV6_PRIVACY + } else if ((ifp->flags&IFA_F_TEMPORARY) && + !(ifp->flags&IFA_F_TENTATIVE)) { + if (age >= ifp->prefered_lft - regen_advance) { + struct inet6_ifaddr *ifpub = ifp->ifpub; + if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) + next = ifp->tstamp + ifp->prefered_lft * HZ; + if (!ifp->regen_count && ifpub) { + ifp->regen_count++; + in6_ifa_hold(ifp); + in6_ifa_hold(ifpub); + spin_unlock(&ifp->lock); + write_unlock(&addrconf_hash_lock); + ipv6_create_tempaddr(ifpub, ifp); + in6_ifa_put(ifpub); + in6_ifa_put(ifp); + goto restart; + } + } else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next)) + next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ; + spin_unlock(&ifp->lock); +#endif } else { /* ifp->prefered_lft <= ifp->valid_lft */ if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) @@ -1910,7 +2290,7 @@ static struct addrconf_sysctl_table { struct ctl_table_header *sysctl_header; - ctl_table addrconf_vars[11]; + ctl_table addrconf_vars[16]; ctl_table addrconf_dev[2]; ctl_table addrconf_conf_dir[2]; ctl_table addrconf_proto_dir[2]; @@ -1957,6 +2337,28 @@ &ipv6_devconf.rtr_solicit_delay, sizeof(int), 0644, NULL, 
&proc_dointvec_jiffies}, +#ifdef CONFIG_IPV6_PRIVACY + {NET_IPV6_USE_TEMPADDR, "use_tempaddr", + &ipv6_devconf.use_tempaddr, sizeof(int), 0644, NULL, + &proc_dointvec}, + + {NET_IPV6_TEMP_VALID_LFT, "temp_valid_lft", + &ipv6_devconf.temp_valid_lft, sizeof(int), 0644, NULL, + &proc_dointvec}, + + {NET_IPV6_TEMP_PREFERED_LFT, "temp_prefered_lft", + &ipv6_devconf.temp_prefered_lft, sizeof(int), 0644, NULL, + &proc_dointvec}, + + {NET_IPV6_REGEN_MAX_RETRY, "regen_max_retry", + &ipv6_devconf.regen_max_retry, sizeof(int), 0644, NULL, + &proc_dointvec}, + + {NET_IPV6_MAX_DESYNC_FACTOR, "max_desync_factor", + &ipv6_devconf.max_desync_factor, sizeof(int), 0644, NULL, + &proc_dointvec}, +#endif + {0}}, {{NET_PROTO_CONF_ALL, "all", NULL, 0, 0555, addrconf_sysctl.addrconf_vars},{0}}, @@ -1975,7 +2377,7 @@ if (t == NULL) return; memcpy(t, &addrconf_sysctl, sizeof(*t)); - for (i=0; i<sizeof(t->addrconf_vars)/sizeof(t->addrconf_vars[0])-1; i++) { + for (i=0; t->addrconf_vars[i].data; i++) { t->addrconf_vars[i].data += (char*)p - (char*)&ipv6_devconf; t->addrconf_vars[i].de = NULL; } -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From hch@infradead.org Tue Feb 25 09:51:58 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 09:52:06 -0800 (PST) Received: from phoenix.infradead.org (phoenix.mvhi.com [195.224.96.167]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PHps3v019859 for ; Tue, 25 Feb 2003 09:51:55 -0800 Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 18njFA-0001hA-00; Tue, 25 Feb 2003 17:51:52 +0000 Date: Tue, 25 Feb 2003 17:51:51 +0000 From: Christoph Hellwig To: "YOSHIFUJI Hideaki / 吉藤英明" Cc: davem@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 Message-ID: <20030225175151.A6512@infradead.org> Mail-Followup-To: Christoph
Hellwig , "YOSHIFUJI Hideaki / 吉藤英明" , davem@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org References: <20030223.225251.119557134.davem@redhat.com> <20030226.003625.90530451.yoshfuji@linux-ipv6.org> <20030225160634.A4525@infradead.org> <20030226.024750.63517417.yoshfuji@linux-ipv6.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030226.024750.63517417.yoshfuji@linux-ipv6.org>; from yoshfuji@linux-ipv6.org on Wed, Feb 26, 2003 at 02:47:50AM +0900 X-archive-position: 1804 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev On Wed, Feb 26, 2003 at 02:47:50AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote: > Thanks for comments. How about this? (only lib part) This one looks fine. From yoshfuji@linux-ipv6.org Tue Feb 25 10:22:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 10:22:21 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1PIM63v022625 for ; Tue, 25 Feb 2003 10:22:07 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1PHloBF031237; Wed, 26 Feb 2003 02:47:50 +0900 Date: Wed, 26 Feb 2003 02:47:50 +0900 (JST) Message-Id: <20030226.024750.63517417.yoshfuji@linux-ipv6.org> To: hch@infradead.org Cc: davem@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20030225160634.A4525@infradead.org> References: <20030223.225251.119557134.davem@redhat.com>
<20030226.003625.90530451.yoshfuji@linux-ipv6.org> <20030225160634.A4525@infradead.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1805 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hello. In article <20030225160634.A4525@infradead.org> (at Tue, 25 Feb 2003 16:06:34 +0000), Christoph Hellwig says: > Config.in files use three-space indents. : > > +ifeq ($(CONFIG_MD5),y) > > + export-objs += md5.o > > +endif > > this is wrong, objects are added to export-objs unconditional. > > > + > > +#ifdef CONFIG_MD5 : > > +EXPORT_SYMBOL(MD5Transform); > > +#endif > > Please remove the ifdef, it doesn't make any sense. Thanks for comments. How about this? (only lib part) Index: lib/Config.in =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/lib/Config.in,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.26.2 diff -u -r1.1.1.1 -r1.1.1.1.26.2 --- lib/Config.in 9 Oct 2002 01:35:37 -0000 1.1.1.1 +++ lib/Config.in 25 Feb 2003 17:18:27 -0000 1.1.1.1.26.2 @@ -5,6 +5,19 @@ comment 'Library routines' # +# MD5 digest +# +if [ "$CONFIG_IPV6_PRIVACY" = "y" ]; then + if [ "$CONFIG_IPV6" = "y" ]; then + define_tristate CONFIG_MD5 y + else + define_tristate CONFIG_MD5 m + fi +else + tristate 'MD5 digest support' CONFIG_MD5 +fi + +# # Do we need the compression support? 
# if [ "$CONFIG_CRAMFS" = "y" -o \ Index: lib/Makefile =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux24/lib/Makefile,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.30.2 diff -u -r1.1.1.2 -r1.1.1.2.30.2 --- lib/Makefile 9 Oct 2002 01:35:37 -0000 1.1.1.2 +++ lib/Makefile 25 Feb 2003 17:18:27 -0000 1.1.1.2.30.2 @@ -8,7 +8,7 @@ L_TARGET := lib.a -export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o rbtree.o +export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o rbtree.o md5.o obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o \ bust_spinlocks.o rbtree.o dump_stack.o @@ -19,6 +19,8 @@ ifneq ($(CONFIG_HAVE_DEC_LOCK),y) obj-y += dec_and_lock.o endif + +obj-$(CONFIG_MD5) += md5.o subdir-$(CONFIG_ZLIB_INFLATE) += zlib_inflate subdir-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate Index: lib/md5.c =================================================================== RCS file: lib/md5.c diff -N lib/md5.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ lib/md5.c 25 Feb 2003 17:18:27 -0000 1.1.6.2 @@ -0,0 +1,250 @@ +/* + * This code implements the MD5 message-digest algorithm. + * The algorithm is due to Ron Rivest. This code was + * written by Colin Plumb in 1993, no copyright is claimed. + * This code is in the public domain; do with it what you wish. + * + * Equivalent code is available from RSA Data Security, Inc. + * This code has been tested against that, and is equivalent, + * except that you don't need to include two pages of legalese + * with every copy. + * + * To compute the message digest of a chunk of bytes, declare an + * MD5Context structure, pass it to MD5Init, call MD5Update as + * needed on buffers full of bytes, and then call MD5Final, which + * will fill a supplied 16-byte array with the digest. + * + * Modified for Linux kernel by YOSHIFUJI Hideaki / USAGI Project. 
+ * $USAGI: md5.c,v 1.1.6.2 2003/02/25 17:18:27 yoshfuji Exp $ + */ +#include +#include +#include /* for memcpy() */ +#include + +#ifndef __LITTLE_ENDIAN +#define byteReverse(buf, len) do { } while(0) +#else +static inline void byteReverse(u32 *p, int longs) +{ + do { + *p = cpu_to_le32p(p); + p++; + } while (--longs); +} +#endif + +/* + * Start MD5 accumulation. Set bit count to 0 and buffer to mysterious + * initialization constants. + */ +void MD5Init(struct MD5Context *ctx) +{ + ctx->buf[0] = 0x67452301; + ctx->buf[1] = 0xefcdab89; + ctx->buf[2] = 0x98badcfe; + ctx->buf[3] = 0x10325476; + + ctx->bits[0] = 0; + ctx->bits[1] = 0; +} + +/* + * Update context to reflect the concatenation of another buffer full + * of bytes. + */ +void +MD5Update(struct MD5Context *ctx, u8 const *buf, unsigned int len) +{ + unsigned int t; + + /* Update bitcount */ + + t = ctx->bits[0]; + if ((ctx->bits[0] = t + ((u32) len << 3)) < t) + ctx->bits[1]++; /* Carry from low to high */ + ctx->bits[1] += len >> 29; + + t = (t >> 3) & 0x3f; /* Bytes already in shsInfo->data */ + + /* Handle any leading odd-sized chunks */ + + if (t) { + u8 *p = (u8 *) ctx->in + t; + + t = 64 - t; + if (len < t) { + memcpy(p, buf, len); + return; + } + memcpy(p, buf, t); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + buf += t; + len -= t; + } + /* Process data in 64-byte chunks */ + + while (len >= 64) { + memcpy(ctx->in, buf, 64); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + buf += 64; + len -= 64; + } + + /* Handle any remaining bytes of data. */ + + memcpy(ctx->in, buf, len); +} + +/* + * Final wrapup - pad to 64-byte boundary with the bit pattern + * 1 0* (64-bit count of bits processed, MSB-first) + */ +void MD5Final(u8 digest[16], struct MD5Context *ctx) +{ + u32 count; + u8 *p; + + /* Compute number of bytes mod 64 */ + count = (ctx->bits[0] >> 3) & 0x3F; + + /* Set the first char of padding to 0x80. 
This is safe since there is + * always at least one byte free */ + p = ctx->in + count; + *p++ = 0x80; + + /* Bytes of padding needed to make 64 bytes */ + count = 64 - 1 - count; + + /* Pad out to 56 mod 64 */ + if (count < 8) { + /* Two lots of padding: Pad the first block to 64 bytes */ + memset(p, 0, count); + byteReverse((u32*)ctx->in, 16); + MD5Transform(ctx->buf, (u32 *) ctx->in); + + /* Now fill the next block with 56 bytes */ + memset(ctx->in, 0, 56); + } else { + /* Pad block to 56 bytes */ + memset(p, 0, count - 8); + } + byteReverse((u32*)ctx->in, 14); + + /* Append length in bits and transform */ + ((u32 *) ctx->in)[14] = ctx->bits[0]; + ((u32 *) ctx->in)[15] = ctx->bits[1]; + + MD5Transform(ctx->buf, (u32 *) ctx->in); + byteReverse(ctx->buf, 4); + memcpy(digest, ctx->buf, 16); + memset((char *) ctx, 0, sizeof(*ctx)); /* In case it's sensitive */ +} + +/* The four core functions - F1 is optimized somewhat */ + +/* #define F1(x, y, z) (x & y | ~x & z) */ +#define F1(x, y, z) (z ^ (x & (y ^ z))) +#define F2(x, y, z) F1(z, x, y) +#define F3(x, y, z) (x ^ y ^ z) +#define F4(x, y, z) (y ^ (x | ~z)) + +/* This is the central step in the MD5 algorithm. */ +#define MD5STEP(f, w, x, y, z, data, s) \ + ( w += f(x, y, z) + data, w = w<<s | w>>(32-s), w += x ) + +/* + * The core of the MD5 algorithm, this alters an existing MD5 hash to + * reflect the addition of 16 longwords of new data. MD5Update blocks + * the data and converts bytes into longwords for this routine.
+ */ +void MD5Transform(__u32 buf[4], __u32 const in[16]) +{ + register u32 a, b, c, d; + + a = buf[0]; + b = buf[1]; + c = buf[2]; + d = buf[3]; + + MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7); + MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12); + MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17); + MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22); + MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7); + MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12); + MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17); + MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22); + MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7); + MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12); + MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17); + MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22); + MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7); + MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12); + MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17); + MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22); + + MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5); + MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9); + MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14); + MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20); + MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5); + MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9); + MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14); + MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20); + MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5); + MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9); + MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14); + MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20); + MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5); + MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9); + MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14); + MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20); + + MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4); + MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11); + MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16); + MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23); + 
MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4); + MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11); + MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16); + MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23); + MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4); + MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11); + MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16); + MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23); + MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4); + MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11); + MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16); + MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23); + + MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6); + MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10); + MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15); + MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21); + MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6); + MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10); + MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15); + MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21); + MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6); + MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10); + MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15); + MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21); + MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6); + MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10); + MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15); + MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21); + + buf[0] += a; + buf[1] += b; + buf[2] += c; + buf[3] += d; +} + +EXPORT_SYMBOL(MD5Init); +EXPORT_SYMBOL(MD5Update); +EXPORT_SYMBOL(MD5Final); +EXPORT_SYMBOL(MD5Transform); + -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From greearb@candelatech.com Tue Feb 25 13:42:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 13:43:22 -0800 (PST) Received: from grok.yi.org (IDENT:wY+cAnwRkB0DbmDAW6FBfptIsWeFaZr7@dhcp93-dsl-usw3.w-link.net [206.129.84.93]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id 
h1PLgl3v002439 for ; Tue, 25 Feb 2003 13:42:48 -0800 Received: from candelatech.com (IDENT:B5m2l09UyY2CtJnS+ETRwrE3gTatRYA+@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id h1PLgc101707; Tue, 25 Feb 2003 13:42:38 -0800 Message-ID: <3E5BE34E.4030500@candelatech.com> Date: Tue, 25 Feb 2003 13:42:38 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212 X-Accept-Language: en-us, en MIME-Version: 1.0 To: eepro list , "'netdev@oss.sgi.com'" Subject: locked up 4-port Intel NIC with RH 8.0 and eepro100 driver Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1806 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Seems transmit no longer works out of the second interface... Only the first two ports have a cable attached, each to a tg3 GigE NIC with a cross-over cable. It ran for about 3 days at 50Mbps bi-directional on two ports before locking up. The machine was at 100% CPU utilization this entire time. After stopping traffic for 1 minute, and then starting it again, it started working again. The error I was seeing indicated trying to write to the socket would block (EAGAIN). For what it's worth, I saw similar problems on this machine with an rtl8139too when connected to a 10bt hub, but in that case the rtldiag showed more problems, and lockups happened in 3-8 hours... If this line means nothing, then it's possibly a user-space problem. The Command register has an unprocessed command 0c00(?!). dmesg output: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin and others divert: allocating divert_blk for eth3 eth3: OEM i82557/i82558 10/100 Ethernet, 00:30:F7:03:C5:4A, IRQ 11. 
Receiver lock-up bug exists -- enabling work-around. Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0xd0a6c714). Receiver lock-up workaround activated. divert: allocating divert_blk for eth4 eth4: OEM i82557/i82558 10/100 Ethernet, 00:30:F7:03:C5:4B, IRQ 11. Receiver lock-up bug exists -- enabling work-around. Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0xd0a6c714). Receiver lock-up workaround activated. divert: allocating divert_blk for eth5 eth5: OEM i82557/i82558 10/100 Ethernet, 00:30:F7:03:C5:4C, IRQ 10. Receiver lock-up bug exists -- enabling work-around. Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0xd0a6c714). Receiver lock-up workaround activated. divert: allocating divert_blk for eth6 eth6: OEM i82557/i82558 10/100 Ethernet, 00:30:F7:03:C5:4D, IRQ 5. Receiver lock-up bug exists -- enabling work-around. Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0xd0a6c714). Receiver lock-up workaround activated. 
Here is the eepro-diag output: eepro100-diag.c:v2.02 7/19/2000 Donald Becker (becker@scyld.com) http://www.scyld.com/diag/index.html Index #1: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0x9000. i82557 chip registers at 0x9000: 0c000050 03b52000 00000000 00080002 1825c5e1 00000600 No interrupt sources are pending. The transmit unit state is 'Suspended'. The receive unit state is 'Ready'. This status is normal for an activated but idle interface. The Command register has an unprocessed command 0c00(?!). EEPROM contents, size 64x16: 00: 3000 03f7 4ac5 0100 0000 0301 0701 0701 0x08: 0000 0000 4840 0610 140b 0000 0000 0000 ... 0x38: 0000 0000 0000 0000 0000 0000 0000 c7a0 The EEPROM checksum is correct. Intel EtherExpress Pro 10/100 EEPROM contents: Station address 00:30:F7:03:C5:4A. Receiver lock-up bug exists. (The driver work-around *is* implemented.) Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555, PHY 1. MII PHY #1 transceiver registers: 1000 782d 02a8 0154 05e1 c5e1 0009 0000 0000 0000 0000 0000 0000 0000 0000 0000 0203 0000 0001 082e 0000 0001 0ef1 0001 0000 0000 0000 0000 0010 0000 0000 0000. Alternate MII PHY (#1) transceiver registers: 1000 782d 02a8 0154 05e1 c5e1 0009 0000 0000 0000 0000 0000 0000 0000 0000 0000 0a03 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. Baseline value of MII status register is 782d. Index #2: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0x9400. i82557 chip registers at 0x9400: 0c000050 03b04000 00000000 00080002 1825c5e1 00000600 No interrupt sources are pending. The transmit unit state is 'Suspended'. The receive unit state is 'Ready'. This status is normal for an activated but idle interface. The Command register has an unprocessed command 0c00(?!). EEPROM contents, size 64x16: 00: 3000 03f7 4bc5 0100 0000 0301 0701 0701 0x08: 0000 0000 4840 0610 140b 0000 0000 0000 ... 
0x38: 0000 0000 0000 0000 0000 0000 0000 c6a0 The EEPROM checksum is correct. Intel EtherExpress Pro 10/100 EEPROM contents: Station address 00:30:F7:03:C5:4B. Receiver lock-up bug exists. (The driver work-around *is* implemented.) Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555, PHY 1. MII PHY #1 transceiver registers: 1000 782d 02a8 0154 05e1 c5e1 0009 0000 0000 0000 0000 0000 0000 0000 0000 0000 0203 0000 0001 10c7 0000 0001 2388 0001 0000 0000 0000 0000 0010 0000 0000 0000. Alternate MII PHY (#1) transceiver registers: 1000 782d 02a8 0154 05e1 c5e1 0009 0000 0000 0000 0000 0000 0000 0000 0000 0000 0a03 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. Baseline value of MII status register is 782d. Index #3: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0x9800. i82557 chip registers at 0x9800: 0c000050 03aa9000 00000000 00080002 18250000 00000600 No interrupt sources are pending. The transmit unit state is 'Suspended'. The receive unit state is 'Ready'. This status is normal for an activated but idle interface. The Command register has an unprocessed command 0c00(?!). EEPROM contents, size 64x16: 00: 3000 03f7 4cc5 0100 0000 0301 0701 0701 0x08: 0000 0000 4840 0610 140b 0000 0000 0000 ... 0x38: 0000 0000 0000 0000 0000 0000 0000 c5a0 The EEPROM checksum is correct. Intel EtherExpress Pro 10/100 EEPROM contents: Station address 00:30:F7:03:C5:4C. Receiver lock-up bug exists. (The driver work-around *is* implemented.) Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555, PHY 1. MII PHY #1 transceiver registers: 1000 7809 02a8 0154 05e1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. 
Alternate MII PHY (#1) transceiver registers: 1000 7809 02a8 0154 05e1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. Baseline value of MII status register is 7809. Index #4: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0x9c00. i82557 chip registers at 0x9c00: 0c000050 03a84000 00000000 00080002 18250000 00000600 No interrupt sources are pending. The transmit unit state is 'Suspended'. The receive unit state is 'Ready'. This status is normal for an activated but idle interface. The Command register has an unprocessed command 0c00(?!). EEPROM contents, size 64x16: 00: 3000 03f7 4dc5 0100 0000 0301 0701 0701 0x08: 0000 0000 4840 0610 140b 0000 0000 0000 ... 0x38: 0000 0000 0000 0000 0000 0000 0000 c4a0 The EEPROM checksum is correct. Intel EtherExpress Pro 10/100 EEPROM contents: Station address 00:30:F7:03:C5:4D. Receiver lock-up bug exists. (The driver work-around *is* implemented.) Board assembly 000000-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. Secondary interface chip i82555, PHY 1. MII PHY #1 transceiver registers: 1000 7809 02a8 0154 05e1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. Alternate MII PHY (#1) transceiver registers: 1000 7809 02a8 0154 05e1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000. Baseline value of MII status register is 7809. 
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From pb@bieringer.de Tue Feb 25 22:43:05 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 25 Feb 2003 22:43:09 -0800 (PST) Received: from smtp2.aerasec.de (gromit.aerasec.de [195.226.187.57]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1Q6h33v013045 for ; Tue, 25 Feb 2003 22:43:04 -0800 Received: from localhost (localhost [127.0.0.1]) by smtp2.aerasec.de (Postfix) with SMTP id A58CD13870; Wed, 26 Feb 2003 07:43:01 +0100 (CET) X-AV-Checked: Wed Feb 26 07:43:01 2003 smtp2.aerasec.de Received: from pD9E4E465.dip.t-dialin.net (pD9E4E465.dip.t-dialin.net [217.228.228.101]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (Client did not present a certificate) by smtp2.aerasec.de (Postfix) with ESMTP id 610CC1386F; Wed, 26 Feb 2003 07:42:55 +0100 (CET) Date: Wed, 26 Feb 2003 07:42:51 +0100 From: Peter Bieringer To: Ben Greear , eepro list , "'netdev@oss.sgi.com'" Subject: Re: locked up 4-port Intel NIC with RH 8.0 and eepro100 driver Message-ID: <192080000.1046241771@gate.muc.bieringer.de> In-Reply-To: <3E5BE34E.4030500@candelatech.com> References: <3E5BE34E.4030500@candelatech.com> X-Mailer: Mulberry/3.0.1 (Linux/x86) X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 1807 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pb@bieringer.de Precedence: bulk X-list: netdev --On Tuesday, February 25, 2003 01:42:38 PM -0800 Ben Greear wrote: > Seems transmit no longer works out of the second interface... > Only the first two ports have a cable attached, > each to a tg3 GigE NIC with a cross-over cable. It ran for > about 3 days at 50Mbps bi-directional on two ports before locking > up. 
The machine was at 100% CPU utilization this entire time.  After
> stopping traffic for 1 minute, and then starting it again, it started
> working again.  The error I was seeing indicated trying to write to the
> socket would block (EAGAIN).  For what it's worth, I saw similar problems
> on this machine with an rtl8139too when connected to a 10bt hub, but in
> that case the rtldiag showed more problems, and lockups happened in 3-8
> hours...

Did it work again after reloading the module?

I have also seen such lockups on a Toshiba laptop with an on-board Intel
chip, but mostly after a few minutes.

I'm not sure, but because the wake-on-lan feature in RHL's eepro100 driver
is broken too (see here for more:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=84695 ), I think that
the delivered driver isn't as stable as it should be (perhaps they didn't
use the newest sources).

Have you tried the "e100" driver? (Unfortunately it needs a kernel
recompile, but with it wake-on-lan, for example, works well.)

        Peter
-- 
Dr.
Peter Bieringer http://www.bieringer.de/pb/ GPG/PGP Key 0x958F422D mailto: pb at bieringer dot de Deep Space 6 Co-Founder and Core Member http://www.deepspace6.net/ From jmorris@intercode.com.au Wed Feb 26 00:37:42 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 26 Feb 2003 00:37:50 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1Q8bd3v014463 for ; Wed, 26 Feb 2003 00:37:41 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id TAA13771; Wed, 26 Feb 2003 19:33:18 +1100 Date: Wed, 26 Feb 2003 19:33:18 +1100 (EST) From: James Morris To: Christoph Hellwig cc: "YOSHIFUJI Hideaki / 吉藤英明" , , , , , , Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 In-Reply-To: <20030225160634.A4525@infradead.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1808 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Tue, 25 Feb 2003, Christoph Hellwig wrote: > Also I really wonder whether we want to add just md5.c to 2.4 or > backport the cryptoapi core with md5 as the only algorithm so far.. Any backport of new cryptoapi is likely to be some way off (after 2.6 stabilizes), so the md5 module submitted for 2.4 is required for the time being.
- James -- James Morris From davem@redhat.com Wed Feb 26 01:06:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 26 Feb 2003 01:06:59 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1Q96o3v015183 for ; Wed, 26 Feb 2003 01:06:51 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA15686; Wed, 26 Feb 2003 00:47:26 -0800 Date: Wed, 26 Feb 2003 00:47:26 -0800 (PST) Message-Id: <20030226.004726.22558908.davem@redhat.com> To: jmorris@intercode.com.au Cc: hch@infradead.org, yoshfuji@linux-ipv6.org, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, usagi@linux-ipv6.org Subject: Re: [PATCH] IPv6: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 From: "David S. Miller" In-Reply-To: References: <20030225160634.A4525@infradead.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1809 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: James Morris Date: Wed, 26 Feb 2003 19:33:18 +1100 (EST) On Tue, 25 Feb 2003, Christoph Hellwig wrote: > Also I really wonder whether we want to add just md5.c to 2.4 or > backport the cryptoapi core with md5 as the only algorithm so far.. Any backport of new cryptoapi is likely to be some way off (after 2.6 stabilizes), so the md5 module submitted for 2.4 is required for the time being. This could be accelerated. 
From eric.louvet@regis-dgac.net Thu Feb 27 00:21:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 00:21:41 -0800 (PST) Received: from relais1.stna.aviation-civile.gouv.fr (relais1.stna.aviation-civile.gouv.fr [143.196.161.40]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1R8LW3v032381 for ; Thu, 27 Feb 2003 00:21:33 -0800 Received: by relais1.stna.aviation-civile.gouv.fr (MTA server DGAC - STNA, from userid 0) id 569506F243; Thu, 27 Feb 2003 09:21:25 +0100 (CET) Received: from cranberries.stna.dgac.fr (localhost.localdomain [127.0.0.1]) by relais1.stna.aviation-civile.gouv.fr (MTA server DGAC - STNA) with ESMTP id 0A4487712E for ; Thu, 27 Feb 2003 09:21:25 +0100 (CET) Received: from regis-dgac.net ([143.196.38.146]) by coors.stna.dgac.fr (Lotus Domino Release 5.0.10) with ESMTP id 2003022709212124:25550 ; Thu, 27 Feb 2003 09:21:21 +0100 Message-ID: <3E5DCA7B.5050507@regis-dgac.net> Date: Thu, 27 Feb 2003 09:21:15 +0100 From: LOUVET Eric ATOS 7SB K236 p5036 User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010726 Netscape6/6.1 X-Accept-Language: en-us MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Problem with TCP X-MIMETrack: Itemize by SMTP Server on NS-STNA-03/STNA/DGAC(Release 5.0.10 |March 22, 2002) at 27/02/2003 09:21:21, Serialize by Router on CS-SUD/DGAC(Release 5.07a |May 14, 2001) at 27/02/2003 09:18:21, Serialize complete at 27/02/2003 09:18:21 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii; format=flowed X-archive-position: 1810 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: eric.louvet@regis-dgac.net Precedence: bulk X-list: netdev

Hello to all of you!

I'm sorry to disturb you; this is not spam! Alan Cox told me that
netdev@oss.sgi.com is the place to tell my story.

In case you can help me understand something, I would like to describe
a problem I encounter with TCP under kernel 2.4.20.

I use the TCP stack from kernel code, almost successfully (in a STREAMS
driver, to be precise). The only problem I have is this:

I have an established connection. When this connection is local, I kill
the client, or stop it, so it closes its socket. On the server side I am
alerted of that by the state-change callback of my socket; I then detach
my callback functions from it (state_change, data_ready, etc.) and call
sock_release.

I observe this:

sock_release calls tcp_close; the socket is in the TCP_LAST_ACK state!!!
tcp_close calls tcp_send_fin; after that call, something has been
inserted in the backlog queue of the TCP socket.
Then, when tcp_close calls release_sock, __release_sock runs because the
backlog is not empty, and I encounter a crash caused by the poisoning of
a freed slab block.
If poisoning is not active, the kernel crashes later at a random
place... ;-)

Remember __release_sock:

void __release_sock(struct sock *sk)
{
        struct sk_buff *skb = sk->backlog.head;

        do {
                sk->backlog.head = sk->backlog.tail = NULL; /* <- crashing code! */
                bh_unlock_sock(sk);

                do {
                        struct sk_buff *next = skb->next;

                        skb->next = NULL;
                        sk->backlog_rcv(sk, skb);
                        skb = next;
                } while (skb != NULL);

                bh_lock_sock(sk);
        } while((skb = sk->backlog.head) != NULL);
}

The crash is due to the value of skb, which is 0x5a5a5a5a (poisoned);
this seems to indicate that sk has been freed. How can this be possible?
What did I forget to consider?

Another point: the crash only occurs when the connection is local. When
the client is not local, there is no crash.

I hope some of you understand my story; sorry for my poor English.
I hope you can give me ideas about my problem that help me find what
I'm doing wrong.

Best regards to you all.

Eric.
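The suspicion raised above (sk being freed while its backlog is still being drained) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not a diagnosis of the report: it assumes that the backlog handler drops the last reference to the socket, and it compares draining the backlog with and without an extra reference held by the caller. All names here (sock_model, pkt, close_path, and so on) are hypothetical stand-ins for illustration; the kernel's own reference helpers are sock_hold() and sock_put().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
struct pkt {
        struct pkt *next;
};

struct sock_model {
        int refcnt;        /* stands in for the socket reference count */
        int freed;         /* set instead of really freeing, so it can be inspected */
        struct pkt *head;  /* stands in for sk->backlog.head */
        struct pkt *tail;  /* stands in for sk->backlog.tail */
        int processed;     /* packets handed to the backlog handler */
};

static void sock_model_hold(struct sock_model *sk)
{
        sk->refcnt++;
}

static void sock_model_put(struct sock_model *sk)
{
        if (--sk->refcnt == 0)
                sk->freed = 1;  /* the kernel would free (and poison) sk here */
}

static void backlog_add(struct sock_model *sk, struct pkt *p)
{
        p->next = NULL;
        if (sk->tail)
                sk->tail->next = p;
        else
                sk->head = p;
        sk->tail = p;
}

/* Assumed handler behaviour: processing the packet drops one reference. */
static void backlog_rcv_model(struct sock_model *sk, struct pkt *p)
{
        (void)p;
        sk->processed++;
        sock_model_put(sk);
}

/*
 * Same loop shape as __release_sock. Returns 1 if sk was never touched
 * after being "freed", 0 otherwise.
 */
static int drain_backlog(struct sock_model *sk)
{
        struct pkt *p = sk->head;
        int ok = 1;

        assert(p != NULL);  /* only called with a non-empty backlog */
        do {
                if (sk->freed)
                        ok = 0;  /* would write backlog pointers of a freed sk */
                sk->head = sk->tail = NULL;
                do {
                        struct pkt *next = p->next;

                        p->next = NULL;
                        backlog_rcv_model(sk, p);
                        p = next;
                } while (p != NULL);
                if (sk->freed)
                        ok = 0;  /* would read sk->head of a freed sk */
        } while ((p = sk->head) != NULL);
        return ok;
}

/* Runs the close path; with_hold != 0 pins sk across the drain. */
int close_path(int with_hold)
{
        struct sock_model sk = { .refcnt = 1 };
        struct pkt ack;
        int ok;

        backlog_add(&sk, &ack);
        if (with_hold)
                sock_model_hold(&sk);
        ok = drain_backlog(&sk);
        if (with_hold)
                sock_model_put(&sk);
        return ok;
}
```

In this model close_path(0) reports a touch-after-free and close_path(1) does not, which is the pattern worth checking in the driver: whether anything still holds a reference to sk for the whole duration of sock_release.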
From haveblue@us.ibm.com Thu Feb 27 01:06:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 01:06:43 -0800 (PST) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1R96Y3v002925 for ; Thu, 27 Feb 2003 01:06:35 -0800 Received: from westrelay05.boulder.ibm.com (westrelay05.boulder.ibm.com [9.17.193.33]) by e31.co.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1R96X0Z029172 for ; Thu, 27 Feb 2003 04:06:33 -0500 Received: from nighthawk.sr71.net (sig-9-65-16-211.mts.ibm.com [9.65.16.211]) by westrelay05.boulder.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1R95FJZ175088 for ; Thu, 27 Feb 2003 02:05:16 -0700 Received: from us.ibm.com (dave@nighthawk [127.0.0.1]) by nighthawk.sr71.net (8.12.3/8.12.3/Debian -4) with ESMTP id h1R92WiV028156 for ; Thu, 27 Feb 2003 01:02:33 -0800 Message-ID: <3E5DD427.2070801@us.ibm.com> Date: Thu, 27 Feb 2003 01:02:31 -0800 From: Dave Hansen User-Agent: Mozilla/5.0 (compatible; MSIE5.5; Windows 98; X-Accept-Language: en MIME-Version: 1.0 To: netdev Subject: TCP stalling problems dissipated in 2.5.63 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1811 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: haveblue@us.ibm.com Precedence: bulk X-list: netdev I've been complaining about these problems for a bit, so I thought I'd share some good news. http://marc.theaimsgroup.com/?l=linux-netdev&m=104169568212878&w=2 http://marc.theaimsgroup.com/?t=104343403700001&r=1&w=2 I decided to spend some time tracking down the TCP stalls that I was still seeing with the e1000 on 2.5.59 (probably not the e1000's fault, just the only cards I see it on). I pulled out my trusty old test that I've been using for a few weeks to replicate the problem. Anyway, I can't get it to occur any more. 
The real test will be to see if Specweb still triggers it, but my Specweb
machine is a smoldering pile of slag right now.

I have a 4-way PIII-Xeon and an 8-way PIII-Xeon plugged into the same
copper gigabit switch. The 4-way runs 4 copies of http-bench.sh, which
I've included below. It basically asks the server for a list of files
(from a cgi script), then fetches them sequentially. The _sustained_
data rate is 750 Mb/sec with peaks of ~790 Mb/sec. The server runs
Apache 2.0.43.

During this time, the client is about 80% CPU saturated (shell scripts,
what do you expect?), but the server is a much different story. vmstat
shows ~1% user time, and only 5-6% kernel. The rest is idle! I have
readprofile from a short slice, because oprofile doesn't work on these
machines.

   126 tcp_write_xmit                 0.1703
   128 kfree                          1.2800
   131 __kfree_skb                    0.6550
   132 ip_rcv                         0.1303
   142 qdisc_restart                  0.3550
   150 tcp_transmit_skb               0.1053
   156 schedule                       0.1566
   161 e1000_clean_rx_irq             0.1720
   169 kmalloc                        1.2426
   189 skb_clone                      0.4500
   215 alloc_skb                      0.4594
   233 do_generic_mapping_read        0.2709
   240 tcp_v4_rcv                     0.1268
   294 e1000_intr                     3.1957
   311 e1000_clean_tx_irq             0.7266
   353 e1000_xmit_frame               0.2229
   370 do_tcp_sendpages               0.1370
   408 skb_release_data               2.6154
127279 poll_idle                   1515.2262
134275 total                          0.0829

httpd-bench.sh:
#!/bin/sh
SERVER=10.1.1.96
rm -f file_list 2> /dev/null
wget -O file_list http://$SERVER/ls.pl
cat file_list | \
awk '{print "http://'$SERVER'/" $0}' | \
xargs --max-procs=1 --max-args=20 wget -O /dev/null --progress=dot 2>&1 | grep \'" saved " \
| awk -f speedavg.awk

ls.pl from the server (yeah, yeah, I know it's not perl, but I didn't
feel like adding another handler)
#!/bin/sh
echo 'Content-type: text/html'
echo
find file_set -type f -size +100k

-- 
Dave Hansen
haveblue@us.ibm.com

From ralf@linux-mips.org Thu Feb 27 11:58:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 11:58:50 -0800 (PST) Received: from dea.linux-mips.net (p508B7BED.dip.t-dialin.net [80.139.123.237]) by oss.sgi.com
(8.12.5/8.12.5) with SMTP id h1RJwkeA012788 for ; Thu, 27 Feb 2003 11:58:47 -0800 Received: (from ralf@localhost) by dea.linux-mips.net (8.11.6/8.11.6) id h1RJwh617546 for netdev@oss.sgi.com; Thu, 27 Feb 2003 20:58:43 +0100 Resent-Message-Id: <200302271958.h1RJwh617546@dea.linux-mips.net> Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 11:16:52 -0800 (PST) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RJGleA011690 for ; Thu, 27 Feb 2003 11:16:47 -0800 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h1RJGjw31760; Thu, 27 Feb 2003 11:16:45 -0800 Date: Thu, 27 Feb 2003 11:12:12 -0800 From: "Randy.Dunlap" To: linux-net@vger.kernel.org Cc: netdev@oss.sgi.com Subject: linux_mib docs? Message-Id: <20030227111212.40388562.rddunlap@osdl.org> Organization: OSDL X-Mailer: Sylpheed version 0.8.6 (GTK+ 1.2.10; i586-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1813 X-ecartis-version: Ecartis v1.0.0 X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev Resent-From: ralf@linux-mips.org Resent-Date: Thu, 27 Feb 2003 20:58:43 +0100 Resent-To: netdev@oss.sgi.com X-archive-position: 1814 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev Hi, Is there a MIB definition for struct linux_mib in include/net/snmp.h ? 
Thanks, -- ~Randy From cfriesen@nortelnetworks.com Thu Feb 27 12:09:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 12:09:52 -0800 (PST) Received: from zcars04e.nortelnetworks.com (zcars04e.nortelnetworks.com [47.129.242.56]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RK9jeA013300 for ; Thu, 27 Feb 2003 12:09:46 -0800 Received: from zcard307.ca.nortel.com (zcard307.ca.nortel.com [47.129.242.67]) by zcars04e.nortelnetworks.com (Switch-2.2.0/Switch-2.2.0) with ESMTP id h1RK9bx19082; Thu, 27 Feb 2003 15:09:38 -0500 (EST) Received: from zcard0k6.ca.nortel.com ([47.129.242.158]) by zcard307.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FGRX5FK6; Thu, 27 Feb 2003 15:09:38 -0500 Received: from pcard0ks.ca.nortel.com ([47.129.117.131]) by zcard0k6.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FSL7YSPC; Thu, 27 Feb 2003 15:09:38 -0500 Received: from nortelnetworks.com (localhost.localdomain [127.0.0.1]) by pcard0ks.ca.nortel.com (Postfix) with ESMTP id 55C1E2E132; Thu, 27 Feb 2003 15:09:37 -0500 (EST) Message-ID: <3E5E7081.6020704@nortelnetworks.com> Date: Thu, 27 Feb 2003 15:09:37 -0500 X-Sybari-Space: 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: anyone ever done multicast AF_UNIX sockets? Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1815 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortelnetworks.com Precedence: bulk X-list: netdev It is fairly common to want to distribute information between a single sender and multiple receivers on a single box. 
Multicast IP sockets are one possibility, but then you have additional overhead in the IP stack. Unix sockets are more efficient and give notification if the listener is not present, but the problem then becomes that you must do one syscall for each listener. So, here's my main point--has anyone ever considered the concept of multicast AF_UNIX sockets? The main features would be: --ability to associate/disassociate a socket with a multicast address --ability to associate/disassociate with all multicast addresses (possibly through some kind of raw socket thing, or maybe a simple wildcard multicast address) --on process death all sockets owned by that process are disassociated from any multicast addresses that they were associated with --on sending a packet to a multicast address and there are no sockets associated with it, return -1 with errno=ECONNREFUSED The association/disassociation could be done using the setsockopt() calls the same as with udp sockets, everything else would be the same from a userspace perspective. Any thoughts? How hard would this be to put in? Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com From ak@suse.de Thu Feb 27 12:17:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 12:17:04 -0800 (PST) Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RKGweA013735 for ; Thu, 27 Feb 2003 12:16:59 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 4C39614ABD; Thu, 27 Feb 2003 21:16:53 +0100 (MET) Date: Thu, 27 Feb 2003 21:16:51 +0100 From: Andi Kleen To: "Randy.Dunlap" Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: linux_mib docs? 
Message-ID: <20030227201651.GA24698@wotan.suse.de> References: <20030227111212.40388562.rddunlap@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030227111212.40388562.rddunlap@osdl.org> X-archive-position: 1816 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev On Thu, Feb 27, 2003 at 11:12:12AM -0800, Randy.Dunlap wrote: > Hi, > > Is there a MIB definition for struct linux_mib in > include/net/snmp.h ? Not that I'm aware of, but netstat -s has (very) brief descriptions. -Andi From latten@austin.ibm.com Thu Feb 27 13:29:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 13:29:38 -0800 (PST) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RLTXeA024047 for ; Thu, 27 Feb 2003 13:29:34 -0800 Received: from westrelay04.boulder.ibm.com (westrelay04.boulder.ibm.com [9.17.193.32]) by e35.co.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1RLTROP058248; Thu, 27 Feb 2003 16:29:27 -0500 Received: from austin.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay04.boulder.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1RLTPS6244978; Thu, 27 Feb 2003 14:29:26 -0700 Received: from faith.austin.ibm.com (faith.austin.ibm.com [9.41.94.16]) by austin.ibm.com (8.12.6/8.12.6) with ESMTP id h1RLTPVO015456; Thu, 27 Feb 2003 15:29:25 -0600 Received: (from jml@localhost) by faith.austin.ibm.com (AIX5.1/8.11.0/8.11.0-client1.01) id h1RLTJW28434; Thu, 27 Feb 2003 15:29:19 -0600 Date: Thu, 27 Feb 2003 15:29:19 -0600 From: latten@austin.ibm.com Message-Id: <200302272129.h1RLTJW28434@faith.austin.ibm.com> To: davem@redhat.com, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: PATCH: IPSec not using padding when Null Encryption X-archive-position: 1817 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: latten@austin.ibm.com Precedence: bulk X-list: netdev Hi, When using the Null Encryption algorithm, the ESP packet is not on a 4-byte boundary. That is, the ciphertext, pad-length and next-header fields are not right aligned on a 4-byte boundary and no padding is used to ensure it. RFC 2406, section 2.4 states irrespective of encryption algorithm requirements, padding may be required to ensure that resulting ciphertext terminates on a 4-byte boundary. Specifically, the Pad Length and Next Header fields must be right aligned within a 4-byte word to ensure that the Authentication Data field (if present) is aligned on a 4-byte boundary. Ok, anyway, this fix just pretty much makes sure that when Null Encryption or any algorithm with a blocksize less than 4 is used, that the ciphertext, any padding, and next-header and pad-length fields terminate on a 4-byte boundary. I have tested it. Please let me know if all is well. 
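The rounding described above can be sketched as a small standalone C fragment. This is only an illustration of the arithmetic, not the kernel code: the function names esp_pad_unit and esp_padded_len are hypothetical, and the "+ 2" stands for the Pad Length and Next Header bytes of the ESP trailer.

```c
#include <assert.h>

/* Hypothetical helper: round the cipher block size up to a multiple of 4,
 * mirroring the "(blocksize + 3) & ~3" expression in the patch.
 * A block size of 1 (null encryption) becomes 4. */
unsigned int esp_pad_unit(unsigned int cipher_blocksize)
{
    return (cipher_blocksize + 3) & ~3u;
}

/* Hypothetical helper: length of payload plus padding plus the two trailer
 * bytes (Pad Length, Next Header), rounded up to a multiple of pad_unit.
 * The mask trick only works when pad_unit is a power of two. */
unsigned int esp_padded_len(unsigned int payload_len, unsigned int pad_unit)
{
    assert(pad_unit != 0 && (pad_unit & (pad_unit - 1)) == 0);
    return (payload_len + 2 + pad_unit - 1) & ~(pad_unit - 1);
}
```

With a pad unit of 1, a 5-byte payload yields a length of 7, which does not end on a 4-byte boundary. With the rounded-up unit of 4, the same payload yields 8, so one byte of padding is implied.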
Thanks, Joy --- esp.c.orig 2003-02-20 16:07:59.000000000 -0600 +++ esp.c 2003-02-27 10:30:25.000000000 -0600 @@ -360,7 +360,7 @@ esp = x->data; alen = esp->auth.icv_trunc_len; tfm = esp->conf.tfm; - blksize = crypto_tfm_alg_blocksize(tfm); + blksize = (crypto_tfm_alg_blocksize(tfm) + 3) & ~3; clen = (clen + 2 + blksize-1)&~(blksize-1); if (esp->conf.padlen) clen = (clen + esp->conf.padlen-1)&~(esp->conf.padlen-1); From yoshfuji@linux-ipv6.org Thu Feb 27 14:00:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 14:00:08 -0800 (PST) Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.139.94]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RM01eA024927 for ; Thu, 27 Feb 2003 14:00:02 -0800 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.12.3+3.5Wbeta/8.12.3/Debian -4) with ESMTP id h1RLxin1004344; Fri, 28 Feb 2003 06:59:44 +0900 Date: Fri, 28 Feb 2003 06:59:44 +0900 (JST) Message-Id: <20030228.065944.08980219.yoshfuji@linux-ipv6.org> To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com CC: davem@redhat.com, kuznet@ms2.inr.ac.ru, usagi@linux-ipv6.org Subject: [PATCH] Use C99 initializers in net/ipv6 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1818 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Hi, This convers net/ipv6/{addrconf,route,sit}.c 
files to use C99 initializers. We don't touch net/ipv6/exthdrs.c for now because it will conflicts with our patch for IPsec. Thanks in advance. ------------------------------------------------------------------- Patch-Name: Use C99 initializers in net/ipv6 Patch-Id: FIX_2_5_63_C99_CLEANUP-20030228 Patch-Author: YOSHIFUJI Hideaki / USAGI Project Credit: YOSHIFUJI Hideaki / USAGI Project ------------------------------------------------------------------- Index: net/ipv6/addrconf.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/addrconf.c,v retrieving revision 1.1.1.6 retrieving revision 1.1.1.6.4.1 diff -u -r1.1.1.6 -r1.1.1.6.4.1 --- net/ipv6/addrconf.c 25 Feb 2003 05:33:26 -0000 1.1.1.6 +++ net/ipv6/addrconf.c 27 Feb 2003 21:31:11 -0000 1.1.1.6.4.1 @@ -2288,75 +2288,163 @@ ctl_table addrconf_proto_dir[2]; ctl_table addrconf_root_dir[2]; } addrconf_sysctl = { - NULL, - {{NET_IPV6_FORWARDING, "forwarding", - &ipv6_devconf.forwarding, sizeof(int), 0644, NULL, - &addrconf_sysctl_forward}, - - {NET_IPV6_HOP_LIMIT, "hop_limit", - &ipv6_devconf.hop_limit, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_MTU, "mtu", - &ipv6_devconf.mtu6, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_ACCEPT_RA, "accept_ra", - &ipv6_devconf.accept_ra, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_ACCEPT_REDIRECTS, "accept_redirects", - &ipv6_devconf.accept_redirects, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_AUTOCONF, "autoconf", - &ipv6_devconf.autoconf, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_DAD_TRANSMITS, "dad_transmits", - &ipv6_devconf.dad_transmits, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_RTR_SOLICITS, "router_solicitations", - &ipv6_devconf.rtr_solicits, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_RTR_SOLICIT_INTERVAL, "router_solicitation_interval", - &ipv6_devconf.rtr_solicit_interval, sizeof(int), 0644, 
NULL, - &proc_dointvec_jiffies}, - - {NET_IPV6_RTR_SOLICIT_DELAY, "router_solicitation_delay", - &ipv6_devconf.rtr_solicit_delay, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies}, - + .sysctl_header = NULL, + .addrconf_vars = { + { + .ctl_name = NET_IPV6_FORWARDING, + .procname = "forwarding", + .data = &ipv6_devconf.forwarding, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &addrconf_sysctl_forward, + }, + { + .ctl_name = NET_IPV6_HOP_LIMIT, + .procname = "hop_limit", + .data = &ipv6_devconf.hop_limit, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_IPV6_MTU, + .procname = "mtu", + .data = &ipv6_devconf.mtu6, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ACCEPT_RA, + .procname = "accept_ra", + .data = &ipv6_devconf.accept_ra, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ACCEPT_REDIRECTS, + .procname = "accept_redirects", + .data = &ipv6_devconf.accept_redirects, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_AUTOCONF, + .procname = "autoconf", + .data = &ipv6_devconf.autoconf, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_DAD_TRANSMITS, + .procname = "dad_transmits", + .data = &ipv6_devconf.dad_transmits, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_RTR_SOLICITS, + .procname = "router_solicitations", + .data = &ipv6_devconf.rtr_solicits, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_RTR_SOLICIT_INTERVAL, + .procname = "router_solicitation_interval", + .data = &ipv6_devconf.rtr_solicit_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + }, + { + .ctl_name = NET_IPV6_RTR_SOLICIT_DELAY, + .procname 
= "router_solicitation_delay", + .data = &ipv6_devconf.rtr_solicit_delay, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + }, #ifdef CONFIG_IPV6_PRIVACY - {NET_IPV6_USE_TEMPADDR, "use_tempaddr", - &ipv6_devconf.use_tempaddr, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_TEMP_VALID_LFT, "temp_valid_lft", - &ipv6_devconf.temp_valid_lft, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_TEMP_PREFERED_LFT, "temp_prefered_lft", - &ipv6_devconf.temp_prefered_lft, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_REGEN_MAX_RETRY, "regen_max_retry", - &ipv6_devconf.regen_max_retry, sizeof(int), 0644, NULL, - &proc_dointvec}, - - {NET_IPV6_MAX_DESYNC_FACTOR, "max_desync_factor", - &ipv6_devconf.max_desync_factor, sizeof(int), 0644, NULL, - &proc_dointvec}, + { + .ctl_name = NET_IPV6_USE_TEMPADDR, + .procname = "use_tempaddr", + .data = &ipv6_devconf.use_tempaddr, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_TEMP_VALID_LFT, + .procname = "temp_valid_lft", + .data = &ipv6_devconf.temp_valid_lft, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_TEMP_PREFERED_LFT, + .procname = "temp_prefered_lft", + .data = &ipv6_devconf.temp_prefered_lft, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_REGEN_MAX_RETRY, + .procname = "regen_max_retry", + .data = &ipv6_devconf.regen_max_retry, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_MAX_DESYNC_FACTOR, + .procname = "max_desync_factor", + .data = &ipv6_devconf.max_desync_factor, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, #endif - - {0}}, - - {{NET_PROTO_CONF_ALL, "all", NULL, 0, 0555, addrconf_sysctl.addrconf_vars},{0}}, - {{NET_IPV6_CONF, "conf", NULL, 0, 0555, addrconf_sysctl.addrconf_dev},{0}}, - 
{{NET_IPV6, "ipv6", NULL, 0, 0555, addrconf_sysctl.addrconf_conf_dir},{0}}, - {{CTL_NET, "net", NULL, 0, 0555, addrconf_sysctl.addrconf_proto_dir},{0}} + }, + .addrconf_dev = { + { + .ctl_name = NET_PROTO_CONF_ALL, + .procname = "all", + .mode = 0555, + .child = addrconf_sysctl.addrconf_vars, + }, + }, + .addrconf_conf_dir = { + { + .ctl_name = NET_IPV6_CONF, + .procname = "conf", + .mode = 0555, + .child = addrconf_sysctl.addrconf_dev, + }, + }, + .addrconf_proto_dir = { + { + .ctl_name = NET_IPV6, + .procname = "ipv6", + .mode = 0555, + .child = addrconf_sysctl.addrconf_conf_dir, + }, + }, + .addrconf_root_dir = { + { + .ctl_name = CTL_NET, + .procname = "net", + .mode = 0555, + .child = addrconf_sysctl.addrconf_proto_dir, + }, + }, }; static void addrconf_sysctl_register(struct inet6_dev *idev, struct ipv6_devconf *p) Index: net/ipv6/route.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/route.c,v retrieving revision 1.1.1.7 retrieving revision 1.1.1.7.4.1 diff -u -r1.1.1.7 -r1.1.1.7.4.1 --- net/ipv6/route.c 25 Feb 2003 05:33:26 -0000 1.1.1.7 +++ net/ipv6/route.c 27 Feb 2003 21:31:11 -0000 1.1.1.7.4.1 @@ -1777,34 +1777,84 @@ } ctl_table ipv6_route_table[] = { - {NET_IPV6_ROUTE_FLUSH, "flush", - &flush_delay, sizeof(int), 0644, NULL, - &ipv6_sysctl_rtcache_flush}, - {NET_IPV6_ROUTE_GC_THRESH, "gc_thresh", - &ip6_dst_ops.gc_thresh, sizeof(int), 0644, NULL, - &proc_dointvec}, - {NET_IPV6_ROUTE_MAX_SIZE, "max_size", - &ip6_rt_max_size, sizeof(int), 0644, NULL, - &proc_dointvec}, - {NET_IPV6_ROUTE_GC_MIN_INTERVAL, "gc_min_interval", - &ip6_rt_gc_min_interval, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, &sysctl_jiffies}, - {NET_IPV6_ROUTE_GC_TIMEOUT, "gc_timeout", - &ip6_rt_gc_timeout, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, &sysctl_jiffies}, - {NET_IPV6_ROUTE_GC_INTERVAL, "gc_interval", - &ip6_rt_gc_interval, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, 
&sysctl_jiffies}, - {NET_IPV6_ROUTE_GC_ELASTICITY, "gc_elasticity", - &ip6_rt_gc_elasticity, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, &sysctl_jiffies}, - {NET_IPV6_ROUTE_MTU_EXPIRES, "mtu_expires", - &ip6_rt_mtu_expires, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, &sysctl_jiffies}, - {NET_IPV6_ROUTE_MIN_ADVMSS, "min_adv_mss", - &ip6_rt_min_advmss, sizeof(int), 0644, NULL, - &proc_dointvec_jiffies, &sysctl_jiffies}, - {0} + { + .ctl_name = NET_IPV6_ROUTE_FLUSH, + .procname = "flush", + .data = &flush_delay, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &ipv6_sysctl_rtcache_flush + }, + { + .ctl_name = NET_IPV6_ROUTE_GC_THRESH, + .procname = "gc_thresh", + .data = &ip6_dst_ops.gc_thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ROUTE_MAX_SIZE, + .procname = "max_size", + .data = &ip6_rt_max_size, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ROUTE_GC_MIN_INTERVAL, + .procname = "gc_min_interval", + .data = &ip6_rt_gc_min_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, + { + .ctl_name = NET_IPV6_ROUTE_GC_TIMEOUT, + .procname = "gc_timeout", + .data = &ip6_rt_gc_timeout, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, + { + .ctl_name = NET_IPV6_ROUTE_GC_INTERVAL, + .procname = "gc_interval", + .data = &ip6_rt_gc_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, + { + .ctl_name = NET_IPV6_ROUTE_GC_ELASTICITY, + .procname = "gc_elasticity", + .data = &ip6_rt_gc_elasticity, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, + { + .ctl_name = NET_IPV6_ROUTE_MTU_EXPIRES, + .procname = "mtu_expires", + .data = 
&ip6_rt_mtu_expires, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, + { + .ctl_name = NET_IPV6_ROUTE_MIN_ADVMSS, + .procname = "min_adv_mss", + .data = &ip6_rt_min_advmss, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, }; #endif Index: net/ipv6/sit.c =================================================================== RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/sit.c,v retrieving revision 1.1.1.6 retrieving revision 1.1.1.6.4.1 diff -u -r1.1.1.6 -r1.1.1.6.4.1 --- net/ipv6/sit.c 25 Feb 2003 05:33:26 -0000 1.1.1.6 +++ net/ipv6/sit.c 27 Feb 2003 21:31:11 -0000 1.1.1.6.4.1 @@ -68,7 +68,8 @@ }; static struct ip_tunnel ipip6_fb_tunnel = { - NULL, &ipip6_fb_tunnel_dev, {0, }, 0, 0, 0, 0, 0, 0, 0, {"sit0", } + .dev = &ipip6_fb_tunnel_dev, + .parms = { .name = "sit0" } }; static struct ip_tunnel *tunnels_r_l[HASH_SIZE]; -- Hideaki YOSHIFUJI @ USAGI Project GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From toml@us.ibm.com Thu Feb 27 14:00:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 14:00:46 -0800 (PST) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1RM0heA025256 for ; Thu, 27 Feb 2003 14:00:44 -0800 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e3.ny.us.ibm.com (8.12.7/8.12.2) with ESMTP id h1RM01J8049876; Thu, 27 Feb 2003 17:00:02 -0500 Received: from d01ml072.pok.ibm.com (d01ml072.pok.ibm.com [9.117.250.211]) by northrelay04.pok.ibm.com (8.12.3/NCO/VER6.5) with ESMTP id h1RLxqNG172424; Thu, 27 Feb 2003 16:59:54 -0500 Subject: IPSec: setkey -DP freezes machine To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: From: "Tom Lendacky" Date: Thu, 27 Feb 2003 15:59:13 -0600 X-MIMETrack: Serialize by Router on 
D01ML072/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 02/27/2003 04:59:54 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii X-archive-position: 1819 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: toml@us.ibm.com Precedence: bulk X-list: netdev I found the reason for the hang problem when issuing the "setkey -DP" command while racoon is running. The racoon program sets a socket option on the socket(s) it listens on. The socket options are effectively "in bypass" and "out bypass" for the IP_IPSEC_POLICY option name. The af_key.c/pfkey_compile_policy function is ultimately invoked to create an xfrm_policy structure. The xfrm_policy structure's family value is not set (since this information is not available to pfkey_compile_policy). The xfrm_policy structure is then added to the xfrm_policy_list[] array by calling xfrm_policy.c/xfrm_sk_policy_insert. When the "setkey -DP" command is issued, the list of policies is walked and translated from the xfrm_policy structure to sadb_ messages by af_key.c/pfkey_xfrm_policy2msg. A change was added in 2.5.61 so that if the xfrm_policy family is not AF_INET or AF_INET6 then BUG() is executed. Since it is zero, BUG() is executed. This can be fixed in xfrm_state.c/xfrm_user_policy by assigning the socket family (the sock structure is an argument provided to xfrm_user_policy) to the xfrm_policy family before calling xfrm_sk_policy_insert. But, in the case of IP_XFRM_POLICY the xfrm_user.c, xfrm_compile_policy function sets the xfrm_policy family. And in the future, other "compile_policy" functions may be added. So for the fix, would it be preferable to have the xfrm_policy family always be assigned the socket family value or should it retain the current setting and only be set to the socket family value if the current value is 0 (AF_UNSPEC)? 
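The second option in the question (keep a family that a compile_policy function already set, and fall back to the socket family only when it is still AF_UNSPEC) could look roughly like the sketch below. The structs demo_sock and demo_policy and the function demo_policy_default_family are hypothetical stand-ins for illustration; they model only the one field under discussion and are not the kernel's struct sock or struct xfrm_policy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical stand-ins: only the address family field is modelled. */
struct demo_sock   { unsigned short family; };
struct demo_policy { unsigned short family; };

/* Leave an already-assigned policy family alone; otherwise inherit the
 * family of the socket the policy is being attached to. */
void demo_policy_default_family(struct demo_policy *pol,
                                const struct demo_sock *sk)
{
    assert(pol != NULL && sk != NULL);
    if (pol->family == AF_UNSPEC)
        pol->family = sk->family;
}
```

Under this variant a policy compiled with an explicit family is never overwritten, while a policy whose family was left at zero no longer reaches the dump path with an unset family.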
Tom From greg.daley@eng.monash.edu.au Thu Feb 27 16:31:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 16:31:41 -0800 (PST) Received: from ALPHA6.ITS.MONASH.EDU.AU (alpha6.its.monash.edu.au [130.194.1.25]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1S0VaeA023389 for ; Thu, 27 Feb 2003 16:31:37 -0800 Received: from thwack.its.monash.edu.au ([130.194.1.72]) by vaxc.its.monash.edu.au (PMDF V6.1 #39306) with ESMTP id <01KSYSTM5T8695OGAI@vaxc.its.monash.edu.au> for netdev@oss.sgi.com; Fri, 28 Feb 2003 09:22:11 +1100 Received: from thwack.its.monash.edu.au (localhost [127.0.0.1]) by localhost (Postfix) with ESMTP id 036FE12C00F; Fri, 28 Feb 2003 09:22:10 +1100 (EST) Received: from eng.monash.edu.au (knuth.eng.monash.edu.au [130.194.252.110]) by thwack.its.monash.edu.au (Postfix) with ESMTP id 828A612C018; Fri, 28 Feb 2003 09:21:38 +1100 (EST) Date: Fri, 28 Feb 2003 09:21:38 +1100 From: Greg Daley Subject: Re: anyone ever done multicast AF_UNIX sockets? To: Chris Friesen Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Reply-to: greg.daley@eng.monash.edu.au Message-id: <3E5E8F72.2080206@eng.monash.edu.au> Organization: Monash University MIME-version: 1.0 Content-type: text/plain; charset=us-ascii; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: en, en-us User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529 References: <3E5E7081.6020704@nortelnetworks.com> X-archive-position: 1820 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greg.daley@eng.monash.edu.au Precedence: bulk X-list: netdev Hi Chris, Please check out the uml_switch written by Jeff Dike for User Mode Linux. It is a user-space program which emulates an ethernet switch (or hub). It emulates link-layer multicast on UNIX domain sockets.
Greg Daley Chris Friesen wrote: > > It is fairly common to want to distribute information between a single > sender and multiple receivers on a single box. > > Multicast IP sockets are one possibility, but then you have additional > overhead in the IP stack. > > Unix sockets are more efficient and give notification if the listener is > not present, but the problem then becomes that you must do one syscall > for each listener. > > So, here's my main point--has anyone ever considered the concept of > multicast AF_UNIX sockets? > > The main features would be: > --ability to associate/disassociate a socket with a multicast address > --ability to associate/disassociate with all multicast addresses > (possibly through some kind of raw socket thing, or maybe a simple > wildcard multicast address) > --on process death all sockets owned by that process are disassociated > from any multicast addresses that they were associated with > --on sending a packet to a multicast address and there are no sockets > associated with it, return -1 with errno=ECONNREFUSED > > The association/disassociation could be done using the setsockopt() > calls the same as with udp sockets, everything else would be the same > from a userspace perspective. > > Any thoughts? How hard would this be to put in? 
> > Chris > > From jmorris@intercode.com.au Thu Feb 27 17:01:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 27 Feb 2003 17:01:42 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1S11beA023951 for ; Thu, 27 Feb 2003 17:01:39 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id MAA24645; Fri, 28 Feb 2003 12:01:10 +1100 Date: Fri, 28 Feb 2003 12:01:10 +1100 (EST) From: James Morris To: latten@austin.ibm.com cc: davem@redhat.com, , , Subject: Re: PATCH: IPSec not using padding when Null Encryption In-Reply-To: <200302272129.h1RLTJW28434@faith.austin.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1821 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev On Thu, 27 Feb 2003 latten@austin.ibm.com wrote: > I have tested it. Please let me know if all is well. Looks fine to me. (Perhaps change the name of the blksize variable to padto or similar, in case someone later thinks it's the real block size). 
> --- esp.c.orig 2003-02-20 16:07:59.000000000 -0600 > +++ esp.c 2003-02-27 10:30:25.000000000 -0600 > @@ -360,7 +360,7 @@ > esp = x->data; > alen = esp->auth.icv_trunc_len; > tfm = esp->conf.tfm; > - blksize = crypto_tfm_alg_blocksize(tfm); > + blksize = (crypto_tfm_alg_blocksize(tfm) + 3) & ~3; > clen = (clen + 2 + blksize-1)&~(blksize-1); > if (esp->conf.padlen) > clen = (clen + esp->conf.padlen-1)&~(esp->conf.padlen-1); > -- James Morris From hadi@cyberus.ca Fri Feb 28 05:28:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 28 Feb 2003 05:28:13 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1SDS9eA014563 for ; Fri, 28 Feb 2003 05:28:10 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18okYa-000Mc0-00; Fri, 28 Feb 2003 08:28:08 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1SDRiYO054773; Fri, 28 Feb 2003 08:27:44 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1SDRhMc054770; Fri, 28 Feb 2003 08:27:43 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 28 Feb 2003 08:27:43 -0500 (EST) From: jamal To: Andi Kleen cc: "Randy.Dunlap" , "" , "" Subject: Re: linux_mib docs? 
In-Reply-To: <20030227201651.GA24698@wotan.suse.de> Message-ID: <20030228082235.F53276@shell.cyberus.ca> References: <20030227111212.40388562.rddunlap@osdl.org> <20030227201651.GA24698@wotan.suse.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1822 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Thu, 27 Feb 2003, Andi Kleen wrote: > On Thu, Feb 27, 2003 at 11:12:12AM -0800, Randy.Dunlap wrote: > > Hi, > > > > Is there a MIB definition for struct linux_mib in > > include/net/snmp.h ? > > Not that I'm aware of, but netstat -s has (very) brief descriptions. > If there was a MIB where that would fit it would be called the "linux private network" MIB. This is pretty common practise when you have additional things that dont fit the standards. The rest of the objects in that file are typically exported to RFC1213. cheers, jamal From hadi@cyberus.ca Fri Feb 28 05:34:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 28 Feb 2003 05:34:15 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1SDYCeA014943 for ; Fri, 28 Feb 2003 05:34:12 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18okeS-000NQD-00; Fri, 28 Feb 2003 08:34:12 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h1SDXnYO054782; Fri, 28 Feb 2003 08:33:49 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h1SDXnmr054779; Fri, 28 Feb 2003 08:33:49 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 28 Feb 2003 08:33:48 -0500 (EST) From: jamal To: Chris Friesen cc: 
linux-kernel@vger.kernel.org, "" , "" Subject: Re: anyone ever done multicast AF_UNIX sockets? In-Reply-To: <3E5E7081.6020704@nortelnetworks.com> Message-ID: <20030228083009.Y53276@shell.cyberus.ca> References: <3E5E7081.6020704@nortelnetworks.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1823 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Thu, 27 Feb 2003, Chris Friesen wrote: > > It is fairly common to want to distribute information between a single > sender and multiple receivers on a single box. > > Multicast IP sockets are one possibility, but then you have additional > overhead in the IP stack. > I think this is a _very weak_ reason. Without addressing any of your other arguments, can you describe what painful overhead you are talking about? Did you do any measurements, and under what circumstances do unix sockets and, say, localhost-bound udp sockets differ? I am not looking for a hand-waving reason of "but there's an IP stack".
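One way to obtain the kind of measurement asked for here is a one-byte ping-pong over a connected AF_UNIX datagram pair and over a pair of UDP sockets on 127.0.0.1. The sketch below is a rough illustration under stated assumptions: the helper names make_unix_pair, make_udp_pair and avg_rtt_usec are hypothetical, and both ends live in one process, so it times the socket send/receive path only and leaves out the context switches that a two-process benchmark would include.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: connected AF_UNIX datagram pair. Returns 0 on success. */
int make_unix_pair(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}

/* Hypothetical helper: two UDP sockets bound to 127.0.0.1 on kernel-chosen
 * ports and connected to each other. Returns 0 on success, -1 on failure. */
int make_udp_pair(int fds[2])
{
    struct sockaddr_in addr[2];
    socklen_t len;
    int i;

    fds[0] = fds[1] = -1;
    for (i = 0; i < 2; i++) {
        fds[i] = socket(AF_INET, SOCK_DGRAM, 0);
        if (fds[i] < 0)
            goto fail;
        memset(&addr[i], 0, sizeof(addr[i]));
        addr[i].sin_family = AF_INET;
        addr[i].sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr[i].sin_port = 0;                 /* let the kernel pick a port */
        if (bind(fds[i], (struct sockaddr *)&addr[i], sizeof(addr[i])) < 0)
            goto fail;
        len = sizeof(addr[i]);
        if (getsockname(fds[i], (struct sockaddr *)&addr[i], &len) < 0)
            goto fail;
    }
    if (connect(fds[0], (struct sockaddr *)&addr[1], sizeof(addr[1])) < 0 ||
        connect(fds[1], (struct sockaddr *)&addr[0], sizeof(addr[0])) < 0)
        goto fail;
    return 0;
fail:
    for (i = 0; i < 2; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    return -1;
}

/* Average microseconds per round trip (a -> b -> a) over iters one-byte
 * exchanges. Returns a negative value on an I/O error. */
double avg_rtt_usec(int a, int b, int iters)
{
    struct timeval t0, t1;
    char c = 'x';
    int i;

    assert(iters > 0);
    gettimeofday(&t0, NULL);
    for (i = 0; i < iters; i++) {
        if (send(a, &c, 1, 0) != 1 || recv(b, &c, 1, 0) != 1 ||
            send(b, &c, 1, 0) != 1 || recv(a, &c, 1, 0) != 1)
            return -1.0;
    }
    gettimeofday(&t1, NULL);
    return ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / iters;
}
```

Calling avg_rtt_usec on each pair with the same iteration count gives two figures that can be compared directly on one machine; absolute values will depend heavily on the kernel and hardware.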
cheers, jamal From cfriesen@nortelnetworks.com Fri Feb 28 06:39:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 28 Feb 2003 06:39:57 -0800 (PST) Received: from zcars04e.nortelnetworks.com (zcars04e.nortelnetworks.com [47.129.242.56]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1SEdreA020004 for ; Fri, 28 Feb 2003 06:39:54 -0800 Received: from zcard307.ca.nortel.com (zcard307.ca.nortel.com [47.129.242.67]) by zcars04e.nortelnetworks.com (Switch-2.2.0/Switch-2.2.0) with ESMTP id h1SEdA429113; Fri, 28 Feb 2003 09:39:11 -0500 (EST) Received: from zcard0k6.ca.nortel.com ([47.129.242.158]) by zcard307.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FGRX7G1P; Fri, 28 Feb 2003 09:39:11 -0500 Received: from pcard0ks.ca.nortel.com ([47.129.117.131]) by zcard0k6.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FSL7YT6F; Fri, 28 Feb 2003 09:39:11 -0500 Received: from nortelnetworks.com (localhost.localdomain [127.0.0.1]) by pcard0ks.ca.nortel.com (Postfix) with ESMTP id 62F3D2E132; Fri, 28 Feb 2003 09:39:10 -0500 (EST) Message-ID: <3E5F748E.2080605@nortelnetworks.com> Date: Fri, 28 Feb 2003 09:39:10 -0500 X-Sybari-Space: 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: jamal Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: anyone ever done multicast AF_UNIX sockets? 
References: <3E5E7081.6020704@nortelnetworks.com> <20030228083009.Y53276@shell.cyberus.ca> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1824 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortelnetworks.com Precedence: bulk X-list: netdev

jamal wrote:
>
> On Thu, 27 Feb 2003, Chris Friesen wrote:

>>It is fairly common to want to distribute information between a single
>>sender and multiple receivers on a single box.

>>Multicast IP sockets are one possibility, but then you have additional
>>overhead in the IP stack.

> I think this is a _very weak_ reason.
> Without addressing any of your other arguments, can you describe what
> painful overhead you are talking about? Did you do any measurements,
> and under what circumstances do unix sockets and, say, localhost-bound
> udp sockets differ? I am not looking for a hand-waving reason of
> "but there's an IP stack".

From lmbench local communication tests:

This is a multiproc 1GHz G4
Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
                        ctxsw       UNIX        UDP         TCP   conn
--------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
pcary0z0. Linux 2.4.18- 0.600 3.756 6.58  10.2  26.4  13.8  36.9  599K
pcary0z0. Linux 2.4.18- 0.590 3.766 6.43  10.1  26.7  13.9  37.2  59.1

This is a 400MHz uniproc G4
Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
                        ctxsw       UNIX        UDP         TCP   conn
--------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
zcarm0pd. Linux 2.2.17- 1.710 9.888 21.3  26.4  59.4  43.0  105.4 146.
zcarm0pd. Linux 2.2.17- 1.740 9.866 22.2  26.3  60.4  43.1  106.7 147.

This is a 1.8GHz P4
Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
                        ctxsw       UNIX        UDP         TCP   conn
--------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
pcard0ks. Linux 2.4.18- 1.740 10.4  15.9  20.1  33.1  23.5  44.3  72.7
pcard0ks. Linux 2.4.18-       10.3  16.1  19.8  36.3  22.8  43.6  74.1
pcard0ks. Linux 2.4.18- 1.560 10.6  16.0  23.4  38.1  36.1  44.6  77.4

From these numbers, UDP has 18%-44% higher latency than AF_UNIX, with the difference going up as the machine speed goes up.

Aside from that, IP multicast doesn't seem to work properly. I enabled multicast on lo and disabled it on eth0, and a ping to 224.0.0.1 still got responses from all the multicast-capable hosts on the network. From userspace, multicast unix would be *simple* to use, as in totally transparent.

The other reason why I would like to see this happen is that it just makes *sense*, at least to me. We've got multicast IP, so multicast unix for local machine access is a logical extension in my books.

Do we agree at least that some form of multicast is the logical solution to the case of one sender/many listeners?

Thanks for your thoughts,

Chris

--
Chris Friesen             | MailStop: 043/33/F10
Nortel Networks           | work: (613) 765-0557
3500 Carling Avenue       | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com

From jmorris@intercode.com.au Fri Feb 28 08:01:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 28 Feb 2003 08:01:23 -0800 (PST) Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h1SG1BeA022034 for ; Fri, 28 Feb 2003 08:01:15 -0800 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id DAA27848; Sat, 1 Mar 2003 03:01:04 +1100 Date: Sat, 1 Mar 2003 03:01:04 +1100 (EST) From: James Morris To: Tom Lendacky cc: netdev@oss.sgi.com, , Subject: Re: IPSec: setkey -DP freezes machine In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1825 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@intercode.com.au Precedence: bulk X-list: netdev

On Thu, 27 Feb 2003, Tom Lendacky wrote:

> So for the fix, would it be preferable
to have the xfrm_policy family > always be assigned the socket family value or should it retain the current > setting and only be set to the socket family value if the current value is > 0 (AF_UNSPEC)?

The first may be necessary, as the family field is needed along the following path:

pfkey_compile_policy() ->
  parse_ipsecrequests() ->
    parse_ipsecrequest() {
        ...
        if (t->mode) {
            switch (xp->family) {
            ...
    }

In the code snippet above, xp->family will be zero as xp was allocated in pfkey_compile_policy() and not set after being zeroed.

This is assuming we want to be able to set tunnel mode on a socket (which is supported in some implementations, e.g. Solaris, and can be very useful). If so, it would be good if we could make use of half of the sadb_x_policy_reserved2 field to carry the socket family value, and copy it during pfkey_compile_policy().

Alternatively, a family parameter could be added to the compile_policy() operation, but this duplicates data already present in our native xfrm_userpolicy_info format.

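For illustration only, the two options Tom asks about can be modelled in user space. This is a rough sketch, not kernel code: struct policy_model, set_family_always(), set_family_if_unspec(), parse_request_model() and family_demo() are all hypothetical names invented here, and the mode flag is folded into the policy object for brevity (in the quoted path it lives in the template t).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical stand-in for the policy object; NOT the kernel's xfrm_policy. */
struct policy_model {
    unsigned short family;  /* stays 0 if nobody sets it after zeroing */
    int tunnel_mode;        /* non-zero models "if (t->mode)" */
};

/* Option 1: always assign the socket's family. */
static void set_family_always(struct policy_model *xp, unsigned short sk_family)
{
    xp->family = sk_family;
}

/* Option 2: keep an existing value, fall back only when it is AF_UNSPEC. */
static void set_family_if_unspec(struct policy_model *xp, unsigned short sk_family)
{
    if (xp->family == AF_UNSPEC)
        xp->family = sk_family;
}

/* Models the switch on xp->family: 0 if the family is usable, -1 otherwise. */
static int parse_request_model(const struct policy_model *xp)
{
    if (!xp->tunnel_mode)
        return 0;               /* family is not consulted outside tunnel mode */
    switch (xp->family) {
    case AF_INET:
    case AF_INET6:
        return 0;
    default:
        return -1;              /* a zeroed, never-set family ends up here */
    }
}

/* Walks the failing case and both options; returns 0 if the model behaves as described. */
int family_demo(void)
{
    struct policy_model xp;

    memset(&xp, 0, sizeof(xp));         /* zeroed at allocation, as in the thread */
    assert(xp.family == AF_UNSPEC);     /* assumes AF_UNSPEC is 0 on this platform */
    xp.tunnel_mode = 1;

    if (parse_request_model(&xp) != -1)
        return 1;                       /* unset family must be rejected */

    set_family_if_unspec(&xp, AF_INET);
    if (parse_request_model(&xp) != 0)
        return 2;                       /* fallback makes the family usable */

    set_family_if_unspec(&xp, AF_INET6);
    if (xp.family != AF_INET)
        return 3;                       /* option 2 keeps the existing value */

    set_family_always(&xp, AF_INET6);
    if (xp.family != AF_INET6)
        return 4;                       /* option 1 overwrites unconditionally */

    return 0;
}
```

Calling family_demo() from a main() should return 0. The only point of the model is that a zeroed family falls through the switch in tunnel mode, and that either option avoids it; which one the kernel should adopt is the open question above.
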
- James
--
James Morris

From hadi@cyberus.ca Fri Feb 28 19:19:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 28 Feb 2003 19:19:18 -0800 (PST) Received: from mx02.cyberus.ca (mx02.cyberus.ca [216.191.240.26]) by oss.sgi.com (8.12.5/8.12.5) with SMTP id h213JFeA029161 for ; Fri, 28 Feb 2003 19:19:16 -0800 Received: from shell.cyberus.ca ([216.191.240.114]) by mx02.cyberus.ca with esmtp (Exim 4.10) id 18oxWt-0005Fa-00; Fri, 28 Feb 2003 22:19:15 -0500 Received: from shell.cyberus.ca (localhost.cyberus.ca [127.0.0.1]) by shell.cyberus.ca (8.12.6/8.12.6) with ESMTP id h213IpYO057342; Fri, 28 Feb 2003 22:18:51 -0500 (EST) (envelope-from hadi@cyberus.ca) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.12.6/8.12.6/Submit) with ESMTP id h213IpeM057339; Fri, 28 Feb 2003 22:18:51 -0500 (EST) (envelope-from hadi@cyberus.ca) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 28 Feb 2003 22:18:51 -0500 (EST) From: jamal To: Chris Friesen cc: linux-kernel@vger.kernel.org, "" , "" Subject: Re: anyone ever done multicast AF_UNIX sockets? In-Reply-To: <3E5F748E.2080605@nortelnetworks.com> Message-ID: <20030228212309.C57212@shell.cyberus.ca> References: <3E5E7081.6020704@nortelnetworks.com> <20030228083009.Y53276@shell.cyberus.ca> <3E5F748E.2080605@nortelnetworks.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1826 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev

On Fri, 28 Feb 2003, Chris Friesen wrote:

> From lmbench local communication tests:
>
> This is a multiproc 1GHz G4
> Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
>                         ctxsw       UNIX        UDP         TCP   conn
> --------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
> pcary0z0. Linux 2.4.18- 0.600 3.756 6.58  10.2  26.4  13.8  36.9  599K
> pcary0z0. Linux 2.4.18- 0.590 3.766 6.43  10.1  26.7  13.9  37.2  59.1
>
> This is a 400MHz uniproc G4
> Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
>                         ctxsw       UNIX        UDP         TCP   conn
> --------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
> zcarm0pd. Linux 2.2.17- 1.710 9.888 21.3  26.4  59.4  43.0  105.4 146.
> zcarm0pd. Linux 2.2.17- 1.740 9.866 22.2  26.3  60.4  43.1  106.7 147.
>
> This is a 1.8GHz P4
> Host      OS            2p/0K Pipe  AF    UDP   RPC/  TCP   RPC/  TCP
>                         ctxsw       UNIX        UDP         TCP   conn
> --------- ------------- ----- ----- ----- ----- ----- ----- ----- -----
> pcard0ks. Linux 2.4.18- 1.740 10.4  15.9  20.1  33.1  23.5  44.3  72.7
> pcard0ks. Linux 2.4.18-       10.3  16.1  19.8  36.3  22.8  43.6  74.1
> pcard0ks. Linux 2.4.18- 1.560 10.6  16.0  23.4  38.1  36.1  44.6  77.4
>
> From these numbers, UDP has 18%-44% higher latency than AF_UNIX, with
> the difference going up as the machine speed goes up.

Did you also measure throughput? You are overlooking the flexibility that already exists in IP based transports as an advantage; the fact that you can make them distributed instead of localized with a simple addressing change is a very powerful abstraction.

> Aside from that, IP multicast doesn't seem to work properly. I enabled
> multicast on lo and disabled it on eth0, and a ping to 224.0.0.1 still
> got responses from all the multicast-capable hosts on the network.

I think you may have something misconfigured.

> From
> userspace, multicast unix would be *simple* to use, as in totally
> transparent.

You could implement the abstraction in user space as a library today by having some server that muxes to several registered clients.

> The other reason why I would like to see this happen is that it just
> makes *sense*, at least to me. We've got multicast IP, so multicast
> unix for local machine access is a logical extension in my books.

So what's the addressing scheme for multicast unix? Would it be a reserved path? I am actually indifferent: You could do this in user space for starters.
See if it buys you anything. Maybe you could do something clever with passing unix file descriptors around to avoid a single server point of failure, etc.

> Do we agree at least that some form of multicast is the logical solution
> to the case of one sender/many listeners?

That's what the definition of mcast is. You need to weigh your options; the cost is probably worth the flexibility you get with sockets.

cheers,
jamal
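
The user-space approach suggested above (one server that muxes to several registered clients) could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: struct mux, mux_register(), mux_publish() and demo_fanout() are hypothetical names, and the clients are modelled with socketpair() endpoints inside one process instead of separate processes connecting to a well-known path.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CLIENTS 8

/* Hypothetical registry of the server-side ends of each client connection. */
struct mux {
    int fds[MAX_CLIENTS];
    int count;
};

/* Register one client endpoint; returns 0 on success, -1 if the table is full. */
static int mux_register(struct mux *m, int fd)
{
    if (m->count >= MAX_CLIENTS)
        return -1;
    m->fds[m->count++] = fd;
    return 0;
}

/* Send the same datagram to every registered client; returns how many sends succeeded. */
static int mux_publish(const struct mux *m, const void *buf, size_t len)
{
    int delivered = 0;

    for (int i = 0; i < m->count; i++) {
        if (send(m->fds[i], buf, len, 0) == (ssize_t)len)
            delivered++;
    }
    return delivered;
}

/* One sender, three listeners: returns the number of listeners that
 * received an identical copy of the message, or -1 on setup failure. */
int demo_fanout(void)
{
    struct mux m = { .count = 0 };
    const char msg[] = "hello";
    int rx[3];
    int matched = 0;

    for (int i = 0; i < 3; i++) {
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
            return -1;
        assert(mux_register(&m, sv[0]) == 0);
        rx[i] = sv[1];              /* the "client" side of the pair */
    }

    if (mux_publish(&m, msg, sizeof(msg)) != 3)
        return -1;

    for (int i = 0; i < 3; i++) {
        char buf[16];
        ssize_t n = recv(rx[i], buf, sizeof(buf), 0);

        if (n == (ssize_t)sizeof(msg) && memcmp(buf, msg, sizeof(msg)) == 0)
            matched++;
        close(rx[i]);
        close(m.fds[i]);
    }
    return matched;
}
```

Calling demo_fanout() from a main() should return 3 on a system with AF_UNIX datagram socketpairs. Note that the sender pays one send() per listener here, which is exactly the per-receiver cost that a kernel-level multicast AF_UNIX would be trying to avoid, so this shows the abstraction only and says nothing about its performance.
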