From ahu@outpost.ds9a.nl Fri Apr 1 01:01:20 2005
Date: Fri, 1 Apr 2005 11:01:16 +0200
From: bert hubert
To: Ben Greear
Cc: hadi@cyberus.ca, "David S. Miller", netdev
Subject: Re: RFC: Redirect-Device
Message-ID: <20050401090116.GA21361@outpost.ds9a.nl>
In-Reply-To: <424C8E2C.70302@candelatech.com>

On Thu, Mar 31, 2005 at 03:56:28PM -0800, Ben Greear wrote:
> >I think you are more comfortable with using netdevices and ioctls and
> >/proc.
>
> Definitely. Ever tried to sniff a socket with ethereal? :)

On loopback, all the time.

I'm probably dense but I don't understand what problem you've solved with
this interface. Could you elaborate a bit?

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl         Open and Closed source services

From pekkas@netcore.fi Fri Apr 1 01:28:54 2005
Date: Fri, 1 Apr 2005 12:28:44 +0300 (EEST)
From: Pekka Savola
To: Ben Greear
cc: "'netdev@oss.sgi.com'"
Subject: Re: RFC: Redirect-Device
In-Reply-To: <424CDBA9.80703@candelatech.com>

On Thu, 31 Mar 2005, Ben Greear wrote:
>> Is there something in your problem statement I'm missing?
>
> That would be similar to what I'm doing, but I'm not really trying
> to tunnel anything. I am trying to duplicate the behaviour of two
> ethernet interfaces connected by an external cross-over cable, and I'm
> trying to duplicate it at the network-device interface level so that
> common tools (and my own tools) can treat these virtual interfaces
> just like ethernet interfaces.

Oh ok, what you seem to want is some kind of "Ethernet loopback++", but
the "looped" packets should come back from a virtual interface instead of
the same interface?

Btw, does the kernel support traditional loopback, so that at the last
stage, just before sending a packet on the wire, it would be pushed back?

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

From herbert@gondor.apana.org.au Fri Apr 1 01:37:18 2005
Date: Fri, 1 Apr 2005 19:36:33 +1000
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [NETLINK] cb_lock does not needs ref count on sk
Message-ID: <20050401093633.GA32707@gondor.apana.org.au>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Nq2Wo0NMKNjxTN9z"
In-Reply-To: <20050331203313.57e1c5c3.davem@davemloft.net>
From: Herbert Xu

--Nq2Wo0NMKNjxTN9z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Dave:

Here is a little
optimisation for the cb_lock used by netlink_dump. While fixing that race
earlier, I noticed that the reference count held by cb_lock is completely
useless. The reason is that in order to obtain the protection of the
reference count, you have to take the cb_lock. But the only way to take
the cb_lock is through dereferencing the socket.

That is, you must already possess a reference count on the socket before
you can take advantage of the reference count held by cb_lock. As a
corollary, we can remove the reference count held by the cb_lock.

Signed-off-by: Herbert Xu

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--Nq2Wo0NMKNjxTN9z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=p

===== net/netlink/af_netlink.c 1.75 vs edited =====
--- 1.75/net/netlink/af_netlink.c	2005-04-01 16:25:14 +10:00
+++ edited/net/netlink/af_netlink.c	2005-04-01 19:30:22 +10:00
@@ -374,7 +374,6 @@
 			nlk->cb->done(nlk->cb);
 		netlink_destroy_callback(nlk->cb);
 		nlk->cb = NULL;
-		__sock_put(sk);
 	}
 	spin_unlock(&nlk->cb_lock);
 
@@ -1100,7 +1099,6 @@
 	spin_unlock(&nlk->cb_lock);
 
 	netlink_destroy_callback(cb);
-	__sock_put(sk);
 	return 0;
 }
 
@@ -1139,7 +1137,6 @@
 		return -EBUSY;
 	}
 	nlk->cb = cb;
-	sock_hold(sk);
 	spin_unlock(&nlk->cb_lock);
 
 	netlink_dump(sk);

--Nq2Wo0NMKNjxTN9z--

From abhishek@pal.ece.iisc.ernet.in Fri Apr 1 01:40:56 2005
Date: Fri, 1 Apr 2005 15:10:40 +0530 (IST)
From: Abhishek Gupta
To: netdev@oss.sgi.com
Subject: Problem using HTB

hello everybody

I am working on a project related to QoS. I am using Linux's tc to
configure my PC based router. My setup is as follows:-

           eth0            eth1       eth0           eth0
PC-based server|----------|PC-based Router|---------|PC-Based Client
                             (using tc)

* All my ethernet cards are on 100Mbps lan
* Traffic generators being used:
  > UDP: gen_send @ about 1Mbps
    (http://www.citi.umich.edu/projects/qbone/generator.html)
* Kernel versions being used:-
  > At Router: linux-2.4.20
  > At Client and Server: Linux-2.4.7-10
* iproute2 versions:-
  > At Router: iproute2-ss020116
  > At Client and Server: iproute2-ss010824
* Packets before leaving server and client are being marked with DSCP
  bits using Linux's tc option; Marking is done based on two-tuples:
  destination ip address and port number
* At the Router, I have the following configuration (only related to
  HTB) for eth0, and a similar configuration exists for eth1 too:

---Router Configuration Starts Here-----
DEV0='eth0'
tc qdisc add dev $DEV0 parent 1: handle 2: htb default 30
tc class add dev $DEV0 parent 2: classid 2:1 htb rate 100kbit burst 100 \
   ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:10 htb rate 60kbit burst 100 \
   ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:20 htb rate 30kbit burst 60 \
   ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:30 htb rate 10kbit burst 80 \
   ceil 100kbit
tc qdisc add dev $DEV0 parent 2:10 gred setup DPs 3 default 3 grio
tc qdisc change dev $DEV0 parent 2:10 gred limit 185000 min 11394 \
   max 11395 burst 100 avpkt 128 bandwidth 100kbit DP 1 probability 1 \
   prio 1
tc qdisc change dev $DEV0 parent 2:10 gred limit 17972 min 4748 max 9493 \
   burst 50 avpkt 1000 bandwidth 100kbit DP 2 probability 0.01 prio 2
tc qdisc change dev $DEV0 parent 2:10 gred limit 4368 min 1796 max 3582 \
   burst 25 avpkt 1000 bandwidth 100kbit DP 3 probability 0.01 prio 2
tc qdisc add dev $DEV0 parent 2:20 gred setup DPs 2 default 2 grio
tc qdisc change dev $DEV0 parent 2:20 gred limit 52480 min 11311 \
   max 11312 burst 60 avpkt 256 bandwidth 100kbit DP 1 probability 1 \
   prio 1
tc qdisc change dev $DEV0 parent 2:20 gred limit 47184 min 5898 \
   max 11796 burst 60 avpkt 1000 bandwidth 100kbit DP 2 probability 0.01 \
   prio 2
tc qdisc add dev $DEV0 parent 2:30 gred setup DPs 1 default 1 grio
tc qdisc change dev $DEV0 parent 2:30 gred limit 15728 min 1966 \
   max 3932 burst 80 avpkt 200 bandwidth 100kbit DP 1 probability 0.04 \
   prio 1
-----Router Configuration Ends Here------

Now, the problem is that when I am sending packets from just one UDP
source (at server), I am getting outbound bit rate at eth0 (of Router) as
12kbps even though I have ceiled the corresponding HTB class to 100kbps;
a similar thing happens when I have two UDP sources (both at server). So,
even though I have configured for 100kbps, I am getting only 12kbps as
the link speed.

Please help me out.

Abhishek
=========================================================================
ABHISHEK GUPTA
E-mail:abhishek_it_bhu@yahoo.co.in
=========================================================================

From akpm@osdl.org Fri Apr 1 02:11:54 2005
Date: Fri, 1 Apr 2005 02:11:21 -0800
From: Andrew Morton
To: netdev@oss.sgi.com
Cc: lukeross@sys3175.co.uk
Subject: Fw: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu
Message-Id: <20050401021121.76da449b.akpm@osdl.org>

hm, mtu is implemented in the device driver - you might be out of luck.

Begin forwarded message:

Date: Fri, 1 Apr 2005 02:01:19 -0800
From: bugme-daemon@osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu

http://bugme.osdl.org/show_bug.cgi?id=4430

           Summary: Virtual interfaces cannot have their own mtu
    Kernel Version: kernel-2.6.9-1.6_FC2
            Status: NEW
          Severity: low
             Owner: acme@conectiva.com.br
         Submitter: lukeross@sys3175.co.uk

Distribution: Fedora Core 2,3
Hardware Environment: Broadcom gigabit card using tg3 (Tyan s2885 onboard)
Problem Description: eth0 and eth0:1 cannot have different mtus.

I have a jumbo-frame capable switch with three devices plugged in. Two are
PCs with jumbo-capable cards, the other is a wireless router which isn't,
and hangs if either PC attempts to discover whether it can support jumbo
frames.

To get the benefit of jumbo frames between the two PCs, I tried to set up
eth0:1 - on a different subnet to the wireless router - on both PCs, and
set the mtu of the eth0:1 to 9000. However it is not possible to set the
mtu for eth0:1 to 9000 without setting the mtu of eth0 to 9000 as well.

Also noted in http://xcat.org/pipermail/xcat-user/2003-April/002358.html

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From hadi@cyberus.ca Fri Apr 1 03:03:28 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
In-Reply-To: <20050401042106.GA27762@gondor.apana.org.au>
Message-Id: <1112353398.1096.116.camel@jzny.localdomain>
Date: 01 Apr 2005 06:03:18 -0500

On Thu, 2005-03-31 at 23:21, Herbert Xu wrote:
> On Thu, Mar 31, 2005 at 08:37:21PM -0500, jamal wrote:
>
> > --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500
> > +++ b/include/net/xfrm.h 2005-03-31 19:26:24.000000000 -0500
> >
> > +/* callback structure passed from either netlink or pfkey */
> > +struct km_cb
>
> This name is a bit non-specific.
>

note: used by both SP/SA

> > +{
> > +	u32 data; /* callee to caller */
> > +};
>
> Might as well put the event into it if we're going to keep this
> structure.
> It'll help to shorten the function prototypes that
> use it.
>
> And then we can just call this structure km_event.
>

sure.

> > -extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard);
> > +extern void km_policy_expired(struct xfrm_policy *pol, int dir, int event);
>
> Bogus prototype change.
>

agreed.

> > +void xfrm_state_del_flush(struct xfrm_state *x)
> > +{
> > +	spin_lock_bh(&x->lock);
> > +	__xfrm_state_delete(x);
> > +	spin_unlock_bh(&x->lock);
> > +}
>
> Sorry, I've changed my mind on this. This demonstrates why the
> km_notify_* calls should be made from af_key/xfrm_user directly
> instead of here.
>
>
> Some of these functions are called internally as you discovered.
> Since the notifications should only be generated by user requests,
> calls to km_notify_* should be made at the places where the user
> requests are handled, which is in the KM itself.
>

You need to be able to generate events at every km not just the one that
generated the request. You also (most of the time) need to do it before
the affected object disappears. So I am missing your point on this one.

> Otherwise we'll have to add hacks like this to avoid the
> notification for internal users.
>

I may be paranoid but i do this because x could be garbage collected way
before i send the km user message - and i need it to use it to generate
the event. I could take a copy of it ...

> >  void xfrm_state_delete(struct xfrm_state *x)
> >  {
> > +	int notif = 0;
> >  	spin_lock_bh(&x->lock);
> > +	/*
> > +	 * its unfortunate we have to freeze gc for this
> > +	 * one moment - the other alternative would involve
> > +	 * memcopying the state and then announcing that.
> > +	 * think SMP where theres an iota where this could mess
> > +	 * up - JHS
> > +	 */
> > +	spin_lock_bh(&xfrm_state_gc_lock);
> > +	if (x->km.state != XFRM_STATE_DEAD)
> > +		notif = 1;
> >  	__xfrm_state_delete(x);
> > +
> > +	if (notif)
> > +		km_state_notify(x, NULL, XFRM_SAP_DELETED);
>
> You've caught a real bug for af_key here. It's currently possible to
> receive two delete notifications for the same state.

Can you elaborate?

> However, may I suggest that we code this differently. Make
> __xfrm_state_delete return 0 if the state was really deleted
> and -ESRCH otherwise.
>
> Then af_key/xfrm_user can simply call km_state_notify if the
> return value was zero.
>

Again like i said: I need to tell every km user about the event, not
just the originator.

> BTW there is no need to grab xfrm_state_gc_lock. You've got
> a reference count on the state from your caller.
>

Aha! I missed that - I will remove it.

> > @@ -270,6 +319,10 @@
> >  		}
> >  	}
> >  	spin_unlock_bh(&xfrm_state_lock);
> > +	if (count) {
> > +		c.data = proto;
> > +		km_state_notify(NULL, &c, XFRM_SAP_FLUSHED);
> > +	}
>
> The notification should occur in all cases, even if count == 0.
>

Well, Masahide-San and I actually did discuss this and he was of the
same opinion as you. My opinion: We only generate events when something
happens, not just because someone issues a command. If flush was issued
and there was nothing to flush why generate an event? does the PFKEY RFC
say anything on this?

> > @@ -957,8 +1020,9 @@
> >  	if (x->tunnel) {
> >  		struct xfrm_state *t = x->tunnel;
> >
> > +		/* XXX: Avoid announce?? */
> >  		if (atomic_read(&t->tunnel_users) == 2)
> > -			xfrm_state_delete(t);
> > +			xfrm_state_del_flush(t);
>
> That's right. We don't want to announce internal states to the world.
>

I will remove that comment. That's achieved in the above code although
the called function may not have the appropriate name.

> > --- a/net/xfrm/xfrm_policy.c 2005-03-25 22:28:21.000000000 -0500
> > +++ b/net/xfrm/xfrm_policy.c 2005-03-31 19:26:24.000000000 -0500
> > @@ -298,7 +298,7 @@
> >   * entry dead. The rule must be unlinked from lists to the moment.
> >   */
> >
> > -static void xfrm_policy_kill(struct xfrm_policy *policy)
> > +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir, int notif)
>
> Again, had you done the km_* calls from af_key/xfrm_user, then there'd
> be no need to check notif here.
>

Refer to my comments above on being able to tell multiple managers about
the events originated by one.
Actually, given that this function is being called in many places i
would say this is the exact central location you want to issue the
announce from.

> BTW, as it is you're announcing expired policies twice. Once as an
> expire event and once as a delete event. This problem will also go
> away if you move the km_* calls into af_key/xfrm_user.
>

There's an announcement only when policy goes dead ;->
So only one not two. Same with the state as well.
And again can't do it from af_key/xfrm_user if you want to have events
generated by one km to be sent to another as well. It's pf_key that needs
fixing.

> > @@ -579,7 +586,7 @@
> >  	write_unlock_bh(&xfrm_policy_lock);
> >
> >  	if (old_pol) {
> > -		xfrm_policy_kill(old_pol);
> > +		xfrm_policy_kill(old_pol, dir, 1);
> >  	}
>
> Please don't announce socket policies :)
>

I missed this one - sorry.

> > --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500
> > +++ b/net/xfrm/xfrm_user.c 2005-03-31 19:26:24.000000000 -0500
> > @@ -683,6 +683,10 @@
> >  	if (!xp)
> >  		return err;
> >
> > +	/* shouldnt excl be based on nlh flags??
> > +	 * Aha! this is anti-netlink really i.e more pfkey derived
> > +	 * in netlink excl is a flag and you wouldnt need
> > +	 * a type XFRM_MSG_UPDPOLICY - JHS */
>
> Good point. Care to provide a patch to treat NEW + NLM_F_REPLACE
> as UPD?
>
> > @@ -1053,10 +1057,10 @@
> >  	return -1;
> >  }
> >
> > -static int xfrm_send_state_notify(struct xfrm_state *x, int hard)
> > +static int xfrm_exp_state_notify(struct xfrm_state *x, u32 hard)
>
> How about calling this xfrm_notify_sa_expired for consistency?
> Ditto for the policy function.

sure.

>
> > +static int xfrm_notify_sa_flush(struct km_cb *c)
> > +{
> > +	struct xfrm_usersa_flush *p;
> > +	struct nlmsghdr *nlh;
> > +	struct sk_buff *skb;
> > +	unsigned char *b;
> > +	u32 ppid = 0;
> > +	int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
> > +
> > +	skb = alloc_skb(len, GFP_ATOMIC);
> > +	if (skb == NULL)
> > +		return -ENOMEM;
> > +	b = skb->tail;
> > +
> > +	nlh = NLMSG_PUT(skb, ppid, jiffies,
>
> If we're serious about providing sequence numbers then please
> set it up as an atomic integer and use it throughout this file.
>
> Otherwise just pop zero in there.
>

I was just being lazy. I could send a 0 but what's wrong with using
jiffies?

> > +	p = NLMSG_DATA(nlh);
> > +	if (!c) {
> > +		printk("xfrm_notify_sa_flush NULL km cb\n");
> > +		p->proto = 0;
>
> Is anyone expected to call this with a NULL pointer? If not then
> just let it OOPS. Same comment applies to the cb checks later on.
>

Will fix this.

> > +static int xfrm_notify_sa( struct xfrm_state *x, int event, struct km_cb *c)
>
> > +	if (event == XFRM_SAP_ADDED)
> > +		nlt = XFRM_MSG_NEWSA;
> > +	else if (event == XFRM_SAP_UPDATED)
> > +		nlt = XFRM_MSG_UPDSA;
> > +	else if (event == XFRM_SAP_DELETED)
> > +		nlt = XFRM_MSG_DELSA;
> > +	else
> > +		goto nlmsg_failure;
>
> Please use a switch.
>

sure.

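[Editorial sketch, not part of the original message.] Two of the review comments above, "Please use a switch" and "set it up as an atomic integer", can be illustrated in plain userspace C. This is only a sketch under stated assumptions: the XFRM_SAP_* names come from the patch under review, but their numeric values, the MSG_* stand-ins for the XFRM_MSG_* types, and the helpers event_to_msg() and next_seq() are hypothetical names made up for this example. It uses C11 atomics rather than the kernel's atomic_t.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Event names follow the patch under review; the numeric values are
 * placeholders chosen for this sketch, not the kernel's. */
enum sap_event {
	XFRM_SAP_ADDED = 1,
	XFRM_SAP_UPDATED,
	XFRM_SAP_DELETED,
	XFRM_SAP_FLUSHED,
	XFRM_SAP_EXPIRED,
};

/* Hypothetical stand-ins for the XFRM_MSG_* netlink message types. */
enum msg_type {
	MSG_NONE = 0,
	MSG_NEWSA,
	MSG_UPDSA,
	MSG_DELSA,
};

/* Map an event to a message type with a switch. MSG_NONE plays the role
 * of the "goto nlmsg_failure" branch in the if/else chain. */
enum msg_type event_to_msg(int event)
{
	switch (event) {
	case XFRM_SAP_ADDED:
		return MSG_NEWSA;
	case XFRM_SAP_UPDATED:
		return MSG_UPDSA;
	case XFRM_SAP_DELETED:
		return MSG_DELSA;
	default:
		return MSG_NONE;
	}
}

/* One counter shared by every notification (zero-initialized). */
static atomic_uint notify_seq;

/* Return the next sequence number. Unlike a clock tick, two successive
 * calls never return the same value (until the counter wraps). */
uint32_t next_seq(void)
{
	return (uint32_t)atomic_fetch_add(&notify_seq, 1) + 1;
}
```

A notifier built this way would call event_to_msg() once to pick the message type and next_seq() once per message sent, so ordering stays visible to listeners even when several messages go out within the same tick.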
> > +static int xfrm_send_state_notify(struct xfrm_state *x, int event, struct km_cb *c)
> > +{
> > +
> > +	if ((event == XFRM_SAP_ADDED) ||
> > +	    (event == XFRM_SAP_UPDATED) ||
> > +	    (event == XFRM_SAP_DELETED))
> > +		return xfrm_notify_sa(x, event, c);
> > +
> > +	if (event == XFRM_SAP_FLUSHED)
> > +		xfrm_notify_sa_flush(c);
> > +
> > +	if (event != XFRM_SAP_EXPIRED)
> > +		return 0;
>
> Again a switch would be perfect.
>

Will fix this.

BTW, Herbert, thanks for taking the time; appreciated.

cheers,
jamal

From hadi@cyberus.ca Fri Apr 1 03:15:54 2005
Subject: Re: Resend: Re: PATCH: IPSEC acquire in presence of multiple managers
From: jamal
Reply-To: hadi@cyberus.ca
To: "David S. Miller"
Cc: herbert@gondor.apana.org.au, "David S. Miller", nakam@linux-ipv6.org, shinta.sugimoto@ericsson.com, netdev
In-Reply-To: <20050331211340.0e6fbdfb.davem@davemloft.net>
Message-Id: <1112354137.1090.129.camel@jzny.localdomain>
Date: 01 Apr 2005 06:15:38 -0500

On Fri, 2005-04-01 at 00:13, David S. Miller wrote:
> On 26 Mar 2005 13:35:31 -0500
> jamal wrote:
>
> > Apologies, The last patch had a glitch in the filename. Dave please
> > apply this one instead
>
> Doesn't apply, in the current tree km_query() is marked static.
>
> Please regenerate your patch and sorry for not getting to this
> sooner.

Dave,
I am combining this with the other event patch that is under discussion
right now which i will end up sending to you. If you want it separate i
could do that.

cheers,
jamal

From herbert@gondor.apana.org.au Fri Apr 1 03:45:00 2005
Date: Fri, 1 Apr 2005 21:42:58 +1000
To: jamal
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events
Message-ID: <20050401114258.GA2932@gondor.apana.org.au>
In-Reply-To: <1112353398.1096.116.camel@jzny.localdomain>
From: Herbert Xu

On Fri, Apr 01, 2005 at 06:03:18AM -0500, jamal wrote:
>
> > Some of these functions are called internally as you discovered.
> > Since the notifications should only be generated by user requests,
> > calls to km_notify_* should be made at the places where the user
> > requests are handled, which is in the KM itself.
>
> You need to be able to generate events at every km not just the one that
> generated the request. You also (most of the time) need to do it before

I understand.
However, that's not determined by where you put the
km_notify call itself. Even when you call km_notify from af_key or
xfrm_user it will notify every km in the system. It's the fact that
we're calling km_notify instead of pfkey_broadcast or netlink_broadcast
that's important, not the location.

Having the km_notify call made in af_key/xfrm_user is convenient though
for the reason I outlined above.

> I may be paranoid but i do this because x could be garbage collected way
> before i send the km user message - and i need it to use it to generate
> the event. I could take a copy of it ...

That's what the ref counter is for.

> > You've caught a real bug for af_key here. It's currently possible to
> > receive two delete notifications for the same state.
>
> Can you elaborate?

Imagine you've got a KM that's trying to delete a state via af_key
that's about to expire. If pfkey_delete looks up the state successfully,
and then the timer triggers before the actual xfrm_state_delete, you
will get one event generated by the timer and another by pfkey_delete.

> Again like i said: I need to tell every km user about the event, not
> just the originator.

I'm suggesting that you add the km_notify calls to af_key and xfrm_user.
That will take care of notifying everyone.

> Well, Masahide-San and I actually did discuss this and he was of the
> same opinion as you. My opinion: We only generate events when something
> happens, not just because someone issues a command. If flush was issued
> and there was nothing to flush why generate an event? does the PFKEY RFC
> say anything on this?

RFC 2367 says that:

   The messaging behavior for SADB_FLUSH is:

     Send an SADB_FLUSH message from a user process to the kernel.

     The kernel will return an SADB_FLUSH message to all listening
     sockets.

As you can see, there is no exception for the case of an empty database.
So my interpretation would be that a broadcast is needed.

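[Editorial sketch, not part of the original message.] The double-notification race described above (pfkey_delete versus the expiry timer) and the suggestion to have __xfrm_state_delete return 0 or -ESRCH can be illustrated with a small userspace model. Everything here is an assumption made for illustration: struct state is a simplified stand-in for struct xfrm_state, and state_delete() and handle_delete_request() are hypothetical names. The sketch is single-threaded; in the kernel the check-and-mark step would run under x->lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for struct xfrm_state: only the fields needed to
 * show the idea. */
struct state {
	bool dead;		/* models km.state == XFRM_STATE_DEAD */
	int notifications;	/* number of delete events sent so far */
};

/* Mark the state dead. Returns 0 if this call performed the deletion
 * and -ESRCH if the state was already dead. */
int state_delete(struct state *x)
{
	if (x->dead)
		return -ESRCH;
	x->dead = true;
	return 0;
}

/* Hypothetical delete path, used by both the user request handler and
 * the expiry timer: notify only when this caller really deleted it. */
void handle_delete_request(struct state *x)
{
	if (state_delete(x) == 0)
		x->notifications++;	/* stands in for km_state_notify() */
}
```

If the timer and a user request both reach handle_delete_request() for the same state, only the first one sees a return value of 0, so listeners receive a single delete event.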
> Refer to my comments above on being able to tell multiple managers about
> the events originated by one.

May I also refer you to my comment above about this being achieved by
calling km_notify, even if you do it from within af_key or xfrm_user :)

> Actually, given that this function is being called in many places i
> would say this is the exact central location you want to issue the
> announce from.

Try this as an exercise. List all the xfrm_policy_kills that need
notifications and all those that don't; you will find that the former
all originate from delete/flush commands in af_key/xfrm_user, while
the latter originate from other callers.

In other words, by placing the call in af_key/xfrm_user you simplify
the logic and make it more maintainable.

> > BTW, as it is you're announcing expired policies twice. Once as an
> > expire event and once as a delete event. This problem will also go
> > away if you move the km_* calls into af_key/xfrm_user.
>
> There's an announcement only when policy goes dead ;->
> So only one not two. Same with the state as well.

Well, when the policy expires you will get one expire notification from
the current timer code and a new one from your patch since the timer
calls xfrm_policy_delete.

See my point? By putting the call in xfrm_policy.c you have to be
really careful in dividing the internal users which shouldn't generate
notifications and the external users which should. By doing it in
af_key/xfrm_user you can avoid all this work.

> And again can't do it from af_key/xfrm_user if you want to have events
> generated by one km to be sent to another as well. It's pf_key that needs
> fixing.

Well I must repeat that if you were calling km_notify from
af_key/xfrm_user you will be sending these events to all km's no
matter what their affiliation is :)

> > If we're serious about providing sequence numbers then please
> > set it up as an atomic integer and use it throughout this file.
> >
> > Otherwise just pop zero in there.
>
> I was just being lazy. I could send a 0 but what's wrong with using
> jiffies?

Using jiffies means that you can have two successive messages that share
the same sequence number. It's not a big deal of course. But if we're
going to indicate ordering, we might as well go the full length.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From herbert@gondor.apana.org.au Fri Apr 1 03:47:14 2005
From: Herbert Xu
To: akpm@osdl.org (Andrew Morton)
Subject: Re: Fw: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu
Cc: netdev@oss.sgi.com, lukeross@sys3175.co.uk
Date: Fri, 01 Apr 2005 21:46:29 +1000

Andrew Morton wrote:
>
> the eth0:1 to 9000.
However it is not possible to set the mtu for eth0:1 to 9000
> without setting the mtu of eth0 to 9000 as well.

The solution is to set the mtu using ip route in addition to setting it
on eth0, e.g.,

    ip ro add x.0.0.0/8 via gw dev eth0 mtu 1500 src a.b.c.d
    ip ro add y.0.0.0/8 via gw2 dev eth0 mtu 9000 src e.f.g.h

You still have to set the mtu on eth0 to 9000 since that determines the
maximum receive size (MRU) as well.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From hadi@cyberus.ca Fri Apr 1 04:24:49 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S.
Miller", netdev
Message-Id: <1112358278.1096.160.camel@jzny.localdomain>
Date: 01 Apr 2005 07:24:38 -0500

On Fri, 2005-04-01 at 06:42, Herbert Xu wrote:
> On Fri, Apr 01, 2005 at 06:03:18AM -0500, jamal wrote:
> >
> > > Some of these functions are called internally as you discovered.
> > > Since the notifications should only be generated by user requests,
> > > calls to km_notify_* should be made at the places where the user
> > > requests are handled, which is in the KM itself.
> >
> > You need to be able to generate events at every km not just the one that
> > generated the request. You also (most of the time) need to do it before
>
> I understand. However, that's not determined by where you put the
> km_notify call itself. Even when you call km_notify from af_key
> or xfrm_user it will notify every km in the system.
>
> It's the fact that we're calling km_notify instead of pfkey_broadcast
> or netlink_broadcast that's important, not the location.
>
> Having the km_notify call made in af_key/xfrm_user is convenient though
> for the reason I outlined above.

I think either scheme is fine really ;-> I will definitely go back and
consider the approach you are suggesting and see if it results in more
maintainable code - then fair.
Otherwise you realize it's more work for me ;->

> > > You've caught a real bug for af_key here. It's currently possible to
> > > receive two delete notifications for the same state.
> >
> > Can you elaborate?
>
> Imagine you've got a KM that's trying to delete a state via af_key that's
> about to expire. If pfkey_delete looks up the state successfully, and
> then the timer triggers before the actual xfrm_state_delete, you will
> get one event generated by the timer and another by pfkey_delete.
>

I haven't checked the state machine closely, but the following seems to
make sense:
The first thing that happens to delete the state/policy should win if
the state/policy is transitioned to dead.

> RFC 2367 says that:
>
>    The messaging behavior for SADB_FLUSH is:
>
>      Send an SADB_FLUSH message from a user process to the kernel.
>
>      The kernel will return an SADB_FLUSH message to all listening
>      sockets.
>
> As you can see, there is no exception for the case of an empty database.
> So my interpretation would be that a broadcast is needed.
>

Does it really make sense, Herbert? ;->
What is it that you just flushed that results in the event?
The RFC is ambiguous in my opinion. Look at what it says about deleting
(same ambiguity).

----
3.1.4 SADB_DELETE

   The SADB_DELETE message causes the kernel to delete a Security
   Association from the key table. The delete message consists of the
   base header followed by the association, and the source and
   destination sockaddrs in the address extension. The kernel deletes
   the security association matching the type, spi, source address, and
   destination address in the message.

   The message behavior for SADB_DELETE is as follows:

     Send an SADB_DELETE message from a user process to the kernel.

     The kernel returns the SADB_DELETE message to all listening
     processes.
------

So why would you generate an event in the case when you didn't delete
anything?
> > Actually, given that this function is being called in many places I
> > would say this is the exact central location you want to issue the
> > announce from.
>
> Try this as an exercise. List all the xfrm_policy_kills that need
> notifications and all those that don't, you will find that the former
> all originate from delete/flush commands in af_key/xfrm_user, while
> the latter originate from other callers.
>
> In other words, by placing the call in af_key/xfrm_user you simplify
> the logic and make it more maintainable.
>

I will go over the code and review. You may be absolutely right - that's
the better approach to take.

> > > BTW, as it is you're announcing expired policies twice. Once as an
> > > expire event and once as a delete event. This problem will also go
> > > away if you move the km_* calls into af_key/xfrm_user.
> >
> > There's an announcement only when policy goes dead ;->
> > So only one not two. Same with the state as well.
>
> Well when the policy expires you will get one expire notification from
> the current timer code and a new one from your patch since the timer
> calls xfrm_policy_delete.
>
> See my point? By putting the call in xfrm_policy.c you have to be
> really careful in dividing the internal users which shouldn't
> generate notifications and the external users which should. By doing
> it in af_key/xfrm_user you can avoid all this work.
>

That's a bug really which is being exposed now. So it has nothing to do
with the approach taken ;->
No expire should be sent if the policy has transitioned to dead. The bug
is trivial to fix - and actually should be fixed regardless of this
patch.

> > I was just being lazy. I could send a 0 but what's wrong with using
> > jiffies?
>
> Using jiffies means that you can have two successive messages that
> share the same sequence number. It's not a big deal of course. But
> if we're going to indicate ordering, we might as well go the full
> length.
>

Good point.
I will stay lazy and just set a 0 ;->

cheers,
jamal

From herbert@gondor.apana.org.au Fri Apr 1 04:37:40 2005
Date: Fri, 1 Apr 2005 22:35:54 +1000
To: jamal
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events
Message-ID: <20050401123554.GA3468@gondor.apana.org.au>
From: Herbert Xu

On Fri, Apr 01, 2005 at 07:24:38AM -0500, jamal wrote:
>
> I think either scheme is fine really ;-> I will definitely go back and
> consider the approach you are suggesting and see if it results in
> more maintainable code - then fair.
Otherwise you realize it's more work
> for me ;->

Well I'm happy to code that part if you want :)

> I haven't checked the state machine closely, but the following seems to
> make sense:
> The first thing that happens to delete the state/policy should win if
> the state/policy is transitioned to dead.

Agreed. That's what we'll get if we make __xfrm_state_delete return
success/failure.

> So why would you generate an event in the case when you didn't delete anything?

You're right that the RFC isn't very clear.

Let's forget about the RFC and simply consider the usefulness of this.
I contend that it is useful to see a FLUSH notification even when it
flushed nothing.

The reason is that this is an indication to all listeners that the
database is completely empty.

> > Well when the policy expires you will get one expire notification from
> > the current timer code and a new one from your patch since the timer
> > calls xfrm_policy_delete.
> >
> > See my point? By putting the call in xfrm_policy.c you have to be
> > really careful in dividing the internal users which shouldn't
> > generate notifications and the external users which should. By doing
> > it in af_key/xfrm_user you can avoid all this work.
>
> That's a bug really which is being exposed now. So it has nothing to do
> with the approach taken ;->

You're right that it is a bug. However, this bug would never have
triggered before because we simply didn't have delete policy
notifications :)

> No expire should be sent if the policy has transitioned to dead. The bug
> is trivial to fix - and actually should be fixed regardless of this
> patch.

Yes, the same fix to __xfrm_state_delete can be applied to
xfrm_policy_delete.
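The "first deleter wins" rule agreed on above can be sketched as a toy model. This is Python rather than kernel C, and `State`, `delete` and `delete_and_notify` are hypothetical names, not the kernel's __xfrm_state_delete. The sketch assumes the delete routine reports whether it performed the transition to dead, so that only that caller sends a notification.

```python
import threading


class State:
    """Hypothetical model of an xfrm state; field names are illustrative."""

    def __init__(self):
        self.lock = threading.Lock()
        self.dead = False

    def delete(self):
        """Mark the state dead. Return True only for the caller that did it."""
        with self.lock:
            if self.dead:
                return False
            self.dead = True
            return True


def delete_and_notify(state, notify, source):
    # Only the caller whose delete() made the transition sends the event,
    # so a timer expiry racing with a user delete yields one notification.
    if state.delete():
        notify(("DELETE", source))
        return True
    return False
```

If the timer path and the user-request path both call `delete_and_notify` on the same state, whichever runs first reports the deletion and the second call returns False without notifying.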
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From hadi@cyberus.ca Fri Apr 1 04:59:48 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Message-Id: <1112360379.1096.193.camel@jzny.localdomain>
Date: 01 Apr 2005 07:59:39 -0500

On Fri, 2005-04-01 at 07:35, Herbert Xu wrote:
> On Fri, Apr 01, 2005 at 07:24:38AM -0500, jamal wrote:
> >
> > I think either scheme is fine really ;-> I will definitely go back and
> >
consider the approach you are suggesting and see if it results in
> > more maintainable code - then fair. Otherwise you realize it's more work
> > for me ;->
>
> Well I'm happy to code that part if you want :)
>

Let me review first. If it is valuable (we may have to leave expire
alone). If I can get it done within the next day or two, fine - else if
I get busied out elsewhere I will hand it to you. Actually if you have
plenty of cycles and are very enthusiastic about this I can hand it to
you right now ;-> Masahide and I have some momentum going right now but
I don't think this will be that disruptive.

> You're right that the RFC isn't very clear.
>
> Let's forget about the RFC and simply consider the usefulness of this.
> I contend that it is useful to see a FLUSH notification even when
> it flushed nothing.
>
> The reason is that this is an indication to all listeners that the
> database is completely empty.
>

Ok, let me hear from Masahide-san: If he still holds the same opinion as
you then I will make the change.

> > That's a bug really which is being exposed now. So it has nothing to do
> > with the approach taken ;->
>
> You're right that it is a bug. However, this bug would've never triggered
> before because we simply didn't have delete policy notifications :)
>

indeed.

> > No expire should be sent if the policy has transitioned to dead. The bug
> > is trivial to fix - and actually should be fixed regardless of this
> > patch.
>
> Yes the same fix to __xfrm_state_delete can be applied to
> xfrm_policy_delete.
>

agreed.
cheers,
jamal

From hadi@cyberus.ca Fri Apr 1 05:18:45 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Message-Id: <1112361517.1089.197.camel@jzny.localdomain>
Date: 01 Apr 2005 08:18:37 -0500

On Fri, 2005-04-01 at 07:59, jamal wrote:

> Let me review first. If it is valuable (we may have to leave expire
> alone).

Ok, from a first review I would agree with you that the result of doing
it in km user will be more maintainable. It will result in a larger
patch, but in the long run it will be more maintainable.
> If I can get it done within next day or two fine - else if I get
> busied out elsewhere I will hand it to you.

Let me code away at it - the offer still stands though ;->

cheers,
jamal

From nakam@linux-ipv6.org Fri Apr 1 06:20:00 2005
Message-ID: <424D5881.4010005@linux-ipv6.org>
Date: Fri, 01 Apr 2005 23:19:45 +0900
From: Masahide NAKAMURA
To: hadi@cyberus.ca, Herbert Xu
Cc: Patrick McHardy, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events

Hello Jamal and Herbert,

jamal wrote:
> Let me review first. If it is valuable (we may have to leave expire
> alone). If I can get it done within next day or two fine - else if I get
> busied out elsewhere I will hand it to you.
Actually if you have plenty
> cycles and are very enthusiastic about this I can hand it to you right
> now ;-> Masahide and myself have some momentum going right now but I
> don't think this will be that disruptive.
>
>
>>You're right that the RFC isn't very clear.
>>
>>Let's forget about the RFC and simply consider the usefulness of this.
>>I contend that it is useful to see a FLUSH notification even when
>>it flushed nothing.
>>
>>The reason is that this is an indication to all listeners that the
>>database is completely empty.
>>
>
>
> Ok, let me hear from Masahide-san: If he still holds the same opinion as
> you then I will make the change.

I think FLUSH should be sent in such a case. Because flushing an empty
SADB/SPD is not an error (in the current code), it is reasonable to
broadcast it.

Regards,

--
Masahide NAKAMURA

From dada1@cosmosbay.com Fri Apr 1 06:39:58 2005
Message-ID: <424D5D34.4030800@cosmosbay.com>
Date: Fri, 01 Apr 2005 16:39:48 +0200
From: Eric Dumazet
To: "David S.
Miller"
CC: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()

David S. Miller wrote:
> On Thu, 17 Mar 2005 20:52:44 +0100
> Eric Dumazet wrote:
>
>
>> - Move the spinlocks out of rt_hash_table[] to a fixed size table : Saves a lot of memory (particularly on UP)
>
>
> If spinlock_t is a zero sized structure on UP, how can this save memory
> on UP? :-)

Because I deleted the __attribute__((__aligned__(8))) constraint on
struct rt_hash_bucket.

So sizeof(struct rt_hash_bucket) is now 4 instead of 8 on 32-bit
architectures.

May I remind you that some people still use 32-bit CPUs? :-)

By the way I have an updated patch... surviving very serious loads.

>
> Anyways, I think perhaps you should dynamically allocate this lock table.

Maybe I should make a static sizing (replace the 256 constant by
something based on MAX_CPUS)?

> Otherwise it looks fine.
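The idea of moving the locks out of the hash buckets into a fixed size table can be sketched as follows. This is a Python toy model, not the patch under discussion: `LOCK_TABLE_SIZE`, `lock_table`, `lock_for_bucket` and `insert` are hypothetical names, and the mapping from bucket to lock (masking the bucket index) is an assumption about how such a table would typically be indexed. Only the 256 constant comes from the thread.

```python
import threading

LOCK_TABLE_SIZE = 256  # power of two; the constant mentioned in the thread

# One shared, fixed-size table of locks instead of one lock per bucket.
lock_table = [threading.Lock() for _ in range(LOCK_TABLE_SIZE)]


def lock_for_bucket(bucket_index):
    """Return the lock that guards the given hash bucket."""
    return lock_table[bucket_index & (LOCK_TABLE_SIZE - 1)]


def insert(buckets, bucket_index, entry):
    # Buckets that share a lock are serialised; others proceed in parallel.
    with lock_for_bucket(bucket_index):
        buckets[bucket_index].append(entry)
```

The trade-off in this kind of design is that buckets 256 apart contend for the same lock, while the memory used for locks no longer grows with the size of the hash table.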
From Robert.Olsson@data.slu.se Fri Apr 1 07:53:07 2005
From: Robert Olsson
Message-ID: <16973.28254.203492.400896@robur.slu.se>
Date: Fri, 1 Apr 2005 17:53:02 +0200
To: Eric Dumazet
Cc: "David S. Miller", netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()

Hello!

Eric Dumazet writes:

> By the way I have an updated patch... surviving very serious loads.

Did you check for performance changes too? From what I understand
we can add new lookup and cache miss in the fast packet path.

> > Anyways, I think perhaps you should dynamically allocate this lock table.
>
> Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ?
IMO we should be careful with adding new complexity to the route hash.
Also, was this dynamic gc_interval behavior needed to fix the overflow?
gc_interval is only a sort of last-resort timer.

--ro

From greearb@candelatech.com Fri Apr 1 08:29:22 2005
Message-ID: <424D76DF.5070002@candelatech.com>
Date: Fri, 01 Apr 2005 08:29:19 -0800
From: Ben Greear
To: Pekka Savola
CC: "'netdev@oss.sgi.com'"
Subject: Re: RFC: Redirect-Device

Pekka Savola wrote:
> On Thu, 31 Mar 2005, Ben Greear wrote:
>
>>> Is there something in your problem statement I'm missing?
>>
>>
>> That would be similar to what I'm doing, but I'm not really trying
>> to tunnel anything.
I am trying to duplicate the behaviour of two
>> ethernet interfaces connected by an external cross-over cable, and I'm
>> trying to duplicate it at the network-device interface level so that
>> common tools (and my own tools) can treat these virtual interfaces
>> just like ethernet interfaces.
>
>
> Oh ok, what you seem to want is some kind of "Ethernet loopback++", but
> the "looped" packets should come back from a virtual interface instead
> of the same interface?

Yes. In practice, I use a pair of virtual interfaces, so I send on one
virtual and receive on the other. I use separate software to bridge, or
the normal linux stacks to route, the packets to other interfaces,
including real interfaces.

> Btw, does the kernel support traditional loopback, so that at the last
> stage, just before sending a packet on the wire, it would be pushed back.

Not that I'm aware of.

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From dada1@cosmosbay.com Fri Apr 1 08:34:29 2005
Message-ID: <424D780A.9000101@cosmosbay.com>
Date: Fri, 01 Apr 2005 18:34:18 +0200
From: Eric Dumazet
To: Robert Olsson
CC: "David S.
Miller", netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()

Robert Olsson wrote:
> Hello!
>
> Did you check for performance changes too? From what I understand
> we can add new lookup and cache miss in the fast packet path.

Performance is better because in case of stress (lots of incoming
packets per second), the 1024 bytes of the locks are all in cache.

As the size of the hash is divided by a factor of 2, rt_check_expire()
and/or rt_garbage_collect() have to touch fewer cache lines.

According to oprofile, an unpatched kernel was spending more than 15% of
its time in route.c routines; now I see ip_route_input() at 1.88%.

>
> > > Anyways, I think perhaps you should dynamically allocate this lock table.
> >
> > Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ?
>
> IMO we should be careful with adding new complexity the route hash.
> Also was this dynamic behavior gc_interval needed to fix the overflow?

In my case yes, because I have a huge route cache.

> gc_interval is only sort of last resort timer.
Actually not: gc_interval controls rt_check_expire(), which cleans the
hash table after use. All old enough entries can be deleted smoothly, on
behalf of a timer tick (so network interrupts can still occur).

I found it was better to adjust gc_interval to 1 (to let it fire every
second and examine 1/300 of the table slots, or more if the dynamic
behavior triggers), and adjust params so that rt_garbage_collect()
doesn't run at all: rt_garbage_collect() can take forever to complete,
blocking network traffic.

Eric Dumazet

From ak@muc.de Fri Apr 1 08:40:10 2005
To: Rick Jones
Cc: netdev@oss.sgi.com
Subject: Re: [RFC] netif_rx: receive path optimization
From: Andi Kleen
Date: Fri, 01 Apr 2005 18:40:07 +0200

Rick Jones writes:

> At the risk of again chewing on my toes (yum), if multiple CPUs are
> pulling packets from the per-device queue there will be packet
> reordering.
HP-UX 10.0 did just that and it was quite nasty even at > low CPU counts (<=4). It was changed by HP-UX 10.20 (ca 1995) to > per-CPU queues with queue selection computed from packet headers (hash > the IP and TCP/UDP header to pick a CPU) It was called IPS for Inbound > Packet Scheduling. 11.0 (ca 1998) later changed that to "find where > the connection last ran and queue to that CPU" That was called TOPS - > Thread Optimized Packet Scheduling. We went over this a lot several years ago when Linux got multi threaded RX with softnet in 2.1. You might want to go over the archives. Some things that came out of it was a sender side TCP optimization to tolerate reordering without slowing down (works great with other Linux peers) and NAPI style polling mode (which was mostly designed for routing and still seems to have regressions for the client/server case :/) Something like TOPS was discussed, but afaik nobody ever implemented it. Of course benchmark guys do it manually by setting interrupt and scheduler affinity. -Andi From greearb@candelatech.com Fri Apr 1 08:58:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 08:59:02 -0800 (PST) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31GwuW4016989 for ; Fri, 1 Apr 2005 08:58:57 -0800 Received: from [4.33.45.22] (evrtwa1-ar2-4-33-045-022.evrtwa1.dsl-verizon.net [4.33.45.22]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j31HOoLH009680; Fri, 1 Apr 2005 09:24:51 -0800 Message-ID: <424D7DCC.5030202@candelatech.com> Date: Fri, 01 Apr 2005 08:58:52 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.3) Gecko/20041020 X-Accept-Language: en-us, en MIME-Version: 1.0 To: bert hubert CC: hadi@cyberus.ca, "David S. 
Miller" , netdev Subject: Re: RFC: Redirect-Device References: <424C6089.1080507@candelatech.com> <1112303627.1073.71.camel@jzny.localdomain> <424C6B10.6030200@candelatech.com> <1112306031.1073.109.camel@jzny.localdomain> <424C7813.4000101@candelatech.com> <20050331143531.30f4eb8f.davem@davemloft.net> <424C7F96.4070002@candelatech.com> <1112311618.1090.20.camel@jzny.localdomain> <424C8E2C.70302@candelatech.com> <20050401090116.GA21361@outpost.ds9a.nl> In-Reply-To: <20050401090116.GA21361@outpost.ds9a.nl> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1205 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev bert hubert wrote: > On Thu, Mar 31, 2005 at 03:56:28PM -0800, Ben Greear wrote: > > >>>I think you are more comfortable with using netdevices and ioctls and >>>/proc. >> >>Definately. Ever tried to sniff a socket with ethereal? :) > > > On loopback, all the time. I'm probably dense but I don't understand what > problem you've solved with this interface. Could you elaborate a bit? It allows me to place a software bridge that can intercept all packets from user-space via raw packet sockets, and kernel space via registering an 'all' protocol on the device. Please note that to bridge in this manner I have to remove the IP protocol (set IP to 0.0.0.0), otherwise the IP stack can interfere with the bridging behaviour. By using a virtual pair of interfaces that are looped back, I can add an IP to the second virtual network interface that does not interfere with the two bridged interfaces (one physical, one redirect, both with 0.0.0.0 IP addresses). 
If there were an API to register handlers dynamically that act like the netpoll hook (ie, with ability to consume frames), then I would not have to remove the IP from the physical interface and I probably would not have had to create these redirect devices. But, when I was suggesting such a hook in the past, it was shot down because it could allow someone to write their own TCP stack, and the network guys did not want to allow this possibility. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From Robert.Olsson@data.slu.se Fri Apr 1 09:26:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:26:46 -0800 (PST) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HQfZm018140 for ; Fri, 1 Apr 2005 09:26:42 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j31HQWQG025702; Fri, 1 Apr 2005 19:26:32 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 9CDC6EE2B1; Fri, 1 Apr 2005 19:26:32 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16973.33864.613333.389857@robur.slu.se> Date: Fri, 1 Apr 2005 19:26:32 +0200 To: Eric Dumazet Cc: Robert Olsson , "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <424D780A.9000101@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <16973.28254.203492.400896@robur.slu.se> <424D780A.9000101@cosmosbay.com> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1206 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev

Eric Dumazet writes:

 > According to oprofile, an unpatched kernel was spending more than 15% of time in route.c routines, now I see ip_route_input() at 1.88%

 I would like to see absolute numbers for UP/SMP, single flow and DoS, to be confident.

 > I found it was better to adjust gc_interval to 1 (to let it fire every second and examine 1/300 table slots, or more if the dynamic behavior
 > triggers), and adjust params so that rt_garbage_collect() doesn't run at all : rt_garbage_collect() can take forever to complete, blocking
 > network traffic.

 I don't think you can depend solely on the timer for GC. A timer tick is an eternity at today's packet rates. You can distribute the GC load by letting it run more frequently; this, in combination with a huge cache, seems to be a very interesting approach, given that you have the memory.
--ro From nakam@linux-ipv6.org Fri Apr 1 09:28:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:28:20 -0800 (PST) Received: from mail406.noc.n-bone.net (mail4.noc.n-bone.net [138.243.50.144]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HSFxd018552 for ; Fri, 1 Apr 2005 09:28:16 -0800 Received: from [192.168.2.195] (polaris.linux-ipv6.org [203.178.140.10]) by mail406.noc.n-bone.net (NBONE-MTA) with ESMTP id BDA70AE5; Sat, 2 Apr 2005 02:28:09 +0900 (JST) Message-ID: <424D84A7.6060707@linux-ipv6.org> Date: Sat, 02 Apr 2005 02:28:07 +0900 From: Masahide NAKAMURA User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: en-us, en MIME-Version: 1.0 To: hadi@cyberus.ca, Herbert Xu Cc: Patrick McHardy , "David S. Miller" , netdev Subject: Re: PATCH: IPSEC xfrm events References: <1112319441.1089.83.camel@jzny.localdomain> In-Reply-To: <1112319441.1089.83.camel@jzny.localdomain> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1207 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev Jamal and Herbert, jamal wrote: > Herbert et al, > > Ok, heres the final patch with all the changes discussed. > > include/linux/xfrm.h | 2 > include/net/xfrm.h | 29 ++++++- > net/key/af_key.c | 24 +++++- > net/xfrm/xfrm_policy.c | 25 ++++-- > net/xfrm/xfrm_state.c | 84 +++++++++++++++++++-- > net/xfrm/xfrm_user.c | 188 > ++++++++++++++++++++++++++++++++++++++++++++++++- > 6 files changed, 323 insertions(+), 29 deletions(-) > > I have tested this with both setkey and iproute2 (about 10 scenarios or > so). Masahide-san is doing a lot more thorough testing with key servers > as well. He has not tested this patch yet (time difference) but it is > based on the last one he tested. 
Short report: I've tested this patched kernel and it works.
- add/del/flush for SA/SP and allocspi/acquire/upd for SA through a netlink socket
- racoon runs fine (pfkey works for normal operation), both with and without a netlink socket opened to listen

Since the discussion about the patch is still going on, the code will change and I'll need to test again anyway.

Thanks,

-- 
Masahide NAKAMURA

From roland@topspin.com Fri Apr 1 09:53:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:54:00 -0800 (PST) Received: from exch-1.topspincom.com (webmail.topspin.com [12.162.17.3]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HrqZc019816 for ; Fri, 1 Apr 2005 09:53:53 -0800 Received: from localhost.localdomain ([10.3.1.93]) by exch-1.topspincom.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 1 Apr 2005 09:45:33 -0800 Received: by localhost.localdomain (Postfix, from userid 1113) id 7EA6C4FDF2; Fri, 1 Apr 2005 09:45:33 -0800 (PST) To: akpm@osdl.org Cc: linux-kernel@vger.kernel.org, openib-general@openib.org, netdev@oss.sgi.com, davem@davemloft.net Subject: [PATCH][4/3] IPoIB: document conversion to debugfs X-Message-Flag: Warning: May contain useful information References: <20053311936.XaQmN4N9new7dTCP@topspin.com> From: Roland Dreier Date: Fri, 01 Apr 2005 09:45:33 -0800 In-Reply-To: <20053311936.XaQmN4N9new7dTCP@topspin.com> (Roland Dreier's message of "Thu, 31 Mar 2005 19:36:12 -0800") Message-ID: <52r7hujsqq.fsf@topspin.com> User-Agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Jumbo Shrimp, linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-OriginalArrivalTime: 01 Apr 2005 17:45:33.0676 (UTC) FILETIME=[9AC0C2C0:01C536E2] X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1208 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: roland@topspin.com Precedence: bulk X-list: netdev

Update
IPoIB documentation now that multicast debugging files have moved from ipoibdebugfs to debugfs. Signed-off-by: Roland Dreier --- linux-export.orig/Documentation/infiniband/ipoib.txt 2005-03-31 19:07:01.000000000 -0800 +++ linux-export/Documentation/infiniband/ipoib.txt 2005-04-01 09:43:27.122520190 -0800 @@ -32,14 +32,13 @@ mcast_debug_level to 1. These parameters can be controlled at runtime through files in /sys/module/ib_ipoib/. - CONFIG_INFINIBAND_IPOIB_DEBUG also enables the "ipoib_debugfs" + CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs virtual filesystem. By mounting this filesystem, for example with - mkdir -p /ipoib_debugfs - mount -t ipoib_debugfs none /ipoib_debufs + mount -t debugfs none /sys/kernel/debug - it is possible to get statistics about multicast groups from the - files /ipoib_debugfs/ib0_mcg and so on. + it is possible to get statistics about munlticast groups from the + files /sys/kernel/debug/ipoib/ib0_mcg and so on. The performance impact of this option is negligible, so it is safe to enable this option with debug_level set to 0 for normal From rick.jones2@hp.com Fri Apr 1 10:55:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 10:56:03 -0800 (PST) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31Itxgb022131 for ; Fri, 1 Apr 2005 10:55:59 -0800 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel11.hp.com (Postfix) with ESMTP id 29A4E1F36E7 for ; Fri, 1 Apr 2005 10:22:52 -0800 (PST) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id KAA01022 for ; Fri, 1 Apr 2005 10:22:51 -0800 (PST) Message-ID: <424D917B.2060108@hp.com> Date: Fri, 01 Apr 2005 10:22:51 -0800 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev Subject: Re: [RFC] netif_rx: receive path 
optimization References: <20050330132815.605c17d0@dxpl.pdx.osdl.net> <20050331120410.7effa94d@dxpl.pdx.osdl.net> <1112303431.1073.67.camel@jzny.localdomain> <424C6A98.1070509@hp.com> <1112305084.1073.94.camel@jzny.localdomain> <424C7CDC.8050801@hp.com> <1112312206.1096.25.camel@jzny.localdomain> <424C90DA.7030600@hp.com> <1112318229.1090.63.camel@jzny.localdomain> In-Reply-To: <1112318229.1090.63.camel@jzny.localdomain> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1209 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev >>The main idea behind TOPS and prior to that IPS was to spread-out >>the processing of packets across as many CPUs as we could, as "correctly" as we >>could. > > > Very very hard to do. Why do you say that? "Correct" can be defined as either the same CPU for each packet in a given flow (IPS) or the same CPU as last accessed the endpoint (TOPS). > Isnt MSI supposed to give you ability such that a > NIC can pick a CPU to interupt? That would help in a small way That gives the NIC the knowledge of how to direct to a CPU, but as you know does not tell it how to decide where. Since I doubt that the NIC wants to reach-out and touch connection state in the host (nor I suppose do we want it to either) the best a NIC with MSI could do would be IPS >>TOPS lets the process (I suppose the scheduler really) decide where some of the >>processing for the packet will happen - the part after the handoff. >> > > I think this last part should be easy to do - but perhaps the expense of > landing on the wrong CPU may override any benefits perceived. 
Unless one has a scheduler that likes to migrate processes, the chances of landing on the wrong CPU are minimal and short-lived, and overall, the chances of being right are greater than when doing nothing and sticking with the interrupt CPU. (Handwaving based on experience-driven intuition and a bit of math as one increases the CPU count.)

This is all on the premise that one is running with numNIC << numCPU. With numNIC == numCPU one does things as seen in certain networking-intensive benchmarks :)

rick jones

From shemminger@osdl.org Fri Apr 1 12:07:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:07:41 -0800 (PST) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31K7aJG024341 for ; Fri, 1 Apr 2005 12:07:36 -0800 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j31K7Rs4028918 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Fri, 1 Apr 2005 12:07:27 -0800 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [172.20.1.103]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j31K7RLd030565; Fri, 1 Apr 2005 12:07:27 -0800 Date: Fri, 1 Apr 2005 12:07:27 -0800 From: Stephen Hemminger To: lartc@mailman.ds9a.nl, linux-kernel@vger.kernel.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [ANNOUNCE] iproute2 2.6.11-050330 Message-ID: <20050401120727.62700e8c@dxpl.pdx.osdl.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean
X-archive-position: 1210 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev

An updated version of the iproute2 utilities is available at:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.11-050330.tar.gz

It supports the latest features from 2.6, but is backwards compatible with 2.4.

This update includes several bugfixes and build cleanups from the previous version (2.6.11-050314):

[Jamal Hadi Salim]
	* Proper version of iptables headers (from 1.3.1)
	* Set revision file in m_ipt
	* Fix action_util naming in mirred
	* don't call ll_init_map in mirred

[Thomas Graf]
	* Warn about wildcard deletions and provide IFA_ADDRESS upon deletions to enforce prefix length validation for IPv4.
	* Fix netlink message alignment when the last routing attribute added has a data length not aligned to RTA_ALIGNTO.

[Masahide NAKAMURA]
	* ipv6 xfrm allocspi and monitor support.

[Stephen Hemminger]
	* include/linux/netfilter_ipv4/ip_tables.h don't include compiler.h because it isn't needed and not on all systems
	* Update rtnetlink.h and pkt_cls.h to be stripped versions of headers from 2.6.12-rc1
	* switch to stack for netem tables
	* add -force option to batch mode
	* handle midline comments in batch mode
	* sum per cpu fields in lnstat correctly

From sds@tycho.nsa.gov Fri Apr 1 12:15:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:15:29 -0800 (PST) Received: from jazzhorn.ncsc.mil (mummy.ncsc.mil [144.51.88.129]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KFLo9025229 for ; Fri, 1 Apr 2005 12:15:22 -0800 Received: from tycho.ncsc.mil (jazzhorn.ncsc.mil [144.51.5.9]) by jazzhorn.ncsc.mil (8.12.10/8.12.10) with ESMTP id j31KBvhV026499; Fri, 1 Apr 2005 20:11:57 GMT Received: from moss-spartans.epoch.ncsc.mil (moss-spartans [144.51.25.121]) by tycho.ncsc.mil (8.12.8/8.12.8) with ESMTP id j31KG5Do015003; Fri, 1 Apr 2005 15:16:05 -0500 (EST)
Subject: [PATCH] Fix SELinux for removal of i_sock From: Stephen Smalley To: "David S. Miller" , James Morris , lkml , netdev@oss.sgi.com, matthew@wil.cx Content-Type: text/plain Organization: National Security Agency Date: Fri, 01 Apr 2005 15:06:37 -0500 Message-Id: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-14) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1211 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sds@tycho.nsa.gov Precedence: bulk X-list: netdev Hi, This patch against -bk eliminates the use of i_sock by SELinux as it appears to have been removed recently, breaking the build of SELinux in -bk. Simply replacing the i_sock test with an S_ISSOCK test would be unsafe in the SELinux code, as the latter will also return true for the inodes of socket files in the filesystem, not just the actual socket objects IIUC. Hence this patch reworks the SELinux code to avoid the need to apply such a test in the first place, part of which was obsoleted anyway by earlier changes to SELinux. Please apply. 
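
The S_ISSOCK point above can be checked from user space. The following is only a rough sketch, assuming a Linux system with a writable /tmp; demo_issock_both() and the temporary path are made-up names for illustration and have nothing to do with the SELinux code itself. It binds an AF_UNIX socket to a filesystem name and shows that both the socket object (via fstat) and the socket file (via stat) report S_ISSOCK, which is why the mode test alone cannot tell the two apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical demo helper (not kernel code).
 * Returns 1 if both the socket object and its filesystem name
 * report S_ISSOCK, 0 if either does not, -1 on any setup error. */
static int demo_issock_both(void)
{
	char dir[] = "/tmp/issock-demo-XXXXXX";
	struct sockaddr_un addr;
	struct stat by_fd, by_path;
	int fd, ret = -1;

	if (!mkdtemp(dir))
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/sock", dir);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd >= 0) {
		/* bind() creates the socket file in the filesystem */
		if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
			if (fstat(fd, &by_fd) == 0 &&
			    stat(addr.sun_path, &by_path) == 0)
				ret = S_ISSOCK(by_fd.st_mode) &&
				      S_ISSOCK(by_path.st_mode);
			unlink(addr.sun_path);
		}
		close(fd);
	}
	rmdir(dir);
	return ret;
}
```

A caller would expect demo_issock_both() to return 1 on Linux: the mode bits are the same for both inodes, even though only one of them is backed by an actual socket object.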
Signed-off-by: Stephen Smalley Signed-off-by: James Morris security/selinux/hooks.c | 21 +++------------------ 1 files changed, 3 insertions(+), 18 deletions(-) ===== security/selinux/hooks.c 1.93 vs edited ===== --- 1.93/security/selinux/hooks.c 2005-03-28 17:21:19 -05:00 +++ edited/security/selinux/hooks.c 2005-04-01 15:01:58 -05:00 @@ -877,18 +877,8 @@ static int inode_doinit_with_dentry(stru isec->initialized = 1; out: - if (inode->i_sock) { - struct socket *sock = SOCKET_I(inode); - if (sock->sk) { - isec->sclass = socket_type_to_security_class(sock->sk->sk_family, - sock->sk->sk_type, - sock->sk->sk_protocol); - } else { - isec->sclass = SECCLASS_SOCKET; - } - } else { + if (isec->sclass == SECCLASS_FILE) isec->sclass = inode_mode_to_security_class(inode->i_mode); - } if (hold_sem) up(&isec->sem); @@ -2979,18 +2969,15 @@ out: static void selinux_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern) { - int err; struct inode_security_struct *isec; struct task_security_struct *tsec; - err = inode_doinit(SOCK_INODE(sock)); - if (err < 0) - return; isec = SOCK_INODE(sock)->i_security; tsec = current->security; isec->sclass = socket_type_to_security_class(family, type, protocol); isec->sid = kern ? 
SECINITSID_KERNEL : tsec->sid; + isec->initialized = 1; return; } @@ -3158,14 +3145,12 @@ static int selinux_socket_accept(struct if (err) return err; - err = inode_doinit(SOCK_INODE(newsock)); - if (err < 0) - return err; newisec = SOCK_INODE(newsock)->i_security; isec = SOCK_INODE(sock)->i_security; newisec->sclass = isec->sclass; newisec->sid = isec->sid; + newisec->initialized = 1; return 0; } -- Stephen Smalley National Security Agency From davem@davemloft.net Fri Apr 1 12:28:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:29:01 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KSt7x029634 for ; Fri, 1 Apr 2005 12:28:55 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHSkM-0002UR-00; Fri, 01 Apr 2005 12:28:02 -0800 Date: Fri, 1 Apr 2005 12:28:02 -0800 From: "David S. 
Miller" To: Eric Dumazet Cc: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-Id: <20050401122802.7c71afbc.davem@davemloft.net> In-Reply-To: <424D5D34.4030800@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1212 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 16:39:48 +0200 Eric Dumazet wrote: > > If spinlock_t is a zero sized structure on UP, how can this save memory > > on UP? :-) > > Because I deleted the __attribute__((__aligned__(8))) constraint on struct rt_hash_bucket. Right. > > Anyways, I think perhaps you should dynamically allocate this lock table. > > Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ? Even for NR_CPUS, I think the table should be dynamically allocated. It is a goal to eliminate all of these huge arrays in the static kernel image, which has grown incredibly too much in recent times. 
I work often to eliminate such things, let's not add new ones :-) From davem@davemloft.net Fri Apr 1 12:36:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:36:18 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KaD6Z030336 for ; Fri, 1 Apr 2005 12:36:13 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHSrQ-0002Xi-00; Fri, 01 Apr 2005 12:35:20 -0800 Date: Fri, 1 Apr 2005 12:35:20 -0800 From: "David S. Miller" To: Stephen Smalley Cc: jmorris@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, matthew@wil.cx Subject: Re: [PATCH] Fix SELinux for removal of i_sock Message-Id: <20050401123520.7532528b.davem@davemloft.net> In-Reply-To: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> References: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1213 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 15:06:37 -0500 Stephen Smalley wrote: > This patch against -bk eliminates the use of i_sock by SELinux as it > appears to have been removed recently, breaking the build of SELinux in > -bk. 
Simply replacing the i_sock test with an S_ISSOCK test would be > unsafe in the SELinux code, as the latter will also return true for the > inodes of socket files in the filesystem, not just the actual socket > objects IIUC. Hence this patch reworks the SELinux code to avoid the > need to apply such a test in the first place, part of which was > obsoleted anyway by earlier changes to SELinux. Please apply. > > Signed-off-by: Stephen Smalley > Signed-off-by: James Morris Applied, thanks Stephen. From dada1@cosmosbay.com Fri Apr 1 13:05:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:05:58 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L5plG031537 for ; Fri, 1 Apr 2005 13:05:52 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31L5csf030409; Fri, 1 Apr 2005 23:05:43 +0200 Message-ID: <424DB7A1.8090803@cosmosbay.com> Date: Fri, 01 Apr 2005 23:05:37 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> In-Reply-To: <20050401122802.7c71afbc.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Fri, 01 Apr 2005 23:05:44 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1214 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev

David S. Miller wrote:
> On Fri, 01 Apr 2005 16:39:48 +0200
> Eric Dumazet wrote:
>
>>Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ?
>
>
> Even for NR_CPUS, I think the table should be dynamically allocated.
>
> It is a goal to eliminate all of these huge arrays in the static
> kernel image, which has grown incredibly too much in recent times.
> I work often to eliminate such things, let's not add new ones :-)

You mean you prefer:

static spinlock_t *rt_hash_lock; /* rt_hash_lock = alloc_memory_at_boot_time(...) */

instead of

static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ];

In both cases the memory is taken from lowmem, and the size of the kernel image is roughly the same (the bss section takes no space in the image).

So isn't the runtime cost higher in the 'dynamic' case, because of the extra indirection?
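
For readers following the static-versus-dynamic question, the two layouts can be put side by side in a small user-space sketch. This is illustrative only: demo_lock_t, DEMO_LOCK_SZ and the helper functions are hypothetical stand-ins, not the kernel's spinlock_t, RT_HASH_LOCK_SZ or any real allocator. The static variant indexes an array whose address is known at link time; the dynamic variant must first load the pointer, which is the extra indirection being asked about.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not kernel types or sizes. */
typedef struct { volatile int locked; } demo_lock_t;
#define DEMO_LOCK_SZ 256	/* must be a power of two for the mask below */

/* Variant 1: statically sized table, placed in .bss (zero-filled). */
static demo_lock_t static_locks[DEMO_LOCK_SZ];

/* Variant 2: only a pointer is static; the table is allocated at init. */
static demo_lock_t *dynamic_locks;

/* Allocate the dynamic table once; returns 0 on success, -1 on failure. */
static int demo_locks_init(void)
{
	dynamic_locks = calloc(DEMO_LOCK_SZ, sizeof(*dynamic_locks));
	return dynamic_locks ? 0 : -1;
}

/* Static case: base address is a link-time constant. */
static demo_lock_t *static_lock_for(unsigned int hash)
{
	return &static_locks[hash & (DEMO_LOCK_SZ - 1)];
}

/* Dynamic case: one extra load to fetch the table pointer. */
static demo_lock_t *dynamic_lock_for(unsigned int hash)
{
	assert(dynamic_locks != NULL);
	return &dynamic_locks[hash & (DEMO_LOCK_SZ - 1)];
}
```

Both lookups map a hash to one of DEMO_LOCK_SZ locks in the same way; the only difference is where the table's base address comes from, and whether that matters in practice depends on whether the pointer is already in cache.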
From jheffner@psc.edu Fri Apr 1 13:05:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:06:03 -0800 (PST) Received: from mailer2.psc.edu (mailer2.psc.edu [128.182.66.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L5ufx031548 for ; Fri, 1 Apr 2005 13:05:56 -0800 Received: from dexter.psc.edu (dexter.psc.edu [128.182.61.232]) by mailer2.psc.edu (8.13.3/8.13.3) with ESMTP id j31LAYiG018305 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Apr 2005 16:10:38 -0500 (EST) Received: from dexter.psc.edu (localhost.psc.edu [127.0.0.1]) by dexter.psc.edu (8.12.11/8.12.10) with ESMTP id j31L5nhA018741; Fri, 1 Apr 2005 16:05:50 -0500 Received: from localhost (jheffner@localhost) by dexter.psc.edu (8.12.11/8.12.11/Submit) with ESMTP id j31L5nZa018738; Fri, 1 Apr 2005 16:05:49 -0500 X-Authentication-Warning: dexter.psc.edu: jheffner owned process doing -bs Date: Fri, 1 Apr 2005 16:05:49 -0500 (EST) From: John Heffner To: davem@davemloft.net, netdev@oss.sgi.com Subject: [PATCH] skb pcount with MTU discovery Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1215 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jheffner@psc.edu Precedence: bulk X-list: netdev The problem is that when doing MTU discovery, the too-large segments in the write queue will be calculated as having a pcount of >1. When tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test when pcount > cwnd. The segments are eventually transmitted one at a time by keepalive, but this can take a long time. This patch checks if TSO is enabled when setting pcount. 
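
The arithmetic behind the stall can be sketched as follows. This is a simplified model under stated assumptions, not the kernel code: demo_pcount() and demo_cwnd_allows() are hypothetical helpers, and the window check is reduced to the "pcount > cwnd" condition described above.

```c
#include <assert.h>

/* Hypothetical helper: how many packets a queued segment of `len` bytes
 * is counted as, given the current MSS and whether the route can do TSO.
 * With the proposed change, a non-TSO path always counts 1. */
static unsigned int demo_pcount(unsigned int len, unsigned int mss,
				int tso_capable)
{
	assert(mss > 0);
	if (len <= mss || !tso_capable)
		return 1;
	return (len + mss - 1) / mss;	/* round up to whole MSS units */
}

/* Simplified congestion-window test (an assumption for illustration):
 * the segment may be sent only if all of its packets fit in the window. */
static int demo_cwnd_allows(unsigned int in_flight, unsigned int pcount,
			    unsigned int cwnd)
{
	return in_flight + pcount <= cwnd;
}
```

For example, a 4000-byte segment queued before the path MTU dropped the MSS to 1400 counts as 3 packets when sized purely by length, so with a window of 2 it is refused; counted as 1 on a non-TSO path it passes the window check, after which the transmit path can split it into MSS-sized pieces.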
-John Signed-off-by: John Heffner ===== include/net/tcp.h 1.114 vs edited ===== --- 1.114/include/net/tcp.h 2005-03-31 11:51:09 -05:00 +++ edited/include/net/tcp.h 2005-04-01 14:44:13 -05:00 @@ -1470,19 +1470,20 @@ tcp_minshall_check(tp)))); } -extern void tcp_set_skb_tso_segs(struct sk_buff *, unsigned int); +extern void tcp_set_skb_tso_segs(struct sock *, struct sk_buff *); /* This checks if the data bearing packet SKB (usually sk->sk_send_head) * should be put on the wire right now. */ -static __inline__ int tcp_snd_test(const struct tcp_sock *tp, +static __inline__ int tcp_snd_test(struct sock *sk, struct sk_buff *skb, unsigned cur_mss, int nonagle) { + struct tcp_sock *tp = tcp_sk(sk); int pkts = tcp_skb_pcount(skb); if (!pkts) { - tcp_set_skb_tso_segs(skb, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, skb); pkts = tcp_skb_pcount(skb); } @@ -1543,7 +1544,7 @@ if (skb) { if (!tcp_skb_is_last(sk, skb)) nonagle = TCP_NAGLE_PUSH; - if (!tcp_snd_test(tp, skb, cur_mss, nonagle) || + if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || tcp_write_xmit(sk, nonagle)) tcp_check_probe_timer(sk, tp); } @@ -1561,7 +1562,7 @@ struct sk_buff *skb = sk->sk_send_head; return (skb && - tcp_snd_test(tp, skb, tcp_current_mss(sk, 1), + tcp_snd_test(sk, skb, tcp_current_mss(sk, 1), tcp_skb_is_last(sk, skb) ? TCP_NAGLE_PUSH : tp->nonagle)); } ===== net/ipv4/tcp_output.c 1.90 vs edited ===== --- 1.90/net/ipv4/tcp_output.c 2005-04-01 09:08:34 -05:00 +++ edited/net/ipv4/tcp_output.c 2005-04-01 14:45:27 -05:00 @@ -433,7 +433,7 @@ struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb = sk->sk_send_head; - if (tcp_snd_test(tp, skb, cur_mss, TCP_NAGLE_PUSH)) { + if (tcp_snd_test(sk, skb, cur_mss, TCP_NAGLE_PUSH)) { /* Send it out now. 
*/ TCP_SKB_CB(skb)->when = tcp_time_stamp; tcp_tso_set_push(skb); @@ -446,9 +446,12 @@ } } -void tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_std) +void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb) { - if (skb->len <= mss_std) { + struct tcp_sock *tp = tcp_sk(sk); + + if (skb->len <= tp->mss_cache_std || + !(sk->sk_route_caps & NETIF_F_TSO)) { /* Avoid the costly divide in the normal * non-TSO case. */ @@ -457,10 +460,10 @@ } else { unsigned int factor; - factor = skb->len + (mss_std - 1); - factor /= mss_std; + factor = skb->len + (tp->mss_cache_std - 1); + factor /= tp->mss_cache_std; skb_shinfo(skb)->tso_segs = factor; - skb_shinfo(skb)->tso_size = mss_std; + skb_shinfo(skb)->tso_size = tp->mss_cache_std; } } @@ -531,8 +534,8 @@ } /* Fix up tso_factor for both original and new SKB. */ - tcp_set_skb_tso_segs(skb, tp->mss_cache_std); - tcp_set_skb_tso_segs(buff, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, skb); + tcp_set_skb_tso_segs(sk, buff); if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) { tp->lost_out += tcp_skb_pcount(skb); @@ -607,7 +610,7 @@ * factor and mss. */ if (tcp_skb_pcount(skb) > 1) - tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb)); + tcp_set_skb_tso_segs(sk, skb); return 0; } @@ -815,7 +818,7 @@ sk_stream_free_skb(sk, skb); } else { TCP_SKB_CB(skb)->seq += copy; - tcp_set_skb_tso_segs(skb, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, skb); } len += copy; @@ -824,7 +827,7 @@ __skb_insert(nskb, skb->prev, skb, &sk->sk_write_queue); sk->sk_send_head = nskb; - tcp_set_skb_tso_segs(nskb, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, nskb); /* We're ready to send. If this fails, the probe will * be resegmented into mss-sized pieces by tcp_write_xmit(). */ @@ -885,7 +888,7 @@ mss_now = tcp_current_mss(sk, 1); while ((skb = sk->sk_send_head) && - tcp_snd_test(tp, skb, mss_now, + tcp_snd_test(sk, skb, mss_now, tcp_skb_is_last(sk, skb) ? 
nonagle : TCP_NAGLE_PUSH)) { if (skb->len > mss_now) { @@ -1822,7 +1825,7 @@ tp->mss_cache = tp->mss_cache_std; } } else if (!tcp_skb_pcount(skb)) - tcp_set_skb_tso_segs(skb, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, skb); TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH; TCP_SKB_CB(skb)->when = tcp_time_stamp; From davem@davemloft.net Fri Apr 1 13:09:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:09:27 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L9Nls032679 for ; Fri, 1 Apr 2005 13:09:23 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHTNY-0002m8-00; Fri, 01 Apr 2005 13:08:32 -0800 Date: Fri, 1 Apr 2005 13:08:32 -0800 From: "David S. Miller" To: Eric Dumazet Cc: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-Id: <20050401130832.1f972a3b.davem@davemloft.net> In-Reply-To: <424DB7A1.8090803@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1216 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 23:05:37 +0200 Eric Dumazet wrote: > You mean you prefer : > > static spinlock_t *rt_hash_lock ; /* rt_hash_lock = > alloc_memory_at_boot_time(...) */ > > instead of > > static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ] ; > > In both cases, memory is taken from lowmem, and size of kernel image > is roughly the same (bss section takes no space in image) In the former case the kernel image the bootloader has to load is smaller. That's important, believe it or not. It means less TLB entries need to be locked permanently into the MMU on certain platforms. From davem@davemloft.net Fri Apr 1 13:11:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:11:42 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LBaXI000825 for ; Fri, 1 Apr 2005 13:11:36 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHTPh-0002mS-00; Fri, 01 Apr 2005 13:10:45 -0800 Date: Fri, 1 Apr 2005 13:10:45 -0800 From: "David S. 
Miller" To: John Heffner Cc: netdev@oss.sgi.com Subject: Re: [PATCH] skb pcount with MTU discovery Message-Id: <20050401131045.4e558f65.davem@davemloft.net> In-Reply-To: References: X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1217 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 1 Apr 2005 16:05:49 -0500 (EST) John Heffner wrote: > The problem is that when doing MTU discovery, the too-large segments in > the write queue will be calculated as having a pcount of >1. When > tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test > when pcount > cwnd. > > The segments are eventually transmitted one at a time by keepalive, but > this can take a long time. > > This patch checks if TSO is enabled when setting pcount. Why isn't the MSS properly updated at this point in time? If it were, the pcount setting would do the right thing. That's how this code is supposed to work. 
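An illustrative aside on the arithmetic under discussion: a minimal user-space sketch, under the simplified model below, of how the packet count (pcount) of an oversized segment comes out of a ceiling division and why the congestion-window check then refuses it. The names pcount_before(), pcount_after() and cwnd_allows() are hypothetical and are not kernel functions. The figures 8948 and 1448 are the MSS values quoted later in this thread, and the cwnd check is reduced to a single comparison.

```c
#include <assert.h>

/* Hypothetical model of the pre-patch rule: any skb longer than one MSS is
 * counted as ceil(len / mss) packets, whether or not TSO is in use. */
static unsigned int pcount_before(unsigned int len, unsigned int mss)
{
    assert(mss > 0);
    if (len <= mss)
        return 1;
    return (len + mss - 1) / mss;
}

/* Hypothetical model of the patched rule: without TSO on the route, the skb
 * always counts as a single packet. */
static unsigned int pcount_after(unsigned int len, unsigned int mss,
                                 int route_has_tso)
{
    if (!route_has_tso)
        return 1;
    return pcount_before(len, mss);
}

/* Simplified stand-in for the cwnd part of tcp_snd_test(): sending is
 * allowed only if the packets fit in the congestion window. */
static int cwnd_allows(unsigned int in_flight, unsigned int pkts,
                       unsigned int cwnd)
{
    return in_flight + pkts <= cwnd;
}
```

With len = 8948 and mss = 1448 the ceiling division yields 7. With cwnd == 2 the unpatched count fails the check even with nothing in flight, while the patched count of 1 passes.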
From jheffner@psc.edu Fri Apr 1 13:23:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:23:13 -0800 (PST) Received: from mailer2.psc.edu (mailer2.psc.edu [128.182.66.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LN5Pf001621 for ; Fri, 1 Apr 2005 13:23:06 -0800 Received: from dexter.psc.edu (dexter.psc.edu [128.182.61.232]) by mailer2.psc.edu (8.13.3/8.13.3) with ESMTP id j31LRi33009348 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Apr 2005 16:27:48 -0500 (EST) Received: from dexter.psc.edu (localhost.psc.edu [127.0.0.1]) by dexter.psc.edu (8.12.11/8.12.10) with ESMTP id j31LMxdx018810; Fri, 1 Apr 2005 16:22:59 -0500 Received: from localhost (jheffner@localhost) by dexter.psc.edu (8.12.11/8.12.11/Submit) with ESMTP id j31LMx4H018807; Fri, 1 Apr 2005 16:22:59 -0500 X-Authentication-Warning: dexter.psc.edu: jheffner owned process doing -bs Date: Fri, 1 Apr 2005 16:22:59 -0500 (EST) From: John Heffner To: "David S. Miller" cc: netdev@oss.sgi.com Subject: Re: [PATCH] skb pcount with MTU discovery In-Reply-To: <20050401131045.4e558f65.davem@davemloft.net> Message-ID: References: <20050401131045.4e558f65.davem@davemloft.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1218 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jheffner@psc.edu Precedence: bulk X-list: netdev On Fri, 1 Apr 2005, David S. Miller wrote: > On Fri, 1 Apr 2005 16:05:49 -0500 (EST) > John Heffner wrote: > > > The problem is that when doing MTU discovery, the too-large segments in > > the write queue will be calculated as having a pcount of >1. When > > tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test > > when pcount > cwnd. 
> > > > The segments are eventually transmitted one at a time by keepalive, but > > this can take a long time. > > > > This patch checks if TSO is enabled when setting pcount. > > Why isn't the MSS properly updated at this point in time? > If it were, the pcount setting would do the right thing. > > That's how this code is supposed to work. The problem occurs when TSO is disabled. Common case, start out with mss of 8948. Send 2 segments; neither are acknowledged, and we receive an ICMP can't fragment indicating a pmtu of 1500 so mss is set down to 1448. Now tcp_set_skb_tso_segs() sets tso_segs to 6, so tcp_snd_test thinks we are doing TSO and will send the full 6 mss, and fails the cwnd test since cwnd == 2. -John From colin@colino.net Fri Apr 1 13:28:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:28:11 -0800 (PST) Received: from paperstreet.colino.net (colino.net [213.41.131.56]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LS3F5002218 for ; Fri, 1 Apr 2005 13:28:04 -0800 Received: by paperstreet.colino.net (Postfix, from userid 1015) id 3D0C3101D9; Fri, 1 Apr 2005 23:27:52 +0200 (CEST) Received: from jack.colino.net (jack.colino.net [192.168.0.11]) by paperstreet.colino.net (Postfix) with ESMTP id 974A9101A2; Fri, 1 Apr 2005 23:27:49 +0200 (CEST) Date: Fri, 1 Apr 2005 23:27:47 +0200 From: Colin Leroy To: David Brownell Cc: linux-usb-devel@lists.sourceforge.net, Andrew Morton , Jeroen Vreeken , netdev@oss.sgi.com Subject: Re: [linux-usb-devel] [PATCH] PM support for zd1201 Message-ID: <20050401232747.3f9ed365@jack.colino.net> In-Reply-To: <200504011030.57978.david-b@pacbell.net> References: <20050330144423.0dde5b71@jack.colino.net> <200504011030.57978.david-b@pacbell.net> X-Mailer: Sylpheed-Claws 1.9.6cvs18 (GTK+ 2.6.4; powerpc-unknown-linux-gnu) X-Face: 
Fy:*XpRna1/tz}cJ@O'0^:qYs:8b[Rg`*8,+o^[fI?<%5LeB,Xz8ZJK[r7V0hBs8G)*&C+XA0qHoR=LoTohe@7X5K$A-@cN6n~~J/]+{[)E4h'lK$13WQf$.R+Pi;E09tk&{t|;~dakRD%CLHrk6m!?gA,5|Sb=fJ=>[9#n1Bu8?VngkVM4{'^'V_qgdA.8yn3) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1219 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: colin@colino.net Precedence: bulk X-list: netdev On 01 Apr 2005 at 10h04, David Brownell wrote: Hi, > Looked ok to me, other than needing to change "u32 state" into > a "pm_message_t message". And I'm not sure why "mac_enabled" > would be the right test, rather than maybe netif_running(). Here it is. Signed-off-by: Colin Leroy --- drivers/usb/net/zd1201.c.orig 2005-03-30 14:35:23.000000000 +0200 +++ drivers/usb/net/zd1201.c 2005-04-01 23:24:04.000000000 +0200 @@ -1896,12 +1896,50 @@ kfree(zd); } +#ifdef CONFIG_PM + +static int zd1201_suspend (struct usb_interface *interface, + pm_message_t message) +{ + struct zd1201 *zd = (struct zd1201 *)usb_get_intfdata(interface); + + netif_device_detach(zd->dev); + + zd->was_enabled = zd->mac_enabled; + + if (zd->was_enabled) + return zd1201_disable(zd); + else + return 0; +} + +static int zd1201_resume (struct usb_interface *interface) +{ + struct zd1201 *zd = (struct zd1201 *)usb_get_intfdata(interface); + + netif_device_attach(zd->dev); + + if (zd->was_enabled) + return zd1201_enable(zd); + else + return 0; +} + +#else + +#define zd1201_suspend NULL +#define zd1201_resume NULL + +#endif + struct usb_driver zd1201_usb = { .owner = THIS_MODULE, .name = "zd1201", .probe = zd1201_probe, .disconnect = zd1201_disconnect, .id_table = zd1201_table, + .suspend = zd1201_suspend, + .resume = zd1201_resume, }; static int __init zd1201_init(void) --- drivers/usb/net/zd1201.h.orig 2005-03-30 14:35:36.000000000 +0200 
+++ drivers/usb/net/zd1201.h 2005-03-30 14:24:33.000000000 +0200 @@ -46,6 +46,7 @@ char essid[IW_ESSID_MAX_SIZE+1]; int essidlen; int mac_enabled; + int was_enabled; int monitor; int encode_enabled; int encode_restricted; From dada1@cosmosbay.com Fri Apr 1 13:43:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:43:58 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LhqNT003067 for ; Fri, 1 Apr 2005 13:43:53 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31Lhd6I031012; Fri, 1 Apr 2005 23:43:45 +0200 Message-ID: <424DC08A.3020204@cosmosbay.com> Date: Fri, 01 Apr 2005 23:43:38 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> In-Reply-To: <20050401130832.1f972a3b.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Fri, 01 Apr 2005 23:43:45 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1220 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev David S. 
Miller wrote: > On Fri, 01 Apr 2005 23:05:37 +0200 > Eric Dumazet wrote: > > >>You mean you prefer : >> >>static spinlock_t *rt_hash_lock ; /* rt_hash_lock = >>alloc_memory_at_boot_time(...) */ >> >>instead of >> >>static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ] ; >> >>In both cases, memory is taken from lowmem, and size of kernel image >>is roughly the same (bss section takes no space in image) > > > In the former case the kernel image the bootloader has to > load is smaller. That's important, believe it or not. It > means less TLB entries need to be locked permanently into > the MMU on certain platforms. > > OK, thanks for this clarification. I changed it to: #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) /* * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks * The size of this table is a power of two and depends on the number of CPUS. */ #if NR_CPUS >= 32 #define RT_HASH_LOCK_SZ 4096 #elif NR_CPUS >= 16 #define RT_HASH_LOCK_SZ 2048 #elif NR_CPUS >= 8 #define RT_HASH_LOCK_SZ 1024 #elif NR_CPUS >= 4 #define RT_HASH_LOCK_SZ 512 #else #define RT_HASH_LOCK_SZ 256 #endif static spinlock_t *rt_hash_locks; # define rt_hash_lock_addr(slot) &rt_hash_locks[slot & (RT_HASH_LOCK_SZ - 1)] # define rt_hash_lock_init() { \ int i; \ rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ, GFP_KERNEL); \ if (!rt_hash_locks) panic("IP: failed to allocate rt_hash_locks\n"); \ for (i = 0; i < RT_HASH_LOCK_SZ; i++) \ spin_lock_init(&rt_hash_locks[i]); \ } #else # define rt_hash_lock_addr(slot) NULL # define rt_hash_lock_init() #endif Are you OK if I also use alloc_large_system_hash() to allocate rt_hash_table, instead of the current method? This newer method is used in net/ipv4/tcp.c for tcp_ehash and tcp_bhash and permits NUMA tuning. 
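The lock-table idea in the message above can be sketched in user space as follows. This is only an illustration under stated assumptions: toy_lock_t, lock_table, LOCK_TABLE_SZ and lock_for_slot() are hypothetical names, a plain struct stands in for spinlock_t, and the table is static here rather than allocated at boot as the thread recommends.

```c
#include <assert.h>

#define LOCK_TABLE_SZ 256u   /* assumed size; must be a power of two */

/* Placeholder for a real lock type such as the kernel's spinlock_t. */
typedef struct { int held; } toy_lock_t;

static toy_lock_t lock_table[LOCK_TABLE_SZ];

/* Map a hash bucket index to one of the shared locks. Masking with
 * (size - 1) equals "slot % size" only because the size is a power of two. */
static toy_lock_t *lock_for_slot(unsigned int slot)
{
    assert((LOCK_TABLE_SZ & (LOCK_TABLE_SZ - 1)) == 0);
    return &lock_table[slot & (LOCK_TABLE_SZ - 1)];
}
```

Buckets whose indices differ by a multiple of the table size share a lock, which trades a little contention for a much smaller bucket structure. Writing the mapping as a function also sidesteps the operator-precedence surprises that an unparenthesized macro argument can cause.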
Eric From davem@davemloft.net Fri Apr 1 14:35:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 14:35:47 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31MZbk5005035 for ; Fri, 1 Apr 2005 14:35:37 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHUiw-0003Dx-00; Fri, 01 Apr 2005 14:34:42 -0800 Date: Fri, 1 Apr 2005 14:34:42 -0800 From: "David S. Miller" To: Eric Dumazet Cc: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-Id: <20050401143442.62ed8bb9.davem@davemloft.net> In-Reply-To: <424DC08A.3020204@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> <424DC08A.3020204@cosmosbay.com> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1221 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 23:43:38 +0200 Eric Dumazet wrote: > Are you OK if I also use alloc_large_system_hash() to allocate > rt_hash_table, instead of the current method ? 
This new method is used > in net/ipv4/tcp.c for tcp_ehash and tcp_bhash and permits NUMA tuning. Sure, that's fine. BTW, please line-wrap your emails. :-/ From herbert@gondor.apana.org.au Fri Apr 1 14:48:43 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 14:48:50 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31MmfmE005961 for ; Fri, 1 Apr 2005 14:48:42 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHUvx-0000Cu-00; Sat, 02 Apr 2005 08:48:09 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHUvP-00050C-00; Sat, 02 Apr 2005 08:47:35 +1000 From: Herbert Xu To: jheffner@psc.edu (John Heffner) Subject: Re: [PATCH] skb pcount with MTU discovery Cc: davem@davemloft.net, netdev@oss.sgi.com Organization: Core In-Reply-To: X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Sat, 02 Apr 2005 08:47:35 +1000 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1222 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev John Heffner wrote: > > Common case, start out with mss of 8948. Send 2 segments; neither are > acknowledged, and we receive an ICMP can't fragment indicating a pmtu of > 1500 so mss is set down to 1448. Now tcp_set_skb_tso_segs() sets tso_segs > to 6, so tcp_snd_test thinks we are doing TSO and will send the full 6 > mss, and fails the cwnd test since cwnd == 2. How about fixing tcp_snd_test directly like this? 
Of course all this will be moot once Dave finishes his TSO rewrite :) -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- ===== include/net/tcp.h 1.107 vs edited ===== --- 1.107/include/net/tcp.h 2005-03-16 10:15:03 +11:00 +++ edited/include/net/tcp.h 2005-04-02 08:45:48 +10:00 @@ -1433,6 +1433,9 @@ pkts = tcp_skb_pcount(skb); } + if (!(tp->inet.sk.sk_route_caps & NETIF_F_TSO)) + pkts = 1; + /* RFC 1122 - section 4.2.3.4 * * We must queue if From dada1@cosmosbay.com Fri Apr 1 15:22:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:22:11 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NM2Gf007563 for ; Fri, 1 Apr 2005 15:22:03 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31NLm32032417; Sat, 2 Apr 2005 01:21:54 +0200 Message-ID: <424DD78D.7070001@cosmosbay.com> Date: Sat, 02 Apr 2005 01:21:49 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> <424DC08A.3020204@cosmosbay.com> <20050401143442.62ed8bb9.davem@davemloft.net> In-Reply-To: <20050401143442.62ed8bb9.davem@davemloft.net> Content-Type: multipart/mixed; boundary="------------090807070004040008080507" X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Sat, 02 Apr 2005 01:21:55 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1223 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------090807070004040008080507 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit David S. Miller wrote: > On Fri, 01 Apr 2005 23:43:38 +0200 > Eric Dumazet wrote: > > >>Are you OK if I also use alloc_large_system_hash() to allocate >>rt_hash_table, instead of the current method ? This new method is used >>in net/ipv4/tcp.c for tcp_ehash and tcp_bhash and permits NUMA tuning. > > > Sure, that's fine. > > BTW, please line-wrap your emails. :-/ > > :-) OK this patch includes everything... 
- Locking abstraction - rt_check_expire() fixes - New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second - New gc_debug sysctl to let sysadmin tune gc - Less memory used by hash table (spinlocks moved to a smaller table) - sizing of spinlocks table depends on NR_CPUS - hash table allocated using alloc_large_system_hash() function - header fix for /proc/net/stat/rt_cache Thank you Eric --------------090807070004040008080507 Content-Type: text/plain; name="diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="diff" diff -Nru linux-2.6.11/net/ipv4/route.c linux-2.6.11-ed/net/ipv4/route.c --- linux-2.6.11/net/ipv4/route.c 2005-03-02 08:38:38.000000000 +0100 +++ linux-2.6.11-ed/net/ipv4/route.c 2005-04-02 01:10:37.000000000 +0200 @@ -54,6 +54,8 @@ * Marc Boucher : routing by fwmark * Robert Olsson : Added rt_cache statistics * Arnaldo C. Melo : Convert proc stuff to seq_file + * Eric Dumazet : hashed spinlocks and rt_check_expire() fixes. + * : bugfix in rt_cpu_seq_show() * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -70,6 +72,7 @@ #include #include #include +#include #include #include #include @@ -107,12 +110,13 @@ #define IP_MAX_MTU 0xFFF0 #define RT_GC_TIMEOUT (300*HZ) +#define RT_GC_INTERVAL (RT_GC_TIMEOUT/10) /* rt_check_expire() scans 1/10 of the table each round */ static int ip_rt_min_delay = 2 * HZ; static int ip_rt_max_delay = 10 * HZ; static int ip_rt_max_size; static int ip_rt_gc_timeout = RT_GC_TIMEOUT; -static int ip_rt_gc_interval = 60 * HZ; +static int ip_rt_gc_interval = RT_GC_INTERVAL; static int ip_rt_gc_min_interval = HZ / 2; static int ip_rt_redirect_number = 9; static int ip_rt_redirect_load = HZ / 50; @@ -124,6 +128,7 @@ static int ip_rt_min_pmtu = 512 + 20 + 20; static int ip_rt_min_advmss = 256; static int ip_rt_secret_interval = 10 * 60 * HZ; +static int ip_rt_debug; static unsigned long rt_deadline; #define 
RTprint(a...) printk(KERN_DEBUG a) @@ -197,8 +202,38 @@ struct rt_hash_bucket { struct rtable *chain; - spinlock_t lock; -} __attribute__((__aligned__(8))); +}; + +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) +/* + * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks + * The size of this table is a power of two and depends on the number of CPUS. + */ +#if NR_CPUS >= 32 +#define RT_HASH_LOCK_SZ 4096 +#elif NR_CPUS >= 16 +#define RT_HASH_LOCK_SZ 2048 +#elif NR_CPUS >= 8 +#define RT_HASH_LOCK_SZ 1024 +#elif NR_CPUS >= 4 +#define RT_HASH_LOCK_SZ 512 +#else +#define RT_HASH_LOCK_SZ 256 +#endif + + static spinlock_t *rt_hash_locks; +# define rt_hash_lock_addr(slot) &rt_hash_locks[slot & (RT_HASH_LOCK_SZ - 1)] +# define rt_hash_lock_init() { \ + int i; \ + rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ, GFP_KERNEL); \ + if (!rt_hash_locks) panic("IP: failed to allocate rt_hash_locks\n"); \ + for (i = 0; i < RT_HASH_LOCK_SZ; i++) \ + spin_lock_init(&rt_hash_locks[i]); \ + } +#else +# define rt_hash_lock_addr(slot) NULL +# define rt_hash_lock_init() +#endif static struct rt_hash_bucket *rt_hash_table; static unsigned rt_hash_mask; @@ -393,7 +428,7 @@ struct rt_cache_stat *st = v; if (v == SEQ_START_TOKEN) { - seq_printf(seq, "entries in_hit in_slow_tot in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n"); + seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n"); return 0; } @@ -470,7 +505,7 @@ rth->u.dst.expires; } -static int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2) +static __inline__ int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2) { unsigned long age; int ret = 0; @@ -516,45 
+551,93 @@ /* This runs via a timer and thus is always in BH context. */ static void rt_check_expire(unsigned long dummy) { - static int rover; - int i = rover, t; + static unsigned int rover; + static unsigned int effective_interval = RT_GC_INTERVAL; + static unsigned int cached_gc_interval = RT_GC_INTERVAL; + unsigned int i, goal; struct rtable *rth, **rthp; unsigned long now = jiffies; + unsigned int freed = 0 , t0; + u64 mult; - for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; - t -= ip_rt_gc_timeout) { - unsigned long tmo = ip_rt_gc_timeout; - + if (cached_gc_interval != ip_rt_gc_interval) { /* ip_rt_gc_interval may have changed with sysctl */ + cached_gc_interval = ip_rt_gc_interval; + effective_interval = cached_gc_interval; + } + /* Computes the number of slots we should examin in this run : + * We want to perform a full scan every ip_rt_gc_timeout, and + * the timer is started every 'effective_interval' ticks. + * so goal = (number_of_slots) * (effective_interval / ip_rt_gc_timeout) + */ + mult = ((u64)effective_interval) << rt_hash_log; + do_div(mult, ip_rt_gc_timeout); + goal = (unsigned int)mult; + + i = atomic_read(&ipv4_dst_ops.entries) << 3; + if (i > ip_rt_max_size) { + goal <<= 1; /* be more aggressive */ + i >>= 1; + if (i > ip_rt_max_size) { + goal <<= 1; /* be more aggressive */ + i >>= 1; + if (i > ip_rt_max_size) { + goal <<= 1; /* be more aggressive */ + now++; /* give us one more tick (time) to do our job */ + } + } + } + if (goal > rt_hash_mask) goal = rt_hash_mask + 1; + t0 = goal; + i = rover; + for ( ; goal > 0; goal--) { i = (i + 1) & rt_hash_mask; rthp = &rt_hash_table[i].chain; - - spin_lock(&rt_hash_table[i].lock); - while ((rth = *rthp) != NULL) { - if (rth->u.dst.expires) { - /* Entry is expired even if it is in use */ - if (time_before_eq(now, rth->u.dst.expires)) { + if (*rthp) { + unsigned long tmo = ip_rt_gc_timeout; + spin_lock(rt_hash_lock_addr(i)); + while ((rth = *rthp) != NULL) { + if (rth->u.dst.expires) { + /* Entry is 
expired even if it is in use */ + if (time_before_eq(now, rth->u.dst.expires)) { + tmo >>= 1; + rthp = &rth->u.rt_next; + continue; + } + } else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) { tmo >>= 1; rthp = &rth->u.rt_next; continue; } - } else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) { - tmo >>= 1; - rthp = &rth->u.rt_next; - continue; - } - /* Cleanup aged off entries. */ - *rthp = rth->u.rt_next; - rt_free(rth); + /* Cleanup aged off entries. */ + *rthp = rth->u.rt_next; + freed++; + rt_free(rth); + } + spin_unlock(rt_hash_lock_addr(i)); } - spin_unlock(&rt_hash_table[i].lock); - /* Fallback loop breaker. */ if (time_after(jiffies, now)) break; } rover = i; - mod_timer(&rt_periodic_timer, now + ip_rt_gc_interval); + if (goal != 0) { + /* Not enough time to perform our job, try to adjust the timer. + * Firing the timer sooner means less planned work. + * We allow the timer to be 1/8 of the sysctl value. + */ + effective_interval = (effective_interval + cached_gc_interval/8)/2; + } + else { + /* We finished our job before time limit, try to increase the timer + * The limit is the sysctl value, we use a weight of 3/1 to + * increase slowly. 
+ */ + effective_interval = (3*effective_interval + cached_gc_interval + 3)/4; + } + if (ip_rt_debug & 1) + printk(KERN_WARNING "rt_check_expire() : %u freed, goal=%u/%u, interval=%u ticks\n", freed, goal, t0, effective_interval); + mod_timer(&rt_periodic_timer, jiffies + effective_interval); } /* This can run from both BH and non-BH contexts, the latter @@ -570,11 +653,11 @@ get_random_bytes(&rt_hash_rnd, 4); for (i = rt_hash_mask; i >= 0; i--) { - spin_lock_bh(&rt_hash_table[i].lock); + spin_lock_bh(rt_hash_lock_addr(i)); rth = rt_hash_table[i].chain; if (rth) rt_hash_table[i].chain = NULL; - spin_unlock_bh(&rt_hash_table[i].lock); + spin_unlock_bh(rt_hash_lock_addr(i)); for (; rth; rth = next) { next = rth->u.rt_next; @@ -704,7 +787,7 @@ k = (k + 1) & rt_hash_mask; rthp = &rt_hash_table[k].chain; - spin_lock_bh(&rt_hash_table[k].lock); + spin_lock_bh(rt_hash_lock_addr(k)); while ((rth = *rthp) != NULL) { if (!rt_may_expire(rth, tmo, expire)) { tmo >>= 1; @@ -715,7 +798,7 @@ rt_free(rth); goal--; } - spin_unlock_bh(&rt_hash_table[k].lock); + spin_unlock_bh(rt_hash_lock_addr(k)); if (goal <= 0) break; } @@ -792,7 +875,7 @@ rthp = &rt_hash_table[hash].chain; - spin_lock_bh(&rt_hash_table[hash].lock); + spin_lock_bh(rt_hash_lock_addr(hash)); while ((rth = *rthp) != NULL) { if (compare_keys(&rth->fl, &rt->fl)) { /* Put it first */ @@ -813,7 +896,7 @@ rth->u.dst.__use++; dst_hold(&rth->u.dst); rth->u.dst.lastuse = now; - spin_unlock_bh(&rt_hash_table[hash].lock); + spin_unlock_bh(rt_hash_lock_addr(hash)); rt_drop(rt); *rp = rth; @@ -854,7 +937,7 @@ if (rt->rt_type == RTN_UNICAST || rt->fl.iif == 0) { int err = arp_bind_neighbour(&rt->u.dst); if (err) { - spin_unlock_bh(&rt_hash_table[hash].lock); + spin_unlock_bh(rt_hash_lock_addr(hash)); if (err != -ENOBUFS) { rt_drop(rt); @@ -895,7 +978,7 @@ } #endif rt_hash_table[hash].chain = rt; - spin_unlock_bh(&rt_hash_table[hash].lock); + spin_unlock_bh(rt_hash_lock_addr(hash)); *rp = rt; return 0; } @@ -962,7 +1045,7 @@ { 
struct rtable **rthp; - spin_lock_bh(&rt_hash_table[hash].lock); + spin_lock_bh(rt_hash_lock_addr(hash)); ip_rt_put(rt); for (rthp = &rt_hash_table[hash].chain; *rthp; rthp = &(*rthp)->u.rt_next) @@ -971,7 +1054,7 @@ rt_free(rt); break; } - spin_unlock_bh(&rt_hash_table[hash].lock); + spin_unlock_bh(rt_hash_lock_addr(hash)); } void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw, @@ -2569,6 +2652,23 @@ .strategy = &sysctl_jiffies, }, { + .ctl_name = NET_IPV4_ROUTE_GC_INTERVAL_MS, + .procname = "gc_interval_ms", + .data = &ip_rt_gc_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_ms_jiffies, + .strategy = &sysctl_ms_jiffies, + }, + { + .ctl_name = NET_IPV4_ROUTE_GC_DEBUG, + .procname = "gc_debug", + .data = &ip_rt_debug, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { .ctl_name = NET_IPV4_ROUTE_REDIRECT_LOAD, .procname = "redirect_load", .data = &ip_rt_redirect_load, @@ -2718,12 +2818,13 @@ int __init ip_rt_init(void) { - int i, order, goal, rc = 0; rt_hash_rnd = (int) ((num_physpages ^ (num_physpages>>8)) ^ (jiffies ^ (jiffies >> 7))); #ifdef CONFIG_NET_CLS_ROUTE + { + int order; for (order = 0; (PAGE_SIZE << order) < 256 * sizeof(struct ip_rt_acct) * NR_CPUS; order++) /* NOTHING */; @@ -2731,6 +2832,7 @@ if (!ip_rt_acct) panic("IP: failed to allocate ip_rt_acct\n"); memset(ip_rt_acct, 0, PAGE_SIZE << order); + } #endif ipv4_dst_ops.kmem_cachep = kmem_cache_create("ip_dst_cache", @@ -2741,39 +2843,24 @@ if (!ipv4_dst_ops.kmem_cachep) panic("IP: failed to allocate ip_dst_cache\n"); - goal = num_physpages >> (26 - PAGE_SHIFT); - if (rhash_entries) - goal = (rhash_entries * sizeof(struct rt_hash_bucket)) >> PAGE_SHIFT; - for (order = 0; (1UL << order) < goal; order++) - /* NOTHING */; - - do { - rt_hash_mask = (1UL << order) * PAGE_SIZE / - sizeof(struct rt_hash_bucket); - while (rt_hash_mask & (rt_hash_mask - 1)) - rt_hash_mask--; - rt_hash_table = (struct rt_hash_bucket *) - 
__get_free_pages(GFP_ATOMIC, order); - } while (rt_hash_table == NULL && --order > 0); - - if (!rt_hash_table) - panic("Failed to allocate IP route cache hash table\n"); - - printk(KERN_INFO "IP: routing cache hash table of %u buckets, %ldKbytes\n", - rt_hash_mask, - (long) (rt_hash_mask * sizeof(struct rt_hash_bucket)) / 1024); + rt_hash_table = (struct rt_hash_bucket *) + alloc_large_system_hash("IP route cache", + sizeof(struct rt_hash_bucket), + rhash_entries, + (num_physpages >= 128 * 1024) ? + (27 - PAGE_SHIFT) : + (29 - PAGE_SHIFT), + HASH_HIGHMEM, + &rt_hash_log, + &rt_hash_mask, + 0); - for (rt_hash_log = 0; (1 << rt_hash_log) != rt_hash_mask; rt_hash_log++) - /* NOTHING */; + memset(rt_hash_table, 0, rt_hash_mask * sizeof(struct rt_hash_bucket)); + rt_hash_lock_init(); + ipv4_dst_ops.gc_thresh = rt_hash_mask; + ip_rt_max_size = rt_hash_mask * 16; rt_hash_mask--; - for (i = 0; i <= rt_hash_mask; i++) { - spin_lock_init(&rt_hash_table[i].lock); - rt_hash_table[i].chain = NULL; - } - - ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1); - ip_rt_max_size = (rt_hash_mask + 1) * 16; rt_cache_stat = alloc_percpu(struct rt_cache_stat); if (!rt_cache_stat) @@ -2819,7 +2906,7 @@ xfrm_init(); xfrm4_init(); #endif - return rc; + return 0; } EXPORT_SYMBOL(__ip_select_ident); diff -Nru linux-2.6.11/Documentation/filesystems/proc.txt linux-2.6.11-ed/Documentation/filesystems/proc.txt --- linux-2.6.11/Documentation/filesystems/proc.txt 2005-04-02 01:19:15.000000000 +0200 +++ linux-2.6.11-ed/Documentation/filesystems/proc.txt 2005-04-02 01:19:04.000000000 +0200 @@ -1709,12 +1709,13 @@ Writing to this file results in a flush of the routing cache. -gc_elasticity, gc_interval, gc_min_interval_ms, gc_timeout, gc_thresh +gc_elasticity, gc_interval_ms, gc_min_interval_ms, gc_timeout, gc_thresh, gc_debug --------------------------------------------------------------------- Values to control the frequency and behavior of the garbage collection algorithm for the routing cache. 
gc_min_interval is deprecated and replaced
-by gc_min_interval_ms.
+by gc_min_interval_ms. gc_interval is deprecated and replaced by
+gc_interval_ms. gc_debug enables some printk()
 
 
 max_size
diff -Nru linux-2.6.11/include/linux/sysctl.h linux-2.6.11-ed/include/linux/sysctl.h
--- linux-2.6.11/include/linux/sysctl.h	2005-03-02 08:38:10.000000000 +0100
+++ linux-2.6.11-ed/include/linux/sysctl.h	2005-04-02 00:43:11.000000000 +0200
@@ -367,6 +367,8 @@
 	NET_IPV4_ROUTE_MIN_ADVMSS=17,
 	NET_IPV4_ROUTE_SECRET_INTERVAL=18,
 	NET_IPV4_ROUTE_GC_MIN_INTERVAL_MS=19,
+	NET_IPV4_ROUTE_GC_INTERVAL_MS=20,
+	NET_IPV4_ROUTE_GC_DEBUG=21,
 };
 
 enum
--------------090807070004040008080507--

From tgraf@suug.ch Fri Apr 1 15:26:38 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:26:45 -0800 (PST)
Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NQbov008184 for ; Fri, 1 Apr 2005 15:26:38 -0800
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id C77938A; Sat, 2 Apr 2005 01:26:12 +0200 (CEST)
Received: by postel.suug.ch (Postfix, from userid 10001) id 354251C0EB; Sat, 2 Apr 2005 01:26:54 +0200 (CEST)
Date: Sat, 2 Apr 2005 01:26:54 +0200
From: Thomas Graf
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCHSET] action statistics dumping fix & gnet_stats improvements
Message-ID: <20050401232654.GJ3086@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1224
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev

Fixes a stupid bug I introduced in the last patchset which for some
reason didn't get caught in the testing process. The other two
patches change the behaviour of yet unused but likely use cases
to what one would expect without reading the code.

Please do a

	bk pull bk://kernel.bkbits.net/tgraf/net-2.6-tcf_exts

This will update the following files:

 include/net/gen_stats.h |    3 ++-
 net/core/gen_stats.c    |   48 +++++++++++++++++++++++++++++++-----------------
 net/sched/act_api.c     |    2 ++
 3 files changed, 35 insertions(+), 18 deletions(-)

through these ChangeSets:

(05/04/01 1.2181.44.3)
   [NET]: Improve gnet_stats_* dumping logic to be less error prone

   The recent additions to make gnet_stats_* usable for action
   statistics dumping in two steps introduced a few error prone
   assumptions which can easily be forgotten. This patch fixes this
   up by simplifying the process of adding new fields to struct
   gnet_dump or adding additional backward compatibility TLVs.

   Signed-off-by: Thomas Graf
   Signed-off-by: David S. Miller

(05/04/01 1.2181.44.2)
   [NET]: Allow dumping of application specific statistics if no primary TLV is used

   Although this case is hypothetical at the moment, more advanced actions are
   likely to need this in the future.

   Signed-off-by: Thomas Graf
   Signed-off-by: David S. Miller

(05/04/01 1.2181.44.1)
   [PKT_SCHED]: Properly return when no backward compatibility action statistics are to be dumped

   Fixes a stupid bug introduced in my "Fix action statistics dumping in
   compatibility mode" patch, no clue why it actually worked without this fix.

   Signed-off-by: Thomas Graf
   Signed-off-by: David S. Miller

From tgraf@suug.ch Fri Apr 1 15:27:10 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:27:15 -0800 (PST)
Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NR9eD008422 for ; Fri, 1 Apr 2005 15:27:09 -0800
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id F1549F; Sat, 2 Apr 2005 01:26:46 +0200 (CEST)
Received: by postel.suug.ch (Postfix, from userid 10001) id 5D8641C0EA; Sat, 2 Apr 2005 01:27:30 +0200 (CEST)
Date: Sat, 2 Apr 2005 01:27:30 +0200
From: Thomas Graf
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH 1/3] [PKT_SCHED]: Properly return when no backward compatibility action statistics are to be dumped
Message-ID: <20050401232730.GK3086@postel.suug.ch>
References: <20050401232654.GJ3086@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050401232654.GJ3086@postel.suug.ch>
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1225
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/04/01 14:05:21+02:00 tgraf@suug.ch
#   [PKT_SCHED]: Properly return when no backward compatibility action statistics are to be dumped
#
#   Fixes a stupid bug introduced in my "Fix action statistics dumping in
#   compatibility mode" patch, no clue why it actually worked without this fix.
#
#   Signed-off-by: Thomas Graf
#   Signed-off-by: David S. Miller
#
# net/sched/act_api.c
#   2005/04/01 14:05:09+02:00 tgraf@suug.ch +2 -0
#   [PKT_SCHED]: Properly return when no backward compatibility action statistics are to be dumped
#
diff -Nru a/net/sched/act_api.c b/net/sched/act_api.c
--- a/net/sched/act_api.c	2005-04-02 01:18:40 +02:00
+++ b/net/sched/act_api.c	2005-04-02 01:18:40 +02:00
@@ -397,6 +397,8 @@
 		if (a->type == TCA_OLD_COMPAT)
 			err = gnet_stats_start_copy_compat(skb, 0,
 				TCA_STATS, TCA_XSTATS, h->stats_lock, &d);
+		else
+			return 0;
 	} else
 		err = gnet_stats_start_copy(skb, TCA_ACT_STATS,
 			h->stats_lock, &d);

From tgraf@suug.ch Fri Apr 1 15:27:42 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:27:47 -0800 (PST)
Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NRfgu008967 for ; Fri, 1 Apr 2005 15:27:41 -0800
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id A1105F; Sat, 2 Apr 2005 01:27:18 +0200 (CEST)
Received: by postel.suug.ch (Postfix, from userid 10001) id 100F31C0EB; Sat, 2 Apr 2005 01:28:02 +0200 (CEST)
Date: Sat, 2 Apr 2005 01:28:01 +0200
From: Thomas Graf
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH 2/3] [NET]: Allow dumping of application specific statistics if no primary TLV is used
Message-ID: <20050401232801.GL3086@postel.suug.ch>
References: <20050401232654.GJ3086@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050401232654.GJ3086@postel.suug.ch>
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1226
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/04/01 14:24:14+02:00 tgraf@suug.ch
#   [NET]: Allow dumping of application specific statistics if no primary TLV is used
#
#   Although this case is hypothetical at the moment, more advanced actions are
#   likely to need this in the future.
#
#   Signed-off-by: Thomas Graf
#   Signed-off-by: David S. Miller
#
# net/core/gen_stats.c
#   2005/04/01 14:23:57+02:00 tgraf@suug.ch +7 -4
#   [NET]: Allow dumping of application specific statistics if no primary TLV is used
#
# include/net/gen_stats.h
#   2005/04/01 14:23:57+02:00 tgraf@suug.ch +2 -1
#   [NET]: Allow dumping of application specific statistics if no primary TLV is used
#
diff -Nru a/include/net/gen_stats.h b/include/net/gen_stats.h
--- a/include/net/gen_stats.h	2005-04-02 01:18:33 +02:00
+++ b/include/net/gen_stats.h	2005-04-02 01:18:33 +02:00
@@ -15,7 +15,8 @@
 	/* Backward compatability */
 	int compat_tc_stats;
 	int compat_xstats;
-	struct rtattr * xstats;
+	void * xstats;
+	int xstats_len;
 	struct tc_stats tc_stats;
 };
 
diff -Nru a/net/core/gen_stats.c b/net/core/gen_stats.c
--- a/net/core/gen_stats.c	2005-04-02 01:18:33 +02:00
+++ b/net/core/gen_stats.c	2005-04-02 01:18:33 +02:00
@@ -177,8 +177,11 @@
 int
 gnet_stats_copy_app(struct gnet_dump *d, void *st, int len)
 {
-	if (d->compat_xstats)
-		d->xstats = (struct rtattr *) d->skb->tail;
+	if (d->compat_xstats) {
+		d->xstats = st;
+		d->xstats_len = len;
+	}
+
 	return gnet_stats_copy(d, TCA_STATS_APP, st, len);
 }
 
@@ -206,8 +209,8 @@
 			return -1;
 
 	if (d->compat_xstats && d->xstats) {
-		if (gnet_stats_copy(d, d->compat_xstats, RTA_DATA(d->xstats),
-			RTA_PAYLOAD(d->xstats)) < 0)
+		if (gnet_stats_copy(d, d->compat_xstats, d->xstats,
+			d->xstats_len) < 0)
 			return -1;
 	}
 

From tgraf@suug.ch Fri Apr 1 15:28:16 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:28:21 -0800 (PST)
Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NSFVt009537 for ; Fri, 1 Apr 2005 15:28:15 -0800
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id ABB7AF; Sat, 2 Apr 2005 01:27:52 +0200 (CEST)
Received: by postel.suug.ch (Postfix, from userid 10001)
	id 2556D1C0EA; Sat, 2 Apr 2005 01:28:36 +0200 (CEST)
Date: Sat, 2 Apr 2005 01:28:36 +0200
From: Thomas Graf
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH 3/3] [NET]: Improve gnet_stats_* dumping logic to be less error prone
Message-ID: <20050401232835.GM3086@postel.suug.ch>
References: <20050401232654.GJ3086@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050401232654.GJ3086@postel.suug.ch>
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1227
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/04/01 15:01:24+02:00 tgraf@suug.ch
#   [NET]: Improve gnet_stats_* dumping logic to be less error prone
#
#   The recent additions to make gnet_stats_* usable for action
#   statistics dumping in two steps introduced a few error prone
#   assumptions which can easily be forgotten. This patch fixes this
#   up by simplifying the process of adding new fields to struct
#   gnet_dump or adding additional backward compatibility TLVs.
#
#   Signed-off-by: Thomas Graf
#   Signed-off-by: David S. Miller
#
# net/core/gen_stats.c
#   2005/04/01 15:01:12+02:00 tgraf@suug.ch +24 -13
#   [NET]: Improve gnet_stats_* dumping logic to be less error prone
#
diff -Nru a/net/core/gen_stats.c b/net/core/gen_stats.c
--- a/net/core/gen_stats.c	2005-04-02 01:18:26 +02:00
+++ b/net/core/gen_stats.c	2005-04-02 01:18:26 +02:00
@@ -26,9 +26,7 @@
 static inline int
 gnet_stats_copy(struct gnet_dump *d, int type, void *buf, int size)
 {
-	if (type)
-		RTA_PUT(d->skb, type, size, buf);
-
+	RTA_PUT(d->skb, type, size, buf);
 	return 0;
 
 rtattr_failure:
@@ -58,6 +56,8 @@
 gnet_stats_start_copy_compat(struct sk_buff *skb, int type, int tc_stats_type,
 	int xstats_type, spinlock_t *lock, struct gnet_dump *d)
 {
+	memset(d, 0, sizeof(*d));
+
 	spin_lock_bh(lock);
 	d->lock = lock;
 	if (type)
@@ -65,12 +65,11 @@
 	d->skb = skb;
 	d->compat_tc_stats = tc_stats_type;
 	d->compat_xstats = xstats_type;
-	d->xstats = NULL;
 
-	if (d->compat_tc_stats)
-		memset(&d->tc_stats, 0, sizeof(d->tc_stats));
+	if (d->tail)
+		return gnet_stats_copy(d, type, NULL, 0);
 
-	return gnet_stats_copy(d, type, NULL, 0);
+	return 0;
 }
 
 /**
@@ -111,8 +110,11 @@
 		d->tc_stats.bytes = b->bytes;
 		d->tc_stats.packets = b->packets;
 	}
-
-	return gnet_stats_copy(d, TCA_STATS_BASIC, b, sizeof(*b));
+
+	if (d->tail)
+		return gnet_stats_copy(d, TCA_STATS_BASIC, b, sizeof(*b));
+
+	return 0;
 }
 
 /**
@@ -134,7 +136,10 @@
 		d->tc_stats.pps = r->pps;
 	}
 
-	return gnet_stats_copy(d, TCA_STATS_RATE_EST, r, sizeof(*r));
+	if (d->tail)
+		return gnet_stats_copy(d, TCA_STATS_RATE_EST, r, sizeof(*r));
+
+	return 0;
 }
 
 /**
@@ -157,8 +162,11 @@
 		d->tc_stats.backlog = q->backlog;
 		d->tc_stats.overlimits = q->overlimits;
 	}
-
-	return gnet_stats_copy(d, TCA_STATS_QUEUE, q, sizeof(*q));
+
+	if (d->tail)
+		return gnet_stats_copy(d, TCA_STATS_QUEUE, q, sizeof(*q));
+
+	return 0;
 }
 
 /**
@@ -182,7 +190,10 @@
 		d->xstats_len = len;
 	}
 
-	return gnet_stats_copy(d, TCA_STATS_APP, st, len);
+	if (d->tail)
+		return gnet_stats_copy(d, TCA_STATS_APP, st, len);
+
+	return 0;
 }
 
 /**

From shemminger@osdl.org Fri Apr 1 15:44:08 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:44:22 -0800 (PST)
Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31Ni4j6010722 for ; Fri, 1 Apr 2005 15:44:08 -0800
Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j31Nhns4014569 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Fri, 1 Apr 2005 15:43:49 -0800
Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [172.20.1.103]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j31Nhm42010798; Fri, 1 Apr 2005 15:43:48 -0800
Date: Fri, 1 Apr 2005 15:43:48 -0800
From: Stephen Hemminger
To: jaganav@us.ibm.com
Cc: Roland Dreier, Benjamin LaHaise, Dmitry Yusupov, open-iscsi@googlegroups.com, "David S. Miller", mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com
Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
Message-ID: <20050401154348.553f3c46@dxpl.pdx.osdl.net>
In-Reply-To: <1112321619.424cae539e75e@imap.linux.ibm.com>
References: <1112321619.424cae539e75e@imap.linux.ibm.com>
Organization: Open Source Development Lab
X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-MIMEDefang-Filter: osdl$Revision: 1.106 $
X-Scanned-By: MIMEDefang 2.36
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1228
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

On Thu, 31 Mar 2005 21:13:39 -0500
jaganav@us.ibm.com wrote:

> Quoting Roland Dreier:
> > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> > whatever you want to call it) world. However I know that the large
> > majority of InfiniBand use right now is running on Linux, and I hope
> > the Linux community is willing to work with the IB community.
> >
>
> Just want to let everyone know that we have started an open source
> effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs). This
> community has now come up with an architecture
> (http://rdma.sourceforge.net/architecture.pdf) to build this support in Linux.
> Would really appreciate if you review and provide any comments. We have just
> started to hack but no code is available on this project yet.
>
> Thanks
> Venkat

OpenRdma is a misnomer, because as I read your architecture you are
trying to create a "kernel abstraction layer" for closed source vendor
RDMA drivers. This will never be accepted, please go back to the drawing
board and figure out how to make real open source drivers.

From davem@davemloft.net Fri Apr 1 15:50:51 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:50:56 -0800 (PST)
Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NooXX011406 for ; Fri, 1 Apr 2005 15:50:51 -0800
Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHVtn-0003gY-00; Fri, 01 Apr 2005 15:49:59 -0800
Date: Fri, 1 Apr 2005 15:49:59 -0800
From: "David S. Miller"
To: Thomas Graf
Cc: netdev@oss.sgi.com
Subject: Re: [PATCHSET] action statistics dumping fix & gnet_stats improvements
Message-Id: <20050401154959.1eef4880.davem@davemloft.net>
In-Reply-To: <20050401232654.GJ3086@postel.suug.ch>
References: <20050401232654.GJ3086@postel.suug.ch>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1229
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev

On Sat, 2 Apr 2005 01:26:54 +0200
Thomas Graf wrote:

> Fixes a stupid bug I introduced in the last patchset which for some
> reason didn't get caught in the testing process. The other two
> patches change the behaviour of yet unused but likely use cases
> to what one would expect without reading the code.
>
> Please do a
>
> 	bk pull bk://kernel.bkbits.net/tgraf/net-2.6-tcf_exts

All looks good. Pulled, thanks Thomas.
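
As a reading aid for the patchset pulled above, the following is a minimal userspace sketch of the dumping behaviour that patches 2/3 and 3/3 describe: application statistics are remembered by pointer and length, the primary TLV is only written when one was requested, and the backward compatibility TLV is written when the dump is finished. Everything here is an assumption made for illustration: the `model_*` names, the flat TLV buffer, the `have_primary` flag (standing in for the `d->tail` check) and the constant values are hypothetical and are not the kernel's `struct gnet_dump` or `RTA_PUT` code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel.
 * The constant values are arbitrary choices for this sketch. */
enum { MODEL_BUF_SIZE = 256, MODEL_STATS_APP = 4 };

struct model_tlv_hdr {
	unsigned short len;	/* header plus payload, in bytes */
	unsigned short type;
};

struct model_dump {
	unsigned char buf[MODEL_BUF_SIZE];
	size_t used;		/* bytes of buf filled so far */
	int have_primary;	/* stands in for the d->tail check */
	int compat_xstats;	/* compat TLV type, 0 if unused */
	const void *xstats;	/* remembered application stats */
	int xstats_len;
};

/* Zero the whole state first, as patch 3/3 does with memset(). */
static void model_start(struct model_dump *d, int have_primary,
			int compat_xstats)
{
	memset(d, 0, sizeof(*d));
	d->have_primary = have_primary;
	d->compat_xstats = compat_xstats;
}

/* Append one type/length/value record; returns -1 if it does not fit. */
static int model_put(struct model_dump *d, int type, const void *data,
		     int len)
{
	struct model_tlv_hdr hdr;

	assert(len >= 0);
	if (d->used + sizeof(hdr) + (size_t)len > sizeof(d->buf))
		return -1;
	hdr.len = (unsigned short)(sizeof(hdr) + (size_t)len);
	hdr.type = (unsigned short)type;
	memcpy(d->buf + d->used, &hdr, sizeof(hdr));
	if (len > 0)
		memcpy(d->buf + d->used + sizeof(hdr), data, (size_t)len);
	d->used += sizeof(hdr) + (size_t)len;
	return 0;
}

/* Remember the stats for the compat TLV, and emit the application TLV
 * only when a primary TLV is in use. */
static int model_copy_app(struct model_dump *d, const void *st, int len)
{
	if (d->compat_xstats) {
		d->xstats = st;
		d->xstats_len = len;
	}
	if (d->have_primary)
		return model_put(d, MODEL_STATS_APP, st, len);
	return 0;
}

/* Emit the remembered stats under the compat type, if one was requested. */
static int model_finish(struct model_dump *d)
{
	if (d->compat_xstats && d->xstats)
		return model_put(d, d->compat_xstats, d->xstats,
				 d->xstats_len);
	return 0;
}
```

A caller would run `model_start()`, then `model_copy_app()`, then `model_finish()`. With no primary TLV and a compat type set, nothing is written until `model_finish()`, which is the case patch 2/3 is meant to allow. The sketch ignores attribute alignment and locking, which the kernel code has to handle.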
From asgeir@chelsio.com Fri Apr 1 15:51:53 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:52:00 -0800 (PST)
Received: from stargate.chelsio.com (stargate.chelsio.com [64.186.171.138] (may be forged)) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NprqB011864 for ; Fri, 1 Apr 2005 15:51:53 -0800
Received: from YOGI.asicdesigners.com (yogi.asicdesigners.com [10.192.160.7]) by stargate.chelsio.com (8.12.5/8.12.5) with SMTP id j31NotfZ012683; Fri, 1 Apr 2005 15:50:55 -0800
X-MimeOLE: Produced By Microsoft Exchange V6.0.6487.1
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Subject: RE: Linux support for RDMA
Date: Fri, 1 Apr 2005 15:50:55 -0800
Message-ID: <67D69596DDF0C2448DB0F0547D0F947E01781F1A@yogi.asicdesigners.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Linux support for RDMA
Thread-Index: AcU2XRJTLR7JHDJ6RnC21gxkbfiaswAtOgbQ
From: "Asgeir Eiriksson"
To: , "H. Peter Anvin"
Cc: "Roland Dreier" , "Dmitry Yusupov" , , "David S. Miller" , , , , , , , "Benjamin LaHaise"
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j31NprqB011864
X-archive-position: 1230
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: asgeir@chelsio.com
Precedence: bulk
X-list: netdev

Venkat

Your assessment of the IB vs. Ethernet latencies isn't necessarily
correct.
- you already have available low latency 10GE switches (< 1us
  port-to-port)
- you already have available low latency (cut-through processing) 10GE
  TOE engines

The Veritest verified 10GE TOE end-to-end latency is < 10us today
(end-to-end being from a Linux user-space-process to a Linux
user-space-process through a switch; full report with detail of the
setup is available at
http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)

For comparison: the published IB latency numbers are around 5us today
and those use a polling receiver, and those don't include a context
switch(es) as does the Ethernet number quoted above.

'Asgeir

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On Behalf Of jaganav@us.ibm.com
> Sent: Thursday, March 31, 2005 5:49 PM
> To: H. Peter Anvin
> Cc: Roland Dreier; Dmitry Yusupov; open-iscsi@googlegroups.com; David S. Miller; mpm@selenic.com; andrea@suse.de; michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com; Benjamin LaHaise
> Subject: Re: Linux support for RDMA
>
> Quoting "H. Peter Anvin":
> > Benjamin LaHaise wrote:
> > >
> > > I'm curious how the 10Gig ethernet market will pan out. Time and again
> > > the market has shown that ethernet always has the cost advantage in the
> > > end. If something like Intel's I/O Acceleration Technology makes it
> > > that much easier for commodity ethernet to achieve similar performance
> > > characteristics over ethernet to that of IB and fibre channel, the cost
> > > advantage alone might switch some new customers over. But the hardware
> > > isn't near what IB offers today, making IB an important niche filler.
> > >
> >
> > From what I've seen coming down the pipe, I think 10GE is going to
> > eventually win over IB, just like previous generations did over Token
> > Ring, FDDI and other niche filler technologies. It doesn't, as you say,
> > mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
> > the purpose of fixing the kind of issues that are going to have to be
> > fixed anyway.
> >
> > 	-hpa
> >
> >
>
> No doubt, Ethernet will eventually win .. btw, hasn't history proven this over
> ATM? More specifically when the industry predicted that ATM will replace
> ethernet :)
>
> However, I'll have to agree with Ben that IB technology will fill an important
> niche segment, more specifically so in the low end of High Performance Computing
> (HPC) segment which is in a transition mode currently moving away from
> proprietary interconnects to industry standards based IB technology. Even though
> ethernet may eventually catch up with IB in terms of the bandwidth,
> IB fabrics can offer better latencies.
>
> Thanks
> Venkat

From davem@davemloft.net Fri Apr 1 15:55:38 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:55:43 -0800 (PST)
Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NtbXM012578 for ; Fri, 1 Apr 2005 15:55:38 -0800
Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHVyM-0003hr-00; Fri, 01 Apr 2005 15:54:42 -0800
Date: Fri, 1 Apr 2005 15:54:42 -0800
From: "David S. Miller"
To: Eric Dumazet
Cc: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
Message-Id: <20050401155442.3bbd6a73.davem@davemloft.net>
In-Reply-To: <424DD78D.7070001@cosmosbay.com>
References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> <424DC08A.3020204@cosmosbay.com> <20050401143442.62ed8bb9.davem@davemloft.net> <424DD78D.7070001@cosmosbay.com>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1231
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev

On Sat, 02 Apr 2005 01:21:49 +0200
Eric Dumazet wrote:

> OK this patch includes everything...
>
> - Locking abstraction
> - rt_check_expire() fixes
> - New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second
> - New gc_debug sysctl to let sysadmin tune gc
> - Less memory used by hash table (spinlocks moved to a smaller table)
> - sizing of spinlocks table depends on NR_CPUS
> - hash table allocated using alloc_large_system_hash() function
> - header fix for /proc/net/stat/rt_cache

Looks fine to me. I'd like to see some feedback from folks like
Robert Olsson and co. before applying this, so let's allow the
patch to simmer over the weekend, ok?
:-)

From dima@neterion.com Fri Apr 1 16:03:46 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 16:03:51 -0800 (PST)
Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3203jR6013216 for ; Fri, 1 Apr 2005 16:03:46 -0800
Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j3202eOC027166; Fri, 1 Apr 2005 19:02:40 -0500 (EST)
Received: from beastie ([10.16.16.220]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j3202cDD002273; Fri, 1 Apr 2005 19:02:38 -0500 (EST)
Subject: RE: Linux support for RDMA
From: Dmitry Yusupov
To: Asgeir Eiriksson
Cc: jaganav@us.ibm.com, "H. Peter Anvin", Roland Dreier, open-iscsi@googlegroups.com, "David S. Miller", mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, Benjamin LaHaise
In-Reply-To: <67D69596DDF0C2448DB0F0547D0F947E01781F1A@yogi.asicdesigners.com>
References: <67D69596DDF0C2448DB0F0547D0F947E01781F1A@yogi.asicdesigners.com>
Content-Type: text/plain
Organization: Neterion, Inc
Date: Fri, 01 Apr 2005 16:02:37 -0800
Message-Id: <1112400157.9559.98.camel@beastie>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.4 (2.0.4-2)
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.34
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1232
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dima@neterion.com
Precedence: bulk
X-list: netdev

On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote:
> Venkat
>
> Your assessment of the IB vs. Ethernet latencies isn't necessarily
> correct.
> - you already have available low latency 10GE switches (< 1us
>   port-to-port)
> - you already have available low latency (cut-through processing) 10GE
>   TOE engines
>
> The Veritest verified 10GE TOE end-to-end latency is < 10us today
> (end-to-end being from a Linux user-space-process to a Linux
> user-space-process through a switch; full report with detail of the
> setup is available at
> http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)
>
> For comparison: the published IB latency numbers are around 5us today
> and those use a polling receiver, and those don't include a context
> switch(es) as does the Ethernet number quoted above.

Yep, I agree here. On a 10Gbps network, latency numbers are
around 5-15us. Even with a non-TOE card, I managed to get 13us latency
with the regular TCP/IP stack.

[root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P -h 17.1.1.227
Latency: 0.000013
Now starting main loop
  0: 256 bytes 7 times --> 131.37 Mbps in 0.000015 sec
  1: 512 bytes 65 times --> 239.75 Mbps in 0.000016 sec

Dima

> 'Asgeir
>
>
> > -----Original Message-----
> > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On Behalf Of jaganav@us.ibm.com
> > Sent: Thursday, March 31, 2005 5:49 PM
> > To: H. Peter Anvin
> > Cc: Roland Dreier; Dmitry Yusupov; open-iscsi@googlegroups.com; David S. Miller; mpm@selenic.com; andrea@suse.de; michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com; Benjamin LaHaise
> > Subject: Re: Linux support for RDMA
> >
> > Quoting "H. Peter Anvin":
> > > Benjamin LaHaise wrote:
> > > >
> > > > I'm curious how the 10Gig ethernet market will pan out. Time and again
> > > > the market has shown that ethernet always has the cost advantage in the
> > > > end. If something like Intel's I/O Acceleration Technology makes it
> > > > that much easier for commodity ethernet to achieve similar performance
> > > > characteristics over ethernet to that of IB and fibre channel, the cost
> > > > advantage alone might switch some new customers over. But the hardware
> > > > isn't near what IB offers today, making IB an important niche filler.
> > > >
> > >
> > > From what I've seen coming down the pipe, I think 10GE is going to
> > > eventually win over IB, just like previous generations did over Token
> > > Ring, FDDI and other niche filler technologies. It doesn't, as you say,
> > > mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
> > > the purpose of fixing the kind of issues that are going to have to be
> > > fixed anyway.
> > >
> > > 	-hpa
> > >
> > >
> >
> > No doubt, Ethernet will eventually win .. btw, hasn't history proven this over
> > ATM? More specifically when the industry predicted that ATM will replace
> > ethernet :)
> >
> > However, I'll have to agree with Ben that IB technology will fill an important
> > niche segment, more specifically so in the low end of High Performance Computing
> > (HPC) segment which is in a transition mode currently moving away from
> > proprietary interconnects to industry standards based IB technology. Even though
> > ethernet may eventually catch up with IB in terms of the bandwidth,
> > IB fabrics can offer better latencies.
> >
> > Thanks
> > Venkat
> >
> >

From herbert@gondor.apana.org.au Fri Apr 1 16:51:41 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 16:51:50 -0800 (PST)
Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j320pdAt018588 for ; Fri, 1 Apr 2005 16:51:40 -0800
Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHWqX-0001AE-00; Sat, 02 Apr 2005 10:50:41 +1000
Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHWpo-0006Mj-00; Sat, 02 Apr 2005 10:49:56 +1000
Date: Sat, 2 Apr 2005 10:49:56 +1000
To: "David S. Miller"
Cc: kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com
Subject: [IPSEC]: Kill nested read lock by deleting xfrm_init_tempsel
Message-ID: <20050402004956.GA24339@gondor.apana.org.au>
References: <20050214221006.GA18415@gondor.apana.org.au> <20050214221200.GA18465@gondor.apana.org.au> <20050214221433.GB18465@gondor.apana.org.au> <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> <20050331212325.5e996432.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="YiEDa0DAkWCtVeE4"
Content-Disposition: inline
In-Reply-To: <20050331212325.5e996432.davem@davemloft.net>
User-Agent: Mutt/1.5.6+20040907i
From: Herbert Xu
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1233
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: herbert@gondor.apana.org.au
Precedence: bulk
X-list: netdev

--YiEDa0DAkWCtVeE4
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Dave:

On Thu, Mar 31, 2005 at 09:23:25PM -0800,
David S. Miller wrote: > On Thu, 31 Mar 2005 10:46:58 +1000 > Herbert Xu wrote: > > > > # This is a BitKeeper generated diff -Nru style patch. > > > # > > > # ChangeSet > > > # 2005/03/30 06:02:45+02:00 kaber@coreworks.de > > > # [IPSEC]: Check SPI in xfrm_state_find() > > > # > > > # Signed-off-by: Patrick McHardy > > > > Looks good. > > > > Signed-off-by: Herbert Xu > > To me too, both patches applied, thanks Patrick. Actually I only signed off on the first patch :) The second patch creates a dead lock since it does a nested read lock. The solution is simply to get rid of xfrm_init_tempsel and call the afinfo version directly. Signed-off-by: Herbert Xu BTW I'd like to start cleaning up the locking in net/xfrm. I don't want these changes to go into 2.6.12. However, I'd like to have them sit in mm for a while so that they get some testing coverage. What's the best way to do this? Could you create a tree slated for 2.6.13? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --YiEDa0DAkWCtVeE4 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p ===== net/xfrm/xfrm_state.c 1.60 vs edited ===== --- 1.60/net/xfrm/xfrm_state.c 2005-04-01 15:19:54 +10:00 +++ edited/net/xfrm/xfrm_state.c 2005-04-02 10:35:06 +10:00 @@ -283,20 +283,6 @@ } EXPORT_SYMBOL(xfrm_state_flush); -static int -xfrm_init_tempsel(struct xfrm_state *x, struct flowi *fl, - struct xfrm_tmpl *tmpl, - xfrm_address_t *daddr, xfrm_address_t *saddr, - unsigned short family) -{ - struct xfrm_state_afinfo *afinfo = xfrm_state_get_afinfo(family); - if (!afinfo) - return -1; - afinfo->init_tempsel(x, fl, tmpl, daddr, saddr); - xfrm_state_put_afinfo(afinfo); - return 0; -} - struct xfrm_state * xfrm_state_find(xfrm_address_t *daddr, xfrm_address_t *saddr, struct flowi *fl, struct xfrm_tmpl *tmpl, @@ -370,7 +356,7 @@ } /* Initialize temporary 
selector matching only * to current session. */ - xfrm_init_tempsel(x, fl, tmpl, daddr, saddr, family); + afinfo->init_tempsel(x, fl, tmpl, daddr, saddr); if (km_query(x, tmpl, pol) == 0) { x->km.state = XFRM_STATE_ACQ; --YiEDa0DAkWCtVeE4-- From hadi@cyberus.ca Fri Apr 1 17:04:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:04:21 -0800 (PST) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3214FuF019427 for ; Fri, 1 Apr 2005 17:04:16 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DHX3g-0006qh-TC for netdev@oss.sgi.com; Fri, 01 Apr 2005 20:04:16 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHX3a-0006Je-Ca; Fri, 01 Apr 2005 20:04:10 -0500 Subject: Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050401123554.GA3468@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112403845.1088.14.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 01 Apr 2005 20:04:05 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1234 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Herbert, Staring at the code, obversation: -> PFKEY is going to be interesting to have it actually generate events as a result of some 
app using netlink such as ip x - the reverse is actually easier to deal with. This problem doesnt exist with current approach i am taking. The issue is that pfkey echoes back a few things from the original message - important ones being version, pid, seq, and msgtype (as a sample take a look at pfkey_add()). So these need to be remembered... Brings back the original behavior i had netlink doing which was similar (but innacurate now that i stare at this). At the time i carried the nlmsg header around in the cb. So we would have to do the same for netlink[1]. The good news is all these fields happen to exist on netlink (except for the version - to which, for netlink created events, we could pass a hardcoded matching PFKEY2). In other words the structure i called km_cb will now have to have these fields i mentioned above. Thoughts before i start ? cheers, jamal [1]I actually would have no problems using a pid/seq etc generated by pfkey on a netlink header and viceversa. It shouldnt be an issue. From davem@davemloft.net Fri Apr 1 17:21:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:21:58 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321Lp5B024479 for ; Fri, 1 Apr 2005 17:21:52 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHXJ1-0005XH-00; Fri, 01 Apr 2005 17:20:07 -0800 Date: Fri, 1 Apr 2005 17:20:07 -0800 From: "David S. 
Miller" To: Herbert Xu Cc: kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [IPSEC]: Kill nested read lock by deleting xfrm_init_tempsel Message-Id: <20050401172007.7296eced.davem@davemloft.net> In-Reply-To: <20050402004956.GA24339@gondor.apana.org.au> References: <20050214221006.GA18415@gondor.apana.org.au> <20050214221200.GA18465@gondor.apana.org.au> <20050214221433.GB18465@gondor.apana.org.au> <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> <20050331212325.5e996432.davem@davemloft.net> <20050402004956.GA24339@gondor.apana.org.au> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1235 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Sat, 2 Apr 2005 10:49:56 +1000 Herbert Xu wrote: > The second patch creates a dead lock since it does a nested read > lock. The solution is simply to get rid of xfrm_init_tempsel > and call the afinfo version directly. 
read locks nest even in the presence of pending writers From hadi@cyberus.ca Fri Apr 1 17:25:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:26:01 -0800 (PST) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321Pt8b025072 for ; Fri, 1 Apr 2005 17:25:55 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DHXOc-0007og-Kj for netdev@oss.sgi.com; Fri, 01 Apr 2005 20:25:54 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHXOY-0008VP-6i; Fri, 01 Apr 2005 20:25:50 -0500 Subject: IPSEC: on behavior of acquire From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu , "David S. Miller" , Masahide NAKAMURA Cc: psec-tools-devel@lists.sourceforge.net, netdev@oss.sgi.com, kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com Content-Type: text/plain Organization: jamalopolous Message-Id: <1112405144.1096.33.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 01 Apr 2005 20:25:44 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1236 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Folks, Theres something wrong in the way acquire works - IMO in both pfkey and netlink. I asked this before but didnt get satisfactory answer. Masahide-san and myself have had private exchanges and we are both unsatisfied with current situation. Theres probably a spec or known good practise documented somewhere ... Let me provide some testcases then theorize. The idea is to simulate a situation where the kernel thinks a km is listening (it could be there but just non-responsive) or just a scenario where the acquire gets lost. 
You need the current events patches to see this.

test1) In one window run setkey -x, then:

ping -c 1 someDST
-1) packet arrives towards outbound
0) larval state created
1) one acquire sent
2) timeout
3) packet dropped; -ESRCH returned
4) larval state deleted

Question 1) Shouldn't the return code be -ERESTART to ask the app to retry?
Question 2) Why is there a hardcoding of 1 try only?

ping -c2 someDST
Same as above (steps -1 to 4), repeated twice, once for each packet sent.

ping -c3 DST
Same as above, repeated 3 times.

test2) With ip x m (but not setkey).

ping -c 1 DST
-1) packet arrives
0) larval state created
Loop:
1) one acquire sent
2) timeout; go to Loop

So the loop has no way to break. ping hangs waiting; the only way to break out is by hitting control-c at the prompt. I think ping gets a -ERESTART, which I believe is the correct signal? When you hit control-c the larval state is deleted. Clearly this is not desirable. We want at some point to give up.

Question: Can we have a configurable max retries (sysctl settable) for acquire - or does it already exist and is just not being used? I couldn't find any staring at the code.

ping -c2/3 DST does not change the above behavior. ping hangs after the first packet, so it doesn't matter.

The conclusion we reached in our discussion is:
a) -ERESTART is the correct signal to return
b) the number of acquire retries should be configurable, preferably as a system-wide value.

Thoughts?
cheers, jamal From herbert@gondor.apana.org.au Fri Apr 1 17:28:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:28:55 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321Sjxf025671 for ; Fri, 1 Apr 2005 17:28:46 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHXR8-0001Ml-00; Sat, 02 Apr 2005 11:28:30 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHXQr-0006Qw-00; Sat, 02 Apr 2005 11:28:13 +1000 Date: Sat, 2 Apr 2005 11:28:13 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev Subject: Re: PATCH: IPSEC xfrm events Message-ID: <20050402012813.GA24575@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112403845.1088.14.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1237 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Hi Jamal: On Fri, Apr 01, 2005 at 08:04:05PM -0500, jamal wrote: > > The issue is that pfkey echoes back a few things from the original > message - important ones being version, pid, seq, and msgtype (as a > sample take a look at pfkey_add()). So these need to be remembered... You're right. 
The pid and seq should be stored in km_event by af_key and xfrm_user before they call km_notify. In fact bring back that the km_type field too and put it in km_event. That'll become useful when we figure out a way to include it in the netlink message so that the originator can be uniquely identified. The version should always be set by the kernel though. This is because the packet we're broadcasting has been regenerated by the kernel. If we ever get PFKEY v3 then in order that all existing applications understand these messages you'll have to reformat them as PFKEY v2 anyway. msgtype should be derived from the event as you did in xfrm_user. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From jaganav@us.ibm.com Fri Apr 1 17:37:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:37:33 -0800 (PST) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321bQ3b026512 for ; Fri, 1 Apr 2005 17:37:26 -0800 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.195.10]) by e32.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j321bK5j733728 for ; Fri, 1 Apr 2005 20:37:20 -0500 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by westrelay01.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j321bKlD200322 for ; Fri, 1 Apr 2005 18:37:20 -0700 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j321bJ1U014970 for ; Fri, 1 Apr 2005 18:37:20 -0700 Received: from imap.linux.ibm.com (imap.rtp.raleigh.ibm.com [9.42.107.100]) by d03av01.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j321bFBZ014935; Fri, 1 Apr 2005 18:37:19 -0700 Received: by imap.linux.ibm.com (Postfix, from userid 48) id 3D36E7C015; Fri, 1 Apr 2005 20:37:14 -0500 (EST) Received: 
from dyn9047018082.beaverton.ibm.com (dyn9047018082.beaverton.ibm.com [9.47.18.82]) by imap.rtp.raleigh.ibm.com (IMP) with HTTP for ; Fri, 1 Apr 2005 20:37:13 -0500 Message-ID: <1112405833.424df749e61b5@imap.linux.ibm.com> Date: Fri, 1 Apr 2005 20:37:13 -0500 From: jaganav@us.ibm.com To: Stephen Hemminger Cc: Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.2.7 X-Originating-IP: 9.47.18.82 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1238 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jaganav@us.ibm.com Precedence: bulk X-list: netdev Quoting Stephen Hemminger : > On Thu, 31 Mar 2005 21:13:39 -0500 > jaganav@us.ibm.com wrote: > > > Quoting Roland Dreier : > > > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or > > > whatever you want to call it) world. However I know that the large > > > majority of InfiniBand use right now is running on Linux, and I hope > > > the Linux community is willing to work with the IB community. > > > > > > > Just want to let everyone know know that we have started an opensource > > effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs). > This > > community has now come up with an architecture > > (http://rdma.sourceforge.net/architecture.pdf) to build this support in > Linux. > > Would really appreciate if you review and provide any comments. 
> > We have just started to hack but no code is available on this project yet.
> >
> > Thanks
> > Venkat
>
> OpenRdma is a misnomer, because as I read your architecture you are trying
> to create a "kernel abstraction layer" for closed source vendor RDMA
> drivers. This will never be accepted, please go back to the drawing board
> and figure out how to make real open source drivers.

First let me say that the purpose of this project is to make the entire stack (with all of the enablement layers), including the drivers, open source. The kernel abstraction layer will be built around the standards-based (opengroup.org/icsc) RNIC-PI interface, which allows the RNIC vendors to open-source their drivers using that interface. BTW, the RNIC-PI interface is work-in-progress and the first draft is targeted to be published soon. Several RNIC adapter vendors who contribute to the openRDMA effort are quite willing to open-source their drivers through the openRDMA project.

BTW, I understand why you got the impression that this is for closed source vendor drivers: our intention is not to allow the kernel verbs provider code (kVP) to be private; presenting it that way was an error. Thanks for pointing this out, we'll make this change soon.
Thanks Venkat From hadi@cyberus.ca Fri Apr 1 17:42:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:43:00 -0800 (PST) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321gsmJ027199 for ; Fri, 1 Apr 2005 17:42:54 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DHXex-0007wz-2o for netdev@oss.sgi.com; Fri, 01 Apr 2005 18:42:47 -0700 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHXf1-0001oe-Em; Fri, 01 Apr 2005 20:42:51 -0500 Subject: Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050402012813.GA24575@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112406164.1088.54.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 01 Apr 2005 20:42:45 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1239 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Herbert, On Fri, 2005-04-01 at 20:28, Herbert Xu wrote: > Hi Jamal: > > On Fri, Apr 01, 2005 at 08:04:05PM -0500, jamal wrote: > > > > The issue is that pfkey echoes back a few things from the original > > message - important ones being version, pid, seq, and msgtype (as 
a
> > sample take a look at pfkey_add()). So these need to be remembered...
>
> You're right. The pid and seq should be stored in km_event by
> af_key and xfrm_user before they call km_notify. In fact bring
> back that the km_type field too and put it in km_event.

Do we need km_type? Given we have the event, seq, and pid (regardless of where it was generated), we have sufficient info to create either a netlink or a pfkey message.

> That'll
> become useful when we figure out a way to include it in the netlink
> message so that the originator can be uniquely identified.
>

The pid seems pretty accurate to describe what process generated the initial message.
Hold on: Ah, I think I may get what you are trying to get to: you want iproute to display something along the lines of "this was created by a pfkey app pid 1534". Did I read you correctly?

> The version should always be set by the kernel though. This is because
> the packet we're broadcasting has been regenerated by the kernel. If
> we ever get PFKEY v3 then in order that all existing applications
> understand these messages you'll have to reformat them as PFKEY v2
> anyway.
>

So always go v2?

> msgtype should be derived from the event as you did in xfrm_user.
>

indeed.

cheers,
jamal

From herbert@gondor.apana.org.au Fri Apr 1 17:46:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:46:47 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321kbQW027801 for ; Fri, 1 Apr 2005 17:46:38 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHXiE-0001SM-00; Sat, 02 Apr 2005 11:46:10 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHXhn-0006TH-00; Sat, 02 Apr 2005 11:45:43 +1000 Date: Sat, 2 Apr 2005 11:45:43 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S.
Miller" , netdev Subject: Re: PATCH: IPSEC xfrm events Message-ID: <20050402014543.GA24861@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112406164.1088.54.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1240 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 08:42:45PM -0500, jamal wrote: > > hold on: Ah, I think i may get what you are trying to get to: You want > iproute to display something along the lines of "this was created by a > pfkey app pid 1534". Did i read you correctly? That's right. 
Someone with a pathological mind might do pfkey and netlink from the same pid :) -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Fri Apr 1 17:46:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 17:46:58 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j321kqw2027873 for ; Fri, 1 Apr 2005 17:46:52 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHXiY-0001TB-00; Sat, 02 Apr 2005 11:46:30 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHXiN-0006Tc-00; Sat, 02 Apr 2005 11:46:19 +1000 Date: Sat, 2 Apr 2005 11:46:19 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev Subject: Re: PATCH: IPSEC xfrm events Message-ID: <20050402014619.GB24861@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112406164.1088.54.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1241 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk 
X-list: netdev On Fri, Apr 01, 2005 at 08:42:45PM -0500, jamal wrote: > > So always go v2? Yes since that's the only version that the kernel knows how to generate. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From jaganav@us.ibm.com Fri Apr 1 18:00:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 18:00:21 -0800 (PST) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32208Nk029126 for ; Fri, 1 Apr 2005 18:00:14 -0800 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j322014I563788 for ; Fri, 1 Apr 2005 21:00:01 -0500 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j32200dg184050 for ; Fri, 1 Apr 2005 19:00:00 -0700 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j321xx6X031851 for ; Fri, 1 Apr 2005 19:00:00 -0700 Received: from imap.linux.ibm.com (imap.rtp.raleigh.ibm.com [9.42.107.100]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j321xwQY031721; Fri, 1 Apr 2005 18:59:59 -0700 Received: by imap.linux.ibm.com (Postfix, from userid 48) id 34C8D7C015; Fri, 1 Apr 2005 20:59:47 -0500 (EST) Received: from dyn9047018082.beaverton.ibm.com (dyn9047018082.beaverton.ibm.com [9.47.18.82]) by imap.rtp.raleigh.ibm.com (IMP) with HTTP for ; Fri, 1 Apr 2005 20:59:46 -0500 Message-ID: <1112407186.424dfc92dc37a@imap.linux.ibm.com> Date: Fri, 1 Apr 2005 20:59:46 -0500 From: jaganav@us.ibm.com To: Dmitry Yusupov Cc: Asgeir Eiriksson , "H. Peter Anvin" , Roland Dreier , open-iscsi@googlegroups.com, "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, Benjamin LaHaise Subject: RE: Linux support for RDMA MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.2.7 X-Originating-IP: 9.47.18.82 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1242 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jaganav@us.ibm.com Precedence: bulk X-list: netdev Quoting Dmitry Yusupov : > On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote: > > Venkat > > > > Your assessment of the IB vs. Ethernet latencies isn't necessarily > > correct. > > - you already have available low latency 10GE switches (< 1us > > port-to-port) > > - you already have available low latency (cut-through processing) 10GE > > TOE engines > > > > The Veritest verified 10GE TOE end-to-end latency is < 10us today > > (end-to-end being from a Linux user-space-process to a Linux > > user-space-process through a switch; full report with detail of the > > setup is available at > > http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf) > > > > For comparison: the published IB latency numbers are around 5us today > > and those use a polling receiver, and those don't include a context > > switch(es) as does the Ethernet number quoted above. > > yep. I should agree in here. On 10Gbps network latencies numbers are > around 5-15us. Even with non-TOE card, I managed to get 13us latency > with regular TCP/IP stack. 
> > [root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P - h > 17.1.1.227 > Latency: 0.000013 > Now starting main loop > 0: 256 bytes 7 times --> 131.37 Mbps in 0.000015 sec > 1: 512 bytes 65 times --> 239.75 Mbps in 0.000016 sec > > Dima When I mentioned latency, I meant the end-to-end measurement (i.e. from app to app), not just the switching or port-to-port latencies. With IB, I have seen the best numbers ranging from 5 to 7 us, which is far better than Ethernet today (15 to 35 us) with the network we have. I am not denying the fact that Ethernet is trying to close the gap here, but IB has a relative advantage now. Good to see you have got 5us in one case, but what were the switch and adapter latencies in this case? Thanks Venkat From herbert@gondor.apana.org.au Fri Apr 1 18:11:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 18:11:07 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j322AwI7030178 for ; Fri, 1 Apr 2005 18:10:59 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHY5c-0001Zm-00; Sat, 02 Apr 2005 12:10:20 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHY55-0006Vk-00; Sat, 02 Apr 2005 12:09:47 +1000 Date: Sat, 2 Apr 2005 12:09:47 +1000 To: "David S. 
Miller" Cc: kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [IPSEC]: Kill nested read lock by deleting xfrm_init_tempsel Message-ID: <20050402020947.GA24998@gondor.apana.org.au> References: <20050214221200.GA18465@gondor.apana.org.au> <20050214221433.GB18465@gondor.apana.org.au> <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> <20050331212325.5e996432.davem@davemloft.net> <20050402004956.GA24339@gondor.apana.org.au> <20050401172007.7296eced.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050401172007.7296eced.davem@davemloft.net> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1243 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 05:20:07PM -0800, David S. Miller wrote: > On Sat, 2 Apr 2005 10:49:56 +1000 > Herbert Xu wrote: > > > The second patch creates a dead lock since it does a nested read > > lock. The solution is simply to get rid of xfrm_init_tempsel > > and call the afinfo version directly. > > read locks nest even in the presence of pending writers Doh! I should've read the code first :) It's still a valid clean-up patch though. There is another reason why it won't deadlock. We don't actually ever hold the write lock on afinfo :) Is there any reason why we don't just use xfrm_state_afinfo_lock instead of afinfo->lock? 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Fri Apr 1 18:14:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 18:14:36 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j322ES0j030810 for ; Fri, 1 Apr 2005 18:14:29 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHY8k-0001bB-00; Sat, 02 Apr 2005 12:13:35 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHY7o-0006WQ-00; Sat, 02 Apr 2005 12:12:36 +1000 Date: Sat, 2 Apr 2005 12:12:36 +1000 To: jamal Cc: "David S. Miller" , Masahide NAKAMURA , psec-tools-devel@lists.sourceforge.net, netdev@oss.sgi.com, kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com Subject: Re: IPSEC: on behavior of acquire Message-ID: <20050402021236.GA25054@gondor.apana.org.au> References: <1112405144.1096.33.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112405144.1096.33.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1244 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 08:25:44PM -0500, jamal wrote: > > The conclusion we reached in our discussion is: > a) -ERESTART is the correct signal to return > b) number of acquire retries should be configurable preferably a system > wide value. > > Thoughts? 
Once we have the xfrm resolution stuff that Patrick is working on, we can have knobs for these cases just like those in the neighbour code. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From greg@kroah.com Fri Apr 1 21:29:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 21:29:29 -0800 (PST) Received: from perch.kroah.org (mail.kroah.org [69.55.234.183]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j325TG8L007523 for ; Fri, 1 Apr 2005 21:29:16 -0800 Received: from [192.168.0.10] (c-24-22-118-199.hsd1.or.comcast.net [24.22.118.199]) (authenticated) by perch.kroah.org (8.11.6/8.11.6) with ESMTP id j325Rsi06304; Fri, 1 Apr 2005 21:27:54 -0800 Received: from greg by echidna.kroah.org with local (masqmail 0.2.19) id 1DHbAY-4ZB-00; Fri, 01 Apr 2005 21:27:38 -0800 Date: Fri, 1 Apr 2005 21:27:38 -0800 From: Greg KH To: jaganav@us.ibm.com Cc: Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050402052738.GA17506@kroah.com> References: <20050401154348.553f3c46@dxpl.pdx.osdl.net> <1112405833.424df749e61b5@imap.linux.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112405833.424df749e61b5@imap.linux.ibm.com> User-Agent: Mutt/1.5.8i X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1245 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greg@kroah.com Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote: > > Several RNIC adapter vendors, who contribute to the > openRDMA effort, are quite willing to opensource > their drivers through openRDMA project. "Several"? Why not all? And why the dual license? What good is writing Linux kernel code that is BSD licensed for such a core component? Didn't you all learn from the openib licensing mess? 
thanks, greg k-h From greg@kroah.com Fri Apr 1 22:02:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 22:02:53 -0800 (PST) Received: from perch.kroah.org (mail.kroah.org [69.55.234.183]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3262fL9008963 for ; Fri, 1 Apr 2005 22:02:41 -0800 Received: from [192.168.0.10] (c-24-22-118-199.hsd1.or.comcast.net [24.22.118.199]) (authenticated) by perch.kroah.org (8.11.6/8.11.6) with ESMTP id j3262Ri06657; Fri, 1 Apr 2005 22:02:27 -0800 Received: from greg by echidna.kroah.org with local (masqmail 0.2.19) id 1DHbi4-4dJ-00; Fri, 01 Apr 2005 22:02:16 -0800 Date: Fri, 1 Apr 2005 22:02:16 -0800 From: Greg KH To: jaganav@us.ibm.com Cc: Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050402060216.GA17766@kroah.com> References: <20050401154348.553f3c46@dxpl.pdx.osdl.net> <1112405833.424df749e61b5@imap.linux.ibm.com> <20050402052738.GA17506@kroah.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050402052738.GA17506@kroah.com> User-Agent: Mutt/1.5.8i X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1246 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greg@kroah.com Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 09:27:38PM -0800, Greg KH wrote: > On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote: > > > > Several RNIC adapter vendors, who contribute to the > > openRDMA effort, are quite willing to opensource > > their drivers through 
openRDMA project. > > "Several"? Why not all? > > And why the dual license? What good is writing Linux kernel code that > is BSD licensed for such a core component? Didn't you all learn from > the openib licensing mess? Oh, and for those of you who might not know what mess I am talking about: The openib code was set up to be dual GPL and BSD licensed for the express purpose of taking the openib code and placing it into a closed source operating system (not any of the *BSDs). Needless to say, this has prevented me from doing any openib work, and probably the same for a number of other Linux kernel developers. If you all wish to duplicate this stupidity, feel free, but do not expect to get any help from the community... thanks, greg k-h From a.kasparas@gmc.lt Fri Apr 1 23:10:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 23:10:19 -0800 (PST) Received: from smtp02.omnitel.sun (smtp02-neptunas.omnitel.net [194.176.45.2]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j327ADSH011054 for ; Fri, 1 Apr 2005 23:10:14 -0800 Received: from smtp04-neptunas.omnitel.net ([194.176.45.42]) by smtp02.omnitel.sun (Sun Java System Messaging Server 6.1 HotFix 0.01 (built Jun 24 2004)) with ESMTP id <0IEB0018L58VLY00@smtp02.omnitel.sun> for netdev@oss.sgi.com; Sat, 02 Apr 2005 10:10:07 +0300 (EEST) Received: from smtp04-neptunas.omnitel.net (localhost [127.0.0.1]) by smtp04-neptunas.omnitel.net (Postfix) with SMTP id 59872398079; Sat, 02 Apr 2005 10:10:05 +0300 (EEST) Received: from [192.168.0.128] (unknown [62.212.195.62]) by smtp04-neptunas.omnitel.net (Postfix) with ESMTP id DB5F9398069; Sat, 02 Apr 2005 10:10:04 +0300 (EEST) Date: Sat, 02 Apr 2005 10:10:05 +0300 From: Aidas Kasparas Subject: Re: IPSEC: on behavior of acquire In-reply-to: <1112405303.1096.37.camel@jzny.localdomain> To: hadi@cyberus.ca Cc: ipsec-tools-devel@lists.sourceforge.net, netdev@oss.sgi.com, nakam@linux-ipv6.org Message-id: <424E454D.4090402@gmc.lt> MIME-version: 1.0 Content-type: 
text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: lt, en, ru, fr X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime References: <1112405303.1096.37.camel@jzny.localdomain> User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1247 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: a.kasparas@gmc.lt Precedence: bulk X-list: netdev jamal wrote: > test1)on one window run setkey -x: > > ping -c 1 someDST > > -1) packet arrives towards outbound > 0) Larval state created > 1) one acquire sent. > 2) timeout. > 3) packet dropped. -ESRCH returned. > 4) larval state deleted > > So question 1): Shouldnt the return code be -ERESTART to ask > the app to retry? > question 2) Why is there a hardcoding of 1 try only? Re 1 try only. There is little sense in doing more tries. If there is no daemon listening to pfkey messages, then no connection will be made no matter how many retries you do. If the daemon/link/peer is slow and the SA was not established before the timeout expired, then a repeated acquire will simply be ignored (the daemon will find out that negotiation is already in progress, that there is no reason to start another negotiation, and will therefore drop that acquire request). And the only situation where repeated acquires may help is when pfkey messages are lost. But pfkey was not designed to survive message losses, therefore you should not operate your boxes in a mode where lost pfkey messages are the rule, not the exception. And on the other hand, occasional pfkey message losses can be worked around by application/user retry. Re error code returned. Error codes returned by pfkey never were perfect. But your experiment is not perfect either. You sent pings with no KE daemon running. 
pfkey code found that there is nothing receiving acquire messages => there is no chance that any process will setup required SAs and tried to inform about that (I agree, return code is not very informative, at least until you learn about reasons why it is such). If you would have racoon (or other pfkey based ISAKMP daemon) running, you would get "resource temporarily unavailable" (don't know which error code corresponds to that message), which IMHO is ok (if it is not, please explain). Re netlink behaviour I can not comment as I don't use it for ipsec purposes, but would like to read similar explanation. Reason for that - idea that ipsec-tools one day could support operation via netlink is not ruled out of our minds. Yet, afaik nobody is working on it at the moment. -- Aidas Kasparas IT administrator GM Consult Group, UAB From jaganav@us.ibm.com Fri Apr 1 23:30:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 23:30:27 -0800 (PST) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j327UFIh012085 for ; Fri, 1 Apr 2005 23:30:21 -0800 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e31.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j327U8ua333834 for ; Sat, 2 Apr 2005 02:30:08 -0500 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j327U8dg153986 for ; Sat, 2 Apr 2005 00:30:08 -0700 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j327U70n002154 for ; Sat, 2 Apr 2005 00:30:08 -0700 Received: from imap.linux.ibm.com (imap.rtp.raleigh.ibm.com [9.42.107.100]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j327U3OI001848; Sat, 2 Apr 2005 00:30:07 -0700 Received: by imap.linux.ibm.com (Postfix, from userid 48) id 05DA67C015; Sat, 2 Apr 2005 02:29:51 -0500 (EST) Received: from 
sig-9-65-29-50.mts.ibm.com (sig-9-65-29-50.mts.ibm.com [9.65.29.50]) by imap.rtp.raleigh.ibm.com (IMP) with HTTP for ; Sat, 2 Apr 2005 02:29:51 -0500 Message-ID: <1112426991.424e49ef57e2b@imap.linux.ibm.com> Date: Sat, 2 Apr 2005 02:29:51 -0500 From: jaganav@us.ibm.com To: Greg KH Cc: Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.2.7 X-Originating-IP: 9.65.29.50 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1248 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jaganav@us.ibm.com Precedence: bulk X-list: netdev Quoting Greg KH : > On Fri, Apr 01, 2005 at 09:27:38PM -0800, Greg KH wrote: > > On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote: > > > > > > Several RNIC adapter vendors, who contribute to the > > > openRDMA effort, are quite willing to opensource > > > their drivers through openRDMA project. > > > > "Several"? Why not all? Because I haven't heard from 'all' of them yet that they would opensource. I am sure every vendor will do when the most of the other vendors are opensourcing it but I can't speak for them. I have asked in the past and will continue to ask every vendor to opensource their driver and make it part of openRDMA stack. > > > > And why the dual license? What good is writing Linux kernel code that > > is BSD licensed for such a core component? Didn't you all learn from > > the openib licensing mess? 
> > Oh, and for those of you who might not know what mess I am talking > about: > > The openib code was set up to be dual GPL and BSD licensed for the > express purpose of taking the openib code and placing it into a closed > source operating system (not any of the *BSDs). Needless to say, this > has prevented me from doing any openib work, and probably the same for a > number of other Linux kernel developers. > Absolutely understand the dual-license mess with openIB code. -:) However the intention of dual license with OpenRDMA is not for placing the code in closed source OSes but specifically for BSD* and in fact, the request is specifically made by the most adapter vendors as they wanted to offer the same on BSD platforms as well. BTW, unlike OpenIB initial stack (i.e. Gen1) which was already developed when it got opensourced, the openRDMA code is developed from scratch in true opensource fashion (of course, OpenIB has also followed this approach for their next generation stack though) with no ifdef code for BSD*. If this dual license is a concern to other kernel developers as well from contributing to OpenRDMA, we would seriously consider this and discuss with the adapter vendors. 
Thanks Venkat From herbert@gondor.apana.org.au Sat Apr 2 00:22:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 00:22:34 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j328MLgN017287 for ; Sat, 2 Apr 2005 00:22:24 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHdt3-0002t0-00; Sat, 02 Apr 2005 18:21:45 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHdsP-0003Lr-00; Sat, 02 Apr 2005 18:21:05 +1000 From: Herbert Xu To: dada1@cosmosbay.com (Eric Dumazet) Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Cc: davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se Organization: Core In-Reply-To: <424DD78D.7070001@cosmosbay.com> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Sat, 02 Apr 2005 18:21:05 +1000 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1249 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Eric Dumazet wrote: > > OK this patch includes everything... > > - Locking abstraction > - rt_check_expire() fixes > - New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second > - New gc_debug sysctl to let sysadmin tune gc > - Less memory used by hash table (spinlocks moved to a smaller table) > - sizing of spinlocks table depends on NR_CPUS > - hash table allocated using alloc_large_system_hash() function > - header fix for /proc/net/stat/rt_cache This patch is doing too many things. How about splitting it up? 
For instance the spin lock stuff is pretty straightforward and should be in its own patch. The benefits of the GC changes are not obvious to me. rt_check_expire is simply meant to kill off old entries. It's not really meant to be used to free up entries when the table gets full. rt_garbage_collect on the other hand is designed to free entries when it is needed. Eric raised the point that rt_garbage_collect is pretty expensive. So what about amortising its cost a bit more? For instance, we can set a new threshold that's lower than gc_thresh and perform GC on the chain being inserted in rt_intern_hash if we're above that threshold. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From mroos@tartu.cyber.ee Sat Apr 2 00:41:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 00:41:22 -0800 (PST) Received: from tartu.cyber.ee (tartu.cyber.ee [193.40.6.68]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j328fHft018280 for ; Sat, 2 Apr 2005 00:41:18 -0800 Received: Message by Barricade tartu.cyber.ee with ESMTP id j328LgA06688; Sat, 2 Apr 2005 11:21:42 +0300 Received: from rhn.tartu-labor (rhn.tartu-labor [192.168.74.17]) by ondatra.tartu-labor (Postfix) with ESMTP id 65A2314C48; Sat, 2 Apr 2005 10:41:11 +0200 (EET) Received: from mroos by rhn.tartu-labor with local (Exim 4.50) id 1DHeBr-0002mb-2L; Sat, 02 Apr 2005 11:41:11 +0300 From: Meelis Roos To: hadi@cyberus.ca, netdev@oss.sgi.com Subject: Re: RFC: Redirect-Device In-Reply-To: <1112303627.1073.71.camel@jzny.localdomain> User-Agent: tin/1.7.8-20050315 ("Scalpay") (UNIX) (Linux/2.6.12-rc1 (i686)) Message-Id: Date: Sat, 02 Apr 2005 11:41:11 +0300 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1250 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: mroos@linux.ee Precedence: bulk X-list: netdev j> I must be missing something: What is it that this device can do that the j> mirred action cant do? I know what I am missing here: documentation. There is very basic documentation about tc qdisc+class+filter level and almost nothing on the newer features. Without good documentation only some developers understand it. -- Meelis Roos From dada1@cosmosbay.com Sat Apr 2 01:23:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 01:23:23 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j329NGlo020425 for ; Sat, 2 Apr 2005 01:23:16 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j329LV9q008040; Sat, 2 Apr 2005 11:21:36 +0200 Message-ID: <424E641A.1020609@cosmosbay.com> Date: Sat, 02 Apr 2005 11:21:30 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Herbert Xu CC: davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Sat, 02 Apr 2005 11:21:37 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1251 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Herbert Xu a écrit : > Eric Dumazet wrote: > >>OK this patch includes everything... 
>>
>>  - Locking abstraction
>>  - rt_check_expire() fixes
>>  - New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second
>>  - New gc_debug sysctl to let sysadmin tune gc
>>  - Less memory used by hash table (spinlocks moved to a smaller table)
>>  - sizing of spinlocks table depends on NR_CPUS
>>  - hash table allocated using alloc_large_system_hash() function
>>  - header fix for /proc/net/stat/rt_cache
>
> This patch is doing too many things. How about splitting it up?
>
> For instance the spin lock stuff is pretty straightforward and
> should be in its own patch.
>
> The benefits of the GC changes are not obvious to me. rt_check_expire
> is simply meant to kill off old entries. It's not really meant to be
> used to free up entries when the table gets full.

Well, I began my work because of the overflow bug in rt_check_expire()...
Then I realized this function could not work as expected. On a loaded machine, one timer tick is 1 ms. During this time, the number of chains that are scanned is ridiculously small. With the standard timer of 60 seconds, the fact is that rt_check_expire() is useless.

> rt_garbage_collect on the other hand is designed to free entries
> when it is needed. Eric raised the point that rt_garbage_collect
> is pretty expensive. So what about amortising its cost a bit more?

Yes. rt_garbage_collect() has serious problems. But this function is so complex that I don't want to touch it; I would rather let the experts do it if they want.

But then one may wonder why we have two similar functions that do basically the same thing: garbage collection.
The rtstat -i 1 output from one of our production machines is:

rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
 2618087|   28581|    7673|       0|       0|       0|       0|       0|    1800|    1450|       0|       0|       0|       0|       0|   37630|    4783|
 2618689|   25444|    4918|       0|       0|       0|       0|       0|    2051|    1699|       0|       0|       0|       0|       0|   27741|    5461|
 2619369|   25000|    4567|       0|       0|       0|       0|       0|    1860|    1304|       0|       0|       0|       0|       0|   26606|    4563|
 2618396|   24830|    4633|       0|       0|       0|       0|       0|    1959|    1492|       0|       0|       0|       0|       0|   26643|    4930|

Without serious tuning, this machine could not handle this load, or even half of it.

Crashes usually occur when secret_interval has elapsed: rt_cache_flush(0) is called, and the whole machine begins to die.

> For instance, we can set a new threshold that's lower than gc_thresh
> and perform GC on the chain being inserted in rt_intern_hash if we're
> above that threshold.

We could also try to perform GC on L1_CACHE_SIZE/sizeof(struct rt_hash_bucket) chains, not only the 'current chain', to make full use of the cache miss.
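To make the arithmetic in that last suggestion concrete, here is a minimal user-space sketch of which buckets would be visited. It is an illustration only: CACHE_LINE_BYTES (set to 64 here), struct hash_bucket, buckets_per_line() and bucket_group() are hypothetical names invented for this example, and the sketch assumes the bucket array starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the real value is architecture specific. */
#define CACHE_LINE_BYTES 64

/* Hypothetical stand-in for struct rt_hash_bucket: one chain head. */
struct hash_bucket {
    void *chain;
};

/* How many bucket heads fit in one cache line. */
static size_t buckets_per_line(void)
{
    size_t n = CACHE_LINE_BYTES / sizeof(struct hash_bucket);
    return n ? n : 1;
}

/*
 * Given the bucket that was just touched, compute the half-open index
 * range [*first, *end) of all buckets sharing its cache line, assuming
 * the table itself is cache-line aligned. The range is clamped to the
 * table size.
 */
static void bucket_group(size_t idx, size_t table_size,
                         size_t *first, size_t *end)
{
    size_t n = buckets_per_line();

    assert(idx < table_size);
    *first = idx - (idx % n);
    *end = *first + n;
    if (*end > table_size)
        *end = table_size;
}
```

A caller would run its per-chain GC over every index in [first, end) instead of only over idx, since those chain heads were pulled in by the same cache miss.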
> > Cheers, Thank you From akpm@osdl.org Sat Apr 2 01:56:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 01:56:49 -0800 (PST) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j329ufvY021947 for ; Sat, 2 Apr 2005 01:56:42 -0800 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j329uas4032011 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sat, 2 Apr 2005 01:56:36 -0800 Received: from bix (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with SMTP id j329uZIu002985; Sat, 2 Apr 2005 01:56:35 -0800 Date: Sat, 2 Apr 2005 01:56:22 -0800 From: Andrew Morton To: netdev@oss.sgi.com Cc: kernel@wpascanner.com Subject: Fw: [Bugme-new] [Bug 4434] New: Tulip based NIC card causes hard lock up of PC Message-Id: <20050402015622.41dff439.akpm@osdl.org> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.10; i386-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1252 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: netdev Begin forwarded message: Date: Sat, 2 Apr 2005 01:49:50 -0800 From: bugme-daemon@osdl.org To: bugme-new@lists.osdl.org Subject: [Bugme-new] [Bug 4434] New: Tulip based NIC card causes hard lock up of PC http://bugme.osdl.org/show_bug.cgi?id=4434 Summary: Tulip based NIC card causes hard lock up of PC Kernel Version: 2.6.11 Status: NEW Severity: high Owner: acme@conectiva.com.br Submitter: kernel@wpascanner.com Distribution: Knoppix V3.8 CeBIT, V3.7 PC-Welt, ANY Knoppix under kernel 2.6.x Hardware Environment: #1 FIC PA-2007 MB 160MB RAM 
BIOS V1.09CD12 #2 ABIT K7R MB 384MB RAM LAN Cards OEM DEC Tulip 21041 DLink DE-530+ LAN Cards Intel 21143 Tulip based Software Environment: de4x5 Problem Description: Hard lock up on setting up LAN/NIC card Steps to reproduce: Can not boot to working enviroment with DHCP enabled (default for Knoppix) or after booting via NODHCP cheat code on command line and using netcardconfig results in the hard lock up. See: http://www.knoppix.net/forum/viewtopic.php?t=17985&highlight= http://www.knoppix.net/forum/viewtopic.php?t=17986&highlight= ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From linux781@gmail.com Sat Apr 2 02:31:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 02:31:45 -0800 (PST) Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.197]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32AVex2023541 for ; Sat, 2 Apr 2005 02:31:41 -0800 Received: by zproxy.gmail.com with SMTP id 34so92309nzf for ; Sat, 02 Apr 2005 02:31:35 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:mime-version:content-type:content-transfer-encoding; b=m8KmkOw5qebPHfeH+i3pMbrY/IbFik0mgjgGVabMStvpnnuBglcvbkF810DYXX7mhlZyBICiTYXcoExX3TB/uLXNrMC9g5TzzrVn9jW1V0kD9Z8MyHmIMWY30VUCqz/HWq37msrhni/axG7jZg18xPMzdOqKIHcs/U7DZb8EZL0= Received: by 10.36.5.5 with SMTP id 5mr6599nze; Sat, 02 Apr 2005 02:31:35 -0800 (PST) Received: by 10.36.58.7 with HTTP; Sat, 2 Apr 2005 02:31:35 -0800 (PST) Message-ID: <72252ed05040202313a309e77@mail.gmail.com> Date: Sat, 2 Apr 2005 05:31:35 -0500 From: Akshay Kawale Reply-To: Akshay Kawale To: netdev@oss.sgi.com Subject: Problem accessing IP header fields. 
Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1253 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linux781@gmail.com Precedence: bulk X-list: netdev Hi, I am trying to access the tot_len field in the IP Header using a sk_buff structure inside a Netfilter hook. I do something like: (**skb).nh.iph->tot_len += 64 I have tried other variants of the same statement but none of them work. I want to increment the length by 64 bytes, but it gives me an error saying that I am trying to access an 'incomplete data type'. Can anyone shed some light on this problem? tot_len if of type __u16 (unsigned short int). Thanks. - Akshay From herbert@gondor.apana.org.au Sat Apr 2 03:24:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 03:25:06 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32BOrml027752 for ; Sat, 2 Apr 2005 03:24:54 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHgjp-0003nQ-00; Sat, 02 Apr 2005 21:24:25 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHgiW-00063U-00; Sat, 02 Apr 2005 21:23:04 +1000 Date: Sat, 2 Apr 2005 21:23:04 +1000 To: Eric Dumazet Cc: davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, hadi@cyberus.ca Subject: Get rid of rt_check_expire and rt_garbage_collect Message-ID: <20050402112304.GA11321@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <424E641A.1020609@cosmosbay.com> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: 
ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1254 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev

On Sat, Apr 02, 2005 at 11:21:30AM +0200, Eric Dumazet wrote:
>
> Well, I began my work because of the overflow bug in rt_check_expire()...
> Then I realize this function could not work as expected. On a loaded
> machine, one timer tick is 1 ms.
> During this time, number of chains that are scanned is ridiculous.
> With the standard timer of 60 second, fact is rt_check_expire() is useless.

I see. What we've got here is a scalability problem with respect
to the number of hash buckets. As the number of buckets increases,
the amount of work the timer GC has to perform increases proportionally.

Since the timer GC parameters are fixed, this will eventually break.

Rather than changing the timer GC so that it runs more often to keep
up with the large routing cache, we should get out of this by reducing
the amount of work we have to do.

Imagine an ideal balanced hash table with 2.6 million entries. That
is, all incoming/outgoing packets belong to flows that are already in
the hash table. Imagine also that there is no PMTU/link failure taking
place so all entries are valid forever.

In this state there is absolutely no need to execute the timer GC.

Let's remove one of those assumptions and allow there to be entries
which need to expire after a set period.

Instead of having the timer GC clean them up, we can move the expire
check to the place where the entries are used. That is, we make
ip_route_input/ip_route_output/ipv4_dst_check check whether the
entry has expired.

On the face of it we're doing more work since every routing cache
hit will need to check the validity of the dst. However, because
it's a single subtraction it is actually pretty cheap.
There is also no additional cache miss compared to doing it in the timer GC since we have to read the dst anyway. Let's go one step further and make the routing cache come to life. Now there are new entries coming in and we need to remove old ones in order to make room for them. That task is currently carried out by the timer GC in rt_check_expire and on demand by rt_garbage_collect. Either way we have to walk the entire routing cache looking for entries to get rid of. This is quite expensive when the routing cache is large. However, there is a better way. The reason we keep a cap on the routing cache (for a given hash size) is so that individual chains do not degenerate into long linked lists. In other words, we don't really care about how many entries there are in the routing cache. But we do care about how long each hash chain is. So instead of walking the entire routing cache to keep the number of entries down, what we should do is keep each hash chain as short as possible. Assuming that the hash function is good, this should achieve the same end result. Here is how it can be done: Every time a routing entry is inserted into a hash chain, we perform GC on that chain unconditionally. It might seem that we're doing more work again. However, as before because we're traversing the chain anyway, it is very cheap to perform the GC operations which mainly involve the checks in rt_may_expire. 
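As a rough illustration of the two ideas above (expiry checked where an entry is used, and GC performed only on the chain being inserted into), here is a minimal user-space sketch in C. It is not the code in net/ipv4/route.c and does not use any kernel API: struct entry, cache_lookup(), cache_insert(), NBUCKETS and MAX_CHAIN are hypothetical names and values picked for the example, and the eviction policy (drop expired entries, then trim the chain to a fixed cap) is only an assumption standing in for the checks done by rt_may_expire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS  8   /* hypothetical: tiny table, for illustration only */
#define MAX_CHAIN 4   /* hypothetical cap on the length of a single chain */

struct entry {
	struct entry *next;
	unsigned int key;
	unsigned long expires;  /* absolute time after which the entry is dead */
};

static struct entry *table[NBUCKETS];

static size_t bucket_of(unsigned int key)
{
	return key % NBUCKETS;
}

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static int after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Lookup: the expiry test is one subtraction on an entry that has to be
 * read anyway, so no timer needs to walk the table to invalidate it.
 */
struct entry *cache_lookup(unsigned int key, unsigned long now)
{
	struct entry *e;

	for (e = table[bucket_of(key)]; e; e = e->next)
		if (e->key == key)
			return after(now, e->expires) ? NULL : e;
	return NULL;
}

/*
 * Insert: garbage-collect only the chain we are about to touch.
 * Returns the chain length after the insert.
 */
size_t cache_insert(unsigned int key, unsigned long now, unsigned long ttl)
{
	size_t b = bucket_of(key), len = 0;
	struct entry **pp = &table[b], *e;

	assert(b < NBUCKETS);
	while ((e = *pp) != NULL) {
		/* Drop expired entries, stale copies of this key, and
		 * anything that would push the chain past the cap. */
		if (after(now, e->expires) || e->key == key ||
		    len >= MAX_CHAIN - 1) {
			*pp = e->next;
			free(e);
		} else {
			len++;
			pp = &e->next;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return len;
	e->key = key;
	e->expires = now + ttl;
	e->next = table[b];  /* newest entry goes to the head of the chain */
	table[b] = e;
	return len + 1;
}
```

Because new entries go to the head, the entries trimmed by the cap are the oldest ones in that chain. An expired entry that is never looked up again stays allocated until the next insert into its chain, which is the memory trade-off of doing without a global scan.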
OK that's enough thinking and it's time to write some code to see whether this is all bullshit :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From zilvinas@barclay.balt.net Sat Apr 2 04:26:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 04:28:20 -0800 (PST) Received: from barclay.balt.net (root@barclay.balt.net [195.14.162.78]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32CQiid001289 for ; Sat, 2 Apr 2005 04:26:45 -0800 Received: from barclay.balt.net (zilvinas@localhost [127.0.0.1]) by barclay.balt.net (8.13.2/8.13.1/Debian-15) with ESMTP id j32CPsuD007894; Sat, 2 Apr 2005 15:25:54 +0300 Received: (from zilvinas@localhost) by barclay.balt.net (8.13.2/8.13.1/Submit) id j32CPrsa007893; Sat, 2 Apr 2005 15:25:53 +0300 Date: Sat, 2 Apr 2005 15:25:53 +0300 From: Zilvinas Valinskas To: Aidas Kasparas Cc: hadi@cyberus.ca, ipsec-tools-devel@lists.sourceforge.net, netdev@oss.sgi.com, nakam@linux-ipv6.org Subject: Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire Message-ID: <20050402122553.GA7521@gemtek.lt> Reply-To: Zilvinas Valinskas References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <424E454D.4090402@gmc.lt> X-Attribution: Zilvinas X-Url: http://www.gemtek.lt/ User-Agent: Mutt/1.5.6+20040907i X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1255 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zilvinas@gemtek.lt Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 10:10:05AM +0300, Aidas Kasparas wrote: > > > jamal wrote: > >test1)on one window run setkey -x: > > > >ping -c 1 someDST > > > >-1) packet arrives towards outbound 
> >0) Larval state created
> >1) one acquire sent.
> >2) timeout.
> >3) packet dropped. -ESRCH returned.
> >4) larval state deleted
> >
> >So question 1): Shouldn't the return code be -ERESTART to ask
> >the app to retry?
> >question 2) Why is there a hardcoding of 1 try only?
>
> Re 1 try only. There is little sense to do more tries. If there is no
> daemon listening to pfkey messages, then no connection will be made no
> matter how many retries you'll do. If daemon/link/peer is slow and SA
> was not established before timeout expired, then repeated acquire will
> be simply ignored (daemon will find out that negotiation is already in
> progress, there is no reason to start another negotiation and therefore
> will drop that acquire request). And the only situation where repeated
> acquires may help is when pfkey messages are lost. But pfkey was not
> designed to survive message losses, therefore you should not operate your
> boxes in mode when lost pfkey messages are a rule, not an exception. And
> on the other hand, occasional pfkey message losses can be worked around
> by applications/user retry.
>
> Re error code returned. Error codes returned by pfkey never were
> perfect. But your experiment is not perfect too. You sent pings with no
> KE daemon running. pfkey code found that there is nothing receiving
> acquire messages => there is no chance that any process will setup
> required SAs and tried to inform about that (I agree, return code is not
> very informative, at least until you learn about reasons why it is
> such). If you would have racoon (or other pfkey based ISAKMP daemon)
> running, you would get "resource temporarily unavailable" (don't know
> which error code corresponds to that message), which IMHO is ok (if it
> is not, please explain).

EBUSY I think it is. I am not entirely sure it is ok to return such an
error; some applications do not cope nicely with it.
Perhaps ECONNREFUSED is more reasonable, as it doesn't break old apps'
assumption (connection cannot be established, doesn't matter if that is
due to routing or IPsec SPD or anything else). Although it is quite
simple to fix applications to handle EBUSY and retry ...

I thought it was annoying that applications quit because of EBUSY when
I first tried IPsec. Now I think it is quite handy, especially from
scripts: I am sure that if something goes wrong, ping (or another
application) won't block ...

>
> Re netlink behaviour I can not comment as I don't use it for ipsec
> purposes, but would like to read similar explanation. Reason for that -
> idea that ipsec-tools one day could support operation via netlink is not
> ruled out of our minds. Yet, afaik nobody is working on it at the moment.
>
>
> --
> Aidas Kasparas
> IT administrator
> GM Consult Group, UAB

From khc@pm.waw.pl Sat Apr 2 05:29:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 05:29:30 -0800 (PST) Received: from khc.piap.pl (khc.piap.pl [195.187.100.11]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32DTEYT004128 for ; Sat, 2 Apr 2005 05:29:16 -0800 Received: by khc.piap.pl (Postfix, from userid 500) id F0E7E1084C; Sat, 2 Apr 2005 15:29:12 +0200 (CEST) To: Jeff Garzik Cc: Subject: [PATCH] Generic HDLC update From: Krzysztof Halasa Date: Sat, 02 Apr 2005 15:29:12 +0200 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1256 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: khc@pm.waw.pl Precedence: bulk X-list: netdev

--=-=-=

Hi,

The attached patch updates generic HDLC to version 1.18. Lab-tested.
Please apply to Linux 2.6. Thanks.

Changes:
- doc updates
- added Cisco LMI support to Frame-Relay code
- cleaned hdlc_fr.c a bit, removed some orphaned #defines etc.
- fixed a problem with non-functional LMI in FR DCE mode.
- changed diagnostic messages to better conform to FR standards
- all protocols: information about carrier changes (DCD line) is now
  printed to kernel logs.

Signed-Off-By: Krzysztof Halasa -- Krzysztof Halasa --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=hdlc-2.6-1.18.patch --- linux-2.6/Documentation/networking/generic-hdlc.txt 25 May 2003 22:13:37 -0000 1.4 +++ linux-2.6/Documentation/networking/generic-hdlc.txt 2 Apr 2005 13:12:18 -0000 @@ -1,21 +1,21 @@ Generic HDLC layer Krzysztof Halasa -January, 2003 Generic HDLC layer currently supports: -- Frame Relay (ANSI, CCITT and no LMI), with ARP support (no InARP). - Normal (routed) and Ethernet-bridged (Ethernet device emulation) - interfaces can share a single PVC. -- raw HDLC - either IP (IPv4) interface or Ethernet device emulation. -- Cisco HDLC, -- PPP (uses syncppp.c), -- X.25 (uses X.25 routines). - -There are hardware drivers for the following cards: -- C101 by Moxa Technologies Co., Ltd. -- RISCom/N2 by SDL Communications Inc. -- and others, some not in the official kernel. +1. Frame Relay (ANSI, CCITT, Cisco and no LMI). + - Normal (routed) and Ethernet-bridged (Ethernet device emulation) + interfaces can share a single PVC. + - ARP support (no InARP support in the kernel - there is an + experimental InARP user-space daemon available on: + http://www.kernel.org/pub/linux/utils/net/hdlc/). +2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation. +3. Cisco HDLC. +4. PPP (uses syncppp.c). +5. X.25 (uses X.25 routines). + +Generic HDLC is a protocol driver only - it needs a low-level driver +for your particular hardware. Ethernet device emulation (using HDLC or Frame-Relay PVC) is compatible with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging). @@ -24,7 +24,7 @@ Make sure the hdlc.o and the hardware driver are loaded. It should create a number of "hdlc" (hdlc0 etc) network devices, one for each WAN port. 
You'll need the "sethdlc" utility, get it from: - http://hq.pm.waw.pl/hdlc/ + http://www.kernel.org/pub/linux/utils/net/hdlc/ Compile sethdlc.c utility: gcc -O2 -Wall -o sethdlc sethdlc.c @@ -52,12 +52,12 @@ * v35 | rs232 | x21 | t1 | e1 - sets physical interface for a given port if the card has software-selectable interfaces loopback - activate hardware loopback (for testing only) -* clock ext - external clock (uses DTE RX and TX clock) -* clock int - internal clock (provides clock signal on DCE clock output) -* clock txint - TX internal, RX external (provides TX clock on DCE output) -* clock txfromrx - TX clock derived from RX clock (TX clock on DCE output) -* rate - sets clock rate in bps (not required for external clock or - for txfromrx) +* clock ext - both RX clock and TX clock external +* clock int - both RX clock and TX clock internal +* clock txint - RX clock external, TX clock internal +* clock txfromrx - RX clock external, TX clock derived from RX clock +* rate - sets clock rate in bps (for "int" or "txint" clock only) + Setting protocol: @@ -79,7 +79,7 @@ * x25 - sets X.25 mode * fr - Frame Relay mode - lmi ansi / ccitt / none - LMI (link management) type + lmi ansi / ccitt / cisco / none - LMI (link management) type dce - Frame Relay DCE (network) side LMI instead of default DTE (user). It has nothing to do with clocks! t391 - link integrity verification polling timer (in seconds) - user @@ -119,13 +119,14 @@ -If you have a problem with N2 or C101 card, you can issue the "private" -command to see port's packet descriptor rings (in kernel logs): +If you have a problem with N2, C101 or PLX200SYN card, you can issue the +"private" command to see port's packet descriptor rings (in kernel logs): sethdlc hdlc0 private -The hardware driver has to be build with CONFIG_HDLC_DEBUG_RINGS. +The hardware driver has to be build with #define DEBUG_RINGS. Attaching this info to bug reports would be helpful. Anyway, let me know if you have problems using this. 
-For patches and other info look at http://hq.pm.waw.pl/hdlc/ +For patches and other info look at: +. --- linux-2.6/include/linux/hdlc.h 28 Oct 2004 06:16:08 -0000 1.12 +++ linux-2.6/include/linux/hdlc.h 2 Apr 2005 13:12:18 -0000 @@ -1,7 +1,7 @@ /* * Generic HDLC support routines for Linux * - * Copyright (C) 1999-2003 Krzysztof Halasa + * Copyright (C) 1999-2005 Krzysztof Halasa * * This program is free software; you can redistribute it and/or modify it * under the terms of version 2 of the GNU General Public License @@ -41,6 +41,7 @@ #define LMI_NONE 1 /* No LMI, all PVCs are static */ #define LMI_ANSI 2 /* ANSI Annex D */ #define LMI_CCITT 3 /* ITU-T Annex A */ +#define LMI_CISCO 4 /* The "original" LMI, aka Gang of Four */ #define HDLC_MAX_MTU 1500 /* Ethernet 1500 bytes */ #define HDLC_MAX_MRU (HDLC_MAX_MTU + 10 + 14 + 4) /* for ETH+VLAN over FR */ @@ -89,6 +90,7 @@ unsigned int deleted: 1; unsigned int fecn: 1; unsigned int becn: 1; + unsigned int bandwidth; /* Cisco LMI reporting only */ }state; }pvc_device; --- linux-2.6/drivers/net/wan/hdlc_fr.c 22 Jun 2004 03:25:28 -0000 1.13 +++ linux-2.6/drivers/net/wan/hdlc_fr.c 2 Apr 2005 13:12:18 -0000 @@ -2,7 +2,7 @@ * Generic HDLC support routines for Linux * Frame Relay support * - * Copyright (C) 1999 - 2003 Krzysztof Halasa + * Copyright (C) 1999 - 2005 Krzysztof Halasa * * This program is free software; you can redistribute it and/or modify it * under the terms of version 2 of the GNU General Public License @@ -27,6 +27,10 @@ active = open and "link reliable" exist = new = not used + CCITT LMI: ITU-T Q.933 Annex A + ANSI LMI: ANSI T1.617 Annex D + CISCO LMI: the original, aka "Gang of Four" LMI + */ #include @@ -49,45 +53,41 @@ #undef DEBUG_ECN #undef DEBUG_LINK -#define MAXLEN_LMISTAT 20 /* max size of status enquiry frame */ +#define FR_UI 0x03 +#define FR_PAD 0x00 + +#define NLPID_IP 0xCC +#define NLPID_IPV6 0x8E +#define NLPID_SNAP 0x80 +#define NLPID_PAD 0x00 +#define NLPID_CCITT_ANSI_LMI 0x08 +#define 
NLPID_CISCO_LMI 0x09 + + +#define LMI_CCITT_ANSI_DLCI 0 /* LMI DLCI */ +#define LMI_CISCO_DLCI 1023 + +#define LMI_CALLREF 0x00 /* Call Reference */ +#define LMI_ANSI_LOCKSHIFT 0x95 /* ANSI locking shift */ +#define LMI_ANSI_CISCO_REPTYPE 0x01 /* report type */ +#define LMI_CCITT_REPTYPE 0x51 +#define LMI_ANSI_CISCO_ALIVE 0x03 /* keep alive */ +#define LMI_CCITT_ALIVE 0x53 +#define LMI_ANSI_CISCO_PVCSTAT 0x07 /* PVC status */ +#define LMI_CCITT_PVCSTAT 0x57 + +#define LMI_FULLREP 0x00 /* full report */ +#define LMI_INTEGRITY 0x01 /* link integrity report */ +#define LMI_SINGLE 0x02 /* single PVC report */ -#define PVC_STATE_NEW 0x01 -#define PVC_STATE_ACTIVE 0x02 -#define PVC_STATE_FECN 0x08 /* FECN condition */ -#define PVC_STATE_BECN 0x10 /* BECN condition */ - - -#define FR_UI 0x03 -#define FR_PAD 0x00 - -#define NLPID_IP 0xCC -#define NLPID_IPV6 0x8E -#define NLPID_SNAP 0x80 -#define NLPID_PAD 0x00 -#define NLPID_Q933 0x08 - - -#define LMI_DLCI 0 /* LMI DLCI */ -#define LMI_PROTO 0x08 -#define LMI_CALLREF 0x00 /* Call Reference */ -#define LMI_ANSI_LOCKSHIFT 0x95 /* ANSI lockshift */ -#define LMI_REPTYPE 1 /* report type */ -#define LMI_CCITT_REPTYPE 0x51 -#define LMI_ALIVE 3 /* keep alive */ -#define LMI_CCITT_ALIVE 0x53 -#define LMI_PVCSTAT 7 /* pvc status */ -#define LMI_CCITT_PVCSTAT 0x57 -#define LMI_FULLREP 0 /* full report */ -#define LMI_INTEGRITY 1 /* link integrity report */ -#define LMI_SINGLE 2 /* single pvc report */ #define LMI_STATUS_ENQUIRY 0x75 #define LMI_STATUS 0x7D /* reply */ #define LMI_REPT_LEN 1 /* report type element length */ #define LMI_INTEG_LEN 2 /* link integrity element length */ -#define LMI_LENGTH 13 /* standard LMI frame length */ -#define LMI_ANSI_LENGTH 14 +#define LMI_CCITT_CISCO_LENGTH 13 /* LMI frame lengths */ +#define LMI_ANSI_LENGTH 14 typedef struct { @@ -223,51 +223,34 @@ } -static inline u16 status_to_dlci(u8 *status, int *active, int *new) -{ - *new = (status[2] & 0x08) ? 1 : 0; - *active = (status[2] & 0x02) ? 
1 : 0; - - return ((status[0] & 0x3F) << 4) | ((status[1] & 0x78) >> 3); -} - - -static inline void dlci_to_status(u16 dlci, u8 *status, int active, int new) -{ - status[0] = (dlci >> 4) & 0x3F; - status[1] = ((dlci << 3) & 0x78) | 0x80; - status[2] = 0x80; - - if (new) - status[2] |= 0x08; - else if (active) - status[2] |= 0x02; -} - - - static int fr_hard_header(struct sk_buff **skb_p, u16 dlci) { u16 head_len; struct sk_buff *skb = *skb_p; switch (skb->protocol) { - case __constant_ntohs(ETH_P_IP): + case __constant_ntohs(NLPID_CCITT_ANSI_LMI): head_len = 4; skb_push(skb, head_len); - skb->data[3] = NLPID_IP; + skb->data[3] = NLPID_CCITT_ANSI_LMI; break; - case __constant_ntohs(ETH_P_IPV6): + case __constant_ntohs(NLPID_CISCO_LMI): head_len = 4; skb_push(skb, head_len); - skb->data[3] = NLPID_IPV6; + skb->data[3] = NLPID_CISCO_LMI; break; - case __constant_ntohs(LMI_PROTO): + case __constant_ntohs(ETH_P_IP): head_len = 4; skb_push(skb, head_len); - skb->data[3] = LMI_PROTO; + skb->data[3] = NLPID_IP; + break; + + case __constant_ntohs(ETH_P_IPV6): + head_len = 4; + skb_push(skb, head_len); + skb->data[3] = NLPID_IPV6; break; case __constant_ntohs(ETH_P_802_3): @@ -461,13 +444,14 @@ hdlc_device *hdlc = dev_to_hdlc(dev); struct sk_buff *skb; pvc_device *pvc = hdlc->state.fr.first_pvc; - int len = (hdlc->state.fr.settings.lmi == LMI_ANSI) ? LMI_ANSI_LENGTH - : LMI_LENGTH; - int stat_len = 3; + int lmi = hdlc->state.fr.settings.lmi; + int dce = hdlc->state.fr.settings.dce; + int len = lmi == LMI_ANSI ? LMI_ANSI_LENGTH : LMI_CCITT_CISCO_LENGTH; + int stat_len = (lmi == LMI_CISCO) ? 
6 : 3; u8 *data; int i = 0; - if (hdlc->state.fr.settings.dce && fullrep) { + if (dce && fullrep) { len += hdlc->state.fr.dce_pvc_count * (2 + stat_len); if (len > HDLC_MAX_MRU) { printk(KERN_WARNING "%s: Too many PVCs while sending " @@ -484,29 +468,31 @@ } memset(skb->data, 0, len); skb_reserve(skb, 4); - skb->protocol = __constant_htons(LMI_PROTO); - fr_hard_header(&skb, LMI_DLCI); + if (lmi == LMI_CISCO) { + skb->protocol = __constant_htons(NLPID_CISCO_LMI); + fr_hard_header(&skb, LMI_CISCO_DLCI); + } else { + skb->protocol = __constant_htons(NLPID_CCITT_ANSI_LMI); + fr_hard_header(&skb, LMI_CCITT_ANSI_DLCI); + } data = skb->tail; data[i++] = LMI_CALLREF; - data[i++] = hdlc->state.fr.settings.dce - ? LMI_STATUS : LMI_STATUS_ENQUIRY; - if (hdlc->state.fr.settings.lmi == LMI_ANSI) + data[i++] = dce ? LMI_STATUS : LMI_STATUS_ENQUIRY; + if (lmi == LMI_ANSI) data[i++] = LMI_ANSI_LOCKSHIFT; - data[i++] = (hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_REPTYPE : LMI_REPTYPE; + data[i++] = lmi == LMI_CCITT ? LMI_CCITT_REPTYPE : + LMI_ANSI_CISCO_REPTYPE; data[i++] = LMI_REPT_LEN; data[i++] = fullrep ? LMI_FULLREP : LMI_INTEGRITY; - - data[i++] = (hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_ALIVE : LMI_ALIVE; + data[i++] = lmi == LMI_CCITT ? LMI_CCITT_ALIVE : LMI_ANSI_CISCO_ALIVE; data[i++] = LMI_INTEG_LEN; data[i++] = hdlc->state.fr.txseq =fr_lmi_nextseq(hdlc->state.fr.txseq); data[i++] = hdlc->state.fr.rxseq; - if (hdlc->state.fr.settings.dce && fullrep) { + if (dce && fullrep) { while (pvc) { - data[i++] = (hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_PVCSTAT : LMI_PVCSTAT; + data[i++] = lmi == LMI_CCITT ? 
LMI_CCITT_PVCSTAT : + LMI_ANSI_CISCO_PVCSTAT; data[i++] = stat_len; /* LMI start/restart */ @@ -523,8 +509,20 @@ fr_log_dlci_active(pvc); } - dlci_to_status(pvc->dlci, data + i, - pvc->state.active, pvc->state.new); + if (lmi == LMI_CISCO) { + data[i] = pvc->dlci >> 8; + data[i + 1] = pvc->dlci & 0xFF; + } else { + data[i] = (pvc->dlci >> 4) & 0x3F; + data[i + 1] = ((pvc->dlci << 3) & 0x78) | 0x80; + data[i + 2] = 0x80; + } + + if (pvc->state.new) + data[i + 2] |= 0x08; + else if (pvc->state.active) + data[i + 2] |= 0x02; + i += stat_len; pvc = pvc->next; } @@ -569,6 +567,8 @@ pvc_carrier(0, pvc); pvc->state.exist = pvc->state.active = 0; pvc->state.new = 0; + if (!hdlc->state.fr.settings.dce) + pvc->state.bandwidth = 0; pvc = pvc->next; } } @@ -583,11 +583,12 @@ int i, cnt = 0, reliable; u32 list; - if (hdlc->state.fr.settings.dce) + if (hdlc->state.fr.settings.dce) { reliable = hdlc->state.fr.request && time_before(jiffies, hdlc->state.fr.last_poll + hdlc->state.fr.settings.t392 * HZ); - else { + hdlc->state.fr.request = 0; + } else { hdlc->state.fr.last_errors <<= 1; /* Shift the list */ if (hdlc->state.fr.request) { if (hdlc->state.fr.reliable) @@ -634,65 +635,88 @@ static int fr_lmi_recv(struct net_device *dev, struct sk_buff *skb) { hdlc_device *hdlc = dev_to_hdlc(dev); - int stat_len; pvc_device *pvc; - int reptype = -1, error, no_ram; u8 rxseq, txseq; - int i; + int lmi = hdlc->state.fr.settings.lmi; + int dce = hdlc->state.fr.settings.dce; + int stat_len = (lmi == LMI_CISCO) ? 6 : 3, reptype, error, no_ram, i; - if (skb->len < ((hdlc->state.fr.settings.lmi == LMI_ANSI) - ? LMI_ANSI_LENGTH : LMI_LENGTH)) { + if (skb->len < (lmi == LMI_ANSI ? LMI_ANSI_LENGTH : + LMI_CCITT_CISCO_LENGTH)) { printk(KERN_INFO "%s: Short LMI frame\n", dev->name); return 1; } - if (skb->data[5] != (!hdlc->state.fr.settings.dce ? 
- LMI_STATUS : LMI_STATUS_ENQUIRY)) { - printk(KERN_INFO "%s: LMI msgtype=%x, Not LMI status %s\n", - dev->name, skb->data[2], - hdlc->state.fr.settings.dce ? "enquiry" : "reply"); + if (skb->data[3] != (lmi == LMI_CISCO ? NLPID_CISCO_LMI : + NLPID_CCITT_ANSI_LMI)) { + printk(KERN_INFO "%s: Received non-LMI frame with LMI" + " DLCI\n", dev->name); return 1; } - i = (hdlc->state.fr.settings.lmi == LMI_ANSI) ? 7 : 6; + if (skb->data[4] != LMI_CALLREF) { + printk(KERN_INFO "%s: Invalid LMI Call reference (0x%02X)\n", + dev->name, skb->data[4]); + return 1; + } + + if (skb->data[5] != (dce ? LMI_STATUS_ENQUIRY : LMI_STATUS)) { + printk(KERN_INFO "%s: Invalid LMI Message type (0x%02X)\n", + dev->name, skb->data[5]); + return 1; + } + + if (lmi == LMI_ANSI) { + if (skb->data[6] != LMI_ANSI_LOCKSHIFT) { + printk(KERN_INFO "%s: Not ANSI locking shift in LMI" + " message (0x%02X)\n", dev->name, skb->data[6]); + return 1; + } + i = 7; + } else + i = 6; - if (skb->data[i] != - ((hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_REPTYPE : LMI_REPTYPE)) { - printk(KERN_INFO "%s: Not a report type=%x\n", + if (skb->data[i] != (lmi == LMI_CCITT ? LMI_CCITT_REPTYPE : + LMI_ANSI_CISCO_REPTYPE)) { + printk(KERN_INFO "%s: Not an LMI Report type IE (0x%02X)\n", dev->name, skb->data[i]); return 1; } - i++; - i++; /* Skip length field */ + if (skb->data[++i] != LMI_REPT_LEN) { + printk(KERN_INFO "%s: Invalid LMI Report type IE length" + " (%u)\n", dev->name, skb->data[i]); + return 1; + } - reptype = skb->data[i++]; + reptype = skb->data[++i]; + if (reptype != LMI_INTEGRITY && reptype != LMI_FULLREP) { + printk(KERN_INFO "%s: Unsupported LMI Report type (0x%02X)\n", + dev->name, reptype); + return 1; + } - if (skb->data[i]!= - ((hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_ALIVE : LMI_ALIVE)) { - printk(KERN_INFO "%s: Unsupported status element=%x\n", - dev->name, skb->data[i]); + if (skb->data[++i] != (lmi == LMI_CCITT ? 
LMI_CCITT_ALIVE : + LMI_ANSI_CISCO_ALIVE)) { + printk(KERN_INFO "%s: Not an LMI Link integrity verification" + " IE (0x%02X)\n", dev->name, skb->data[i]); return 1; } - i++; - i++; /* Skip length field */ + if (skb->data[++i] != LMI_INTEG_LEN) { + printk(KERN_INFO "%s: Invalid LMI Link integrity verification" + " IE length (%u)\n", dev->name, skb->data[i]); + return 1; + } + i++; hdlc->state.fr.rxseq = skb->data[i++]; /* TX sequence from peer */ rxseq = skb->data[i++]; /* Should confirm our sequence */ txseq = hdlc->state.fr.txseq; - if (hdlc->state.fr.settings.dce) { - if (reptype != LMI_FULLREP && reptype != LMI_INTEGRITY) { - printk(KERN_INFO "%s: Unsupported report type=%x\n", - dev->name, reptype); - return 1; - } + if (dce) hdlc->state.fr.last_poll = jiffies; - } error = 0; if (!hdlc->state.fr.reliable) @@ -703,7 +727,7 @@ error = 1; } - if (hdlc->state.fr.settings.dce) { + if (dce) { if (hdlc->state.fr.fullrep_sent && !error) { /* Stop sending full report - the last one has been confirmed by DTE */ hdlc->state.fr.fullrep_sent = 0; @@ -725,6 +749,7 @@ hdlc->state.fr.dce_changed = 0; } + hdlc->state.fr.request = 1; /* got request */ fr_lmi_send(dev, reptype == LMI_FULLREP ? 1 : 0); return 0; } @@ -739,7 +764,6 @@ if (reptype != LMI_FULLREP) return 0; - stat_len = 3; pvc = hdlc->state.fr.first_pvc; while (pvc) { @@ -750,24 +774,35 @@ no_ram = 0; while (skb->len >= i + 2 + stat_len) { u16 dlci; + u32 bw; unsigned int active, new; - if (skb->data[i] != ((hdlc->state.fr.settings.lmi == LMI_CCITT) - ? LMI_CCITT_PVCSTAT : LMI_PVCSTAT)) { - printk(KERN_WARNING "%s: Invalid PVCSTAT ID: %x\n", - dev->name, skb->data[i]); + if (skb->data[i] != (lmi == LMI_CCITT ? 
LMI_CCITT_PVCSTAT : + LMI_ANSI_CISCO_PVCSTAT)) { + printk(KERN_INFO "%s: Not an LMI PVC status IE" + " (0x%02X)\n", dev->name, skb->data[i]); return 1; } - i++; - if (skb->data[i] != stat_len) { - printk(KERN_WARNING "%s: Invalid PVCSTAT length: %x\n", - dev->name, skb->data[i]); + if (skb->data[++i] != stat_len) { + printk(KERN_INFO "%s: Invalid LMI PVC status IE length" + " (%u)\n", dev->name, skb->data[i]); return 1; } i++; - dlci = status_to_dlci(skb->data + i, &active, &new); + new = !! (skb->data[i + 2] & 0x08); + active = !! (skb->data[i + 2] & 0x02); + if (lmi == LMI_CISCO) { + dlci = (skb->data[i] << 8) | skb->data[i + 1]; + bw = (skb->data[i + 3] << 16) | + (skb->data[i + 4] << 8) | + (skb->data[i + 5]); + } else { + dlci = ((skb->data[i] & 0x3F) << 4) | + ((skb->data[i + 1] & 0x78) >> 3); + bw = 0; + } pvc = add_pvc(dev, dlci); @@ -783,9 +818,11 @@ pvc->state.deleted = 0; if (active != pvc->state.active || new != pvc->state.new || + bw != pvc->state.bandwidth || !pvc->state.exist) { pvc->state.new = new; pvc->state.active = active; + pvc->state.bandwidth = bw; pvc_carrier(active, pvc); fr_log_dlci_active(pvc); } @@ -801,6 +838,7 @@ pvc_carrier(0, pvc); pvc->state.active = pvc->state.new = 0; pvc->state.exist = 0; + pvc->state.bandwidth = 0; fr_log_dlci_active(pvc); } pvc = pvc->next; @@ -829,22 +867,15 @@ dlci = q922_to_dlci(skb->data); - if (dlci == LMI_DLCI) { - if (hdlc->state.fr.settings.lmi == LMI_NONE) - goto rx_error; /* LMI packet with no LMI? 
*/ - - if (data[3] == LMI_PROTO) { - if (fr_lmi_recv(ndev, skb)) - goto rx_error; - else { - dev_kfree_skb_any(skb); - return NET_RX_SUCCESS; - } - } - - printk(KERN_INFO "%s: Received non-LMI frame with LMI DLCI\n", - ndev->name); - goto rx_error; + if ((dlci == LMI_CCITT_ANSI_DLCI && + (hdlc->state.fr.settings.lmi == LMI_ANSI || + hdlc->state.fr.settings.lmi == LMI_CCITT)) || + (dlci == LMI_CISCO_DLCI && + hdlc->state.fr.settings.lmi == LMI_CISCO)) { + if (fr_lmi_recv(ndev, skb)) + goto rx_error; + dev_kfree_skb_any(skb); + return NET_RX_SUCCESS; } pvc = find_pvc(hdlc, dlci); @@ -1170,7 +1201,8 @@ if ((new_settings.lmi != LMI_NONE && new_settings.lmi != LMI_ANSI && - new_settings.lmi != LMI_CCITT) || + new_settings.lmi != LMI_CCITT && + new_settings.lmi != LMI_CISCO) || new_settings.t391 < 1 || new_settings.t392 < 2 || new_settings.n391 < 1 || --- linux-2.6/drivers/net/wan/hdlc_generic.c 3 Jun 2004 05:04:21 -0000 1.15 +++ linux-2.6/drivers/net/wan/hdlc_generic.c 2 Apr 2005 13:12:18 -0000 @@ -1,7 +1,7 @@ /* * Generic HDLC support routines for Linux * - * Copyright (C) 1999 - 2003 Krzysztof Halasa + * Copyright (C) 1999 - 2005 Krzysztof Halasa * * This program is free software; you can redistribute it and/or modify it * under the terms of version 2 of the GNU General Public License @@ -38,7 +38,7 @@ #include -static const char* version = "HDLC support module revision 1.17"; +static const char* version = "HDLC support module revision 1.18"; #undef DEBUG_LINK @@ -126,10 +126,13 @@ if (!hdlc->open) goto carrier_exit; - if (hdlc->carrier) + if (hdlc->carrier) { + printk(KERN_INFO "%s: Carrier detected\n", dev->name); __hdlc_set_carrier_on(dev); - else + } else { + printk(KERN_INFO "%s: Carrier lost\n", dev->name); __hdlc_set_carrier_off(dev); + } carrier_exit: spin_unlock_irqrestore(&hdlc->state_lock, flags); @@ -157,8 +160,11 @@ spin_lock_irq(&hdlc->state_lock); - if (hdlc->carrier) + if (hdlc->carrier) { + printk(KERN_INFO "%s: Carrier detected\n", dev->name); 
__hdlc_set_carrier_on(dev); + } else + printk(KERN_INFO "%s: No carrier\n", dev->name); hdlc->open = 1; --=-=-=--

From Robert.Olsson@data.slu.se Sat Apr 2 05:48:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 05:48:49 -0800 (PST) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32Dmir2005056 for ; Sat, 2 Apr 2005 05:48:44 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j32DmWJo028236; Sat, 2 Apr 2005 15:48:33 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id AA1B9EE2B1; Sat, 2 Apr 2005 15:48:32 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16974.41648.568927.54429@robur.slu.se> Date: Sat, 2 Apr 2005 15:48:32 +0200 To: Eric Dumazet Cc: Herbert Xu , davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <424E641A.1020609@cosmosbay.com> References: <424E641A.1020609@cosmosbay.com> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1257 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev

Eric Dumazet writes:

> > This patch is doing too many things. How about splitting it up?
> >
> > For instance the spin lock stuff is pretty straightforward and
> > should be in its own patch.

Yes, a good idea so it can be tested separately....

> > The benefits of the GC changes are not obvious to me. rt_check_expire
> > is simply meant to kill off old entries. It's not really meant to be
> > used to free up entries when the table gets full.

Agree with Herbert...
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
> | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 2618087| 28581| 7673| 0| 0| 0| 0| 0| 1800| 1450| 0| 0| 0| 0| 0|

> Without serious tuning, this machine could not handle this load, or even half of it.

Yes, that's quite a load. Very short flows for some reason?
What's your ip_rt_gc_min_interval? GC should be allowed to
run frequently to smooth out the GC load. It is also a good idea to
decrease gc_thresh, and your hash is really huge.

> Crashes usually occurs when secret_interval interval is elapsed : rt_cache_flush(0); is called, and the whole machine begins to die.

It is a good idea to increase the secret_interval, but it should
survive.

--ro

From dada1@cosmosbay.com Sat Apr 2 06:00:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 06:00:24 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32E0I23005893 for ; Sat, 2 Apr 2005 06:00:19 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j32DwuRD012090; Sat, 2 Apr 2005 15:59:02 +0200 Message-ID: <424EA51F.6000300@cosmosbay.com> Date: Sat, 02 Apr 2005 15:58:55 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Herbert Xu CC: davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, hadi@cyberus.ca Subject: Re: Get rid of rt_check_expire and rt_garbage_collect References: <424E641A.1020609@cosmosbay.com> <20050402112304.GA11321@gondor.apana.org.au> In-Reply-To: <20050402112304.GA11321@gondor.apana.org.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]);
Sat, 02 Apr 2005 15:59:03 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1258 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Herbert Xu a écrit : > On Sat, Apr 02, 2005 at 11:21:30AM +0200, Eric Dumazet wrote: > >>Well, I began my work because of the overflow bug in rt_check_expire()... >>Then I realize this function could not work as expected. On a loaded >>machine, one timer tick is 1 ms. >>During this time, number of chains that are scanned is ridiculous. >>With the standard timer of 60 second, fact is rt_check_expire() is useless. > > > I see. What we've got here is a scalability problem with respect > to the number of hash buckets. As the number of buckets increases, > the amount of work the timer GC has to perform inreases proportionally. > > Since the timer GC parameters are fixed, this will eventually break. > > Rather than changing the timer GC so that it runs more often to keep > up with the large routing cache, we should get out of this by reducing > the amount of work we have to do. > > Imagine an ideal balanced hash table with 2.6 million entries. That > is, all incoming/outgoing packets belong to flows that are already in > the hash table. Imagine also that there is no PMTU/link failure taking > place so all entries are valid forever. > > In this state there is absolutely no need to execute the timer GC. > > Let's remove one of those assumptions and allow there to be entries > which need to expire after a set period. > > Instead of having the timer GC clean them up, we can move the expire > check to the place where the entries are used. That is, we make > ip_route_input/ip_route_output/ipv4_dst_check check whether the > entry has expired. 
> > On the face of it we're doing more work since every routing cache > hit will need to check the validity of the dst. However, because > it's a single subtraction it is actually pretty cheap. There is > also no additional cache miss compared to doing it in the timer > GC since we have to read the dst anyway. > > Let's go one step further and make the routing cache come to life. > Now there are new entries coming in and we need to remove old ones > in order to make room for them. > > That task is currently carried out by the timer GC in rt_check_expire > and on demand by rt_garbage_collect. Either way we have to walk the > entire routing cache looking for entries to get rid of. > > This is quite expensive when the routing cache is large. However, > there is a better way. > > The reason we keep a cap on the routing cache (for a given hash size) > is so that individual chains do not degenerate into long linked lists. > > In other words, we don't really care about how many entries there are > in the routing cache. But we do care about how long each hash chain > is. > > So instead of walking the entire routing cache to keep the number of > entries down, what we should do is keep each hash chain as short as > possible. > > Assuming that the hash function is good, this should achieve the > same end result. > > Here is how it can be done: Every time a routing entry is inserted into > a hash chain, we perform GC on that chain unconditionally. > > It might seem that we're doing more work again. However, as before > because we're traversing the chain anyway, it is very cheap to perform > the GC operations which mainly involve the checks in rt_may_expire. > > OK that's enough thinking and it's time to write some code to see > whether this is all bullshit :) > > Cheers, Well, it may work if you dont care about memory used. 
# grep dst /proc/slabinfo ip_dst_cache 2825575 2849590 384 10 1 : tunables 54 27 8 : slabdata 284959 284959 0 On this machine, route cache takes 1.1 GB of ram... impressive. Then if the network load decrease (or completely stop), only a timer driven gc could purge the cache. So rt_check_expire() is *needed* You are right saying that gc parameters are fixed, thus gc breaks at high load. Eric From kuznet@yakov.inr.ac.ru Sat Apr 2 06:01:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 06:01:55 -0800 (PST) Received: from yakov.inr.ac.ru (yakov.inr.ac.ru [194.67.69.111]) by oss.sgi.com (8.13.0/8.13.0) with SMTP id j32E1nGS006071 for ; Sat, 2 Apr 2005 06:01:50 -0800 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=ms2.inr.ac.ru; b=B8c1/dbp/mKlnqQmR1uEiCDuXy7JrqjD3TOaLRO6GavyKIR5pkkfrtTkqwrL2rqtDeKJl2ixtsTFnEwsjwFigj4zaLU4CR6XietT3qfLyqFjhM4vvihNur9oHqnMiVZdxEMSzE2amGZimpXr59CxLROBlINBibpA7S6BSMOpbq8=; Received: (from kuznet@localhost) envelope-from=kuznet by yakov.inr.ac.ru (8.6.13/ANK) id SAA13068; Sat, 2 Apr 2005 18:00:19 +0400 Date: Sat, 2 Apr 2005 18:00:19 +0400 From: Alexey Kuznetsov To: jamal Cc: Herbert Xu , "David S. Miller" , Masahide NAKAMURA , psec-tools-devel@lists.sourceforge.net, netdev@oss.sgi.com, kaber@trash.net, kuznet@ms2.inr.ac.ru, jmorris@redhat.com Subject: Re: IPSEC: on behavior of acquire Message-ID: <20050402140019.GA13017@yakov.inr.ac.ru> References: <1112405144.1096.33.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112405144.1096.33.camel@jzny.localdomain> User-Agent: Mutt/1.5.6i X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1259 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kuznet@ms2.inr.ac.ru Precedence: bulk X-list: netdev Hello! 
> a) -ERESTART is the correct signal to return Right behaviour is to behave like ARP. A few of packets are queued, no errors (until timeout), no blocking. Alexey From Robert.Olsson@data.slu.se Sat Apr 2 06:04:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 06:04:30 -0800 (PST) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32E4PT9007106 for ; Sat, 2 Apr 2005 06:04:25 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j32E3gvg029545; Sat, 2 Apr 2005 16:03:43 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id C0F16EE2B2; Sat, 2 Apr 2005 16:03:42 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16974.42558.753736.846391@robur.slu.se> Date: Sat, 2 Apr 2005 16:03:42 +0200 To: Herbert Xu Cc: Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, hadi@cyberus.ca Subject: Get rid of rt_check_expire and rt_garbage_collect In-Reply-To: <20050402112304.GA11321@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <20050402112304.GA11321@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1260 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > Rather than changing the timer GC so that it runs more often to keep > up with the large routing cache, we should get out of this by reducing > the amount of work we have to do. Yeep. > Imagine an ideal balanced hash table with 2.6 million entries. That > is, all incoming/outgoing packets belong to flows that are already in > the hash table. 
Imagine also that there is no PMTU/link failure taking > place so all entries are valid forever. > > In this state there is absolutely no need to execute the timer GC. > Let's remove one of those assumptions and allow there to be entries > which need to expire after a set period. > > Instead of having the timer GC clean them up, we can move the expire > check to the place where the entries are used. That is, we make > ip_route_input/ip_route_output/ipv4_dst_check check whether the > entry has expired. > > On the face of it we're doing more work since every routing cache > hit will need to check the validity of the dst. However, because > it's a single subtraction it is actually pretty cheap. There is > also no additional cache miss compared to doing it in the timer > GC since we have to read the dst anyway. > > Let's go one step further and make the routing cache come to life. > Now there are new entries coming in and we need to remove old ones > in order to make room for them. > > That task is currently carried out by the timer GC in rt_check_expire > and on demand by rt_garbage_collect. Either way we have to walk the > entire routing cache looking for entries to get rid of. > > This is quite expensive when the routing cache is large. However, > there is a better way. > > The reason we keep a cap on the routing cache (for a given hash size) > is so that individual chains do not degenerate into long linked lists. > > In other words, we don't really care about how many entries there are > in the routing cache. But we do care about how long each hash chain > is. > > So instead of walking the entire routing cache to keep the number of > entries down, what we should do is keep each hash chain as short as > possible. > > Assuming that the hash function is good, this should achieve the > same end result. > > Here is how it can be done: Every time a routing entry is inserted into > a hash chain, we perform GC on that chain unconditionally. 
> > It might seem that we're doing more work again. However, as before > because we're traversing the chain anyway, it is very cheap to perform > the GC operations which mainly involve the checks in rt_may_expire. Agree... It's very interesting and worth to test something like this. also it could clean up the GC process and the need for tuning which would be very welcome. --ro From dada1@cosmosbay.com Sat Apr 2 06:10:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 06:10:55 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32EAnSm007751 for ; Sat, 2 Apr 2005 06:10:50 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j32EACvQ012304; Sat, 2 Apr 2005 16:10:17 +0200 Message-ID: <424EA7C2.6060308@cosmosbay.com> Date: Sat, 02 Apr 2005 16:10:10 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Robert Olsson CC: Herbert Xu , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> In-Reply-To: <16974.41648.568927.54429@robur.slu.se> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Sat, 02 Apr 2005 16:10:18 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1261 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Robert Olsson a écrit : > Eric Dumazet writes: > Yes thats a pretty much load. Very short flows some reason? Well... yes. 
This is a real server, not a DOS simulation.
1 million TCP flows, and about 3 million peers using UDP frames.

> What's your ip_rt_gc_min_interval? GC should be allowed to
> run frequent to smoothen out the GC load. Also good idea
> to decrease gc_thresh and you hash is really huge.

No. As soon as I lower gc_thresh (and let gc running), the machine starts to drop connections and crash some seconds later.
I found I had to make the hash table very large (but lowering elasticity, ie chain length) .
It needs lot of ram, but at least CPU usage of net/ipv4/route.c is close to 0.

# grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:5000
/proc/sys/net/ipv4/route/error_cost:1000
/proc/sys/net/ipv4/route/gc_elasticity:2
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:2900000
/proc/sys/net/ipv4/route/gc_timeout:155
/proc/sys/net/ipv4/route/max_delay:10
/proc/sys/net/ipv4/route/max_size:16777216
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_delay:2
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:20
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:20480
/proc/sys/net/ipv4/route/secret_interval:36000

From Robert.Olsson@data.slu.se Sat Apr 2 06:47:05 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 06:47:09 -0800 (PST)
Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32El4NW009228 for ; Sat, 2 Apr 2005 06:47:05 -0800
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j32EkV4m000845; Sat, 2 Apr 2005 16:46:31 +0200
Received: by robur.slu.se (Postfix, from userid 1000) id 63253EE2B1; Sat, 2 Apr 2005 16:46:31 +0200 (CEST)
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit Message-ID: <16974.45127.318022.377635@robur.slu.se> Date: Sat, 2 Apr 2005 16:46:31 +0200 To: Eric Dumazet Cc: Robert Olsson , Herbert Xu , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <424EA7C2.6060308@cosmosbay.com> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <424EA7C2.6060308@cosmosbay.com> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1262 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Eric Dumazet writes: > Well... yes. This is a real server, not a DOS simulation. > 1 million TCP flows, and about 3 million peers using UDP frames. I see. > > What's your ip_rt_gc_min_interval? GC should be allowed to > > run frequent to smoothen out the GC load. Also good idea > > to decrease gc_thresh and you hash is really huge. > No. As soon as I lower gc_thresh (and let gc running), the machine starts to drop connections and crash some seconds later. > I found I had to make the hash table very large (but lowering elasticity, ie chain length) . > It needs lot of ram, but at least CPU usage of net/ipv4/route.c is close to 0. OK! Not so bad. Most of your GC likely happens in rt_intern_hash chain pruning. This way you keep hash-chains short and get "datadriven" GC. But there must be bugs causing the crash... Maybe there should be an explicit control hash lengths not via elasticity but adding even more tuning knobs hurts. 
:) --ro From andrea@suse.de Sat Apr 2 07:01:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 07:01:33 -0800 (PST) Received: from g5.random (ppp-217-133-42-200.cust-adsl.tiscali.it [217.133.42.200]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32F1RNf010182 for ; Sat, 2 Apr 2005 07:01:28 -0800 Received: by g5.random (Postfix, from userid 500) id 308A05753AA; Sat, 2 Apr 2005 17:01:17 +0200 (CEST) Date: Sat, 2 Apr 2005 17:01:17 +0200 From: Andrea Arcangeli To: Greg KH Cc: jaganav@us.ibm.com, Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. Miller" , mpm@selenic.com, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050402150116.GU29492@g5.random> References: <20050401154348.553f3c46@dxpl.pdx.osdl.net> <1112405833.424df749e61b5@imap.linux.ibm.com> <20050402052738.GA17506@kroah.com> <20050402060216.GA17766@kroah.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050402060216.GA17766@kroah.com> X-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 User-Agent: Mutt/1.5.9i X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1263 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andrea@suse.de Precedence: bulk X-list: netdev On Fri, Apr 01, 2005 at 10:02:16PM -0800, Greg KH wrote: > If you all wish to duplicate this stupidity, feel free, but do not > expect to get any help from the community... And just in case: do not expect to be allowed to use stuff like the rbtree.[ch] which is GPL'd (not LGPL). (ib patches from topspin originally relicensed rbtree.[ch] under BSD...) 
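[Archive note] Herbert Xu's proposal quoted earlier in this thread has two parts: check expiry at the point where a cached entry is used, and garbage-collect a hash chain whenever an entry is inserted into it. The following is a minimal userspace sketch of that idea in Python, not kernel code. The class name `RouteCache`, the parameters `max_chain_len` and `timeout`, and the "evict the entry closest to expiry" policy are all assumptions made for illustration; the kernel's own criteria live in rt_may_expire and are not reproduced here.

```python
class RouteCache:
    """Toy chained hash table with lookup-time expiry and per-chain GC.

    Hypothetical model for illustration only; it does not mirror the
    data structures in net/ipv4/route.c.
    """

    def __init__(self, n_buckets, max_chain_len, timeout):
        self.buckets = [[] for _ in range(n_buckets)]
        self.max_chain_len = max_chain_len  # assumed per-chain cap
        self.timeout = timeout              # assumed fixed entry lifetime

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def lookup(self, key, now):
        """Return the cached value, or None if absent or expired."""
        chain = self._chain(key)
        for i, (k, value, expires) in enumerate(chain):
            if k == key:
                # The expiry test is one subtraction on data already read.
                if expires - now <= 0:
                    del chain[i]
                    return None
                return value
        return None

    def insert(self, key, value, now):
        """Insert an entry, garbage-collecting only the chain it lands in."""
        chain = self._chain(key)
        # The chain is walked anyway, so drop expired entries (and any
        # stale copy of this key) in the same pass.
        chain[:] = [e for e in chain if e[0] != key and e[2] - now > 0]
        # Keep the chain short: evict the entry closest to expiry
        # (an assumed policy, chosen for simplicity).
        while len(chain) >= self.max_chain_len:
            chain.remove(min(chain, key=lambda e: e[2]))
        chain.append((key, value, now + self.timeout))
```

Note that nothing in this model ever shrinks an idle cache. That is the objection Eric Dumazet raises in his reply: if traffic drops or stops, there are no lookups or inserts to trigger cleanup, so only a timer-driven pass could free the memory.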
From jheffner@psc.edu Sat Apr 2 07:33:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 07:33:23 -0800 (PST) Received: from mailer2.psc.edu (mailer2.psc.edu [128.182.66.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32FXAWu011778 for ; Sat, 2 Apr 2005 07:33:11 -0800 Received: from dexter.psc.edu (dexter.psc.edu [128.182.61.232]) by mailer2.psc.edu (8.13.3/8.13.3) with ESMTP id j32FbOUl023976 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 2 Apr 2005 10:37:25 -0500 (EST) Received: from dexter.psc.edu (localhost.psc.edu [127.0.0.1]) by dexter.psc.edu (8.12.11/8.12.10) with ESMTP id j32FWXP1021028; Sat, 2 Apr 2005 10:32:33 -0500 Received: from localhost (jheffner@localhost) by dexter.psc.edu (8.12.11/8.12.11/Submit) with ESMTP id j32FWWVo021025; Sat, 2 Apr 2005 10:32:33 -0500 X-Authentication-Warning: dexter.psc.edu: jheffner owned process doing -bs Date: Sat, 2 Apr 2005 10:32:32 -0500 (EST) From: John Heffner To: Herbert Xu cc: davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [PATCH] skb pcount with MTU discovery In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1264 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jheffner@psc.edu Precedence: bulk X-list: netdev On Sat, 2 Apr 2005, Herbert Xu wrote: > How about fixing tcp_snd_test directly like this? I tried that first, but it caused a panic. I assumed some other point in the code assumed that invariant that if TSO is disabled then tso_segs==1. I didn't investigate though. > Of course all this will be moot once Dave finishes his TSO rewrite :) That will make things much simpler. 
;) -John From dmitry_yus@yahoo.com Sat Apr 2 10:08:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 10:08:48 -0800 (PST) Received: from smtp014.mail.yahoo.com (smtp014.mail.yahoo.com [216.136.173.58]) by oss.sgi.com (8.13.0/8.13.0) with SMTP id j32I8gNY020564 for ; Sat, 2 Apr 2005 10:08:42 -0800 Received: from unknown (HELO ?172.10.7.7?) (dmitry?yus@24.7.114.77 with plain) by smtp014.mail.yahoo.com with SMTP; 2 Apr 2005 18:08:42 -0000 Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics From: Dmitry Yusupov To: "open-iscsi@googlegroups.com" Cc: "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <20050328223203.GC28983@kvack.org> References: <20050324215922.GT14202@opteron.random> <424346FE.20704@cs.wisc.edu> <20050324233921.GZ14202@opteron.random> <20050325034341.GV32638@waste.org> <20050327035149.GD4053@g5.random> <20050327054831.GA15453@waste.org> <1111905181.4753.15.camel@mylaptop> <20050326224621.61f6d917.davem@davemloft.net> <52vf7bwo4w.fsf@topspin.com> <1112042936.5088.22.camel@beastie> <20050328223203.GC28983@kvack.org> Content-Type: text/plain Date: Sat, 02 Apr 2005 10:08:37 -0800 Message-Id: <1112465317.24936.10.camel@mylaptop> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1265 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dmitry_yus@yahoo.com Precedence: bulk X-list: netdev On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote: > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote: > > If you have plans to start new project such as SoftRDMA than yes. 
lets > > discuss it since set of problems will be similar to what we've got with > > software iSCSI Initiators. > > I'm somewhat interested in seeing a SoftRDMA project get off the ground. > At least the NatSemi 83820 gige MAC is able to provide early-rx interrupts > that allow one to get an rx interrupt before the full payload has arrived > making it possible to write out a new rx descriptor to place the payload > wherever it is ultimately desired. It would be fun to work on if not the > most performant RDMA implementation. I see a lot of skepticism around early-rx interrupt schema. It might work for gige, but i'm not sure if it will fit into 10g. What RDMA gives us is zero-copy on receive and new networking api which has a potential to be HW accelerated. SoftRDMA will never avoid copying on receive. But benefit for SoftRDMA would be its availability on client sides. It is free and it could be easily deployed. Soon Intel & Co will give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if one of those cores will do receive side copying? From willy@www.linux.org.uk Sat Apr 2 10:27:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 10:27:29 -0800 (PST) Received: from parcelfarce.linux.theplanet.co.uk (IDENT:93@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32IRI3F021472 for ; Sat, 2 Apr 2005 10:27:19 -0800 Received: from willy by parcelfarce.linux.theplanet.co.uk with local (Exim 4.33) id 1DHnKy-000079-IH; Sat, 02 Apr 2005 19:27:12 +0100 Date: Sat, 2 Apr 2005 19:27:12 +0100 From: Matthew Wilcox To: jaganav@us.ibm.com Cc: Greg KH , Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050402182712.GA24234@parcelfarce.linux.theplanet.co.uk> References: <1112426991.424e49ef57e2b@imap.linux.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112426991.424e49ef57e2b@imap.linux.ibm.com> User-Agent: Mutt/1.4.1i X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1266 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: matthew@wil.cx Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 02:29:51AM -0500, jaganav@us.ibm.com wrote: > If this dual license is a concern to other kernel developers as well from > contributing to OpenRDMA, we would seriously consider this and discuss with the > adapter vendors. Yes, it's a serious concern. Please release the code under the GPL only. -- "Next the statesmen will invent cheap lies, putting the blame upon the nation that is attacked, and every man will be glad of those conscience-soothing falsities, and will diligently study them, and refuse to examine any refutations of them; and thus he will by and by convince himself that the war is just, and will thank God for the better sleep he enjoys after this process of grotesque self-deception." 
-- Mark Twain From linux781@gmail.com Sat Apr 2 10:44:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 10:44:19 -0800 (PST) Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.205]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32IiEuN022356 for ; Sat, 2 Apr 2005 10:44:15 -0800 Received: by zproxy.gmail.com with SMTP id 8so55531nzo for ; Sat, 02 Apr 2005 10:44:07 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=aKGmZ4hpLJ7N6OferqIHuGfei+vozd7D7DoJc0CZObBsEX+Mu5qTTB2axEchIpMcZemzRqQPeQ3kqgriTNrooAzSnyHNVIfRqNKlktYZmVTUwbwciM5zfsmetes3V//dC4xOy547FvLkVteeFFTwsAJCOhvnu2XlwfTox/mL+Kw= Received: by 10.36.74.14 with SMTP id w14mr23466nza; Sat, 02 Apr 2005 10:44:07 -0800 (PST) Received: by 10.36.58.7 with HTTP; Sat, 2 Apr 2005 10:44:07 -0800 (PST) Message-ID: <72252ed0504021044e69d634@mail.gmail.com> Date: Sat, 2 Apr 2005 13:44:07 -0500 From: Akshay Kawale Reply-To: Akshay Kawale To: netdev@oss.sgi.com Subject: Re: Difference between skb_put() and skb_push() In-Reply-To: <72252ed05033021463a1f45b6@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <72252ed05033021463a1f45b6@mail.gmail.com> X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1267 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linux781@gmail.com Precedence: bulk X-list: netdev Hi, I am trying to access the tot_len field in the IP Header using a sk_buff structure inside a Netfilter hook. I do something like: (**skb).nh.iph->tot_len += 64 I have tried other variants of the same statement but none of them work. 
I want to increment the length by 64 bytes, but it gives me an error saying that I am trying to access an 'incomplete data type'. Can anyone shed some light on this problem? tot_len is of type __u16 (unsigned short int).

Thanks.
- Akshay

From asgeir@chelsio.com Sat Apr 2 11:08:18 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:08:23 -0800 (PST)
Received: from stargate.chelsio.com (stargate.chelsio.com [64.186.171.138] (may be forged)) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32J8HZ2024757 for ; Sat, 2 Apr 2005 11:08:18 -0800
Received: from YOGI.asicdesigners.com (yogi.asicdesigners.com [10.192.160.7]) by stargate.chelsio.com (8.12.5/8.12.5) with SMTP id j32J7SfZ015126; Sat, 2 Apr 2005 11:07:28 -0800
Subject: RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics
Date: Sat, 2 Apr 2005 11:07:28 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Message-ID: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
X-MimeOLE: Produced By Microsoft Exchange V6.0.6487.1
Thread-Topic: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics
content-class: urn:content-classes:message
Thread-Index: AcU3rw1xnl0wdez6QdSv3xCWP+9qxgABjiog
From: "Asgeir Eiriksson"
To: "Dmitry Yusupov" ,
Cc: "David S. Miller" , , , , , ,
X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com
X-Virus-Status: Clean
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j32J8HZ2024757
X-archive-position: 1268
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: asgeir@chelsio.com
Precedence: bulk
X-list: netdev

Dmitry,

CPU cycles are at most half of the story; the other half is the memory sub-system BW. So the validity of your observation depends on the BW we're talking about, i.e.
if the client is using a fraction of 10Gbps for RDMA (or DDP, e.g. iSCSI DDP), yes then that fraction amounts to a fraction of the memory sub-system total BW so we don't much care about the extra copy. The situation is different if the client wants something close to 10Gbps (already have such client applications), because today 10Gbps is still a big chunk of the overall memory BW so you really care about eliminating that copy via DDP. 'Asgeir > -----Original Message----- > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On > Behalf Of Dmitry Yusupov > Sent: Saturday, April 02, 2005 10:09 AM > To: open-iscsi@googlegroups.com > Cc: David S. Miller; mpm@selenic.com; andrea@suse.de; > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; ksummit-2005- > discuss@thunk.org; netdev@oss.sgi.com > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit > ProposedTopics > > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote: > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote: > > > If you have plans to start new project such as SoftRDMA than yes. lets > > > discuss it since set of problems will be similar to what we've got > with > > > software iSCSI Initiators. > > > > I'm somewhat interested in seeing a SoftRDMA project get off the ground. > > At least the NatSemi 83820 gige MAC is able to provide early-rx > interrupts > > that allow one to get an rx interrupt before the full payload has > arrived > > making it possible to write out a new rx descriptor to place the payload > > wherever it is ultimately desired. It would be fun to work on if not > the > > most performant RDMA implementation. > > I see a lot of skepticism around early-rx interrupt schema. It might > work for gige, but i'm not sure if it will fit into 10g. > > What RDMA gives us is zero-copy on receive and new networking api which > has a potential to be HW accelerated. SoftRDMA will never avoid copying > on receive. 
But benefit for SoftRDMA would be its availability on client > sides. It is free and it could be easily deployed. Soon Intel & Co will > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if > one of those cores will do receive side copying? > From laforge@gnumonks.org Sat Apr 2 11:11:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:11:49 -0800 (PST) Received: from ganesha.gnumonks.org (Debian-exim@ganesha.gnumonks.org [213.95.27.120]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JBd6w025449 for ; Sat, 2 Apr 2005 11:11:40 -0800 Received: from sunbeam.hmw-consulting.de ([83.236.178.203] helo=sunbeam.gnumonks.org) by ganesha.gnumonks.org with asmtp (TLS-1.0:RSA_AES_128_CBC_SHA:16) (Exim 4.34) id 1DHo1t-0000J3-L8 for netdev@oss.sgi.com; Sat, 02 Apr 2005 21:11:33 +0200 Received: from laforge by sunbeam.gnumonks.org with local (Exim 4.50) id 1DHo1s-0000mb-FR for netdev@oss.sgi.com; Sat, 02 Apr 2005 21:11:32 +0200 Date: Sat, 2 Apr 2005 21:11:32 +0200 From: Harald Welte To: netdev@oss.sgi.com Subject: pktgen problem (skb refcount) in 2.6.12-rc1 Message-ID: <20050402191132.GF1890@sunbeam.de.gnumonks.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hK8Uo4Yp55NZU70L" Content-Disposition: inline User-Agent: mutt-ng 1.5.8-r168i (Debian) X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1269 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@gnumonks.org Precedence: bulk X-list: netdev --hK8Uo4Yp55NZU70L Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi! I've tried to get pktgen running on 2.6.12-rc1 (dual-opteron system, two dual e1000 boards). 
It transmits the requested number of packets, but the kernel thread(s) continue to use 100% CPU even after that.

I've tried to track the problem down, and I've confirmed that skb->users never goes down to 1 but instead stays at 2. Therefore the while loop at line 2706 loops forever.

Killing the kernel thread or configuring the interface down helps (as a kludge). However, the e1000 module will refuse to unload, since apparently it's still referenced by that skb.

The system is otherwise idle, and no fancy modules such as netfilter/iptables are loaded. The same system with the same pktgen script works fine with 2.6.11.6.

I'm reporting this since it sounds like we have an skb usage count leak somewhere :(

--
- Harald Welte http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch.
A6) --hK8Uo4Yp55NZU70L Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFCTu5kXaXGVTD0i/8RAnFTAJ0Zx+raxRpD3NBQYYp0vIh8uxK7lgCdFuqS cxrYExXXXuNnx4NAXVGfono= =9ULo -----END PGP SIGNATURE----- --hK8Uo4Yp55NZU70L-- From mingz@ele.uri.edu Sat Apr 2 11:13:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:13:35 -0800 (PST) Received: from leviathan.ele.uri.edu (leviathan.ele.uri.edu [131.128.51.64]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JDTJI025678 for ; Sat, 2 Apr 2005 11:13:30 -0800 Received: from [127.0.0.1] (leviathan [131.128.51.64]) by leviathan.ele.uri.edu (8.12.9/8.12.9) with ESMTP id j32JDLCu013331; Sat, 2 Apr 2005 14:13:22 -0500 (EST) Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics From: Ming Zhang Reply-To: mingz@ele.uri.edu To: open-iscsi Cc: "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <1112465317.24936.10.camel@mylaptop> References: <20050324215922.GT14202@opteron.random> <424346FE.20704@cs.wisc.edu> <20050324233921.GZ14202@opteron.random> <20050325034341.GV32638@waste.org> <20050327035149.GD4053@g5.random> <20050327054831.GA15453@waste.org> <1111905181.4753.15.camel@mylaptop> <20050326224621.61f6d917.davem@davemloft.net> <52vf7bwo4w.fsf@topspin.com> <1112042936.5088.22.camel@beastie> <20050328223203.GC28983@kvack.org> <1112465317.24936.10.camel@mylaptop> Content-Type: text/plain Message-Id: <1112469200.4599.4.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Sat, 02 Apr 2005 14:13:21 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1270 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: netdev

On Sat, 2005-04-02 at 13:08, Dmitry Yusupov wrote:
> On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote:
> > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> > > If you have plans to start new project such as SoftRDMA than yes. lets
> > > discuss it since set of problems will be similar to what we've got with
> > > software iSCSI Initiators.
> >
> > I'm somewhat interested in seeing a SoftRDMA project get off the ground.
> > At least the NatSemi 83820 gige MAC is able to provide early-rx interrupts
> > that allow one to get an rx interrupt before the full payload has arrived
> > making it possible to write out a new rx descriptor to place the payload
> > wherever it is ultimately desired. It would be fun to work on if not the
> > most performant RDMA implementation.
>
> I see a lot of skepticism around early-rx interrupt schema. It might
> work for gige, but i'm not sure if it will fit into 10g.
>
> What RDMA gives us is zero-copy on receive and new networking api which
> has a potential to be HW accelerated. SoftRDMA will never avoid copying
> on receive. But benefit for SoftRDMA would be its availability on client
> sides. It is free and it could be easily deployed. Soon Intel & Co will
> give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if
> one of those cores will do receive side copying?
>

A dedicated core for dealing with interrupts is fine, but the memory bandwidth is still over-used, right?
ming From mingz@ele.uri.edu Sat Apr 2 11:14:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:15:00 -0800 (PST) Received: from leviathan.ele.uri.edu (leviathan.ele.uri.edu [131.128.51.64]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JErRE026229 for ; Sat, 2 Apr 2005 11:14:54 -0800 Received: from [127.0.0.1] (leviathan [131.128.51.64]) by leviathan.ele.uri.edu (8.12.9/8.12.9) with ESMTP id j32JElCu013372; Sat, 2 Apr 2005 14:14:47 -0500 (EST) Subject: RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics From: Ming Zhang Reply-To: mingz@ele.uri.edu To: open-iscsi Cc: Dmitry Yusupov , "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> Content-Type: text/plain Message-Id: <1112469286.4599.7.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Sat, 02 Apr 2005 14:14:47 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1271 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: netdev yes, thx for explaining this in more detail. copy avoidance is one main goal of rdma. the BW gap is the bottleneck. ming On Sat, 2005-04-02 at 14:07, Asgeir Eiriksson wrote: > Dmitry > > The CPU cycles is only at most half of the story with the other half > being the memory sub-system BW. > > So the validity of your observation depends on the BW we're talking > about, i.e. if the client is using a fraction of 10Gbps for RDMA (or > DDP, e.g. 
iSCSI DDP), yes then that fraction amounts to a fraction of > the memory sub-system total BW so we don't much care about the extra > copy. > > The situation is different if the client wants something close to 10Gbps > (already have such client applications), because today 10Gbps is still a > big chunk of the overall memory BW so you really care about eliminating > that copy via DDP. > > 'Asgeir > > > -----Original Message----- > > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On > > Behalf Of Dmitry Yusupov > > Sent: Saturday, April 02, 2005 10:09 AM > > To: open-iscsi@googlegroups.com > > Cc: David S. Miller; mpm@selenic.com; andrea@suse.de; > > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; > ksummit-2005- > > discuss@thunk.org; netdev@oss.sgi.com > > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit > > ProposedTopics > > > > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote: > > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote: > > > > If you have plans to start new project such as SoftRDMA than yes. > lets > > > > discuss it since set of problems will be similar to what we've got > > with > > > > software iSCSI Initiators. > > > > > > I'm somewhat interested in seeing a SoftRDMA project get off the > ground. > > > At least the NatSemi 83820 gige MAC is able to provide early-rx > > interrupts > > > that allow one to get an rx interrupt before the full payload has > > arrived > > > making it possible to write out a new rx descriptor to place the > payload > > > wherever it is ultimately desired. It would be fun to work on if > not > > the > > > most performant RDMA implementation. > > > > I see a lot of skepticism around early-rx interrupt schema. It might > > work for gige, but i'm not sure if it will fit into 10g. > > > > What RDMA gives us is zero-copy on receive and new networking api > which > > has a potential to be HW accelerated. SoftRDMA will never avoid > copying > > on receive. 
But benefit for SoftRDMA would be its availability on > client > > sides. It is free and it could be easily deployed. Soon Intel & Co > will > > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if > > one of those cores will do receive side copying? > > > > From hadi@cyberus.ca Sat Apr 2 11:20:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:20:26 -0800 (PST) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JKIHq027275 for ; Sat, 2 Apr 2005 11:20:18 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DHoAJ-00041A-W6 for netdev@oss.sgi.com; Sat, 02 Apr 2005 14:20:15 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHoAA-0004ZR-4O; Sat, 02 Apr 2005 14:20:06 -0500 Subject: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <20050402014619.GB24861@gondor.apana.org.au> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> Content-Type: multipart/mixed; boundary="=-g5r6p/Y+YZcaZoLnsgWz" Organization: jamalopolous Message-Id: <1112469601.1088.173.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 14:20:01 -0500 X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1272 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-g5r6p/Y+YZcaZoLnsgWz Content-Type: text/plain Content-Transfer-Encoding: 7bit

On Fri, 2005-04-01 at 20:46, Herbert Xu wrote:
> On Fri, Apr 01, 2005 at 08:42:45PM -0500, jamal wrote:
> >
> > So always go v2?
>
> Yes since that's the only version that the kernel knows how to generate.

OK, here's a first cut of a general patch; I think I got everything that was discussed in there. I've done about five minutes of basic testing on it. Once we have agreement I will pass it on to Masahide-san to do more thorough testing. Look at the XXX comments in the patch.

A couple of interesting things:

1) We've discussed this before, Herbert, and I think you misspoke when you said that pfkey delivers to all listeners. pfkey add/del/upd now really do tell all processes about what happened. Before, pfkey would skip the originating process. So far this doesn't seem to be an issue in the basic testing.

2) I ended up adding a policy_notify to the pfkey manager to make the code generic.
Interesting thing is i dont think pfkey knows what to do with policy expiration or i am misreading the code. I dont see any message type for policy expiration as i do for sa expiration. Ive put some hooks and a little noise. I could remove the printks - for now they are just place holders. cheers, jamal --=-g5r6p/Y+YZcaZoLnsgWz Content-Disposition: attachment; filename=ipsec-event-take2 Content-Type: text/plain; name=ipsec-event-take2; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500 +++ b/include/net/xfrm.h 2005-04-02 11:59:17.000000000 -0500 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 +308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int 
(*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -802,7 +827,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state *xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-25 22:28:39.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-02 09:53:03.000000000 -0500 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-03-25 22:28:25.000000000 -0500 +++ b/net/xfrm/xfrm_state.c 2005-04-02 12:15:37.000000000 -0500 @@ -48,7 +48,7 @@ static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); static DEFINE_SPINLOCK(xfrm_state_gc_lock); -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); @@ -208,8 +208,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int __xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -236,14 +238,47 @@ * is what we are dropping here. 
*/ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +static DEFINE_RWLOCK(xfrm_km_lock); +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); + +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { + struct xfrm_mgr *km; + + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock(&xfrm_km_lock); +} + +void km_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_mgr *km; + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + km->notify(x, c); + read_unlock(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +int xfrm_state_delete(struct xfrm_state *x) +{ + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -402,6 +437,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -764,37 +800,45 @@ } EXPORT_SYMBOL(xfrm_replay_advance); -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); -static DEFINE_RWLOCK(xfrm_km_lock); static void km_state_expired(struct xfrm_state *x, int hard) { - struct xfrm_mgr *km; + struct km_event c; if (hard) x->km.state = XFRM_STATE_EXPIRED; else x->km.dying = 1; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); - read_unlock(&xfrm_km_lock); + /* XXX: Do we wanna do this right at the top?? 
+ * if the state is dead we dont want to announce + * the expire - a delete may already have announced + * it + */ + if (x->km.state == XFRM_STATE_DEAD) + return; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); } +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -819,13 +863,20 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + /* XXX: Do we still wanna wakeup km_waitq? + * if the policy is dead we dont want to announce + * the expire - a delete may already have announced + * it + */ + if (pol->dead) + return; + + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_policy.c 2005-03-25 22:28:21.000000000 -0500 +++ b/net/xfrm/xfrm_policy.c 2005-04-02 12:16:30.000000000 -0500 @@ -298,7 +298,7 @@ * entry dead. The rule must be unlinked from lists to the moment. 
*/ -static void xfrm_policy_kill(struct xfrm_policy *policy) +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir) { write_lock_bh(&policy->lock); if (policy->dead) @@ -378,7 +378,7 @@ write_unlock_bh(&xfrm_policy_lock); if (delpol) { - xfrm_policy_kill(delpol); + xfrm_policy_kill(delpol, dir); } return 0; } @@ -402,7 +402,7 @@ if (pol && delete) { atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } return pol; } @@ -425,7 +425,7 @@ if (pol && delete) { atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } return pol; } @@ -442,7 +442,7 @@ xfrm_policy_list[dir] = xp->next; write_unlock_bh(&xfrm_policy_lock); - xfrm_policy_kill(xp); + xfrm_policy_kill(xp, dir); write_lock_bh(&xfrm_policy_lock); } @@ -558,7 +558,7 @@ if (pol) { if (dir < XFRM_POLICY_MAX) atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } } @@ -579,7 +579,7 @@ write_unlock_bh(&xfrm_policy_lock); if (old_pol) { - xfrm_policy_kill(old_pol); + xfrm_policy_kill(old_pol, dir); } return 0; } --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-02 12:21:32.000000000 -0500 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -285,14 +286,26 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +317,20 @@ return -EPERM; } - 
xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -672,6 +695,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +707,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +718,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +845,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +874,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +888,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq 
= nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1111,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) { struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1128,94 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_sa( struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + 
return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1349,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1361,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1369,90 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return 
-1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("Netlink Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = "netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-03-25 22:28:39.000000000 -0500 +++ b/net/key/af_key.c 2005-04-02 12:25:49.000000000 -0500 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct 
sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1256,7 +1328,7 @@ if (hdr->sadb_msg_type == SADB_ADD) err = xfrm_state_add(x); - else + else err = xfrm_state_update(x); if (err < 0) { @@ -1265,27 +1337,22 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? 
*/ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1368,20 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1519,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + // XXX:do we have to pass proto as well? 
+ hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + + pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1947,31 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1979,7 @@ struct sadb_address *sa; struct 
sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2047,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2079,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,24 +2115,11 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - 
out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); -out: xfrm_pol_put(xp); return err; } @@ -2037,8 +2129,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2141,19 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + /* + * XXX: previous get was doing a broadcast-all _always_ + * which didnt seem right for non-deletion case - JHS + * This is like the way netlink behaves .. + * Shall i restore original behavior? + */ + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2188,33 @@ return xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, 
GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2414,25 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + printk("pfkey doesnt deal with expired policies ..\n"); + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2451,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3004,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-g5r6p/Y+YZcaZoLnsgWz-- From liontooth@cogweb.net Sat Apr 2 11:25:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:25:57 -0800 (PST) Received: from weber.sscnet.ucla.edu (weber.sscnet.ucla.edu [128.97.42.3]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JPnCI027930 for ; Sat, 2 Apr 2005 11:25:49 -0800 Received: from localhost (localhost [127.0.0.1]) by weber.sscnet.ucla.edu (8.13.4/8.13.4) with ESMTP id j32JPj40013828; Sat, 2 Apr 2005 11:25:45 -0800 (PST) Received: from weber.sscnet.ucla.edu ([127.0.0.1]) by localhost (weber [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 12686-02; Sat, 2 Apr 2005 11:25:45 -0800 (PST) Received: from 
[128.97.221.35] (clitunno.sscnet.ucla.edu [128.97.221.35]) by weber.sscnet.ucla.edu (8.13.4/8.13.4) with ESMTP id j32JPKtv013762; Sat, 2 Apr 2005 11:25:20 -0800 (PST) Message-ID: <424EF19B.7030105@cogweb.net> Date: Sat, 02 Apr 2005 11:25:15 -0800 From: David Liontooth User-Agent: Debian Thunderbird 1.0 (X11/20050118) X-Accept-Language: en-us, en MIME-Version: 1.0 To: venza@brownhat.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: ICS1883 LAN PHY not detected X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new at weber.sscnet.ucla.edu X-Virus-Status: Clean X-archive-position: 1273 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: liontooth@cogweb.net Precedence: bulk X-list: netdev Gigabyte's K8NS Ultra-939 mobo has a 100/10 LAN PHY chip, ICS1883, which isn't detected by the 2.6.12-rc1 kernel (and likely not previous kernels). http://www.giga-byte.com/MotherBoard/Products/Products_Spec_GA-K8NS%20Ultra-939.htm On the other hand, the ports light up when connected. The device may be similar to ICS1893, which is supported by the sis900 driver. However, I figure the device first has to be detected? Any advice appreciated. 
Dave # lspci 0000:00:00.0 Host bridge: nVidia Corporation: Unknown device 00e1 (rev a1) 0000:00:01.0 ISA bridge: nVidia Corporation: Unknown device 00e0 (rev a2) 0000:00:01.1 SMBus: nVidia Corporation: Unknown device 00e4 (rev a1) 0000:00:02.0 USB Controller: nVidia Corporation: Unknown device 00e7 (rev a1) 0000:00:02.1 USB Controller: nVidia Corporation: Unknown device 00e7 (rev a1) 0000:00:02.2 USB Controller: nVidia Corporation: Unknown device 00e8 (rev a2) 0000:00:05.0 Bridge: nVidia Corporation: Unknown device 00df (rev a2) 0000:00:06.0 Multimedia audio controller: nVidia Corporation: Unknown device 00ea (rev a1) 0000:00:08.0 IDE interface: nVidia Corporation: Unknown device 00e5 (rev a2) 0000:00:0a.0 IDE interface: nVidia Corporation: Unknown device 00e3 (rev a2) 0000:00:0b.0 PCI bridge: nVidia Corporation: Unknown device 00e2 (rev a2) 0000:00:0e.0 PCI bridge: nVidia Corporation: Unknown device 00ed (rev a2) 0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 0000:02:0b.0 Ethernet controller: Marvell Technology Group Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13) 0000:02:0d.0 Unknown mass storage controller: Silicon Image, Inc. 
(formerly CMD Technology Inc)SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01) 0000:02:0e.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01) From herbert@gondor.apana.org.au Sat Apr 2 11:33:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:33:31 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JXLwe028654 for ; Sat, 2 Apr 2005 11:33:22 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHoMW-0006q1-00; Sun, 03 Apr 2005 05:32:52 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHoM5-0006YU-00; Sun, 03 Apr 2005 05:32:25 +1000 Date: Sun, 3 Apr 2005 05:32:24 +1000 To: Robert Olsson Cc: Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050402193224.GA25157@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16974.41648.568927.54429@robur.slu.se> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1274 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 03:48:32PM +0200, Robert Olsson wrote: > > > Crashes usually occurs when secret_interval interval is elapsed : rt_cache_flush(0); is called, and the whole machine begins to die. > > A good idea to increase the secret_interval interval but it should survive. Incidentally we should change the way the rehashing is triggered. 
Instead of doing it regularly, we can do it when we notice that a specific hash chain grows beyond a certain size. The idea is that if someone is attacking our hash then they can only do so by lengthening the chains. If they're not doing that then even if they knew how to attack us we don't really care. Of course when it does happen it'll still kill your machine unless we can find a way to amortise this. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sat Apr 2 11:36:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:36:34 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JaQoK029236 for ; Sat, 2 Apr 2005 11:36:27 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHoPX-0006qk-00; Sun, 03 Apr 2005 05:35:59 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHoPL-0006ZD-00; Sun, 03 Apr 2005 05:35:47 +1000 Date: Sun, 3 Apr 2005 05:35:47 +1000 To: John Heffner Cc: davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [PATCH] skb pcount with MTU discovery Message-ID: <20050402193547.GB25157@gondor.apana.org.au> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1275 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 10:32:32AM -0500, John Heffner wrote: > On Sat, 2 Apr 2005, Herbert Xu 
wrote: > > > How about fixing tcp_snd_test directly like this? > > I tried that first, but it caused a panic. I assumed some other point in > the code assumed that invariant that if TSO is disabled then tso_segs==1. > I didn't investigate though. Do you remember what the panic looked like? Perhaps it was because tso_segs wasn't set at all? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From davem@davemloft.net Sat Apr 2 11:56:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 11:56:33 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32JuRLZ030268 for ; Sat, 2 Apr 2005 11:56:27 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHoiO-0005Rw-00; Sat, 02 Apr 2005 11:55:28 -0800 Date: Sat, 2 Apr 2005 11:55:28 -0800 From: "David S. 
Miller" To: Herbert Xu Cc: Robert.Olsson@data.slu.se, dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-Id: <20050402115528.11f71a3c.davem@davemloft.net> In-Reply-To: <20050402193224.GA25157@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1276 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Sun, 3 Apr 2005 05:32:24 +1000 Herbert Xu wrote: > On Sat, Apr 02, 2005 at 03:48:32PM +0200, Robert Olsson wrote: > > > > > Crashes usually occurs when secret_interval interval is elapsed : rt_cache_flush(0); is called, and the whole machine begins to die. > > > > A good idea to increase the secret_interval interval but it should survive. > > Incidentally we should change the way the rehashing is triggered. > Instead of doing it regularly, we can do it when we notice that a > specific hash chain grows beyond a certain size. > > The idea is that if someone is attacking our hash then they can > only do so by lengthening the chains. If they're not doing that > then even if they knew how to attack us we don't really care. Yes, the secret_interval is way too short. It is a very paranoid default value selected when initially fixing that DoS. I think we should, in the short term, increase the secret interval where it exists in the tree (netfilter conntrack is another instance for example). 
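
The idea quoted above (rehash when one chain grows too long, instead of on a fixed secret_interval) could look roughly like the following userspace sketch. It is not the route.c code: the table layout, NBUCKETS, MAX_CHAIN, the mixing function and the way the new secret is derived are all assumptions made for illustration. A kernel version would take the secret from a random source and would have to deal with locking, which is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS  16   /* illustrative table size */
#define MAX_CHAIN 4    /* illustrative chain length that triggers a rehash */

struct entry {
	uint32_t key;
	struct entry *next;
};

struct table {
	struct entry *bucket[NBUCKETS];
	uint32_t secret;     /* keyed into the hash, changed on every rehash */
	unsigned rehashes;   /* how many times the table was rehashed */
};

/* Keyed hash; the mixing constants are arbitrary, not the kernel's. */
static uint32_t hash(uint32_t key, uint32_t secret)
{
	uint32_t h = key ^ secret;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h % NBUCKETS;
}

/* Unlink every entry, switch to the new secret, and reinsert. */
static void table_rehash(struct table *t, uint32_t new_secret)
{
	struct entry *all = NULL, *e, *next;
	uint32_t i;

	for (i = 0; i < NBUCKETS; i++) {
		for (e = t->bucket[i]; e; e = next) {
			next = e->next;
			e->next = all;
			all = e;
		}
		t->bucket[i] = NULL;
	}
	t->secret = new_secret;
	for (e = all; e; e = next) {
		next = e->next;
		i = hash(e->key, t->secret);
		e->next = t->bucket[i];
		t->bucket[i] = e;
	}
	t->rehashes++;
}

/* Insert a key; rehash only if this chain has grown past MAX_CHAIN. */
static int table_insert(struct table *t, uint32_t key)
{
	struct entry *e, *p;
	unsigned len = 1;    /* counts the entry being added */
	uint32_t i;

	assert(t != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	i = hash(key, t->secret);
	for (p = t->bucket[i]; p; p = p->next)
		len++;
	e->key = key;
	e->next = t->bucket[i];
	t->bucket[i] = e;
	if (len > MAX_CHAIN) {
		/* Placeholder for a fresh random secret. */
		table_rehash(t, t->secret * 1103515245u + 12345u);
	}
	return 0;
}

/* Total number of entries, used to check that a rehash loses nothing. */
static size_t table_count(const struct table *t)
{
	const struct entry *e;
	size_t n = 0;
	unsigned i;

	for (i = 0; i < NBUCKETS; i++)
		for (e = t->bucket[i]; e; e = e->next)
			n++;
	return n;
}
```

With a fixed bucket count this rehashes on nearly every insert once the table holds more than NBUCKETS * MAX_CHAIN entries, so a real version would also need a cap on entries or a growing table. The cost of a single rehash is still the whole table, which is the amortisation problem raised in the quoted mail.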
From hadi@cyberus.ca Sat Apr 2 12:47:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 12:47:49 -0800 (PST) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32KlhrM003188 for ; Sat, 2 Apr 2005 12:47:44 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DHpWx-000554-Hj for netdev@oss.sgi.com; Sat, 02 Apr 2005 15:47:43 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHpWt-00064y-PT; Sat, 02 Apr 2005 15:47:40 -0500 Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() From: jamal Reply-To: hadi@cyberus.ca To: Eric Dumazet Cc: Robert Olsson , Herbert Xu , "David S. Miller" , netdev In-Reply-To: <424EA7C2.6060308@cosmosbay.com> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <424EA7C2.6060308@cosmosbay.com> Content-Type: text/plain; charset=ISO-8859-1 Organization: jamalopolous Message-Id: <1112474855.1096.274.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 15:47:36 -0500 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1277 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 2005-04-02 at 09:10, Eric Dumazet wrote: > Robert Olsson a écrit : > > Eric Dumazet writes: > > > Yes thats a pretty much load. Very short flows some reason? > > Well... yes. This is a real server, not a DOS simulation. > 1 million TCP flows, and about 3 million peers using UDP frames. SMP? How many processors? 
cheers, jamal From hadi@cyberus.ca Sat Apr 2 13:06:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:06:09 -0800 (PST) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32L64CT004180 for ; Sat, 2 Apr 2005 13:06:04 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DHpog-0001SQ-QE for netdev@oss.sgi.com; Sat, 02 Apr 2005 16:06:02 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHpoe-00081M-08; Sat, 02 Apr 2005 16:06:00 -0500 Subject: Re: Get rid of rt_check_expire and rt_garbage_collect From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Eric Dumazet , "David S. Miller" , netdev , Robert Olsson In-Reply-To: <20050402112304.GA11321@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <20050402112304.GA11321@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112475955.1088.294.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 16:05:55 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1278 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 2005-04-02 at 06:23, Herbert Xu wrote: > On Sat, Apr 02, 2005 at 11:21:30AM +0200, Eric Dumazet wrote: > > > > Well, I began my work because of the overflow bug in rt_check_expire()... > > Then I realize this function could not work as expected. On a loaded > > machine, one timer tick is 1 ms. > > During this time, number of chains that are scanned is ridiculous. > > With the standard timer of 60 second, fact is rt_check_expire() is useless. > > I see. 
What we've got here is a scalability problem with respect > to the number of hash buckets. As the number of buckets increases, > the amount of work the timer GC has to perform increases proportionally. > It's the classical incremental garbage collection algorithm that's being used, i.e. something along the lines of what's typically referred to as mark-and-sweep. Could the main issue be not the number of routes in the cache but rather the locking when the number of CPUs goes up? Increasing the timer frequency would certainly help but may have adverse effects if the frequency is too high, because of the system-wide locking, IMO. > Since the timer GC parameters are fixed, this will eventually break. > > Rather than changing the timer GC so that it runs more often to keep > up with the large routing cache, we should get out of this by reducing > the amount of work we have to do. > Refer to my hint above: perhaps per CPU caches? > Imagine an ideal balanced hash table with 2.6 million entries. That > is, all incoming/outgoing packets belong to flows that are already in > the hash table. Imagine also that there is no PMTU/link failure taking > place so all entries are valid forever. > > > In this state there is absolutely no need to execute the timer GC. > Yeah, but memory is finite, friend. True, if you can imagine infinite memory we would not need gc ;-> > Let's remove one of those assumptions and allow there to be entries > which need to expire after a set period. > Instead of having the timer GC clean them up, we can move the expire > check to the place where the entries are used. That is, we make > ip_route_input/ip_route_output/ipv4_dst_check check whether the > entry has expired. > If you can show lock grabbing is the main contention issue (I believe it is as CPUs go up), then this is a valuable idea since you are already grabbing the locks anyway. > On the face of it we're doing more work since every routing cache > hit will need to check the validity of the dst.
However, because > it's a single subtraction it is actually pretty cheap. There is > also no additional cache miss compared to doing it in the timer > GC since we have to read the dst anyway. > In the case of slower machines, the compute is also an issue. To be honest, this feels like handwaving - experimenting and collecting profiles would help nail it. > Let's go one step further and make the routing cache come to life. > Now there are new entries coming in and we need to remove old ones > in order to make room for them. > > That task is currently carried out by the timer GC in rt_check_expire > and on demand by rt_garbage_collect. Either way we have to walk the > entire routing cache looking for entries to get rid of. > We don't really walk the whole route cache every time - I am sure you know that. > This is quite expensive when the routing cache is large. However, > there is a better way. > > The reason we keep a cap on the routing cache (for a given hash size) > is so that individual chains do not degenerate into long linked lists. > > In other words, we don't really care about how many entries there are > in the routing cache. But we do care about how long each hash chain > is. > > So instead of walking the entire routing cache to keep the number of > entries down, what we should do is keep each hash chain as short as > possible. > That's certainly one solution .. reading on to see how you achieve this .. > Assuming that the hash function is good, this should achieve the > same end result. > > Here is how it can be done: Every time a routing entry is inserted into > a hash chain, we perform GC on that chain unconditionally. > May not be a good idea to do it unconditionally - in particular on SMP where another CPU may be spinning waiting for you to let go of the bucket lock. In particular if a burst of packets accessing the same bucket shows up on different processors, this would be aggravated.
You may wanna kick in this algorithm only when things start going past a certain threshold. > It might seem that we're doing more work again. However, as before > because we're traversing the chain anyway, it is very cheap to perform > the GC operations which mainly involve the checks in rt_may_expire. > > OK that's enough thinking and it's time to write some code to see > whether this is all bullshit :) > I think there are some good ideas in there; the bottleneck could be perceived as either the locks being too expensive (clearly so on SMP as the number of CPUs goes up) or the compute taking too long (clearly so on slower systems - but a general fact of life as well). For the first issue, amortizing the lock grabbing via compute as you suggest may be of value, or else making per-CPU caches. cheers, jamal From hadi@cyberus.ca Sat Apr 2 13:09:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:09:09 -0800 (PST) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32L93mq004713 for ; Sat, 2 Apr 2005 13:09:03 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DHprT-0005n7-Rj for netdev@oss.sgi.com; Sat, 02 Apr 2005 14:08:55 -0700 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHprX-0008HI-Mo; Sat, 02 Apr 2005 16:09:00 -0500 Subject: Re: RFC: Redirect-Device From: jamal Reply-To: hadi@cyberus.ca To: Meelis Roos Cc: netdev In-Reply-To: References: Content-Type: text/plain Organization: jamalopolous Message-Id: <1112476135.1087.298.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 16:08:55 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1279 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Sat, 2005-04-02 at 03:41, Meelis Roos wrote: > j> I must be missing something: What is it that this device can do that the > j> mirred action cant do? > > I know what I am missing here: documentation. There is very basic > documentation about tc qdisc+class+filter level and almost nothing on the > newer features. Without good documentation only some developers > understand it. Have you tried looking at iproute2 doc/examples? Theres some new stuff in there. Over time more stuff will be added - and contributions welcome as well. cheers, jamal From hadi@cyberus.ca Sat Apr 2 13:28:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:29:01 -0800 (PST) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32LSu4O006074 for ; Sat, 2 Apr 2005 13:28:57 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DHqAo-0005rD-Nc for netdev@oss.sgi.com; Sat, 02 Apr 2005 16:28:54 -0500 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHqAm-0001r3-CC; Sat, 02 Apr 2005 16:28:52 -0500 Subject: Re: IPSEC: on behavior of acquire From: jamal Reply-To: hadi@cyberus.ca To: Aidas Kasparas Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org In-Reply-To: <424E454D.4090402@gmc.lt> References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112477326.1088.321.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 16:28:46 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1280 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 2005-04-02 at 02:10, Aidas Kasparas wrote: > > Re 1 try only. There is little sense to do more tries. If there is no > daemon listening to pfkey messages, then no connection will be made no > matter how many retries you'll do. If daemon/link/peer is slow and SA > was not established before timeout expired, then repeated acquire will > be simply ignored (daemon will find out that negotiation is already in > progress, there is no reason to start another negotiation and therefore > will drop that acquire request). And the only situation where repeated > acquires may help is when pfkey messages are lost. Exactly what I was trying to emulate - lost messages. I would not expect losing messages to be the rule - but given there's no guarantee of delivery, messages could be lost. > But pfkey was not > designed to survive message losses, therefore you should not operate your > boxes in mode when lost pfkey messages are a rule, not an exception. And > on the other hand, occasional pfkey message losses can be worked around > by applications/user retry. > I think it's more than just pfkey (or netlink) - rather the ipsec framework itself. One could look at the acquire as part of the "connection" setup (for lack of a better description). Without the acquire succeeding, there's no connection (assuming that to be a policy). Therefore if acquire is not supposed to be delivered with some certainty (read: retries) then there are some resiliency issues IMO. Note: Sometimes there's no app. Example: a packet coming into a gateway. > Re error code returned. Error codes returned by pfkey never were > perfect. But your experiment is not perfect too. You sent pings with no > KE daemon running. Note what my goals were.
> pfkey code found that there is nothing receiving > acquire messages => there is no chance that any process will set up > required SAs and tried to inform about that (I agree, return code is not > very informative, at least until you learn about reasons why it is > such). If you would have racoon (or other pfkey based ISAKMP daemon) > running, you would get "resource temporarily unavailable" (don't know > which error code corresponds to that message), which IMHO is ok (if it > is not, please explain). > Haven't tried that - the reason I said restart was the right signal was mainly that an app could translate that to mean "try again". In other words even in the case of ping -c1 the ping app could have reattempted. On Sat, 2005-04-02 at 07:25, Zilvinas Valinskas wrote: > EBUSY I think it is. > > I am not entirely sure it is ok to return such error, some applications are > not coping nicely with it. Perhaps ECONNREFUSED is more reasonable - as it > doesn't break old apps' assumption (connection cannot be established, > doesn't matter if that is due to routing or IPsec SPD or anything else). > What about ERESTART the way netlink does it right now? ECONNREFUSED is probably not a bad idea. ping was clearly dumb and didn't do anything with the info. Overall, I think the errors are unfortunately not descriptive at all.
cheers, jamal From tgraf@suug.ch Sat Apr 2 13:36:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:36:35 -0800 (PST) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32LaPOw006809 for ; Sat, 2 Apr 2005 13:36:26 -0800 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id 58E7BF; Sat, 2 Apr 2005 23:36:02 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id 1FDD41C0EA; Sat, 2 Apr 2005 23:36:43 +0200 (CEST) Date: Sat, 2 Apr 2005 23:36:42 +0200 From: Thomas Graf To: Abhishek Gupta Cc: netdev@oss.sgi.com Subject: Re: Problem using HTB Message-ID: <20050402213642.GO3086@postel.suug.ch> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1281 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * Abhishek Gupta 2005-04-01 15:10 > tc class add dev $DEV0 parent 2: classid 2:1 htb rate 100kbit burst 100 \ > ceil 100kbit > [...] > I have configured for 100kbps, I am getting only 12kbps as the link speed. Before I look into this, are you aware of 1kbps=8kbit? 
From hadi@cyberus.ca Sat Apr 2 13:43:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:43:07 -0800 (PST) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32Lh2nq007527 for ; Sat, 2 Apr 2005 13:43:02 -0800 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DHqON-0007KX-5Z for netdev@oss.sgi.com; Sat, 02 Apr 2005 14:42:55 -0700 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DHqOM-00035F-Qn; Sat, 02 Apr 2005 16:42:55 -0500 Subject: Re: IPSEC: on behavior of acquire From: jamal Reply-To: hadi@cyberus.ca To: Alexey Kuznetsov Cc: Herbert Xu , "David S. Miller" , Masahide NAKAMURA , ipsec-tools-devel@lists.sourceforge.net, netdev , kaber@trash.net, jmorris@redhat.com In-Reply-To: <20050402140019.GA13017@yakov.inr.ac.ru> References: <1112405144.1096.33.camel@jzny.localdomain> <20050402140019.GA13017@yakov.inr.ac.ru> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112478168.1088.337.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Apr 2005 16:42:48 -0500 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1282 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sat, 2005-04-02 at 09:00, Alexey Kuznetsov wrote: > Hello! > > > a) -ERESTART is the correct signal to return > > Right behaviour is to behave like ARP. A few of packets are queued, > no errors (until timeout), no blocking. Herbert also mentions something along the same lines in his email. This would make a lot of sense! Is the state machine going to look something along the same lines as ARP? i.e incomplete->reachable etc? 
What would be a good code to return when you queue the packet? cheers, jamal From tgraf@suug.ch Sat Apr 2 13:52:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 13:52:42 -0800 (PST) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32LqbVE008323 for ; Sat, 2 Apr 2005 13:52:38 -0800 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id 93A0282; Sat, 2 Apr 2005 23:52:14 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id 9099E1C0EA; Sat, 2 Apr 2005 23:52:56 +0200 (CEST) Date: Sat, 2 Apr 2005 23:52:56 +0200 From: Thomas Graf To: jamal Cc: Alexey Kuznetsov , Herbert Xu , "David S. Miller" , Masahide NAKAMURA , ipsec-tools-devel@lists.sourceforge.net, netdev , kaber@trash.net, jmorris@redhat.com Subject: Re: IPSEC: on behavior of acquire Message-ID: <20050402215256.GP3086@postel.suug.ch> References: <1112405144.1096.33.camel@jzny.localdomain> <20050402140019.GA13017@yakov.inr.ac.ru> <1112478168.1088.337.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112478168.1088.337.camel@jzny.localdomain> X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1283 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * jamal <1112478168.1088.337.camel@jzny.localdomain> 2005-04-02 16:42 > Herbert also mentions something along the same lines in his email. > This would make a lot of sense! > Is the state machine going to look something along the same lines as > ARP? i.e incomplete->reachable etc? > > What would be a good code to return when you queue the packet? 
EINPROGRESS? From juhl-lkml@dif.dk Sat Apr 2 14:36:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 14:36:51 -0800 (PST) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j32MaiVo009757 for ; Sat, 2 Apr 2005 14:36:45 -0800 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 9219BFFD23 for ; Sun, 3 Apr 2005 00:46:20 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 15976-02 for ; Sun, 3 Apr 2005 00:46:19 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id 445ACFFCA9 for ; Sun, 3 Apr 2005 00:46:19 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Sun, 03 Apr 2005 00:35:29 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id HNMVRDHM; Sun, 3 Apr 2005 00:36:33 +0200 Date: Sun, 3 Apr 2005 00:38:54 +0200 (CEST) From: Jesper Juhl To: Maciej Soltysiak Cc: "James P. Ketrenos" , netdev@oss.sgi.com, "David S. 
Miller" , linux-kernel@vger.kernel.org
Subject: Re: [2.6.12-rc1-mm4] swapped memset arguments
In-Reply-To: <74334709.20050402233007@dns.toxicfilms.tv>
Message-ID:
References: <74334709.20050402233007@dns.toxicfilms.tv>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com
X-Virus-Scanned: amavisd-new at dif.dk
X-Virus-Status: Clean
X-archive-position: 1284
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: juhl-lkml@dif.dk
Precedence: bulk
X-list: netdev

On Sat, 2 Apr 2005, Maciej Soltysiak wrote:

> Hi,
>
> out of boredom I grepped 2.6.12-rc1-mm4 for swapped memset arguments.
> I found one:
>
> # grep -nr "memset.*\,\(\ \|\)0\(\ \|\));" *
> net/ieee80211/ieee80211_tx.c:226: memset(txb, sizeof(struct ieee80211_txb), 0);
>

And here's a patch :

Fix swapped memset() arguments in net/ieee80211/ieee80211_tx.c
found by Maciej Soltysiak.

Signed-off-by: Jesper Juhl

--- linux-2.6.12-rc1-mm4-orig/net/ieee80211/ieee80211_tx.c	2005-03-31 21:20:08.000000000 +0200
+++ linux-2.6.12-rc1-mm4/net/ieee80211/ieee80211_tx.c	2005-04-03 00:34:22.000000000 +0200
@@ -223,7 +223,7 @@ struct ieee80211_txb *ieee80211_alloc_tx
 	if (!txb)
 		return NULL;
 
-	memset(txb, sizeof(struct ieee80211_txb), 0);
+	memset(txb, 0, sizeof(struct ieee80211_txb));
 	txb->nr_frags = nr_frags;
 	txb->frag_size = txb_size;
 

From jgarzik@pobox.com Sat Apr 2 17:25:29 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 17:25:34 -0800 (PST)
Received: from parcelfarce.linux.theplanet.co.uk (IDENT:93@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j331PShU018360 for ; Sat, 2 Apr 2005 17:25:28 -0800
Received: from cpe-024-025-022-197.nc.res.rr.com ([24.25.22.197] helo=[10.10.10.88]) by parcelfarce.linux.theplanet.co.uk with asmtp (TLSv1:AES256-SHA:256) (Exim 4.33) id 1DHtri-0006Jy-F3; Sun, 03 Apr 2005 02:25:26 +0100
Message-ID: <424F45F0.1000504@pobox.com> Date: Sat, 02 Apr 2005 20:25:04 -0500 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050328 Fedora/1.7.6-1.2.5 X-Accept-Language: en-us, en MIME-Version: 1.0 To: David Liontooth CC: venza@brownhat.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: ICS1883 LAN PHY not detected References: <424EF19B.7030105@cogweb.net> In-Reply-To: <424EF19B.7030105@cogweb.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1286 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev David Liontooth wrote: > 0000:02:0b.0 Ethernet controller: Marvell Technology Group Ltd. Yukon > Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13) You want the sk98lin or skge drivers. Jeff From grundler@lackof.org Sat Apr 2 17:24:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 17:25:03 -0800 (PST) Received: from colo.lackof.org (colo.lackof.org [198.49.126.79]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j331OnPg018306 for ; Sat, 2 Apr 2005 17:24:49 -0800 Received: from localhost (localhost [127.0.0.1]) by colo.lackof.org (Postfix) with ESMTP id 4727429802F; Sat, 2 Apr 2005 18:26:37 -0700 (MST) Received: from colo.lackof.org ([127.0.0.1]) by localhost (colo.lackof.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 04205-03; Sat, 2 Apr 2005 18:26:35 -0700 (MST) Received: by colo.lackof.org (Postfix, from userid 27253) id C5BFF298010; Sat, 2 Apr 2005 18:26:35 -0700 (MST) Date: Sat, 2 Apr 2005 18:26:35 -0700 From: Grant Grundler To: jaganav@us.ibm.com Cc: Greg KH , Stephen Hemminger , Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com
Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
Message-ID: <20050403012635.GA4218@colo.lackof.org>
References: <1112426991.424e49ef57e2b@imap.linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1112426991.424e49ef57e2b@imap.linux.ibm.com>
X-Home-Page: http://www.parisc-linux.org/
User-Agent: Mutt/1.5.6+20040907i
X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at lackof.org
X-Virus-Status: Clean
X-archive-position: 1285
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: grundler@parisc-linux.org
Precedence: bulk
X-list: netdev

On Sat, Apr 02, 2005 at 02:29:51AM -0500, jaganav@us.ibm.com wrote:
> If this dual license is a concern to other kernel developers as well from
> contributing to OpenRDMA, we would seriously consider this and discuss
> with the adapter vendors.

I'm not concerned with it. If *BSD can thrive with its license, I don't
see why it's a problem for linux. HP is going to pay me to work on the
code regardless of the license. Projects I work on privately happen to
be GPL though I'm not religious about it.

If people choose NOT to volunteer time/effort on dual licensed code,
I understand and respect that. There are enough worthy GPL only projects
out there.

I'm speaking for myself and NOT for HP.
grant From liontooth@cogweb.net Sat Apr 2 21:28:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 21:28:41 -0800 (PST) Received: from weber.sscnet.ucla.edu (weber.sscnet.ucla.edu [128.97.42.3]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j335SZJl028429 for ; Sat, 2 Apr 2005 21:28:35 -0800 Received: from localhost (localhost [127.0.0.1]) by weber.sscnet.ucla.edu (8.13.4/8.13.4) with ESMTP id j335SZTF008913; Sat, 2 Apr 2005 21:28:35 -0800 (PST) Received: from weber.sscnet.ucla.edu ([127.0.0.1]) by localhost (weber [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 08242-01; Sat, 2 Apr 2005 21:28:35 -0800 (PST) Received: from [128.97.221.35] (clitunno.sscnet.ucla.edu [128.97.221.35]) by weber.sscnet.ucla.edu (8.13.4/8.13.4) with ESMTP id j335RdWF008432; Sat, 2 Apr 2005 21:27:40 -0800 (PST) Message-ID: <424F7EC4.1000107@cogweb.net> Date: Sat, 02 Apr 2005 21:27:32 -0800 From: David Liontooth User-Agent: Debian Thunderbird 1.0 (X11/20050118) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Garzik CC: venza@brownhat.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: ICS1883 LAN PHY not detected References: <424EF19B.7030105@cogweb.net> <424F45F0.1000504@pobox.com> In-Reply-To: <424F45F0.1000504@pobox.com> X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new at weber.sscnet.ucla.edu X-Virus-Status: Clean X-archive-position: 1287 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: liontooth@cogweb.net Precedence: bulk X-list: netdev Jeff Garzik wrote: > David Liontooth wrote: > >> 0000:02:0b.0 Ethernet controller: Marvell Technology Group Ltd. Yukon >> Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13) > > You want the sk98lin or skge drivers. 
Correct -- that one worked already in Debian-Installer.

What was confusing is that the Gigabyte K8NS Ultra-939 board has a
second gigabit NIC, identified in the motherboard manual as a 100/10
ICS1883 LAN PHY, that is in fact an nforce gigabit controller, part of
the nforce3 250 chipset (cf.
http://cogweb.net/owens/Images/Gigabyte-K8NS-Ultra-939.jpg line 5).

For some reason the PCI ID 00E6 doesn't show up in lspci, so I thought
it was not detected by the kernel. However, the forcedeth driver brought
it to life.

Dave

From herbert@gondor.apana.org.au Sat Apr 2 23:40:37 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 23:40:46 -0800 (PST)
Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j337eZm9031028 for ; Sat, 2 Apr 2005 23:40:36 -0800
Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHziD-0001d3-00; Sun, 03 Apr 2005 17:40:01 +1000
Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHzgu-00027R-00; Sun, 03 Apr 2005 17:38:40 +1000
Date: Sun, 3 Apr 2005 17:38:40 +1000
To: jamal
Cc: Eric Dumazet , "David S.
Miller" , netdev , Robert Olsson Subject: Re: Get rid of rt_check_expire and rt_garbage_collect Message-ID: <20050403073840.GA8105@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <20050402112304.GA11321@gondor.apana.org.au> <1112475955.1088.294.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112475955.1088.294.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1288 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 04:05:55PM -0500, jamal wrote: > > > In this state there is absolutely no need to execute the timer GC. > > Yeah, but memory is finite friend. True, if you can imagine infinite > memory we would not need gc ;-> True. However running the GC when you can't free most of the entries is a waste of time. On a busy system where the routing cache is near capacity and new entries are coming in all the time, we should arrange it so that the old entries are expired when entries are inserted. Assuming the hash function is good, then as long as there is a steady stream of entries coming in, the old entries will be expired automatically. Of course, we should not leave the systems that have experienced a burst of flows at a disadvantage. Indeed there is a rather simple way of doing GC for them without having to do work that's proportional to the number of hash chains in the routing cache. The key is that the GC is only useful when the routing cache contains enough entries that can be freed. Let's say that if we can free more than 1/3 of the entries then the GC should be run. Of course you can define this to be whatever you want. 
So now the problem is to quickly determine whether there are enough entries in the cache that can be freed. What we can do is take a leaf out of the politicians' book :) We take a poll on a small sample of the routing cache. That is, we run the GC on a fixed number of chains, e.g., 256 chains. After that we tally the total number of entries and the number of entries freed. Since the hash function should be spreading entries throughout the chains evenly, the ratio here can be extrapolated out to the entire cache. Therefore once the ratio exceeds the defined threshold, we perform GC over the entire cache, preferably in a kernel thread. If not then we'll simply let the GC roam along at the constant pace of 256 chains. The advantage of this is that the GC will free entries in the entire table as soon as that becomes possible without having to do work proportional to the number of chains in each GC interval. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sat Apr 2 23:41:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 23:41:56 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j337foUj031138 for ; Sat, 2 Apr 2005 23:41:50 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHzjf-0001dk-00; Sun, 03 Apr 2005 17:41:31 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHzjZ-00027w-00; Sun, 03 Apr 2005 17:41:25 +1000 Date: Sun, 3 Apr 2005 17:41:25 +1000 To: jamal Cc: Eric Dumazet , "David S. 
Miller" , netdev , Robert Olsson Subject: Re: Get rid of rt_check_expire and rt_garbage_collect Message-ID: <20050403074125.GB8105@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <20050402112304.GA11321@gondor.apana.org.au> <1112475955.1088.294.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112475955.1088.294.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1289 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 04:05:55PM -0500, jamal wrote: > > > Here is how it can be done: Every time a routing entry is inserted into > > a hash chain, we perform GC on that chain unconditionally. > > May not be a good idea to do it unconditionally - in particular on SMP > where another CPU maybe spinning waiting for you to let go of bucket > lock. In particular if a burst of packets accessing the same bucket show > up on different processors, this would be aggravated. > You may wanna kick in this algorithm only when things start going past a > certain threshold. This isn't too bad because: 1. The fast path is lockless using RCU. 2. The number of locks exceeds the number of CPUs by some insane amount. 3. The cost of performing GC is really cheap, it's just a matter of calling rt_may_expire. Anyway, I agree that all of these ideas are simply fantasy until we have some code. 
So let me work on that and then we can let the benchmarks do the talking :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sat Apr 2 23:44:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Apr 2005 23:44:09 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j337hxIh031843 for ; Sat, 2 Apr 2005 23:44:01 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHzln-0001eh-00; Sun, 03 Apr 2005 17:43:43 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHzlh-00028S-00; Sun, 03 Apr 2005 17:43:37 +1000 Date: Sun, 3 Apr 2005 17:43:37 +1000 To: "David S. Miller" Cc: Robert.Olsson@data.slu.se, dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050403074337.GA8083@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050402115528.11f71a3c.davem@davemloft.net> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1290 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 11:55:28AM -0800, David S. 
Miller wrote: > > I think we should, in the short term, increase the secret interval > where it exists in the tree (netfilter conntrack is another instance > for example). We could also move rt_cache_flush into a kernel thread. When the number of chains is large this function is really expensive for a softirq handler. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From a.kasparas@gmc.lt Sun Apr 3 00:30:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 00:30:39 -0800 (PST) Received: from smtp02.omnitel.sun (smtp02-neptunas.omnitel.net [194.176.45.2]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j338UVJn004479 for ; Sun, 3 Apr 2005 00:30:32 -0800 Received: from smtp04-neptunas.omnitel.net ([194.176.45.42]) by smtp02.omnitel.sun (Sun Java System Messaging Server 6.1 HotFix 0.01 (built Jun 24 2004)) with ESMTP id <0IED004S43K8BK40@smtp02.omnitel.sun> for netdev@oss.sgi.com; Sun, 03 Apr 2005 11:28:57 +0300 (EEST) Received: from smtp04-neptunas.omnitel.net (localhost [127.0.0.1]) by smtp04-neptunas.omnitel.net (Postfix) with SMTP id 6928139804F; Sun, 03 Apr 2005 11:28:54 +0300 (EEST) Received: from [192.168.0.128] (unknown [62.212.195.62]) by smtp04-neptunas.omnitel.net (Postfix) with ESMTP id D1DD939804A; Sun, 03 Apr 2005 11:28:53 +0300 (EEST) Date: Sun, 03 Apr 2005 11:28:54 +0300 From: Aidas Kasparas Subject: Re: IPSEC: on behavior of acquire In-reply-to: <1112477326.1088.321.camel@jzny.localdomain> To: hadi@cyberus.ca Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org Message-id: <424FA946.70809@gmc.lt> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: lt, en, ru, fr X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime References: <1112405303.1096.37.camel@jzny.localdomain> 
<424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain>
User-Agent: Debian Thunderbird 1.0 (X11/20050116)
X-Virus-Scanned: ClamAV 0.83/801/Sat Apr 2 02:36:25 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1291
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: a.kasparas@gmc.lt
Precedence: bulk
X-list: netdev

jamal wrote:
> On Sat, 2005-04-02 at 02:10, Aidas Kasparas wrote:
>
>
>>Re 1 try only. There is little sense to do more tries. If there is no
>>daemon listening to pfkey messages, then no connection will be made no
>>matter how many retries you'll do. If daemon/link/peer is slow and SA
>>was not established before timeout expired, then repeated acquire will
>>be simply ignored (daemon will find out that negotiation is already in
>>progress, there is no reason to start another negotiation and therefore
>>will drop that acquire request). And the only situation where repeated
>>acquires may help is when pfkey messages are lost.
>
>
> Exactly what i was trying to emulate - lost messages.

Your emulation was not correct. More correct would have been to start KE
daemon, let it fully initialize (open pfkey socket, inform kernel that
it is interested in acquire messages), then stop it (via debugger or
kill -STOP) and only then send pings or other traffic and see what will
happen. This is because there are different paths in xfrm+pfkey for
cases 1) when there is no KE daemon and 2) when the daemon is running
but for some reason does not establish a SA, and therefore the reaction
to traffic is different.

In the first case it's
xfrm_lookup()
->xfrm_tmpl_resolve()
->xfrm_state_find()
->xfrm_state.c:km_query()
->pfkey_send_acquire()
->pfkey_broadcast()
->return -ESRCH.
This error code goes unchanged back to xfrm_state_find, where it is
remapped into itself (other possible values are -EAGAIN and -ENOMEM).
And then this error code goes back to the application.
In the second case it's
xfrm_lookup()
->xfrm_tmpl_resolve()
->xfrm_state_find()
->xfrm_state.c:km_query()
->pfkey_send_acquire()
->pfkey_broadcast()
->pfkey_broadcast_one()
-> return 0
also sent unchanged back to function xfrm_state_find, where the SA is
put into state XFRM_STATE_ACQ. xfrm_tmpl_resolve() returns -EAGAIN.
xfrm_lookup then sets up a timeout, and if the state has not changed
after that timeout, returns -EAGAIN to the application.

On the other hand, the analysis above shows that the return code is
chosen by the xfrm framework, therefore if the error code has to be
changed, it should be changed in xfrm, not in pfkey or netlink code.

> I would expect it
> to be the rule to lose messages - but given there's no guarantee of
> delivery, messages could be lost.
>
>
>>But pfkey was not
>>designed to survive message losses, therefore you should not operate your
>>boxes in mode when lost pfkey messages are a rule, not an exception. And
>>on the other hand, occasional pfkey message losses can be worked around
>>by applications/user retry.
>>
>
>
> I think it's more than just pfkey (or netlink) - rather the ipsec
> framework itself.
>
> One could look at the acquire as part of the "connection" setup
> (for lack of better description). Without the acquire succeeding, there's
> no connection (assuming that to be a policy).
> Therefore if acquire is not supposed to be delivered with some certainty
> (read: retries) then there are some resiliency issues IMO.

OK. To avoid speaking about apples and oranges let's first find out
where you see the problem. In the ipsec framework there are the
following players (I'm speaking about pfkey case; netlink may be a
little different):

xfrm <-> pfkey <-> KE daemon <-> remote peer

xfrm-pfkey communication is based on function calls. For them to fail
something really weird has to happen with your kernel.

KE daemon - remote peer communications are done on UDP/500, UDP/4500
according to internet standards.
Packet retransmissions are implemented the way standards require,
therefore it is not a fatal condition if some packet is lost on the
way. And there is no 1:1 correspondence between packets sent over
internet and those sent over pfkey socket. These communications are
performed relatively independently. There is no need to receive an
extra acquire pfkey message to retransmit the packet which initiates SA
setup with the remote peer.

pfkey - KE daemon communication is performed over message socket. All
the communication is performed within a single box. Moreover, only the
kernel and a userspace process are involved. Therefore I see only the
following cases when a message may not be delivered:
1) message is too big to fit into socket's buffer;
2) kernel decides to drop that socket buffer and reuse memory for
something else;
3) KE daemon does not get [enough] CPU time to handle messages;
4) bug in KE daemon prevents it from reading messages.
If you know another case, please let me know.

(1) does happen when there is big SPD/SAD and setkey/racoon request to
dump it all. It is a known pfkey architectural limitation. Acquire
messages are small, therefore this can happen only when such a call is
made right after the response to a big DUMP was generated. In racoon
case SPD dump is performed only on daemon startup (and even then it is
possible that it is not strictly necessary). An extra acquire message
may make sense only if it is sent after some timeout. But again, KE
daemon start is more exception than rule and applications can be
started only after some delay after KE daemon has started.

I'm not sure how realistic (2) is. But it and (3) are clear resource
shortage cases. Under no circumstances should they be allowed. And in
case (3) an extra acquire message definitely won't help the situation.

In case (4) it is the KE daemon that is guilty, not pfkey. An extra
message will not cure this case either.

>
> Note: Sometimes there's no app. Example: a packet coming into a gateway.
>

What do you have in mind?
If it is ISAKMP negotiation from remote peer, then it comes over UDP/500
or UDP/4500 over IP socket and not via acquire message via pfkey socket.
If it is ESP/AH packet with unknown SPI, then the kernel simply drops it
and does not send any acquire messages. If it is something else, please
explain.

>> pfkey code found that there is nothing receiving
>>acquire messages => there is no chance that any process will setup
>>required SAs and tried to inform about that (I agree, return code is not
>>very informative, at least until you learn about reasons why it is
>>such). If you would have racoon (or other pfkey based ISAKMP daemon)
>>running, you would get "resource temporarily unavailable" (don't know
>>which error code corresponds to that message), which IMHO is ok (if it
>>is not, please explain).
>>
>
>
> Haven't tried that - the reason i said restart was the right signal was
> mainly that an app could translate that to mean "try again".
> In other words even in the case of ping -c1 the ping app could have
> reattempted.

If there is a security policy which is not satisfied and there is nobody
who could make it satisfied, then why should we give the application
false hope that on retry things will change?

>
> On Sat, 2005-04-02 at 07:25, Zilvinas Valinskas wrote:
>
>>EBUSY I think it is.
>>
>>I am not entirely sure it is ok to return such error, some applications are
>>not coping nicely with it. Perhaps ECONNREFUSED is more reasonable - as it
>>doesn't break old apps assumption (connection cannot be established,
>>doesn't matter if that is due to routing or IPsec SPD or anything else).
>>
>
>
> What about ERESTART the way netlink does it right now?

I suspect that ERESTART is generated not by netlink, but by the
xfrm_lookup() function when signal_pending(current) is true. Why that
function returns true in the netlink case but not in the pfkey case I
don't know. IMHO, xfrm_lookup() returns correct error codes in that
case.

> ECONNREFUSED is probably not a bad idea.
> ping was clearly dumb and didnt do anything with the info.
> Overall, I think the errors are unfortunately not descriptive at all.

I don't like ECONNREFUSED here. As a user, if I received ECONNREFUSED I
would turn to the application server admin or the remote host admin to
resolve the problem. But the problem is in the network setup, and
therefore the person responsible for the network should be contacted.
Therefore, I would prefer ENETUNREACH or EHOSTUNREACH.

P.S. For this analysis the kernel source from the Debian distribution was
used (v.2.6.9).

--
Aidas Kasparas
IT administrator
GM Consult Group, UAB

From hadi@cyberus.ca Sun Apr 3 07:29:38 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 07:29:43 -0700 (PDT)
Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33ETawn023082 for ; Sun, 3 Apr 2005 07:29:36 -0700
Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DI66Y-00019y-FV for netdev@oss.sgi.com; Sun, 03 Apr 2005 10:29:34 -0400
Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DI66U-00068f-JN; Sun, 03 Apr 2005 10:29:30 -0400
Subject: Re: IPSEC: on behavior of acquire
From: jamal
Reply-To: hadi@cyberus.ca
To: Aidas Kasparas
Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org
In-Reply-To: <424FA946.70809@gmc.lt>
References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt>
Content-Type: text/plain
Organization: jamalopolous
Message-Id: <1112538566.1096.391.camel@jzny.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2
Date: 03 Apr 2005 10:29:27 -0400
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1292
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@cyberus.ca
Precedence: bulk
X-list: netdev

On Sun, 2005-04-03 at 04:28, Aidas Kasparas wrote:
> jamal wrote:
> > Exactly what i was trying to emulate - lost messages.
>
> Your emulation was not correct. More correct would have been to start KE
> daemon, let it fully initialize (open pfkey socket, inform kernel that
> it is interested in acquire messages), then stop it (via debugger or
> kill -STOP) and only then send pings or other traffic and see what will
> happen. This is because there are different paths in xfrm+pfkey for
> cases 1) when there is no KE daemon and 2) when daemon is, but for some
> reason it does not establish a SA and therefore reaction to traffic is
> different.
>

I dont think that would work.
To summarize what happens in the kernel: everything leads to km_query(),
as you have indicated in your text. If the kernel finds that someone or
something has either a pfkey or netlink socket open, it sends an acquire
to them.
In the code you are probably looking at (before i created the patch), the
first user/daemon the kernel sees (either pfkey or netlink based) that has
a socket open will receive an acquire, and the kernel will give up after
that. As an example, if the first pfkey user was just doing "setkey -x"
and the second was in fact pluto, then pluto will never see the acquire.
This is what got me looking at it to begin with. Look at the earlier
postings on the subject.
So in other words, just killing the ike server as you propose would mean
the kernel has no open sockets and will therefore never bother to send an
acquire.

Still, all this is moot and is distracting us from the main discussion.
Lets define "lost" simply as the case where an acquire never got to the
server (which may be sitting elsewhere on the network). In that case what
i did is sufficient, i.e. the methods used to create this are not the
issue. The issue at stake is the behavior of the kernel in generating the
acquires.

[..]
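
The difference between the two dispatch behaviours discussed here can be
sketched in user space. This is only an illustration under stated
assumptions, not kernel code: struct key_manager, km_acquire(),
query_first_wins() and query_all() are hypothetical names, a manager is
reduced to a flag saying whether anyone is listening, and -1 stands in for
whatever error code the real code would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered key manager (pfkey, netlink). */
struct key_manager {
    const char *id;
    int listening;   /* non-zero if some process has a socket open */
    int delivered;   /* number of acquires handed to this manager */
};

/* Hand one acquire to a manager: 0 on success, -1 if nobody listens. */
static int km_acquire(struct key_manager *km)
{
    if (!km->listening)
        return -1;
    km->delivered++;
    return 0;
}

/* "First wins": stop at the first manager that accepts the acquire. */
static int query_first_wins(struct key_manager *kms, size_t n)
{
    int err = -1;

    assert(kms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        err = km_acquire(&kms[i]);
        if (!err)
            break;          /* later managers never see the acquire */
    }
    return err;
}

/* "Notify all": offer the acquire to every manager; one success is enough. */
static int query_all(struct key_manager *kms, size_t n)
{
    int err = -1;

    assert(kms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (km_acquire(&kms[i]) == 0)
            err = 0;        /* remember success, but keep going */
    }
    return err;
}
```

With two listening managers, query_first_wins() leaves the second one
without an acquire, while query_all() reaches both and still reports
success as long as at least one of them accepted it.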
> On the other hand, analysis above shows that return code is chosen by
> xfrm framework, therefore if error code has to be changed, it should be
> changed in xfrm, not in pfkey or netlink code.

The control for both is under generic code. The end return code - you are
right, thats user behavior and should match.

> > One could look at the acquire as part of the "connection" setup
> > (for lack of better description). Without the acquire succeeding, theres
> > no connection..(assuming that to be a policy).
> > Therefore if acquire is not supposed to be delivered with some certainty
> > (read: retries) then theres some resiliency issues IMO.
>
> OK, To avoid speaking about apples and oranges let's first find out
> where you see the problem. In the ipsec framework there are the
> following players (I'm speaking about pfkey case; netlink may be little
> different):
>
> xfrm <-> pfkey <-> KE daemon <-> remote peer
>
> xfrm-pfkey communication is based on function calls. For them to fail
> something really weird has to happen with your kernel.
>
> KE daemon - remote peer communications are done on UDP/500, UDP/4500
> according to internet standards. Packet retransmissions are implemented
> the way standards require, therefore it is not a fatal condition if some
> packet will be lost on the way.

Please refer to my earlier definition of what "lost" means. It doesnt
matter where the breakage happens really. Think of everything to the right
of "xfrm" in your diagram as a black box (i.e that second thing could be
pfkey or netlink - thats not the issue). Think of some message that is
supposed to reach the KE daemon (make it interesting and say it is a
remote KE), then think of that message never making it because something
in the black box swallowed it. If that packet is the first one and it
needs to get through for the sake of setup for subsequent packets, then
the desire to have it reach its destination is very important.
There is no progress for it or subsequent packets if it doesnt make it.
The solution being proposed for Linux, to treat that xfrm piece in the same
fashion as ARP, is correct. Read the email from Alexey. Imagine if ARP was
only issued once (as pfkey does) or forever (as netlink does).
I believe this is an issue with the ipsec architecture itself - someone
needs to write an IETF draft on it.

>
> >
> > Note: Sometimes theres no app. Example a packet coming into a gateway.
> >
>
> What do you have in mind?
>
> If it is ISAKMP negotiation from remote peer, then it comes over UDP/500
> or UDP/4500 over IP socket and not via acquire message via pfkey socket.
>
> If it is ESP/AH packet with unknown SPI, then kernel simply drops it and
> do not send any acquire messages.
>

I was thinking more of this second scenario, with a packet coming in from
the clear text domain and the gateway encrypting, assuming proper policy
setup. I would have to go and reread the "opportunistic" encryption draft
closely to make sense of it.

> > Havent tried that - the reason i said restart was the right signal was
> > mainly that an app could translate that to mean "try again".
> > In other words even in the case of ping -c1 the ping app could have
> > reattempted.
>
> If there is security policy which is not satisfied and there is nobody
> which could make it satisfied, then why should we give application false
> hope that on retry things will change?
>

In the case of knowing it is the policy that is not satisfied, i think it
would make sense not to tell the app to retry.

> >
> > What about ERESTART the way netlink does it right now?
>
> I suspect that ERESTART is generated not by netlink, but by
> xfrm_lookup() function when signal_pending(current) is true. Why that
> function returns true in netlink case but not in pfkey case I don't
> know. IMHO, xfrm_lookup() returns correct error codes in that case.
>

yes, you are correct.

> > ECONNREFUSED is probably not a bad idea.
> > ping was clearly dumb and didnt do anything with the info.
> > Overall, I think the errors are unfortunately not descriptive at all.
>
> I don't like ECONNREFUSED in this place. As a user if I would receive
> ECONNREFUSED message then I would address application server admin or
> remote host admin to resolve the problem. But the problem is in network
> setup and therefore person responsible for networks should be contacted.
> Therefore, I would like more ENETUNREACH or EHOSTUNREACH.
>

Agreed to this as well. I think this is what would happen in the case of
ARP failure as well. ECONNREFUSED would make sense in the case where the
policy rejected progress.

cheers,
jamal

From hadi@cyberus.ca Sun Apr 3 07:32:10 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 07:32:18 -0700 (PDT)
Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33EW93e023317 for ; Sun, 3 Apr 2005 07:32:10 -0700
Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DI68x-0001Jp-W1 for netdev@oss.sgi.com; Sun, 03 Apr 2005 08:32:03 -0600
Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DI68v-0006MV-VK; Sun, 03 Apr 2005 10:32:02 -0400
Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy , Masahide NAKAMURA , "David S.
Miller" , netdev
In-Reply-To: <1112469601.1088.173.camel@jzny.localdomain>
References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain>
Content-Type: multipart/mixed; boundary="=-CbZvGNdJ/zGTATpkMExl"
Organization: jamalopolous
Message-Id: <1112538718.1096.394.camel@jzny.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2
Date: 03 Apr 2005 10:31:58 -0400
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1293
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@cyberus.ca
Precedence: bulk
X-list: netdev

--=-CbZvGNdJ/zGTATpkMExl
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Small change after some testing.
Herbert, havent heard back from you - this looks very palatable in my
opinion, with the comments below still in effect.

cheers,
jamal

On Sat, 2005-04-02 at 14:20, jamal wrote:
> Ok, heres a general patch first cut i think i got all that was discussed
> in there. ive done some basic 5 minutes tests on.
> Once we have agreement i will pass it on to Masahide-san to do more
> thorough testing.
> Look at the XXX comments in the patch.
>
> A couple of interesting things:
>
> 1) Weve discussed this before Herbert and i think you misspoke that
> pfkey delivers to all listeners.
>
> pfkey Add/del/upd now really do tell all processes about what happened.
> Before pfkey would skip the originating process. So far this doesnt seem
> to be an issue in the basic testing.
> > 2) I ended adding a policy_notify to the pfkey manager to make the code > generic. Interesting thing is i dont think pfkey knows what to do with > policy expiration or i am misreading the code. > I dont see any message type for policy expiration as i do for sa > expiration. Ive put some hooks and a little noise. I could remove the > printks - for now they are just place holders. > > cheers, > jamal --=-CbZvGNdJ/zGTATpkMExl Content-Disposition: attachment; filename=ipsec-event-take2-1 Content-Type: text/plain; name=ipsec-event-take2-1; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500 +++ b/include/net/xfrm.h 2005-04-02 11:59:17.000000000 -0500 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 +308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int 
len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int (*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -802,7 +827,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state *xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-25 22:28:39.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-02 09:53:03.000000000 -0500 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-03-25 22:28:25.000000000 -0500 +++ b/net/xfrm/xfrm_state.c 2005-04-02 12:15:37.000000000 -0500 @@ -48,7 +48,7 @@ static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); static DEFINE_SPINLOCK(xfrm_state_gc_lock); -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); @@ -208,8 +208,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int __xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -236,14 +238,47 @@ * is what we are dropping here. 
*/ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +static DEFINE_RWLOCK(xfrm_km_lock); +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); + +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { + struct xfrm_mgr *km; + + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock(&xfrm_km_lock); +} + +void km_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_mgr *km; + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + km->notify(x, c); + read_unlock(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +int xfrm_state_delete(struct xfrm_state *x) +{ + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -402,6 +437,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -764,37 +800,45 @@ } EXPORT_SYMBOL(xfrm_replay_advance); -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); -static DEFINE_RWLOCK(xfrm_km_lock); static void km_state_expired(struct xfrm_state *x, int hard) { - struct xfrm_mgr *km; + struct km_event c; if (hard) x->km.state = XFRM_STATE_EXPIRED; else x->km.dying = 1; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); - read_unlock(&xfrm_km_lock); + /* XXX: Do we wanna do this right at the top?? 
+ * if the state is dead we dont want to announce + * the expire - a delete may already have announced + * it + */ + if (x->km.state == XFRM_STATE_DEAD) + return; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); } +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -819,13 +863,20 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + /* XXX: Do we still wanna wakeup km_waitq? + * if the policy is dead we dont want to announce + * the expire - a delete may already have announced + * it + */ + if (pol->dead) + return; + + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_policy.c 2005-03-25 22:28:21.000000000 -0500 +++ b/net/xfrm/xfrm_policy.c 2005-04-02 12:16:30.000000000 -0500 @@ -298,7 +298,7 @@ * entry dead. The rule must be unlinked from lists to the moment. 
*/ -static void xfrm_policy_kill(struct xfrm_policy *policy) +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir) { write_lock_bh(&policy->lock); if (policy->dead) @@ -378,7 +378,7 @@ write_unlock_bh(&xfrm_policy_lock); if (delpol) { - xfrm_policy_kill(delpol); + xfrm_policy_kill(delpol, dir); } return 0; } @@ -402,7 +402,7 @@ if (pol && delete) { atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } return pol; } @@ -425,7 +425,7 @@ if (pol && delete) { atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } return pol; } @@ -442,7 +442,7 @@ xfrm_policy_list[dir] = xp->next; write_unlock_bh(&xfrm_policy_lock); - xfrm_policy_kill(xp); + xfrm_policy_kill(xp, dir); write_lock_bh(&xfrm_policy_lock); } @@ -558,7 +558,7 @@ if (pol) { if (dir < XFRM_POLICY_MAX) atomic_inc(&flow_cache_genid); - xfrm_policy_kill(pol); + xfrm_policy_kill(pol, dir); } } @@ -579,7 +579,7 @@ write_unlock_bh(&xfrm_policy_lock); if (old_pol) { - xfrm_policy_kill(old_pol); + xfrm_policy_kill(old_pol, dir); } return 0; } --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-02 12:21:32.000000000 -0500 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -285,14 +286,26 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +317,20 @@ return -EPERM; } - 
xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -672,6 +695,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +707,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +718,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +845,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +874,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +888,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq 
= nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1111,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) { struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1128,94 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_sa( struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + 
return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1349,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1361,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1369,90 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return 
-1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("Netlink Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = "netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-03-25 22:28:39.000000000 -0500 +++ b/net/key/af_key.c 2005-04-02 18:05:24.000000000 -0500 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct 
sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1256,7 +1328,7 @@ if (hdr->sadb_msg_type == SADB_ADD) err = xfrm_state_add(x); - else + else err = xfrm_state_update(x); if (err < 0) { @@ -1265,27 +1337,22 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? 
*/ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1368,20 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1519,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + // XXX:do we have to pass proto as well? 
+ hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1947,31 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1979,7 @@ struct sadb_address *sa; struct 
sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2047,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2079,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,24 +2115,11 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - 
out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); -out: xfrm_pol_put(xp); return err; } @@ -2037,8 +2129,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2141,19 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + /* + * XXX: previous get was doing a broadcast-all _always_ + * which didnt seem right for non-deletion case - JHS + * This is like the way netlink behaves .. + * Shall i restore original behavior? + */ + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2188,33 @@ return xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, 
GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2414,25 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + printk("pfkey doesnt deal with expired policies ..\n"); + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2451,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3004,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-CbZvGNdJ/zGTATpkMExl-- From abhishek@pal.ece.iisc.ernet.in Sun Apr 3 08:00:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 08:01:00 -0700 (PDT) Received: from ece.iisc.ernet.in (ece.iisc.ernet.in [144.16.64.2]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33F0o2B024756 for ; Sun, 3 Apr 2005 08:00:52 -0700 Received: from pal.ece.iisc.ernet.in (pal.ece.iisc.ernet.in [144.16.64.149]) by ece.iisc.ernet.in (8.12.6/8.12.6) with ESMTP id j33Ew58V086581; Sun, 3 Apr 2005 20:28:10 +0530 (IST) (envelope-from abhishek@pal.ece.iisc.ernet.in) Received: by pal.ece.iisc.ernet.in (Postfix, from userid 1047) id CB30F31E59; Sun, 3 Apr 2005 20:30:19 +0530 (IST) 
Received: from localhost (localhost [127.0.0.1]) by pal.ece.iisc.ernet.in (Postfix) with ESMTP id C73C631E57; Sun, 3 Apr 2005 20:30:19 +0530 (IST)
Date: Sun, 3 Apr 2005 20:30:19 +0530 (IST)
From: Abhishek Gupta
To: Thomas Graf
Cc: netdev@oss.sgi.com
Subject: Re: Problem using HTB
In-Reply-To: <20050402213642.GO3086@postel.suug.ch>
Message-ID:
References: <20050402213642.GO3086@postel.suug.ch>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1294
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: abhishek@pal.ece.iisc.ernet.in
Precedence: bulk
X-list: netdev

hello

Thanks, Mr. Graf, for replying. Yes, I was making a mistake by treating KBps as Kbit per second. Actually, I got confused by the notation used in the Linux RH monitor, which I used for the speed measurements. But the problem is still not solved: I tried 1Mbit as the link speed setting in the HTB configuration and got about 30KBps, which amounts to about 240Kbit/s, even though my UDP source is sending at about 1MBps (8Mbps) according to the RH monitor readings.

Is it possible that the problem is due to the source that I am using for UDP packets?

abhishek

=========================================================================
ABHISHEK GUPTA
E-mail:abhishek_it_bhu@yahoo.co.in
=========================================================================

On Sat, 2 Apr 2005, Thomas Graf wrote:

> * Abhishek Gupta 2005-04-01 15:10
> > tc class add dev $DEV0 parent 2: classid 2:1 htb rate 100kbit burst 100 \
> > ceil 100kbit
> > [...]
> > I have configured for 100kbps, I am getting only 12kbps as the link speed.
>
> Before I look into this, are you aware of 1kbps=8kbit?
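The unit arithmetic in this exchange can be sketched in C. This is only an illustration under stated assumptions: the helper names `kbit_to_kbyte` and `kbyte_to_kbit` are hypothetical and not part of tc or HTB, and the only fact relied on is the one quoted above, that 1kbps (kilobyte per second) equals 8kbit.

```c
/*
 * Hypothetical helpers for illustration only; they are not part of
 * tc, HTB, or any kernel API. They assume 8 bits per byte and the
 * same decimal prefix on both sides of the conversion.
 */

/* Convert a rate in kilobits per second to kilobytes per second. */
double kbit_to_kbyte(double kbit)
{
    return kbit / 8.0;
}

/* Convert a rate in kilobytes per second to kilobits per second. */
double kbyte_to_kbit(double kbyte)
{
    return kbyte * 8.0;
}
```

Under these assumptions a 100kbit class corresponds to 12.5 kilobytes per second, which is consistent with the roughly 12 reported in the first measurement. The later reading of 30KBps corresponds to 240kbit, well below the 125 kilobytes per second that a 1Mbit class would allow.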
> From kaber@trash.net Sun Apr 3 08:49:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 08:49:16 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33FnBfm030687 for ; Sun, 3 Apr 2005 08:49:12 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DI7KJ-0008Pn-Od; Sun, 03 Apr 2005 17:47:51 +0200 Message-ID: <42501027.6010609@trash.net> Date: Sun, 03 Apr 2005 17:47:51 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050324 Debian/1.7.6-1 X-Accept-Language: en MIME-Version: 1.0 To: hadi@cyberus.ca CC: Herbert Xu , Masahide NAKAMURA , "David S. Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> In-Reply-To: <1112538718.1096.394.camel@jzny.localdomain> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1295 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev jamal wrote: >>+void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) >> { >>+ struct xfrm_mgr *km; >>+ >>+ read_lock(&xfrm_km_lock); >>+ list_for_each_entry(km, &xfrm_km_list, list) >>+ if 
(km->notify_policy) >>+ km->notify_policy(xp, dir, c); >>+ read_unlock(&xfrm_km_lock); >>+} >>+ >>+void km_state_notify(struct xfrm_state *x, struct km_event *c) >>+{ >>+ struct xfrm_mgr *km; >>+ read_lock(&xfrm_km_lock); >>+ list_for_each_entry(km, &xfrm_km_list, list) >>+ km->notify(x, c); >>+ read_unlock(&xfrm_km_lock); >>+} You call these functions from both softirq- and user-context, so you need to protect against BHs. Regards Patrick From kaber@trash.net Sun Apr 3 08:53:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 08:53:38 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33FrXtl031326 for ; Sun, 3 Apr 2005 08:53:34 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DI7Ok-0008QN-9f; Sun, 03 Apr 2005 17:52:26 +0200 Message-ID: <4250113A.4080202@trash.net> Date: Sun, 03 Apr 2005 17:52:26 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050324 Debian/1.7.6-1 X-Accept-Language: en MIME-Version: 1.0 To: hadi@cyberus.ca CC: Alexey Kuznetsov , Herbert Xu , "David S. Miller" , Masahide NAKAMURA , ipsec-tools-devel@lists.sourceforge.net, netdev , jmorris@redhat.com Subject: Re: IPSEC: on behavior of acquire References: <1112405144.1096.33.camel@jzny.localdomain> <20050402140019.GA13017@yakov.inr.ac.ru> <1112478168.1088.337.camel@jzny.localdomain> In-Reply-To: <1112478168.1088.337.camel@jzny.localdomain> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1296 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev jamal wrote: > Herbert also mentions something along the same lines in his email. > This would make a lot of sense! 
> Is the state machine going to look something along the same lines as > ARP? i.e incomplete->reachable etc? Yes, from a bundle POV. In my current approach a single state is resolved at a time and resolution is driven by XFRM_STATE_ACQ->* state transitions. > What would be a good code to return when you queue the packet? It should be transparent, so 0. Regards Patrick From kaber@trash.net Sun Apr 3 09:13:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 09:13:44 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33GDbAA032335 for ; Sun, 3 Apr 2005 09:13:38 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DI7if-0008Tj-3S; Sun, 03 Apr 2005 18:13:01 +0200 Message-ID: <4250160D.2040405@trash.net> Date: Sun, 03 Apr 2005 18:13:01 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050324 Debian/1.7.6-1 X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: Herbert Xu , netdev Subject: [IPSEC]: Protect against BHs in xfrm_user_policy() Content-Type: multipart/mixed; boundary="------------040106090202000803080206" X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1297 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------040106090202000803080206 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit xfrm_user_policy() is called from ip_setsockopt with enabled BHs, so it needs to protect against them when grabbing xfrm_km_lock. 
--------------040106090202000803080206 Content-Type: text/plain; name="x" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="x" # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2005/04/03 17:36:10+02:00 kaber@coreworks.de # [IPSEC]: Protect against BHs in xfrm_user_policy() # # Signed-off-by: Patrick McHardy # # net/xfrm/xfrm_state.c # 2005/04/03 17:36:00+02:00 kaber@coreworks.de +2 -2 # [IPSEC]: Protect against BHs in xfrm_user_policy() # # Signed-off-by: Patrick McHardy # diff -Nru a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c --- a/net/xfrm/xfrm_state.c 2005-04-03 18:04:38 +02:00 +++ b/net/xfrm/xfrm_state.c 2005-04-03 18:04:38 +02:00 @@ -878,14 +878,14 @@ goto out; err = -EINVAL; - read_lock(&xfrm_km_lock); + read_lock_bh(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { pol = km->compile_policy(sk->sk_family, optname, data, optlen, &err); if (err >= 0) break; } - read_unlock(&xfrm_km_lock); + read_unlock_bh(&xfrm_km_lock); if (err >= 0) { xfrm_sk_policy_insert(sk, err, pol); --------------040106090202000803080206-- From hadi@cyberus.ca Sun Apr 3 09:29:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 09:29:19 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33GTFs6000902 for ; Sun, 3 Apr 2005 09:29:15 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DI7yM-0005vy-Au for netdev@oss.sgi.com; Sun, 03 Apr 2005 12:29:14 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DI7yI-00089r-7W; Sun, 03 Apr 2005 12:29:10 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Patrick McHardy Cc: Herbert Xu , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <42501027.6010609@trash.net> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <42501027.6010609@trash.net> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112545744.1087.397.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 12:29:05 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1298 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 11:47, Patrick McHardy wrote: > >>+void km_state_notify(struct xfrm_state *x, struct km_event *c) > >>+{ > >>+ struct xfrm_mgr *km; > >>+ read_lock(&xfrm_km_lock); > >>+ list_for_each_entry(km, &xfrm_km_list, list) > >>+ km->notify(x, c); > >>+ read_unlock(&xfrm_km_lock); > >>+} > > You call these functions from both softirq- and user-context, so you > need to protect against BHs. > You are absolutely correct. Thanks for catching this. 
cheers, jamal From hadi@cyberus.ca Sun Apr 3 09:36:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 09:36:48 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33Gaiv4001707 for ; Sun, 3 Apr 2005 09:36:44 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DI85b-00083D-L5 for netdev@oss.sgi.com; Sun, 03 Apr 2005 12:36:43 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DI85Y-0000LD-Dg; Sun, 03 Apr 2005 12:36:40 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Patrick McHardy Cc: Herbert Xu , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <42501027.6010609@trash.net> References: <1112319441.1089.83.camel@jzny.localdomain> <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <42501027.6010609@trash.net> Content-Type: multipart/mixed; boundary="=-HzY6ovv3o1agHd1AZ7ya" Organization: jamalopolous Message-Id: <1112546194.1096.401.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 12:36:35 -0400 X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1299 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-HzY6ovv3o1agHd1AZ7ya Content-Type: text/plain 
Content-Transfer-Encoding: 7bit Masahide, Attached is incremental patch on top of the one posted earlier. Looks ok from my basic testing. Please run it against your tests and see if it stands. cheers, jamal On Sun, 2005-04-03 at 11:47, Patrick McHardy wrote: > You call these functions from both softirq- and user-context, so you > need to protect against BHs. > > Regards > Patrick > --=-HzY6ovv3o1agHd1AZ7ya Content-Disposition: attachment; filename=ipsec-event-take2-1-1 Content-Type: text/plain; name=ipsec-event-take2-1-1; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/net/xfrm/xfrm_state.c 2005/04/03 16:30:31 1.2 +++ b/net/xfrm/xfrm_state.c 2005/04/03 16:31:27 @@ -251,20 +251,20 @@ { struct xfrm_mgr *km; - read_lock(&xfrm_km_lock); + read_lock_bh(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) if (km->notify_policy) km->notify_policy(xp, dir, c); - read_unlock(&xfrm_km_lock); + read_unlock_bh(&xfrm_km_lock); } void km_state_notify(struct xfrm_state *x, struct km_event *c) { struct xfrm_mgr *km; - read_lock(&xfrm_km_lock); + read_lock_bh(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) km->notify(x, c); - read_unlock(&xfrm_km_lock); + read_unlock_bh(&xfrm_km_lock); } EXPORT_SYMBOL(km_policy_notify); --=-HzY6ovv3o1agHd1AZ7ya-- From kaber@trash.net Sun Apr 3 09:49:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 09:49:15 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33Gn9Fr002547 for ; Sun, 3 Apr 2005 09:49:10 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DI8Go-0003lv-39; Sun, 03 Apr 2005 18:48:18 +0200 Message-ID: <42501E51.3000401@trash.net> Date: Sun, 03 Apr 2005 18:48:17 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050324 Debian/1.7.6-1 X-Accept-Language: en MIME-Version: 1.0 To: Herbert Xu CC: "David S. 
Miller" , kuznet@ms2.inr.ac.ru, jmorris@redhat.com, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [IPSEC]: Kill nested read lock by deleting xfrm_init_tempsel References: <20050214221200.GA18465@gondor.apana.org.au> <20050214221433.GB18465@gondor.apana.org.au> <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> <20050331212325.5e996432.davem@davemloft.net> <20050402004956.GA24339@gondor.apana.org.au> <20050401172007.7296eced.davem@davemloft.net> <20050402020947.GA24998@gondor.apana.org.au> In-Reply-To: <20050402020947.GA24998@gondor.apana.org.au> Content-Type: multipart/mixed; boundary="------------070809070105060803010504" X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1300 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------070809070105060803010504 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Herbert Xu wrote: > It's still a valid clean-up patch though. Agreed. There is also a bug in my patch, tmpl->daddr can be 0 in which case the daddr passed as an argument to xfrm_state_find() will be used. My patch only checked tmpl->daddr, this patch fixes it. It also uses afinfo->init_tempsel directly, but I didn't kill xfrm_init_tempsel() yet because I need it for xfrm resolution. > There is another reason why it won't dead lock. We don't actually > ever hold the write lock on afinfo :) Is there any reason why we > dont't just use xfrm_state_afinfo_lock instead of afinfo->lock? I don't think so. 
I also don't see a reason why the lock needs to be held between xfrm_state_get_afinfo() and xfrm_state_put_afinfo(), a reference count should be enough. Regards Patrick --------------070809070105060803010504 Content-Type: text/plain; name="x" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="x" # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2005/04/03 18:41:22+02:00 kaber@coreworks.de # [IPSEC]: Use correct daddr for duplicate state check # # Signed-off-by: Patrick McHardy # # net/xfrm/xfrm_state.c # 2005/04/03 18:41:14+02:00 kaber@coreworks.de +9 -9 # [IPSEC]: Use correct daddr for duplicate state check # # Signed-off-by: Patrick McHardy # diff -Nru a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c --- a/net/xfrm/xfrm_state.c 2005-04-03 18:41:41 +02:00 +++ b/net/xfrm/xfrm_state.c 2005-04-03 18:41:41 +02:00 @@ -357,12 +357,6 @@ x = best; if (!x && !error && !acquire_in_progress) { - x0 = afinfo->state_lookup(&tmpl->id.daddr, tmpl->id.spi, tmpl->id.proto); - if (x0 != NULL) { - xfrm_state_put(x0); - error = -EEXIST; - goto out; - } x = xfrm_state_alloc(); if (x == NULL) { error = -ENOMEM; @@ -370,9 +364,11 @@ } /* Initialize temporary selector matching only * to current session. 
*/ - xfrm_init_tempsel(x, fl, tmpl, daddr, saddr, family); + afinfo->init_tempsel(x, fl, tmpl, daddr, saddr); + + x0 = afinfo->state_lookup(&x->id.daddr, x->id.spi, x->id.proto); - if (km_query(x, tmpl, pol) == 0) { + if (!x0 && km_query(x, tmpl, pol) == 0) { x->km.state = XFRM_STATE_ACQ; list_add_tail(&x->bydst, xfrm_state_bydst+h); xfrm_state_hold(x); @@ -386,10 +382,14 @@ x->timer.expires = jiffies + XFRM_ACQ_EXPIRES*HZ; add_timer(&x->timer); } else { + error = -ESRCH; + if (x0) { + xfrm_state_put(x0); + error = -EEXIST; + } x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); x = NULL; - error = -ESRCH; } } out: --------------070809070105060803010504-- From kaber@trash.net Sun Apr 3 10:01:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 10:01:31 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33H1QKT003360 for ; Sun, 3 Apr 2005 10:01:27 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DI8Si-0003mQ-Hy; Sun, 03 Apr 2005 19:00:36 +0200 Message-ID: <42502134.8030003@trash.net> Date: Sun, 03 Apr 2005 19:00:36 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050324 Debian/1.7.6-1 X-Accept-Language: en MIME-Version: 1.0 To: Herbert Xu CC: "David S. 
Miller" , Alexey Kuznetsov , James Morris , YOSHIFUJI Hideaki , netdev@oss.sgi.com Subject: Re: Checking SPI in xfrm_state_find References: <20050214221006.GA18415@gondor.apana.org.au> <20050214221200.GA18465@gondor.apana.org.au> <20050214221433.GB18465@gondor.apana.org.au> <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> In-Reply-To: <20050331004658.GA26395@gondor.apana.org.au> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1301 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev Herbert Xu wrote: > It just occured to me that it would be much simpler if you did the > existence check in the first loop. > > So something like > > if (x->props.family != family || > !xfrm_state_addr_check(x, daddr, saddr, family) || > tmpl->id.proto == x->id.proto) > continue; > if (tmpl->id.spi) { > if (tmpl->id.spi != x->id.spi) > continue; > error = -EEXIST; > } > if (x->props.reqid == tmpl->reqid && > tmpl->mode == x->props.mode) { > } You're right, sorry for getting back to you so late. But since its already in now and not very important, I'm going to leave it until I have a better reason to touch that code, if you're ok with that. 
Regards
Patrick

From Robert.Olsson@data.slu.se Sun Apr 3 12:18:35 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 12:18:39 -0700 (PDT)
Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33JIXn4008610 for ; Sun, 3 Apr 2005 12:18:34 -0700
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j33JIUWk016276 for ; Sun, 3 Apr 2005 21:18:31 +0200
Received: by robur.slu.se (Postfix, from userid 1000) id C5C56EE2B1; Sun, 3 Apr 2005 21:18:30 +0200 (CEST)
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16976.16774.728707.368646@robur.slu.se>
Date: Sun, 3 Apr 2005 21:18:30 +0200
To: Harald Welte
Cc: netdev@oss.sgi.com
Subject: pktgen problem (skb refcount) in 2.6.12-rc1
In-Reply-To: <20050402191132.GF1890@sunbeam.de.gnumonks.org>
References: <20050402191132.GF1890@sunbeam.de.gnumonks.org>
X-Mailer: VM 7.18 under Emacs 21.4.1
X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1302
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

Harald Welte writes:
 > I've tried to get pktgen running on 2.6.12-rc1 (dual-opteron system, two
 > dual e1000 boards).
 > I've tried to track the problem down, and I've confirmed that skb->users
 > never goes down to 1 but instead stays at '2'.
 > The same system with the same pktgen script works fine with 2.6.11.6.
 >
 > I'm reporting this since it seems like it sounds like we have a skb
 > usage count leak somewhere :(

Hello!

Sounds like a diff could give some clues. pktgen, e1000 and the TX path should be interesting, as well as any changes in the kernel config.
--ro

From Robert.Olsson@data.slu.se Sun Apr 3 12:37:40 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 12:37:45 -0700 (PDT)
Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33JbdJf013946 for ; Sun, 3 Apr 2005 12:37:39 -0700
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j33JaqWB018224; Sun, 3 Apr 2005 21:36:53 +0200
Received: by robur.slu.se (Postfix, from userid 1000) id D38DEEE2B2; Sun, 3 Apr 2005 21:36:52 +0200 (CEST)
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16976.17876.832677.945878@robur.slu.se>
Date: Sun, 3 Apr 2005 21:36:52 +0200
To: Herbert Xu
Cc: Robert Olsson , Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
In-Reply-To: <20050402193224.GA25157@gondor.apana.org.au>
References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au>
X-Mailer: VM 7.18 under Emacs 21.4.1
X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1303
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

Herbert Xu writes:
 > Incidentally we should change the way the rehashing is triggered.
 > Instead of doing it regularly, we can do it when we notice that a
 > specific hash chain grows beyond a certain size.
 > The idea is that if someone is attacking our hash then they can
 > only do so by lengthening the chains. If they're not doing that
 > then even if they knew how to attack us we don't really care.

Well, I don't see how we detect the need for a rehash just by looking at the hash chains.
What does the "lengthening" that is allowed to trigger a rehash look like?

Agree with Dave that we can increase the interval to start with.

--ro
From tgraf@suug.ch Sun Apr 3 12:47:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 12:47:34 -0700 (PDT) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33JlM94014678 for ; Sun, 3 Apr 2005 12:47:22 -0700 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id 55CD682; Sun, 3 Apr 2005 21:46:57 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id 781091C0EA; Sun, 3 Apr 2005 21:47:39 +0200 (CEST) Date: Sun, 3 Apr 2005 21:47:39 +0200 From: Thomas Graf To: Abhishek Gupta Cc: netdev@oss.sgi.com Subject: Re: Problem using HTB Message-ID: <20050403194739.GR3086@postel.suug.ch> References: <20050402213642.GO3086@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1304 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev

* Abhishek Gupta 2005-04-03 20:30
> But the problem is still
> not yet solved as I tried with 1Mbit speed as the setting for link speed
> in the htb configuration and got about 30KBps which amounts to about
> 240Kbitps even though my UDP source is sending at speed of about
> 1MBps(8Mbps), according to RH monitor readings.

I do not know about that "RH monitor" you are referring to, maybe it does not display rates correctly.
(I found 3 out of 5 rate estimators outputing with a variance of over 10%) I can recommend you bmon [0] which states the variance and can be used to a resolution up to 1/100s given the input source provides an equal or better resolution. > Is it possible that the problem is due to the source that I am using for > UDP packets? Very likely, especially due to the huge difference in requested and achieved rate you have mentioned above. I hardly think this is a problem related to HTB but rather some misconfiguration in your testing process. [0] http://people.suug.ch/~tgr/bmon/ From Robert.Olsson@data.slu.se Sun Apr 3 12:57:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 12:57:51 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33Jvkb1015443 for ; Sun, 3 Apr 2005 12:57:46 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j33Jv8ZD020427; Sun, 3 Apr 2005 21:57:08 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 9116BEE2B1; Sun, 3 Apr 2005 21:57:08 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16976.19092.562006.246545@robur.slu.se> Date: Sun, 3 Apr 2005 21:57:08 +0200 To: Herbert Xu Cc: "David S. 
Miller" , Robert.Olsson@data.slu.se, dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <20050403074337.GA8083@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> <20050403074337.GA8083@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1305 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > We could also move rt_cache_flush into a kernel thread. When the > number of chains is large this function is really expensive for a > softirq handler. It can also be done via /proc and left to administrators to find suitable policy. Kernel just provides the mechanism to rehash. 
--ro From herbert@gondor.apana.org.au Sun Apr 3 14:45:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 14:45:10 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33Lj1Xs018984 for ; Sun, 3 Apr 2005 14:45:02 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DICtO-00073M-00; Mon, 04 Apr 2005 07:44:26 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DICsw-00049E-00; Mon, 04 Apr 2005 07:43:58 +1000 Date: Mon, 4 Apr 2005 07:43:58 +1000 To: Robert Olsson Cc: Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050403214358.GA15901@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <16976.17876.832677.945878@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16976.17876.832677.945878@robur.slu.se> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1306 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 09:36:52PM +0200, Robert Olsson wrote: > > Well I don't see how we detect the need for rehash just be looking > at the hash chains. How does the the "lengthening" look like that > are allowed to trigger a rehash? The only way to attack a hash is by exploiting collisions and create one or more excessively long chains. This can be detected as follows at each rt hash insertion. 
If

    (total number of entries in cache >> (hash length - user-defined shift length)) < current bucket length

is true, then we schedule a rehash/flush.

Hash length is the number of bits in the hash, i.e., 1 << hash length == number of buckets.

I'd suggest a default shift length of 3. That is, if any individual chain is growing beyond 8 times the average chain length then we've got a problem.

Cheers,
--
Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
From herbert@gondor.apana.org.au Sun Apr 3 14:45:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 14:45:51 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33Ljjdx019058 for ; Sun, 3 Apr 2005 14:45:46 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DICuN-00073c-00; Mon, 04 Apr 2005 07:45:27 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DICuH-00049k-00; Mon, 04 Apr 2005 07:45:21 +1000 Date: Mon, 4 Apr 2005 07:45:21 +1000 To: Robert Olsson Cc: "David S.
Miller" , dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050403214521.GB15901@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> <20050403074337.GA8083@gondor.apana.org.au> <16976.19092.562006.246545@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16976.19092.562006.246545@robur.slu.se> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1307 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 09:57:08PM +0200, Robert Olsson wrote: > > Herbert Xu writes: > > > We could also move rt_cache_flush into a kernel thread. When the > > number of chains is large this function is really expensive for a > > softirq handler. > > It can also be done via /proc and left to administrators to find > suitable policy. Kernel just provides the mechanism to rehash. The reason I'm suggesting the move to a kernel thread is because softirq context is not preemptible. So doing a large amount of work in it when your table is big means that a UP machine will freeze for a while. 
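For reference, the chain-length test from the 07:43:58 +1000 message in this thread can be sketched as plain userspace C. This is only an illustration under assumptions: the name rt_chain_too_long() and its parameters are hypothetical and are not taken from net/ipv4/route.c, and the shift of 3 used below is just the default suggested in that message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code.
 *
 * entries   - total number of entries currently in the cache
 * hash_bits - hash length, i.e. the table has 1 << hash_bits buckets
 * shift     - user-defined shift length (3 means "8 times the average")
 * chain_len - length of the chain that was just inserted into
 *
 * Returns true when the chain exceeds (1 << shift) times the average
 * chain length, which is the point where a rehash/flush would be scheduled.
 */
static bool rt_chain_too_long(unsigned int entries, unsigned int hash_bits,
                              unsigned int shift, unsigned int chain_len)
{
	/* Keep the shift amount well defined for a 32-bit unsigned int. */
	assert(hash_bits < 32 && shift <= hash_bits);

	/*
	 * entries >> hash_bits is the average chain length; shifting by
	 * (hash_bits - shift) instead multiplies that average by 1 << shift.
	 */
	return (entries >> (hash_bits - shift)) < chain_len;
}
```

With 1024 buckets (hash length 10), 4096 entries and a shift of 3, the threshold is 4096 >> 7 = 32, so a chain of 33 entries would trigger while a chain of 32 would not. Note that with integer shifts a sparsely filled table gives a threshold of 0, so real code would presumably also want a minimum chain length; that is an observation about this sketch, not something stated in the thread.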
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From a.kasparas@gmc.lt Sun Apr 3 15:02:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 15:02:18 -0700 (PDT) Received: from smtp02.omnitel.sun (smtp02-neptunas.omnitel.net [194.176.45.2]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j33M29ro020932 for ; Sun, 3 Apr 2005 15:02:12 -0700 Received: from smtp04-neptunas.omnitel.net ([194.176.45.42]) by smtp02.omnitel.sun (Sun Java System Messaging Server 6.1 HotFix 0.01 (built Jun 24 2004)) with ESMTP id <0IEE006C357G3Y00@smtp02.omnitel.sun> for netdev@oss.sgi.com; Mon, 04 Apr 2005 01:02:04 +0300 (EEST) Received: from smtp04-neptunas.omnitel.net (localhost [127.0.0.1]) by smtp04-neptunas.omnitel.net (Postfix) with SMTP id C5B95398007; Mon, 04 Apr 2005 01:02:03 +0300 (EEST) Received: from [192.168.0.128] (unknown [62.212.195.62]) by smtp04-neptunas.omnitel.net (Postfix) with ESMTP id 5144A39800D; Mon, 04 Apr 2005 01:02:03 +0300 (EEST) Date: Mon, 04 Apr 2005 01:02:01 +0300 From: Aidas Kasparas Subject: Re: IPSEC: on behavior of acquire In-reply-to: <1112538566.1096.391.camel@jzny.localdomain> To: hadi@cyberus.ca Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org Message-id: <425067D9.9050603@gmc.lt> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: lt, en, ru, fr X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt> <1112538566.1096.391.camel@jzny.localdomain> User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1308 X-ecartis-version: Ecartis 
v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: a.kasparas@gmc.lt Precedence: bulk X-list: netdev jamal wrote: > On Sun, 2005-04-03 at 04:28, Aidas Kasparas wrote: > >>jamal wrote: > > >>>Exactly what i was trying to emulate - lost messages. >> >>Your emulation was not correct. More correct would have been to start KE >>daemon, let it fully initialize (open pfkey socket, inform kernel that >>it is interested in acquire messages), then stop it (via debugger or >>kill -STOP) and only then send pings or other traffic and see what will >>happen. This is because there are different paths in xfrm+pfkey for >>cases 1) when there is no KE daemon and 2) when daemon is, but for some >>reason it does not establish a SA and therefore reaction to traffic is >>different. >> > > > I dont think that would work. > To summarize what happens in the kernel: everything leads to km_query() > as you have indicated in your text. > If the kernel finds someone/thing has either a pfkey or netlink socket > open it sends a acquire to them. In the code you are probably looking at > (before i created the patch) - the first user/daemon the kernel sees > (either pfkey or netlink based) that has a socket open > will receive an acquire and the kernel will give up after that. > > As an example, if the first pfkey user was just doing "setkey -x" and > the second was infact pluto, then pluto will never see the > acquire. This is what got me looking at it to begin with. Look at the > earlier postings on the subject. While I agree that code before your patch would not allow to cooperate tools using different ways to manage SAD/SPD (pfkey vs netlink), I have one setup in production where two instances of racoon runs simultaneously and both gets required pfkey-messages. > So in other words, just killing the ike server as you propose would mean > the kernel has no open sockets and will therefore never bother to send > an acquire. 
I proposed to stop the KE server, not to kill it.

>
> Still all this is moot and is distracting us from the main discussion.
> Lets define "lost" simply as the case where an acquire never got to the
> server (which may be sitting elsewhere on the network).

ACQUIREs _never_ _leave_ _the box_ on which they are generated. It is always kernel-to-userspace_process communication. It could be made reliable. And the present situation IS sufficiently reliable.

> In that case
> what i did is sufficient. i.e. The methods to create this are not the
> issue. The issue at stake is the behavior of the kernel in generating
> the acquires.
> See below.
>
> Please refer to my earlier definition of what "lost" means. It doesnt
> matter where the breakage happens really.
> Think of everything to the right of "xfrm" in your diagram as a black
> box (i.e that second thing could be pfkey or netlink - thats not the
> issue).
> Think of some message that is supposed to reach the KE daemon
> (make it interesting and say it is remote KE) then think of that message
> never making it because something in the blackbox swallowed it.
> If that packet is the first one and it needs to do so for the sake of
> setup for subsequent packets - then the desire to have it reach its
> destination is very important. There is no progress for it or subsequent
> packets if it doesnt make it.

OK, let's talk about the architecture xfrm <-> blackbox. In this architecture, communication between these two elements (I do not speak about any comms inside the blackbox) can be of two types:

1) reliable (messages always reach the blackbox or an error is reported);
2) unreliable (messages may fail even to reach the blackbox).

With good blackboxes, a good ipsec system can be built using either comm type. But:

a) (1) will be more reliable;
b) (1) will be simpler (at least on the xfrm side, as it will not require retransmissions);
c) (1) is implemented now (as a function call).

What I want to say is that the xfrm-to-blackbox interface is good as it is.
The problem may only be in how good the blackbox is. And here we have to look inside the blackbox and start talking about particular implementations of that blackbox. Retransmissions, if they are needed, need to be inside that blackbox.

>
> The solution being proposed for Linux to treat that xfrm piece in the
> same fashion as ARP is correct. Read the email from Alexey. Imagine if
> ARP was only issued once(as does pfkey) or forever(as does netlink).
>

I have read the email from Alexey. I think that the xfrm_lookup() function implements functionality very similar to the functionality which Alexey described. And I think that a direct comparison of ARP messages and pfkey messages is not fair, because pfkey acquire messages go over a reliable channel and are used only to _initiate_ the process of SA negotiation. ARP has to receive information from other boxes, which send it only as a direct response to some packet. Moreover, ARP is designed to be used [amongst others] on networks which lose some traffic by design.

> I believe this is an issue with ipsec architecture itself - someone
> needs to write an IETF draft on it.
>

I still do not see the topic for such a draft.

>
>>>
>>>Note: Sometimes theres no app. Example a packet coming into a gateway.
>>>
>>
>>What do you have in mind?
>>
>>If it is ISAKMP negotiation from remote peer, then it comes over UDP/500
>>or UDP/4500 over IP socket and not via acquire message via pfkey socket.
>>
>>If it is ESP/AH packet with unknown SPI, then kernel simply drops it and
>>do not send any acquire messages.
>>
>
>
> I was thinking more of this second scenario with incoming from clear
> text domain and gateway encrypting assuming proper policy setup.

If you're talking about a network behind a security gateway communicating with a host or network for which there is a security policy configured on the gateway, then the acquire message will be generated on that security gateway, when that packet is considered for forwarding.
Again, that acquire messages never will leave security gateway. > I would have to go and reread the "opportunistic" encryption draft > closely to make sense. > Speaking of "opportunistic" encryption. I never understood it. Ipsec-tools do not implement it. And in the year or so when I'm involved with it, I don't remember anybody even asking or mentioning about this feature. Therefore, I don't care about it -- users do not need it. -- Aidas Kasparas IT administrator GM Consult Group, UAB From dmitry_yus@yahoo.com Sun Apr 3 17:56:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 17:56:32 -0700 (PDT) Received: from smtp111.mail.sc5.yahoo.com (smtp111.mail.sc5.yahoo.com [66.163.170.9]) by oss.sgi.com (8.13.0/8.13.0) with SMTP id j340uQ4U030035 for ; Sun, 3 Apr 2005 17:56:26 -0700 Received: from unknown (HELO ?172.10.7.7?) (dmitry?yus@24.7.114.77 with plain) by smtp111.mail.sc5.yahoo.com with SMTP; 4 Apr 2005 00:56:25 -0000 Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) From: Dmitry Yusupov To: "open-iscsi@googlegroups.com" Cc: "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> Content-Type: text/plain Date: Sun, 03 Apr 2005 17:56:11 -0700 Message-Id: <1112576171.4227.5.camel@mylaptop> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1309 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dmitry_yus@yahoo.com Precedence: bulk X-list: netdev On Sat, 2005-04-02 at 11:07 -0800, Asgeir Eiriksson wrote: > Dmitry > The CPU cycles is only at most half of the story with the other half > being the memory sub-system BW. > > So the validity of your observation depends on the BW we're talking > about, i.e. if the client is using a fraction of 10Gbps for RDMA (or > DDP, e.g. iSCSI DDP), yes then that fraction amounts to a fraction of > the memory sub-system total BW so we don't much care about the extra > copy. > > The situation is different if the client wants something close to 10Gbps > (already have such client applications), because today 10Gbps is still a > big chunk of the overall memory BW so you really care about eliminating > that copy via DDP. I do not get your concern with memory BW. With good AMD box V40Z(SUN) you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80% left. PCI-X BUS BW is bigger concern... > 'Asgeir > > > -----Original Message----- > > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On > > Behalf Of Dmitry Yusupov > > Sent: Saturday, April 02, 2005 10:09 AM > > To: open-iscsi@googlegroups.com > > Cc: David S. 
Miller; mpm@selenic.com; andrea@suse.de; > > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; > ksummit-2005- > > discuss@thunk.org; netdev@oss.sgi.com > > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit > > ProposedTopics > > > > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote: > > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote: > > > > If you have plans to start new project such as SoftRDMA than yes. > lets > > > > discuss it since set of problems will be similar to what we've got > > with > > > > software iSCSI Initiators. > > > > > > I'm somewhat interested in seeing a SoftRDMA project get off the > ground. > > > At least the NatSemi 83820 gige MAC is able to provide early-rx > > interrupts > > > that allow one to get an rx interrupt before the full payload has > > arrived > > > making it possible to write out a new rx descriptor to place the > payload > > > wherever it is ultimately desired. It would be fun to work on if > not > > the > > > most performant RDMA implementation. > > > > I see a lot of skepticism around early-rx interrupt schema. It might > > work for gige, but i'm not sure if it will fit into 10g. > > > > What RDMA gives us is zero-copy on receive and new networking api > which > > has a potential to be HW accelerated. SoftRDMA will never avoid > copying > > on receive. But benefit for SoftRDMA would be its availability on > client > > sides. It is free and it could be easily deployed. Soon Intel & Co > will > > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if > > one of those cores will do receive side copying? 
> > > > From herbert@gondor.apana.org.au Sun Apr 3 18:00:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 18:00:27 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3410Dar030521 for ; Sun, 3 Apr 2005 18:00:13 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIFw6-0007tf-00; Mon, 04 Apr 2005 10:59:26 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIFun-0004NB-00; Mon, 04 Apr 2005 10:58:05 +1000 Date: Mon, 4 Apr 2005 10:58:05 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404005805.GA16543@gondor.apana.org.au> References: <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112538718.1096.394.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1310 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Hi Jamal: On Sun, Apr 03, 2005 at 10:31:58AM -0400, jamal wrote: > > Small change after some testing. 
> Herbert havent heard back from you - this looks very palatable in my > opinion with comments below still in effect. It's definitely looking better all the time. > -void xfrm_state_delete(struct xfrm_state *x) > +static DEFINE_RWLOCK(xfrm_km_lock); > +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > + > +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) > { > + struct xfrm_mgr *km; > + > + read_lock(&xfrm_km_lock); > + list_for_each_entry(km, &xfrm_km_list, list) > + if (km->notify_policy) > + km->notify_policy(xp, dir, c); > + read_unlock(&xfrm_km_lock); > +} > + > +void km_state_notify(struct xfrm_state *x, struct km_event *c) > +{ > + struct xfrm_mgr *km; > + read_lock(&xfrm_km_lock); > + list_for_each_entry(km, &xfrm_km_list, list) > + km->notify(x, c); > + read_unlock(&xfrm_km_lock); > +} > + > +EXPORT_SYMBOL(km_policy_notify); > +EXPORT_SYMBOL(km_state_notify); Can we perhaps move these lines next to the other km functions further down? They look rather lonely here. > + /* XXX: Do we wanna do this right at the top?? > + * if the state is dead we dont want to announce > + * the expire - a delete may already have announced > + * it > + */ Please code this check differently so that it isn't racy. One way to do it is to change xfrm_timer_handler to do: if (__xfrm_state_delete(x) && x->id.spi) km_state_expired(x, 1); > + /* XXX: Do we still wanna wakeup km_waitq? > + * if the policy is dead we dont want to announce > + * the expire - a delete may already have announced > + * it > + */ Ditto. > --- a/net/xfrm/xfrm_policy.c 2005-03-25 22:28:21.000000000 -0500 > +++ b/net/xfrm/xfrm_policy.c 2005-04-02 12:16:30.000000000 -0500 > @@ -298,7 +298,7 @@ > * entry dead. The rule must be unlinked from lists to the moment. > */ > > -static void xfrm_policy_kill(struct xfrm_policy *policy) > +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir) What's this for? 
> + c.seq = nlh->nlmsg_seq; > + c.pid = nlh->nlmsg_pid; > + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) > + c.event = XFRM_SAP_ADDED; > + else > + c.event = XFRM_SAP_UPDATED; > + > + km_state_notify(x, &c); You need to hold onto x here. So do a hold before you call xfrm_state_* and then drop the reference after km_state_notify. > static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) > - xfrm_state_delete(x); > + err = xfrm_state_delete(x); > + if (err < 0) { > + x->km.state = XFRM_STATE_DEAD; > + xfrm_state_put(x); > + return err; If the xfrm_state_delete fails then it's already dead. So kill the line that modifies its state. > +static int xfrm_notify_sa( struct xfrm_state *x, struct km_event *c) Extra space after the paren. > + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); Please add the additional payloads for NAT-T and the keys. > +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) > +{ > + struct xfrm_userpolicy_info *p; > + struct nlmsghdr *nlh; > + struct sk_buff *skb; > + u32 nlt = 0 ; > + unsigned char *b; > + int len = NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); Please attach the templates. > @@ -1256,7 +1328,7 @@ > > if (hdr->sadb_msg_type == SADB_ADD) > err = xfrm_state_add(x); > - else > + else A better editor that doesn't leave trailing spaces is needed here :) > - xfrm_state_delete(x); > - xfrm_state_put(x); > + err = xfrm_state_delete(x); > + if (err < 0) { > + x->km.state = XFRM_STATE_DEAD; Please remove this line as it's already dead if the delete fails. > +static int key_notify_sa_flush(struct km_event *c) > +{ > + struct sk_buff *skb; > + struct sadb_msg *hdr; > + > + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); > + if (!skb) > + return -ENOBUFS; > + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); > + // XXX:do we have to pass proto as well? I think so. A flush of all IPCOMP states is certainly quite different from a flush of all states. 
It's just a matter of calling satype2proto. > + /* > + * XXX: previous get was doing a broadcast-all _always_ > + * which didnt seem right for non-deletion case - JHS > + * This is like the way netlink behaves .. > + * Shall i restore original behavior? > + */ You're right. The original behaviour was broken. > - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); > - > - out_hdr = (struct sadb_msg *) out_skb->data; > - out_hdr->sadb_msg_version = hdr->sadb_msg_version; > - out_hdr->sadb_msg_type = hdr->sadb_msg_type; > - out_hdr->sadb_msg_satype = 0; > - out_hdr->sadb_msg_errno = 0; > - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; > - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; > - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); > - err = 0; However, you do need to keep this code for the real GET case. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Apr 3 18:01:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 18:01:41 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3411ZoF030968 for ; Sun, 3 Apr 2005 18:01:36 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIFxw-0007uK-00; Mon, 04 Apr 2005 11:01:20 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIFxq-0004Nw-00; Mon, 04 Apr 2005 11:01:14 +1000 Date: Mon, 4 Apr 2005 11:01:14 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404010114.GA16839@gondor.apana.org.au> References: <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112469601.1088.173.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1311 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 02:20:01PM -0500, jamal wrote: > > 1) Weve discussed this before Herbert and i think you misspoke that > pfkey delivers to all listerners. > > pfkey Add/del/upd now really do tell all processes about what happened. > Before pfkey would skip the originating process. So far this doesnt seem > to be an issue in the basic testing. Are you sure? Previously they did BROADCAST_ALL which goes to everyone including the sender. > 2) I ended adding a policy_notify to the pfkey manager to make the code > generic. Interesting thing is i dont think pfkey knows what to do with > policy expiration or i am misreading the code. That's right, pfkey never had policy expire messages. In general, anything to do with policies cannot be done portably in pfkey since the RFC only specified the SA operations. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Apr 3 18:21:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 18:21:38 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j341LRI0032620 for ; Sun, 3 Apr 2005 18:21:27 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIGH5-00081d-00; Mon, 04 Apr 2005 11:21:07 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIGGe-0004RN-00; Mon, 04 Apr 2005 11:20:40 +1000 Date: Mon, 4 Apr 2005 11:20:40 +1000 To: Patrick McHardy Cc: "David S. Miller" , netdev Subject: Re: [IPSEC]: Protect against BHs in xfrm_user_policy() Message-ID: <20050404012040.GA16960@gondor.apana.org.au> References: <4250160D.2040405@trash.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4250160D.2040405@trash.net> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1312 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 06:13:01PM +0200, Patrick McHardy wrote: > > # This is a BitKeeper generated diff -Nru style patch. > # > # ChangeSet > # 2005/04/03 17:36:10+02:00 kaber@coreworks.de > # [IPSEC]: Protect against BHs in xfrm_user_policy() > # > # Signed-off-by: Patrick McHardy Looks good. Signed-off-by: Herbert Xu We want the same thing for km_query, no? 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Sun Apr 3 18:56:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 18:56:18 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j341uBfD001738 for ; Sun, 3 Apr 2005 18:56:12 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DIGp4-0003Fv-6P for netdev@oss.sgi.com; Sun, 03 Apr 2005 21:56:14 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIGow-00044s-OZ; Sun, 03 Apr 2005 21:56:07 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050404005805.GA16543@gondor.apana.org.au> References: <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112579761.1096.412.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 21:56:01 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1313 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Herbert, Now that you are picking on whitespaces i think we are almost there ;-> Comments below On Sun, 2005-04-03 at 20:58, Herbert Xu wrote: > On Sun, Apr 03, 2005 at 10:31:58AM -0400, jamal wrote: [.. ..] > > + > > +EXPORT_SYMBOL(km_policy_notify); > > +EXPORT_SYMBOL(km_state_notify); > > Can we perhaps move these lines next to the other km functions > further down? They look rather lonely here. > Sure. > > + /* XXX: Do we wanna do this right at the top?? > > + * if the state is dead we dont want to announce > > + * the expire - a delete may already have announced > > + * it > > + */ > > Please code this check differently so that it isn't racy. > > One way to do it is to change xfrm_timer_handler to do: > > if (__xfrm_state_delete(x) && x->id.spi) > km_state_expired(x, 1); > > > + /* XXX: Do we still wanna wakeup km_waitq? > > + * if the policy is dead we dont want to announce > > + * the expire - a delete may already have announced > > + * it > > + */ > > Ditto. > I think i am gonna take out any attempts to address this race above. It's a bug thats there already - a separate patch after this will be better. > > --- a/net/xfrm/xfrm_policy.c 2005-03-25 22:28:21.000000000 -0500 > > +++ b/net/xfrm/xfrm_policy.c 2005-04-02 12:16:30.000000000 -0500 > > @@ -298,7 +298,7 @@ > > * entry dead. The rule must be unlinked from lists to the moment. > > */ > > > > -static void xfrm_policy_kill(struct xfrm_policy *policy) > > +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir) > > What's this for? > Good catch - gunk from previous patch. > > + c.seq = nlh->nlmsg_seq; > > + c.pid = nlh->nlmsg_pid; > > + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) > > + c.event = XFRM_SAP_ADDED; > > + else > > + c.event = XFRM_SAP_UPDATED; > > + > > + km_state_notify(x, &c); > > You need to hold onto x here. So do a hold before you call xfrm_state_* > and then drop the reference after km_state_notify. Good point. 
> > > static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) > > > - xfrm_state_delete(x); > > + err = xfrm_state_delete(x); > > + if (err < 0) { > > + x->km.state = XFRM_STATE_DEAD; > > + xfrm_state_put(x); > > + return err; > > If the xfrm_state_delete fails then it's already dead. So kill > the line that modifies its state. > Good point. > > +static int xfrm_notify_sa( struct xfrm_state *x, struct km_event *c) > > Extra space after the paren. > > > + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); > > Please add the additional payloads for NAT-T and the keys. > I dont think we should broadcast out keys. NAT-T - where do i look at to see what to send? > > +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) > > +{ > > + struct xfrm_userpolicy_info *p; > > + struct nlmsghdr *nlh; > > + struct sk_buff *skb; > > + u32 nlt = 0 ; > > + unsigned char *b; > > + int len = NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); > > Please attach the templates. > What is not being attached right now? > > @@ -1256,7 +1328,7 @@ > > > > if (hdr->sadb_msg_type == SADB_ADD) > > err = xfrm_state_add(x); > > - else > > + else > > A better editor that doesn't leave trailing spaces is needed here :) you insulting vi? ;-> > > > - xfrm_state_delete(x); > > - xfrm_state_put(x); > > + err = xfrm_state_delete(x); > > + if (err < 0) { > > + x->km.state = XFRM_STATE_DEAD; > > Please remove this line as it's already dead if the delete fails. > > > +static int key_notify_sa_flush(struct km_event *c) > > +{ > > + struct sk_buff *skb; > > + struct sadb_msg *hdr; > > + > > + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); > > + if (!skb) > > + return -ENOBUFS; > > + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); > > + // XXX:do we have to pass proto as well? > > I think so. A flush of all IPCOMP states is certainly quite different > from a flush of all states. It's just a matter of calling satype2proto. 
> Looks doable. > > - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); > > - > > - out_hdr = (struct sadb_msg *) out_skb->data; > > - out_hdr->sadb_msg_version = hdr->sadb_msg_version; > > - out_hdr->sadb_msg_type = hdr->sadb_msg_type; > > - out_hdr->sadb_msg_satype = 0; > > - out_hdr->sadb_msg_errno = 0; > > - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; > > - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; > > - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); > > - err = 0; > > However, you do need to keep this code for the real GET case. > Get seems to a separate entry point - pfkey_get() which i didnt touch. cheers, jamal From hadi@cyberus.ca Sun Apr 3 18:58:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 18:58:44 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j341wdtn002139 for ; Sun, 3 Apr 2005 18:58:39 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DIGrO-0001ng-NF for netdev@oss.sgi.com; Sun, 03 Apr 2005 21:58:38 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIGrN-0004J2-0l; Sun, 03 Apr 2005 21:58:37 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <20050404010114.GA16839@gondor.apana.org.au> References: <20050401042106.GA27762@gondor.apana.org.au> <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <20050404010114.GA16839@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112579911.1088.416.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 21:58:31 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1314 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 21:01, Herbert Xu wrote: > On Sat, Apr 02, 2005 at 02:20:01PM -0500, jamal wrote: > > > > 1) Weve discussed this before Herbert and i think you misspoke that > > pfkey delivers to all listerners. > > > > pfkey Add/del/upd now really do tell all processes about what happened. > > Before pfkey would skip the originating process. So far this doesnt seem > > to be an issue in the basic testing. > > Are you sure? Previously they did BROADCAST_ALL which goes to everyone > including the sender. > Yes, he key is in the sk parameter to the broadcast. if a NULL is passed then all listeners are told. Else the passed sk is excluded. > > 2) I ended adding a policy_notify to the pfkey manager to make the code > > generic. Interesting thing is i dont think pfkey knows what to do with > > policy expiration or i am misreading the code. 
> > That's right, pfkey never had policy expire messages. In general, > anything to do with policies cannot be done portably in pfkey since > the RFC only specified the SA operations. > Well, hopefully whoever defined that pfkey carries policies as well will have to worry about this in the future. I will just leave teh hook but remove the printk. cheers, jamal From herbert@gondor.apana.org.au Sun Apr 3 19:27:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:28:02 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342RqqX003988 for ; Sun, 3 Apr 2005 19:27:53 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIHJ6-0008Gg-00; Mon, 04 Apr 2005 12:27:16 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIHHt-0004WT-00; Mon, 04 Apr 2005 12:26:01 +1000 Date: Mon, 4 Apr 2005 12:26:01 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404022601.GA17293@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112579761.1096.412.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112579761.1096.412.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1315 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 09:56:01PM -0400, jamal wrote: > Now that you are picking on whitespaces i think we are almost there ;-> Yes I think we're getting really close now :) > I think i am gonna take out any attempts to address this race above. > It's a bug thats there already - a separate patch after this will be > better. OK. > > > +static int xfrm_notify_sa( struct xfrm_state *x, struct km_event *c) > > > > Extra space after the paren. > > > > > + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); > > > > Please add the additional payloads for NAT-T and the keys. > > I dont think we should broadcast out keys. I think that decision should be made by the KM. So you wouldn't do it for PFKEY, but netlink should definitely do it. For netlink we require root privileges to listen for these events. > NAT-T - where do i look at to see what to send? Check out dump_one_state. 
> What is not being attached right now? copy_to_user_tmpl > you insulting vi? ;-> Yes unless you're using elvis :) > > > - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); > > > - > > > - out_hdr = (struct sadb_msg *) out_skb->data; > > > - out_hdr->sadb_msg_version = hdr->sadb_msg_version; > > > - out_hdr->sadb_msg_type = hdr->sadb_msg_type; > > > - out_hdr->sadb_msg_satype = 0; > > > - out_hdr->sadb_msg_errno = 0; > > > - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; > > > - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; > > > - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); > > > - err = 0; > > > > However, you do need to keep this code for the real GET case. > > Get seems to a separate entry point - pfkey_get() which i didnt touch. pfkey_get() only does states. The code above is in pfkey_spdget(). Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Apr 3 19:28:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:28:09 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342S3RH004003 for ; Sun, 3 Apr 2005 19:28:04 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIHJR-0008H9-00; Mon, 04 Apr 2005 12:27:37 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIHJG-0004Wu-00; Mon, 04 Apr 2005 12:27:26 +1000 Date: Mon, 4 Apr 2005 12:27:26 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404022726.GB17293@gondor.apana.org.au> References: <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <20050404010114.GA16839@gondor.apana.org.au> <1112579911.1088.416.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112579911.1088.416.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1316 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 09:58:31PM -0400, jamal wrote: > > > Are you sure? Previously they did BROADCAST_ALL which goes to everyone > > including the sender. > > Yes, he key is in the sk parameter to the broadcast. if a NULL is passed > then all listeners are told. Else the passed sk is excluded. Actually pfkey_broadcast is playing tricks on you :) It always does one_sk at the end of the function. 
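A rough user-space model of the delivery order described above may make this easier to see. It is only a sketch: struct listener and model_broadcast_all are made-up names, and the real function in net/key/af_key.c works on sockets and skbs rather than counters. The point it illustrates is that one_sk is skipped inside the loop but is still delivered to after it, so passing the sender as one_sk with BROADCAST_ALL does not exclude the sender.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PF_KEY listener; not the kernel's struct sock. */
struct listener {
    int id;
    int received;   /* number of copies delivered so far */
};

/*
 * Simplified model of a BROADCAST_ALL delivery: every listener other than
 * one_sk is handled in the loop, and one_sk (if non-NULL) is handled last.
 * The net effect is that one_sk still gets exactly one copy.
 */
void model_broadcast_all(struct listener *all, size_t n,
                         struct listener *one_sk)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (&all[i] == one_sk)
            continue;               /* dealt with after the loop */
        all[i].received++;
    }

    if (one_sk != NULL)
        one_sk->received++;         /* the "one_sk at the end" step */
}
```

Calling model_broadcast_all(socks, 3, &socks[1]) on three fresh listeners leaves every received counter at 1, including the one for socks[1].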
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Sun Apr 3 19:34:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:34:19 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342YEsN005276 for ; Sun, 3 Apr 2005 19:34:14 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DIHPj-0003aN-7X for netdev@oss.sgi.com; Sun, 03 Apr 2005 20:34:07 -0600 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIHPl-0008SS-Kr; Sun, 03 Apr 2005 22:34:09 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050404005805.GA16543@gondor.apana.org.au> References: <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112582044.1087.421.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 22:34:04 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1317 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 20:58, Herbert Xu wrote: > ; > > + // XXX:do we have to pass proto as well? > > I think so. A flush of all IPCOMP states is certainly quite different > from a flush of all states. It's just a matter of calling satype2proto. I think you meant pfkey_proto2satype(). i.e. hdr->sadb_msg_satype = pfkey_proto2satype(c->data); BTW, slightly different from the way netlink does business. cheers, jamal From hadi@cyberus.ca Sun Apr 3 19:40:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:40:09 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342e42X006001 for ; Sun, 3 Apr 2005 19:40:04 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIHVT-0005hE-IV for netdev@oss.sgi.com; Sun, 03 Apr 2005 22:40:03 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIHVS-0000ZI-9x; Sun, 03 Apr 2005 22:40:02 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <20050404022601.GA17293@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112579761.1096.412.camel@jzny.localdomain> <20050404022601.GA17293@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112582396.1096.427.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 22:39:56 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1318 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 22:26, Herbert Xu wrote: > On Sun, Apr 03, 2005 at 09:56:01PM -0400, jamal wrote: > > I dont think we should broadcast out keys. > > I think that decision should be made by the KM. So you wouldn't do it > for PFKEY, but netlink should definitely do it. > Is it possible to have non-root privileged pfkey sockets. If yes, then it makes sense. > Yes unless you're using elvis :) Elvis left the building a while back ;-> But sightings of him in some doughnought shops in a small town not far from here are rampant ;-> > The code above is in pfkey_spdget(). > How did i miss that? 
;-> cheers, jamal From herbert@gondor.apana.org.au Sun Apr 3 19:46:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:46:53 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342ki4S006756 for ; Sun, 3 Apr 2005 19:46:45 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIHba-0008Nh-00; Mon, 04 Apr 2005 12:46:22 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIHbE-0004Yk-00; Mon, 04 Apr 2005 12:46:00 +1000 Date: Mon, 4 Apr 2005 12:46:00 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404024600.GA17507@gondor.apana.org.au> References: <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112579761.1096.412.camel@jzny.localdomain> <20050404022601.GA17293@gondor.apana.org.au> <1112582396.1096.427.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112582396.1096.427.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1319 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 10:39:56PM -0400, jamal wrote: > On Sun, 2005-04-03 at 22:26, Herbert Xu wrote: > > > I think that decision should be made by the KM. 
So you wouldn't do it > > for PFKEY, but netlink should definitely do it. > > Is it possible to have non-root privileged pfkey sockets. If yes, > then it makes sense. Currently Linux requires CAP_NET_ADMIN for PFKEY. However, this may not be the case on other systems. That's the reason why the RFC requires that the keys not be sent via PFKEY. However for netlink there is no such issue. Even if we do eventually open up netlink for non-root listeners (this will actually require structural changes to netlink itself), we can create a new multicast group for non-privileged users that don't get the keys. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Apr 3 19:53:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 19:54:08 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j342rv2D007479 for ; Sun, 3 Apr 2005 19:53:58 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIHiG-0008R0-00; Mon, 04 Apr 2005 12:53:16 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIHhK-0004Zi-00; Mon, 04 Apr 2005 12:52:18 +1000 Date: Mon, 4 Apr 2005 12:52:18 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404025218.GA17571@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112582044.1087.421.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112582044.1087.421.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1320 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 10:34:04PM -0400, jamal wrote: > > I think you meant pfkey_proto2satype(). i.e. > hdr->sadb_msg_satype = pfkey_proto2satype(c->data); Yes I was being dyslexic :) > BTW, slightly different from the way netlink does business. You mean how netlink just passes the proto through verbatim? Yes netlink uses the natural representation wherever possible. 
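To make the contrast concrete, here is a small sketch of the kind of translation PF_KEY needs and netlink avoids. It is not the kernel's pfkey_proto2satype() itself: sketch_proto2satype and the enum names are hypothetical, and the constants are repeated locally (IP protocol numbers 50/51/108, the RFC 2367 SA types 2 and 3, and the Linux extension value 9 for IPCOMP) so that the fragment compiles on its own. Unknown protocols are assumed to map to the unspecified SA type.

```c
#include <stdint.h>

/* IP protocol numbers, as in <netinet/in.h> (copied so this stands alone). */
enum { SK_PROTO_ESP = 50, SK_PROTO_AH = 51, SK_PROTO_COMP = 108 };

/* PF_KEY SA types, as in <linux/pfkeyv2.h> (copied so this stands alone). */
enum { SK_SATYPE_UNSPEC = 0, SK_SATYPE_AH = 2, SK_SATYPE_ESP = 3,
       SK_SATYPE_IPCOMP = 9 };

/*
 * PF_KEY side: the IP protocol number has to be translated into an SA type
 * before it goes into sadb_msg_satype. Anything unrecognised becomes
 * "unspecified" (an assumption of this sketch).
 */
uint8_t sketch_proto2satype(uint8_t proto)
{
    switch (proto) {
    case SK_PROTO_AH:
        return SK_SATYPE_AH;
    case SK_PROTO_ESP:
        return SK_SATYPE_ESP;
    case SK_PROTO_COMP:
        return SK_SATYPE_IPCOMP;
    default:
        return SK_SATYPE_UNSPEC;
    }
}

/* Netlink side: the protocol number is carried through unchanged. */
uint8_t sketch_netlink_proto(uint8_t proto)
{
    return proto;
}
```

So a flush of IPCOMP states would be announced with SA type 9 over PF_KEY in this model, while a netlink listener would simply see protocol 108.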
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Sun Apr 3 20:05:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 20:05:34 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3435S3d008376 for ; Sun, 3 Apr 2005 20:05:28 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIHu4-0005ko-43 for netdev@oss.sgi.com; Sun, 03 Apr 2005 23:05:28 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIHtz-0003Ka-4w; Sun, 03 Apr 2005 23:05:23 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050404024600.GA17507@gondor.apana.org.au> References: <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112579761.1096.412.camel@jzny.localdomain> <20050404022601.GA17293@gondor.apana.org.au> <1112582396.1096.427.camel@jzny.localdomain> <20050404024600.GA17507@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112583912.1096.429.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 23:05:12 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1321 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 22:46, Herbert Xu wrote: > Even if we do eventually open up netlink for non-root listeners > (this will actually require structural changes to netlink itself), > we can create a new multicast group for non-privileged users that > don't get the keys. > Ok, I will add the keys in the case of the netlink announce. cheers, jamal From hadi@cyberus.ca Sun Apr 3 20:07:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 20:07:37 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3437W7f008613 for ; Sun, 3 Apr 2005 20:07:33 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIHw4-0006g6-7v for netdev@oss.sgi.com; Sun, 03 Apr 2005 23:07:32 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIHw3-0003XB-1G; Sun, 03 Apr 2005 23:07:31 -0400 Subject: Re: take 2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <20050404025218.GA17571@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112582044.1087.421.camel@jzny.localdomain> <20050404025218.GA17571@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112584046.1088.432.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 03 Apr 2005 23:07:26 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1322 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 22:52, Herbert Xu wrote: > You mean how netlink just passes the proto through verbatim? > Yes netlink uses the natural representation wherever possible. > The one thing that needs discussing at some point is how to break down the policy and state structural entities into TLVs in netlink. 
Not now, future topic ;-> cheers, jamal From rddunlap@osdl.org Sun Apr 3 20:31:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 20:31:22 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j343VCoE013586 for ; Sun, 3 Apr 2005 20:31:12 -0700 Received: from [192.168.1.103] (wbar2.sea1-4-5-049-023.sea1.dsl-verizon.net [4.5.49.23]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j343UIs3017661 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Sun, 3 Apr 2005 20:30:19 -0700 Message-ID: <4250B4C5.2000200@osdl.org> Date: Sun, 03 Apr 2005 20:30:13 -0700 From: "Randy.Dunlap" User-Agent: Mozilla Thunderbird 0.9 (X11/20041103) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Sam Ravnborg CC: ioe-lkml@axxeo.de, matthew@wil.cx, lkml , netdev@oss.sgi.com, hadi@cyberus.ca, cfriesen@nortel.com, tgraf@suug.ch Subject: [PATCH] network configs: disconnect network options from drivers References: <20050330234709.1868eee5.randy.dunlap@verizon.net> <20050331185226.GA8146@mars.ravnborg.org> <424C5745.7020501@osdl.org> <20050331203010.GA8034@mars.ravnborg.org> In-Reply-To: <20050331203010.GA8034@mars.ravnborg.org> Content-Type: multipart/mixed; boundary="------------000906010009010506010105" X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1323 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------000906010009010506010105 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sam Ravnborg wrote: > On Thu, Mar 31, 2005 at 12:02:13PM -0800, Randy.Dunlap wrote: > >>Other than "sounds good," are there some comments on: >> >>a. 
leaving IrDA and Bluetooth subsystem (with drivers) where they >> are, which is under "Network options and protocols" >> (I really don't want to split their drivers away from their >> subsystem, just to put them under Network driver support.) > > > Agreed. All IrDA / Bluetooth stuff belongs together. > Leave them where they are for now. > > >>b. leaving SLIP, PPP, and PLIP where they are under Network driver >> support, even though they say that they are "protocols" ? > > SLIP and PLIP are not that common. PPP is more common for cable-modem/ADSL > I suppose. But still it would make sense to create a Misc protocols > menu, like we have a misc filesystems menu. While looking into this suggestion, I see that SLIP, PLIP, and PPP depend on NETDEVICES and use some netdev interfaces, so they appear to be more like net devices than protocols, even though the Kconfig text calls them protocols. I am leaving them alone for now. Don't hesitate to correct me.... Any comments on this new version? Thanks, -- ~Randy --------------000906010009010506010105 Content-Type: text/x-patch; name="netconfigs_v4.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="netconfigs_v4.diff" A few people dislike that the Networking Options menu is inside the Device Drivers/Networking menu. This patch moves the Networking Options menu to immediately before the Device Drivers menu, renames it to "Networking options and protocols", & moves most protocols to more logical places. Notes: - IrDA & Bluetooth subsystems include protocols & drivers, yet they are displayed under Networking protocols. I don't see much good reason to split them up. (See, this is an example of why the Networking Options and Network Drivers were close together....) - SLIP, PLIP, and PPP option names say that they are protocols, but they are sort of a hybrid device and protocol, and they use network device interfaces, so they remain listed under Network devices.
drivers/Kconfig | 4 drivers/net/Kconfig | 5 net/Kconfig | 450 ++++++++++++++++++++++--------------------- net/bridge/netfilter/Kconfig | 1 4 files changed, 241 insertions(+), 219 deletions(-) Signed-off-by: Randy Dunlap diff -Naurp -X /home/rddunlap/doc/dontdiff-osdl linux-2612-rc1-bk5-pv/drivers/Kconfig linux-2612-rc1-bk5-netconfigs/drivers/Kconfig --- linux-2612-rc1-bk5-pv/drivers/Kconfig 2005-03-01 23:38:26.000000000 -0800 +++ linux-2612-rc1-bk5-netconfigs/drivers/Kconfig 2005-04-03 19:45:18.330102257 -0700 @@ -1,5 +1,7 @@ # drivers/Kconfig +source "net/Kconfig" + menu "Device Drivers" source "drivers/base/Kconfig" @@ -28,7 +30,7 @@ source "drivers/message/i2o/Kconfig" source "drivers/macintosh/Kconfig" -source "net/Kconfig" +source "drivers/net/Kconfig" source "drivers/isdn/Kconfig" diff -Naurp -X /home/rddunlap/doc/dontdiff-osdl linux-2612-rc1-bk5-pv/drivers/net/Kconfig linux-2612-rc1-bk5-netconfigs/drivers/net/Kconfig --- linux-2612-rc1-bk5-pv/drivers/net/Kconfig 2005-04-03 19:42:32.000000000 -0700 +++ linux-2612-rc1-bk5-netconfigs/drivers/net/Kconfig 2005-04-03 19:45:18.335101815 -0700 @@ -1,8 +1,9 @@ - # # Network device configuration # +menu "Network device support" + config NETDEVICES depends on NET bool "Network device support" @@ -2536,3 +2537,5 @@ config NETCONSOLE If you want to log kernel messages over the network, enable this. See for details. 
+endmenu + diff -Naurp -X /home/rddunlap/doc/dontdiff-osdl linux-2612-rc1-bk5-pv/net/bridge/netfilter/Kconfig linux-2612-rc1-bk5-netconfigs/net/bridge/netfilter/Kconfig --- linux-2612-rc1-bk5-pv/net/bridge/netfilter/Kconfig 2005-03-01 23:37:50.000000000 -0800 +++ linux-2612-rc1-bk5-netconfigs/net/bridge/netfilter/Kconfig 2005-04-03 19:45:18.000000000 -0700 @@ -139,6 +139,7 @@ config BRIDGE_EBT_VLAN config BRIDGE_EBT_ARPREPLY tristate "ebt: arp reply target support" depends on BRIDGE_NF_EBTABLES + depends on INET help This option adds the arp reply target, which allows automatically sending arp replies to arp requests. diff -Naurp -X /home/rddunlap/doc/dontdiff-osdl linux-2612-rc1-bk5-pv/net/Kconfig linux-2612-rc1-bk5-netconfigs/net/Kconfig --- linux-2612-rc1-bk5-pv/net/Kconfig 2005-04-03 19:42:35.000000000 -0700 +++ linux-2612-rc1-bk5-netconfigs/net/Kconfig 2005-04-03 19:45:18.000000000 -0700 @@ -2,7 +2,7 @@ # Network configuration # -menu "Networking support" +menu "Networking options and protocols" config NET bool "Networking support" @@ -10,7 +10,9 @@ config NET Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even when running on a stand-alone machine that isn't connected to any - other computer. If you are upgrading from an older kernel, you + other computer. + + If you are upgrading from an older kernel, you should consider updating your networking tools too because changes in the kernel and the tools often go hand in hand. The tools are contained in the package net-tools, the location and version number @@ -20,11 +22,9 @@ config NET recommended to read the NET-HOWTO, available from . 
-menu "Networking options" - depends on NET - config PACKET tristate "Packet socket" + depends on NET ---help--- The Packet protocol is used by applications which communicate directly with network devices without an intermediate network @@ -47,6 +47,7 @@ config PACKET_MMAP config UNIX tristate "Unix domain sockets" + depends on NET ---help--- If you say Y here, you will include support for Unix domain sockets; sockets are the standard Unix mechanism for establishing and @@ -64,6 +65,7 @@ config UNIX config NET_KEY tristate "PF_KEY sockets" + depends on NET select XFRM ---help--- PF_KEYv2 socket family, compatible to KAME ones. @@ -72,8 +74,127 @@ config NET_KEY Say Y unless you know what you are doing. +config NETPOLL + depends on NET + def_bool NETCONSOLE + +config NETPOLL_RX + bool "Netpoll support for trapping incoming packets" + default n + depends on NETPOLL + +config NETPOLL_TRAP + bool "Netpoll traffic trapping" + default n + depends on NETPOLL + +config NET_POLL_CONTROLLER + def_bool NETPOLL + depends on NET + +config BRIDGE + tristate "802.1d Ethernet Bridging" + depends on NET + ---help--- + If you say Y here, then your Linux box will be able to act as an + Ethernet bridge, which means that the different Ethernet segments it + is connected to will appear as one Ethernet to the participants. + Several such bridges can work together to create even larger + networks of Ethernets using the IEEE 802.1 spanning tree algorithm. + As this is a standard, Linux bridges will cooperate properly with + other third party bridge products. + + In order to use the Ethernet bridge, you'll need the bridge + configuration tools; see + for location. Please read the Bridge mini-HOWTO for more + information. + + If you enable iptables support along with the bridge support then you + turn your bridge into a bridging IP firewall. + iptables will then see the IP packets being bridged, so you need to + take this into account when setting up your firewall rules. 
+ Enabling arptables support when bridging will let arptables see + bridged ARP traffic in the arptables FORWARD chain. + + To compile this code as a module, choose M here: the module + will be called bridge. + + If unsure, say N. + +config VLAN_8021Q + tristate "802.1Q VLAN Support" + depends on NET + ---help--- + Select this and you will be able to create 802.1Q VLAN interfaces + on your ethernet interfaces. 802.1Q VLAN supports almost + everything a regular ethernet interface does, including + firewalling, bridging, and of course IP traffic. You will need + the 'vconfig' tool from the VLAN project in order to effectively + use VLANs. See the VLAN web page for more information: + + + To compile this code as a module, choose M here: the module + will be called 8021q. + + If unsure, say N. + +config NET_DIVERT + bool "Frame Diverter (EXPERIMENTAL)" + depends on NET && EXPERIMENTAL + ---help--- + The Frame Diverter allows you to divert packets from the + network, that are not aimed at the interface receiving it (in + promisc. mode). Typically, a Linux box setup as an Ethernet bridge + with the Frames Diverter on, can do some *really* transparent www + caching using a Squid proxy for example. + + This is very useful when you don't want to change your router's + config (or if you simply don't have access to it). + + The other possible usages of diverting Ethernet Frames are + numberous: + - reroute smtp traffic to another interface + - traffic-shape certain network streams + - transparently proxy smtp connections + - etc... + + For more informations, please refer to: + + + + If unsure, say N. + +config WAN_ROUTER + tristate "WAN router" + depends on NET && EXPERIMENTAL + ---help--- + Wide Area Networks (WANs), such as X.25, frame relay and leased + lines, are used to interconnect Local Area Networks (LANs) over vast + distances with data transfer rates significantly higher than those + achievable with commonly used asynchronous modem connections. 
+ Usually, a quite expensive external device called a `WAN router' is + needed to connect to a WAN. + + As an alternative, WAN routing can be built into the Linux kernel. + With relatively inexpensive WAN interface cards available on the + market, a perfectly usable router can be built for less than half + the price of an external router. If you have one of those cards and + wish to use your Linux box as a WAN router, say Y here and also to + the WAN driver for your card, below. You will then need the + wan-tools package which is available from . + Read for more + information. + + To compile WAN routing support as a module, choose M here: the + module will be called wanrouter. + + If unsure, say N. + +menu "Networking protocols" + config INET bool "TCP/IP networking" + depends on NET ---help--- These are the protocols used on the Internet and on most local Ethernets. It is highly recommended to say Y here (this will enlarge @@ -118,105 +239,12 @@ config IPV6 source "net/ipv6/Kconfig" -menuconfig NETFILTER - bool "Network packet filtering (replaces ipchains)" - ---help--- - Netfilter is a framework for filtering and mangling network packets - that pass through your Linux box. - - The most common use of packet filtering is to run your Linux box as - a firewall protecting a local network from the Internet. The type of - firewall provided by this kernel support is called a "packet - filter", which means that it can reject individual network packets - based on type, source, destination etc. The other kind of firewall, - a "proxy-based" one, is more secure but more intrusive and more - bothersome to set up; it inspects the network traffic much more - closely, modifies it and has knowledge about the higher level - protocols, which a packet filter lacks. Moreover, proxy-based - firewalls often require changes to the programs running on the local - clients. 
Proxy-based firewalls don't need support by the kernel, but - they are often combined with a packet filter, which only works if - you say Y here. - - You should also say Y here if you intend to use your Linux box as - the gateway to the Internet for a local network of machines without - globally valid IP addresses. This is called "masquerading": if one - of the computers on your local network wants to send something to - the outside, your box can "masquerade" as that computer, i.e. it - forwards the traffic to the intended outside destination, but - modifies the packets to make it look like they came from the - firewall box itself. It works both ways: if the outside host - replies, the Linux box will silently forward the traffic to the - correct local computer. This way, the computers on your local net - are completely invisible to the outside world, even though they can - reach the outside and can receive replies. It is even possible to - run globally visible servers from within a masqueraded local network - using a mechanism called portforwarding. Masquerading is also often - called NAT (Network Address Translation). - - Another use of Netfilter is in transparent proxying: if a machine on - the local network tries to connect to an outside host, your Linux - box can transparently forward the traffic to a local server, - typically a caching proxy server. - - Yet another use of Netfilter is building a bridging firewall. Using - a bridge with Network packet filtering enabled makes iptables "see" - the bridged traffic. For filtering on the lower network and Ethernet - protocols over the bridge, use ebtables (under bridge netfilter - configuration). - - Various modules exist for netfilter which replace the previous - masquerading (ipmasqadm), packet filtering (ipchains), transparent - proxying, and portforwarding mechanisms. Please see - under "iptables" for the location of - these packages. 
- - Make sure to say N to "Fast switching" below if you intend to say Y - here, as Fast switching currently bypasses netfilter. - - Chances are that you should say Y here if you compile a kernel which - will run as a router and N for regular hosts. If unsure, say N. - -if NETFILTER - -config NETFILTER_DEBUG - bool "Network packet filtering debugging" - depends on NETFILTER - help - You can say Y here if you want to get additional messages useful in - debugging the netfilter code. - -config BRIDGE_NETFILTER - bool "Bridged IP/ARP packets filtering" - depends on BRIDGE && NETFILTER && INET - default y - ---help--- - Enabling this option will let arptables resp. iptables see bridged - ARP resp. IP traffic. If you want a bridging firewall, you probably - want this option enabled. - Enabling or disabling this option doesn't enable or disable - ebtables. - - If unsure, say N. - -source "net/ipv4/netfilter/Kconfig" -source "net/ipv6/netfilter/Kconfig" -source "net/decnet/netfilter/Kconfig" -source "net/bridge/netfilter/Kconfig" - -endif - -config XFRM - bool - depends on NET - -source "net/xfrm/Kconfig" - source "net/sctp/Kconfig" config ATM tristate "Asynchronous Transfer Mode (ATM) (EXPERIMENTAL)" depends on EXPERIMENTAL + depends on NET ---help--- ATM is a high-speed networking technology for Local Area Networks and Wide Area Networks. It uses a fixed packet size and is @@ -285,52 +313,9 @@ config ATM_BR2684_IPFILTER large number of IP-only vcc's. Do not enable this unless you are sure you know what you are doing. -config BRIDGE - tristate "802.1d Ethernet Bridging" - ---help--- - If you say Y here, then your Linux box will be able to act as an - Ethernet bridge, which means that the different Ethernet segments it - is connected to will appear as one Ethernet to the participants. - Several such bridges can work together to create even larger - networks of Ethernets using the IEEE 802.1 spanning tree algorithm. 
- As this is a standard, Linux bridges will cooperate properly with - other third party bridge products. - - In order to use the Ethernet bridge, you'll need the bridge - configuration tools; see - for location. Please read the Bridge mini-HOWTO for more - information. - - If you enable iptables support along with the bridge support then you - turn your bridge into a bridging IP firewall. - iptables will then see the IP packets being bridged, so you need to - take this into account when setting up your firewall rules. - Enabling arptables support when bridging will let arptables see - bridged ARP traffic in the arptables FORWARD chain. - - To compile this code as a module, choose M here: the module - will be called bridge. - - If unsure, say N. - -config VLAN_8021Q - tristate "802.1Q VLAN Support" - ---help--- - Select this and you will be able to create 802.1Q VLAN interfaces - on your ethernet interfaces. 802.1Q VLAN supports almost - everything a regular ethernet interface does, including - firewalling, bridging, and of course IP traffic. You will need - the 'vconfig' tool from the VLAN project in order to effectively - use VLANs. See the VLAN web page for more information: - - - To compile this code as a module, choose M here: the module - will be called 8021q. - - If unsure, say N. - config DECNET tristate "DECnet Support" + depends on NET ---help--- The DECnet networking protocol was used in many products made by Digital (now Compaq). 
It provides reliable stream and sequenced @@ -358,6 +343,7 @@ source "net/llc/Kconfig" config IPX tristate "The IPX protocol" + depends on NET select LLC ---help--- This is support for the Novell networking protocol, IPX, commonly @@ -393,6 +379,7 @@ source "net/ipx/Kconfig" config ATALK tristate "Appletalk protocol support" + depends on NET select LLC ---help--- AppleTalk is the protocol that Apple computers can use to communicate @@ -422,7 +409,7 @@ source "drivers/net/appletalk/Kconfig" config X25 tristate "CCITT X.25 Packet Layer (EXPERIMENTAL)" - depends on EXPERIMENTAL + depends on NET && EXPERIMENTAL ---help--- X.25 is a set of standardized network protocols, similar in scope to frame relay; the one physical line from your box to the X.25 network @@ -453,7 +440,7 @@ config X25 config LAPB tristate "LAPB Data Link Driver (EXPERIMENTAL)" - depends on EXPERIMENTAL + depends on NET && EXPERIMENTAL ---help--- Link Access Procedure, Balanced (LAPB) is the data link layer (i.e. the lower) part of the X.25 protocol. It offers a reliable @@ -470,32 +457,6 @@ config LAPB To compile this driver as a module, choose M here: the module will be called lapb. If unsure, say N. -config NET_DIVERT - bool "Frame Diverter (EXPERIMENTAL)" - depends on EXPERIMENTAL - ---help--- - The Frame Diverter allows you to divert packets from the - network, that are not aimed at the interface receiving it (in - promisc. mode). Typically, a Linux box setup as an Ethernet bridge - with the Frames Diverter on, can do some *really* transparent www - caching using a Squid proxy for example. - - This is very useful when you don't want to change your router's - config (or if you simply don't have access to it). - - The other possible usages of diverting Ethernet Frames are - numberous: - - reroute smtp traffic to another interface - - traffic-shape certain network streams - - transparently proxy smtp connections - - etc... - - For more informations, please refer to: - - - - If unsure, say N. 
- config ECONET tristate "Acorn Econet/AUN protocols (EXPERIMENTAL)" depends on EXPERIMENTAL && INET @@ -529,32 +490,109 @@ config ECONET_NATIVE Say Y here if you have a native Econet network card installed in your computer. -config WAN_ROUTER - tristate "WAN router" - depends on EXPERIMENTAL +source "net/ax25/Kconfig" + +source "net/irda/Kconfig" + +source "net/bluetooth/Kconfig" + +endmenu +# end options and protocols + +menuconfig NETFILTER + bool "Network packet filtering (replaces ipchains)" ---help--- - Wide Area Networks (WANs), such as X.25, frame relay and leased - lines, are used to interconnect Local Area Networks (LANs) over vast - distances with data transfer rates significantly higher than those - achievable with commonly used asynchronous modem connections. - Usually, a quite expensive external device called a `WAN router' is - needed to connect to a WAN. + Netfilter is a framework for filtering and mangling network packets + that pass through your Linux box. - As an alternative, WAN routing can be built into the Linux kernel. - With relatively inexpensive WAN interface cards available on the - market, a perfectly usable router can be built for less than half - the price of an external router. If you have one of those cards and - wish to use your Linux box as a WAN router, say Y here and also to - the WAN driver for your card, below. You will then need the - wan-tools package which is available from . - Read for more - information. + The most common use of packet filtering is to run your Linux box as + a firewall protecting a local network from the Internet. The type of + firewall provided by this kernel support is called a "packet + filter", which means that it can reject individual network packets + based on type, source, destination etc. 
The other kind of firewall, + a "proxy-based" one, is more secure but more intrusive and more + bothersome to set up; it inspects the network traffic much more + closely, modifies it and has knowledge about the higher level + protocols, which a packet filter lacks. Moreover, proxy-based + firewalls often require changes to the programs running on the local + clients. Proxy-based firewalls don't need support by the kernel, but + they are often combined with a packet filter, which only works if + you say Y here. - To compile WAN routing support as a module, choose M here: the - module will be called wanrouter. + You should also say Y here if you intend to use your Linux box as + the gateway to the Internet for a local network of machines without + globally valid IP addresses. This is called "masquerading": if one + of the computers on your local network wants to send something to + the outside, your box can "masquerade" as that computer, i.e. it + forwards the traffic to the intended outside destination, but + modifies the packets to make it look like they came from the + firewall box itself. It works both ways: if the outside host + replies, the Linux box will silently forward the traffic to the + correct local computer. This way, the computers on your local net + are completely invisible to the outside world, even though they can + reach the outside and can receive replies. It is even possible to + run globally visible servers from within a masqueraded local network + using a mechanism called portforwarding. Masquerading is also often + called NAT (Network Address Translation). + + Another use of Netfilter is in transparent proxying: if a machine on + the local network tries to connect to an outside host, your Linux + box can transparently forward the traffic to a local server, + typically a caching proxy server. + + Yet another use of Netfilter is building a bridging firewall. 
Using + a bridge with Network packet filtering enabled makes iptables "see" + the bridged traffic. For filtering on the lower network and Ethernet + protocols over the bridge, use ebtables (under bridge netfilter + configuration). + + Various modules exist for netfilter which replace the previous + masquerading (ipmasqadm), packet filtering (ipchains), transparent + proxying, and portforwarding mechanisms. Please see + under "iptables" for the location of + these packages. + + Make sure to say N to "Fast switching" below if you intend to say Y + here, as Fast switching currently bypasses netfilter. + + Chances are that you should say Y here if you compile a kernel which + will run as a router and N for regular hosts. If unsure, say N. + +if NETFILTER + +config NETFILTER_DEBUG + bool "Network packet filtering debugging" + depends on NETFILTER + help + You can say Y here if you want to get additional messages useful in + debugging the netfilter code. + +config BRIDGE_NETFILTER + bool "Bridged IP/ARP packets filtering" + depends on BRIDGE && NETFILTER && INET + default y + ---help--- + Enabling this option will let arptables resp. iptables see bridged + ARP resp. IP traffic. If you want a bridging firewall, you probably + want this option enabled. + Enabling or disabling this option doesn't enable or disable + ebtables. If unsure, say N. +source "net/ipv4/netfilter/Kconfig" +source "net/ipv6/netfilter/Kconfig" +source "net/decnet/netfilter/Kconfig" +source "net/bridge/netfilter/Kconfig" + +endif +# NETFILTER + +config XFRM + bool + +source "net/xfrm/Kconfig" + menu "QoS and/or fair queueing" config NET_SCHED @@ -596,12 +634,14 @@ config NET_SCHED source "net/sched/Kconfig" endmenu +# end SCHED menu "Network testing" config NET_PKTGEN tristate "Packet Generator (USE WITH CAUTION)" depends on PROC_FS + depends on INET ---help--- This module will inject preconfigured packets, at a configurable rate, out of a given interface. 
It is used for network interface @@ -615,32 +655,8 @@ config NET_PKTGEN module will be called pktgen. endmenu +# end PKTGEN endmenu - -config NETPOLL - def_bool NETCONSOLE - -config NETPOLL_RX - bool "Netpoll support for trapping incoming packets" - default n - depends on NETPOLL - -config NETPOLL_TRAP - bool "Netpoll traffic trapping" - default n - depends on NETPOLL - -config NET_POLL_CONTROLLER - def_bool NETPOLL - -source "net/ax25/Kconfig" - -source "net/irda/Kconfig" - -source "net/bluetooth/Kconfig" - -source "drivers/net/Kconfig" - -endmenu +# end top support: options and protocols --------------000906010009010506010105-- From kaber@trash.net Sun Apr 3 20:48:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 20:48:24 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j343mKUl014581 for ; Sun, 3 Apr 2005 20:48:20 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DIIYy-0005CP-Iw; Mon, 04 Apr 2005 05:47:44 +0200 Date: Mon, 4 Apr 2005 05:47:44 +0200 (CEST) From: Patrick McHardy X-X-Sender: kaber@kaber.coreworks.de To: Herbert Xu cc: "David S. Miller" , netdev Subject: Re: [IPSEC]: Protect against BHs in xfrm_user_policy() In-Reply-To: <20050404012040.GA16960@gondor.apana.org.au> Message-ID: References: <4250160D.2040405@trash.net> <20050404012040.GA16960@gondor.apana.org.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1324 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev On Mon, 4 Apr 2005, Herbert Xu wrote: > We want the same thing for km_query, no? 
In all other places where BHs are not explicitly disabled but need to be, they are already disabled by the caller, so I left them as they are. Regards Patrick From herbert@gondor.apana.org.au Sun Apr 3 21:06:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 21:07:04 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3446tgB015823 for ; Sun, 3 Apr 2005 21:06:55 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIIr8-0000HZ-00; Mon, 04 Apr 2005 14:06:30 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIIqh-0004h8-00; Mon, 04 Apr 2005 14:06:03 +1000 Date: Mon, 4 Apr 2005 14:06:03 +1000 To: Patrick McHardy Cc: "David S. Miller" , netdev Subject: Re: [IPSEC]: Protect against BHs in xfrm_user_policy() Message-ID: <20050404040603.GA18025@gondor.apana.org.au> References: <4250160D.2040405@trash.net> <20050404012040.GA16960@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1325 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 05:47:44AM +0200, Patrick McHardy wrote: > On Mon, 4 Apr 2005, Herbert Xu wrote: > >We want the same thing for km_query, no? > > In all other places where BHs are not explicitly disabled > but need to be, they are already disabled by the caller, > so I left them as they are. Yes, you're right. I missed the spin_lock_bh in xfrm_state_find.
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From abhishek@pal.ece.iisc.ernet.in Sun Apr 3 21:27:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 21:27:25 -0700 (PDT) Received: from ece.iisc.ernet.in (ece.iisc.ernet.in [144.16.64.2]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j344REcR017115 for ; Sun, 3 Apr 2005 21:27:17 -0700 Received: from pal.ece.iisc.ernet.in (pal.ece.iisc.ernet.in [144.16.64.149]) by ece.iisc.ernet.in (8.12.6/8.12.6) with ESMTP id j344OY8V092139; Mon, 4 Apr 2005 09:54:39 +0530 (IST) (envelope-from abhishek@pal.ece.iisc.ernet.in) Received: by pal.ece.iisc.ernet.in (Postfix, from userid 1047) id 46E2031E59; Mon, 4 Apr 2005 09:56:49 +0530 (IST) Received: from localhost (localhost [127.0.0.1]) by pal.ece.iisc.ernet.in (Postfix) with ESMTP id 337DA31E57; Mon, 4 Apr 2005 09:56:49 +0530 (IST) Date: Mon, 4 Apr 2005 09:56:49 +0530 (IST) From: Abhishek Gupta To: Thomas Graf Cc: netdev@oss.sgi.com Subject: Multiple REDs per Queue In-Reply-To: <20050403194739.GR3086@postel.suug.ch> Message-ID: References: <20050402213642.GO3086@postel.suug.ch> <20050403194739.GR3086@postel.suug.ch> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1326 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abhishek@pal.ece.iisc.ernet.in Precedence: bulk X-list: netdev Hello Once again thanks Mr. Graf. I have changed my UDP packet generator to Linux's pktgen which generates UDP packets at very high speed and also at constant rate. Things are now working quite well. Thanks for letting me know about the bmon software. Its a great software and is helping me a lot in my work. I have something more to ask. 
The problem is as follows: In my project I want to use multiple REDs per queue, i.e., within one queue, various TCP flows (persistent, non-persistent, etc.) will be handled with different RED parameters (min and max threshold, beta, pmax, etc.) but will all operate on the same queue length (including the average queue length) rather than on individual queue lengths. Is it possible with Linux's tc to have multiple REDs per queue? As far as I understand, tc provides only one RED per queue. If it is not possible with tc, where do I need to make changes in the source code to achieve the aforementioned set-up? Thanks. abhishek ========================================================================= ABHISHEK GUPTA E-mail:abhishek_it_bhu@yahoo.co.in ========================================================================= On Sun, 3 Apr 2005, Thomas Graf wrote: > * Abhishek Gupta 2005-04-03 20:30 > > But the problem is still > > not yet solved as I tried with 1Mbit speed as the setting for link speed > > in the htb configuration and got about 30KBps which amounts to about > > 240Kbitps even though my UDP source is sending at speed of about > > 1MBps(8Mbps), according to RH monitor readings. > > I do not know about that "RH monitor" you are referring to, maybe it > does not display rates correctly. (I found 3 out of 5 rate estimators > outputting with a variance of over 10%) I can recommend you bmon [0] > which states the variance and can be used to a resolution up to > 1/100s given the input source provides an equal or better resolution. > > > Is it possible that the problem is due to the source that I am using for > > UDP packets? > > Very likely, especially due to the huge difference in requested and > achieved rate you have mentioned above. I hardly think this is a > problem related to HTB but rather some misconfiguration in your testing > process.
> > [0] http://people.suug.ch/~tgr/bmon/ > From laforge@gnumonks.org Sun Apr 3 22:26:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 22:26:53 -0700 (PDT) Received: from ganesha.gnumonks.org (Debian-exim@ganesha.gnumonks.org [213.95.27.120]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j345QloE020353 for ; Sun, 3 Apr 2005 22:26:48 -0700 Received: from sunbeam.hmw-consulting.de ([83.236.178.203] helo=sunbeam.gnumonks.org) by ganesha.gnumonks.org with asmtp (TLS-1.0:RSA_AES_128_CBC_SHA:16) (Exim 4.34) id 1DIK6l-0004in-Ey; Mon, 04 Apr 2005 07:26:43 +0200 Received: from laforge by sunbeam.gnumonks.org with local (Exim 4.50) id 1DIK6k-0003oz-JB; Mon, 04 Apr 2005 07:26:42 +0200 Date: Mon, 4 Apr 2005 07:26:42 +0200 From: Harald Welte To: Robert Olsson Cc: netdev@oss.sgi.com Subject: Re: pktgen problem (skb refcount) in 2.6.12-rc1 Message-ID: <20050404052642.GE9155@sunbeam.de.gnumonks.org> References: <20050402191132.GF1890@sunbeam.de.gnumonks.org> <16976.16774.728707.368646@robur.slu.se> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="uAKRQypu60I7Lcqm" Content-Disposition: inline In-Reply-To: <16976.16774.728707.368646@robur.slu.se> User-Agent: mutt-ng 1.5.8-r168i (Debian) X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1327 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: laforge@gnumonks.org Precedence: bulk X-list: netdev --uAKRQypu60I7Lcqm Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Apr 03, 2005 at 09:18:30PM +0200, Robert Olsson wrote: > > I've tried to track the problem down, and I've confirmed that skb->users > > never goes down to 1 but instead stays at '2'. > > > The same system with the same pktgen script works fine with 2.6.11.6.
> >
> > I'm reporting this since it seems like it sounds like we have a skb
> > usage count leak somewhere :(
>
> Sounds like a diff could give some clues. pktgen, e1000 and TX-path should
> be interesting as ev. changes in kernel config.

No changes in kernel config. I've reviewed the pktgen changes and couldn't
find anything that would cause the problem. It always only atomic_inc'ed
the usage count (and decrements only in the error path), which is
perfectly fine.

As for e1000 and/or generic TX path changes, I don't have the time to
review them now, sorry :( That's why I posted it to netdev, to let
people who have an idea about the committed changes know that there is an
issue.

Cheers,
Harald
--
- Harald Welte http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch.
A6) --uAKRQypu60I7Lcqm Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFCUNASXaXGVTD0i/8RAmJlAJ0am9/hoqAXladQcKHHELjDXiSCAwCgk6KQ p4zfJVUew9dckCSBh1Fg84g= =3nOl -----END PGP SIGNATURE----- --uAKRQypu60I7Lcqm-- From grundler@lackof.org Sun Apr 3 23:29:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 23:29:24 -0700 (PDT) Received: from colo.lackof.org (colo.lackof.org [198.49.126.79]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j346TJHK023049 for ; Sun, 3 Apr 2005 23:29:19 -0700 Received: from localhost (localhost [127.0.0.1]) by colo.lackof.org (Postfix) with ESMTP id 1D7D229802F; Mon, 4 Apr 2005 00:31:11 -0600 (MDT) Received: from colo.lackof.org ([127.0.0.1]) by localhost (colo.lackof.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 30848-01; Mon, 4 Apr 2005 00:31:09 -0600 (MDT) Received: by colo.lackof.org (Postfix, from userid 27253) id 9F704298010; Mon, 4 Apr 2005 00:31:09 -0600 (MDT) Date: Mon, 4 Apr 2005 00:31:09 -0600 From: Grant Grundler To: Dmitry Yusupov Cc: "open-iscsi@googlegroups.com" , "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Message-ID: <20050404063109.GA30855@colo.lackof.org> References: <20050324233921.GZ14202@opteron.random> <20050325034341.GV32638@waste.org> <20050327035149.GD4053@g5.random> <20050327054831.GA15453@waste.org> <1111905181.4753.15.camel@mylaptop> <20050326224621.61f6d917.davem@davemloft.net> <52vf7bwo4w.fsf@topspin.com> <1112042936.5088.22.camel@beastie> <20050328223203.GC28983@kvack.org> <1112465317.24936.10.camel@mylaptop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112465317.24936.10.camel@mylaptop> X-Home-Page: http://www.parisc-linux.org/ User-Agent: Mutt/1.5.6+20040907i X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at lackof.org X-Virus-Status: Clean X-archive-position: 1328 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: grundler@parisc-linux.org Precedence: bulk X-list: netdev On Sat, Apr 02, 2005 at 10:08:37AM -0800, Dmitry Yusupov wrote: > So, who cares if one of those cores will do receive side copying? It burns backplane bandwidth that could be used for other things. The problem isn't the CPU cycles. It's the number of times the data has to cross the memory bus. 
grant From grundler@lackof.org Sun Apr 3 23:33:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Apr 2005 23:33:10 -0700 (PDT) Received: from colo.lackof.org (colo.lackof.org [198.49.126.79]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j346X5v0023682 for ; Sun, 3 Apr 2005 23:33:05 -0700 Received: from localhost (localhost [127.0.0.1]) by colo.lackof.org (Postfix) with ESMTP id F40F629802F; Mon, 4 Apr 2005 00:34:57 -0600 (MDT) Received: from colo.lackof.org ([127.0.0.1]) by localhost (colo.lackof.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 30514-10; Mon, 4 Apr 2005 00:34:57 -0600 (MDT) Received: by colo.lackof.org (Postfix, from userid 27253) id 8C3CA298010; Mon, 4 Apr 2005 00:34:56 -0600 (MDT) Date: Mon, 4 Apr 2005 00:34:56 -0600 From: Grant Grundler To: Dmitry Yusupov Cc: "open-iscsi@googlegroups.com" , "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050404063456.GB30855@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112576171.4227.5.camel@mylaptop> X-Home-Page: http://www.parisc-linux.org/ User-Agent: Mutt/1.5.6+20040907i X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at lackof.org X-Virus-Status: Clean X-archive-position: 1329 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: grundler@parisc-linux.org Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote: > I do not get your concern with memory BW. 
With good AMD box V40Z(SUN) > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80% > left. PCI-X BUS BW is bigger concern... Yes and No. PCI-X isn't fast enough but the data only crosses the PCI-X bus once. Think about the data flow: 1) DMA to RAM 2) load into CPU cache 3) store back into RAM We are down to 40% left...graphics folks won't like you. grant From davem@davemloft.net Mon Apr 4 00:11:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 00:11:16 -0700 (PDT) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j347B8Ld026309 for ; Mon, 4 Apr 2005 00:11:08 -0700 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DILii-0000Pi-00; Mon, 04 Apr 2005 00:10:00 -0700 Date: Mon, 4 Apr 2005 00:10:00 -0700 From: "David S. Miller" To: Grant Grundler Cc: dmitry_yus@yahoo.com, open-iscsi@googlegroups.com, mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-Id: <20050404001000.5fa8f206.davem@davemloft.net> In-Reply-To: <20050404063456.GB30855@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1330 X-ecartis-version: Ecartis v1.0.0 
Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Mon, 4 Apr 2005 00:34:56 -0600 Grant Grundler wrote: > Yes and No. PCI-X isn't fast enough but the data only crosses > the PCI-X bus once. Think about the data flow: > 1) DMA to RAM > 2) load into CPU cache > 3) store back into RAM > > We are down to 40% left...graphics folks won't like you. But you're missing the point, which is that the memory system always catches up to the networking technology. We'll have that %60 back before you know it when we have PCI-Z and DDR8 or whatever even in $500.00USD desktop machines. And those systems will be present by the time we put together this complicated infrastructure for RDMA. RDMA is like cache coloring page allocators, it's for yesterday's technology that we won't be using tomorrow. :-) Those steps #2 and #3 in your data flow are powerful, it is what gives us flexibility. And in a general purpose OS that is important. From Robert.Olsson@data.slu.se Mon Apr 4 03:28:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 03:28:22 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34ASGSS015432 for ; Mon, 4 Apr 2005 03:28:17 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j34ARhdR021531; Mon, 4 Apr 2005 12:27:43 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 63C1EEE2B1; Mon, 4 Apr 2005 12:27:43 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16977.5791.367581.655483@robur.slu.se> Date: Mon, 4 Apr 2005 12:27:43 +0200 To: Herbert Xu Cc: Robert Olsson , "David S. 
Miller" , dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <20050403214521.GB15901@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> <20050403074337.GA8083@gondor.apana.org.au> <16976.19092.562006.246545@robur.slu.se> <20050403214521.GB15901@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1331 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > The reason I'm suggesting the move to a kernel thread is because > softirq context is not preemptible. > > So doing a large amount of work in it when your table is big means > that a UP machine will freeze for a while. The flush transient will happen also on UP... as I understand this When we have changed the rt_hash_rnd and therefore invalidated all current entries it would be best to blackhole *all* traffic until all old entries are deleted this to avoid transients. 
--ro From Robert.Olsson@data.slu.se Mon Apr 4 03:38:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 03:38:42 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34AcWoY016376 for ; Mon, 4 Apr 2005 03:38:33 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j34Ac3Ng023049; Mon, 4 Apr 2005 12:38:03 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 6E27EEE2B1; Mon, 4 Apr 2005 12:38:03 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16977.6411.415326.988754@robur.slu.se> Date: Mon, 4 Apr 2005 12:38:03 +0200 To: Herbert Xu Cc: Robert Olsson , Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <20050403214358.GA15901@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <16976.17876.832677.945878@robur.slu.se> <20050403214358.GA15901@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1332 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > The only way to attack a hash is by exploiting collisions and > create one or more excessively long chains. > > This can be detected as follows at each rt hash insertion. If > > (total number of entries in cache >> (hash length - user defined length)) < > current bucket length > > is true, then we schedule a rehash/flush. 
> > Hash length is the number of bits in the hash, i.e., > > 1 << hash length == number of buckets > > I'd suggest a default shift length of 3. That is, if any individual > chain is growing beyond 8 times the average chain length then we've > got a problem. This is likely to happen in rt_intern_hash? I don't see how this can get along with chain-pruning there? IMO the thoughts of extending in-flow GC etc are interesting and can hopefully give us more robust performance. --ro From herbert@gondor.apana.org.au Mon Apr 4 03:39:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 03:39:33 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34AdNef016627 for ; Mon, 4 Apr 2005 03:39:24 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIOyp-0002Cm-00; Mon, 04 Apr 2005 20:38:51 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIOyE-0008PA-00; Mon, 04 Apr 2005 20:38:14 +1000 Date: Mon, 4 Apr 2005 20:38:14 +1000 To: Robert Olsson Cc: "David S. 
Miller" , dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050404103814.GA32269@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> <20050403074337.GA8083@gondor.apana.org.au> <16976.19092.562006.246545@robur.slu.se> <20050403214521.GB15901@gondor.apana.org.au> <16977.5791.367581.655483@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16977.5791.367581.655483@robur.slu.se> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1333 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 12:27:43PM +0200, Robert Olsson wrote: > > The flush transient will happen also on UP... as I understand this > When we have changed the rt_hash_rnd and therefore invalidated all current > entries it would be best to blackhole *all* traffic until all old entries > are deleted this to avoid transients. That's nasty because if you have a large cache like Eric, then you'll be dropping packets for quite a while :) Actually, what's so bad about seeing transients? One cost that I can see is that you'll be walking a chain only to conclude that none of the entries might match. But this is pretty cheap as long as we keep the chain lengths short. The other cost is that we might be creating an entry that gets flushed straight away. However, that's no worse than not using the cache at all since in that case we'll be creating one entry for each packet anyway. Both of these can be avoided too if we really cared. 
All we need is one bit per chain that indicated whether it's been flushed. So when ip_route_* hits a chain that hasn't been flushed, it could 1) Skip the lookup step. 2) Create the rt entry as usual. 3) Flush the chain while we insert the entry and set the bit. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Mon Apr 4 03:49:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 03:49:55 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34AngJH017952 for ; Mon, 4 Apr 2005 03:49:47 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIP8r-0002Fr-00; Mon, 04 Apr 2005 20:49:13 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIP8b-0008Qf-00; Mon, 04 Apr 2005 20:48:57 +1000 Date: Mon, 4 Apr 2005 20:48:57 +1000 To: Robert Olsson Cc: Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-ID: <20050404104857.GA32359@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <16976.17876.832677.945878@robur.slu.se> <20050403214358.GA15901@gondor.apana.org.au> <16977.6411.415326.988754@robur.slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16977.6411.415326.988754@robur.slu.se> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1334 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 12:38:03PM +0200, Robert Olsson wrote: > > This is likely to happen in rt_intern_hash? I don't see how this can > get along with chain-pruning there? What I'm trying to catch is the case when you've got x number of entries in the table and a large fraction of them are all in one chain. This does not conflict with the goal of keeping the chains short. Even if you strictly allow only 8 entries per chain, it's trivial to exceed 8 times the average chain length. Remember the average chain length can be fractions like 0.1. Of course we need to set a minimum value that the chain needs to grow beyond before this check kicks in. > IMO the thoughts of extending in-flow GC etc are interesting and can > hopefully give us more robust performance. Indeed, it looks like Alexey has already put the code there. It just needs to be made more strict :) It needs to free entries even if they are in use. After all, freeing an entry in use can't be much worse than not having a cache at all. 
OTOH, having a very long chain is definitely much worse than not having
a cache :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From mike@codeweavers.com Mon Apr 4 03:52:44 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 03:52:51 -0700 (PDT)
Received: from mail.codeweavers.com (Debian-exim@mail.codeweavers.com [216.251.189.131]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34AqhYX018579 for ; Mon, 4 Apr 2005 03:52:44 -0700
Received: from foghorn.codeweavers.com ([216.251.189.130] helo=[127.0.0.1]) by mail.codeweavers.com with esmtp (Exim 4.34) id 1DIPC8-0004jR-Cg for netdev@oss.sgi.com; Mon, 04 Apr 2005 05:52:42 -0500
Message-ID: <42511C31.3090801@codeweavers.com>
Date: Mon, 04 Apr 2005 19:51:29 +0900
From: Mike McCormack
Organization: Codeweavers
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041221
X-Accept-Language: en, en-us
MIME-Version: 1.0
To: netdev@oss.sgi.com
Content-Type: multipart/mixed; boundary="------------040700050107080907080805"
X-SA-Exim-Connect-IP: 216.251.189.130
X-SA-Exim-Mail-From: mike@codeweavers.com
Subject: Patch to count the number of datagrams in a unix domain socket
X-SA-Exim-Version: 4.2 (built Tue, 25 Jan 2005 19:36:50 +0100)
X-SA-Exim-Scanned: Yes (on mail.codeweavers.com)
X-Virus-Scanned: ClamAV 0.83/802/Sat Apr 2 06:49:46 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1335
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mike@codeweavers.com
Precedence: bulk
X-list: netdev

This is a multi-part message in MIME format.
--------------040700050107080907080805
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I'm trying to implement mailslot support in Wine, and have come across a
small problem that isn't easy to solve.
My implementation [1] uses a unix domain socket in dgram mode.

The Win32 function GetMailslotInfo [2] allows a program to fetch the
number of messages waiting in the mailslot. In my implementation, that
is the number of datagrams waiting in the socket.

In Linux 2.6.11 there is no way to count the number of datagrams in a
socket without reading them all out, one by one.

I have attached a small patch that lets me read the number of datagrams
in a socket, and makes GetMailslotInfo work in my test case [3].

Questions:

Do people think this is something useful to add to the kernel?
What's the right way to assign a value for SIOCINCOUNT, and in which header?
Is SIOCDGRAMCOUNT or something else a better name?
Should the return value of SIOCINCOUNT for a non-dgram socket be different?
If I add this ioctl() for unix domain sockets, should other sockets be
made to work the same way?

thanks,

Mike

[1] http://cvs.winehq.org/cvsweb/wine/server/mailslot.c
[2] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ipc/base/getmailslotinfo.asp
[3] http://cvs.winehq.org/cvsweb/wine/dlls/kernel/tests/mailslot.c

--------------040700050107080907080805
Content-Type: text/x-patch; name="siocincount.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="siocincount.diff"

--- linux-2.6.11-orig/net/unix/af_unix.c 2005-03-02 16:38:12.000000000 +0900 +++ linux-2.6.11/net/unix/af_unix.c 2005-04-03 16:36:52.000000000 +0900 @@ -1838,6 +1838,7 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock->sk; + struct sk_buff *skb; long amount=0; int err; @@ -1848,8 +1849,6 @@ err = put_user(amount, (int __user *)arg); break; case SIOCINQ: - { - struct sk_buff *skb; if (sk->sk_state == TCP_LISTEN) { err = -EINVAL; @@ -1869,8 +1868,21 @@ spin_unlock(&sk->sk_receive_queue.lock); err = put_user(amount, (int __user *)arg); break; - } +#define SIOCINCOUNT 0x8907 + case SIOCINCOUNT: + /* count the number of packets waiting
*/ + if (sk->sk_state == TCP_LISTEN) { + err = -EINVAL; + break; + } + + spin_lock(&sk->sk_receive_queue.lock); + skb_queue_walk(&sk->sk_receive_queue, skb) + amount++; + spin_unlock(&sk->sk_receive_queue.lock); + err = put_user(amount, (int __user *)arg); + break; default: err = dev_ioctl(cmd, (void __user *)arg); break; --------------040700050107080907080805-- From akpm@osdl.org Mon Apr 4 04:18:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 04:18:55 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34BIk8m020289 for ; Mon, 4 Apr 2005 04:18:46 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j34BIcs4019225 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon, 4 Apr 2005 04:18:39 -0700 Received: from bix (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with SMTP id j34BIbvl007759; Mon, 4 Apr 2005 04:18:38 -0700 Date: Mon, 4 Apr 2005 04:18:22 -0700 From: Andrew Morton To: netdev@oss.sgi.com Cc: dcmwai@pl.jaring.my Subject: Fw: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan Message-Id: <20050404041822.2ea0c16a.akpm@osdl.org> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.10; i386-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j34BIk8m020289 X-archive-position: 1336 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: netdev Begin forwarded message: Date: Mon, 4 Apr 2005 04:15:25 -0700 From: 
bugme-daemon@osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan

http://bugme.osdl.org/show_bug.cgi?id=4441

Summary: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan
Kernel Version: 2.6.11-gentoo-r4 i686
Status: NEW
Severity: high
Owner: acme@conectiva.com.br
Submitter: dcmwai@pl.jaring.my

Distribution: Gentoo, FC2, FC3

Hardware Environment:
Pentium 4 3.0E, Intel SE7210TP1-E Server Entry Board
512 MB DDR Ram
2x Intel® PRO/1000 Dual Port Adapters

Software Environment:
Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5, glibc-2.3.4.20041102-r1)
Fc2 and Fc2 original kernel.

Problem Description:
When a VLAN is shut down with this command

  vconfig rem eth4.1001

the interface goes down (as shown by ifconfig). However, the error below
is printed on the screen and in the log, and the shell stops responding.

Even if "ifconfig eth4.1001 down" is run before "vconfig rem", the
problem is still there.

The only way I have found to avoid the problem is to shut the parent
interface down completely:

  ifconfig eth4 down

Then the VLAN can be removed correctly.

This problem doesn't happen under the following "special conditions":
1) On another motherboard (Gigabyte GA-81PE1000-G), same NIC, on Fc3
2) On eth0

Steps to reproduce:
1. get a vlan supported NIC, emerge vconfig
2. ifconfig ethx 0.0.0.0
3. create a vlan using "vconfig add ethx nnnn"
4. ifconfig ethx.nnnn aaa.bbb.ccc.ddd
5. remove the vlan using "vconfig rem ethx.nnnn"
6. Wait for an error like the one below.

* Motherboard and NIC seem to be a problem in my case.

unregister_netdevice: waiting for ethx.nnnn to become free. Usage count = 6

I've also opened a bug in Gentoo:
http://bugs.gentoo.org/show_bug.cgi?id=87495

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
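
The reproduction steps in the report above can be collected into a small
script. This is only a sketch and is not taken from the bug report: the
function names `run` and `repro_vlan_hang`, the `DRY_RUN` switch, and the
default of printing the commands instead of executing them are assumptions
made for illustration. The interface name, VLAN id, and address are
placeholders supplied by the caller. Only the `ifconfig` and
`vconfig add` / `vconfig rem` invocations come from the report itself.

```shell
#!/bin/sh
# Sketch of the "Steps to reproduce" from the bug report.
# Hypothetical helper: by default it only prints the commands (DRY_RUN=1);
# set DRY_RUN=0 and run as root to actually execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: repro_vlan_hang <parent-if> <vlan-id> <ip-address>
repro_vlan_hang() {
    parent=$1
    vid=$2
    addr=$3

    run ifconfig "$parent" 0.0.0.0        # step 2: parent interface, no address
    run vconfig add "$parent" "$vid"      # step 3: create the VLAN
    run ifconfig "$parent.$vid" "$addr"   # step 4: put an address on the VLAN
    run vconfig rem "$parent.$vid"        # step 5: per the report, the shell hangs here
}
```

Calling `repro_vlan_hang eth4 1001 192.0.2.1` in the default dry-run mode
just lists the four commands. According to the report, the only workaround
the submitter found was to run `ifconfig eth4 down` on the parent interface
before the `vconfig rem`.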
From hadi@cyberus.ca Mon Apr 4 04:38:43 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 04:38:56 -0700 (PDT)
Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34BceGb024867 for ; Mon, 4 Apr 2005 04:38:42 -0700
Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DIPuc-0007dK-Gw for netdev@oss.sgi.com; Mon, 04 Apr 2005 05:38:34 -0600
Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIPuX-00042B-BN; Mon, 04 Apr 2005 07:38:29 -0400
Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev
In-Reply-To: <20050404005805.GA16543@gondor.apana.org.au>
References: <1112353398.1096.116.camel@jzny.localdomain> <20050401114258.GA2932@gondor.apana.org.au> <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au>
Content-Type: multipart/mixed; boundary="=-udwTmEebwhkzoeZTJYHN"
Organization: jamalopolous
Message-Id: <1112614706.1096.439.camel@jzny.localdomain>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2
Date: 04 Apr 2005 07:38:26 -0400
X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1337
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@cyberus.ca
Precedence: bulk
X-list: netdev

--=-udwTmEebwhkzoeZTJYHN
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Herbert!

OK, here's an update.
cheers, jamal --=-udwTmEebwhkzoeZTJYHN Content-Disposition: attachment; filename=ipsec-event-take2-2 Content-Type: text/plain; name=ipsec-event-take2-2; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500 +++ b/include/net/xfrm.h 2005-04-02 11:59:17.000000000 -0500 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 +308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int (*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -802,7 +827,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state 
*xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-25 22:28:39.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-02 09:53:03.000000000 -0500 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-03-25 22:28:25.000000000 -0500 +++ b/net/xfrm/xfrm_state.c 2005-04-04 07:35:03.000000000 -0400 @@ -40,6 +40,8 @@ DECLARE_WAIT_QUEUE_HEAD(km_waitq); EXPORT_SYMBOL(km_waitq); +static DEFINE_RWLOCK(xfrm_km_lock); +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); static DEFINE_RWLOCK(xfrm_state_afinfo_lock); static struct xfrm_state_afinfo *xfrm_state_afinfo[NPROTO]; @@ -48,13 +50,15 @@ static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); static DEFINE_SPINLOCK(xfrm_state_gc_lock); -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); static void km_state_expired(struct xfrm_state *x, int hard); +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +void km_state_notify(struct xfrm_state *x, struct km_event *c); static void xfrm_state_gc_destroy(struct xfrm_state *x) { @@ -208,8 +212,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int 
__xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -236,14 +242,21 @@ * is what we are dropping here. */ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +int xfrm_state_delete(struct xfrm_state *x) { + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -402,6 +415,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -764,37 +778,60 @@ } EXPORT_SYMBOL(xfrm_replay_advance); -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); -static DEFINE_RWLOCK(xfrm_km_lock); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_mgr *km; + + read_lock_bh(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock_bh(&xfrm_km_lock); +} + +void km_state_notify(struct xfrm_state *x, struct km_event *c) { struct xfrm_mgr *km; + read_lock_bh(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + km->notify(x, c); + read_unlock_bh(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +static void km_state_expired(struct xfrm_state *x, int hard) +{ + struct km_event c; if (hard) x->km.state = XFRM_STATE_EXPIRED; else x->km.dying = 1; - - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); } +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int 
km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -819,13 +856,12 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-04 07:23:31.000000000 -0400 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -285,14 +286,28 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + xfrm_state_hold(x); + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + xfrm_state_put(x); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +319,19 @@ return -EPERM; } - xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, 
&c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -672,6 +696,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +708,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +719,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +846,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +875,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +889,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1112,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state 
*x, struct km_event *c) { struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1129,107 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + if (x->aalg) + RTA_PUT(skb, XFRMA_ALG_AUTH, + sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg); + if (x->ealg) + RTA_PUT(skb, XFRMA_ALG_CRYPT, + sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg); + if (x->calg) + RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg); + + if (x->encap) + RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + + nlh->nlmsg_len = 
skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: +rtattr_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1363,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1375,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1383,92 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + if 
(copy_to_user_tmpl(xp, skb) < 0) + goto nlmsg_failure; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("Netlink Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = "netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-03-25 22:28:39.000000000 -0500 +++ b/net/key/af_key.c 2005-04-04 07:20:12.000000000 -0400 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + 
return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1265,27 +1337,24 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? 
*/ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + xfrm_state_hold(x); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1370,19 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1520,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + hdr->sadb_msg_satype = pfkey_proto2satype(c->data); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / 
sizeof(uint64_t)); + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1948,35 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + + if (c->data && c->event == XFRM_SAP_DELETED) + out_hdr->sadb_msg_type = SADB_X_SPDDELETE2; + else + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1984,7 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg 
*out_hdr; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2052,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2084,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,25 +2120,41 @@ err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + + xfrm_pol_put(xp); + return err; +} + + +static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, struct sadb_msg *hdr, int dir) +{ + int err; + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + err = 0; + out_skb = pfkey_xfrm_policy2msg_prep(xp); if (IS_ERR(out_skb)) { err = PTR_ERR(out_skb); goto out; } - 
pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + pfkey_xfrm_policy2msg(out_skb, xp, dir); out_hdr = (struct sadb_msg *) out_skb->data; out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; + out_hdr->sadb_msg_type = hdr->sadb_msg_type; out_hdr->sadb_msg_satype = 0; out_hdr->sadb_msg_errno = 0; out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk); err = 0; out: - xfrm_pol_put(xp); return err; } @@ -2037,8 +2163,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2175,16 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.data = 1; // to signal pfkey of SADB_X_SPDDELETE2 + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + } else { + err = key_pol_get_resp(sk, xp, hdr, pol->sadb_x_policy_dir-1); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2219,33 @@ return xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { 
struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2445,25 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + printk("pfkey doesnt deal with expired policies ..\n"); + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2482,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3035,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-udwTmEebwhkzoeZTJYHN-- From Robert.Olsson@data.slu.se Mon Apr 4 04:41:31 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 04:41:36 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34BfUkF025381 for ; Mon, 4 Apr 2005 04:41:31 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j34BfRUL001667; Mon, 4 Apr 2005 13:41:28 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id C0252EE2B1; Mon, 4 Apr 2005 13:41:27 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Message-ID: <16977.10215.725020.64329@robur.slu.se> Date: Mon, 4 Apr 2005 13:41:27 +0200 To: Harald Welte Cc: Robert Olsson , netdev@oss.sgi.com Subject: Re: pktgen problem (skb refcount) in 2.6.12-rc1 In-Reply-To: <20050404052642.GE9155@sunbeam.de.gnumonks.org> References: <20050402191132.GF1890@sunbeam.de.gnumonks.org> <16976.16774.728707.368646@robur.slu.se> <20050404052642.GE9155@sunbeam.de.gnumonks.org> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1338 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Harald Welte writes: Hello! >>> I've tried to track the problem down, and I've confirmed that skb->users >>> never goes down to 1 but instead stays at '2'. > no changes in kernel config. I've reviewed pktgen changes and couldn't > find something that would cause the problem. It always only > atomic_inc'ed the usage count (and decrements only in error path) which > is perfectly fine. OK! Thanks. > As for e1000 and/or generic TX path changes, I don't have the time to > review them now, sorry :( That's why I posted it to netdev, to let > people who have an idea about the committed changes know that there is > an issue. Well, if it's an skb leak, it will be seen.
--ro From herbert@gondor.apana.org.au Mon Apr 4 04:56:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 04:56:10 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Bu16S026414 for ; Mon, 4 Apr 2005 04:56:02 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIQB5-0002ha-00; Mon, 04 Apr 2005 21:55:35 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIQAe-0003Am-00; Mon, 04 Apr 2005 21:55:08 +1000 Date: Mon, 4 Apr 2005 21:55:08 +1000 To: Patrick McHardy Cc: "David S. Miller" , netdev Subject: Re: [IPSEC]: Protect against BHs in xfrm_user_policy() Message-ID: <20050404115508.GA12171@gondor.apana.org.au> References: <4250160D.2040405@trash.net> <20050404012040.GA16960@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050404012040.GA16960@gondor.apana.org.au> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1339 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 11:20:40AM +1000, herbert wrote: > On Sun, Apr 03, 2005 at 06:13:01PM +0200, Patrick McHardy wrote: > > > > # This is a BitKeeper generated diff -Nru style patch. > > # > > # ChangeSet > > # 2005/04/03 17:36:10+02:00 kaber@coreworks.de > > # [IPSEC]: Protect against BHs in xfrm_user_policy() > > # > > # Signed-off-by: Patrick McHardy > > Looks good. > > Signed-off-by: Herbert Xu Actually, I now think this patch is unnecessary for mainline. The read_lock()'s only need to be protected from the write_lock()'s. 
Since all the write_lock()'s are made in process context, we don't need to disable BH on the read_lock()'s. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Mon Apr 4 05:18:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:18:42 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CIVUP032473 for ; Mon, 4 Apr 2005 05:18:32 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIQWj-0002qA-00; Mon, 04 Apr 2005 22:17:57 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIQVV-0003Cn-00; Mon, 04 Apr 2005 22:16:41 +1000 Date: Mon, 4 Apr 2005 22:16:41 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404121641.GA12103@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112614706.1096.439.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1340 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Hi Jamal: On Mon, Apr 04, 2005 at 07:38:26AM -0400, jamal wrote: > Ok, heres an update. Great! 
White space comments only this time, almost :) > @@ -48,13 +50,15 @@ > static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); > static DEFINE_SPINLOCK(xfrm_state_gc_lock); > > -static void __xfrm_state_delete(struct xfrm_state *x); > +static int __xfrm_state_delete(struct xfrm_state *x); > > static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); > static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); > > static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); > static void km_state_expired(struct xfrm_state *x, int hard); > +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); > +void km_state_notify(struct xfrm_state *x, struct km_event *c); No need for these prototypes since they're already in xfrm.h. > @@ -764,37 +778,60 @@ > } > EXPORT_SYMBOL(xfrm_replay_advance); > > -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > -static DEFINE_RWLOCK(xfrm_km_lock); How about letting these guys stay where they are? The move was necessary before because the km_*_notify functions had to be called in this file but that's no longer the case. > --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 > +++ b/net/xfrm/xfrm_user.c 2005-04-04 07:23:31.000000000 -0400 > @@ -268,6 +268,7 @@ > struct xfrm_usersa_info *p = NLMSG_DATA(nlh); > struct xfrm_state *x; > int err; > + struct km_event c; > > err = verify_newsa_info(p, (struct rtattr **) xfrma); > if (err) > @@ -285,14 +286,28 @@ > if (err < 0) { > x->km.state = XFRM_STATE_DEAD; > xfrm_state_put(x); > + return err; > } > > + xfrm_state_hold(x); Sorry, you need to hold x before the call to xfrm_state_add/xfrm_state_update as otherwise they can cause x to be freed. In general hold/put is only useful if 1) When you call hold you already have a reference to the object. 2) In between the hold/put you may free the object. 
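The hold/put rule above can be illustrated with a small user-space sketch. This is not the xfrm code: `struct obj`, `obj_hold()`, `obj_put()`, `table_add()` and `notify()` are hypothetical stand-ins, and the plain `int` counter is an assumption made for brevity (the kernel uses atomic operations). `table_add()` takes over the caller's initial reference and drops it straight away, which simulates the worst case of a concurrent delete right after the insert.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as struct xfrm_state. */
struct obj {
    int refcnt;     /* plain int for illustration; not safe across threads */
    int notified;
};

static int freed_count;    /* counts how many objects have been released */

static struct obj *obj_alloc(void)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (o)
        o->refcnt = 1;     /* allocation hands the caller one reference */
    return o;
}

static void obj_hold(struct obj *o)
{
    o->refcnt++;
}

static void obj_put(struct obj *o)
{
    if (--o->refcnt == 0) {
        freed_count++;
        free(o);
    }
}

/*
 * Hypothetical insert routine. It takes over the caller's initial
 * reference and drops it at once, simulating a delete that races
 * with the insert.
 */
static int table_add(struct obj *o)
{
    obj_put(o);
    return 0;
}

static void notify(struct obj *o)
{
    o->notified = 1;
}

/* Returns 1 if the notification ran on a live object, -1 on allocation failure. */
static int add_and_notify(void)
{
    struct obj *o = obj_alloc();
    int notified;

    if (!o)
        return -1;

    obj_hold(o);           /* hold BEFORE the add: refcnt is now 2 */
    table_add(o);          /* may drop a reference: refcnt is now 1 */
    notify(o);             /* still safe, we own the remaining reference */
    notified = o->notified;
    obj_put(o);            /* last reference gone, object is freed here */
    return notified;
}
```

If the `obj_hold()` call were moved to after `table_add()`, the object would already have been freed by the time it is held, which is the situation described for xfrm_add_sa.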
> static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) > { > - struct sk_buff *out_skb; > - struct sadb_msg *out_hdr; > struct xfrm_state *x; > int err; > + struct km_event c; > > xfrm_probe_algs(); ... > - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); > + xfrm_state_hold(x); Same problem as xfrm_user. We need the hold to occur before the add/update for it to be effective. In fact the original code was buggy too since it didn't hold a reference at all. Of course this is very unlikely to crash since it requires a small life time and some bad luck in getting preempted. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From Robert.Olsson@data.slu.se Mon Apr 4 05:30:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:30:12 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CU6WN006631 for ; Mon, 4 Apr 2005 05:30:07 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j34CTXkU009764; Mon, 4 Apr 2005 14:29:33 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 6D233EE2B1; Mon, 4 Apr 2005 14:29:33 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16977.13101.410241.382741@robur.slu.se> Date: Mon, 4 Apr 2005 14:29:33 +0200 To: Herbert Xu Cc: Robert Olsson , "David S. 
Miller" , dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <20050404103814.GA32269@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <20050402115528.11f71a3c.davem@davemloft.net> <20050403074337.GA8083@gondor.apana.org.au> <16976.19092.562006.246545@robur.slu.se> <20050403214521.GB15901@gondor.apana.org.au> <16977.5791.367581.655483@robur.slu.se> <20050404103814.GA32269@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1341 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > That's nasty because if you have a large cache like Eric, then you'll > be dropping packets for quite a while :) > > Actually, what's so bad about seeing transients? One cost that > I can see is that you'll be walking a chain only to conclude that > none of the entries might match. But this is pretty cheap as long as > we keep the chain lengths short. > > The other cost is that we might be creating an entry that gets flushed > straight away. However, that's no worse than not using the cache at > all since in that case we'll be creating one entry for each packet > anyway. Maybe you're right and systems seems to survive. But the transit period should be as short as possible. > Both of these can be avoided too if we really cared. All we need > is one bit per chain that indicated whether it's been flushed. So > when ip_route_* hits a chain that hasn't been flushed, it could > > 1) Skip the lookup step. > 2) Create the rt entry as usual. > 3) Flush the chain while we insert the entry and set the bit. 
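
The three quoted steps can be sketched as follows. This is a minimal userspace model under assumptions: `struct chain`, `CHAIN_MAX` and `route_lookup_or_insert` are hypothetical names, entries are plain integers, and a global cache flush is assumed to do nothing more than clear every chain's `flushed` bit. It is not the route.c implementation.

```c
#include <stddef.h>

#define CHAIN_MAX 8  /* hypothetical fixed chain capacity for the sketch */

/* Hypothetical hash chain carrying the proposed one bit per chain. */
struct chain {
    int entries[CHAIN_MAX];
    size_t len;
    int flushed;  /* 0 = chain still holds stale entries from before the flush */
};

/* Returns 1 on a cache hit, 0 when a new entry had to be created. */
static int route_lookup_or_insert(struct chain *c, int key)
{
    size_t i;

    if (!c->flushed) {
        /* 1) skip the lookup: everything on this chain is stale.
         * 3) flush the chain as part of the insert and set the bit. */
        c->len = 0;
        c->flushed = 1;
    } else {
        for (i = 0; i < c->len; i++)
            if (c->entries[i] == key)
                return 1;
    }

    /* 2) create the entry as usual */
    if (c->len < CHAIN_MAX)
        c->entries[c->len++] = key;
    return 0;
}
```

In this model a stale entry is never returned, and the cost of the flush is spread over the first insert into each chain instead of being paid in one long pass.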
Yes better was thinking of something like this too. --ro From hadi@cyberus.ca Mon Apr 4 05:33:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:33:40 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CXZUv007318 for ; Mon, 4 Apr 2005 05:33:35 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DIQll-0005Fw-Fc for netdev@oss.sgi.com; Mon, 04 Apr 2005 06:33:29 -0600 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIQlm-0001Lo-Mc; Mon, 04 Apr 2005 08:33:30 -0400 Subject: Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire From: jamal Reply-To: hadi@cyberus.ca To: Aidas Kasparas Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org In-Reply-To: <425067D9.9050603@gmc.lt> References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt> <1112538566.1096.391.camel@jzny.localdomain> <425067D9.9050603@gmc.lt> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112618007.1096.465.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 08:33:27 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1342 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Sun, 2005-04-03 at 18:02, Aidas Kasparas wrote: > jamal wrote: [..] > > As an example, if the first pfkey user was just doing "setkey -x" and > > the second was infact pluto, then pluto will never see the > > acquire. This is what got me looking at it to begin with. Look at the > > earlier postings on the subject. 
>
> While I agree that code before your patch would not allow to cooperate
> tools using different ways to manage SAD/SPD (pfkey vs netlink), I have
> one setup in production where two instances of racoon runs
> simultaneously and both gets required pfkey-messages.
>

Yes, multiple instances of the same socket type would work.
Try running "ip xfrm mon" and your two racoon instances and see what
happens ;-> Anyway, this will be fixed in upcoming kernels.

> > So in other words, just killing the ike server as you propose would mean
> > the kernel has no open sockets and will therefore never bother to send
> > an acquire.
>
> I proposed to stop KE server, not to kill it.
>

The goal is: an acquire that the kernel thinks it sent successfully in
order to update a larval SA never made it. To simulate this, it doesn't
matter whether the loss happened at the kernel-user space boundary or
afterwards.
The simple observation to make is: the kernel thinks the desired
objective has been reached when it was not and, from a little
investigation, I conclude the kernel did not try to deliver the message
reliably.

> >
> > Still all this is moot and is distracting us from the main discussion.
> > Lets define "lost" simply as the case where an acquire never got to the
> > server (which may be sitting elsewhere on the network).
>
> ACQUIREs _never_ _leaves_ _the box_ they are generated. It is always
> kernel-to-userspace_process communication. It could be made reliable.
> And present situation IS sufficiently reliable.
>

I think I have done a bad job of explaining. Yes, I know where acquires
terminate. However, this is not about where acquires terminate.
It is insufficient to assume that a successful acquire to user space
equates to a successful interaction with the KE server which will do an
update. Does that make more sense?
If you issue an acquire from the kernel it will result in a domino
effect in the blocks to the right of xfrm in your diagram, and the end
result is that the larval SA gets an update (as a result of the acquire).
So ignore where/how the acquire gets there and imagine that the kernel
sent an acquire so that you could get an SA update; then it will become
clear.

> OK, let's talk about architecture xfrm <-> blackbox. In this
> architecture communication between these two elements (I do not speak
> about any comms in the blackbox) can be of two types:
> 1) reliable (messages always reach blackbox or error is reported);
> 2) unreliable (messages may fail even to reach blackbox).
>
> With good blackboxes good ipsec system can be built using any of comm
> types. But:
> a) (1) will be more reliable;
> b) (1) will be more simple (at least xfrm side, as it will not require
> retransmissions);
> c) (1) is implemented now (as a function call).
>
> What I want to say is xfrm-to-blackbox interface is good as it is. The
> problem may only be in how good the blackbox is. And here we have to
> look inside blackbox and start talk about particular implementations of
> that blackbox. Retransmissions, if they needed, needs to be inside that
> blackbox.
>

I am not sure I followed what you are actually trying to say above.
Let's discuss the basics of how reliability is achieved.
If you want to have something reliably delivered after you transmit, you
do several basic things:
a) you wait for an end acknowledgement, in this case an update to the
   acquire,
b) you time out within a reasonable time (30 seconds seems to be the
   default in acquire), and
c) you retransmit up to a maximum number of times.
This is the part that is missing.

> >
> > The solution being proposed for Linux to treat that xfrm piece in the
> > same fashion as ARP is correct. Read the email from Alexey. Imagine if
> > ARP was only issued once(as does pfkey) or forever(as does netlink).
> >
>
> I have read email from Alexey.
> I think that xfrm_lookup() function
> implements functionality very similar to functionality which Alexey
> described.

Absolutely not. But this is a good sign, i.e. you see the desire to do
this, you just think it's already there.

> And I think that direct comparison of ARP messages and pfkey messages is
> not fair, because pfkey acquire messages goes over reliable traffic and
> are used only to _initiate_ the process of SA negotiation. ARP has to
> receive information from other boxes which send it only as a direct
> response to some packet. More, ARP is designed to be used [amongst
> others] on networks which lose some traffic by design.
>

Please refer to my above statements as to what is missing to complete
the equation.

> > I believe this is an issue with ipsec architecture itself - someone
> > needs to write an IETF draft on it.
> >
>
> I still do not see the topic for such draft.
>

Read again what I said above.

cheers,
jamal

From hadi@cyberus.ca Mon Apr 4 05:51:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:51:55 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CplWa008325 for ; Mon, 4 Apr 2005 05:51:47 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DIR3Q-0001th-3s for netdev@oss.sgi.com; Mon, 04 Apr 2005 08:51:44 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIR3M-0003ty-Dw; Mon, 04 Apr 2005 08:51:40 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S.
Miller" , netdev In-Reply-To: <20050404121641.GA12103@gondor.apana.org.au> References: <1112358278.1096.160.camel@jzny.localdomain> <20050401123554.GA3468@gondor.apana.org.au> <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> Content-Type: multipart/mixed; boundary="=-Fpr/1RgxOBG3EaxqfHd0" Organization: jamalopolous Message-Id: <1112619096.1088.473.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 08:51:37 -0400 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1343 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-Fpr/1RgxOBG3EaxqfHd0 Content-Type: text/plain Content-Transfer-Encoding: 7bit Herbert, On Mon, 2005-04-04 at 08:16, Herbert Xu wrote: > Hi Jamal: > > On Mon, Apr 04, 2005 at 07:38:26AM -0400, jamal wrote: > > Ok, heres an update. > > +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); > > +void km_state_notify(struct xfrm_state *x, struct km_event *c); > > No need for these prototypes since they're already in xfrm.h. Good catch. > > > @@ -764,37 +778,60 @@ > > } > > EXPORT_SYMBOL(xfrm_replay_advance); > > > > -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > > -static DEFINE_RWLOCK(xfrm_km_lock); > > How about letting these guys stay where they are? The move was > necessary before because the km_*_notify functions had to be called > in this file but that's no longer the case. 
> Changed - dont see what the harm was as they were in that patch though. [..] > Sorry, you need to hold x before the call to > xfrm_state_add/xfrm_state_update as otherwise > they can cause x to be freed. > > In general hold/put is only useful if > > 1) When you call hold you already have a reference to the object. > 2) In between the hold/put you may free the object. > [..] > Same problem as xfrm_user. We need the hold to occur before the > add/update for it to be effective. In fact the original code was > buggy too since it didn't hold a reference at all. > > Of course this is very unlikely to crash since it requires a > small life time and some bad luck in getting preempted. > ;-> Yes, indeed. I think its time for you to throw in the towel ;-> There was really not even a need for that hold given the likelihood of anything like this happening. Anyways ive made this fix and heres the updated patch. cheers, jamal --=-Fpr/1RgxOBG3EaxqfHd0 Content-Disposition: attachment; filename=ipsec-event-take2-3 Content-Type: text/plain; name=ipsec-event-take2-3; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500 +++ b/include/net/xfrm.h 2005-04-02 11:59:17.000000000 -0500 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void 
km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 +308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int (*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -802,7 +827,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state *xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-25 22:28:39.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-02 09:53:03.000000000 -0500 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-03-25 22:28:25.000000000 -0500 +++ b/net/xfrm/xfrm_state.c 2005-04-04 08:41:52.000000000 -0400 @@ -48,7 +48,7 @@ static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); static DEFINE_SPINLOCK(xfrm_state_gc_lock); -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static 
struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); @@ -208,8 +208,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int __xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -236,14 +238,21 @@ * is what we are dropping here. */ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +int xfrm_state_delete(struct xfrm_state *x) { + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -402,6 +411,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -762,39 +772,64 @@ x->replay.bitmap |= (1U << diff); } } +static DEFINE_RWLOCK(xfrm_km_lock); +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); EXPORT_SYMBOL(xfrm_replay_advance); -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); -static DEFINE_RWLOCK(xfrm_km_lock); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct xfrm_mgr *km; + read_lock_bh(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock_bh(&xfrm_km_lock); +} + +void km_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_mgr *km; + read_lock_bh(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + km->notify(x, c); + read_unlock_bh(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +static void km_state_expired(struct xfrm_state *x, int hard) +{ + struct km_event c; + if 
(hard) x->km.state = XFRM_STATE_EXPIRED; else x->km.dying = 1; - - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); } +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -819,13 +854,12 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-04 08:44:31.000000000 -0400 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -277,6 +278,7 @@ if (!x) return err; + xfrm_state_hold(x); if (nlh->nlmsg_type == XFRM_MSG_NEWSA) err = xfrm_state_add(x); else @@ -285,14 +287,27 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + 
xfrm_state_put(x); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +319,19 @@ return -EPERM; } - xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -672,6 +696,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +708,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +719,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +846,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +875,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +889,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + 
c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1112,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) { struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1129,107 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto 
nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + if (x->aalg) + RTA_PUT(skb, XFRMA_ALG_AUTH, + sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg); + if (x->ealg) + RTA_PUT(skb, XFRMA_ALG_CRYPT, + sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg); + if (x->calg) + RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg); + + if (x->encap) + RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: +rtattr_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1363,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1375,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1383,92 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = 
NLMSG_LENGTH(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + if (copy_to_user_tmpl(xp, skb) < 0) + goto nlmsg_failure; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("Netlink Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = "netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-03-25 22:28:39.000000000 -0500 +++ b/net/key/af_key.c 2005-04-04 08:44:50.000000000 -0400 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + 
case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1254,6 +1326,7 @@ if (IS_ERR(x)) return PTR_ERR(x); + xfrm_state_hold(x); if (hdr->sadb_msg_type == SADB_ADD) err = xfrm_state_add(x); else @@ -1265,27 +1338,23 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? 
*/ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1370,19 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1520,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + hdr->sadb_msg_satype = pfkey_proto2satype(c->data); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + + 
pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1948,35 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + + if (c->data && c->event == XFRM_SAP_DELETED) + out_hdr->sadb_msg_type = SADB_X_SPDDELETE2; + else + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1984,7 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct 
km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2052,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2084,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,25 +2120,41 @@ err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + + xfrm_pol_put(xp); + return err; +} + + +static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, struct sadb_msg *hdr, int dir) +{ + int err; + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + err = 0; + out_skb = pfkey_xfrm_policy2msg_prep(xp); if (IS_ERR(out_skb)) { err = PTR_ERR(out_skb); goto out; } - 
pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + pfkey_xfrm_policy2msg(out_skb, xp, dir); out_hdr = (struct sadb_msg *) out_skb->data; out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; + out_hdr->sadb_msg_type = hdr->sadb_msg_type; out_hdr->sadb_msg_satype = 0; out_hdr->sadb_msg_errno = 0; out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk); err = 0; out: - xfrm_pol_put(xp); return err; } @@ -2037,8 +2163,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2175,16 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.data = 1; // to signal pfkey of SADB_X_SPDDELETE2 + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + } else { + err = key_pol_get_resp(sk, xp, hdr, pol->sadb_x_policy_dir-1); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2219,33 @@ return xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { 
struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2445,25 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + printk("pfkey doesnt deal with expired policies ..\n"); + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2482,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3035,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-Fpr/1RgxOBG3EaxqfHd0-- From mingz@ele.uri.edu Mon Apr 4 05:57:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:57:12 -0700 (PDT) Received: from leviathan.ele.uri.edu (leviathan.ele.uri.edu [131.128.51.64]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Cv75U009006 for ; Mon, 4 Apr 2005 05:57:08 -0700 Received: from [127.0.0.1] (leviathan [131.128.51.64]) by leviathan.ele.uri.edu (8.12.9/8.12.9) with ESMTP id j34CuuCu026841; Mon, 4 Apr 2005 08:56:57 -0400 (EDT) Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) From: Ming Zhang Reply-To: mingz@ele.uri.edu To: open-iscsi Cc: Dmitry 
Yusupov , "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <20050404063456.GB30855@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> Content-Type: text/plain Message-Id: <1112619415.2880.5.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Mon, 04 Apr 2005 08:56:56 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1344 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: netdev Yes, it travels 3 times instead of 1 time, and it is duplex: send traffic will take another 20%, so the total is 80%, or it can never run that fast. ming On Mon, 2005-04-04 at 02:34, Grant Grundler wrote: > On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote: > > I do not get your concern with memory BW. With good AMD box V40Z(SUN) > > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80% > > left. PCI-X BUS BW is bigger concern... > > Yes and No. PCI-X isn't fast enough but the data only crosses > the PCI-X bus once. Think about the data flow: > 1) DMA to RAM > 2) load into CPU cache > 3) store back into RAM > > We are down to 40% left...graphics folks won't like you.
> > grant From mingz@ele.uri.edu Mon Apr 4 05:58:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:58:37 -0700 (PDT) Received: from leviathan.ele.uri.edu (leviathan.ele.uri.edu [131.128.51.64]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CwTVs009566 for ; Mon, 4 Apr 2005 05:58:29 -0700 Received: from [127.0.0.1] (leviathan [131.128.51.64]) by leviathan.ele.uri.edu (8.12.9/8.12.9) with ESMTP id j34CwKCu026882; Mon, 4 Apr 2005 08:58:20 -0400 (EDT) Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) From: Ming Zhang Reply-To: mingz@ele.uri.edu To: open-iscsi Cc: Grant Grundler , Dmitry Yusupov , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <20050404001000.5fa8f206.davem@davemloft.net> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> <20050404001000.5fa8f206.davem@davemloft.net> Content-Type: text/plain Message-Id: <1112619500.2880.7.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Mon, 04 Apr 2005 08:58:20 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1345 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 03:10, David S. Miller wrote: > On Mon, 4 Apr 2005 00:34:56 -0600 > Grant Grundler wrote: > > > Yes and No. PCI-X isn't fast enough but the data only crosses > > the PCI-X bus once. Think about the data flow: > > 1) DMA to RAM > > 2) load into CPU cache > > 3) store back into RAM > > > > We are down to 40% left...graphics folks won't like you. 
> > But you're missing the point, which is that the memory system > always catches up to the networking technology. > > We'll have that %60 back before you know it when we have > PCI-Z and DDR8 or whatever even in $500.00USD desktop machines. 10G is supposed to be deployed in 2005 and 2006. while i did not see DDR4 come out yet. > > And those systems will be present by the time we put together > this complicated infrastructure for RDMA. > > RDMA is like cache coloring page allocators, it's for yesterday's > technology that we won't be using tomorrow. :-) > > Those steps #2 and #3 in your data flow are powerful, it is what > gives us flexibility. And in a general purpose OS that is important. From a.kasparas@gmc.lt Mon Apr 4 05:59:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 05:59:36 -0700 (PDT) Received: from sizifas.gmc.lt (esc.ortopedija.lt [213.190.36.10]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34CxTZh009899 for ; Mon, 4 Apr 2005 05:59:30 -0700 Received: from [10.19.65.83] ([::ffff:10.19.65.83]) by sizifas.gmc.lt with esmtp; Mon, 04 Apr 2005 15:59:25 +0300 Message-ID: <42513A2F.7020504@gmc.lt> Date: Mon, 04 Apr 2005 15:59:27 +0300 From: Aidas Kasparas User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: lt, en, ru, fr MIME-Version: 1.0 To: hadi@cyberus.ca CC: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org Subject: Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt> <1112538566.1096.391.camel@jzny.localdomain> <425067D9.9050603@gmc.lt> <1112618007.1096.465.camel@jzny.localdomain> In-Reply-To: <1112618007.1096.465.camel@jzny.localdomain> X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 
07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1346 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: a.kasparas@gmc.lt Precedence: bulk X-list: netdev jamal wrote: > I think i have made a bad case of explaining. > Yes, I know where acquires terminate. However this is not about where > acquires terminate. It is insufficient to assume that a succesful > acquire to user space equates to successful interaction to the KE server > which will do an update. Why? -- Aidas Kasparas IT administrator GM Consult Group, UAB From herbert@gondor.apana.org.au Mon Apr 4 06:03:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 06:03:18 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34D39ul010777 for ; Mon, 4 Apr 2005 06:03:09 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIRE6-00036i-00; Mon, 04 Apr 2005 23:02:46 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIRDk-0003Gp-00; Mon, 04 Apr 2005 23:02:24 +1000 Date: Mon, 4 Apr 2005 23:02:24 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404130224.GA12546@gondor.apana.org.au> References: <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112619096.1088.473.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1347 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 08:51:37AM -0400, jamal wrote: > > > > -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > > > -static DEFINE_RWLOCK(xfrm_km_lock); > > > > How about letting these guys stay where they are? The move was > > necessary before because the km_*_notify functions had to be called > > in this file but that's no longer the case. > > Changed > - dont see what the harm was as they were in that patch though. Please see below. 
> +static DEFINE_RWLOCK(xfrm_km_lock); > +static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > EXPORT_SYMBOL(xfrm_replay_advance); > > -static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); > -static DEFINE_RWLOCK(xfrm_km_lock); All I wanted was to leave these lines as is so that they didn't appear in the patch at all (except as conext) :) When reviewing patches the most annoying thing is to see things moved around or rearranged because that distracts the reviewer from the substantiative changes. > ;-> Yes, indeed. I think its time for you to throw in the towel ;-> Alright I give in :) -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Mon Apr 4 06:09:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 06:09:31 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34D9R4a011520 for ; Mon, 4 Apr 2005 06:09:27 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DIRKW-0001Ls-2x for netdev@oss.sgi.com; Mon, 04 Apr 2005 09:09:24 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIRKV-0006M7-2e; Mon, 04 Apr 2005 09:09:23 -0400 Subject: Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire From: jamal Reply-To: hadi@cyberus.ca To: Aidas Kasparas Cc: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org In-Reply-To: <42513A2F.7020504@gmc.lt> References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt> <1112538566.1096.391.camel@jzny.localdomain> <425067D9.9050603@gmc.lt> <1112618007.1096.465.camel@jzny.localdomain> <42513A2F.7020504@gmc.lt> Content-Type: text/plain Organization: 
jamalopolous Message-Id: <1112620159.1087.486.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 09:09:19 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1348 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 08:59, Aidas Kasparas wrote: > jamal wrote: > > I think i have made a bad case of explaining. > > Yes, I know where acquires terminate. However this is not about where > > acquires terminate. It is insufficient to assume that a successful > > acquire to user space equates to successful interaction to the KE server > > which will do an update. > > Why? The reason the kernel sends an acquire is to update larval SAs it created. The result is either updating the SA or a rejection for that matter. Otherwise there's a failure in communication. Analogy: If you are trying to send a message from one end system to another and there are multiple hops between them, then just because it made it to the first hop does not mean it made it to its final destination. To make it to the final destination, the confirmation has to come from the target end. So if you said the KE was the final destination then kernel to user space was the first hop. I am not sure if this is clear as an analogy.
cheers, jamal From hadi@cyberus.ca Mon Apr 4 06:17:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 06:17:08 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34DH38o012207 for ; Mon, 4 Apr 2005 06:17:03 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIRRr-0000Y3-NA for netdev@oss.sgi.com; Mon, 04 Apr 2005 09:16:59 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIRRq-0007PZ-L9; Mon, 04 Apr 2005 09:16:58 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. Miller" , netdev In-Reply-To: <20050404130224.GA12546@gondor.apana.org.au> References: <1112403845.1088.14.camel@jzny.localdomain> <20050402012813.GA24575@gondor.apana.org.au> <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112620614.1088.489.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 09:16:55 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1349 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 09:02, Herbert Xu wrote: > > -static struct list_head xfrm_km_list = 
LIST_HEAD_INIT(xfrm_km_list); > > -static DEFINE_RWLOCK(xfrm_km_lock); > > All I wanted was to leave these lines as is so that they didn't > appear in the patch at all (except as conext) :) > > When reviewing patches the most annoying thing is to see things > moved around or rearranged because that distracts the reviewer > from the substantiative changes. Ok, fair enough. It annoys me too when i review patches ;-> So i will fix this before final. > > > ;-> Yes, indeed. I think its time for you to throw in the towel ;-> > > Alright I give in :) Goody - now we can have Masahide run his full test. cheers, jamal From Robert.Olsson@data.slu.se Mon Apr 4 06:17:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 06:17:50 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34DHjeC012307 for ; Mon, 4 Apr 2005 06:17:45 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j34DHCwI018239; Mon, 4 Apr 2005 15:17:12 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 42A31EE2B1; Mon, 4 Apr 2005 15:17:12 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16977.15960.242211.442811@robur.slu.se> Date: Mon, 4 Apr 2005 15:17:12 +0200 To: Herbert Xu Cc: Robert Olsson , Eric Dumazet , davem@davemloft.net, netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <20050404104857.GA32359@gondor.apana.org.au> References: <424E641A.1020609@cosmosbay.com> <16974.41648.568927.54429@robur.slu.se> <20050402193224.GA25157@gondor.apana.org.au> <16976.17876.832677.945878@robur.slu.se> <20050403214358.GA15901@gondor.apana.org.au> <16977.6411.415326.988754@robur.slu.se> <20050404104857.GA32359@gondor.apana.org.au> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 
0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1350 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Herbert Xu writes: > What I'm trying to catch is the case when you've got x number of > entries in the table and a large fraction of them are all in one > chain. > > This does not conflict with the goal of keeping the chains short. > > Even if you strictly allow only 8 entries per chain, it's trivial > to exceed 8 times the average chain length. OK! Since deletions don't happen instantly.. Try some code; it can print a warning to start with. > > IMO the thoughts of extending in-flow GC etc are interesting and can > > hopefully give us more robust performance. > > Indeed, it looks like Alexey has already put the code there. It just > needs to be made more strict :) It needs to free entries even if they > are in use. > > After all, freeing an entry in use can't be much worse than not having > a cache at all. OTOH, having a very long chain is definitely much worse > than not having a cache :) FYI I'm experimenting with a "new" routing algo that does 24-bit IPv4 lookup and routing without the route hash, to see if we can come close to route hash performance. This needs memory. :) IP: FIB routing table of 16777216 buckets, 65536Kbytes for table id=255 Needs some more work before testing.
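A rough sketch of the arithmetic behind those numbers (hypothetical names, not taken from the actual patch): indexing a table directly by the top 24 bits of an IPv4 address gives 2^24 = 16777216 buckets, and the reported 65536 Kbytes then implies 4 bytes per bucket. The 4-byte bucket size is an inference from the two figures, not something stated in the boot message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not code from the real patch. */

#define LOOKUP_BITS 24u   /* leading address bits used as the table index */

/* Number of buckets in a table indexed directly by the top `bits` bits. */
uint64_t table_buckets(unsigned bits)
{
    assert(bits <= 32);
    return (uint64_t)1 << bits;
}

/* Table size in Kbytes for a given per-bucket size in bytes.
 * A 4-byte bucket is an assumption inferred from the reported
 * 16777216 buckets / 65536 Kbytes. */
uint64_t table_kbytes(unsigned bits, size_t bucket_bytes)
{
    return table_buckets(bits) * bucket_bytes / 1024;
}

/* Bucket index for a host-byte-order IPv4 address: its top 24 bits. */
uint32_t bucket_index(uint32_t addr)
{
    return addr >> (32u - LOOKUP_BITS);
}
```

With these helpers, table_kbytes(24, 4) reproduces the 65536 Kbytes figure, and every address in the same /24 maps to one bucket, so prefixes longer than 24 bits would need some extra step that this sketch does not cover.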
--ro From a.kasparas@gmc.lt Mon Apr 4 07:21:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 07:21:08 -0700 (PDT) Received: from sizifas.gmc.lt (esc.ortopedija.lt [213.190.36.10]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34EL2ZA015721 for ; Mon, 4 Apr 2005 07:21:03 -0700 Received: from [10.19.65.83] ([::ffff:10.19.65.83]) by sizifas.gmc.lt with esmtp; Mon, 04 Apr 2005 17:20:57 +0300 Message-ID: <42514D4B.1040202@gmc.lt> Date: Mon, 04 Apr 2005 17:20:59 +0300 From: Aidas Kasparas User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: lt, en, ru, fr MIME-Version: 1.0 To: hadi@cyberus.ca CC: ipsec-tools-devel@lists.sourceforge.net, netdev , nakam@linux-ipv6.org Subject: Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire References: <1112405303.1096.37.camel@jzny.localdomain> <424E454D.4090402@gmc.lt> <1112477326.1088.321.camel@jzny.localdomain> <424FA946.70809@gmc.lt> <1112538566.1096.391.camel@jzny.localdomain> <425067D9.9050603@gmc.lt> <1112618007.1096.465.camel@jzny.localdomain> <42513A2F.7020504@gmc.lt> <1112620159.1087.486.camel@jzny.localdomain> In-Reply-To: <1112620159.1087.486.camel@jzny.localdomain> X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1351 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: a.kasparas@gmc.lt Precedence: bulk X-list: netdev jamal wrote: > On Mon, 2005-04-04 at 08:59, Aidas Kasparas wrote: > >>jamal wrote: >> >>>I think i have made a bad case of explaining. >>>Yes, I know where acquires terminate. However this is not about where >>>acquires terminate. 
It is insufficient to assume that a successful >>>acquire to user space equates to successful interaction to the KE server >>>which will do an update. >> >>Why? > > > The reason the kernel sends an acquire is to update larval SAs it > created. The result is either updating the SA or a rejection for that > matter. Otherwise there's a failure in communication. > > Analogy: If you are trying to send a message from one end system > to another and there are multiple hops between them, then just because > it made it to the first hop does not mean it made it to its final > destination. To make it to the final destination, the confirmation has > to come from the target end. > So if you said the KE was the final destination then kernel to user > space was the first hop. > I am not sure if this is clear as an analogy. OK, if you have a chain with several hops, then probably there is no better way than a signal from the other end that it got something. What we do not agree on is how this should be managed and supervised. I would like to provide an analogy too. You have a telnet application. You try to connect to some host:port. Your telnet application just makes the connect(2) syscall and does not care how the kernel establishes that connection: what MAC address to send the packet to, how and when to retransmit the SYN packet if the ACK was not received in a timely fashion, and so on. If the kernel does its job, then we have a connected socket on which to communicate further. If it does not, or there are some problems on the target host or the network in between, then we will not have that connected socket - the syscall will return an error. With the IPsec system the situation is quite similar, except that kernel and userspace have swapped places. The kernel told userspace to update the larval SA. Userspace works on that. If it has negotiated keys for that SA with the KE at the remote site, fine, userspace will update the SA.
If there are problems and key negotiation is not possible, the SA will not get updated and will eventually die. But a single signal to userspace is sufficient for that process to be performed. Yes, the kernel can check the state of the SA every time some packet has to use that SA. But making noise by asking "please negotiate the SA which you're supposed to be negotiating already" ... IMHO it is counterproductive. -- Aidas Kasparas IT administrator GM Consult Group, UAB From sds@tycho.nsa.gov Mon Apr 4 07:22:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 07:22:41 -0700 (PDT) Received: from jazzhorn.ncsc.mil (mummy.ncsc.mil [144.51.88.129]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34EMYCK015993 for ; Mon, 4 Apr 2005 07:22:34 -0700 Received: from tycho.ncsc.mil (jazzhorn.ncsc.mil [144.51.5.9]) by jazzhorn.ncsc.mil (8.12.10/8.12.10) with ESMTP id j34EJDis015970; Mon, 4 Apr 2005 14:19:13 GMT Received: from moss-spartans.epoch.ncsc.mil (moss-spartans [144.51.25.121]) by tycho.ncsc.mil (8.12.8/8.12.8) with ESMTP id j34ENsDo029994; Mon, 4 Apr 2005 10:23:55 -0400 (EDT) Subject: Re: [PATCH] Fix SELinux for removal of i_sock From: Stephen Smalley To: "David S.
Miller" Cc: jmorris@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, matthew@wil.cx In-Reply-To: <20050401123520.7532528b.davem@davemloft.net> References: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> <20050401123520.7532528b.davem@davemloft.net> Content-Type: text/plain Organization: National Security Agency Date: Mon, 04 Apr 2005 10:13:53 -0400 Message-Id: <1112624033.7629.61.camel@moss-spartans.epoch.ncsc.mil> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-14) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1352 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sds@tycho.nsa.gov Precedence: bulk X-list: netdev On Fri, 2005-04-01 at 12:35 -0800, David S. Miller wrote: > On Fri, 01 Apr 2005 15:06:37 -0500 > Stephen Smalley wrote: > > > This patch against -bk eliminates the use of i_sock by SELinux as it > > appears to have been removed recently, breaking the build of SELinux in > > -bk. Simply replacing the i_sock test with an S_ISSOCK test would be > > unsafe in the SELinux code, as the latter will also return true for the > > inodes of socket files in the filesystem, not just the actual socket > > objects IIUC. Hence this patch reworks the SELinux code to avoid the > > need to apply such a test in the first place, part of which was > > obsoleted anyway by earlier changes to SELinux. Please apply. > > > > Signed-off-by: Stephen Smalley > > Signed-off-by: James Morris > > Applied, thanks Stephen. So, just for clarification, since a S_ISSOCK test is not necessarily equivalent to an i_sock test (in the case of inodes of socket files in the filesystem), was removing i_sock truly the right choice? 
It may not be an issue for typical users of i_sock since you can't open a descriptor to such a socket file, so any code that was acting on an open file shouldn't have to deal with this ambiguity, but could possibly lead to an erroneous use of SOCKET_I on the inode of a socket file in other code (which is what would have happened in SELinux if we had just changed the i_sock test to an ISSOCK test). Thanks, just trying to avoid confusion in the kernel in the future... -- Stephen Smalley National Security Agency From greearb@candelatech.com Mon Apr 4 08:44:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 08:44:30 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34FiNB9022886 for ; Mon, 4 Apr 2005 08:44:23 -0700 Received: from [4.33.45.22] (evrtwa1-ar2-4-33-045-022.evrtwa1.dsl-verizon.net [4.33.45.22]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j34GAbLH004256; Mon, 4 Apr 2005 09:10:37 -0700 Message-ID: <425160D1.6060701@candelatech.com> Date: Mon, 04 Apr 2005 08:44:17 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.3) Gecko/20041020 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andrew Morton CC: netdev@oss.sgi.com, dcmwai@pl.jaring.my Subject: Re: Fw: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan References: <20050404041822.2ea0c16a.akpm@osdl.org> In-Reply-To: <20050404041822.2ea0c16a.akpm@osdl.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1353 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Andrew Morton wrote: > > 
Begin forwarded message: > > Date: Mon, 4 Apr 2005 04:15:25 -0700 > From: bugme-daemon@osdl.org > To: bugme-new@lists.osdl.org > Subject: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan > > > http://bugme.osdl.org/show_bug.cgi?id=4441 > > Summary: unregister_netdevice Prompt and system shell lookup when > trying to shutdown vlan > Kernel Version: 2.6.11-gentoo-r4 i686 > Status: NEW > Severity: high > Owner: acme@conectiva.com.br > Submitter: dcmwai@pl.jaring.my > > > Distribution: > Gentoo, FC2, FC3 > > Hardware Environment: > Pentium 4 3.0E, > Intel SE7210TP1-E Server Entry Board > 512 MB DDR Ram > 2x Intel® PRO/1000 Dual Port Adapters > > Software Environment: > Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5, glibc-2.3.4.20041102-r1) > Fc2 and Fc2 original kernel. > > Problem Description: > When Shutdown a Vlan using this command > vconfig rem eth4.1001 > The interface will be down (using ifconfig) > However the following error will be prompt on the screen and the log leaving the > shell to be not responding. > > Even if "ifconfig eth4.1001 down" is run before "vconfig rem" sill the problem > will be there. > The only way I tested on solve this problem is to shutdown the interface totally. > ifconfig eth4 down > > Then the vlan can be removed correctly. > > This problem don't happen on the following "special Condition" > 1) On another motherboard (Gigabyte GA-81PE1000-G) same NIC on Fc3 > 1) On eth0 > > > Steps to reproduce: > 1. get a vlan supported NIC, emerge vconfig > 2. ifconfig ethx 0.0.0.0 > 3. create a vlan using "vconfig add ethx nnnn" > 4. ifconfig ethx.nnnn aaa.bbb.ccc.ddd > 5. remove the vlan using "vconfig rem ethx.nnnn" > 6. Wait for the error like the below. > * Motherboard and NIC seem to be a Problem in my case. > > unregister_netdevice: waiting for ethx.nnnn to become free. Usage count = 6 In the past, IPv6 has often been the problem here. Are you using IPv6? 
What is the hardware/driver for eth4? Thanks, Ben > > I've also open a bug in Gentoo > http://bugs.gentoo.org/show_bug.cgi?id=87495 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. > -- Ben Greear Candela Technologies Inc http://www.candelatech.com From ganesh.venkatesan@gmail.com Mon Apr 4 08:59:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 08:59:08 -0700 (PDT) Received: from rproxy.gmail.com (rproxy.gmail.com [64.233.170.198]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Fx3Qr023960 for ; Mon, 4 Apr 2005 08:59:03 -0700 Received: by rproxy.gmail.com with SMTP id r35so1210877rna for ; Mon, 04 Apr 2005 08:59:02 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=QFV/TOJAgUzkcmGw15aOqirYL/nfc9WMnGu4vU5CpVFGt5Hfo5hI+yL7bYpj7kwZO5vUBeSJrIA4DkbIIcT+l75Hw1Cb1B+UfHr6eBJl7Z/HsOCh2exSXY+ORJHgNBssKRWg4ITlQc6qOdLRYslY0q4kBCwszb3q8JZBaPYugDc= Received: by 10.38.87.21 with SMTP id k21mr5262305rnb; Mon, 04 Apr 2005 08:59:02 -0700 (PDT) Received: by 10.54.29.37 with HTTP; Mon, 4 Apr 2005 08:59:02 -0700 (PDT) Message-ID: <5fc59ff3050404085928b37e57@mail.gmail.com> Date: Mon, 4 Apr 2005 08:59:02 -0700 From: Ganesh Venkatesan Reply-To: Ganesh Venkatesan To: Ben Greear Subject: Re: Fw: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan Cc: Andrew Morton , netdev@oss.sgi.com, dcmwai@pl.jaring.my In-Reply-To: <425160D1.6060701@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <20050404041822.2ea0c16a.akpm@osdl.org> <425160D1.6060701@candelatech.com> X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1354 X-ecartis-version: Ecartis 
v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ganesh.venkatesan@gmail.com Precedence: bulk X-list: netdev I have seen a similar issue without the VLAN module. I was using IPv4. Happens with 10/100 and 1GbE drivers on Itanium and AMD Opteron systems. Kernel used was RHEL4 (2.6.9-5.ELsmp). ganesh. On Apr 4, 2005 8:44 AM, Ben Greear wrote: > Andrew Morton wrote: > > > > Begin forwarded message: > > > > Date: Mon, 4 Apr 2005 04:15:25 -0700 > > From: bugme-daemon@osdl.org > > To: bugme-new@lists.osdl.org > > Subject: [Bugme-new] [Bug 4441] New: unregister_netdevice Prompt and system shell lookup when trying to shutdown vlan > > > > > > http://bugme.osdl.org/show_bug.cgi?id=4441 > > > > Summary: unregister_netdevice Prompt and system shell lookup when > > trying to shutdown vlan > > Kernel Version: 2.6.11-gentoo-r4 i686 > > Status: NEW > > Severity: high > > Owner: acme@conectiva.com.br > > Submitter: dcmwai@pl.jaring.my > > > > > > Distribution: > > Gentoo, FC2, FC3 > > > > Hardware Environment: > > Pentium 4 3.0E, > > Intel SE7210TP1-E Server Entry Board > > 512 MB DDR Ram > > 2x Intel(r) PRO/1000 Dual Port Adapters > > > > Software Environment: > > Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5, glibc-2.3.4.20041102-r1) > > Fc2 and Fc2 original kernel. > > > > Problem Description: > > When Shutdown a Vlan using this command > > vconfig rem eth4.1001 > > The interface will be down (using ifconfig) > > However the following error will be prompt on the screen and the log leaving the > > shell to be not responding. > > > > Even if "ifconfig eth4.1001 down" is run before "vconfig rem" sill the problem > > will be there. > > The only way I tested on solve this problem is to shutdown the interface totally. > > ifconfig eth4 down > > > > Then the vlan can be removed correctly. 
> > > > This problem don't happen on the following "special Condition" > > 1) On another motherboard (Gigabyte GA-81PE1000-G) same NIC on Fc3 > > 1) On eth0 > > > > > > Steps to reproduce: > > 1. get a vlan supported NIC, emerge vconfig > > 2. ifconfig ethx 0.0.0.0 > > 3. create a vlan using "vconfig add ethx nnnn" > > 4. ifconfig ethx.nnnn aaa.bbb.ccc.ddd > > 5. remove the vlan using "vconfig rem ethx.nnnn" > > 6. Wait for the error like the below. > > * Motherboard and NIC seem to be a Problem in my case. > > > > unregister_netdevice: waiting for ethx.nnnn to become free. Usage count = 6 > > In the past, IPv6 has often been the problem here. Are you using IPv6? > > What is the hardware/driver for eth4? > > Thanks, > Ben > > > > > I've also open a bug in Gentoo > > http://bugs.gentoo.org/show_bug.cgi?id=87495 > > > > ------- You are receiving this mail because: ------- > > You are on the CC list for the bug, or are watching someone who is. > > > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > From rharper333@hotmail.com Mon Apr 4 09:17:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:17:54 -0700 (PDT) Received: from OMC3-S41.phx.gbl (omc3-s41.bay6.hotmail.com [65.54.249.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34GHksC000737 for ; Mon, 4 Apr 2005 09:17:46 -0700 Received: from hotmail.com ([64.4.54.112]) by OMC3-S41.phx.gbl with Microsoft SMTPSVC(6.0.3790.211); Mon, 4 Apr 2005 09:17:41 -0700 Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Mon, 4 Apr 2005 09:17:41 -0700 Message-ID: Received: from 192.38.69.230 by by20fd.bay20.hotmail.msn.com with HTTP; Mon, 04 Apr 2005 16:17:40 GMT X-Originating-IP: [192.38.69.230] X-Originating-Email: [rharper333@hotmail.com] X-Sender: rharper333@hotmail.com From: "R Harper" To: netdev@oss.sgi.com Subject: RFC 1323 Date: Mon, 04 Apr 2005 18:17:40 +0200 Mime-Version: 1.0 Content-Type: text/plain; format=flowed X-OriginalArrivalTime: 04 Apr 2005 
16:17:41.0398 (UTC) FILETIME=[D3783B60:01C53931] X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1355 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rharper333@hotmail.com Precedence: bulk X-list: netdev Hello, I have been taking a look at the timestamp feature in RFC 1323. My question is how the TSval and TSecr fields are represented in the current Linux implementation, i.e. what the resolution of the timers is. Is the format simply timeval.tv_sec from time.h, or is timeval.tv_usec also mixed into the timestamp? (It would help if someone could point to where in the code this is defined.) Another question: as I understand the RFC, the timestamps are only relevant to the box generating them. Do TCP implementations differ in how the TSval data is represented? Is it possible to make use of the timestamp in an intermediate box, e.g. to estimate RTT in a box that a TCP connection passes through? If someone has the time to answer these questions, it would be great. with regards r.harper
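For readers following the question above, here is a minimal sketch of how the timestamp option looks on the wire. It is an illustration, not code from any kernel: the function name parse_tcp_timestamp and the sample bytes are made up for this example. RFC 1323 defines the option as kind 8, length 10, carrying TSval and TSecr as two 32-bit unsigned integers in network byte order. The wire format says nothing about units; the RFC only asks for a clock roughly proportional to real time, ticking somewhere between once per millisecond and once per second. As an assumption to verify against the source, 2.6-era Linux appears to fill TSval from the jiffies counter, so its resolution would follow HZ rather than timeval.

```python
import struct

TCPOPT_EOL = 0        # end of option list
TCPOPT_NOP = 1        # one-byte padding
TCPOPT_TIMESTAMP = 8  # RFC 1323 timestamp option, total length 10


def parse_tcp_timestamp(options: bytes):
    """Return (tsval, tsecr) from raw TCP option bytes, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option header
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if kind == TCPOPT_TIMESTAMP and length == 10:
            # Two unsigned 32-bit fields in network byte order.
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None


# Hypothetical sample: NOP, NOP, then the timestamp option.
sample = bytes([1, 1, 8, 10]) + struct.pack("!II", 123456789, 987654321)
print(parse_tcp_timestamp(sample))
```

The parser only recovers the two raw counters. Turning them into seconds needs knowledge of the sender's tick length, which is not carried in the segment.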
From alpt@freaknet.org Mon Apr 4 09:22:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:23:04 -0700 (PDT) Received: from alpt.dyndns.org (host183-41.pool8251.interbusiness.it [82.51.41.183]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34GMsbT001557 for ; Mon, 4 Apr 2005 09:22:57 -0700 Received: (qmail 10087 invoked by uid 1000); 4 Apr 2005 18:22:01 +0200 Date: Mon, 4 Apr 2005 18:22:01 +0200 From: Alpt To: linux-net@vger.kernel.org, netdev@oss.sgi.com, bridge@lists.osdl.org Subject: Re: [PATCH bridge-2.6.11] bridge hub_enabled option Message-ID: <20050404162201.GA10070@nihil> Mail-Followup-To: Alpt , linux-net@vger.kernel.org, netdev@oss.sgi.com, bridge@lists.osdl.org References: <20050327092700.GA19972@nihil> <20050329112733.GA6274@darkalpt.hinezumilabs.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Qxx1br4bt0+wmkIi" Content-Disposition: inline In-Reply-To: <20050329112733.GA6274@darkalpt.hinezumilabs.org> User-Agent: Mutt/1.4.2.1i User-Agent: hahaSRY X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1356 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: alpt@freaknet.org Precedence: bulk X-list: netdev --Qxx1br4bt0+wmkIi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 29, 2005 at 01:27:33PM +0200, Alpt wrote : ~> The document describing this patch is here: ~> http://www.freaknet.org/alpt/src/bridge-hub/readme ~> ~> There is a small correction for this patch. The new version is attached ~> here and can also be found here: ~> http://www.freaknet.org/alpt/src/bridge-hub/bridge-2.6.11-hub.patch ~> ~> The patch for the bridge-utils: ~> http://www.freaknet.org/alpt/src/bridge-hub/bridge-utils-1.0.6-hub.patch so.. is it applied or not?
Best Regards -- :wq! "I don't know nothing" The One Who reached the Thinking Matter '.' [ Alpt --- Freaknet Medialab ] [ GPG Key ID 441CF0EE ] [ Key fingerprint = 8B02 26E8 831A 7BB9 81A9 5277 BFF8 037E 441C F0EE ] --Qxx1br4bt0+wmkIi Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFCUWmpv/gDfkQc8O4RAhPXAJwNGJMFHkVEPVrV5reo3bknUdtJygCgxBWM fMqPves8wmP6yoXMCdDaBXQ= =xnaj -----END PGP SIGNATURE----- --Qxx1br4bt0+wmkIi-- From grundler@lackof.org Mon Apr 4 09:29:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:29:45 -0700 (PDT) Received: from colo.lackof.org (colo.lackof.org [198.49.126.79]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34GTbWx002339 for ; Mon, 4 Apr 2005 09:29:37 -0700 Received: from localhost (localhost [127.0.0.1]) by colo.lackof.org (Postfix) with ESMTP id 6F57E29802F; Mon, 4 Apr 2005 10:31:27 -0600 (MDT) Received: from colo.lackof.org ([127.0.0.1]) by localhost (colo.lackof.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 07632-04; Mon, 4 Apr 2005 10:31:25 -0600 (MDT) Received: by colo.lackof.org (Postfix, from userid 27253) id E0B7B298010; Mon, 4 Apr 2005 10:31:25 -0600 (MDT) Date: Mon, 4 Apr 2005 10:31:25 -0600 From: Grant Grundler To: "David S.
Miller" Cc: dmitry_yus@yahoo.com, open-iscsi@googlegroups.com, mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050404163125.GA6809@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> <20050404001000.5fa8f206.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050404001000.5fa8f206.davem@davemloft.net> X-Home-Page: http://www.parisc-linux.org/ User-Agent: Mutt/1.5.6+20040907i X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at lackof.org X-Virus-Status: Clean X-archive-position: 1357 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: grundler@parisc-linux.org Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 12:10:00AM -0700, David S. Miller wrote: > On Mon, 4 Apr 2005 00:34:56 -0600 > Grant Grundler wrote: > > > Yes and No. PCI-X isn't fast enough but the data only crosses > > the PCI-X bus once. Think about the data flow: > > 1) DMA to RAM > > 2) load into CPU cache > > 3) store back into RAM > > > > We are down to 40% left...graphics folks won't like you. > > But you're missing the point, which is that the memory system > always catches up to the networking technology. No. Bus bandwidth catches up to "a" networking technology - not the "current" technology. Networking and graphics are usually starving for bus bandwidth. > We'll have that %60 back before you know it when we have > PCI-Z and DDR8 or whatever even in $500.00USD desktop machines. Yes, I agree. 
That's certainly how it went for 100bt and gige. Even laptops come with gige now. But we aren't in that part "of the curve" for IB or 10GigE *yet*. > And those systems will be present by the time we put together > this complicated infrastructure for RDMA. And that will be fine for "general use". > RDMA is like cache coloring page allocators, it's for yesterday's > technology that we won't be using tomorrow. :-) > > Those steps #2 and #3 in your data flow are powerful, it is what > gives us flexibility. Agreed - some very cool things have been done with it. And for general use, it'll perform sufficiently well over gige. In the future, I agree IB or 10gigE will too. > And in a general purpose OS that is important. I think most of the people interested in IB and 10GigE aren't looking for "general use". They have a particular application in mind and they want to maximize performance per dollar spent. Things like "science appliance", "router", "data warehouse" come to mind. "General Use" will be a reality only when the dollar cost comes down so those new technologies can compete with gige.
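The percentages traded in this thread follow from simple arithmetic. Below is a rough sketch, assuming the figures quoted elsewhere in the thread by Dmitry Yusupov (about 5.3 GBytes/sec of memory bandwidth and a 10 Gbps link at full speed) and assuming that each step of the three-step data flow (DMA to RAM, load into CPU cache, store back into RAM) moves the full payload across the memory bus once. The constant and function names are invented for this illustration.

```python
MEM_BW_GBYTES = 5.3      # assumed memory bandwidth, GBytes/sec (figure from the thread)
LINE_RATE_GBITS = 10.0   # assumed link speed, Gbits/sec


def remaining_fraction(crossings: int) -> float:
    """Fraction of memory bandwidth left after the payload crosses the bus N times."""
    payload_gbytes = LINE_RATE_GBITS / 8.0          # 10 Gbps is 1.25 GBytes/sec
    used = crossings * payload_gbytes / MEM_BW_GBYTES
    return 1.0 - used


for n in (1, 3):
    print(f"{n} crossing(s): {remaining_fraction(n):.1%} of memory bandwidth left")
```

Under these assumptions one crossing leaves roughly 76% and three crossings leave roughly 29%. The "80% left" and "40% left" figures in the thread correspond to rounding each crossing down to 20% of the bus.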
thanks, grant From shemminger@osdl.org Mon Apr 4 09:30:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:30:53 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34GUmXJ002616 for ; Mon, 4 Apr 2005 09:30:48 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j34GUZs4012382 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon, 4 Apr 2005 09:30:36 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [172.20.1.103]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j34GUZDs020853; Mon, 4 Apr 2005 09:30:35 -0700 Date: Mon, 4 Apr 2005 09:30:36 -0700 From: Stephen Hemminger To: Alpt Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com, bridge@lists.osdl.org Subject: Re: [PATCH bridge-2.6.11] bridge hub_enabled option Message-ID: <20050404093036.1065a465@dxpl.pdx.osdl.net> In-Reply-To: <20050404162201.GA10070@nihil> References: <20050327092700.GA19972@nihil> <20050329112733.GA6274@darkalpt.hinezumilabs.org> <20050404162201.GA10070@nihil> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1358 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Mon, 4 Apr 2005 18:22:01 +0200 Alpt wrote: > On Tue, Mar 29, 2005 at 01:27:33PM +0200, Alpt wrote : > ~> The document describing this patch is here: 
> ~> http://www.freaknet.org/alpt/src/bridge-hub/readme > ~> > ~> There is a small correction for this patch. The new version is attached > ~> here and be be found also here: > ~> http://www.freaknet.org/alpt/src/bridge-hub/bridge-2.6.11-hub.patch > ~> > ~> The patch for the bridge-utils: > ~> http://www.freaknet.org/alpt/src/bridge-hub/bridge-utils-1.0.6-hub.patch > > so.. is it applied or not? > > Best Regards I would rather it not be applied to mainline since it only has specialized usage. I am willing to hold keep it in the patches area of the bridge site so others can find it. From shemminger@osdl.org Mon Apr 4 09:51:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:51:15 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Gp3xB004024 for ; Mon, 4 Apr 2005 09:51:03 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j34Goms4014278 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon, 4 Apr 2005 09:50:48 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [172.20.1.103]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j34Gol14021977; Mon, 4 Apr 2005 09:50:47 -0700 Date: Mon, 4 Apr 2005 09:50:47 -0700 From: Stephen Hemminger To: jaganav@us.ibm.com Cc: Roland Dreier , Benjamin LaHaise , Dmitry Yusupov , open-iscsi@googlegroups.com, "David S. 
Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com, bmt@zurich.ibm.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050404095047.4d5c263f@dxpl.pdx.osdl.net> In-Reply-To: <1112405833.424df749e61b5@imap.linux.ibm.com> References: <1112405833.424df749e61b5@imap.linux.ibm.com> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1359 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Fri, 1 Apr 2005 20:37:13 -0500 jaganav@us.ibm.com wrote: > Quoting Stephen Hemminger : > > > On Thu, 31 Mar 2005 21:13:39 -0500 > > jaganav@us.ibm.com wrote: > > > > > Quoting Roland Dreier : > > > > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or > > > > whatever you want to call it) world. However I know that the large > > > > majority of InfiniBand use right now is running on Linux, and I hope > > > > the Linux community is willing to work with the IB community. > > > > > > > > > > Just want to let everyone know know that we have started an opensource > > > effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs). > > This > > > community has now come up with an architecture > > > (http://rdma.sourceforge.net/architecture.pdf) to build this support in > > Linux. 
> > > Would really appreciate if you review and provide any comments. We have > > just > > > started to hack but no code is available on this project yet. > > > > > > Thanks > > > Venkat > > > > OpenRdma is a misnomer, because as I read your architecture you are trying > > to > > create a "kernel abstraction layer" for closed source vendor RDMA drivers. > > This will > > never be accepted, please go back to the drawing board and figure out how to > > make > > real open source drivers. > > > > > > First let me say that the purpose of this project is to > make the entire stack (with all of the enablement layers) > including the drivers opensourced. How about putting out code early that implements a subset of what you want (and not waiting till you think it is done). > The kernel abstraction layer will be built > around standards based (opengroup.org/icsc) RNIC-PI > interface and which allows the RNIC vendors to opensource > their drivers using that interface. BTW, RNIC-PI > interface is work-in-progress and the first draft > is targeted to be published soon. But standards based abstraction layer is sure to be limited to least common denominator. The locking model and setup/teardown are sure to be different under each OS. Also, it is impossible to build any decent abstraction top-down in a "waterfall" model. You need to have an iterative process that refines and is willing to have compatibility restrictions. Also, the kernel community hates interfaces and code where there is a big *don't go here* box that prevents fixing bugs and improving interfaces. Linux puts a big emphasis on long-term maintainability of code. Another issue that concerns me is making sure that all the security and policy are maintained when doing RDMA. How do you do firewalling and routing when you are allowing adapter to control the world? > Several RNIC adapter vendors, who contribute to the > openRDMA effort, are quite willing to opensource > their drivers through openRDMA project. 
> > BTW, I understood why you got the impression > that the this is for closed source vendor drivers: > Our intention is not to allow the kernel verbs > provider code (kVP) to be private and that was > an error. Thanks for pointing this out > but we'll make this change soon. You are attacking a hard problem. Thanks for the effort. From dmitry_yus@yahoo.com Mon Apr 4 09:54:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:54:18 -0700 (PDT) Received: from smtp110.mail.sc5.yahoo.com (smtp110.mail.sc5.yahoo.com [66.163.170.8]) by oss.sgi.com (8.13.0/8.13.0) with SMTP id j34GsCSr004647 for ; Mon, 4 Apr 2005 09:54:12 -0700 Received: from unknown (HELO beastie) (dmitry?yus@67.120.213.161 with plain) by smtp110.mail.sc5.yahoo.com with SMTP; 4 Apr 2005 16:54:12 -0000 Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) From: Dmitry Yusupov To: Grant Grundler Cc: "open-iscsi@googlegroups.com" , "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com In-Reply-To: <20050404063456.GB30855@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> Content-Type: text/plain Date: Mon, 04 Apr 2005 09:54:10 -0700 Message-Id: <1112633650.9559.142.camel@beastie> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1360 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dmitry_yus@yahoo.com Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 00:34 -0600, Grant Grundler wrote: > On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote: > > I do not get your concern 
with memory BW. With good AMD box V40Z(SUN) > > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80% > > left. PCI-X BUS BW is bigger concern... > > Yes and No. PCI-X isn't fast enough but the data only crosses > the PCI-X bus once. Think about the data flow: > 1) DMA to RAM Yes. > 2) load into CPU cache Yes. > 3) store back into RAM No. We are talking about receive-side optimization only. Why do you think a store back into RAM comes into the picture? Also keep in mind that we have huge L2 and L3 caches today, and write operations are usually very well buffered. > We are down to 40% left...graphics folks won't like you. > > grant > From rick.jones2@hp.com Mon Apr 4 09:56:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 09:56:29 -0700 (PDT) Received: from ccerelrim03.cce.hp.com (smtp.cce.hp.com [161.114.21.24]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34GuOsC005260 for ; Mon, 4 Apr 2005 09:56:24 -0700 Received: from ccerelrim03.cce.hp.com (localhost [127.0.0.1]) by receive-from-antispam-filter (Postfix) with SMTP id DC12332E42; Mon, 4 Apr 2005 11:56:18 -0500 (CDT) Received: from mailstation.cce.hp.com (mailstation.cce.hp.com [161.114.20.124]) by ccerelrim03.cce.hp.com (Postfix) with ESMTP id AFDD532E82; Mon, 4 Apr 2005 11:56:16 -0500 (CDT) Received: from [192.168.1.100] (adsl-64-171-2-144.dsl.sntc01.pacbell.net [64.171.2.144]) by mailstation.cce.hp.com (Postfix) with ESMTP id 960781CCC8; Mon, 4 Apr 2005 11:56:15 -0500 (CDT) In-Reply-To: References: Mime-Version: 1.0 (Apple Message framework v619.2) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Cc: netdev@oss.sgi.com From: rick jones Subject: Re: RFC 1323 Date: Mon, 4 Apr 2005 09:56:06 -0700 To: "R Harper" X-Mailer: Apple Mail (2.619.2) X-PMX-Version: 5.0.0.131485, Antispam-Engine: 2.0.3.1, Antispam-Data: 2005.4.4.8 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position:
1361 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev > > Another question, as I understand the RFC, the timestamp are only > relevant to the box generating them. Are TCP implementation different > in how the TSval data is represented? They certainly can be - for the PAWS portion of 1323 the fields in the "timestamp" options only have to be monotonic nondecreasing so they can extend the sequence number space. From my brief perusal of the RFC, senders are free to have them be more or less whatever they want, and as networks become faster (10G) it looks like higher resolution will be desired. [I wonder when even the extra 32 bits will not be enough :) ] With the recent dust-up about ID'ing systems based on their clock skew, it would not surprise me to see someone suggest that the options should just be for PAWS and not for RTTM. > Is it possible to make use of the timestamp in an intermediate box, > e.g. estimating RTT in a box that a TCP connection passes? Yes, provided you know that the sender is using the option for timing and know its resolution - basically that means knowing what the sender is running.
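The two points in this reply can be sketched in a few lines. This is an illustration under stated assumptions, not any stack's actual code. The helper names ts_delta, paws_acceptable and elapsed_ms are invented here, the tick length is a parameter the observer has to know or guess, and the PAWS check is reduced to its core comparison (the RFC's handling of long-idle connections is left out).

```python
MOD = 1 << 32  # timestamps are 32-bit counters that wrap around


def ts_delta(newer: int, older: int) -> int:
    """Signed difference newer - older, correct across a 32-bit wrap."""
    d = (newer - older) % MOD
    return d - MOD if d >= (1 << 31) else d


def paws_acceptable(tsval: int, ts_recent: int) -> bool:
    """Core PAWS idea: reject a segment whose TSval is older than the last one seen."""
    return ts_delta(tsval, ts_recent) >= 0


def elapsed_ms(tsval_now: int, tsval_then: int, tick_ms: float) -> float:
    """Convert a TSval difference to milliseconds; tick_ms is an ASSUMED sender resolution."""
    return ts_delta(tsval_now, tsval_then) * tick_ms


# A counter that wrapped from near 2**32 to a small value still counts as newer.
print(paws_acceptable(5, 0xFFFFFFFB))
# The same 1000-tick gap means 1 s or 10 s depending on the assumed tick length.
print(elapsed_ms(1500, 500, 1.0), elapsed_ms(1500, 500, 10.0))
```

The second print shows why an intermediate box cannot interpret TSval differences as time without knowing what the sender is running: the same raw gap maps to very different durations.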
rick jones there is no rest for the wicked, yet the virtuous have no pillows From ayyappan.veeraiyan@intel.com Mon Apr 4 10:26:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 10:26:56 -0700 (PDT) Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34HQmgQ007045 for ; Mon, 4 Apr 2005 10:26:49 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j34HQSdY020356; Mon, 4 Apr 2005 17:26:28 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j34HQNeL028773; Mon, 4 Apr 2005 17:26:27 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005040410262504784 ; Mon, 04 Apr 2005 10:26:26 -0700 Received: from orsmsx410.amr.corp.intel.com ([192.168.65.64]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Mon, 4 Apr 2005 10:26:19 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: pktgen problem (skb refcount) in 2.6.12-rc1 Date: Mon, 4 Apr 2005 10:26:18 -0700 Message-ID: <9D602ABCE51B0B488BF857A4787939B5042E7396@orsmsx410> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: pktgen problem (skb refcount) in 2.6.12-rc1 Thread-Index: AcU41yY6h7+AmF6WRLCqF13GmO45xAAXoIhw From: "Veeraiyan, Ayyappan" To: "Harald Welte" , "Robert Olsson" Cc: X-OriginalArrivalTime: 04 Apr 2005 17:26:19.0787 (UTC) FILETIME=[6A3885B0:01C5393B] X-Scanned-By: MIMEDefang 2.44 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit 
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j34HQmgQ007045 X-archive-position: 1362 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ayyappan.veeraiyan@intel.com Precedence: bulk X-list: netdev Harald, > > As for e1000 and or generic TX path changes, I don't have the time to > review them now, sorry :( That's why I posted it to netdev, to let > people who have an idea about the committed changes know that there is > an issue. > What version of e1000 you are using? If you are using e1000 5.7.6-k2 and haven't tried older versions, Could you please try e1000 version 5.6.10.1-k2 (available in kernel 2.6.11) and inform us the result? Thanks, Ayyappan V. > -----Original Message----- > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On > Behalf Of Harald Welte > Sent: Sunday, April 03, 2005 10:27 PM > To: Robert Olsson > Cc: netdev@oss.sgi.com > Subject: Re: pktgen problem (skb refcount) in 2.6.12-rc1 > > On Sun, Apr 03, 2005 at 09:18:30PM +0200, Robert Olsson wrote: > > > > I've tried to track the problem down, and I've confirmed that skb- > >users > > > never goes down to 1 but instead stays at '2'. > > > > > The same system with the same pktgen script works fine with 2.6.11.6. > > > > > > I'm reporting this since it seems like it sounds like we have a skb > > > usage count leak somewhere :( > > > > Sounds like a diff could give some clues. pktgen, e1000 and TX-path > should > > be interesting as ev. changes in kernel config. > > no changes in kernel config. I've reviewed pktgen changes and couldn't > find something that would cause the problem. It always only > atomic_inc'ed the ussage cound (and decrements only in error path) which > is perfectly fine. 
> > As for e1000 and or generic TX path changes, I don't have the time to > review them now, sorry :( That's why I posted it to netdev, to let > people who have an idea about the committed changes know that there is > an issue. > > Cheers, > Harald > -- > - Harald Welte > http://gnumonks.org/ > ======================================================================== == > == > "Privacy in residential applications is a desirable marketing option." > (ETSI EN 300 175-7 Ch. > A6) From bcrl@kvack.org Mon Apr 4 10:30:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 10:30:37 -0700 (PDT) Received: from kanga.kvack.org (kanga.kvack.org [66.96.29.28]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34HUWkL007665 for ; Mon, 4 Apr 2005 10:30:33 -0700 Received: (from localhost user: 'bcrl' uid#63042 fake: STDIN (bcrl@kanga.kvack.org)) by kvack.org id ; Mon, 4 Apr 2005 13:30:10 -0400 Date: Mon, 4 Apr 2005 13:30:10 -0400 From: Benjamin LaHaise To: davem@redhat.com Cc: netdev@oss.sgi.com Subject: [PATCH] fix uninitialized proto_list_lock Message-ID: <20050404173010.GA19451@kvack.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1363 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bcrl@kvack.org Precedence: bulk X-list: netdev Please apply the following patch which fixes a BUG() on boot when the kernel is compiled with spinlock debugging in bk head. 
Cheers, -ben ===== net/core/sock.c 1.67 vs edited ===== --- 1.67/net/core/sock.c 2005-03-26 18:04:35 -05:00 +++ edited/net/core/sock.c 2005-04-04 12:46:26 -04:00 @@ -1352,7 +1352,7 @@ EXPORT_SYMBOL(sk_common_release); -static rwlock_t proto_list_lock; +static rwlock_t proto_list_lock = RW_LOCK_UNLOCKED; static LIST_HEAD(proto_list); int proto_register(struct proto *prot, int alloc_slab) From davem@davemloft.net Mon Apr 4 10:36:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 10:36:48 -0700 (PDT) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34HagQx008344 for ; Mon, 4 Apr 2005 10:36:42 -0700 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DIVTi-00045i-00; Mon, 04 Apr 2005 10:35:10 -0700 Date: Mon, 4 Apr 2005 10:35:09 -0700 From: "David S. Miller" To: Herbert Xu Cc: kaber@trash.net, netdev@oss.sgi.com Subject: Re: [IPSEC]: Protect against BHs in xfrm_user_policy() Message-Id: <20050404103509.06ca48a4.davem@davemloft.net> In-Reply-To: <20050404115508.GA12171@gondor.apana.org.au> References: <4250160D.2040405@trash.net> <20050404012040.GA16960@gondor.apana.org.au> <20050404115508.GA12171@gondor.apana.org.au> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1364 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Mon, 4 Apr 2005 21:55:08 +1000 Herbert Xu wrote: > The read_lock()'s 
only need to be protected from the write_lock()'s. > > Since all the write_lock()'s are made in process context, we don't > need to disable BH on the read_lock()'s. This is correct. It's actually a common technique, only disable IRQ or BH in the write_locks. From davem@redhat.com Mon Apr 4 10:43:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 10:43:55 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Hhl49009059 for ; Mon, 4 Apr 2005 10:43:48 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j34Hhkfu002729; Mon, 4 Apr 2005 13:43:46 -0400 Received: from devserv.devel.redhat.com (devserv.devel.redhat.com [172.16.58.1]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j34HhkO16745; Mon, 4 Apr 2005 13:43:46 -0400 Received: from cheetah.davemloft.net (localhost.localdomain [127.0.0.1]) by devserv.devel.redhat.com (8.12.11/8.12.11) with SMTP id j34HhjAs018053; Mon, 4 Apr 2005 13:43:45 -0400 Date: Mon, 4 Apr 2005 10:42:54 -0700 From: "David S. 
Miller" To: Benjamin LaHaise Cc: netdev@oss.sgi.com Subject: Re: [PATCH] fix uninitialized proto_list_lock Message-Id: <20050404104254.0d30ba8d.davem@redhat.com> In-Reply-To: <20050404173010.GA19451@kvack.org> References: <20050404173010.GA19451@kvack.org> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1365 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Mon, 4 Apr 2005 13:30:10 -0400 Benjamin LaHaise wrote: > Please apply the following patch which fixes a BUG() on boot when the > kernel is compiled with spinlock debugging in bk head. Cheers, This is already in Linus's tree. Thanks. 
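A rough userspace model of why a missing static initializer might only show up with lock debugging enabled, assuming the debug build keeps a marker field in the lock and checks it before use. The struct, macro, and function names here are hypothetical stand-ins, not the kernel's definitions: a lock left to default static initialization is all zeroes, so the marker check fails, while one set up with the initializer macro passes.

```c
#include <stdbool.h>

/* Arbitrary marker value for this sketch. */
#define TOY_RWLOCK_MAGIC 0xdeaf1eedu

/* Hypothetical debug-enabled lock: a lock word plus a marker. */
struct toy_rwlock {
	unsigned int lock;
	unsigned int magic;
};

/* Static initializer that fills in the marker. */
#define TOY_RW_LOCK_UNLOCKED { 0, TOY_RWLOCK_MAGIC }

/* Static storage without an initializer: every field is zero. */
static struct toy_rwlock zeroed_lock;

/* Same lock type, initialized the way the patch does it. */
static struct toy_rwlock initialized_lock = TOY_RW_LOCK_UNLOCKED;

/* The kind of sanity check a debug build could run before locking. */
static bool toy_lock_is_sane(const struct toy_rwlock *l)
{
	return l->magic == TOY_RWLOCK_MAGIC;
}
```

Without the marker field (a non-debug build in this model), an all-zero lock and an explicitly unlocked one are indistinguishable, which would explain the bug going unnoticed until debugging was turned on.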
From arnaldo.melo@gmail.com Mon Apr 4 10:45:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 10:45:46 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.205]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34HjcKR009540 for ; Mon, 4 Apr 2005 10:45:38 -0700 Received: by wproxy.gmail.com with SMTP id 68so1642512wri for ; Mon, 04 Apr 2005 10:45:32 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=iscThFzpwHhGqjnirB8wtlISRQdIOFHHB0SmyeGue0FNMXYNhckUCVCyKJEnzBj9AzslfjFVjeZY95zq214PwsnN60q6clWl22zdbGKAeNIw6kr8QPu3gPp92uWtkSJWm1djGl0gPFwK+RiSXMVrwoCf8SLwmu7hOyJ6G8RBve8= Received: by 10.54.15.26 with SMTP id 26mr77695wro; Mon, 04 Apr 2005 10:45:31 -0700 (PDT) Received: by 10.54.72.15 with HTTP; Mon, 4 Apr 2005 10:45:30 -0700 (PDT) Message-ID: <39e6f6c7050404104578221d01@mail.gmail.com> Date: Mon, 4 Apr 2005 14:45:30 -0300 From: Arnaldo Carvalho de Melo Reply-To: acme@conectiva.com.br To: Benjamin LaHaise Subject: Re: [PATCH] fix uninitialized proto_list_lock Cc: davem@redhat.com, netdev@oss.sgi.com In-Reply-To: <20050404173010.GA19451@kvack.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <20050404173010.GA19451@kvack.org> X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1366 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arnaldo.melo@gmail.com Precedence: bulk X-list: netdev On Apr 4, 2005 2:30 PM, Benjamin LaHaise wrote: > Please apply the following patch which fixes a BUG() on boot when the > kernel is compiled with spinlock debugging in bk head. 
Cheers, > > -ben > > ===== net/core/sock.c 1.67 vs edited ===== > --- 1.67/net/core/sock.c 2005-03-26 18:04:35 -05:00 > +++ edited/net/core/sock.c 2005-04-04 12:46:26 -04:00 > @@ -1352,7 +1352,7 @@ > > EXPORT_SYMBOL(sk_common_release); > > -static rwlock_t proto_list_lock; > +static rwlock_t proto_list_lock = RW_LOCK_UNLOCKED; This was already fixed by James Bottomley and its in Dave tree waiting for Linus to do a pull. Thanks anyway! - Arnaldo From rick.jones2@hp.com Mon Apr 4 11:57:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 11:58:00 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34IvrO8013446 for ; Mon, 4 Apr 2005 11:57:54 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel13.hp.com (Postfix) with ESMTP id 69BCF1C04AC0; Mon, 4 Apr 2005 11:57:53 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id LAA21630; Mon, 4 Apr 2005 11:57:52 -0700 (PDT) Message-ID: <42518E30.2090207@hp.com> Date: Mon, 04 Apr 2005 11:57:52 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 Cc: "open-iscsi@googlegroups.com" , ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics References: <20050324215922.GT14202@opteron.random> <424346FE.20704@cs.wisc.edu> <20050324233921.GZ14202@opteron.random> <20050325034341.GV32638@waste.org> <20050327035149.GD4053@g5.random> <20050327054831.GA15453@waste.org> <1111905181.4753.15.camel@mylaptop> <20050326224621.61f6d917.davem@davemloft.net> <52vf7bwo4w.fsf@topspin.com> <1112042936.5088.22.camel@beastie> <20050328223203.GC28983@kvack.org> <1112465317.24936.10.camel@mylaptop> In-Reply-To: <1112465317.24936.10.camel@mylaptop> Content-Type: text/plain; charset=us-ascii; 
format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1367 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev > What RDMA gives us is zero-copy on receive and new networking api which > has a potential to be HW accelerated. SoftRDMA will never avoid copying > on receive. But benefit for SoftRDMA would be its availability on client > sides. It is free and it could be easily deployed. Soon Intel & Co will > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if > one of those cores will do receive side copying? 20 years ago, in certain circles at least, people were saying "With 32-bits of addressing, who cares if we allocate much memory" :) Speaking a bit more prosaically, if that core is sitting there churning through data copies, what effect does that have on the rest of the bus(ses) and the memory? What else will the client want to be able to push around that those data copies may preclude? 
rick jones From grundler@lackof.org Mon Apr 4 12:09:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 12:09:18 -0700 (PDT) Received: from colo.lackof.org (colo.lackof.org [198.49.126.79]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34J9Ca6014760 for ; Mon, 4 Apr 2005 12:09:12 -0700 Received: from localhost (localhost [127.0.0.1]) by colo.lackof.org (Postfix) with ESMTP id 260C629802F; Mon, 4 Apr 2005 13:11:03 -0600 (MDT) Received: from colo.lackof.org ([127.0.0.1]) by localhost (colo.lackof.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 09890-03; Mon, 4 Apr 2005 13:11:01 -0600 (MDT) Received: by colo.lackof.org (Postfix, from userid 27253) id C4D90298010; Mon, 4 Apr 2005 13:11:01 -0600 (MDT) Date: Mon, 4 Apr 2005 13:11:01 -0600 From: Grant Grundler To: Dmitry Yusupov Cc: "open-iscsi@googlegroups.com" , "David S. Miller" , mpm@selenic.com, andrea@suse.de, michaelc@cs.wisc.edu, James.Bottomley@HansenPartnership.com, ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com Subject: Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Message-ID: <20050404191101.GB6809@colo.lackof.org> References: <67D69596DDF0C2448DB0F0547D0F947E01781F2E@yogi.asicdesigners.com> <1112576171.4227.5.camel@mylaptop> <20050404063456.GB30855@colo.lackof.org> <1112633650.9559.142.camel@beastie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112633650.9559.142.camel@beastie> X-Home-Page: http://www.parisc-linux.org/ User-Agent: Mutt/1.5.6+20040907i X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at lackof.org X-Virus-Status: Clean X-archive-position: 1368 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: grundler@parisc-linux.org Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 09:54:10AM -0700, Dmitry 
Yusupov wrote: > > 3) store back into RAM > > no. we are talking about receive side optimization only. > why do you think store back into RAM comes to the picture? Application eventually wants to read the data. > also keep in mind that we have huge L2 & L3 caches today and write > operation is usually very well buffered. Agreed. But how effective the cache is will depend on if the CPU (application) can process the data as fast as it arrives (and still be in the cache). Otherwise the data will get pushed out in (3) and recalled later when the app can consume it (4th time across). It also assumes the application is running on a CPU core that shares the cache with the CPU that did the copy. If the CPU is saturated with the copy (ok, assume we've got 2 Cores per socket), then the other CPU has to be *assigned* manually to make sure it does the other part. Jamal learned all this when he moved to a dual core PPC for his fast routing work. Jamal, did that ever make it into a paper? grant From acme@ghostprotocols.net Mon Apr 4 12:49:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 12:49:31 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34JnI3i021966 for ; Mon, 4 Apr 2005 12:49:22 -0700 Received: from [200.138.131.177] (helo=oops.ghostprotocols.net) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 1DIXa7-0005JG-00; Mon, 04 Apr 2005 16:49:55 -0300 Received: by oops.ghostprotocols.net (Postfix, from userid 500) id 6366014631; Mon, 4 Apr 2005 16:48:35 -0300 (BRT) Date: Mon, 4 Apr 2005 16:48:35 -0300 To: "David S. 
Miller" , Ralf Baechle Cc: netdev@oss.sgi.com Subject: [PATCH 1/2][AX25] make ax25_queue_xmit a net_device parameter Message-ID: <20050404194835.GI640@conectiva.com.br> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="mYCpIKhGyMATD0i+" Content-Disposition: inline X-Url: http://advogato.org/person/acme User-Agent: Mutt/1.5.6i From: acme@ghostprotocols.net (Arnaldo Carvalho de Melo) X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1369 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@ghostprotocols.net Precedence: bulk X-list: netdev --mYCpIKhGyMATD0i+ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi David, Ralf, This one is just in preparation for the second, that introduces ax25_type_trans. Available at: bk://kernel.bkbits.net/acme/sk_buff-2.6 Regards, - Arnaldo --mYCpIKhGyMATD0i+ Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="ax25.1.patch" =================================================================== ChangeSet@1.2245, 2005-04-04 16:23:29-03:00, acme@toy.ghostprotocols.net [AX25] make ax25_queue_xmit a net_device parameter I.e. not using skb->dev as a way to pass the parameter used to fill... skb->dev :-) Also to get the _type_trans open coded sequence grouped, next changesets will introduce ax25_type_trans. Signed-off-by: Ralf Baechle Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. 
Miller include/net/ax25.h | 2 +- net/ax25/af_ax25.c | 4 +--- net/ax25/ax25_ip.c | 4 +--- net/ax25/ax25_out.c | 8 +++----- net/ax25/ax25_subr.c | 4 +--- 5 files changed, 7 insertions(+), 15 deletions(-) diff -Nru a/include/net/ax25.h b/include/net/ax25.h --- a/include/net/ax25.h 2005-04-04 16:43:28 -03:00 +++ b/include/net/ax25.h 2005-04-04 16:43:28 -03:00 @@ -305,7 +305,7 @@ extern void ax25_output(ax25_cb *, int, struct sk_buff *); extern void ax25_kick(ax25_cb *); extern void ax25_transmit_buffer(ax25_cb *, struct sk_buff *, int); -extern void ax25_queue_xmit(struct sk_buff *); +extern void ax25_queue_xmit(struct sk_buff *skb, struct net_device *dev); extern int ax25_check_iframes_acked(ax25_cb *, unsigned short); /* ax25_route.c */ diff -Nru a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c --- a/net/ax25/af_ax25.c 2005-04-04 16:43:28 -03:00 +++ b/net/ax25/af_ax25.c 2005-04-04 16:43:28 -03:00 @@ -1587,9 +1587,7 @@ *asmptr = AX25_UI; /* Datagram frames go straight out of the door as UI */ - skb->dev = ax25->ax25_dev->dev; - - ax25_queue_xmit(skb); + ax25_queue_xmit(skb, ax25->ax25_dev->dev); err = len; diff -Nru a/net/ax25/ax25_ip.c b/net/ax25/ax25_ip.c --- a/net/ax25/ax25_ip.c 2005-04-04 16:43:28 -03:00 +++ b/net/ax25/ax25_ip.c 2005-04-04 16:43:28 -03:00 @@ -199,9 +199,7 @@ skb = ourskb; } - skb->dev = dev; - - ax25_queue_xmit(skb); + ax25_queue_xmit(skb, dev); put: ax25_put_route(route); diff -Nru a/net/ax25/ax25_out.c b/net/ax25/ax25_out.c --- a/net/ax25/ax25_out.c 2005-04-04 16:43:28 -03:00 +++ b/net/ax25/ax25_out.c 2005-04-04 16:43:28 -03:00 @@ -340,21 +340,19 @@ ax25_addr_build(ptr, &ax25->source_addr, &ax25->dest_addr, ax25->digipeat, type, ax25->modulus); - skb->dev = ax25->ax25_dev->dev; - - ax25_queue_xmit(skb); + ax25_queue_xmit(skb, ax25->ax25_dev->dev); } /* * A small shim to dev_queue_xmit to add the KISS control byte, and do * any packet forwarding in operation. 
*/ -void ax25_queue_xmit(struct sk_buff *skb) +void ax25_queue_xmit(struct sk_buff *skb, struct net_device *dev) { unsigned char *ptr; skb->protocol = htons(ETH_P_AX25); - skb->dev = ax25_fwd_dev(skb->dev); + skb->dev = ax25_fwd_dev(dev); ptr = skb_push(skb, 1); *ptr = 0x00; /* KISS */ diff -Nru a/net/ax25/ax25_subr.c b/net/ax25/ax25_subr.c --- a/net/ax25/ax25_subr.c 2005-04-04 16:43:28 -03:00 +++ b/net/ax25/ax25_subr.c 2005-04-04 16:43:28 -03:00 @@ -220,9 +220,7 @@ dptr = skb_push(skb, ax25_addr_size(digi)); dptr += ax25_addr_build(dptr, dest, src, &retdigi, AX25_RESPONSE, AX25_MODULUS); - skb->dev = dev; - - ax25_queue_xmit(skb); + ax25_queue_xmit(skb, dev); } /* --mYCpIKhGyMATD0i+-- From sam@ravnborg.org Mon Apr 4 12:49:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 12:49:46 -0700 (PDT) Received: from pfepa.post.tele.dk (pfepa.post.tele.dk [195.41.46.235]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34JnTsl022036 for ; Mon, 4 Apr 2005 12:49:30 -0700 Received: from mars.ravnborg.org (0x50a0757d.hrnxx9.adsl-dhcp.tele.dk [80.160.117.125]) by pfepa.post.tele.dk (Postfix) with ESMTP id 5B57247FEA7; Mon, 4 Apr 2005 21:49:20 +0200 (CEST) Received: by mars.ravnborg.org (Postfix, from userid 1000) id 27A746AC01D; Mon, 4 Apr 2005 21:50:52 +0200 (CEST) Date: Mon, 4 Apr 2005 21:50:52 +0200 From: Sam Ravnborg To: "Randy.Dunlap" Cc: ioe-lkml@axxeo.de, matthew@wil.cx, lkml , netdev@oss.sgi.com, hadi@cyberus.ca, cfriesen@nortel.com, tgraf@suug.ch Subject: Re: [PATCH] network configs: disconnect network options from drivers Message-ID: <20050404195051.GA12364@mars.ravnborg.org> References: <20050330234709.1868eee5.randy.dunlap@verizon.net> <20050331185226.GA8146@mars.ravnborg.org> <424C5745.7020501@osdl.org> <20050331203010.GA8034@mars.ravnborg.org> <4250B4C5.2000200@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4250B4C5.2000200@osdl.org> User-Agent: Mutt/1.5.8i X-Virus-Scanned: ClamAV 
0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1370 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sam@ravnborg.org Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 08:30:13PM -0700, Randy.Dunlap wrote: > Any comments on this new version? The new Networking menu looks unstructured. And the net/Kconfig file contains a lot of config snippets that do not belong there. So I took a stab at it with focus on: - Moved config bits to appropriate places, creating several new Kconfig files - Made use of menus more consistent, at least on the first and second level - Moved the submenu to the top - Renamed the top menu to "Networking" and located it just before "File systems" The patch became much larger. The win is that the top-level net/Kconfig contains much less cruft. Many of the 56 lines added are due to the additional files. I did not (on purpose) change any functionality. The only bit that I am worried about is the statement in SCTP: depends on IPV6 || IPV6=n That looked like a noop to me. It had the side effect that SCTP menu entries were indented an extra level, which was not desirable with the current layout. Comments appreciated. Patch on top of rc2. 
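On the "depends on IPV6 || IPV6=n" statement: under the kconfig-language rules as documented (|| evaluates to the larger of its operands with n < m < y, = evaluates to y when both sides match and n otherwise, and a dependency caps the value a tristate symbol may take), the expression may not be a noop after all. A small model, with hypothetical names and assuming the SCTP option is a tristate, suggests it caps the dependent symbol at m when IPV6 is built as a module:

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig '||': the larger of the two operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Kconfig '=': y if both sides are equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * Upper limit that "depends on IPV6 || IPV6=n" places on the
 * dependent symbol for a given IPV6 setting.
 */
static enum tristate sctp_limit(enum tristate ipv6)
{
	return tri_or(ipv6, tri_eq(ipv6, TRI_N));
}
```

In this model the limit is y for IPV6=n and IPV6=y but only m for IPV6=m, so the built-in choice is ruled out exactly when IPv6 is modular. That would be worth confirming against the SCTP Kconfig before dropping the line.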
Signed-off-by: Sam Ravnborg --- Sam drivers/Kconfig | 5 drivers/net/Kconfig | 5 drivers/net/appletalk/Kconfig | 28 ++ net/8021q/Kconfig | 21 + net/Kconfig | 541 +++--------------------------------------- net/atm/Kconfig | 77 +++++ net/bridge/Kconfig | 32 ++ net/bridge/netfilter/Kconfig | 1 net/core/Kconfig | 67 +++++ net/decnet/Kconfig | 24 + net/econet/Kconfig | 34 ++ net/ipv4/netfilter/Kconfig | 5 net/ipv6/Kconfig | 20 + net/ipx/Kconfig | 33 ++ net/lapb/Kconfig | 24 + net/packet/Kconfig | 26 ++ net/sched/Kconfig | 40 +++ net/sctp/Kconfig | 5 net/unix/Kconfig | 22 + net/wanrouter/Kconfig | 31 ++ net/x25/Kconfig | 35 ++ 21 files changed, 567 insertions(+), 509 deletions(-) diff -Nru a/drivers/Kconfig b/drivers/Kconfig --- a/drivers/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/drivers/Kconfig 2005-04-04 21:41:57 +02:00 @@ -28,7 +28,7 @@ source "drivers/macintosh/Kconfig" -source "net/Kconfig" +source "drivers/net/Kconfig" source "drivers/isdn/Kconfig" @@ -59,3 +59,6 @@ source "drivers/infiniband/Kconfig" endmenu + +source "net/Kconfig" + diff -Nru a/drivers/net/Kconfig b/drivers/net/Kconfig --- a/drivers/net/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/drivers/net/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,8 +1,9 @@ - # # Network device configuration # +menu "Network device support" + config NETDEVICES depends on NET bool "Network device support" @@ -2535,4 +2536,6 @@ ---help--- If you want to log kernel messages over the network, enable this. See for details. + +endmenu diff -Nru a/drivers/net/appletalk/Kconfig b/drivers/net/appletalk/Kconfig --- a/drivers/net/appletalk/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/drivers/net/appletalk/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,6 +1,34 @@ # # Appletalk driver configuration # +config ATALK + tristate "Appletalk protocol support" + depends on NET + select LLC + ---help--- + AppleTalk is the protocol that Apple computers can use to communicate + on a network. 
If your Linux box is connected to such a network and you + wish to connect to it, say Y. You will need to use the netatalk package + so that your Linux box can act as a print and file server for Macs as + well as access AppleTalk printers. Check out + on the WWW for details. + EtherTalk is the name used for AppleTalk over Ethernet and the + cheaper and slower LocalTalk is AppleTalk over a proprietary Apple + network using serial links. EtherTalk and LocalTalk are fully + supported by Linux. + + General information about how to connect Linux, Windows machines and + Macs is on the WWW at . The + NET-3-HOWTO, available from + , contains valuable + information as well. + + To compile this driver as a module, choose M here: the module will be + called appletalk. You almost certainly want to compile it as a + module so you can restart your AppleTalk stack without rebooting + your machine. I hear that the GNU boycott of Apple is over, so + even politically correct people are allowed to say Y here. + config DEV_APPLETALK bool "Appletalk interfaces support" depends on ATALK diff -Nru a/net/8021q/Kconfig b/net/8021q/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/8021q/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,21 @@ +# +# Configuration for 802.1Q VLAN support +# + +config VLAN_8021Q + tristate "802.1Q VLAN Support" + ---help--- + Select this and you will be able to create 802.1Q VLAN interfaces + on your ethernet interfaces. 802.1Q VLAN supports almost + everything a regular ethernet interface does, including + firewalling, bridging, and of course IP traffic. You will need + the 'vconfig' tool from the VLAN project in order to effectively + use VLANs. See the VLAN web page for more information: + + + To compile this code as a module, choose M here: the module + will be called 8021q. + + If unsure, say N. 
+ + diff -Nru a/net/Kconfig b/net/Kconfig --- a/net/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/Kconfig 2005-04-04 21:41:57 +02:00 @@ -2,7 +2,7 @@ # Network configuration # -menu "Networking support" +menu "Networking" config NET bool "Networking support" @@ -10,7 +10,9 @@ Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even when running on a stand-alone machine that isn't connected to any - other computer. If you are upgrading from an older kernel, you + other computer. + + If you are upgrading from an older kernel, you should consider updating your networking tools too because changes in the kernel and the tools often go hand in hand. The tools are contained in the package net-tools, the location and version number @@ -20,57 +22,9 @@ recommended to read the NET-HOWTO, available from . -menu "Networking options" - depends on NET - -config PACKET - tristate "Packet socket" - ---help--- - The Packet protocol is used by applications which communicate - directly with network devices without an intermediate network - protocol implemented in the kernel, e.g. tcpdump. If you want them - to work, choose Y. - - To compile this driver as a module, choose M here: the module will - be called af_packet. - - If unsure, say Y. - -config PACKET_MMAP - bool "Packet socket: mmapped IO" - depends on PACKET - help - If you say Y here, the Packet protocol driver will use an IO - mechanism that results in faster communication. - - If unsure, say N. - -config UNIX - tristate "Unix domain sockets" - ---help--- - If you say Y here, you will include support for Unix domain sockets; - sockets are the standard Unix mechanism for establishing and - accessing network connections. Many commonly used programs such as - the X Window system and syslog use these sockets even if your - machine is not connected to any network. 
Unless you are working on - an embedded system or something similar, you therefore definitely - want to say Y here. - - To compile this driver as a module, choose M here: the module will be - called unix. Note that several important services won't work - correctly if you say M here and then neglect to load the module. - - Say Y unless you know what you are doing. - -config NET_KEY - tristate "PF_KEY sockets" - select XFRM - ---help--- - PF_KEYv2 socket family, compatible to KAME ones. - They are required if you are going to use IPsec tools ported - from KAME. +if NET - Say Y unless you know what you are doing. +menu "Networking protocols" config INET bool "TCP/IP networking" @@ -94,31 +48,29 @@ Short answer: say Y. +if INET source "net/ipv4/Kconfig" +source "net/ipv6/Kconfig" +source "net/sctp/Kconfig" +endif -# IPv6 as module will cause a CRASH if you try to unload it -config IPV6 - tristate "The IPv6 protocol" - depends on INET - default m - select CRYPTO if IPV6_PRIVACY - select CRYPTO_MD5 if IPV6_PRIVACY - ---help--- - This is complemental support for the IP version 6. - You will still be able to do traditional IPv4 networking as well. - - For general information about IPv6, see - . - For Linux IPv6 development information, see . - For specific information about IPv6 under Linux, read the HOWTO at - . +source "net/decnet/Kconfig" +source "net/llc/Kconfig" +source "net/ipx/Kconfig" +source "drivers/net/appletalk/Kconfig" +source "net/x25/Kconfig" +source "net/lapb/Kconfig" +source "net/econet/Kconfig" +source "net/ax25/Kconfig" +source "net/irda/Kconfig" +source "net/bluetooth/Kconfig" - To compile this protocol support as a module, choose M here: the - module will be called ipv6. 
+endmenu +# end options and protocols -source "net/ipv6/Kconfig" +menu "Network packet filtering" -menuconfig NETFILTER +config NETFILTER bool "Network packet filtering (replaces ipchains)" ---help--- Netfilter is a framework for filtering and mangling network packets @@ -205,442 +157,37 @@ source "net/bridge/netfilter/Kconfig" endif +endmenu +# end netfilter + +source "net/sched/Kconfig" +source "net/core/Kconfig" + config XFRM bool - depends on NET source "net/xfrm/Kconfig" -source "net/sctp/Kconfig" - -config ATM - tristate "Asynchronous Transfer Mode (ATM) (EXPERIMENTAL)" - depends on EXPERIMENTAL - ---help--- - ATM is a high-speed networking technology for Local Area Networks - and Wide Area Networks. It uses a fixed packet size and is - connection oriented, allowing for the negotiation of minimum - bandwidth requirements. - - In order to participate in an ATM network, your Linux box needs an - ATM networking card. If you have that, say Y here and to the driver - of your ATM card below. - - Note that you need a set of user-space programs to actually make use - of ATM. See the file for - further details. - -config ATM_CLIP - tristate "Classical IP over ATM (EXPERIMENTAL)" - depends on ATM && INET - help - Classical IP over ATM for PVCs and SVCs, supporting InARP and - ATMARP. If you want to communication with other IP hosts on your ATM - network, you will typically either say Y here or to "LAN Emulation - (LANE)" below. - -config ATM_CLIP_NO_ICMP - bool "Do NOT send ICMP if no neighbour (EXPERIMENTAL)" - depends on ATM_CLIP - help - Normally, an "ICMP host unreachable" message is sent if a neighbour - cannot be reached because there is no VC to it in the kernel's - ATMARP table. This may cause problems when ATMARP table entries are - briefly removed during revalidation. If you say Y here, packets to - such neighbours are silently discarded instead. 
- -config ATM_LANE - tristate "LAN Emulation (LANE) support (EXPERIMENTAL)" - depends on ATM - help - LAN Emulation emulates services of existing LANs across an ATM - network. Besides operating as a normal ATM end station client, Linux - LANE client can also act as an proxy client bridging packets between - ELAN and Ethernet segments. You need LANE if you want to try MPOA. - -config ATM_MPOA - tristate "Multi-Protocol Over ATM (MPOA) support (EXPERIMENTAL)" - depends on ATM && INET && ATM_LANE!=n - help - Multi-Protocol Over ATM allows ATM edge devices such as routers, - bridges and ATM attached hosts establish direct ATM VCs across - subnetwork boundaries. These shortcut connections bypass routers - enhancing overall network performance. - -config ATM_BR2684 - tristate "RFC1483/2684 Bridged protocols" - depends on ATM && INET - help - ATM PVCs can carry ethernet PDUs according to rfc2684 (formerly 1483) - This device will act like an ethernet from the kernels point of view, - with the traffic being carried by ATM PVCs (currently 1 PVC/device). - This is sometimes used over DSL lines. If in doubt, say N. - -config ATM_BR2684_IPFILTER - bool "Per-VC IP filter kludge" - depends on ATM_BR2684 - help - This is an experimental mechanism for users who need to terminating a - large number of IP-only vcc's. Do not enable this unless you are sure - you know what you are doing. - -config BRIDGE - tristate "802.1d Ethernet Bridging" - ---help--- - If you say Y here, then your Linux box will be able to act as an - Ethernet bridge, which means that the different Ethernet segments it - is connected to will appear as one Ethernet to the participants. - Several such bridges can work together to create even larger - networks of Ethernets using the IEEE 802.1 spanning tree algorithm. - As this is a standard, Linux bridges will cooperate properly with - other third party bridge products. 
- - In order to use the Ethernet bridge, you'll need the bridge - configuration tools; see - for location. Please read the Bridge mini-HOWTO for more - information. - - If you enable iptables support along with the bridge support then you - turn your bridge into a bridging IP firewall. - iptables will then see the IP packets being bridged, so you need to - take this into account when setting up your firewall rules. - Enabling arptables support when bridging will let arptables see - bridged ARP traffic in the arptables FORWARD chain. - - To compile this code as a module, choose M here: the module - will be called bridge. - - If unsure, say N. - -config VLAN_8021Q - tristate "802.1Q VLAN Support" - ---help--- - Select this and you will be able to create 802.1Q VLAN interfaces - on your ethernet interfaces. 802.1Q VLAN supports almost - everything a regular ethernet interface does, including - firewalling, bridging, and of course IP traffic. You will need - the 'vconfig' tool from the VLAN project in order to effectively - use VLANs. See the VLAN web page for more information: - - - To compile this code as a module, choose M here: the module - will be called 8021q. - - If unsure, say N. - -config DECNET - tristate "DECnet Support" - ---help--- - The DECnet networking protocol was used in many products made by - Digital (now Compaq). It provides reliable stream and sequenced - packet communications over which run a variety of services similar - to those which run over TCP/IP. - - To find some tools to use with the kernel layer support, please - look at Patrick Caulfield's web site: - . - - More detailed documentation is available in - . - - Be sure to say Y to "/proc file system support" and "Sysctl support" - below when using DECnet, since you will need sysctl support to aid - in configuration at run time. - - The DECnet code is also available as a module ( = code which can be - inserted in and removed from the running kernel whenever you want). 
- The module is called decnet. - -source "net/decnet/Kconfig" - -source "net/llc/Kconfig" - -config IPX - tristate "The IPX protocol" - select LLC - ---help--- - This is support for the Novell networking protocol, IPX, commonly - used for local networks of Windows machines. You need it if you - want to access Novell NetWare file or print servers using the Linux - Novell client ncpfs (available from - ) or from - within the Linux DOS emulator DOSEMU (read the DOSEMU-HOWTO, - available from ). In order - to do the former, you'll also have to say Y to "NCP file system - support", below. - - IPX is similar in scope to IP, while SPX, which runs on top of IPX, - is similar to TCP. There is also experimental support for SPX in - Linux (see "SPX networking", below). - - To turn your Linux box into a fully featured NetWare file server and - IPX router, say Y here and fetch either lwared from - or - mars_nwe from . For more - information, read the IPX-HOWTO available from - . - - General information about how to connect Linux, Windows machines and - Macs is on the WWW at . - - The IPX driver would enlarge your kernel by about 16 KB. To compile - this driver as a module, choose M here: the module will be called ipx. - Unless you want to integrate your Linux box with a local Novell - network, say N. - -source "net/ipx/Kconfig" - -config ATALK - tristate "Appletalk protocol support" - select LLC - ---help--- - AppleTalk is the protocol that Apple computers can use to communicate - on a network. If your Linux box is connected to such a network and you - wish to connect to it, say Y. You will need to use the netatalk package - so that your Linux box can act as a print and file server for Macs as - well as access AppleTalk printers. Check out - on the WWW for details. - EtherTalk is the name used for AppleTalk over Ethernet and the - cheaper and slower LocalTalk is AppleTalk over a proprietary Apple - network using serial links. 
EtherTalk and LocalTalk are fully - supported by Linux. - - General information about how to connect Linux, Windows machines and - Macs is on the WWW at . The - NET-3-HOWTO, available from - , contains valuable - information as well. - - To compile this driver as a module, choose M here: the module will be - called appletalk. You almost certainly want to compile it as a - module so you can restart your AppleTalk stack without rebooting - your machine. I hear that the GNU boycott of Apple is over, so - even politically correct people are allowed to say Y here. - -source "drivers/net/appletalk/Kconfig" - -config X25 - tristate "CCITT X.25 Packet Layer (EXPERIMENTAL)" - depends on EXPERIMENTAL - ---help--- - X.25 is a set of standardized network protocols, similar in scope to - frame relay; the one physical line from your box to the X.25 network - entry point can carry several logical point-to-point connections - (called "virtual circuits") to other computers connected to the X.25 - network. Governments, banks, and other organizations tend to use it - to connect to each other or to form Wide Area Networks (WANs). Many - countries have public X.25 networks. X.25 consists of two - protocols: the higher level Packet Layer Protocol (PLP) (say Y here - if you want that) and the lower level data link layer protocol LAPB - (say Y to "LAPB Data Link Driver" below if you want that). - - You can read more about X.25 at and - . - Information about X.25 for Linux is contained in the files - and - . - - One connects to an X.25 network either with a dedicated network card - using the X.21 protocol (not yet supported by Linux) or one can do - X.25 over a standard telephone line using an ordinary modem (say Y - to "X.25 async driver" below) or over Ethernet using an ordinary - Ethernet card and the LAPB over Ethernet (say Y to "LAPB Data Link - Driver" and "LAPB over Ethernet driver" below). - - To compile this driver as a module, choose M here: the module - will be called x25. 
If unsure, say N. - -config LAPB - tristate "LAPB Data Link Driver (EXPERIMENTAL)" - depends on EXPERIMENTAL - ---help--- - Link Access Procedure, Balanced (LAPB) is the data link layer (i.e. - the lower) part of the X.25 protocol. It offers a reliable - connection service to exchange data frames with one other host, and - it is used to transport higher level protocols (mostly X.25 Packet - Layer, the higher part of X.25, but others are possible as well). - Usually, LAPB is used with specialized X.21 network cards, but Linux - currently supports LAPB only over Ethernet connections. If you want - to use LAPB connections over Ethernet, say Y here and to "LAPB over - Ethernet driver" below. Read - for technical - details. - - To compile this driver as a module, choose M here: the - module will be called lapb. If unsure, say N. - -config NET_DIVERT - bool "Frame Diverter (EXPERIMENTAL)" - depends on EXPERIMENTAL - ---help--- - The Frame Diverter allows you to divert packets from the - network, that are not aimed at the interface receiving it (in - promisc. mode). Typically, a Linux box setup as an Ethernet bridge - with the Frames Diverter on, can do some *really* transparent www - caching using a Squid proxy for example. - - This is very useful when you don't want to change your router's - config (or if you simply don't have access to it). - - The other possible usages of diverting Ethernet Frames are - numberous: - - reroute smtp traffic to another interface - - traffic-shape certain network streams - - transparently proxy smtp connections - - etc... - - For more informations, please refer to: - - - - If unsure, say N. - -config ECONET - tristate "Acorn Econet/AUN protocols (EXPERIMENTAL)" - depends on EXPERIMENTAL && INET - ---help--- - Econet is a fairly old and slow networking protocol mainly used by - Acorn computers to access file and print servers. It uses native - Econet network cards. 
AUN is an implementation of the higher level - parts of Econet that runs over ordinary Ethernet connections, on - top of the UDP packet protocol, which in turn runs on top of the - Internet protocol IP. - - If you say Y here, you can choose with the next two options whether - to send Econet/AUN traffic over a UDP Ethernet connection or over - a native Econet network card. - - To compile this driver as a module, choose M here: the module - will be called econet. - -config ECONET_AUNUDP - bool "AUN over UDP" - depends on ECONET - help - Say Y here if you want to send Econet/AUN traffic over a UDP - connection (UDP is a packet based protocol that runs on top of the - Internet protocol IP) using an ordinary Ethernet network card. - -config ECONET_NATIVE - bool "Native Econet" - depends on ECONET - help - Say Y here if you have a native Econet network card installed in - your computer. - -config WAN_ROUTER - tristate "WAN router" - depends on EXPERIMENTAL - ---help--- - Wide Area Networks (WANs), such as X.25, frame relay and leased - lines, are used to interconnect Local Area Networks (LANs) over vast - distances with data transfer rates significantly higher than those - achievable with commonly used asynchronous modem connections. - Usually, a quite expensive external device called a `WAN router' is - needed to connect to a WAN. - - As an alternative, WAN routing can be built into the Linux kernel. - With relatively inexpensive WAN interface cards available on the - market, a perfectly usable router can be built for less than half - the price of an external router. If you have one of those cards and - wish to use your Linux box as a WAN router, say Y here and also to - the WAN driver for your card, below. You will then need the - wan-tools package which is available from . - Read for more - information. - - To compile WAN routing support as a module, choose M here: the - module will be called wanrouter. - - If unsure, say N. 
- -menu "QoS and/or fair queueing" - -config NET_SCHED - bool "QoS and/or fair queueing" - ---help--- - When the kernel has several packets to send out over a network - device, it has to decide which ones to send first, which ones to - delay, and which ones to drop. This is the job of the packet - scheduler, and several different algorithms for how to do this - "fairly" have been proposed. - - If you say N here, you will get the standard packet scheduler, which - is a FIFO (first come, first served). If you say Y here, you will be - able to choose from among several alternative algorithms which can - then be attached to different network devices. This is useful for - example if some of your network devices are real time devices that - need a certain minimum data flow rate, or if you need to limit the - maximum data flow rate for traffic which matches specified criteria. - This code is considered to be experimental. - - To administer these schedulers, you'll need the user-level utilities - from the package iproute2+tc at . - That package also contains some documentation; for more, check out - . - - This Quality of Service (QoS) support will enable you to use - Differentiated Services (diffserv) and Resource Reservation Protocol - (RSVP) on your Linux router if you also say Y to "QoS support", - "Packet classifier API" and to some classifiers below. Documentation - and software is at . - - If you say Y here and to "/proc file system" below, you will be able - to read status information about packet schedulers from the file - /proc/net/psched. - - The available schedulers are listed in the following questions; you - can say Y to as many as you like. If unsure, say N now. 
- -source "net/sched/Kconfig" - -endmenu - -menu "Network testing" - -config NET_PKTGEN - tristate "Packet Generator (USE WITH CAUTION)" - depends on PROC_FS +config NET_KEY + tristate "PF_KEY sockets" + select XFRM ---help--- - This module will inject preconfigured packets, at a configurable - rate, out of a given interface. It is used for network interface - stress testing and performance analysis. If you don't understand - what was just said, you don't need it: say N. - - Documentation on how to use the packet generator can be found - at . - - To compile this code as a module, choose M here: the - module will be called pktgen. - -endmenu - -endmenu - -config NETPOLL - def_bool NETCONSOLE - -config NETPOLL_RX - bool "Netpoll support for trapping incoming packets" - default n - depends on NETPOLL - -config NETPOLL_TRAP - bool "Netpoll traffic trapping" - default n - depends on NETPOLL - -config NET_POLL_CONTROLLER - def_bool NETPOLL + PF_KEYv2 socket family, compatible to KAME ones. + They are required if you are going to use IPsec tools ported + from KAME. -source "net/ax25/Kconfig" + Say Y unless you know what you are doing. -source "net/irda/Kconfig" -source "net/bluetooth/Kconfig" +source "net/packet/Kconfig" +source "net/unix/Kconfig" +source "net/bridge/Kconfig" +source "net/8021q/Kconfig" +source "net/wanrouter/Kconfig" -source "drivers/net/Kconfig" +endif +# NET endmenu diff -Nru a/net/atm/Kconfig b/net/atm/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/atm/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,77 @@ +# +# ATM Configuration +# + +config ATM + tristate "Asynchronous Transfer Mode (ATM) (EXPERIMENTAL)" + depends on EXPERIMENTAL + depends on NET + ---help--- + ATM is a high-speed networking technology for Local Area Networks + and Wide Area Networks. It uses a fixed packet size and is + connection oriented, allowing for the negotiation of minimum + bandwidth requirements.
+ + In order to participate in an ATM network, your Linux box needs an + ATM networking card. If you have that, say Y here and to the driver + of your ATM card below. + + Note that you need a set of user-space programs to actually make use + of ATM. See the file for + further details. + +config ATM_CLIP + tristate "Classical IP over ATM (EXPERIMENTAL)" + depends on ATM && INET + help + Classical IP over ATM for PVCs and SVCs, supporting InARP and + ATMARP. If you want to communication with other IP hosts on your ATM + network, you will typically either say Y here or to "LAN Emulation + (LANE)" below. + +config ATM_CLIP_NO_ICMP + bool "Do NOT send ICMP if no neighbour (EXPERIMENTAL)" + depends on ATM_CLIP + help + Normally, an "ICMP host unreachable" message is sent if a neighbour + cannot be reached because there is no VC to it in the kernel's + ATMARP table. This may cause problems when ATMARP table entries are + briefly removed during revalidation. If you say Y here, packets to + such neighbours are silently discarded instead. + +config ATM_LANE + tristate "LAN Emulation (LANE) support (EXPERIMENTAL)" + depends on ATM + help + LAN Emulation emulates services of existing LANs across an ATM + network. Besides operating as a normal ATM end station client, Linux + LANE client can also act as an proxy client bridging packets between + ELAN and Ethernet segments. You need LANE if you want to try MPOA. + +config ATM_MPOA + tristate "Multi-Protocol Over ATM (MPOA) support (EXPERIMENTAL)" + depends on ATM && INET && ATM_LANE!=n + help + Multi-Protocol Over ATM allows ATM edge devices such as routers, + bridges and ATM attached hosts establish direct ATM VCs across + subnetwork boundaries. These shortcut connections bypass routers + enhancing overall network performance. 
+ +config ATM_BR2684 + tristate "RFC1483/2684 Bridged protocols" + depends on ATM && INET + help + ATM PVCs can carry ethernet PDUs according to rfc2684 (formerly 1483) + This device will act like an ethernet from the kernels point of view, + with the traffic being carried by ATM PVCs (currently 1 PVC/device). + This is sometimes used over DSL lines. If in doubt, say N. + +config ATM_BR2684_IPFILTER + bool "Per-VC IP filter kludge" + depends on ATM_BR2684 + help + This is an experimental mechanism for users who need to terminating a + large number of IP-only vcc's. Do not enable this unless you are sure + you know what you are doing. + + diff -Nru a/net/bridge/Kconfig b/net/bridge/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/bridge/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,32 @@ +# +# Configuration for Ethernet bridging +# + +config BRIDGE + tristate "802.1d Ethernet Bridging" + ---help--- + If you say Y here, then your Linux box will be able to act as an + Ethernet bridge, which means that the different Ethernet segments it + is connected to will appear as one Ethernet to the participants. + Several such bridges can work together to create even larger + networks of Ethernets using the IEEE 802.1 spanning tree algorithm. + As this is a standard, Linux bridges will cooperate properly with + other third party bridge products. + + In order to use the Ethernet bridge, you'll need the bridge + configuration tools; see + for location. Please read the Bridge mini-HOWTO for more + information. + + If you enable iptables support along with the bridge support then you + turn your bridge into a bridging IP firewall. + iptables will then see the IP packets being bridged, so you need to + take this into account when setting up your firewall rules. + Enabling arptables support when bridging will let arptables see + bridged ARP traffic in the arptables FORWARD chain. + + To compile this code as a module, choose M here: the module + will be called bridge. 
+ + If unsure, say N. + diff -Nru a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig --- a/net/bridge/netfilter/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/bridge/netfilter/Kconfig 2005-04-04 21:41:57 +02:00 @@ -139,6 +139,7 @@ config BRIDGE_EBT_ARPREPLY tristate "ebt: arp reply target support" depends on BRIDGE_NF_EBTABLES + depends on INET help This option adds the arp reply target, which allows automatically sending arp replies to arp requests. diff -Nru a/net/core/Kconfig b/net/core/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/core/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,67 @@ +# +# Core configuration +# + +menu "Network testing" + +config NET_PKTGEN + tristate "Packet Generator (USE WITH CAUTION)" + depends on PROC_FS + depends on INET + ---help--- + This module will inject preconfigured packets, at a configurable + rate, out of a given interface. It is used for network interface + stress testing and performance analysis. If you don't understand + what was just said, you don't need it: say N. + + Documentation on how to use the packet generator can be found + at . + + To compile this code as a module, choose M here: the + module will be called pktgen. + +endmenu + +config NETPOLL + def_bool NETCONSOLE + +config NETPOLL_RX + bool "Netpoll support for trapping incoming packets" + default n + depends on NETPOLL + +config NETPOLL_TRAP + bool "Netpoll traffic trapping" + default n + depends on NETPOLL + +config NET_POLL_CONTROLLER + def_bool NETPOLL + +config NET_DIVERT + bool "Frame Diverter (EXPERIMENTAL)" + depends on EXPERIMENTAL + ---help--- + The Frame Diverter allows you to divert packets from the + network, that are not aimed at the interface receiving it (in + promisc. mode). Typically, a Linux box setup as an Ethernet bridge + with the Frames Diverter on, can do some *really* transparent www + caching using a Squid proxy for example. 
+ + This is very useful when you don't want to change your router's + config (or if you simply don't have access to it). + + The other possible usages of diverting Ethernet Frames are + numberous: + - reroute smtp traffic to another interface + - traffic-shape certain network streams + - transparently proxy smtp connections + - etc... + + For more informations, please refer to: + + + + If unsure, say N. + + diff -Nru a/net/decnet/Kconfig b/net/decnet/Kconfig --- a/net/decnet/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/decnet/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,6 +1,30 @@ # # DECnet configuration # + +config DECNET + tristate "DECnet Support" + ---help--- + The DECnet networking protocol was used in many products made by + Digital (now Compaq). It provides reliable stream and sequenced + packet communications over which run a variety of services similar + to those which run over TCP/IP. + + To find some tools to use with the kernel layer support, please + look at Patrick Caulfield's web site: + . + + More detailed documentation is available in + . + + Be sure to say Y to "/proc file system support" and "Sysctl support" + below when using DECnet, since you will need sysctl support to aid + in configuration at run time. + + The DECnet code is also available as a module ( = code which can be + inserted in and removed from the running kernel whenever you want). + The module is called decnet. + config DECNET_ROUTER bool "DECnet: router support (EXPERIMENTAL)" depends on DECNET && EXPERIMENTAL diff -Nru a/net/econet/Kconfig b/net/econet/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/econet/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,34 @@ + +config ECONET + tristate "Acorn Econet/AUN protocols (EXPERIMENTAL)" + depends on EXPERIMENTAL && INET + ---help--- + Econet is a fairly old and slow networking protocol mainly used by + Acorn computers to access file and print servers. It uses native + Econet network cards. 
AUN is an implementation of the higher level + parts of Econet that runs over ordinary Ethernet connections, on + top of the UDP packet protocol, which in turn runs on top of the + Internet protocol IP. + + If you say Y here, you can choose with the next two options whether + to send Econet/AUN traffic over a UDP Ethernet connection or over + a native Econet network card. + + To compile this driver as a module, choose M here: the module + will be called econet. + +config ECONET_AUNUDP + bool "AUN over UDP" + depends on ECONET + help + Say Y here if you want to send Econet/AUN traffic over a UDP + connection (UDP is a packet based protocol that runs on top of the + Internet protocol IP) using an ordinary Ethernet network card. + +config ECONET_NATIVE + bool "Native Econet" + depends on ECONET + help + Say Y here if you have a native Econet network card installed in + your computer. + diff -Nru a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig --- a/net/ipv4/netfilter/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/ipv4/netfilter/Kconfig 2005-04-04 21:41:57 +02:00 @@ -2,9 +2,6 @@ # IP netfilter configuration # -menu "IP: Netfilter Configuration" - depends on INET && NETFILTER - # connection tracking, helpers and protocols config IP_NF_CONNTRACK tristate "Connection tracking (required for masq/NAT)" @@ -691,6 +688,4 @@ help Allows altering the ARP packet payload: source and destination hardware and network addresses. - -endmenu diff -Nru a/net/ipv6/Kconfig b/net/ipv6/Kconfig --- a/net/ipv6/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/ipv6/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,6 +1,26 @@ # # IPv6 configuration # + +# IPv6 as module will cause a CRASH if you try to unload it +config IPV6 + tristate "The IPv6 protocol" + default m + select CRYPTO if IPV6_PRIVACY + select CRYPTO_MD5 if IPV6_PRIVACY + ---help--- + This is complemental support for the IP version 6. + You will still be able to do traditional IPv4 networking as well. 
+ + For general information about IPv6, see + . + For Linux IPv6 development information, see . + For specific information about IPv6 under Linux, read the HOWTO at + . + + To compile this protocol support as a module, choose M here: the + module will be called ipv6. + config IPV6_PRIVACY bool "IPv6: Privacy Extensions (RFC 3041) support" depends on IPV6 diff -Nru a/net/ipx/Kconfig b/net/ipx/Kconfig --- a/net/ipx/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/ipx/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,6 +1,39 @@ # # IPX configuration # +config IPX + tristate "The IPX protocol" + select LLC + ---help--- + This is support for the Novell networking protocol, IPX, commonly + used for local networks of Windows machines. You need it if you + want to access Novell NetWare file or print servers using the Linux + Novell client ncpfs (available from + ) or from + within the Linux DOS emulator DOSEMU (read the DOSEMU-HOWTO, + available from ). In order + to do the former, you'll also have to say Y to "NCP file system + support", below. + + IPX is similar in scope to IP, while SPX, which runs on top of IPX, + is similar to TCP. There is also experimental support for SPX in + Linux (see "SPX networking", below). + + To turn your Linux box into a fully featured NetWare file server and + IPX router, say Y here and fetch either lwared from + or + mars_nwe from . For more + information, read the IPX-HOWTO available from + . + + General information about how to connect Linux, Windows machines and + Macs is on the WWW at . + + The IPX driver would enlarge your kernel by about 16 KB. To compile + this driver as a module, choose M here: the module will be called ipx. + Unless you want to integrate your Linux box with a local Novell + network, say N. 
+ config IPX_INTERN bool "IPX: Full internal IPX network" depends on IPX diff -Nru a/net/lapb/Kconfig b/net/lapb/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/lapb/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,24 @@ +# +# LAPB Configuration +# + +config LAPB + tristate "LAPB Data Link Driver (EXPERIMENTAL)" + depends on NET && EXPERIMENTAL + ---help--- + Link Access Procedure, Balanced (LAPB) is the data link layer (i.e. + the lower) part of the X.25 protocol. It offers a reliable + connection service to exchange data frames with one other host, and + it is used to transport higher level protocols (mostly X.25 Packet + Layer, the higher part of X.25, but others are possible as well). + Usually, LAPB is used with specialized X.21 network cards, but Linux + currently supports LAPB only over Ethernet connections. If you want + to use LAPB connections over Ethernet, say Y here and to "LAPB over + Ethernet driver" below. Read + for technical + details. + + To compile this driver as a module, choose M here: the + module will be called lapb. If unsure, say N. + + diff -Nru a/net/packet/Kconfig b/net/packet/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/packet/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,26 @@ +# +# Packet configuration +# + +config PACKET + tristate "Packet socket" + ---help--- + The Packet protocol is used by applications which communicate + directly with network devices without an intermediate network + protocol implemented in the kernel, e.g. tcpdump. If you want them + to work, choose Y. + + To compile this driver as a module, choose M here: the module will + be called af_packet. + + If unsure, say Y. + +config PACKET_MMAP + bool "Packet socket: mmapped IO" + depends on PACKET + help + If you say Y here, the Packet protocol driver will use an IO + mechanism that results in faster communication. + + If unsure, say N. 
+ diff -Nru a/net/sched/Kconfig b/net/sched/Kconfig --- a/net/sched/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/sched/Kconfig 2005-04-04 21:41:57 +02:00 @@ -1,6 +1,45 @@ # # Traffic control configuration. # + +menu "QoS and/or fair queueing" + +config NET_SCHED + bool "QoS and/or fair queueing" + ---help--- + When the kernel has several packets to send out over a network + device, it has to decide which ones to send first, which ones to + delay, and which ones to drop. This is the job of the packet + scheduler, and several different algorithms for how to do this + "fairly" have been proposed. + + If you say N here, you will get the standard packet scheduler, which + is a FIFO (first come, first served). If you say Y here, you will be + able to choose from among several alternative algorithms which can + then be attached to different network devices. This is useful for + example if some of your network devices are real time devices that + need a certain minimum data flow rate, or if you need to limit the + maximum data flow rate for traffic which matches specified criteria. + This code is considered to be experimental. + + To administer these schedulers, you'll need the user-level utilities + from the package iproute2+tc at . + That package also contains some documentation; for more, check out + . + + This Quality of Service (QoS) support will enable you to use + Differentiated Services (diffserv) and Resource Reservation Protocol + (RSVP) on your Linux router if you also say Y to "QoS support", + "Packet classifier API" and to some classifiers below. Documentation + and software is at . + + If you say Y here and to "/proc file system" below, you will be able + to read status information about packet schedulers from the file + /proc/net/psched. + + The available schedulers are listed in the following questions; you + can say Y to as many as you like. If unsure, say N now.
+ choice prompt "Packet scheduler clock source" depends on NET_SCHED @@ -506,3 +545,4 @@ Say Y to support traffic policing (bandwidth limits). Needed for ingress and egress rate limiting. +endmenu diff -Nru a/net/sctp/Kconfig b/net/sctp/Kconfig --- a/net/sctp/Kconfig 2005-04-04 21:41:57 +02:00 +++ b/net/sctp/Kconfig 2005-04-04 21:41:57 +02:00 @@ -2,12 +2,8 @@ # SCTP configuration # -menu "SCTP Configuration (EXPERIMENTAL)" - depends on INET && EXPERIMENTAL - config IP_SCTP tristate "The SCTP Protocol (EXPERIMENTAL)" - depends on IPV6 || IPV6=n select CRYPTO if SCTP_HMAC_SHA1 || SCTP_HMAC_MD5 select CRYPTO_HMAC if SCTP_HMAC_SHA1 || SCTP_HMAC_MD5 select CRYPTO_SHA1 if SCTP_HMAC_SHA1 @@ -86,4 +82,3 @@ advised to use either HMAC-MD5 or HMAC-SHA1. endchoice -endmenu diff -Nru a/net/unix/Kconfig b/net/unix/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/unix/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,22 @@ +# +# Configuration for Unix domain sockets +# + +config UNIX + tristate "Unix domain sockets" + ---help--- + If you say Y here, you will include support for Unix domain sockets; + sockets are the standard Unix mechanism for establishing and + accessing network connections. Many commonly used programs such as + the X Window system and syslog use these sockets even if your + machine is not connected to any network. Unless you are working on + an embedded system or something similar, you therefore definitely + want to say Y here. + + To compile this driver as a module, choose M here: the module will be + called unix. Note that several important services won't work + correctly if you say M here and then neglect to load the module. + + Say Y unless you know what you are doing. 
+ + diff -Nru a/net/wanrouter/Kconfig b/net/wanrouter/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/wanrouter/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,31 @@ +# +# Configuration for WAN Router +# + +config WAN_ROUTER + tristate "WAN router" + depends on EXPERIMENTAL + ---help--- + Wide Area Networks (WANs), such as X.25, frame relay and leased + lines, are used to interconnect Local Area Networks (LANs) over vast + distances with data transfer rates significantly higher than those + achievable with commonly used asynchronous modem connections. + Usually, a quite expensive external device called a `WAN router' is + needed to connect to a WAN. + + As an alternative, WAN routing can be built into the Linux kernel. + With relatively inexpensive WAN interface cards available on the + market, a perfectly usable router can be built for less than half + the price of an external router. If you have one of those cards and + wish to use your Linux box as a WAN router, say Y here and also to + the WAN driver for your card, below. You will then need the + wan-tools package which is available from . + Read for more + information. + + To compile WAN routing support as a module, choose M here: the + module will be called wanrouter. + + If unsure, say N. + + diff -Nru a/net/x25/Kconfig b/net/x25/Kconfig --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/net/x25/Kconfig 2005-04-04 21:41:57 +02:00 @@ -0,0 +1,35 @@ +# +# X25 Configuration +# + +config X25 + tristate "CCITT X.25 Packet Layer (EXPERIMENTAL)" + depends on NET && EXPERIMENTAL + ---help--- + X.25 is a set of standardized network protocols, similar in scope to + frame relay; the one physical line from your box to the X.25 network + entry point can carry several logical point-to-point connections + (called "virtual circuits") to other computers connected to the X.25 + network. Governments, banks, and other organizations tend to use it + to connect to each other or to form Wide Area Networks (WANs). 
Many + countries have public X.25 networks. X.25 consists of two + protocols: the higher level Packet Layer Protocol (PLP) (say Y here + if you want that) and the lower level data link layer protocol LAPB + (say Y to "LAPB Data Link Driver" below if you want that). + + You can read more about X.25 at and + . + Information about X.25 for Linux is contained in the files + and + . + + One connects to an X.25 network either with a dedicated network card + using the X.21 protocol (not yet supported by Linux) or one can do + X.25 over a standard telephone line using an ordinary modem (say Y + to "X.25 async driver" below) or over Ethernet using an ordinary + Ethernet card and the LAPB over Ethernet (say Y to "LAPB Data Link + Driver" and "LAPB over Ethernet driver" below). + + To compile this driver as a module, choose M here: the module + will be called x25. If unsure, say N. + From acme@ghostprotocols.net Mon Apr 4 12:50:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 12:50:49 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34JoWda022536 for ; Mon, 4 Apr 2005 12:50:37 -0700 Received: from [200.138.131.177] (helo=oops.ghostprotocols.net) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 1DIXbp-0005JQ-00; Mon, 04 Apr 2005 16:51:41 -0300 Received: by oops.ghostprotocols.net (Postfix, from userid 500) id D673E14631; Mon, 4 Apr 2005 16:50:30 -0300 (BRT) Date: Mon, 4 Apr 2005 16:50:30 -0300 To: "David S. 
Miller" , Ralf Baechle Cc: netdev@oss.sgi.com Subject: [PATCH 2/2][AX25] make ax25_queue_xmit a net_device parameter Message-ID: <20050404195030.GJ640@conectiva.com.br> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="7ZAtKRhVyVSsbBD2" Content-Disposition: inline X-Url: http://advogato.org/person/acme User-Agent: Mutt/1.5.6i From: acme@ghostprotocols.net (Arnaldo Carvalho de Melo) X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1371 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@ghostprotocols.net Precedence: bulk X-list: netdev --7ZAtKRhVyVSsbBD2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi David, Ralf, I'm trying to get back to the work of reducing the number of direct references to skb->{h,nh,raw}, that eventually will become just a void pointer. Available at: bk://kernel.bkbits.net/acme/sk_buff-2.6 Regards, - Arnaldo --7ZAtKRhVyVSsbBD2 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="ax25.2.patch" =================================================================== ChangeSet@1.2246, 2005-04-04 16:28:37-03:00, acme@toy.ghostprotocols.net [AX25] Introduce ax25_type_trans Replacing the open coded equivalents and making ax25 look more like a linux network protocol, i.e. more similar to inet. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Ralf Baechle Signed-off-by: David S. 
Miller drivers/net/hamradio/6pack.c | 4 +--- drivers/net/hamradio/baycom_epp.c | 4 +--- drivers/net/hamradio/bpqether.c | 10 ++-------- drivers/net/hamradio/dmascc.c | 4 +--- drivers/net/hamradio/hdlcdrv.c | 4 +--- drivers/net/hamradio/mkiss.c | 4 +--- drivers/net/hamradio/scc.c | 5 +---- drivers/net/hamradio/yam.c | 4 +--- include/net/ax25.h | 8 ++++++++ net/ax25/ax25_ds_subr.c | 3 +-- net/ax25/ax25_out.c | 3 +-- 11 files changed, 19 insertions(+), 34 deletions(-) diff -Nru a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c --- a/drivers/net/hamradio/6pack.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/6pack.c 2005-04-04 16:44:28 -03:00 @@ -394,13 +394,11 @@ if ((skb = dev_alloc_skb(count)) == NULL) goto out_mem; - skb->dev = sp->dev; ptr = skb_put(skb, count); *ptr++ = cmd; /* KISS command */ memcpy(ptr, sp->cooked_buf + 1, count); - skb->mac.raw = skb->data; - skb->protocol = htons(ETH_P_AX25); + skb->protocol = ax25_type_trans(skb, sp->dev); netif_rx(skb); sp->dev->last_rx = jiffies; sp->stats.rx_packets++; diff -Nru a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c --- a/drivers/net/hamradio/baycom_epp.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/baycom_epp.c 2005-04-04 16:44:28 -03:00 @@ -601,12 +601,10 @@ bc->stats.rx_dropped++; return; } - skb->dev = dev; cp = skb_put(skb, pktlen); *cp++ = 0; /* KISS kludge */ memcpy(cp, bc->hdlcrx.buf, pktlen - 1); - skb->protocol = htons(ETH_P_AX25); - skb->mac.raw = skb->data; + skb->protocol = ax25_type_trans(skb, dev); netif_rx(skb); dev->last_rx = jiffies; bc->stats.rx_packets++; diff -Nru a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c --- a/drivers/net/hamradio/bpqether.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/bpqether.c 2005-04-04 16:44:28 -03:00 @@ -211,11 +211,7 @@ ptr = skb_push(skb, 1); *ptr = 0; - skb->dev = dev; - skb->protocol = htons(ETH_P_AX25); - skb->mac.raw = skb->data; - skb->pkt_type = PACKET_HOST; - + 
skb->protocol = ax25_type_trans(skb, dev); netif_rx(skb); dev->last_rx = jiffies; unlock: @@ -272,8 +268,6 @@ skb = newskb; } - skb->protocol = htons(ETH_P_AX25); - ptr = skb_push(skb, 2); *ptr++ = (size + 5) % 256; @@ -287,7 +281,7 @@ return -ENODEV; } - skb->dev = dev; + skb->protocol = ax25_type_trans(skb, dev); skb->nh.raw = skb->data; dev->hard_header(skb, dev, ETH_P_BPQ, bpq->dest_addr, NULL, 0); bpq->stats.tx_packets++; diff -Nru a/drivers/net/hamradio/dmascc.c b/drivers/net/hamradio/dmascc.c --- a/drivers/net/hamradio/dmascc.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/dmascc.c 2005-04-04 16:44:28 -03:00 @@ -1306,9 +1306,7 @@ data = skb_put(skb, cb + 1); data[0] = 0; memcpy(&data[1], priv->rx_buf[i], cb); - skb->dev = priv->dev; - skb->protocol = ntohs(ETH_P_AX25); - skb->mac.raw = skb->data; + skb->protocol = ax25_type_trans(skb, priv->dev); netif_rx(skb); priv->dev->last_rx = jiffies; priv->stats.rx_packets++; diff -Nru a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c --- a/drivers/net/hamradio/hdlcdrv.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/hdlcdrv.c 2005-04-04 16:44:28 -03:00 @@ -174,12 +174,10 @@ s->stats.rx_dropped++; return; } - skb->dev = dev; cp = skb_put(skb, pkt_len); *cp++ = 0; /* KISS kludge */ memcpy(cp, s->hdlcrx.buffer, pkt_len - 1); - skb->protocol = htons(ETH_P_AX25); - skb->mac.raw = skb->data; + skb->protocol = ax25_type_trans(skb, dev); netif_rx(skb); dev->last_rx = jiffies; s->stats.rx_packets++; diff -Nru a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c --- a/drivers/net/hamradio/mkiss.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/mkiss.c 2005-04-04 16:44:28 -03:00 @@ -332,12 +332,10 @@ return; } - skb->dev = ax->dev; spin_lock_bh(&ax->buflock); memcpy(skb_put(skb,count), ax->rbuff, count); spin_unlock_bh(&ax->buflock); - skb->mac.raw = skb->data; - skb->protocol = htons(ETH_P_AX25); + skb->protocol = ax25_type_trans(skb, ax->dev); netif_rx(skb); ax->dev->last_rx = 
jiffies; ax->rx_packets++; diff -Nru a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c --- a/drivers/net/hamradio/scc.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/scc.c 2005-04-04 16:44:28 -03:00 @@ -1630,10 +1630,7 @@ scc->dev_stat.rx_packets++; scc->dev_stat.rx_bytes += skb->len; - skb->dev = scc->dev; - skb->protocol = htons(ETH_P_AX25); - skb->mac.raw = skb->data; - skb->pkt_type = PACKET_HOST; + skb->protocol = ax25_type_trans(skb, scc->dev); netif_rx(skb); scc->dev->last_rx = jiffies; diff -Nru a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c --- a/drivers/net/hamradio/yam.c 2005-04-04 16:44:28 -03:00 +++ b/drivers/net/hamradio/yam.c 2005-04-04 16:44:28 -03:00 @@ -522,12 +522,10 @@ ++yp->stats.rx_dropped; } else { unsigned char *cp; - skb->dev = dev; cp = skb_put(skb, pkt_len); *cp++ = 0; /* KISS kludge */ memcpy(cp, yp->rx_buf, pkt_len - 1); - skb->protocol = htons(ETH_P_AX25); - skb->mac.raw = skb->data; + skb->protocol = ax25_type_trans(skb, dev); netif_rx(skb); dev->last_rx = jiffies; ++yp->stats.rx_packets; diff -Nru a/include/net/ax25.h b/include/net/ax25.h --- a/include/net/ax25.h 2005-04-04 16:44:28 -03:00 +++ b/include/net/ax25.h 2005-04-04 16:44:28 -03:00 @@ -220,6 +220,14 @@ } } +static inline unsigned short ax25_type_trans(struct sk_buff *skb, struct net_device *dev) +{ + skb->dev = dev; + skb->pkt_type = PACKET_HOST; + skb->mac.raw = skb->data; + return htons(ETH_P_AX25); +} + /* af_ax25.c */ extern struct hlist_head ax25_list; extern spinlock_t ax25_list_lock; diff -Nru a/net/ax25/ax25_ds_subr.c b/net/ax25/ax25_ds_subr.c --- a/net/ax25/ax25_ds_subr.c 2005-04-04 16:44:28 -03:00 +++ b/net/ax25/ax25_ds_subr.c 2005-04-04 16:44:28 -03:00 @@ -143,8 +143,7 @@ *p++ = cmd; *p++ = param; - skb->dev = ax25_dev->dev; - skb->protocol = htons(ETH_P_AX25); + skb->protocol = ax25_type_trans(skb, ax25_dev->dev); dev_queue_xmit(skb); } diff -Nru a/net/ax25/ax25_out.c b/net/ax25/ax25_out.c --- a/net/ax25/ax25_out.c 2005-04-04 16:44:28 
-03:00 +++ b/net/ax25/ax25_out.c 2005-04-04 16:44:28 -03:00 @@ -351,8 +351,7 @@ { unsigned char *ptr; - skb->protocol = htons(ETH_P_AX25); - skb->dev = ax25_fwd_dev(dev); + skb->protocol = ax25_type_trans(skb, ax25_fwd_dev(dev)); ptr = skb_push(skb, 1); *ptr = 0x00; /* KISS */ --7ZAtKRhVyVSsbBD2-- From acme@ghostprotocols.net Mon Apr 4 12:52:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 12:52:57 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Jqj8f024314 for ; Mon, 4 Apr 2005 12:52:46 -0700 Received: from [200.138.131.177] (helo=oops.ghostprotocols.net) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 1DIXdy-0005Jl-00; Mon, 04 Apr 2005 16:53:54 -0300 Received: by oops.ghostprotocols.net (Postfix, from userid 500) id 1962814631; Mon, 4 Apr 2005 16:52:44 -0300 (BRT) Date: Mon, 4 Apr 2005 16:52:44 -0300 To: "David S. Miller" , Ralf Baechle Cc: netdev@oss.sgi.com Subject: [AX25] Introduce ax25_type_trans. was Re: [PATCH 2/2][AX25] make ax25_queue_xmit a net_device parameter Message-ID: <20050404195243.GK640@conectiva.com.br> References: <20050404195030.GJ640@conectiva.com.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050404195030.GJ640@conectiva.com.br> X-Url: http://advogato.org/person/acme User-Agent: Mutt/1.5.6i From: acme@ghostprotocols.net (Arnaldo Carvalho de Melo) X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1372 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@ghostprotocols.net Precedence: bulk X-list: netdev Sorry, this one got the same subject as the previous one, it should have been "[AX25] Introduce ax25_type_trans". 
- Arnaldo From hadi@cyberus.ca Mon Apr 4 13:12:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 13:12:05 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34KC1tM032714 for ; Mon, 4 Apr 2005 13:12:01 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DIXvN-0005Mq-Ui for netdev@oss.sgi.com; Mon, 04 Apr 2005 14:11:53 -0600 Received: from [216.209.86.2] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIXvC-0002rh-Cy; Mon, 04 Apr 2005 16:11:42 -0400 Subject: Re: [PATCH 2/2][AX25] make ax25_queue_xmit a net_device parameter From: jamal Reply-To: hadi@cyberus.ca To: Arnaldo Carvalho de Melo Cc: "David S. Miller" , Ralf Baechle , netdev In-Reply-To: <20050404195030.GJ640@conectiva.com.br> References: <20050404195030.GJ640@conectiva.com.br> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112645495.1078.40.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 16:11:36 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1373 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev You probably wanna set skb->input_dev in ax25_type_trans() as well something like: skb->input_dev = skb->dev = dev cheers, jamal On Mon, 2005-04-04 at 15:50, Arnaldo Carvalho de Melo wrote: > Hi David, Ralf, > > I'm trying to get back to the work of reducing the number of direct > references to skb->{h,nh,raw}, that eventually will become just a void > pointer. 
> > Available at: > > bk://kernel.bkbits.net/acme/sk_buff-2.6 > > Regards, > > - Arnaldo From rddunlap@osdl.org Mon Apr 4 13:49:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 13:49:46 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Knc1M002701 for ; Mon, 4 Apr 2005 13:49:40 -0700 Received: from [172.20.1.49] (fw.osdl.org [65.172.181.6]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j34Kmms4004974 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 4 Apr 2005 13:48:49 -0700 Message-ID: <4251A830.5030905@osdl.org> Date: Mon, 04 Apr 2005 13:48:48 -0700 From: "Randy.Dunlap" Organization: OSDL User-Agent: Mozilla Thunderbird 1.0 (X11/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Sam Ravnborg CC: ioe-lkml@axxeo.de, matthew@wil.cx, lkml , netdev@oss.sgi.com, hadi@cyberus.ca, cfriesen@nortel.com, tgraf@suug.ch Subject: Re: [PATCH] network configs: disconnect network options from drivers References: <20050330234709.1868eee5.randy.dunlap@verizon.net> <20050331185226.GA8146@mars.ravnborg.org> <424C5745.7020501@osdl.org> <20050331203010.GA8034@mars.ravnborg.org> <4250B4C5.2000200@osdl.org> <20050404195051.GA12364@mars.ravnborg.org> In-Reply-To: <20050404195051.GA12364@mars.ravnborg.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1374 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev Hi Sam- Sam Ravnborg wrote: > On Sun, Apr 03, 2005 at 08:30:13PM -0700, Randy.Dunlap wrote: > >>Any comments on this new version? > > The new Networking menu looks unstructured. 
> And the net/Kconfig file contains a lot of config snippets that do not > belong there. > So I took a stab at it with focus on: > - Move config bits to appropriate places, creating several new Kconfig > files Very Good. > - Made uses of menus more consistent at least on first and second level Very Good again. > - Move submenu to the top > - Rename top menu to "Networking" and located it just before > "File systems" I still prefer Networking to come before Device Drivers FWIW. Just makes some kind of hierarchical sense to me. > The patch became much larger. The win is that the top-level > net/Kconfig contains much less cruft. > > Many of the 56 lines added are due to the additional files. > I did not (on purpose) change any functionality. > > Only bit that I am worried about is the statement in SCTP: > depends on IPV6 || IPV6=n > > That looked like a noop to me. It had the side effect that SCTP > menu entries were indented an extra level which was not desirable > with current layout. Yeah, I was having several indentation problems. > Comments appreciated. Nice job overall. Especially nice to move ATM, bridge, DECNET, ECONET, etc., to their own Kconfig files so that they are more manageable. I propose that the new file net/atm/Kconfig be sourced somewhere. I'll look at it more to see if I have any other comments. > Patch on top of rc2.
> > Signed-off-by: Sam Ravnborg > --- > > > Sam > > drivers/Kconfig | 5 > drivers/net/Kconfig | 5 > drivers/net/appletalk/Kconfig | 28 ++ > net/8021q/Kconfig | 21 + > net/Kconfig | 541 +++--------------------------------------- > net/atm/Kconfig | 77 +++++ > net/bridge/Kconfig | 32 ++ > net/bridge/netfilter/Kconfig | 1 > net/core/Kconfig | 67 +++++ > net/decnet/Kconfig | 24 + > net/econet/Kconfig | 34 ++ > net/ipv4/netfilter/Kconfig | 5 > net/ipv6/Kconfig | 20 + > net/ipx/Kconfig | 33 ++ > net/lapb/Kconfig | 24 + > net/packet/Kconfig | 26 ++ > net/sched/Kconfig | 40 +++ > net/sctp/Kconfig | 5 > net/unix/Kconfig | 22 + > net/wanrouter/Kconfig | 31 ++ > net/x25/Kconfig | 35 ++ > 21 files changed, 567 insertions(+), 509 deletions(-) Thanks! -- ~Randy From herbert@gondor.apana.org.au Mon Apr 4 14:33:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 14:34:07 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34LXvmU004547 for ; Mon, 4 Apr 2005 14:33:58 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIZC8-0008Fm-00; Tue, 05 Apr 2005 07:33:16 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIZAk-0003xk-00; Tue, 05 Apr 2005 07:31:50 +1000 Date: Tue, 5 Apr 2005 07:31:49 +1000 To: jamal Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050404213149.GA15222@gondor.apana.org.au> References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112620614.1088.489.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1375 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Apr 04, 2005 at 09:16:55AM -0400, jamal wrote: > > Ok, fair enough. It annoys me too when i review patches ;-> > So i will fix this before final. Just one more thing, can you please remove the _bh's that you added to the read_lock for xfrm_km_list? It turns out that they're not necessary since the write_lock()'s are only held in process context. 
Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From sam@ravnborg.org Mon Apr 4 14:55:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 14:55:26 -0700 (PDT) Received: from pfepa.post.tele.dk (pfepa.post.tele.dk [195.41.46.235]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34LtIX6006011 for ; Mon, 4 Apr 2005 14:55:21 -0700 Received: from mars.ravnborg.org (0x50a0757d.hrnxx9.adsl-dhcp.tele.dk [80.160.117.125]) by pfepa.post.tele.dk (Postfix) with ESMTP id 26AAB47FE7E; Mon, 4 Apr 2005 23:54:22 +0200 (CEST) Received: by mars.ravnborg.org (Postfix, from userid 1000) id DC3EC6AC01D; Mon, 4 Apr 2005 23:55:54 +0200 (CEST) Date: Mon, 4 Apr 2005 23:55:54 +0200 From: Sam Ravnborg To: "Randy.Dunlap" Cc: ioe-lkml@axxeo.de, matthew@wil.cx, lkml , netdev@oss.sgi.com, hadi@cyberus.ca, cfriesen@nortel.com, tgraf@suug.ch Subject: Re: [PATCH] network configs: disconnect network options from drivers Message-ID: <20050404215554.GA29170@mars.ravnborg.org> References: <20050330234709.1868eee5.randy.dunlap@verizon.net> <20050331185226.GA8146@mars.ravnborg.org> <424C5745.7020501@osdl.org> <20050331203010.GA8034@mars.ravnborg.org> <4250B4C5.2000200@osdl.org> <20050404195051.GA12364@mars.ravnborg.org> <4251A830.5030905@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4251A830.5030905@osdl.org> User-Agent: Mutt/1.5.8i X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1376 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sam@ravnborg.org Precedence: bulk X-list: netdev > >- Move submenu to the top > >- Rename top menu to "Networking" and located it just before > > "File systems" > > I still prefer Networking to come before Device 
Drivers FWIW. > Just makes some kind of hierarchical sense to me. Moved up as suggested. > I propose that the new file net/atm/Kconfig be sourced somewhere. Thanks, I have missed that one - added just before wanrouter. > I'll look at it more to see if I have any other comments. OK. I will await and post an updated patch if you do not beat me. Sam From hadi@cyberus.ca Mon Apr 4 15:20:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 15:20:33 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34MKRhB007138 for ; Mon, 4 Apr 2005 15:20:28 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DIZvo-0005mQ-Sp for netdev@oss.sgi.com; Mon, 04 Apr 2005 18:20:28 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIZvi-00038k-0N; Mon, 04 Apr 2005 18:20:22 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Patrick McHardy , Masahide NAKAMURA , "David S. 
Miller" , netdev In-Reply-To: <20050404213149.GA15222@gondor.apana.org.au> References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112653217.1088.2.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 18:20:17 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1377 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 17:31, Herbert Xu wrote: > On Mon, Apr 04, 2005 at 09:16:55AM -0400, jamal wrote: > > > > Ok, fair enough. It annoys me too when i review patches ;-> > > So i will fix this before final. > > Just one more thing, can you please remove the _bh's that you > added to the read_lock for xfrm_km_list? It turns out that they're > not necessary since the write_lock()'s are only held in process > context. Doesn't the policy notification one need it at least? I thought it is entered in interrupt context on the packet path, no?
cheers, jamal From davem@davemloft.net Mon Apr 4 15:26:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 15:26:55 -0700 (PDT) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34MQnvU007829 for ; Mon, 4 Apr 2005 15:26:49 -0700 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DIa0I-0005rl-00; Mon, 04 Apr 2005 15:25:06 -0700 Date: Mon, 4 Apr 2005 15:25:06 -0700 From: "David S. Miller" To: hadi@cyberus.ca Cc: herbert@gondor.apana.org.au, kaber@trash.net, nakam@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events Message-Id: <20050404152506.15e1404b.davem@davemloft.net> In-Reply-To: <1112653217.1088.2.camel@jzny.localdomain> References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1378 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk 
X-list: netdev On 04 Apr 2005 18:20:17 -0400 jamal wrote: > On Mon, 2005-04-04 at 17:31, Herbert Xu wrote: > > On Mon, Apr 04, 2005 at 09:16:55AM -0400, jamal wrote: > > > > > > Ok, fair enough. It annoys me too when i review patches ;-> > > > So i will fix this before final. > > > > Just one more thing, can you please remove the _bh's that you > > added to the read_lock for xfrm_km_list? It turns out that they're > > not necessary since the write_lock()'s are only held in process > > context. > > Doesn't the policy notification one need it at least? I thought it is > entered in interrupt context on the packet path, no? If you only take write_lock() from process context, only the write_lock()'s need BH disabling. read_lock() takers can then nest arbitrarily, BH or not. From hadi@cyberus.ca Mon Apr 4 15:43:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 15:43:08 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34Mh4G5008765 for ; Mon, 4 Apr 2005 15:43:04 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DIaHh-0007om-26 for netdev@oss.sgi.com; Mon, 04 Apr 2005 18:43:05 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIaHb-0006Ma-T9; Mon, 04 Apr 2005 18:43:00 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: "David S.
Miller" Cc: herbert@gondor.apana.org.au, kaber@trash.net, nakam@linux-ipv6.org, netdev In-Reply-To: <20050404152506.15e1404b.davem@davemloft.net> References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112654575.1089.17.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 04 Apr 2005 18:42:55 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1379 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Mon, 2005-04-04 at 18:25, David S. Miller wrote: > If you only take write_lock() from process context, only the write_lock()'s > need BH disabling. read_lock() takers can then nest arbitrarily, BH or not. Ok, never mind - Ive made the change. As soon as Masahide tests i will post the final patch. 
cheers, jamal From arnaldo.melo@gmail.com Mon Apr 4 16:08:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 16:08:53 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.192]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34N8iIP009987 for ; Mon, 4 Apr 2005 16:08:45 -0700 Received: by wproxy.gmail.com with SMTP id 68so1742918wri for ; Mon, 04 Apr 2005 16:08:39 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=eJqEO9seUN4hAcomQRcO6ctk+XHGwEe3AWJVXjbwbSrWvsDFO6MhdWfW4SxowsX4cfwbW6XF6SH6jDWnN8nPOMFkLbNLWd7+ynGEl22IL5r2dQySiMZD3zK2SVMbANUnZUkFhDZ4wE6j7lvUi1xSW+ZAb/r7PnzgS276QUICL6k= Received: by 10.54.84.18 with SMTP id h18mr166504wrb; Mon, 04 Apr 2005 16:08:39 -0700 (PDT) Received: by 10.54.72.15 with HTTP; Mon, 4 Apr 2005 16:08:39 -0700 (PDT) Message-ID: <39e6f6c70504041608707cb02f@mail.gmail.com> Date: Mon, 4 Apr 2005 20:08:39 -0300 From: Arnaldo Carvalho de Melo Reply-To: acme@conectiva.com.br To: hadi@cyberus.ca Subject: Re: [PATCH 2/2][AX25] make ax25_queue_xmit a net_device parameter Cc: Arnaldo Carvalho de Melo , "David S. 
Miller" , Ralf Baechle , netdev In-Reply-To: <1112645495.1078.40.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <20050404195030.GJ640@conectiva.com.br> <1112645495.1078.40.camel@jzny.localdomain> X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1380 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arnaldo.melo@gmail.com Precedence: bulk X-list: netdev On 04 Apr 2005 16:11:36 -0400, jamal wrote: > > You probably wanna set skb->input_dev in ax25_type_trans() as well > something like: > skb->input_dev = skb->dev = dev Yup, forgot about this one in this patch, but I want right now is mostly to reduce open coding all around while maintaining the same behaviour, in time we can eventually see a better abstraction for this set of operations, it should also set skb->protocol internally and not return a value to set skb->protocol, etc. 
:-) - Arnaldo From rddunlap@osdl.org Mon Apr 4 16:12:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 16:12:22 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34NCIA7010714 for ; Mon, 4 Apr 2005 16:12:18 -0700 Received: from [172.20.1.49] (fw.osdl.org [65.172.181.6]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j34NBXs4017472 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 4 Apr 2005 16:11:34 -0700 Message-ID: <4251C9A5.3020704@osdl.org> Date: Mon, 04 Apr 2005 16:11:33 -0700 From: "Randy.Dunlap" Organization: OSDL User-Agent: Mozilla Thunderbird 1.0 (X11/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Sam Ravnborg CC: ioe-lkml@axxeo.de, matthew@wil.cx, lkml , netdev@oss.sgi.com, hadi@cyberus.ca, cfriesen@nortel.com, tgraf@suug.ch Subject: Re: [PATCH] network configs: disconnect network options from drivers References: <20050330234709.1868eee5.randy.dunlap@verizon.net> <20050331185226.GA8146@mars.ravnborg.org> <424C5745.7020501@osdl.org> <20050331203010.GA8034@mars.ravnborg.org> <4250B4C5.2000200@osdl.org> <20050404195051.GA12364@mars.ravnborg.org> <4251A830.5030905@osdl.org> <20050404215554.GA29170@mars.ravnborg.org> In-Reply-To: <20050404215554.GA29170@mars.ravnborg.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1381 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev Sam Ravnborg wrote: > >>>- Move submenu to the top >>>- Rename top menu to "Networking" and located it just before >>>"File systems" >> >>I still prefer Networking to come before Device Drivers FWIW. 
>>Just makes some kind of hierarchical sense to me. > > Moved up as suggested. > > >>I propose that the new file net/atm/Kconfig be sourced somewhere. > > Thanks, I have missed that one - added just before wanrouter. > > >>I'll look at it more to see if I have any other comments. > > OK. I will await and post an updated patch if you do not beat me. Sam, Here are a few more suggestions for you to consider. - in Networking support, move Network testing and Netpoll support to the end of the menu (basically put the devel. tools toward the bottom of the menu) - I would rather not "hide" Amateur Radio, IrDA, and Bluetooth in the Networking protocols area, but have them near 802.1x and ATM in the top-level Networking support menu. How does that sound to you? Thanks. -- ~Randy From tgraf@suug.ch Mon Apr 4 16:29:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 16:29:27 -0700 (PDT) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j34NTJMt014947 for ; Mon, 4 Apr 2005 16:29:20 -0700 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id 8A2A084 for ; Tue, 5 Apr 2005 01:28:52 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id F349F1C0EA; Tue, 5 Apr 2005 01:29:34 +0200 (CEST) Date: Tue, 5 Apr 2005 01:29:34 +0200 From: Thomas Graf To: netdev@oss.sgi.com Subject: [ANNOUNCE] netlink library 0.5.0 Message-ID: <20050404232934.GK26731@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1382 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk 
X-list: netdev

I released libnl 0.5.0 today, introducing address, routing, and routing rules support, various cleanups, more callbacks to customize message parsers, support for non-blocking sockets, and a complete API reference in HTML and PostScript.

About 70% of what I would call the basics needed for a feature freeze has been implemented; the freeze is meant to let things stabilize for a 1.0 version.

http://people.suug.ch/~tgr/libnl/
http://people.suug.ch/~tgr/libnl/files/libnl-0.5.0.tar.gz

Summary of Changes from 0.4.4 to 0.5.0
================================================
Thomas Graf
  o API documentation
  o nl_cache_filter to manually filter on an object
  o partial routing support
  o routing rules support
  o Properly set address family when setting addresses
  o debug flag and some rare messages, more to come
  o make error message verbosity configurable
  o tc fixes to wait for ack
  o cleanup and adaptation of address code to latest internal API
  o various cleanups
  o dozens of API breakages (better now than later)
Daniel Hottinger
  o arch 64bit printf length modifier fixes
Baruch Even , Mediatrix Telecom, inc.
  o address support

From ravinandan.arakali@neterion.com Mon Apr 4 18:49:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 18:49:45 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j351ndu8019254 for ; Mon, 4 Apr 2005 18:49:40 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j350nFOC012602; Mon, 4 Apr 2005 20:49:15 -0400 (EDT) Received: from rarakali ([10.16.16.58]) by guinness.s2io.com (8.12.6/8.12.6) with SMTP id j350nDDD018529; Mon, 4 Apr 2005 20:49:13 -0400 (EDT) From: "Ravinandan Arakali" To: "'Arthur Kepner'" Cc: , , "'Leonid. Grossman \(E-mail\)'" , "'Raghavendra. Koushik \(E-mail\)'" Subject: RE: High CPU utilization with Bonding driver ?
Date: Mon, 4 Apr 2005 17:49:07 -0700 Message-ID: <004101c53979$45f02800$3a10100a@pc.s2io.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 X-Scanned-By: MIMEDefang 2.34 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1383 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ravinandan.arakali@neterion.com Precedence: bulk X-list: netdev Arthur, On what kernel version should your below mentioned patch be applied ? We tried on one of the older kernels(2.6.5) and got an Oops while loading the bonding driver. Thanks, Ravi -----Original Message----- From: Arthur Kepner [mailto:akepner@sgi.com] Sent: Tuesday, March 29, 2005 10:29 AM To: Ravinandan Arakali Cc: netdev@oss.sgi.com; bonding-devel@lists.sourceforge.net; Leonid. Grossman (E-mail); Raghavendra. Koushik (E-mail) Subject: Re: High CPU utilization with Bonding driver ? On Tue, 29 Mar 2005, Ravinandan Arakali wrote: > .... > Results(8 nttcp/chariot streams): > --------------------------------- > 1. Combined throughputs(but no bonding): > 3.1 + 6.2 = 9.3 Gbps with 58% CPU idle. > > 2. eth0 and eth1 bonded together in LACP mode: > 8.2 Gbps with 1% CPU idle. > > From the above results, when Bonding driver is used(#2), the CPUs are > completely maxed out compared to the case when traffic is run > simultaneously on both the cards(#1). > Can anybody suggest some reasons for the above behavior ? > Ravi; Have you tried this patch? http://marc.theaimsgroup.com/?l=linux-netdev&m=111091146828779&w=2 If not, it will likely go a long way to solving your problem. 
-- Arthur From akepner@sgi.com Mon Apr 4 20:05:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 20:05:21 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35351YH022026 for ; Mon, 4 Apr 2005 20:05:01 -0700 Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j354gqxw012558 for ; Mon, 4 Apr 2005 21:43:02 -0700 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by nodin.corp.sgi.com (SGI-8.12.5/8.12.10/SGI_generic_relay-1.2) with ESMTP id j3534VbT55254226 for ; Mon, 4 Apr 2005 20:04:31 -0700 (PDT) Received: from [192.168.2.20] (mtv-vpn-sw-corp-0-49.corp.sgi.com [134.15.0.49]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j3533PlV10842885; Mon, 4 Apr 2005 20:03:27 -0700 (PDT) Date: Mon, 4 Apr 2005 20:03:02 -0700 (PDT) From: Arthur Kepner X-X-Sender: akepner@linux.site To: Ravinandan Arakali cc: netdev@oss.sgi.com, bonding-devel@lists.sourceforge.net, "'Leonid. Grossman (E-mail)'" , "'Raghavendra. Koushik (E-mail)'" Subject: RE: High CPU utilization with Bonding driver ? In-Reply-To: <004101c53979$45f02800$3a10100a@pc.s2io.com> Message-ID: References: <004101c53979$45f02800$3a10100a@pc.s2io.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1384 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akepner@sgi.com Precedence: bulk X-list: netdev On Mon, 4 Apr 2005, Ravinandan Arakali wrote: > Arthur, > On what kernel version should your below mentioned patch be applied ? > We tried on one of the older kernels(2.6.5) and got an Oops while > loading the bonding driver. > ..... Hmmm, interesting. 
I've used it with 2.6.X for at least two values of X (one of them being 5) so I'm surprised. Can you provide details about the oops? -- Arthur From jmorris@redhat.com Mon Apr 4 22:05:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 22:06:00 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j3555rU0029947 for ; Mon, 4 Apr 2005 22:05:54 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j3555B9x021150; Tue, 5 Apr 2005 01:05:12 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j3555BO11361; Tue, 5 Apr 2005 01:05:11 -0400 Received: from thoron.boston.redhat.com (thoron.boston.redhat.com [172.16.80.63]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j3555Av9028068; Tue, 5 Apr 2005 01:05:10 -0400 Date: Tue, 5 Apr 2005 01:05:10 -0400 (EDT) From: James Morris X-X-Sender: jmorris@thoron.boston.redhat.com To: Evgeniy Polyakov cc: linux-kernel@vger.kernel.org, , "David S. Miller" , Herbert Xu , , Greg KH , Andrew Morton Subject: Netlink Connector / CBUS Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1385 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@redhat.com Precedence: bulk X-list: netdev Evgeniy, Please send networking patches to netdev@oss.sgi.com. Your connector code (under drivers/connector) is now in the -mm tree and as far as I can tell, has not received any review from the network developers. Looking at it briefly, it seems quite unfinished. I'm not entirely sure what its purpose is.
A clear explanation of its purpose would be helpful (to me, at least), as well as documentation of the API and major data structures (which akpm has also asked for, IIRC). I can see one example of where it's being used with kobject_uevent, and it seems to have arrived via Greg-KH's I2C tree... If you're trying to add a generic, pseudo-reliable Netlink communication system, perhaps this should be built into Netlink itself as an extension of the existing Netlink API. I don't think this should be done as a separate "driver" off somewhere else with a new API. A few questions: - Why does it by default use a NETLINK_NFLOG kernel socket, and also allow this to be overridden by a module parameter? - Why does the cn.o module (poor namespace choice) add a callback itself on initialization? - Where is the userspace code which uses this? I checked out dbus from cvs and couldn't see anything obvious. Thanks, - James -- James Morris From jmorris@redhat.com Mon Apr 4 22:10:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 22:10:29 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j355AOAV030773 for ; Mon, 4 Apr 2005 22:10:25 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j355A7lF022142; Tue, 5 Apr 2005 01:10:07 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j355A6O12574; Tue, 5 Apr 2005 01:10:06 -0400 Received: from thoron.boston.redhat.com (thoron.boston.redhat.com [172.16.80.63]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j355A6v9028398; Tue, 5 Apr 2005 01:10:06 -0400 Date: Tue, 5 Apr 2005 01:10:06 -0400 (EDT) From: James Morris X-X-Sender: jmorris@thoron.boston.redhat.com To: Evgeniy Polyakov cc: linux-kernel@vger.kernel.org, , "David S.
Miller" , Herbert Xu , , Greg KH , Andrew Morton Subject: Re: Netlink Connector / CBUS In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1386 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@redhat.com Precedence: bulk X-list: netdev On Tue, 5 Apr 2005, James Morris wrote: > A few questions: Also, please allow cn_add_callback() to be passed a NULL callback function, so the caller doesn't pass in a dummy function and your code doesn't waste time dealing with something which isn't real. - James -- James Morris From lark@linux.net.cn Mon Apr 4 22:35:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 22:35:24 -0700 (PDT) Received: from mx.linux.net.cn ([211.100.11.220]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j355ZC9N032019 for ; Mon, 4 Apr 2005 22:35:14 -0700 Received: from localhost (master.linux.net.cn [127.0.0.1]) by mx.linux.net.cn (Postfix) with ESMTP id D34243F89E for ; Tue, 5 Apr 2005 13:35:05 +0800 (CST) Received: from mx.linux.net.cn ([127.0.0.1]) by localhost (master.linux.net.cn [127.0.0.1]) (amavisd-new, port 10025) with LMTP id 08008-04-3 for ; Tue, 5 Apr 2005 13:35:02 +0800 (CST) Received: from [192.168.0.120] (unknown [61.48.107.46]) by mx.linux.net.cn (Postfix) with ESMTP id 6BEF33F895 for ; Tue, 5 Apr 2005 13:35:02 +0800 (CST) Date: Tue, 05 Apr 2005 13:35:02 +0800 From: Wang Jian To: netdev@oss.sgi.com Subject: [PATCH] improvement on net/sched/cls_fw.c's hash function Message-Id: <20050405133336.0247.LARK@linux.net.cn> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_42520103041705CB2AF8_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver.
2.20 [CN] X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: amavisd-new at linux.net.cn X-Virus-Status: Clean X-archive-position: 1387 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lark@linux.net.cn Precedence: bulk X-list: netdev --------_42520103041705CB2AF8_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Hi, This is a simple patch against net/sched/cls_fw.c. The idea of this patch is discussed in this thread https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html I chose 509 for FW_FILTER_HSIZE. If you feel it is waste of memory, then 251 is good too. BTW: I don't know much about hash performance and hash distribution of jhash. This is a quick fix. -- lark --------_42520103041705CB2AF8_MULTIPART_MIXED_ Content-Type: application/octet-stream; name="hash-cls_fw.diff" Content-Disposition: attachment; filename="hash-cls_fw.diff" Content-Transfer-Encoding: base64 SW5kZXg6IGNsc19mdy5jCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIGNsc19mdy5jCShyZXZpc2lvbiAxKQorKysg Y2xzX2Z3LmMJKHdvcmtpbmcgY29weSkKQEAgLTQ1LDEwICs0NSwxMyBAQAogI2luY2x1ZGUgPG5l dC9zb2NrLmg+CiAjaW5jbHVkZSA8bmV0L2FjdF9hcGkuaD4KICNpbmNsdWRlIDxuZXQvcGt0X2Ns cy5oPgorI2luY2x1ZGUgPGxpbnV4L2poYXNoLmg+CiAKKyNkZWZpbmUgRldfRklMVEVSX0hTSVpF CQk1MDkKKwogc3RydWN0IGZ3X2hlYWQKIHsKLQlzdHJ1Y3QgZndfZmlsdGVyICpodFsyNTZdOwor CXN0cnVjdCBmd19maWx0ZXIgKmh0W0ZXX0ZJTFRFUl9IU0laRV07CiB9OwogCiBzdHJ1Y3QgZndf ZmlsdGVyCkBAIC02OSw3ICs3Miw3IEBACiAKIHN0YXRpYyBfX2lubGluZV9fIGludCBmd19oYXNo KHUzMiBoYW5kbGUpCiB7Ci0JcmV0dXJuIGhhbmRsZSYweEZGOworCXJldHVybiAoamhhc2hfMXdv cmQoaGFuZGxlLCAweEYzMEE3MTI5KSAlIEZXX0ZJTFRFUl9IU0laRSk7CiB9CiAKIHN0YXRpYyBp bnQgZndfY2xhc3NpZnkoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3RydWN0IHRjZl9wcm90byAqdHAs CkBAIC0xNTIsNyArMTU1LDcgQEAKIAlpZiAoaGVhZCA9PSBOVUxMKQogCQlyZXR1cm47CiAKLQlm 
b3IgKGg9MDsgaDwyNTY7IGgrKykgeworCWZvciAoaD0wOyBoPEZXX0ZJTFRFUl9IU0laRTsgaCsr KSB7CiAJCXdoaWxlICgoZj1oZWFkLT5odFtoXSkgIT0gTlVMTCkgewogCQkJaGVhZC0+aHRbaF0g PSBmLT5uZXh0OwogCQkJZndfZGVsZXRlX2ZpbHRlcih0cCwgZik7CkBAIC0yOTEsNyArMjk0LDcg QEAKIAlpZiAoYXJnLT5zdG9wKQogCQlyZXR1cm47CiAKLQlmb3IgKGggPSAwOyBoIDwgMjU2OyBo KyspIHsKKwlmb3IgKGggPSAwOyBoIDwgRldfRklMVEVSX0hTSVpFOyBoKyspIHsKIAkJc3RydWN0 IGZ3X2ZpbHRlciAqZjsKIAogCQlmb3IgKGYgPSBoZWFkLT5odFtoXTsgZjsgZiA9IGYtPm5leHQp IHsK --------_42520103041705CB2AF8_MULTIPART_MIXED_-- From davem@davemloft.net Mon Apr 4 22:38:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 22:39:02 -0700 (PDT) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j355ctK0032703 for ; Mon, 4 Apr 2005 22:38:58 -0700 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DIgky-0000BB-00; Mon, 04 Apr 2005 22:37:44 -0700 Date: Mon, 4 Apr 2005 22:37:44 -0700 From: "David S. 
Miller" To: Wang Jian Cc: netdev@oss.sgi.com Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Message-Id: <20050404223744.1f04c130.davem@davemloft.net> In-Reply-To: <20050405133336.0247.LARK@linux.net.cn> References: <20050405133336.0247.LARK@linux.net.cn> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1388 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Tue, 05 Apr 2005 13:35:02 +0800 Wang Jian wrote: > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html > > I chose 509 for FW_FILTER_HSIZE. If you feel it is waste of memory, then > 251 is good too. Please us a power of two, the "%" is expensive on some cpus. 
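
A minimal sketch of the point above, not code from this thread: when the table size is a power of two, the bucket index can be taken with a bitwise AND instead of "%". The names FW_HSIZE_SKETCH, mix32_sketch(), fw_hash_sketch() and fw_hash_mod_sketch() are hypothetical, and the mixer is only a userspace stand-in for the kernel's jhash_1word().

```c
#include <stdint.h>

/* Hypothetical table size for this sketch; it must be a power of two. */
#define FW_HSIZE_SKETCH 256u

/* Stand-in 32-bit mixer (an assumption, NOT the kernel's jhash_1word()). */
static uint32_t mix32_sketch(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/* Bucket index via mask; correct only because the size is a power of two. */
static uint32_t fw_hash_sketch(uint32_t handle)
{
    return mix32_sketch(handle) & (FW_HSIZE_SKETCH - 1u);
}

/* Same index computed with "%", kept here only for comparison. */
static uint32_t fw_hash_mod_sketch(uint32_t handle)
{
    return mix32_sketch(handle) % FW_HSIZE_SKETCH;
}
```

With a prime size such as 509 or 251 there is no equivalent mask, so the reduction needs a real division (or a multiply sequence the compiler substitutes for it). For an unsigned value and a constant power-of-two size, compilers will usually emit the AND for "%" anyway, so the mask mostly makes the intent explicit.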
From lark@linux.net.cn Mon Apr 4 23:06:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 23:06:16 -0700 (PDT) Received: from mx.linux.net.cn ([211.100.11.220]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35663Kq001837 for ; Mon, 4 Apr 2005 23:06:07 -0700 Received: from localhost (master.linux.net.cn [127.0.0.1]) by mx.linux.net.cn (Postfix) with ESMTP id D2D7E3F89E for ; Tue, 5 Apr 2005 14:05:58 +0800 (CST) Received: from mx.linux.net.cn ([127.0.0.1]) by localhost (master.linux.net.cn [127.0.0.1]) (amavisd-new, port 10025) with LMTP id 08289-06-4 for ; Tue, 5 Apr 2005 14:05:56 +0800 (CST) Received: from [192.168.0.120] (unknown [61.48.107.46]) by mx.linux.net.cn (Postfix) with ESMTP id 76BD63F895 for ; Tue, 5 Apr 2005 14:05:56 +0800 (CST) Date: Tue, 05 Apr 2005 14:05:56 +0800 From: Wang Jian To: netdev@oss.sgi.com Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function In-Reply-To: <20050404223744.1f04c130.davem@davemloft.net> References: <20050405133336.0247.LARK@linux.net.cn> <20050404223744.1f04c130.davem@davemloft.net> Message-Id: <20050405140342.024A.LARK@linux.net.cn> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_42522A3E043B033D6C20_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.20 [CN] X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: amavisd-new at linux.net.cn X-Virus-Status: Clean X-archive-position: 1389 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lark@linux.net.cn Precedence: bulk X-list: netdev --------_42522A3E043B033D6C20_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Hi David S. Miller, New patch attached. Hashsize is 256, the same as old one. On Mon, 4 Apr 2005 22:37:44 -0700, "David S. 
Miller" wrote: > On Tue, 05 Apr 2005 13:35:02 +0800 > Wang Jian wrote: > > > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html > > > > I chose 509 for FW_FILTER_HSIZE. If you feel it is waste of memory, then > > 251 is good too. > > Please us a power of two, the "%" is expensive on some cpus. -- lark --------_42522A3E043B033D6C20_MULTIPART_MIXED_ Content-Type: application/octet-stream; name="hash-cls_fw-2.diff" Content-Disposition: attachment; filename="hash-cls_fw-2.diff" Content-Transfer-Encoding: base64 SW5kZXg6IGxpbnV4LTIuNi4xMS13L25ldC9zY2hlZC9jbHNfZncuYwo9PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBs aW51eC0yLjYuMTEtdy9uZXQvc2NoZWQvY2xzX2Z3LmMJKHJldmlzaW9uIDEpCisrKyBsaW51eC0y LjYuMTEtdy9uZXQvc2NoZWQvY2xzX2Z3LmMJKHdvcmtpbmcgY29weSkKQEAgLTQ1LDEwICs0NSwx MyBAQAogI2luY2x1ZGUgPG5ldC9zb2NrLmg+CiAjaW5jbHVkZSA8bmV0L2FjdF9hcGkuaD4KICNp bmNsdWRlIDxuZXQvcGt0X2Nscy5oPgorI2luY2x1ZGUgPGxpbnV4L2poYXNoLmg+CiAKKyNkZWZp bmUgRldfRklMVEVSX0hTSVpFCQkyNTYKKwogc3RydWN0IGZ3X2hlYWQKIHsKLQlzdHJ1Y3QgZndf ZmlsdGVyICpodFsyNTZdOworCXN0cnVjdCBmd19maWx0ZXIgKmh0W0ZXX0ZJTFRFUl9IU0laRV07 CiB9OwogCiBzdHJ1Y3QgZndfZmlsdGVyCkBAIC02OSw3ICs3Miw3IEBACiAKIHN0YXRpYyBfX2lu bGluZV9fIGludCBmd19oYXNoKHUzMiBoYW5kbGUpCiB7Ci0JcmV0dXJuIGhhbmRsZSYweEZGOwor CXJldHVybiAoamhhc2hfMXdvcmQoaGFuZGxlLCAweEYzMEE3MTI5KSAlIEZXX0ZJTFRFUl9IU0la RSk7CiB9CiAKIHN0YXRpYyBpbnQgZndfY2xhc3NpZnkoc3RydWN0IHNrX2J1ZmYgKnNrYiwgc3Ry dWN0IHRjZl9wcm90byAqdHAsCkBAIC0xNTIsNyArMTU1LDcgQEAKIAlpZiAoaGVhZCA9PSBOVUxM KQogCQlyZXR1cm47CiAKLQlmb3IgKGg9MDsgaDwyNTY7IGgrKykgeworCWZvciAoaD0wOyBoPEZX X0ZJTFRFUl9IU0laRTsgaCsrKSB7CiAJCXdoaWxlICgoZj1oZWFkLT5odFtoXSkgIT0gTlVMTCkg ewogCQkJaGVhZC0+aHRbaF0gPSBmLT5uZXh0OwogCQkJZndfZGVsZXRlX2ZpbHRlcih0cCwgZik7 CkBAIC0yOTEsNyArMjk0LDcgQEAKIAlpZiAoYXJnLT5zdG9wKQogCQlyZXR1cm47CiAKLQlmb3Ig KGggPSAwOyBoIDwgMjU2OyBoKyspIHsKKwlmb3IgKGggPSAwOyBoIDwgRldfRklMVEVSX0hTSVpF 
OyBoKyspIHsKIAkJc3RydWN0IGZ3X2ZpbHRlciAqZjsKIAogCQlmb3IgKGYgPSBoZWFkLT5odFto XTsgZjsgZiA9IGYtPm5leHQpIHsK --------_42522A3E043B033D6C20_MULTIPART_MIXED_-- From johnpol@2ka.mipt.ru Mon Apr 4 23:59:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Apr 2005 23:59:31 -0700 (PDT) Received: from vocord.com (ns2.vocord.com [194.220.215.56]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j356xNbl004930 for ; Mon, 4 Apr 2005 23:59:24 -0700 Received: from uganda.factory.vocord.ru (uganda.factory.vocord.ru [192.168.0.48]) by vocord.com (8.13.1/8.13.1) with ESMTP id j356uTw7024383; Tue, 5 Apr 2005 10:57:29 +0400 Subject: Re: Netlink Connector / CBUS From: Evgeniy Polyakov Reply-To: johnpol@2ka.mipt.ru To: James Morris Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, "David S. Miller" , Herbert Xu , rml@novell.com, Greg KH , Andrew Morton In-Reply-To: References: Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-p5aA2EfudVVDZNSzIBGU" Organization: MIPT Date: Tue, 05 Apr 2005 11:03:16 +0400 Message-Id: <1112684596.28858.4.camel@uganda> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: ClamAV 0.80/762/Mon Mar 14 02:35:33 2005 clamav-milter version 0.80j on dea.vocord.com X-Virus-Status: Clean X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (vocord.com [192.168.0.1]); Tue, 05 Apr 2005 10:57:35 +0400 (MSD) X-archive-position: 1390 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev --=-p5aA2EfudVVDZNSzIBGU Content-Type: text/plain Content-Transfer-Encoding: quoted-printable

On Tue, 2005-04-05 at 01:05 -0400, James Morris wrote:
> Evgeniy,
>
> Please send networking patches to netdev@oss.sgi.com.

It was sent there two times.

> Your connector code (under drivers/connector) is now in the -mm tree and
> as far as I can tell, has not received any review from the network
> developers.

I received comments and feature requests from Herbert Xu and Jamal Hadi Salim; almost all of them were successfully resolved.

> Looking at it briefly, it seems quite unfinished.

Hmmm... I think it is fully functional and ready for inclusion.

> I'm not entirely sure what its purpose is.

1. Provide very flexible userspace control over netlink.
2. Provide a very flexible notification mechanism.

> A clear explanation of its purpose would be helpful (to me, at least), as
> well as documentation of the API and major data structures (which akpm
> has also asked for, IIRC).

Documentation exists in Documentation/connector/connector.txt. A patch with brief source documentation has already been created, so I will post it with other minor updates soon.

> I can see one example of where it's being used with kobject_uevent, and it
> seems to have arrived via Greg-KH's I2C tree...

It is also used in the SuperIO and acrypto subsystems.

> If you're trying to add a generic, pseudo-reliable Netlink communication
> system, perhaps this should be built into Netlink itself as an extension
> of the existing Netlink API.

So you recommend that each driver that wants to be controlled over netlink create a new netlink socket, register its unit, and learn how an SKB is allocated, processed and so on? This is wrong. It is much easier to just register a callback.

> I don't think this should be done as a separate "driver" off somewhere
> else with a new API.

It is much easier to use the connector than direct netlink sockets. One only has to register a callback and an identifier. When the driver receives a special netlink message with the appropriate identifier, the appropriate callback is called.

From the userspace point of view it's quite straightforward: socket(); bind(); send(); recv(); But if kernelspace wants to use the full power of such connections, the driver writer must create special sockets and must know about struct sk_buff handling... Connector allows any kernelspace agent to use netlink-based networking for inter-process communication in a significantly easier way:

int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
void cn_netlink_send(struct cn_msg *msg, u32 __groups);

>
> A few questions:
>
> - Why does it by default use a NETLINK_NFLOG kernel socket, and also allow
> this to be overridden by a module parameter?

Because while this driver lived outside the kernel tree there was no empty registered socket. It can be changed if the driver goes upstream.

> - Why does the cn.o module (poor namespace choice) add a callback itself
> on initialization?

Because that callback is used for notification requests.

> - Where is the userspace code which uses this? I checked out dbus from
> cvs and couldn't see anything obvious.

I posted it with SuperIO, kobject_uevent, acrypto and fork changes.
It is quite straightforward:

        s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_NFLOG);
        if (s == -1) {
                perror("socket");
                return -1;
        }

        l_local.nl_family = AF_NETLINK;
        l_local.nl_groups = CN_ACRYPTO_IDX;
        l_local.nl_pid = getpid();

        if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
                perror("bind");
                close(s);
                return -1;
        }

        case NLMSG_DONE:
                data = (struct cn_msg *)NLMSG_DATA(reply);
                m = (struct crypto_conn_data *)(data + 1);
                stat = (struct crypto_device_stat *)(m+1);

                time(&tm);
                fprintf(out, "%.24s : [%x.%x] [seq=%u, ack=%u], name= %s, cmd=%#02x, "
                        "sesions: completed=%llu, started=%llu, finished=%llu, cache_failed=%llu.\n",
                        ctime(&tm), data->id.idx, data->id.val,
                        data->seq, data->ack, m->name, m->cmd,
                        stat->scompleted, stat->sstarted, stat->sfinished,
                        stat->cache_failed);
                fflush(out);
                break;

>
> Thanks,

Thank you for your comments.

>
> - James
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski
--=-p5aA2EfudVVDZNSzIBGU Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQBCUjg0IKTPhE+8wY0RAtW9AJ0a0EjP0tCQ+mf28pplSyNYxtY5DgCfQq0x oMdIKfBX1VrHHWNtXPzhMAc= =C9qk -----END PGP SIGNATURE----- --=-p5aA2EfudVVDZNSzIBGU--
From johnpol@2ka.mipt.ru Tue Apr 5 00:04:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 00:04:13 -0700 (PDT) Received: from vocord.com (ns2.vocord.com [194.220.215.56]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35745hU005695 for ; Tue, 5 Apr 2005 00:04:06 -0700 Received: from uganda.factory.vocord.ru (uganda.factory.vocord.ru [192.168.0.48]) by vocord.com (8.13.1/8.13.1) with ESMTP id j35731D8024623; Tue, 5 Apr 2005 11:03:05 +0400 Subject: Re: Netlink Connector / CBUS From: Evgeniy Polyakov Reply-To: johnpol@2ka.mipt.ru To: James Morris Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, "David S.
Miller" , Herbert Xu , rml@novell.com, Greg KH , Andrew Morton In-Reply-To: References: Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-Q31YfxoFqrjCdpDRIKy3" Organization: MIPT Date: Tue, 05 Apr 2005 11:08:04 +0400 Message-Id: <1112684884.28858.10.camel@uganda> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: ClamAV 0.80/762/Mon Mar 14 02:35:33 2005 clamav-milter version 0.80j on dea.vocord.com X-Virus-Status: Clean X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (vocord.com [192.168.0.1]); Tue, 05 Apr 2005 11:03:05 +0400 (MSD) X-archive-position: 1391 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev --=-Q31YfxoFqrjCdpDRIKy3 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable

On Tue, 2005-04-05 at 01:10 -0400, James Morris wrote:
> On Tue, 5 Apr 2005, James Morris wrote:
>
> > A few questions:
>
> Also, please allow cn_add_callback() to be passed a NULL
> callback function, so the caller doesn't pass in a dummy function and your
> code doesn't waste time dealing with something which isn't real.

Why would anyone want to add a callback that is not supposed to be useful? A callback is called when someone sends a netlink message with the appropriate idx/val inside; if there is no registered callback with such an ID, nothing will be called and the skb will be freed.
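
The dispatch behaviour described above can be modelled with a minimal userspace sketch. This is not code from the connector patch: every name below (sketch_id, sketch_add_callback(), sketch_dispatch(), sketch_count_hits()) is hypothetical, the fixed-size table is a simplification, and rejecting a NULL callback is an assumption made for the sketch rather than documented connector behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an idx/val identifier; not the kernel's struct cb_id. */
struct sketch_id {
    uint32_t idx;
    uint32_t val;
};

typedef void (*sketch_cb)(void *data);

#define SKETCH_MAX_CB 16

struct sketch_entry {
    struct sketch_id id;
    sketch_cb cb;
};

static struct sketch_entry sketch_table[SKETCH_MAX_CB];
static size_t sketch_count;

/* Register cb for id. Returns 0 on success, -1 for a NULL callback,
 * a duplicate id, or a full table. */
static int sketch_add_callback(struct sketch_id id, sketch_cb cb)
{
    if (cb == NULL || sketch_count == SKETCH_MAX_CB)
        return -1;
    for (size_t i = 0; i < sketch_count; i++)
        if (sketch_table[i].id.idx == id.idx && sketch_table[i].id.val == id.val)
            return -1;
    sketch_table[sketch_count].id = id;
    sketch_table[sketch_count].cb = cb;
    sketch_count++;
    return 0;
}

/* Deliver a message addressed to id. Returns 1 if a callback ran,
 * 0 if no callback is registered and the message is simply dropped. */
static int sketch_dispatch(struct sketch_id id, void *data)
{
    for (size_t i = 0; i < sketch_count; i++) {
        if (sketch_table[i].id.idx == id.idx && sketch_table[i].id.val == id.val) {
            sketch_table[i].cb(data);
            return 1;
        }
    }
    return 0;
}

/* Example callback: counts how many messages reached it. */
static void sketch_count_hits(void *data)
{
    (*(int *)data)++;
}
```

In this model an unregistered idx/val pair costs one table walk and nothing else, which is the case the reply above describes.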
>=20 > - James --=20 Evgeniy Polyakov Crash is better than data corruption -- Arthur Grabowski --=-Q31YfxoFqrjCdpDRIKy3 Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQBCUjlUIKTPhE+8wY0RAv/qAJ4gmxfOafjdLprCQ4Ue0e7d6SxbrACfcJM7 qFXZGBk8rY3rtFUCIyZgu3Q= =0Car -----END PGP SIGNATURE----- --=-Q31YfxoFqrjCdpDRIKy3-- From herbert@gondor.apana.org.au Tue Apr 5 00:11:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 00:11:36 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j357BQBi006725 for ; Tue, 5 Apr 2005 00:11:26 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIiDB-0002zX-00; Tue, 05 Apr 2005 17:10:57 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIiCI-00050h-00; Tue, 05 Apr 2005 17:10:02 +1000 Date: Tue, 5 Apr 2005 17:10:02 +1000 To: Evgeniy Polyakov Cc: James Morris , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, "David S. 
Miller" , rml@novell.com, Greg KH , Andrew Morton Subject: Re: Netlink Connector / CBUS Message-ID: <20050405071002.GA19186@gondor.apana.org.au> References: <1112684596.28858.4.camel@uganda> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112684596.28858.4.camel@uganda> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1392 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Tue, Apr 05, 2005 at 11:03:16AM +0400, Evgeniy Polyakov wrote: > > I received comments and feature requests from Herbert Xu and Jamal Hadi > Salim, > almost all were successfully resolved. Please do not construe my involvement in these threads as endorsement for this system. In fact to this day I still don't understand what problems this thing is meant to solve. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From johnpol@2ka.mipt.ru Tue Apr 5 00:28:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 00:28:56 -0700 (PDT) Received: from vocord.com (dea.vocord.ru [217.67.177.50]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j357SllX011474 for ; Tue, 5 Apr 2005 00:28:48 -0700 Received: from uganda.factory.vocord.ru (uganda.factory.vocord.ru [192.168.0.48]) by vocord.com (8.13.1/8.13.1) with ESMTP id j357Rstm025626; Tue, 5 Apr 2005 11:27:54 +0400 Subject: Re: Netlink Connector / CBUS From: Evgeniy Polyakov Reply-To: johnpol@2ka.mipt.ru To: Herbert Xu Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, "David S. 
Miller" , James Morris , rml@novell.com, Greg KH , Andrew Morton
In-Reply-To:
References:
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-sXlTmFdVb1kJKT8aZHNH"
Organization: MIPT
Date: Tue, 05 Apr 2005 11:34:40 +0400
Message-Id: <1112686480.28858.17.camel@uganda>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.4 (2.0.4-2)
X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com
X-Virus-Scanned: ClamAV 0.80/762/Mon Mar 14 02:35:33 2005 clamav-milter version 0.80j on dea.vocord.com
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (vocord.com [192.168.0.1]); Tue, 05 Apr 2005 11:27:54 +0400 (MSD)
X-archive-position: 1393
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: johnpol@2ka.mipt.ru
Precedence: bulk
X-list: netdev

--=-sXlTmFdVb1kJKT8aZHNH
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Tue, 2005-04-05 at 01:10 -0400, Herbert Xu wrote:
>On Tue, Apr 05, 2005 at 11:03:16AM +0400, Evgeniy Polyakov wrote:
>>
>> I received comments and feature requests from Herbert Xu and Jamal Hadi
>> Salim,
>> almost all were successfully resolved.
>
>Please do not construe my involvement in these threads as endorsement
>for this system.

Sure. I remember you are against it :).

>In fact to this day I still don't understand what problems this thing is
>meant to solve.

Hmm, what else can I add to what I have already said?
Maybe compare the size of the code needed to broadcast kobject changes
in kobject_uevent.c, for example: netlink socket allocation plus skb
handling, against a call to cn_netlink_send().
>--=20 >Visit Openswan at http://www.openswan.org/ >Email: Herbert Xu ~{PmV>HI~} >Home Page: http://gondor.apana.org.au/~herbert/ >PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --=20 Evgeniy Polyakov Crash is better than data corruption -- Arthur Grabowski --=-sXlTmFdVb1kJKT8aZHNH Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQBCUj+QIKTPhE+8wY0RAtcKAJ91ZXvgUr1gGOjGWtnLZRc6iQYeCwCfWLe/ hMplKPbqSYSR1MIMr/E38+E= =Xfba -----END PGP SIGNATURE----- --=-sXlTmFdVb1kJKT8aZHNH-- From nakam@linux-ipv6.org Tue Apr 5 00:35:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 00:36:02 -0700 (PDT) Received: from mail406.noc.n-bone.net (mail4.noc.n-bone.net [138.243.50.144]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j357ZvsU012410 for ; Tue, 5 Apr 2005 00:35:58 -0700 Received: from [192.168.2.173] (polaris.linux-ipv6.org [203.178.140.10]) by mail406.noc.n-bone.net (NBONE-MTA) with ESMTP id 7725E109D; Tue, 5 Apr 2005 16:35:48 +0900 (JST) Message-ID: <42523FD3.8010400@linux-ipv6.org> Date: Tue, 05 Apr 2005 16:35:47 +0900 From: Masahide NAKAMURA User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: en-us, en MIME-Version: 1.0 To: hadi@cyberus.ca Cc: "David S. 
Miller" , herbert@gondor.apana.org.au, kaber@trash.net, netdev Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> <1112654575.1089.17.camel@jzny.localdomain> In-Reply-To: <1112654575.1089.17.camel@jzny.localdomain> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1394 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev Hello Jamal, jamal wrote: > On Mon, 2005-04-04 at 18:25, David S. Miller wrote: > > >>If you only take write_lock() from process context, only the write_lock()'s >>need BH disabling. read_lock() takers can then nest arbitrarily, BH or not. > > > Ok, never mind - Ive made the change. > As soon as Masahide tests i will post the final patch. I've tested normal cases below with the latest patch and it works fine. I think you can go ahead. tested cases: o netlink (using iproute2 "ip xfrm monitor" to confirm it) - add/del/flush/expire for SA/SP - acquire,allocspi,update for SA - update for SP o pfkey - running racoon o both sockets - running racoon with using "ip xfrm monitor". 
Regards, -- Masahide NAKAMURA From guillaume.thouvenin@bull.net Tue Apr 5 01:11:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 01:11:46 -0700 (PDT) Received: from ecfrec.frec.bull.fr (ecfrec.frec.bull.fr [129.183.4.8]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j358BYbd014731 for ; Tue, 5 Apr 2005 01:11:40 -0700 Received: from localhost (localhost [127.0.0.1]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id 987FE19D90B; Tue, 5 Apr 2005 10:11:23 +0200 (CEST) Received: from ecfrec.frec.bull.fr ([127.0.0.1]) by localhost (ecfrec.frec.bull.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 03244-04; Tue, 5 Apr 2005 10:11:21 +0200 (CEST) Received: from ecn002.frec.bull.fr (ecn002.frec.bull.fr [129.183.4.6]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id 54BC919D90A; Tue, 5 Apr 2005 10:11:21 +0200 (CEST) Received: from frecb000711.frec.bull.fr ([129.183.101.50]) by ecn002.frec.bull.fr (Lotus Domino Release 5.0.12) with ESMTP id 2005040510211477:2848 ; Tue, 5 Apr 2005 10:21:14 +0200 Subject: Re: Netlink Connector / CBUS From: Guillaume Thouvenin To: Herbert Xu Cc: lkml , Netlink List , "David S. 
Miller" , James Morris , rml@novell.com, Greg KH , Andrew Morton , Evgeniy Polyakov
In-Reply-To: <1112686480.28858.17.camel@uganda>
References: <1112686480.28858.17.camel@uganda>
Date: Tue, 05 Apr 2005 10:11:23 +0200
Message-Id: <1112688683.8456.10.camel@frecb000711.frec.bull.fr>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.3
X-MIMETrack: Itemize by SMTP Server on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 05/04/2005 10:21:14, Serialize by Router on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 05/04/2005 10:21:15, Serialize complete at 05/04/2005 10:21:15
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com
X-Virus-Scanned: by amavisd-new at frec.bull.fr
X-Virus-Status: Clean
X-archive-position: 1395
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: guillaume.thouvenin@bull.net
Precedence: bulk
X-list: netdev

On Tue, 2005-04-05 at 11:34 +0400, Evgeniy Polyakov wrote:
> On Tue, 2005-04-05 at 01:10 -0400, Herbert Xu wrote:
>
> >In fact to this day I still don't understand what problems this thing is
> >meant to solve.
>
> Hmm, what else can I add to my words?
> May be checking the size of the code needed to broadcast kobject changes
> in kobject_uevent.c for example...
> Netlink socket allocation + skb handling against call to cn_netlink_send().

And it's the same for the fork connector. It lets the kernel send a
message to a user-space application whenever a fork occurs by adding only
one line (two with the #include) to the kernel/fork.c file. Thus, the
netlink connector is a very simple and fast mechanism when you need to
send a small amount of information from kernel space to user space.
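
To give a feel for how small such a notification can be, here is a rough
userspace sketch of packing and unpacking a fork event. Everything in it
is an assumption made for illustration: struct sketch_msg_hdr only
loosely mirrors the idx/val/seq/ack fields that connector messages carry,
struct sketch_fork_event and the SKETCH_FORK_* values are hypothetical,
and none of it is taken from the fork connector patch itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ID values for the fork notification. */
#define SKETCH_FORK_IDX 0x1u
#define SKETCH_FORK_VAL 0x1u

/* Hypothetical header; not the real struct cn_msg layout. */
struct sketch_msg_hdr {
	uint32_t idx;
	uint32_t val;
	uint32_t seq;
	uint32_t ack;
	uint16_t len;	/* payload length in bytes */
};

/* Hypothetical payload: who forked, and the new child. */
struct sketch_fork_event {
	int32_t parent_pid;
	int32_t child_pid;
};

/* Write header + payload into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t pack_fork_event(void *buf, size_t buflen, uint32_t seq,
			      int32_t parent, int32_t child)
{
	struct sketch_msg_hdr hdr;
	struct sketch_fork_event ev;

	if (buflen < sizeof(hdr) + sizeof(ev))
		return 0;

	memset(&hdr, 0, sizeof(hdr));	/* also clears padding bytes */
	hdr.idx = SKETCH_FORK_IDX;
	hdr.val = SKETCH_FORK_VAL;
	hdr.seq = seq;
	hdr.ack = 0;
	hdr.len = (uint16_t)sizeof(ev);

	ev.parent_pid = parent;
	ev.child_pid = child;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), &ev, sizeof(ev));
	return sizeof(hdr) + sizeof(ev);
}

/* Read header + payload back out of buf.
 * Returns 0 on success, -1 if the buffer is short or len is unexpected. */
static int unpack_fork_event(const void *buf, size_t buflen,
			     struct sketch_msg_hdr *hdr,
			     struct sketch_fork_event *ev)
{
	if (buflen < sizeof(*hdr) + sizeof(*ev))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));
	if (hdr->len != sizeof(*ev))
		return -1;
	memcpy(ev, (const char *)buf + sizeof(*hdr), sizeof(*ev));
	return 0;
}
```

The whole message in this sketch is a couple of dozen bytes, which is the
kind of small kernel-to-user payload the paragraph above has in mind.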
Regards, Guillaume From marcel@holtmann.org Tue Apr 5 02:35:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 02:35:53 -0700 (PDT) Received: from mail.holtmann.net (coyote.holtmann.net [217.160.111.169]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j359ZjaK025942 for ; Tue, 5 Apr 2005 02:35:46 -0700 Received: from pegasus (pD9FF9FF2.dip.t-dialin.net [217.255.159.242]) by mail.holtmann.net (8.12.3/8.12.3/Debian-7.1) with ESMTP id j359aDbo017297 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO) for ; Tue, 5 Apr 2005 11:36:14 +0200 Subject: Some sleeping function called from invalid context From: Marcel Holtmann To: Network Development Mailing List Content-Type: text/plain Date: Tue, 05 Apr 2005 11:35:44 +0200 Message-Id: <1112693744.7960.2.camel@pegasus> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 16:38:58 2005 on coyote.holtmann.net X-Virus-Status: Clean X-archive-position: 1396 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: marcel@holtmann.org Precedence: bulk X-list: netdev Hi, while testing the latest kernel from the Bitkeeper repository, I got some sleeping functions called from invalid context: Freeing unused kernel memory: 180k freed Debug: sleeping function called from invalid context at mm/slab.c:2090 in_atomic():1, irqs_disabled():0 [] __might_sleep+0xa6/0xb0 [] kmem_cache_alloc+0x73/0x80 [] kmem_cache_create+0xfe/0x630 [] proto_register+0x9d/0xc0 [] af_unix_init+0x1c/0x7a [unix] [] sys_init_module+0x1b2/0x290 [] syscall_call+0x7/0xb NET: Registered protocol family 1 Debug: sleeping function called from invalid context at mm/slab.c:2090 in_atomic():1, irqs_disabled():0 [] __might_sleep+0xa6/0xb0 [] kmem_cache_alloc+0x73/0x80 [] kmem_cache_create+0xfe/0x630 [] wake_up_process+0x1d/0x30 [] free_uid+0x20/0x90 [] 
proto_register+0x9d/0xc0 [] inet6_init+0x19/0x200 [ipv6] [] sys_init_module+0x1b2/0x290 [] syscall_call+0x7/0xb NET: Registered protocol family 10 Regards Marcel From hadi@cyberus.ca Tue Apr 5 03:18:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:18:42 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35AIXqG000444 for ; Tue, 5 Apr 2005 03:18:34 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIl8g-0006Ze-53 for netdev@oss.sgi.com; Tue, 05 Apr 2005 06:18:30 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIl8a-00024T-OZ; Tue, 05 Apr 2005 06:18:25 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Masahide NAKAMURA Cc: "David S. Miller" , herbert@gondor.apana.org.au, kaber@trash.net, netdev In-Reply-To: <42523FD3.8010400@linux-ipv6.org> References: <1112406164.1088.54.camel@jzny.localdomain> <20050402014619.GB24861@gondor.apana.org.au> <1112469601.1088.173.camel@jzny.localdomain> <1112538718.1096.394.camel@jzny.localdomain> <20050404005805.GA16543@gondor.apana.org.au> <1112614706.1096.439.camel@jzny.localdomain> <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> <1112654575.1089.17.camel@jzny.localdomain> <42523FD3.8010400@linux-ipv6.org> Content-Type: multipart/mixed; boundary="=-8p91jRGJUs8TN3Xo5ByK" Organization: jamalopolous Message-Id: <1112696301.1089.30.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 06:18:22 -0400 X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 1397 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-8p91jRGJUs8TN3Xo5ByK Content-Type: text/plain Content-Transfer-Encoding: 7bit On Tue, 2005-04-05 at 03:35, Masahide NAKAMURA wrote: > Hello Jamal, [..] > I've tested normal cases below with the latest patch and it works fine. > I think you can go ahead. > > tested cases: > o netlink (using iproute2 "ip xfrm monitor" to confirm it) > - add/del/flush/expire for SA/SP > - acquire,allocspi,update for SA > - update for SP > o pfkey > - running racoon > o both sockets > - running racoon with using "ip xfrm monitor". > Thanks a lot Masahide! Ok, heres the patch i will shoot to Dave if no further comments. cheers, jamal --=-8p91jRGJUs8TN3Xo5ByK Content-Disposition: attachment; filename=ipsec-event-take2-5 Content-Type: text/plain; name=ipsec-event-take2-5; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-03-25 22:28:26.000000000 -0500 +++ b/include/net/xfrm.h 2005-04-02 11:59:17.000000000 -0500 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 
+308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int (*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -802,7 +827,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state *xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-25 22:28:39.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-02 09:53:03.000000000 -0500 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-03-25 22:28:25.000000000 -0500 +++ b/net/xfrm/xfrm_state.c 2005-04-04 18:22:32.000000000 -0400 @@ -48,7 +48,7 @@ static struct list_head xfrm_state_gc_list = LIST_HEAD_INIT(xfrm_state_gc_list); static DEFINE_SPINLOCK(xfrm_state_gc_lock); -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void 
xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); @@ -208,8 +208,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int __xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -236,14 +238,21 @@ * is what we are dropping here. */ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +int xfrm_state_delete(struct xfrm_state *x) { + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -402,6 +411,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -767,34 +777,60 @@ static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); static DEFINE_RWLOCK(xfrm_km_lock); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct xfrm_mgr *km; - if (hard) - x->km.state = XFRM_STATE_EXPIRED; - else - x->km.dying = 1; + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock(&xfrm_km_lock); +} +void km_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); + if (km->notify) + km->notify(x, c); read_unlock(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +static void km_state_expired(struct xfrm_state *x, int hard) +{ + struct km_event c; + + if (hard) + x->km.state = XFRM_STATE_EXPIRED; + else + x->km.dying = 1; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); 
} +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -819,13 +855,12 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_user.c 2005-03-25 22:28:22.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-04 18:36:44.000000000 -0400 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -277,6 +278,7 @@ if (!x) return err; + xfrm_state_hold(x); if (nlh->nlmsg_type == XFRM_MSG_NEWSA) err = xfrm_state_add(x); else @@ -285,14 +287,27 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + xfrm_state_put(x); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +319,19 @@ return 
-EPERM; } - xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -335,6 +359,7 @@ int this_idx; }; + static int dump_one_state(struct xfrm_state *x, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -672,6 +697,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +709,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +720,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +847,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +876,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +890,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff 
*skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1113,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) { struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1130,122 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int inline xfrm_sa_len(struct xfrm_state *x) +{ + int l = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + if (x->aalg) + l+= RTA_SPACE(sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8); + if (x->ealg) + l+= RTA_SPACE(sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8); + if (x->calg) + l+= RTA_SPACE(sizeof(*(x->calg))); + if (x->encap) + l+= RTA_SPACE(sizeof(*x->encap)); + + return l; +} + +static int xfrm_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = xfrm_sa_len(x); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if 
(c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + if (x->aalg) + RTA_PUT(skb, XFRMA_ALG_AUTH, + sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg); + if (x->ealg) + RTA_PUT(skb, XFRMA_ALG_CRYPT, + sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg); + if (x->calg) + RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg); + + if (x->encap) + RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: +rtattr_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("netlink: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1379,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1391,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1399,93 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int 
xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); + len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + if (copy_to_user_tmpl(xp, skb) < 0) + goto nlmsg_failure; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("Netlink Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = 
"netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-03-25 22:28:39.000000000 -0500 +++ b/net/key/af_key.c 2005-04-04 18:45:48.000000000 -0400 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1254,6 +1326,7 @@ if (IS_ERR(x)) return PTR_ERR(x); + xfrm_state_hold(x); if (hdr->sadb_msg_type == 
SADB_ADD) err = xfrm_state_add(x); else @@ -1265,27 +1338,23 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? */ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1370,19 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1520,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + hdr->sadb_msg_satype = pfkey_proto2satype(c->data); + 
hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1948,35 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + + if (c->data && c->event == XFRM_SAP_DELETED) + out_hdr->sadb_msg_type = SADB_X_SPDDELETE2; + else + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct 
sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1984,7 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2052,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2084,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,25 +2120,41 @@ err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + + xfrm_pol_put(xp); + return err; +} + + +static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, struct sadb_msg *hdr, int dir) +{ 
+ int err; + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + err = 0; + out_skb = pfkey_xfrm_policy2msg_prep(xp); if (IS_ERR(out_skb)) { err = PTR_ERR(out_skb); goto out; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + pfkey_xfrm_policy2msg(out_skb, xp, dir); out_hdr = (struct sadb_msg *) out_skb->data; out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; + out_hdr->sadb_msg_type = hdr->sadb_msg_type; out_hdr->sadb_msg_satype = 0; out_hdr->sadb_msg_errno = 0; out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk); err = 0; out: - xfrm_pol_put(xp); return err; } @@ -2037,8 +2163,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2175,16 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.data = 1; // to signal pfkey of SADB_X_SPDDELETE2 + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + } else { + err = key_pol_get_resp(sk, xp, hdr, pol->sadb_x_policy_dir-1); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; - -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2219,33 @@ return 
xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2445,24 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2481,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3034,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-8p91jRGJUs8TN3Xo5ByK-- From herbert@gondor.apana.org.au Tue Apr 5 03:23:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:23:45 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35ANZt0001243 for ; Tue, 5 Apr 2005 03:23:36 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIlD3-0004mZ-00; Tue, 05 Apr 2005 20:23:01 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIlCh-0006PT-00; Tue, 05 Apr 2005 20:22:39 +1000 Date: Tue, 5 Apr 2005 
20:22:39 +1000 To: jamal Cc: Masahide NAKAMURA , "David S. Miller" , kaber@trash.net, netdev Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events Message-ID: <20050405102238.GC23226@gondor.apana.org.au> References: <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> <1112654575.1089.17.camel@jzny.localdomain> <42523FD3.8010400@linux-ipv6.org> <1112696301.1089.30.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112696301.1089.30.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1398 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Tue, Apr 05, 2005 at 06:18:22AM -0400, jamal wrote: > > Ok, heres the patch i will shoot to Dave if no further comments. Thanks for your great work Jamal. 
Signed-off-by: Herbert Xu -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Tue Apr 5 03:25:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:25:54 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35APj8i002075 for ; Tue, 5 Apr 2005 03:25:45 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DIlFe-0002Tz-9N for netdev@oss.sgi.com; Tue, 05 Apr 2005 06:25:42 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIlFY-0002hw-9d; Tue, 05 Apr 2005 06:25:36 -0400 Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function From: jamal Reply-To: hadi@cyberus.ca To: Wang Jian Cc: netdev In-Reply-To: <20050405140342.024A.LARK@linux.net.cn> References: <20050405133336.0247.LARK@linux.net.cn> <20050404223744.1f04c130.davem@davemloft.net> <20050405140342.024A.LARK@linux.net.cn> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112696733.1088.33.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 06:25:34 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1399 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Wang, I read that thread and i am a little confused. What is this change supposed to improve? cheers, jamal On Tue, 2005-04-05 at 02:05, Wang Jian wrote: > Hi David S. Miller, > > New patch attached. Hashsize is 256, the same as old one. > > > On Mon, 4 Apr 2005 22:37:44 -0700, "David S. 
Miller" wrote: > > > On Tue, 05 Apr 2005 13:35:02 +0800 > > Wang Jian wrote: > > > > > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html > > > > > > I chose 509 for FW_FILTER_HSIZE. If you feel it is a waste of memory, then > > > 251 is good too. > > > > Please use a power of two, the "%" is expensive on some cpus. > > From hadi@cyberus.ca Tue Apr 5 03:35:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:35:29 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35AZOBl003011 for ; Tue, 5 Apr 2005 03:35:24 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIlOz-0004mJ-1m for netdev@oss.sgi.com; Tue, 05 Apr 2005 06:35:21 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIlOu-0003Uc-VC; Tue, 05 Apr 2005 06:35:17 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Masahide NAKAMURA , "David S. 
Miller" , kaber@trash.net, netdev In-Reply-To: <20050405102238.GC23226@gondor.apana.org.au> References: <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> <1112654575.1089.17.camel@jzny.localdomain> <42523FD3.8010400@linux-ipv6.org> <1112696301.1089.30.camel@jzny.localdomain> <20050405102238.GC23226@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112697315.1095.36.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 06:35:15 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1400 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2005-04-05 at 06:22, Herbert Xu wrote: > > Thanks for your great work Jamal. > Well, thanks to you for the shepherding and to Masahide-san for the testing and bugs found. 
cheers, jamal From tgraf@suug.ch Tue Apr 5 03:38:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:38:13 -0700 (PDT) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35Ac8I0003711 for ; Tue, 5 Apr 2005 03:38:09 -0700 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id D1B6385; Tue, 5 Apr 2005 12:37:44 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id 1D1091C0EA; Tue, 5 Apr 2005 12:38:27 +0200 (CEST) Date: Tue, 5 Apr 2005 12:38:27 +0200 From: Thomas Graf To: Wang Jian Cc: netdev@oss.sgi.com Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Message-ID: <20050405103827.GL26731@postel.suug.ch> References: <20050405133336.0247.LARK@linux.net.cn> <20050404223744.1f04c130.davem@davemloft.net> <20050405140342.024A.LARK@linux.net.cn> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050405140342.024A.LARK@linux.net.cn> X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1401 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05 > New patch attached. Hashsize is 256, the same as old one. Do you have any numbers that could prove that this change actually improves the hash distribution and thus the overall lookup performance? The most often used and thus most important range of mark values is definitely 0..255. 
I did not look into jhash but the risk of collisions definitely increases with this change which affects about 90% of the users of fw which could benefit from a collision free hashtable so far. I would appreciate if you could provide some numbers proving both the need and actual improvement of this change since fwmark is one of the most often used classifiers. Cheers From herbert@gondor.apana.org.au Tue Apr 5 03:40:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:40:34 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35AePYX004380 for ; Tue, 5 Apr 2005 03:40:26 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DIlTC-0004wy-00; Tue, 05 Apr 2005 20:39:42 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DIlSo-0006Tl-00; Tue, 05 Apr 2005 20:39:18 +1000 Date: Tue, 5 Apr 2005 20:39:18 +1000 To: Patrick McHardy Cc: "David S. 
Miller" , kuznet@ms2.inr.ac.ru, jmorris@redhat.com, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [IPSEC]: Kill nested read lock by deleting xfrm_init_tempsel Message-ID: <20050405103918.GA24863@gondor.apana.org.au> References: <20050214221607.GC18465@gondor.apana.org.au> <424864CE.5060802@trash.net> <20050328233917.GB15369@gondor.apana.org.au> <424B40C2.90304@trash.net> <20050331004658.GA26395@gondor.apana.org.au> <20050331212325.5e996432.davem@davemloft.net> <20050402004956.GA24339@gondor.apana.org.au> <20050401172007.7296eced.davem@davemloft.net> <20050402020947.GA24998@gondor.apana.org.au> <42501E51.3000401@trash.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42501E51.3000401@trash.net> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1402 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Sun, Apr 03, 2005 at 06:48:17PM +0200, Patrick McHardy wrote: > > Agreed. There is also a bug in my patch, tmpl->daddr can be 0 in which > case the daddr passed as an argument to xfrm_state_find() will be used. > My patch only checked tmpl->daddr, this patch fixes it. It also uses Why not just use daddr? It's always guaranteed to be correct. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Tue Apr 5 03:45:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 03:45:07 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35Aj2bC005150 for ; Tue, 5 Apr 2005 03:45:02 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DIlYI-0007I5-VX for netdev@oss.sgi.com; Tue, 05 Apr 2005 06:44:58 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIlYA-0004Ob-HL; Tue, 05 Apr 2005 06:44:50 -0400 Subject: Re: Netlink Connector / CBUS From: jamal Reply-To: hadi@cyberus.ca To: johnpol@2ka.mipt.ru Cc: Herbert Xu , linux-kernel@vger.kernel.org, netdev , "David S. Miller" , James Morris , rml@novell.com, Greg KH , Andrew Morton In-Reply-To: <1112686480.28858.17.camel@uganda> References: <1112686480.28858.17.camel@uganda> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112697888.1089.44.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 06:44:48 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1403 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev To be fair to Evgeniy I am not against the Konnector idea. I think that it is a useful feature to have an easy to use messaging between kernel-kernel and kernel-userspace. The fact that he leveraged netlink instead of inventing things is a bonus. 
Having said that i have not seriously scrutinized the code - and i think the idea of this new thing hes tossing around called CBUS maybe pushing it. cheers, jamal On Tue, 2005-04-05 at 03:34, Evgeniy Polyakov wrote: > On Tue, 2005-04-05 at 01:10 -0400, Herbert Xu wrote: > >On Tue, Apr 05, 2005 at 11:03:16AM +0400, Evgeniy Polyakov wrote: > >> > >> I received comments and feature requests from Herbert Xu and Jamal Hadi > >> Salim, > >> almost all were successfully resolved. > > > >Please do not construe my involvement in these threads as endorsement > >for this system. > > Sure. > I remember you are against it :). > > >In fact to this day I still don't understand what problems this thing is > >meant to solve. > > Hmm, what else can I add to my words? > May be checking the size of the code needed to broadcast kobject changes > in kobject_uevent.c for example... > Netlink socket allocation + skb handling against call to cn_netlink_send(). > > >-- > >Visit Openswan at http://www.openswan.org/ > >Email: Herbert Xu ~{PmV>HI~} > >Home Page: http://gondor.apana.org.au/~herbert/ > >PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > From hadi@cyberus.ca Tue Apr 5 04:00:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 04:00:18 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35B0AbL006656 for ; Tue, 5 Apr 2005 04:00:10 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIlmw-0000Dq-M5 for netdev@oss.sgi.com; Tue, 05 Apr 2005 07:00:06 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIlmt-0005tS-IV; Tue, 05 Apr 2005 07:00:03 -0400 Subject: Re: Netlink Connector / CBUS From: jamal Reply-To: hadi@cyberus.ca To: johnpol@2ka.mipt.ru Cc: Herbert Xu , linux-kernel@vger.kernel.org, netdev , "David S. 
Miller" , James Morris , rml@novell.com, Greg KH , Andrew Morton In-Reply-To: <1112697888.1089.44.camel@jzny.localdomain> References: <1112686480.28858.17.camel@uganda> <1112697888.1089.44.camel@jzny.localdomain> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112698800.1088.50.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 07:00:00 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/804/Mon Apr 4 07:38:58 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1404 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev and, oh yeah - wheres the documentation Evgeniy? ;-> cheers, jamal On Tue, 2005-04-05 at 06:44, jamal wrote: > To be fair to Evgeniy I am not against the Konnector idea. I think that > it is a useful feature to have an easy to use messaging between > kernel-kernel and kernel-userspace. The fact that he leveraged netlink > instead of inventing things is a bonus. Having said that i have not > seriously scrutinized the code - and i think the idea of this new thing > hes tossing around called CBUS maybe pushing it. > > cheers, > jamal > > On Tue, 2005-04-05 at 03:34, Evgeniy Polyakov wrote: > > On Tue, 2005-04-05 at 01:10 -0400, Herbert Xu wrote: > > >On Tue, Apr 05, 2005 at 11:03:16AM +0400, Evgeniy Polyakov wrote: > > >> > > >> I received comments and feature requests from Herbert Xu and Jamal Hadi > > >> Salim, > > >> almost all were successfully resolved. > > > > > >Please do not construe my involvement in these threads as endorsement > > >for this system. > > > > Sure. > > I remember you are against it :). > > > > >In fact to this day I still don't understand what problems this thing is > > >meant to solve. > > > > Hmm, what else can I add to my words? 
> > May be checking the size of the code needed to broadcast kobject changes > > in kobject_uevent.c for example... > > Netlink socket allocation + skb handling against call to cn_netlink_send(). > > > > >-- > > >Visit Openswan at http://www.openswan.org/ > > >Email: Herbert Xu ~{PmV>HI~} > > >Home Page: http://gondor.apana.org.au/~herbert/ > > >PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > > > > > From arnaldo.melo@gmail.com Tue Apr 5 04:13:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 04:14:15 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.199]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35BDv59008359 for ; Tue, 5 Apr 2005 04:13:57 -0700 Received: by wproxy.gmail.com with SMTP id 68so1874054wri for ; Tue, 05 Apr 2005 04:13:51 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=gw79SUFsSKIHkAWPOipy04BOGlVHOlpQ0VcpwMp7pBJ+/DZwXKfmCmi7fF3zH44RNNHtzrphbs1FPihkDpGzF76mcKYphrl3Zf/dd+evoOi8pwARwm6bm77U+NBF4gY5z0Mebew2GSnBAJJoEEkkEm9HosQCZgyowUHEB2cakvo= Received: by 10.54.32.33 with SMTP id f33mr791485wrf; Tue, 05 Apr 2005 04:13:51 -0700 (PDT) Received: by 10.54.72.15 with HTTP; Tue, 5 Apr 2005 04:13:51 -0700 (PDT) Message-ID: <39e6f6c70504050413666ea29d@mail.gmail.com> Date: Tue, 5 Apr 2005 08:13:51 -0300 From: Arnaldo Carvalho de Melo Reply-To: acme@conectiva.com.br To: Marcel Holtmann Subject: Re: Some sleeping function called from invalid context Cc: Network Development Mailing List In-Reply-To: <1112693744.7960.2.camel@pegasus> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <1112693744.7960.2.camel@pegasus> X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1405 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arnaldo.melo@gmail.com Precedence: bulk X-list: netdev On Apr 5, 2005 6:35 AM, Marcel Holtmann wrote: > Hi, > > while testing the latest kernel from the Bitkeeper repository, I got > some sleeping functions called from invalid context: > > Freeing unused kernel memory: 180k freed > Debug: sleeping function called from invalid context at mm/slab.c:2090 > in_atomic():1, irqs_disabled():0 > [] __might_sleep+0xa6/0xb0 > [] kmem_cache_alloc+0x73/0x80 > [] kmem_cache_create+0xfe/0x630 > [] proto_register+0x9d/0xc0 > [] af_unix_init+0x1c/0x7a [unix] > [] sys_init_module+0x1b2/0x290 > [] syscall_call+0x7/0xb > NET: Registered protocol family 1 Damn, thanks for reporting, looking at it now. - Arnaldo From johnpol@2ka.mipt.ru Tue Apr 5 04:20:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 04:20:27 -0700 (PDT) Received: from vocord.com (dea.vocord.ru [217.67.177.50]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35BKHik012593 for ; Tue, 5 Apr 2005 04:20:18 -0700 Received: from uganda.factory.vocord.ru (uganda.factory.vocord.ru [192.168.0.48]) by vocord.com (8.13.1/8.13.1) with ESMTP id j35BIZju004751; Tue, 5 Apr 2005 15:18:36 +0400 Subject: Re: Netlink Connector / CBUS From: Evgeniy Polyakov Reply-To: johnpol@2ka.mipt.ru To: hadi@cyberus.ca Cc: Herbert Xu , linux-kernel@vger.kernel.org, netdev , "David S. 
Miller" , James Morris , rml@novell.com, Greg KH , Andrew Morton In-Reply-To: <1112698800.1088.50.camel@jzny.localdomain> References: <1112686480.28858.17.camel@uganda> <1112697888.1089.44.camel@jzny.localdomain> <1112698800.1088.50.camel@jzny.localdomain> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-sSnCz6Q18Ygv+8vLJ7Uc" Organization: MIPT Date: Tue, 05 Apr 2005 15:25:22 +0400 Message-Id: <1112700322.28858.42.camel@uganda> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-2) X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Scanned: ClamAV 0.80/762/Mon Mar 14 02:35:33 2005 clamav-milter version 0.80j on dea.vocord.com X-Virus-Status: Clean X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (vocord.com [192.168.0.1]); Tue, 05 Apr 2005 15:18:39 +0400 (MSD) X-archive-position: 1406 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev --=-sSnCz6Q18Ygv+8vLJ7Uc Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Tue, 2005-04-05 at 07:00 -0400, jamal wrote: > and, oh yeah - wheres the documentation Evgeniy? ;-> In the tree :) Documentation/connector/connector.txt - some notes, API. Documentation/connector/cn_test.c - kernel example. Uses cn_netlink_send(), notification feature. I will send a patch today that adds in-source documentation bits with some code cleanups. > cheers, > jamal > > On Tue, 2005-04-05 at 06:44, jamal wrote: > > To be fair to Evgeniy I am not against the Konnector idea. I think that > > it is a useful feature to have an easy to use messaging between > > kernel-kernel and kernel-userspace. The fact that he leveraged netlink > > instead of inventing things is a bonus. 
Having said that i have not > > seriously scrutinized the code - and i think the idea of this new thing > > hes tossing around called CBUS maybe pushing it. > > > > cheers, > > jamal > > > > On Tue, 2005-04-05 at 03:34, Evgeniy Polyakov wrote: > > > On Tue, 2005-04-05 at 01:10 -0400, Herbert Xu wrote: > > > >On Tue, Apr 05, 2005 at 11:03:16AM +0400, Evgeniy Polyakov wrote: > > > >> > > > >> I received comments and feature requests from Herbert Xu and Jamal Hadi > > > >> Salim, > > > >> almost all were successfully resolved. > > > > > > > >Please do not construe my involvement in these threads as endorsement > > > >for this system. > > > > > > Sure. > > > I remember you are against it :). > > > > > > >In fact to this day I still don't understand what problems this thing is > > > >meant to solve. > > > > > > Hmm, what else can I add to my words? > > > May be checking the size of the code needed to broadcast kobject changes > > > in kobject_uevent.c for example... > > > Netlink socket allocation + skb handling against call to cn_netlink_send().
> > >=20 > > > >--=20 > > > >Visit Openswan at http://www.openswan.org/ > > > >Email: Herbert Xu ~{PmV>HI~} > > > >Home Page: http://gondor.apana.org.au/~herbert/ > > > >PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > > >=20 > >=20 > >=20 > >=20 --=20 Evgeniy Polyakov Crash is better than data corruption -- Arthur Grabowski --=-sSnCz6Q18Ygv+8vLJ7Uc Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQBCUnWiIKTPhE+8wY0RAlXXAJ9cqMiWKTv+jyUGIgqYjppnwYvvlACfYXx7 uJSA7Zm+fMplyqvjC2bt38w= =f8I/ -----END PGP SIGNATURE----- --=-sSnCz6Q18Ygv+8vLJ7Uc-- From lark@linux.net.cn Tue Apr 5 04:26:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 04:26:07 -0700 (PDT) Received: from mx.linux.net.cn ([211.100.11.220]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35BPthO013417 for ; Tue, 5 Apr 2005 04:25:59 -0700 Received: from localhost (master.linux.net.cn [127.0.0.1]) by mx.linux.net.cn (Postfix) with ESMTP id 8B7B63EE49; Tue, 5 Apr 2005 19:25:50 +0800 (CST) Received: from mx.linux.net.cn ([127.0.0.1]) by localhost (master.linux.net.cn [127.0.0.1]) (amavisd-new, port 10025) with LMTP id 26513-02-2; Tue, 5 Apr 2005 19:25:46 +0800 (CST) Received: from [192.168.0.120] (unknown [61.51.151.86]) by mx.linux.net.cn (Postfix) with ESMTP id D78473EE29; Tue, 5 Apr 2005 19:25:45 +0800 (CST) Date: Tue, 05 Apr 2005 19:25:45 +0800 From: Wang Jian To: Thomas Graf Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Cc: netdev@oss.sgi.com, jamal In-Reply-To: <20050405103827.GL26731@postel.suug.ch> References: <20050405140342.024A.LARK@linux.net.cn> <20050405103827.GL26731@postel.suug.ch> Message-Id: <20050405190024.024D.LARK@linux.net.cn> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" X-Mailer: Becky! ver. 
2.20 [CN] X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Scanned: amavisd-new at linux.net.cn X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j35BPthO013417 X-archive-position: 1407 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lark@linux.net.cn Precedence: bulk X-list: netdev Hi Thomas Graf, If you read the thread I pointed to, then you know there is chance that nfmark is used as two 16 bit numbers (along with CONNMARK), and the 16 bit number can be mapped to a classid. This is one of many chances. In that case, nfmark can be used like this 0x00010000 0x00020000 0x00030000 ... 0x00000001 0x00000002 0x00000003 ... The old hash function doesn't expect such pattern. I must admit that I am not very familiar with hash function. I find that and use a quick hack. My patch just points out the existing risk. Anyone can improve this by using a faster and even distributed hash function. And actually, for 256 as hash size, the second patch I sent can be still improved, return (jhash_1word(handle, 0xF30A7129) & 0xFF); instead of return (jhash_1word(handle, 0xF30A7129) % 256); On Tue, 5 Apr 2005 12:38:27 +0200, Thomas Graf wrote: > * Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05 > > New patch attached. Hashsize is 256, the same as old one. > > Do you have any numbers that could prove that this change > actually improves the hash distribution and thus the overall > lookup performance? > > The most often used and thus most important range of mark > values is definitely 0..255. I did not look into jhash > but the risk of collisions definitely increases with this > change which affects about 90% of the users of fw which > could benefit from a collision free hashtable so far. 
> > I would appreciate if you could provide some numbers proving > both the need and actual improvement of this change since > fwmark is one of the most often used classifiers. > > Cheers -- lark From hadi@cyberus.ca Tue Apr 5 04:58:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 04:58:42 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35BwYmd015757 for ; Tue, 5 Apr 2005 04:58:34 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DImhQ-0007Tf-Gi for netdev@oss.sgi.com; Tue, 05 Apr 2005 05:58:28 -0600 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DImhO-0004qp-9J; Tue, 05 Apr 2005 07:58:26 -0400 Subject: Re: take 2-2 WAS(Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: Masahide NAKAMURA , "David S. Miller" , kaber@trash.net, netdev In-Reply-To: <20050405102238.GC23226@gondor.apana.org.au> References: <20050404121641.GA12103@gondor.apana.org.au> <1112619096.1088.473.camel@jzny.localdomain> <20050404130224.GA12546@gondor.apana.org.au> <1112620614.1088.489.camel@jzny.localdomain> <20050404213149.GA15222@gondor.apana.org.au> <1112653217.1088.2.camel@jzny.localdomain> <20050404152506.15e1404b.davem@davemloft.net> <1112654575.1089.17.camel@jzny.localdomain> <42523FD3.8010400@linux-ipv6.org> <1112696301.1089.30.camel@jzny.localdomain> <20050405102238.GC23226@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112702303.1095.107.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 07:58:23 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1408 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2005-04-05 at 06:22, Herbert Xu wrote: > Signed-off-by: Herbert Xu I have a feeling that Dave is not following this thread. All along we have been testing against 2.6.11.6; I just tested against -rc2 and found the patch applies with some fuzz. I fixed that as well as a couple of error messages Masahide didnt like. So please signoff instead the next patch i post in a new thread. cheers, jamal From hadi@cyberus.ca Tue Apr 5 05:03:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:03:40 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35C3W6k016689 for ; Tue, 5 Apr 2005 05:03:32 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DImmH-0008E0-TJ for netdev@oss.sgi.com; Tue, 05 Apr 2005 08:03:29 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DImmE-0005Ln-Dm; Tue, 05 Apr 2005 08:03:26 -0400 Subject: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: Herbert Xu , Masahide NAKAMURA , kaber@trash.net, netdev Content-Type: multipart/mixed; boundary="=-/PvpckXwtyeuSTLx4Mw9" Organization: jamalopolous Message-Id: <1112702604.1089.119.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 08:03:24 -0400 X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1409 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-/PvpckXwtyeuSTLx4Mw9 Content-Type: text/plain Content-Transfer-Encoding: 7bit Dave, Heres the final patch. 
What this patch provides - netlink xfrm events - ability to have events generated by netlink propagated to pfkey and vice versa. - fixes the acquire lets-be-happy-with-one-success issue cheers, jamal --=-/PvpckXwtyeuSTLx4Mw9 Content-Disposition: attachment; filename=ipsec-event-take2-6 Content-Type: text/plain; name=ipsec-event-take2-6; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- a/include/net/xfrm.h 2005-04-05 07:19:11.000000000 -0400 +++ b/include/net/xfrm.h 2005-04-05 07:29:00.000000000 -0400 @@ -157,6 +157,28 @@ XFRM_STATE_DEAD }; +/* events that could be sent by kernel */ +enum { + XFRM_SAP_INVALID, + XFRM_SAP_EXPIRED, + XFRM_SAP_ADDED, + XFRM_SAP_UPDATED, + XFRM_SAP_DELETED, + XFRM_SAP_FLUSHED, + __XFRM_SAP_MAX +}; +#define XFRM_SAP_MAX (__XFRM_SAP_MAX - 1) + +/* callback structure passed from either netlink or pfkey */ +struct km_event +{ + u32 data; + u32 seq; + u32 pid; + u32 event; +}; + + struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { @@ -178,6 +200,9 @@ extern int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo); extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); +extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); +extern void km_state_notify(struct xfrm_state *x, struct km_event *c); + #define XFRM_ACQ_EXPIRES 30 @@ -283,17 +308,17 @@ struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 struct xfrm_mgr { struct list_head list; char *id; - int (*notify)(struct xfrm_state *x, int event); + int (*notify)(struct xfrm_state *x, struct km_event *c); int (*acquire)(struct xfrm_state *x, struct xfrm_tmpl *, struct xfrm_policy *xp, int dir); struct xfrm_policy *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir); int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); - int (*notify_policy)(struct xfrm_policy *x, int dir, int event); + int (*notify_policy)(struct xfrm_policy *x, 
int dir, struct km_event *c); }; extern int xfrm_register_km(struct xfrm_mgr *km); @@ -805,7 +830,7 @@ extern int xfrm_state_update(struct xfrm_state *x); extern struct xfrm_state *xfrm_state_lookup(xfrm_address_t *daddr, u32 spi, u8 proto, unsigned short family); extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); -extern void xfrm_state_delete(struct xfrm_state *x); +extern int xfrm_state_delete(struct xfrm_state *x); extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); --- a/include/linux/xfrm.h 2005-03-02 02:38:37.000000000 -0500 +++ b/include/linux/xfrm.h 2005-04-05 07:29:00.000000000 -0400 @@ -254,5 +254,7 @@ #define XFRMGRP_ACQUIRE 1 #define XFRMGRP_EXPIRE 2 +#define XFRMGRP_SA 4 +#define XFRMGRP_POLICY 8 #endif /* _LINUX_XFRM_H */ --- a/net/xfrm/xfrm_state.c 2005-04-05 07:19:30.000000000 -0400 +++ b/net/xfrm/xfrm_state.c 2005-04-05 07:29:00.000000000 -0400 @@ -50,7 +50,7 @@ static int xfrm_state_gc_flush_bundles; -static void __xfrm_state_delete(struct xfrm_state *x); +static int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); @@ -215,8 +215,10 @@ } EXPORT_SYMBOL(__xfrm_state_destroy); -static void __xfrm_state_delete(struct xfrm_state *x) +static int __xfrm_state_delete(struct xfrm_state *x) { + int err = -ESRCH; + if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; spin_lock(&xfrm_state_lock); @@ -245,14 +247,21 @@ * is what we are dropping here. 
*/ atomic_dec(&x->refcnt); + err = 0; } + + return err; } -void xfrm_state_delete(struct xfrm_state *x) +int xfrm_state_delete(struct xfrm_state *x) { + int err; + spin_lock_bh(&x->lock); - __xfrm_state_delete(x); + err = __xfrm_state_delete(x); spin_unlock_bh(&x->lock); + + return err; } EXPORT_SYMBOL(xfrm_state_delete); @@ -430,6 +439,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(u32 seq); + int xfrm_state_add(struct xfrm_state *x) { struct xfrm_state_afinfo *afinfo; @@ -795,34 +805,60 @@ static struct list_head xfrm_km_list = LIST_HEAD_INIT(xfrm_km_list); static DEFINE_RWLOCK(xfrm_km_lock); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct xfrm_mgr *km; - if (hard) - x->km.state = XFRM_STATE_EXPIRED; - else - x->km.dying = 1; + read_lock(&xfrm_km_lock); + list_for_each_entry(km, &xfrm_km_list, list) + if (km->notify_policy) + km->notify_policy(xp, dir, c); + read_unlock(&xfrm_km_lock); +} +void km_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) - km->notify(x, hard); + if (km->notify) + km->notify(x, c); read_unlock(&xfrm_km_lock); +} + +EXPORT_SYMBOL(km_policy_notify); +EXPORT_SYMBOL(km_state_notify); + +static void km_state_expired(struct xfrm_state *x, int hard) +{ + struct km_event c; + + if (hard) + x->km.state = XFRM_STATE_EXPIRED; + else + x->km.dying = 1; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_state_notify(x, &c); if (hard) wake_up(&km_waitq); } +/* + * We send to all registered managers regardless of failure + * We are happy with one success +*/ static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { - int err = -EINVAL; + int err = -EINVAL, acqret; struct xfrm_mgr *km; read_lock(&xfrm_km_lock); list_for_each_entry(km, &xfrm_km_list, list) { - err = km->acquire(x, t, pol, XFRM_POLICY_OUT); - if (!err) - 
break; + acqret = km->acquire(x, t, pol, XFRM_POLICY_OUT); + if (!acqret) + err = acqret; } read_unlock(&xfrm_km_lock); return err; @@ -847,13 +883,12 @@ void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) { - struct xfrm_mgr *km; + struct km_event c; - read_lock(&xfrm_km_lock); - list_for_each_entry(km, &xfrm_km_list, list) - if (km->notify_policy) - km->notify_policy(pol, dir, hard); - read_unlock(&xfrm_km_lock); + c.data = hard; + c.data = hard; + c.event = XFRM_SAP_EXPIRED; + km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); --- a/net/xfrm/xfrm_user.c 2005-03-02 02:38:10.000000000 -0500 +++ b/net/xfrm/xfrm_user.c 2005-04-05 07:47:45.000000000 -0400 @@ -268,6 +268,7 @@ struct xfrm_usersa_info *p = NLMSG_DATA(nlh); struct xfrm_state *x; int err; + struct km_event c; err = verify_newsa_info(p, (struct rtattr **) xfrma); if (err) @@ -277,6 +278,7 @@ if (!x) return err; + xfrm_state_hold(x); if (nlh->nlmsg_type == XFRM_MSG_NEWSA) err = xfrm_state_add(x); else @@ -285,14 +287,27 @@ if (err < 0) { x->km.state = XFRM_STATE_DEAD; xfrm_state_put(x); + return err; } + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + if (nlh->nlmsg_type == XFRM_MSG_NEWSA) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + + km_state_notify(x, &c); + xfrm_state_put(x); + return err; } static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { struct xfrm_state *x; + int err; + struct km_event c; struct xfrm_usersa_id *p = NLMSG_DATA(nlh); x = xfrm_state_lookup(&p->daddr, p->spi, p->proto, p->family); @@ -304,10 +319,19 @@ return -EPERM; } - xfrm_state_delete(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); xfrm_state_put(x); - return 0; + return err; } static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) @@ -335,6 +359,7 @@ int this_idx; }; + 
static int dump_one_state(struct xfrm_state *x, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -672,6 +697,7 @@ { struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh); struct xfrm_policy *xp; + struct km_event c; int err; int excl; @@ -683,6 +709,10 @@ if (!xp) return err; + /* shouldnt excl be based on nlh flags?? + * Aha! this is anti-netlink really i.e more pfkey derived + * in netlink excl is a flag and you wouldnt need + * a type XFRM_MSG_UPDPOLICY - JHS */ excl = nlh->nlmsg_type == XFRM_MSG_NEWPOLICY; err = xfrm_policy_insert(p->dir, xp, excl); if (err) { @@ -690,6 +720,16 @@ return err; } + + if (!excl) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; + + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); + xfrm_pol_put(xp); return 0; @@ -807,8 +847,10 @@ struct xfrm_policy *xp; struct xfrm_userpolicy_id *p; int err; + struct km_event c; int delete; + p = NLMSG_DATA(nlh); delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY; @@ -834,6 +876,11 @@ NETLINK_CB(skb).pid, MSG_DONTWAIT); } + } else { + c.event = XFRM_SAP_DELETED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(xp, p->dir, &c); } xfrm_pol_put(xp); @@ -843,15 +890,28 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; struct xfrm_usersa_flush *p = NLMSG_DATA(nlh); xfrm_state_flush(p->proto); + c.data = p->proto; + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_state_notify(NULL, &c); + return 0; } static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct km_event c; + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + km_policy_notify(NULL, 0, &c); return 0; } @@ -1053,10 +1113,11 @@ return -1; } -static int xfrm_send_state_notify(struct xfrm_state *x, int hard) +static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) { 
struct sk_buff *skb; - + int hard = c ->data; + /* fix to do alloc using NLM macros */ skb = alloc_skb(sizeof(struct xfrm_user_expire) + 16, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1069,6 +1130,122 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_sa_flush(struct km_event *c) +{ + struct xfrm_usersa_flush *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, + XFRM_MSG_FLUSHSA, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + p->proto = c->data; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int inline xfrm_sa_len(struct xfrm_state *x) +{ + int l = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info)); + if (x->aalg) + l+= RTA_SPACE(sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8); + if (x->ealg) + l+= RTA_SPACE(sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8); + if (x->calg) + l+= RTA_SPACE(sizeof(*(x->calg))); + if (x->encap) + l+= RTA_SPACE(sizeof(*x->encap)); + + return l; +} + +static int xfrm_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_usersa_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt; + unsigned char *b; + int len = xfrm_sa_len(x); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWSA; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDSA; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELSA; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + copy_to_user_state(x, p); + + if (x->aalg) + RTA_PUT(skb, XFRMA_ALG_AUTH, + 
sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg); + if (x->ealg) + RTA_PUT(skb, XFRMA_ALG_CRYPT, + sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg); + if (x->calg) + RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg); + + if (x->encap) + RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_SA, GFP_ATOMIC); + +nlmsg_failure: +rtattr_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_state_notify(struct xfrm_state *x, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_EXPIRED: + return xfrm_exp_state_notify(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_ADDED: + return xfrm_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_sa_flush(c); + default: + printk("xfrm_user: Unknown SA event %d\n",c->event); + break; + } + + return 0; + +} + static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, struct xfrm_tmpl *xt, struct xfrm_policy *xp, int dir) @@ -1202,7 +1379,8 @@ return -1; } -static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, int hard) + +static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) { struct sk_buff *skb; size_t len; @@ -1213,7 +1391,7 @@ if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, hard) < 0) + if (build_polexpire(skb, xp, dir, c->data) < 0) BUG(); NETLINK_CB(skb).dst_groups = XFRMGRP_EXPIRE; @@ -1221,6 +1399,93 @@ return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct xfrm_userpolicy_info *p; + struct nlmsghdr *nlh; + struct sk_buff *skb; + u32 nlt = 0 ; + unsigned char *b; + int len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); + len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_info)); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = 
skb->tail; + + if (c->event == XFRM_SAP_ADDED) + nlt = XFRM_MSG_NEWPOLICY; + else if (c->event == XFRM_SAP_UPDATED) + nlt = XFRM_MSG_UPDPOLICY; + else if (c->event == XFRM_SAP_DELETED) + nlt = XFRM_MSG_DELPOLICY; + else + goto nlmsg_failure; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, nlt, sizeof(*p)); + + p = NLMSG_DATA(nlh); + + nlh->nlmsg_flags = 0; + + copy_to_user_policy(xp, p, dir); + if (copy_to_user_tmpl(xp, skb) < 0) + goto nlmsg_failure; + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_notify_policy_flush(struct km_event *c) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned char *b; + int len = NLMSG_LENGTH(0); + + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + b = skb->tail; + + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0); + + nlh->nlmsg_len = skb->tail - b; + + return netlink_broadcast(xfrm_nl, skb, 0, XFRMGRP_POLICY, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -1; +} + +static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + + switch (c->event) { + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + case XFRM_SAP_DELETED: + return xfrm_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return xfrm_notify_policy_flush(c); + case XFRM_SAP_EXPIRED: + return xfrm_exp_policy_notify(xp, dir, c); + default: + printk("xfrm_user: Unknown Policy event %d\n",c->event); + } + + return 0; + +} + static struct xfrm_mgr netlink_mgr = { .id = "netlink", .notify = xfrm_send_state_notify, --- a/net/key/af_key.c 2005-04-05 07:19:26.000000000 -0400 +++ b/net/key/af_key.c 2005-04-05 07:48:31.000000000 -0400 @@ -1240,13 +1240,85 @@ return 0; } +static inline int event2poltype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_X_SPDDELETE; + case XFRM_SAP_ADDED: + return SADB_X_SPDADD; + case XFRM_SAP_UPDATED: + return 
SADB_X_SPDUPDATE; + case XFRM_SAP_EXPIRED: + // return SADB_X_SPDEXPIRE; + default: + printk("pfkey: Unknown policy event %d\n",event); + break; + } + + return 0; +} + +static inline int event2keytype (int event) +{ + switch (event) { + case XFRM_SAP_DELETED: + return SADB_DELETE; + case XFRM_SAP_ADDED: + return SADB_ADD; + case XFRM_SAP_UPDATED: + return SADB_UPDATE; + case XFRM_SAP_EXPIRED: + return SADB_EXPIRE; + default: + printk("pfkey: Unknown SA event %d\n",event); + break; + } + + return 0; +} + +/* ADD/UPD/DEL */ +static int key_notify_sa(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + int hsc = 3; + + if (c->event == XFRM_SAP_DELETED) + hsc = 0; + + if (c->event == XFRM_SAP_EXPIRED) { + if (c->data) + hsc = 2; + else + hsc = 1; + } + + skb = pfkey_xfrm_state2msg(x, 0, hsc); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + hdr = (struct sadb_msg *) skb->data; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_type = event2keytype(c->event); + hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); + hdr->sadb_msg_errno = 0; + hdr->sadb_msg_reserved = 0; + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + + pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_state *x; int err; + struct km_event c; xfrm_probe_algs(); @@ -1254,6 +1326,7 @@ if (IS_ERR(x)) return PTR_ERR(x); + xfrm_state_hold(x); if (hdr->sadb_msg_type == SADB_ADD) err = xfrm_state_add(x); else @@ -1265,27 +1338,23 @@ return err; } - out_skb = pfkey_xfrm_state2msg(x, 0, 3); - if (IS_ERR(out_skb)) - return PTR_ERR(out_skb); /* XXX Should we return 0 here ? 
*/ - - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = pfkey_proto2satype(x->id.proto); - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_reserved = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + if (hdr->sadb_msg_type == SADB_ADD) + c.event = XFRM_SAP_ADDED; + else + c.event = XFRM_SAP_UPDATED; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_delete(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { struct xfrm_state *x; + struct km_event c; + int err; if (!ext_hdrs[SADB_EXT_SA-1] || !present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], @@ -1301,13 +1370,19 @@ return -EPERM; } - xfrm_state_delete(x); - xfrm_state_put(x); + err = xfrm_state_delete(x); + if (err < 0) { + xfrm_state_put(x); + return err; + } - pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_ALL, sk); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_state_notify(x, &c); + xfrm_state_put(x); - return 0; + return err; } static int pfkey_get(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) @@ -1445,28 +1520,42 @@ return 0; } +static int key_notify_sa_flush(struct km_event *c) +{ + struct sk_buff *skb; + struct sadb_msg *hdr; + + skb = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); + if (!skb) + return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb, sizeof(struct sadb_msg)); + hdr->sadb_msg_satype = pfkey_proto2satype(c->data); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + + 
pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL); + + return 0; +} + static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { unsigned proto; - struct sk_buff *skb_out; - struct sadb_msg *hdr_out; + struct km_event c; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) return -EINVAL; - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); - if (!skb_out) - return -ENOBUFS; - xfrm_state_flush(proto); - - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); + c.data = proto; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_FLUSHED; + km_state_notify(NULL, &c); return 0; } @@ -1859,6 +1948,35 @@ hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } +static int key_notify_policy( struct xfrm_policy *xp, int dir, struct km_event *c) +{ + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + int err; + + out_skb = pfkey_xfrm_policy2msg_prep(xp); + if (IS_ERR(out_skb)) { + err = PTR_ERR(out_skb); + goto out; + } + pfkey_xfrm_policy2msg(out_skb, xp, dir); + + out_hdr = (struct sadb_msg *) out_skb->data; + out_hdr->sadb_msg_version = PF_KEY_V2; + + if (c->data && c->event == XFRM_SAP_DELETED) + out_hdr->sadb_msg_type = SADB_X_SPDDELETE2; + else + out_hdr->sadb_msg_type = event2poltype(c->event); + out_hdr->sadb_msg_errno = 0; + out_hdr->sadb_msg_seq = c->seq; + out_hdr->sadb_msg_pid = c->pid; + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL); +out: + return 0; + +} + static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { int err; @@ -1866,8 +1984,7 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct 
km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1935,31 +2052,25 @@ (err = parse_ipsecrequests(xp, pol)) < 0) goto out; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; - } err = xfrm_policy_insert(pol->sadb_x_policy_dir-1, xp, hdr->sadb_msg_type != SADB_X_SPDUPDATE); + if (err) { - kfree_skb(out_skb); - goto out; + kfree(xp); + return err; } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + if (hdr->sadb_msg_type == SADB_X_SPDUPDATE) + c.event = XFRM_SAP_UPDATED; + else + c.event = XFRM_SAP_ADDED; - xfrm_pol_put(xp); + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + xfrm_pol_put(xp); return 0; out: @@ -1973,9 +2084,8 @@ struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct km_event c; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2010,25 +2120,41 @@ err = 0; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + + xfrm_pol_put(xp); + return err; +} + + +static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, struct sadb_msg *hdr, int dir) +{ + int err; + struct sk_buff *out_skb; + struct sadb_msg *out_hdr; + err = 0; + out_skb = pfkey_xfrm_policy2msg_prep(xp); if (IS_ERR(out_skb)) { err = PTR_ERR(out_skb); goto out; } - 
pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); + pfkey_xfrm_policy2msg(out_skb, xp, dir); out_hdr = (struct sadb_msg *) out_skb->data; out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = SADB_X_SPDDELETE; + out_hdr->sadb_msg_type = hdr->sadb_msg_type; out_hdr->sadb_msg_satype = 0; out_hdr->sadb_msg_errno = 0; out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); + pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk); err = 0; out: - xfrm_pol_put(xp); return err; } @@ -2037,8 +2163,7 @@ int err; struct sadb_x_policy *pol; struct xfrm_policy *xp; - struct sk_buff *out_skb; - struct sadb_msg *out_hdr; + struct km_event c; if ((pol = ext_hdrs[SADB_X_EXT_POLICY-1]) == NULL) return -EINVAL; @@ -2050,24 +2175,16 @@ err = 0; - out_skb = pfkey_xfrm_policy2msg_prep(xp); - if (IS_ERR(out_skb)) { - err = PTR_ERR(out_skb); - goto out; + c.seq = hdr->sadb_msg_seq; + c.pid = hdr->sadb_msg_pid; + if (hdr->sadb_msg_type == SADB_X_SPDDELETE2) { + c.data = 1; // to signal pfkey of SADB_X_SPDDELETE2 + c.event = XFRM_SAP_DELETED; + km_policy_notify(xp, pol->sadb_x_policy_dir-1, &c); + } else { + err = key_pol_get_resp(sk, xp, hdr, pol->sadb_x_policy_dir-1); } - pfkey_xfrm_policy2msg(out_skb, xp, pol->sadb_x_policy_dir-1); - out_hdr = (struct sadb_msg *) out_skb->data; - out_hdr->sadb_msg_version = hdr->sadb_msg_version; - out_hdr->sadb_msg_type = hdr->sadb_msg_type; - out_hdr->sadb_msg_satype = 0; - out_hdr->sadb_msg_errno = 0; - out_hdr->sadb_msg_seq = hdr->sadb_msg_seq; - out_hdr->sadb_msg_pid = hdr->sadb_msg_pid; - pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, sk); - err = 0; - -out: xfrm_pol_put(xp); return err; } @@ -2102,22 +2219,33 @@ return xfrm_policy_walk(dump_sp, &data); } -static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +static int key_notify_policy_flush(struct km_event *c) { 
struct sk_buff *skb_out; - struct sadb_msg *hdr_out; - - skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_KERNEL); + struct sadb_msg *hdr; + skb_out = alloc_skb(sizeof(struct sadb_msg) + 16, GFP_ATOMIC); if (!skb_out) return -ENOBUFS; + hdr = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); + hdr->sadb_msg_seq = c->seq; + hdr->sadb_msg_pid = c->pid; + hdr->sadb_msg_version = PF_KEY_V2; + hdr->sadb_msg_errno = (uint8_t) 0; + hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL); + return 0; - xfrm_policy_flush(); +} - hdr_out = (struct sadb_msg *) skb_put(skb_out, sizeof(struct sadb_msg)); - pfkey_hdr_dup(hdr_out, hdr); - hdr_out->sadb_msg_errno = (uint8_t) 0; - hdr_out->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); - pfkey_broadcast(skb_out, GFP_KERNEL, BROADCAST_ALL, NULL); +static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) +{ + struct km_event c; + + xfrm_policy_flush(); + c.event = XFRM_SAP_FLUSHED; + c.pid = hdr->sadb_msg_pid; + c.seq = hdr->sadb_msg_seq; + km_policy_notify(NULL, 0, &c); return 0; } @@ -2317,11 +2445,24 @@ } } -static int pfkey_send_notify(struct xfrm_state *x, int hard) +/* XXX: Noisy for now */ +static int key_notify_policy_expire(struct xfrm_policy *xp, struct km_event *c) +{ + return 0; +} + +static int key_notify_sa_expire(struct xfrm_state *x, struct km_event *c) { struct sk_buff *out_skb; struct sadb_msg *out_hdr; - int hsc = (hard ? 
2 : 1); + int hard; + int hsc; + + hard = c->data; + if (hard) + hsc = 2; + else + hsc = 1; out_skb = pfkey_xfrm_state2msg(x, 0, hsc); if (IS_ERR(out_skb)) @@ -2340,6 +2481,43 @@ return 0; } +static int pfkey_send_notify(struct xfrm_state *x, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_sa_expire(x, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_sa(x, c); + case XFRM_SAP_FLUSHED: + return key_notify_sa_flush(c); + default: + printk("pfkey: Unknown SA event %d\n",c->event); + break; + } + + return 0; +} + +static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c) +{ + switch (c->event) { + case XFRM_SAP_EXPIRED: + return key_notify_policy_expire(xp, c); + case XFRM_SAP_DELETED: + case XFRM_SAP_ADDED: + case XFRM_SAP_UPDATED: + return key_notify_policy(xp, dir, c); + case XFRM_SAP_FLUSHED: + return key_notify_policy_flush(c); + default: + printk("pfkey: Unknown policy event %d\n",c->event); + break; + } + + return 0; +} static u32 get_acqseq(void) { u32 res; @@ -2856,6 +3034,7 @@ .acquire = pfkey_send_acquire, .compile_policy = pfkey_compile_policy, .new_mapping = pfkey_send_new_mapping, + .notify_policy = pfkey_send_policy_notify, }; static void __exit ipsec_pfkey_exit(void) --=-/PvpckXwtyeuSTLx4Mw9-- From herbert@gondor.apana.org.au Tue Apr 5 05:08:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:08:45 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35C8YCq017502 for ; Tue, 5 Apr 2005 05:08:37 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DImqj-0005W8-00; Tue, 05 Apr 2005 22:08:05 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DImq5-0006bF-00; Tue, 05 Apr 2005 22:07:25 +1000 Date: Tue, 5 Apr 2005 
22:07:24 +1000 To: jamal Cc: "David S. Miller" , Masahide NAKAMURA , kaber@trash.net, netdev Subject: Re: PATCH: IPSEC xfrm events Message-ID: <20050405120724.GA25359@gondor.apana.org.au> References: <1112702604.1089.119.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1112702604.1089.119.camel@jzny.localdomain> User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1410 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Tue, Apr 05, 2005 at 08:03:24AM -0400, jamal wrote: > > Heres the final patch. > What this patch provides > > - netlink xfrm events > - ability to have events generated by netlink propagated to pfkey > and vice versa. > - fixes the acquire lets-be-happy-with-one-success issue Jamal you forgot to sign off your own patch :) Anyway this looks good to me. 
Signed-off-by: Herbert Xu -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From tgraf@suug.ch Tue Apr 5 05:15:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:15:52 -0700 (PDT) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35CFkBG018601 for ; Tue, 5 Apr 2005 05:15:47 -0700 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id C74D085; Tue, 5 Apr 2005 14:15:23 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id BFFF81C0EA; Tue, 5 Apr 2005 14:16:05 +0200 (CEST) Date: Tue, 5 Apr 2005 14:16:05 +0200 From: Thomas Graf To: Wang Jian Cc: netdev@oss.sgi.com, jamal Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Message-ID: <20050405121605.GM26731@postel.suug.ch> References: <20050405140342.024A.LARK@linux.net.cn> <20050405103827.GL26731@postel.suug.ch> <20050405190024.024D.LARK@linux.net.cn> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050405190024.024D.LARK@linux.net.cn> X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1411 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * Wang Jian <20050405190024.024D.LARK@linux.net.cn> 2005-04-05 19:25 > If you read the thread I pointed to, then you know there is chance that > nfmark is used as two 16 bit numbers (along with CONNMARK), and the 16 > bit number can be mapped to a classid. This is one of many chances. 
>
> In that case, nfmark can be used like this
>
> 0x00010000
> 0x00020000
> 0x00030000
> ...
>
> 0x00000001
> 0x00000002
> 0x00000003
> ...
>
> The old hash function doesn't expect such pattern.

I'm aware of the problem you're facing: if the lower 8 bits are set
to 0 for a large number of flows, all of those flows get chained in
the first hash bucket.

> I must admit that I am not very familiar with hash function. I find that
> and use a quick hack. My patch just points out the existing risk. Anyone
> can improve this by using a faster and even distributed hash function.

I can't really give you feedback on this since I don't have the
background for this. Theoretically a hash size being a prime would
do better but is stupid regarding slab efficiency.

What I'm worried about is that we lose the zero collisions behaviour
for the most popular use case.

New idea: we make this configurable and allow 3 types of hash functions:
 1) default as-is, perfect for marks 0..255
 2) all bits taken into account (your patch)
 3) bitmask + shift provided by the user just like
    dsmark.

Thoughts?
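
To make the three proposed hash types concrete, here is a rough
user-space sketch. It is not code from cls_fw.c or from the patch under
discussion. The names fw_hash_select, fw_hash_mode and FW_HASH_BUCKETS
are made up for illustration, the 256-bucket table and the "keep the low
8 bits" default are assumptions taken from the discussion above, and the
XOR-fold in type 2 is only a simple stand-in for whatever "all bits"
hash would actually be chosen.

```c
#include <assert.h>
#include <stdint.h>

#define FW_HASH_BUCKETS 256u	/* assumed table size: marks 0..255 map 1:1 */

/* Hypothetical selector for the three proposed hash types. */
enum fw_hash_mode {
	FW_HASH_LOW8,	/* 1) as-is: keep only the low 8 bits */
	FW_HASH_FOLD,	/* 2) every bit of the mark contributes */
	FW_HASH_MASK	/* 3) user-supplied bitmask + shift */
};

/* Map an nfmark to a bucket index in [0, FW_HASH_BUCKETS). */
static uint32_t fw_hash_select(uint32_t mark, enum fw_hash_mode mode,
			       uint32_t mask, unsigned int shift)
{
	uint32_t h = mark;

	switch (mode) {
	case FW_HASH_LOW8:
		break;
	case FW_HASH_FOLD:
		/* XOR-fold 32 -> 16 -> 8 bits; a stand-in, not the patch's hash. */
		h ^= h >> 16;
		h ^= h >> 8;
		break;
	case FW_HASH_MASK:
		assert(shift < 32);	/* shifting a 32-bit value by >= 32 is undefined */
		h = (mark & mask) >> shift;
		break;
	}
	return h & (FW_HASH_BUCKETS - 1);
}
```

With FW_HASH_LOW8, marks 0x00010000 and 0x00020000 both land in bucket
0, which is the chaining problem described above. With FW_HASH_FOLD
they land in buckets 1 and 2, and with FW_HASH_MASK (mask 0x00ff0000,
shift 16) likewise. This particular fold leaves marks 0..255 unchanged,
so the zero-collision case survives in this sketch; a different
"all bits" hash need not have that property.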
From arnaldo.melo@gmail.com Tue Apr 5 05:18:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:18:46 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.205]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35CIcna019303 for ; Tue, 5 Apr 2005 05:18:39 -0700 Received: by wproxy.gmail.com with SMTP id 68so1886518wri for ; Tue, 05 Apr 2005 05:18:33 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=BCdr7gokxxBbrdZXIinEb96Ghwcw4BWSo0g6l7TUcUgfknz1diSYskNAONVXJjEH2u3OGoHHoirqvirHvhlzuxSU3Z72e3rNuZ7XPQOnMdLNNsgqF9Nl/gpDbZl3EapoyMDyW+yGFlAIubBiHh9QnjrwFlPcNk27wxN18kal90k= Received: by 10.54.32.33 with SMTP id f33mr837374wrf; Tue, 05 Apr 2005 05:18:33 -0700 (PDT) Received: by 10.54.72.15 with HTTP; Tue, 5 Apr 2005 05:18:33 -0700 (PDT) Message-ID: <39e6f6c705040505186c1c62ed@mail.gmail.com> Date: Tue, 5 Apr 2005 09:18:33 -0300 From: Arnaldo Carvalho de Melo Reply-To: acme@conectiva.com.br To: Marcel Holtmann Subject: Re: Some sleeping function called from invalid context Cc: Network Development Mailing List In-Reply-To: <39e6f6c70504050413666ea29d@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <1112693744.7960.2.camel@pegasus> <39e6f6c70504050413666ea29d@mail.gmail.com> X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1412 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arnaldo.melo@gmail.com Precedence: bulk X-list: netdev On Apr 5, 2005 8:13 AM, Arnaldo Carvalho de Melo wrote: > On Apr 5, 2005 6:35 AM, Marcel Holtmann wrote: > > Hi, > > > > while testing the latest kernel from the Bitkeeper repository, I got > > some sleeping functions called from invalid 
context: > > > > Freeing unused kernel memory: 180k freed > > Debug: sleeping function called from invalid context at mm/slab.c:2090 > > in_atomic():1, irqs_disabled():0 > > [] __might_sleep+0xa6/0xb0 > > [] kmem_cache_alloc+0x73/0x80 > > [] kmem_cache_create+0xfe/0x630 > > [] proto_register+0x9d/0xc0 > > [] af_unix_init+0x1c/0x7a [unix] > > [] sys_init_module+0x1b2/0x290 > > [] syscall_call+0x7/0xb > > NET: Registered protocol family 1 > > Damn, thanks for reporting, looking at it now. Humm, recent changes in slab.[ch]... I'll try booting with a kernel without proto_register to see if this is some bug introduced by this changeset or if the problem would appear without it, that is my current guess, as we were doing a kmem_cache_create at module __init time before, and it uses SLAB_KERNEL at some point... I.e. with regards to per protocol slab cache creating at module init time we are doing the same thing as before the proto_register changeset, unless I'm missing some obvious thing... - Arnaldo From hadi@cyberus.ca Tue Apr 5 05:19:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:19:20 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35CJEqs019376 for ; Tue, 5 Apr 2005 05:19:14 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIn1T-00007F-06 for netdev@oss.sgi.com; Tue, 05 Apr 2005 08:19:11 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DIn1R-0007LT-4E; Tue, 05 Apr 2005 08:19:09 -0400 Subject: Re: PATCH: IPSEC xfrm events From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: "David S. 
Miller" , Masahide NAKAMURA , kaber@trash.net, netdev In-Reply-To: <20050405120724.GA25359@gondor.apana.org.au> References: <1112702604.1089.119.camel@jzny.localdomain> <20050405120724.GA25359@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112703546.1089.137.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 08:19:06 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1413 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Gah, Ok - I guess i too can be famous Signed-off-by: Jamal Hadi Salim cheers, jamal On Tue, 2005-04-05 at 08:07, Herbert Xu wrote: > > Jamal you forgot to sign off your own patch :) > > Anyway this looks good to me. > > Signed-off-by: Herbert Xu From arnaldo.melo@gmail.com Tue Apr 5 05:24:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:24:40 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.195]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35COZXk020915 for ; Tue, 5 Apr 2005 05:24:36 -0700 Received: by wproxy.gmail.com with SMTP id 68so1887802wri for ; Tue, 05 Apr 2005 05:24:30 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=Bbnu7azeoT+8Z2gPRWhyzlZgKlGUx021VrBrCn5uToCfu3ROD6o5tEBVbu03J7NYokiFqY1BpA+1kqsi7Bp5rYPn50bIg7lTWZ685QH8hWXW1Iqgi7mHQAGfUVTIryVjFfYWNRkrIDLVOobAS2ccIN71DAq4qeE/lWkxa0aLXdU= Received: by 10.54.24.49 with SMTP id 49mr658857wrx; Tue, 05 Apr 2005 05:24:30 -0700 (PDT) Received: by 10.54.72.15 with HTTP; Tue, 5 Apr 2005 05:24:30 -0700 (PDT) Message-ID: <39e6f6c705040505241c03d6ce@mail.gmail.com> Date: Tue, 5 Apr 2005 
09:24:30 -0300 From: Arnaldo Carvalho de Melo Reply-To: acme@conectiva.com.br To: hadi@cyberus.ca Subject: Re: PATCH: IPSEC xfrm events Cc: Herbert Xu , "David S. Miller" , Masahide NAKAMURA , kaber@trash.net, netdev In-Reply-To: <1112703546.1089.137.camel@jzny.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <1112702604.1089.119.camel@jzny.localdomain> <20050405120724.GA25359@gondor.apana.org.au> <1112703546.1089.137.camel@jzny.localdomain> X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1414 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arnaldo.melo@gmail.com Precedence: bulk X-list: netdev On 05 Apr 2005 08:19:06 -0400, jamal wrote: > > Gah, Ok - I guess i too can be famous Or funny! /me runs - Arnaldo From lark@linux.net.cn Tue Apr 5 05:39:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:39:55 -0700 (PDT) Received: from mx.linux.net.cn ([211.100.11.220]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35Cdj6S022240 for ; Tue, 5 Apr 2005 05:39:49 -0700 Received: from localhost (master.linux.net.cn [127.0.0.1]) by mx.linux.net.cn (Postfix) with ESMTP id E753E3EE29; Tue, 5 Apr 2005 20:39:43 +0800 (CST) Received: from mx.linux.net.cn ([127.0.0.1]) by localhost (master.linux.net.cn [127.0.0.1]) (amavisd-new, port 10025) with LMTP id 32277-04-4; Tue, 5 Apr 2005 20:39:41 +0800 (CST) Received: from [192.168.0.120] (unknown [61.51.151.86]) by mx.linux.net.cn (Postfix) with ESMTP id 7AC8C3EC0B; Tue, 5 Apr 2005 20:39:41 +0800 (CST) Date: Tue, 05 Apr 2005 20:39:41 +0800 From: Wang Jian To: Thomas Graf Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Cc: netdev@oss.sgi.com, jamal In-Reply-To: <20050405121605.GM26731@postel.suug.ch> References: <20050405190024.024D.LARK@linux.net.cn> 
<20050405121605.GM26731@postel.suug.ch> Message-Id: <20050405202039.0250.LARK@linux.net.cn> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.20 [CN] X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Scanned: amavisd-new at linux.net.cn X-Virus-Status: Clean X-archive-position: 1415 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lark@linux.net.cn Precedence: bulk X-list: netdev Hi Thomas Graf, On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf wrote: > > What I'm worried about is that we lose the zero collisions behaviour > for the most popular use case. If a web interface is used to generate netfilter/tc rules that use nfmark, then the above assumption is false. nfmark will be used incrementally and wrapped back to 0 somewhere like process id. So zero collision is not likely. When linux's QoS control capability is widely used, such web interface sooner or later comes into being. > New idea: we make this configureable and allow 3 types of hash functions: > 1) default as-is, perfect for marks 0..255 > 2) all bits taken into account (your patch) > 3) bitmask + shift provided by the user just like > dsmark. > > Thoughts? Your suggestion is very considerable. But that needs some more work. And, isn't that some bloated? 
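
The effect of incrementally allocated marks on the bucket chains can be
checked in user space. The following is a rough sketch only: the names
hash_low8, longest_chain and NBUCKETS are invented for illustration, and
the 256-bucket table with a "low 8 bits" hash is an assumption about the
current behaviour, not code taken from cls_fw.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 256u	/* assumed table size */

/* Assumed current behaviour: bucket = low 8 bits of the mark. */
static uint32_t hash_low8(uint32_t mark)
{
	return mark & (NBUCKETS - 1);
}

/*
 * Feed `count` marks (first, first + step, first + 2*step, ...) through
 * `hash` and return the length of the longest bucket chain.
 */
static unsigned int longest_chain(uint32_t (*hash)(uint32_t),
				  uint32_t first, uint32_t step,
				  unsigned int count)
{
	unsigned int chain[NBUCKETS] = { 0 };
	unsigned int i, longest = 0;
	uint32_t mark = first;

	assert(hash != NULL);
	for (i = 0; i < count; i++, mark += step) {
		/* The modulo guards the array even if `hash` returns a wide value. */
		unsigned int len = ++chain[hash(mark) % NBUCKETS];

		if (len > longest)
			longest = len;
	}
	return longest;
}
```

Calling longest_chain(hash_low8, 0, 1, 1000) models 1000 marks handed
out one after another; calling it with first = 0x00010000 and
step = 0x00010000 models marks that only vary in the upper 16 bits.
Any other candidate hash with the same signature can be passed in place
of hash_low8 to compare the two patterns.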
-- lark From tgraf@suug.ch Tue Apr 5 05:52:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:52:23 -0700 (PDT) Received: from b.mx.projectdream.org (eth0-0.arisu.projectdream.org [194.158.4.191]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35CqHwN023488 for ; Tue, 5 Apr 2005 05:52:18 -0700 Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by b.mx.projectdream.org (Postfix) with ESMTP id B751385; Tue, 5 Apr 2005 14:51:54 +0200 (CEST) Received: by postel.suug.ch (Postfix, from userid 10001) id 9066E1C0EA; Tue, 5 Apr 2005 14:52:37 +0200 (CEST) Date: Tue, 5 Apr 2005 14:52:37 +0200 From: Thomas Graf To: Wang Jian Cc: netdev@oss.sgi.com, jamal Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function Message-ID: <20050405125237.GN26731@postel.suug.ch> References: <20050405190024.024D.LARK@linux.net.cn> <20050405121605.GM26731@postel.suug.ch> <20050405202039.0250.LARK@linux.net.cn> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050405202039.0250.LARK@linux.net.cn> X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1416 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * Wang Jian <20050405202039.0250.LARK@linux.net.cn> 2005-04-05 20:39 > On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf wrote: > > What I'm worried about is that we lose the zero collisions behaviour > > for the most popular use case. > > If a web interface is used to generate netfilter/tc rules that use > nfmark, then the above assumption is false. nfmark will be used > incrementally and wrapped back to 0 somewhere like process id. So zero > collision is not likely. 
I did not claim that the above assumption is true for all case but the most common use of cls_fw is static marks set by netfilter to values from 0..255. > When linux's QoS control capability is widely used, such web interface > sooner or later comes into being. That might be true but I will never ack on something that makes zero collision use of cls_fw impossible. I'm all for improving this but not at the cost of reduced performance for the most obvious use case of cls_fw. > Your suggestion is very considerable. But that needs some more work. And, > isn't that some bloated? The shift + bitmask might be bloated and can be deferred a bit until someone comes up with this need. I can cook up a patch for this if you want, it's not much work. From hadi@cyberus.ca Tue Apr 5 05:54:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 05:55:02 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35Csu0M024169 for ; Tue, 5 Apr 2005 05:54:57 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DIna1-0001hi-Ll for netdev@oss.sgi.com; Tue, 05 Apr 2005 08:54:53 -0400 Received: from [24.103.99.32] (helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DInZz-0003sq-JD; Tue, 05 Apr 2005 08:54:51 -0400 Subject: Re: [PATCH] improvement on net/sched/cls_fw.c's hash function From: jamal Reply-To: hadi@cyberus.ca To: Wang Jian Cc: Thomas Graf , netdev In-Reply-To: <20050405202039.0250.LARK@linux.net.cn> References: <20050405190024.024D.LARK@linux.net.cn> <20050405121605.GM26731@postel.suug.ch> <20050405202039.0250.LARK@linux.net.cn> Content-Type: text/plain Organization: jamalopolous Message-Id: <1112705689.1088.209.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 05 Apr 2005 08:54:49 -0400 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 1417 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2005-04-05 at 08:39, Wang Jian wrote: > Hi Thomas Graf, > > > On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf wrote: > > > > > What I'm worried about is that we lose the zero collisions behaviour > > for the most popular use case. > > If a web interface is used to generate netfilter/tc rules that use > nfmark, then the above assumption is false. nfmark will be used > incrementally and wrapped back to 0 somewhere like process id. So zero > collision is not likely. > Yes, but the distribution is still very good even in that case. If you have 257 entries then all except for two will be in separate buckets. > When linux's QoS control capability is widely used, such web interface > sooner or later comes into being. > > > New idea: we make this configureable and allow 3 types of hash functions: > > 1) default as-is, perfect for marks 0..255 > > 2) all bits taken into account (your patch) > > 3) bitmask + shift provided by the user just like > > dsmark. > > > > Thoughts? > > Your suggestion is very considerable. But that needs some more work. And, > isn't that some bloated? > Why dont you run a quick test? Very easy to do in user space. Enter two sets of values using the two different approaches; yours and the current way tc uses nfmark (incremental). And then apply the jenkins approach you had to see how well it looks like? 
I think we know how it will look with current hash - but if you can show it's not so bad in the case of jenkins as well it may be an acceptable approach, cheers, jamal From herbert@gondor.apana.org.au Tue Apr 5 06:05:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 06:05:46 -0700 (PDT) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j35D5ago025358 for ; Tue, 5 Apr 2005 06:05:37 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DInjw-0005xw-00; Tue, 05 Apr 2005 23:05:08 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DInjV-0001Sk-00; Tue, 05 Apr 2005 23:04:41 +1000 Date: Tue, 5 Apr 2005 23:04:41 +1000 To: "David S. Miller" , Alexey Kuznetsov , netdev@oss.sgi.com Subject: [IPV4] Disable MULTIPATH_CACHED on input path Message-ID: <20050405130441.GA5604@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="NzB8fVQJ5HfG6fxh" Content-Disposition: inline User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-Virus-Scanned: ClamAV 0.83/808/Tue Apr 5 02:54:46 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1418 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --NzB8fVQJ5HfG6fxh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Dave: Since we're not doing multipath selection on the input path yet, we should disable the code that inserts the multipath route entries for the input path. As it is we are inserting a whole bunch of useless entries as well as breaking multipath routing for the input path (forwarded packets) completely. I left the code around since we're planning to do this at some point.
Signed-off-by: Herbert Xu As this code is going to stick around, I'm going to fix it :) Expect more patches soon. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --NzB8fVQJ5HfG6fxh Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p ===== net/ipv4/route.c 1.108 vs edited ===== --- 1.108/net/ipv4/route.c 2005-03-23 15:06:23 +11:00 +++ edited/net/ipv4/route.c 2005-04-05 23:00:28 +10:00 @@ -1720,7 +1720,7 @@ } rth->u.dst.flags= DST_HOST; -#ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED +#if 0 if (res->fi->fib_nhs > 1) rth->u.dst.flags |= DST_BALANCED; #endif @@ -1792,7 +1792,7 @@ struct in_device *in_dev, u32 daddr, u32 saddr, u32 tos) { -#ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED +#if 0 struct rtable* rth; unsigned char hop, hopcount, lasthop; int err = -EINVAL; --NzB8fVQJ5HfG6fxh-- From lark@linux.net.cn Tue Apr 5 06:30:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Apr 2005 06:30:09 -0700 (PDT) Received: from mx.linux.net.cn ([211.100.11.220]) by