From ahu@outpost.ds9a.nl Fri Apr 1 01:01:20 2005
Date: Fri, 1 Apr 2005 11:01:16 +0200
From: bert hubert
To: Ben Greear
Cc: hadi@cyberus.ca, "David S. Miller", netdev
Subject: Re: RFC: Redirect-Device
Message-ID: <20050401090116.GA21361@outpost.ds9a.nl>

On Thu, Mar 31, 2005 at 03:56:28PM -0800, Ben Greear wrote:
> >I think you are more comfortable with using netdevices and ioctls and
> >/proc.
>
> Definitely.  Ever tried to sniff a socket with ethereal? :)

On loopback, all the time.

I'm probably dense but I don't understand what problem you've solved with
this interface. Could you elaborate a bit?

--
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl         Open and Closed source services

From pekkas@netcore.fi Fri Apr 1 01:28:54 2005
Date: Fri, 1 Apr 2005 12:28:44 +0300 (EEST)
From: Pekka Savola
To: Ben Greear
Cc: "'netdev@oss.sgi.com'"
Subject: Re: RFC: Redirect-Device

On Thu, 31 Mar 2005, Ben Greear wrote:
>> Is there something in your problem statement I'm missing?
>
> That would be similar to what I'm doing, but I'm not really trying
> to tunnel anything.  I am trying to duplicate the behaviour of two
> ethernet interfaces connected by an external cross-over cable, and I'm
> trying to duplicate it at the network-device interface level so that
> common tools (and my own tools) can treat these virtual interfaces
> just like ethernet interfaces.

Oh ok, what you seem to want is some kind of "Ethernet loopback++",
but the "looped" packets should come back from a virtual interface
instead of the same interface?

Btw, does the kernel support traditional loopback, so that at the last
stage, just before sending a packet on the wire, it would be pushed
back?

--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

From herbert@gondor.apana.org.au Fri Apr 1 01:37:18 2005
Date: Fri, 1 Apr 2005 19:36:33 +1000
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [NETLINK] cb_lock does not needs ref count on sk
Message-ID: <20050401093633.GA32707@gondor.apana.org.au>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Nq2Wo0NMKNjxTN9z"
From: Herbert Xu

--Nq2Wo0NMKNjxTN9z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Dave:

Here is a little
optimisation for the cb_lock used by netlink_dump.  While fixing
that race earlier, I noticed that the reference count held by
cb_lock is completely useless.  The reason is that in order to
obtain the protection of the reference count, you have to take
the cb_lock.  But the only way to take the cb_lock is through
dereferencing the socket.

That is, you must already possess a reference count on the socket
before you can take advantage of the reference count held by cb_lock.
As a corollary, we can remove the reference count held by the cb_lock.

Signed-off-by: Herbert Xu

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--Nq2Wo0NMKNjxTN9z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=p

===== net/netlink/af_netlink.c 1.75 vs edited =====
--- 1.75/net/netlink/af_netlink.c	2005-04-01 16:25:14 +10:00
+++ edited/net/netlink/af_netlink.c	2005-04-01 19:30:22 +10:00
@@ -374,7 +374,6 @@
 		nlk->cb->done(nlk->cb);
 		netlink_destroy_callback(nlk->cb);
 		nlk->cb = NULL;
-		__sock_put(sk);
 	}
 	spin_unlock(&nlk->cb_lock);
 
@@ -1100,7 +1099,6 @@
 	spin_unlock(&nlk->cb_lock);
 
 	netlink_destroy_callback(cb);
-	__sock_put(sk);
 	return 0;
 }
 
@@ -1139,7 +1137,6 @@
 		return -EBUSY;
 	}
 	nlk->cb = cb;
-	sock_hold(sk);
 	spin_unlock(&nlk->cb_lock);
 
 	netlink_dump(sk);

--Nq2Wo0NMKNjxTN9z--

From abhishek@pal.ece.iisc.ernet.in Fri Apr 1 01:40:56 2005
Date: Fri, 1 Apr 2005 15:10:40 +0530 (IST)
From: Abhishek Gupta
To: netdev@oss.sgi.com
Subject: Problem using HTB

hello everybody

I am working on a project related to QoS. I am using Linux's tc to
configure my PC based router. My setup is as follows:-

              eth0    eth1               eth0   eth0
PC-based server|----------|PC-based Router|---------|PC-Based Client
                             (using tc)

* All my ethernet cards are on 100Mbps lan
* Traffic generators being used:
	> UDP: gen_send @ about 1Mbps
	  (http://www.citi.umich.edu/projects/qbone/generator.html)
* Kernel versions being used:-
	> At Router: linux-2.4.20
	> At Client and Server: Linux-2.4.7-10
* iproute2 versions:-
	> At Router: iproute2-ss020116
	> At Client and Server: iproute2-ss010824
* Packets before leaving server and client are being marked with DSCP bits
  using Linux's tc option; Marking is done based on two-tuples: destination
  ip address and port number
* At the Router, I have the following configuration(only related to HTB) for
  eth0 and similar configuration exists for eth1 too:

---Router Configuration Starts Here-----
DEV0='eth0'

tc qdisc add dev $DEV0 parent 1: handle 2: htb default 30
tc class add dev $DEV0 parent 2: classid 2:1 htb rate 100kbit burst 100 \
	ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:10 htb rate 60kbit burst 100 \
	ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:20 htb rate 30kbit burst 60 \
	ceil 100kbit
tc class add dev $DEV0 parent 2:1 classid 2:30 htb rate 10kbit burst 80 \
	ceil 100kbit

tc qdisc add dev $DEV0 parent 2:10 gred setup DPs 3 default 3 grio
tc qdisc change dev $DEV0 parent 2:10 gred limit 185000 min 11394 \
	max 11395 burst 100 avpkt 128 bandwidth 100kbit DP 1 probability 1 \
	prio 1
tc qdisc change dev $DEV0 parent 2:10 gred limit 17972 min 4748 max 9493 \
	burst 50 avpkt 1000 bandwidth 100kbit DP 2 probability 0.01 prio 2
tc qdisc change dev $DEV0 parent 2:10 gred limit 4368 min 1796 max 3582 \
	burst 25 avpkt 1000 bandwidth 100kbit DP 3 probability 0.01 prio 2

tc qdisc add dev $DEV0 parent 2:20 gred setup DPs 2 default 2 grio
tc qdisc change dev $DEV0 parent 2:20 gred limit 52480 min 11311 \
	max 11312 burst 60 avpkt 256 bandwidth 100kbit DP 1 probability 1 \
	prio 1
tc qdisc change dev $DEV0 parent 2:20 gred limit 47184 min 5898 \
	max 11796 burst 60 avpkt 1000 bandwidth 100kbit DP 2 probability 0.01 \
	prio 2

tc qdisc add dev $DEV0 parent 2:30 gred setup DPs 1 default 1 grio
tc qdisc change dev $DEV0 parent 2:30 gred limit 15728 min 1966 \
	max 3932 burst 80 avpkt 200 bandwidth 100kbit DP 1 probability 0.04 \
	prio 1
-----Router Configuration Ends Here------

Now, the problem is that when I am sending packets from just one UDP
source(at server), I am getting outbound bit rate at eth0(of Router) as
12kbps even though I have ceiled the corresponding HTB class to 100kbps;
similar thing happens when I have two UDP sources(both at server).

So, even though I have configured for 100kbps, I am getting only 12kbps as
the link speed. Please help me out.

Abhishek

=========================================================================
ABHISHEK GUPTA
E-mail:abhishek_it_bhu@yahoo.co.in
=========================================================================

From akpm@osdl.org Fri Apr 1 02:11:54 2005
Date: Fri, 1 Apr 2005 02:11:21 -0800
From: Andrew Morton
To: netdev@oss.sgi.com
Cc: lukeross@sys3175.co.uk
Subject: Fw: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu
Message-Id: <20050401021121.76da449b.akpm@osdl.org>

hm, mtu is implemented in the device driver - you might be out of luck.

Begin forwarded message:

Date: Fri, 1 Apr 2005 02:01:19 -0800
From: bugme-daemon@osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu

http://bugme.osdl.org/show_bug.cgi?id=4430

           Summary: Virtual interfaces cannot have their own mtu
    Kernel Version: kernel-2.6.9-1.6_FC2
            Status: NEW
          Severity: low
             Owner: acme@conectiva.com.br
         Submitter: lukeross@sys3175.co.uk

Distribution: Fedora Core 2,3
Hardware Environment: Broadcom gigabit card using tg3 (Tyan s2885 onboard)
Problem Description:

eth0 and eth0:1 cannot have different mtus.

I have a jumbo-frame capable switch with three devices plugged in. Two are
PCs with jumbo-capable cards, the other is a wireless router which isn't,
and hangs if either PC attempts to discover whether it can support jumbo
frames.

To get the benefit of jumbo frames between the two PCs, I tried to set up
eth0:1 - on a different subnet to the wireless router - on both PCs, and set
the mtu of the eth0:1 to 9000. However it is not possible to set the mtu for
eth0:1 to 9000 without setting the mtu of eth0 to 9000 as well.

Also noted in http://xcat.org/pipermail/xcat-user/2003-April/002358.html

From hadi@cyberus.ca Fri Apr 1 03:03:28 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Message-Id: <1112353398.1096.116.camel@jzny.localdomain>
Date: 01 Apr 2005 06:03:18 -0500

On Thu, 2005-03-31 at 23:21, Herbert Xu wrote:
> On Thu, Mar 31, 2005 at 08:37:21PM -0500, jamal wrote:
> >
> > --- a/include/net/xfrm.h	2005-03-25 22:28:26.000000000 -0500
> > +++ b/include/net/xfrm.h	2005-03-31 19:26:24.000000000 -0500
> >
> > +/* callback structure passed from either netlink or pfkey */
> > +struct km_cb
>
> This name is a bit non-specific.
>

note: used by both SP/SA

> > +{
> > +	u32	data; /* callee to caller */
> > +};
>
> Might as well put the event into it if we're going to keep this
> structure.
> It'll help to shorten the function prototypes that
> use it.
>
> And then we can just call this structure km_event.
>

sure.

> > -extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard);
> > +extern void km_policy_expired(struct xfrm_policy *pol, int dir, int event);
>
> Bogus prototype change.
>

agreed.

> > +void xfrm_state_del_flush(struct xfrm_state *x)
> > +{
> > +	spin_lock_bh(&x->lock);
> > +	__xfrm_state_delete(x);
> > +	spin_unlock_bh(&x->lock);
> > +}
>
> Sorry, I've changed my mind on this.  This demonstrates why the
> km_notify_* calls should be made from af_key/xfrm_user directly
> instead of here.
>
>
> Some of these functions are called internally as you discovered.
> Since the notifications should only be generated by user requests,
> calls to km_notify_* should be made at the places where the user
> requests are handled, which is in the KM itself.
>

You need to be able to generate events at every km not just the one that
generated the request. You also (most of the time) need to do it before
the affected object disappears. So I am missing your point on this one.

> Otherwise we'll have to add hacks like this to avoid the
> notification for internal users.
>

I may be paranoid but i do this because x could be garbage collected way
before i send the km user message - and i need it to use it to generate
the event. I could take a copy of it ...

> > void xfrm_state_delete(struct xfrm_state *x)
> > {
> > +	int notif = 0;
> > 	spin_lock_bh(&x->lock);
> > +	/*
> > +	 * its unfortunate we have to freeze gc for this
> > +	 * one moment - the other alternative would involve
> > +	 * memcopying the state and then announcing that.
> > +	 * think SMP where theres an iota where this could mess
> > +	 * up - JHS
> > +	 */
> > +	spin_lock_bh(&xfrm_state_gc_lock);
> > +	if (x->km.state != XFRM_STATE_DEAD)
> > +		notif = 1;
> > 	__xfrm_state_delete(x);
> > +
> > +	if (notif)
> > +		km_state_notify(x, NULL, XFRM_SAP_DELETED);
>
> You've caught a real bug for af_key here.  It's currently possible to
> receive two delete notifications for the same state.

Can you elaborate?

> However, may I suggest that we code this differently.  Make
> __xfrm_state_delete return 0 if the state was really deleted
> and -ESRCH otherwise.
>
> Then af_key/xfrm_user can simply call km_state_notify if the
> return value was zero.
>

Again like i said: I need to tell every km user about the event, not
just the originator.

> BTW there is no need to grab xfrm_state_gc_lock.  You've got
> a reference count on the state from your caller.
>

Aha! I missed that - I will remove it.

> > @@ -270,6 +319,10 @@
> >  		}
> >  	}
> >  	spin_unlock_bh(&xfrm_state_lock);
> > +	if (count) {
> > +		c.data = proto;
> > +		km_state_notify(NULL, &c, XFRM_SAP_FLUSHED);
> > +	}
>
> The notification should occur in all cases, even if count == 0.
>

Well, Masahide-San and I actually did discuss this and he was of the
same opinion as you. My opinion: We only generate events when something
happens, not just because someone issues a command. If flush was issued
and there was nothing to flush why generate an event? does the PFKEY RFC
say anything on this?

> > @@ -957,8 +1020,9 @@
> >  	if (x->tunnel) {
> >  		struct xfrm_state *t = x->tunnel;
> >
> > +		/* XXX: Avoid announce?? */
> >  		if (atomic_read(&t->tunnel_users) == 2)
> > -			xfrm_state_delete(t);
> > +			xfrm_state_del_flush(t);
>
> That's right.  We don't want to announce internal states to the world.
>

I will remove that comment. Thats achieved in the above code although
the called function may not have the appropriate name.

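For illustration only, the alternative suggested in the quoted text above (have the delete helper report whether it really deleted the state, and notify only on success) might be modelled in user space as below. This is a sketch under stated assumptions, not the kernel code: the struct, the `demo_` names and the event counter are hypothetical stand-ins, and all locking is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the state's lifecycle; not the kernel's enum. */
enum demo_state { DEMO_STATE_VALID, DEMO_STATE_DEAD };

struct demo_xfrm_state {
	enum demo_state state;
};

/* Counts the delete events that would be broadcast to key managers. */
static int demo_delete_events;

/* Returns 0 if this call killed the state, -ESRCH if it was already dead. */
int demo_state_delete(struct demo_xfrm_state *x)
{
	if (x->state == DEMO_STATE_DEAD)
		return -ESRCH;
	x->state = DEMO_STATE_DEAD;
	return 0;
}

/* Caller side (timer or user request): notify only if the delete happened. */
void demo_delete_and_notify(struct demo_xfrm_state *x)
{
	if (demo_state_delete(x) == 0)
		demo_delete_events++;
}
```

In this model a timer and an explicit delete that both reach the same state produce one event, because the second caller sees -ESRCH and stays quiet.
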
> > --- a/net/xfrm/xfrm_policy.c	2005-03-25 22:28:21.000000000 -0500
> > +++ b/net/xfrm/xfrm_policy.c	2005-03-31 19:26:24.000000000 -0500
> > @@ -298,7 +298,7 @@
> >   * entry dead. The rule must be unlinked from lists to the moment.
> >   */
> >
> > -static void xfrm_policy_kill(struct xfrm_policy *policy)
> > +static void xfrm_policy_kill(struct xfrm_policy *policy, int dir, int notif)
>
> Again, had you done the km_* calls from af_key/xfrm_user, then there'd
> be no need to check notif here.
>

Refer to my comments above on being able to tell multiple managers about
the events originated by one.
Actually, given that this function is being called in many places i
would say this is the exact central location you want to issue the
announce from.

> BTW, as it is you're announcing expired policies twice.  Once as an
> expire event and once as a delete event.  This problem will also go
> away if you move the km_* calls into af_key/xfrm_user.
>

Theres an announcement only when policy goes dead ;->
So only one not two. Same with the state as well.
And again cant do it from af_key/xfrm_user if you want to have events
generated by one km to be sent to another as well. Its pf_key that needs
fixing.

> > @@ -579,7 +586,7 @@
> >  	write_unlock_bh(&xfrm_policy_lock);
> >
> >  	if (old_pol) {
> > -		xfrm_policy_kill(old_pol);
> > +		xfrm_policy_kill(old_pol, dir, 1);
> >  	}
>
> Please don't announce socket policies :)
>

I missed this one - sorry.

> > --- a/net/xfrm/xfrm_user.c	2005-03-25 22:28:22.000000000 -0500
> > +++ b/net/xfrm/xfrm_user.c	2005-03-31 19:26:24.000000000 -0500
> > @@ -683,6 +683,10 @@
> >  	if (!xp)
> >  		return err;
> >
> > +	/* shouldnt excl be based on nlh flags??
> > +	 * Aha! this is anti-netlink really i.e more pfkey derived
> > +	 * in netlink excl is a flag and you wouldnt need
> > +	 * a type XFRM_MSG_UPDPOLICY - JHS */
>
> Good point.  Care to provide a patch to treat NEW + NLM_F_REPLACE
> as UPD?
>
> > @@ -1053,10 +1057,10 @@
> >  	return -1;
> >  }
> >
> > -static int xfrm_send_state_notify(struct xfrm_state *x, int hard)
> > +static int xfrm_exp_state_notify(struct xfrm_state *x, u32 hard)
>
> How about calling this xfrm_notify_sa_expired for consistency?
> Ditto for the policy function.

sure.

>
> > +static int xfrm_notify_sa_flush(struct km_cb *c)
> > +{
> > +	struct xfrm_usersa_flush *p;
> > +	struct nlmsghdr *nlh;
> > +	struct sk_buff *skb;
> > +	unsigned char *b;
> > +	u32 ppid = 0;
> > +	int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
> > +
> > +	skb = alloc_skb(len, GFP_ATOMIC);
> > +	if (skb == NULL)
> > +		return -ENOMEM;
> > +	b = skb->tail;
> > +
> > +	nlh = NLMSG_PUT(skb, ppid, jiffies,
>
> If we're serious about providing sequence numbers then please
> set it up as an atomic integer and use it throughout this file.
>
> Otherwise just pop zero in there.
>

I was just being lazy. I could send a 0 but whats wrong with using
jiffies?

> > +	p = NLMSG_DATA(nlh);
> > +	if (!c) {
> > +		printk("xfrm_notify_sa_flush NULL km cb\n");
> > +		p->proto = 0;
>
> Is anyone expected to call this with a NULL pointer? If not then
> just let it OOPS.  Same comment applies to the cb checks later on.
>

Will fix this.

> > +static int xfrm_notify_sa( struct xfrm_state *x, int event, struct km_cb *c)
>
> > +	if (event == XFRM_SAP_ADDED)
> > +		nlt = XFRM_MSG_NEWSA;
> > +	else if (event == XFRM_SAP_UPDATED)
> > +		nlt = XFRM_MSG_UPDSA;
> > +	else if (event == XFRM_SAP_DELETED)
> > +		nlt = XFRM_MSG_DELSA;
> > +	else
> > +		goto nlmsg_failure;
>
> Please use a switch.
>

sure.

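As an illustration of the switch requested in the quoted text above, the event-to-message-type mapping could be written roughly as follows. This is a standalone sketch, not the revised patch: the enum values are placeholders chosen for the example (the real constants live in the kernel headers and in the patch under review), and `demo_event_to_nlt` is a hypothetical helper name.

```c
/* Placeholder event codes; names follow the patch, values are illustrative. */
enum demo_sap_event {
	XFRM_SAP_ADDED = 1,
	XFRM_SAP_UPDATED,
	XFRM_SAP_DELETED,
	XFRM_SAP_FLUSHED,
	XFRM_SAP_EXPIRED
};

/* Placeholder netlink message types; values are illustrative only. */
enum demo_msg_type {
	XFRM_MSG_NEWSA = 100,
	XFRM_MSG_UPDSA,
	XFRM_MSG_DELSA
};

/*
 * Map an SA event to the netlink message type announcing it.
 * Returns -1 for events that are not announced this way, which is
 * where the quoted code jumps to nlmsg_failure.
 */
int demo_event_to_nlt(int event)
{
	switch (event) {
	case XFRM_SAP_ADDED:
		return XFRM_MSG_NEWSA;
	case XFRM_SAP_UPDATED:
		return XFRM_MSG_UPDSA;
	case XFRM_SAP_DELETED:
		return XFRM_MSG_DELSA;
	default:
		return -1;
	}
}
```
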
> > +static int xfrm_send_state_notify(struct xfrm_state *x, int event, struct km_cb *c)
> > +{
> > +
> > +	if ((event == XFRM_SAP_ADDED) ||
> > +	    (event == XFRM_SAP_UPDATED) ||
> > +	    (event == XFRM_SAP_DELETED))
> > +		return xfrm_notify_sa(x, event, c);
> > +
> > +	if (event == XFRM_SAP_FLUSHED)
> > +		xfrm_notify_sa_flush(c);
> > +
> > +	if (event != XFRM_SAP_EXPIRED)
> > +		return 0;
>
> Again a switch would be perfect.
>

Will fix this.

BTW, Herbert, thanks for taking the time; appreciated.

cheers,
jamal

From hadi@cyberus.ca Fri Apr 1 03:15:54 2005
Subject: Re: Resend: Re: PATCH: IPSEC acquire in presence of multiple managers
From: jamal
Reply-To: hadi@cyberus.ca
To: "David S. Miller"
Cc: herbert@gondor.apana.org.au, "David S.
	Miller", nakam@linux-ipv6.org, shinta.sugimoto@ericsson.com, netdev
Message-Id: <1112354137.1090.129.camel@jzny.localdomain>
Date: 01 Apr 2005 06:15:38 -0500

On Fri, 2005-04-01 at 00:13, David S. Miller wrote:
> On 26 Mar 2005 13:35:31 -0500
> jamal wrote:
>
> > Apologies, The last patch had a glitch in the filename. Dave please
> > apply this one instead
>
> Doesn't apply, in the current tree km_query() is marked static.
>
> Please regenerate your patch and sorry for not getting to this
> sooner.

Dave,
I am combining this with the other event patch that is under discussion
right now which i will end up sending to you. If you want it separate i
could do that.

cheers,
jamal

From herbert@gondor.apana.org.au Fri Apr 1 03:45:00 2005
Date: Fri, 1 Apr 2005 21:42:58 +1000
To: jamal
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events
Message-ID: <20050401114258.GA2932@gondor.apana.org.au>
From: Herbert Xu

On Fri, Apr 01, 2005 at 06:03:18AM -0500, jamal wrote:
>
> > Some of these functions are called internally as you discovered.
> > Since the notifications should only be generated by user requests,
> > calls to km_notify_* should be made at the places where the user
> > requests are handled, which is in the KM itself.
>
> You need to be able to generate events at every km not just the one that
> generated the request. You also (most of the time) need to do it before

I understand.
However, that's not determined by where you put the
km_notify call itself.  Even when you call km_notify from af_key or
xfrm_user it will notify every km in the system.

It's the fact that we're calling km_notify instead of pfkey_broadcast
or netlink_broadcast that's important, not the location.

Having the km_notify call made in af_key/xfrm_user is convenient though
for the reason I outlined above.

> I may be paranoid but i do this because x could be garbage collected way
> before i send the km user message - and i need it to use it to generate
> the event. I could take a copy of it ...

That's what the ref counter is for.

> > You've caught a real bug for af_key here.  It's currently possible to
> > receive two delete notifications for the same state.
>
> Can you elaborate?

Imagine you've got a KM that's trying to delete a state via af_key
that's about to expire.  If pfkey_delete looks up the state successfully,
and then the timer triggers before the actual xfrm_state_delete, you will
get one event generated by the timer and another by pfkey_delete.

> Again like i said: I need to tell every km user about the event, not
> just the originator.

I'm suggesting that you add the km_notify calls to af_key and xfrm_user.
That will take care of notifying everyone.

> Well, Masahide-San and I actually did discuss this and he was of the
> same opinion as you. My opinion: We only generate events when something
> happens, not just because someone issues a command. If flush was issued
> and there was nothing to flush why generate an event? does the PFKEY RFC
> say anything on this?

RFC 2367 says that:

   The messaging behavior for SADB_FLUSH is:

     Send an SADB_FLUSH message from a user process to the kernel.

     The kernel will return an SADB_FLUSH message to all listening
     sockets.

As you can see, there is no exception for the case of an empty
database.  So my interpretation would be that a broadcast is needed.

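Under the reading of the RFC text above, the flush handler would broadcast unconditionally, with the number of removed entries being incidental. A minimal user-space sketch of that behaviour follows. The table layout, `demo_flush`, `demo_notify_flush` and the broadcast counter are hypothetical and only stand in for the kernel's SA database and its notification call.

```c
#include <stddef.h>

/* Hypothetical SA table entry: proto 0 marks an empty slot. */
struct demo_sa {
	unsigned char proto;
};

/* Number of flush messages sent to listening sockets in this model. */
static int demo_flush_broadcasts;

/* Stand-in for the broadcast; a real KM would build an SADB_FLUSH reply. */
void demo_notify_flush(unsigned char proto, size_t removed)
{
	(void)proto;
	(void)removed;
	demo_flush_broadcasts++;
}

/*
 * Remove every entry matching a non-zero proto and report the flush to
 * all listeners, including when nothing matched (removed == 0).
 */
size_t demo_flush(struct demo_sa *table, size_t n, unsigned char proto)
{
	size_t i, removed = 0;

	for (i = 0; i < n; i++) {
		if (table[i].proto == proto) {
			table[i].proto = 0;
			removed++;
		}
	}
	demo_notify_flush(proto, removed);	/* no "if (removed)" guard */
	return removed;
}
```

Flushing an already empty table in this model still bumps the broadcast counter, which is the behaviour argued for above; adding an `if (removed)` guard would give the behaviour of the quoted patch instead.
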
> Refer to my comments above on being able to tell multiple managers about
> the events originated by one.

May I also refer you to my comment above about this being achieved by
calling km_notify, even if you do it from within af_key or xfrm_user :)

> Actually, given that this function is being called in many places i
> would say this is the exact central location you want to issue the
> announce from.

Try this as an exercise.  List all the xfrm_policy_kills that need
notifications and all those that don't, you will find that the former
all originate from delete/flush commands in af_key/xfrm_user, while
the latter originate from other callers.

In other words, by placing the call in af_key/xfrm_user you simplify
the logic and make it more maintainable.

> > BTW, as it is you're announcing expired policies twice.  Once as an
> > expire event and once as a delete event.  This problem will also go
> > away if you move the km_* calls into af_key/xfrm_user.
>
> Theres an announcement only when policy goes dead ;->
> So only one not two. Same with the state as well.

Well when the policy expires you will get one expire notification
from the current timer code and a new one from your patch since the
timer calls xfrm_policy_delete.

See my point? By putting the call in xfrm_policy.c you have to be
really careful in dividing the internal users which shouldn't generate
notifications and the external users which should.  By doing it in
af_key/xfrm_user you can avoid all this work.

> And again cant do it from af_key/xfrm_user if you want to have events
> generated by one km to be sent to another as well. Its pf_key that needs
> fixing.

Well I must repeat that if you were calling km_notify from af_key/xfrm_user
you will be sending these events to all km's no matter what their
affiliation is :)

> > If we're serious about providing sequence numbers then please
> > set it up as an atomic integer and use it throughout this file.
> >
> > Otherwise just pop zero in there.
>
> I was just being lazy. I could send a 0 but what's wrong with using
> jiffies?

Using jiffies means that you can have two successive messages that
share the same sequence number. It's not a big deal of course. But
if we're going to indicate ordering, we might as well go the full
length.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From herbert@gondor.apana.org.au Fri Apr 1 03:47:14 2005
From: Herbert Xu
To: akpm@osdl.org (Andrew Morton)
Subject: Re: Fw: [Bugme-new] [Bug 4430] New: Virtual interfaces cannot have their own mtu
Cc: netdev@oss.sgi.com, lukeross@sys3175.co.uk
Organization: Core
In-Reply-To: <20050401021121.76da449b.akpm@osdl.org>
Date: Fri, 01 Apr 2005 21:46:29 +1000

Andrew Morton wrote:
>
> the eth0:1 to 9000.
> However it is not possible to set the mtu for eth0:1 to 9000
> without setting the mtu of eth0 to 9000 as well.

The solution is to set the mtu using ip route in addition to setting
it on eth0, e.g.,

ip ro add x.0.0.0/8 via gw dev eth0 mtu 1500 src a.b.c.d
ip ro add y.0.0.0/8 via gw2 dev eth0 mtu 9000 src e.f.g.h

You still have to set the mtu on eth0 to 9000 since that determines
the maximum receive size as well (MRU).
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From hadi@cyberus.ca Fri Apr 1 04:24:49 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S.
 Miller", netdev
In-Reply-To: <20050401114258.GA2932@gondor.apana.org.au>
References: <1112319441.1089.83.camel@jzny.localdomain>
 <20050401042106.GA27762@gondor.apana.org.au>
 <1112353398.1096.116.camel@jzny.localdomain>
 <20050401114258.GA2932@gondor.apana.org.au>
Organization: jamalopolous
Message-Id: <1112358278.1096.160.camel@jzny.localdomain>
Date: 01 Apr 2005 07:24:38 -0500

On Fri, 2005-04-01 at 06:42, Herbert Xu wrote:
> On Fri, Apr 01, 2005 at 06:03:18AM -0500, jamal wrote:
> >
> > > Some of these functions are called internally as you discovered.
> > > Since the notifications should only be generated by user requests,
> > > calls to km_notify_* should be made at the places where the user
> > > requests are handled, which is in the KM itself.
> >
> > You need to be able to generate events at every km not just the one that
> > generated the request. You also (most of the time) need to do it before
>
> I understand. However, that's not determined by where you put the
> km_notify call itself. Even when you call km_notify from af_key
> or xfrm_user it will notify every km in the system.
>
> It's the fact that we're calling km_notify instead of pfkey_broadcast
> or netlink_broadcast that's important, not the location.
>
> Having the km_notify call made in af_key/xfrm_user is convenient though
> for the reason I outlined above.

I think either scheme is fine really;-> I will definitely go back and
consider the approach you are suggesting and see if it results in
more maintainable code - then fair.
Otherwise you realize it's more work for me ;->

> > > You've caught a real bug for af_key here. It's currently possible to
> > > receive two delete notifications for the same state.
> >
> > Can you elaborate?
>
> Imagine you've got a KM that's trying to delete a state via af_key that's
> about to expire. If pfkey_delete looks up the state successfully, and
> then the timer triggers before the actual xfrm_state_delete, you will
> get one event generated by the timer and another by pfkey_delete.

I haven't checked the state machine closely, but the following seems to
make sense:
The first thing that happens to delete the state/policy should win if
the state/policy is transitioned to dead.

> RFC 2367 says that:
>
> The messaging behavior for SADB_FLUSH is:
>
> Send an SADB_FLUSH message from a user process to the kernel.
>
> The kernel will return an SADB_FLUSH message to all listening
> sockets.
>
> As you can see, there is no exception for the case of an empty database.
> So my interpretation would be that a broadcast is needed.

Does it really make sense, Herbert? ;->
What is it that you just flushed that results in the event?
The RFC is ambiguous in my opinion. Look at what it says about deleting
(same ambiguity).

----
3.1.4 SADB_DELETE

The SADB_DELETE message causes the kernel to delete a Security
Association from the key table. The delete message consists of the
base header followed by the association, and the source and
destination sockaddrs in the address extension. The kernel deletes
the security association matching the type, spi, source address, and
destination address in the message.

The message behavior for SADB_DELETE is as follows:

Send an SADB_DELETE message from a user process to the kernel.

The kernel returns the SADB_DELETE message to all listening
processes.
------

So why would you generate an event in the case when you didn't delete anything?
> > Actually, given that this function is being called in many places I
> > would say this is the exact central location you want to issue the
> > announce from.
>
> Try this as an exercise. List all the xfrm_policy_kills that need
> notifications and all those that don't, you will find that the former
> all originate from delete/flush commands in af_key/xfrm_user, while
> the latter originate from other callers.
>
> In other words, by placing the call in af_key/xfrm_user you simplify
> the logic and make it more maintainable.

I will go over the code and review. You may be absolutely right - that's
the better approach to take.

> > > BTW, as it is you're announcing expired policies twice. Once as an
> > > expire event and once as a delete event. This problem will also go
> > > away if you move the km_* calls into af_key/xfrm_user.
> >
> > There's an announcement only when policy goes dead ;->
> > So only one not two. Same with the state as well.
>
> Well when the policy expires you will get one expire notification from
> the current timer code and a new one from your patch since the timer
> calls xfrm_policy_delete.
>
> See my point? By putting the call in xfrm_policy.c you have to be
> really careful in dividing the internal users which shouldn't
> generate notifications and the external users which should. By doing
> it in af_key/xfrm_user you can avoid all this work.

That's a bug really which is being exposed now. So it has nothing to do
with the approach taken ;->
No expire should be sent if the policy has transitioned to dead. The bug
is trivial to fix - and actually should be fixed regardless of this
patch.

> > I was just being lazy. I could send a 0 but what's wrong with using
> > jiffies?
>
> Using jiffies means that you can have two successive messages that
> share the same sequence number. It's not a big deal of course. But
> if we're going to indicate ordering, we might as well go the full
> length.

Good point.
I will stay lazy and just set a 0 ;->

cheers,
jamal

From herbert@gondor.apana.org.au Fri Apr 1 04:37:40 2005
Date: Fri, 1 Apr 2005 22:35:54 +1000
To: jamal
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events
Message-ID: <20050401123554.GA3468@gondor.apana.org.au>
References: <1112319441.1089.83.camel@jzny.localdomain>
 <20050401042106.GA27762@gondor.apana.org.au>
 <1112353398.1096.116.camel@jzny.localdomain>
 <20050401114258.GA2932@gondor.apana.org.au>
 <1112358278.1096.160.camel@jzny.localdomain>
In-Reply-To: <1112358278.1096.160.camel@jzny.localdomain>
From: Herbert Xu

On Fri, Apr 01, 2005 at 07:24:38AM -0500, jamal wrote:
>
> I think either scheme is fine really;-> I will definitely go back and
> consider the approach you are suggesting and see if it results in
> more maintainable code - then fair.
> Otherwise you realize it's more work
> for me ;->

Well I'm happy to code that part if you want :)

> I haven't checked the state machine closely, but the following seems to
> make sense:
> The first thing that happens to delete the state/policy should win if
> the state/policy is transitioned to dead.

Agreed. That's what we'll get if we make __xfrm_state_delete return
success/failure.

> So why would you generate an event in the case when you didn't delete anything?

You're right that the RFC isn't very clear.

Let's forget about the RFC and simply consider the usefulness of this.
I contend that it is useful to see a FLUSH notification even when
it flushed nothing.

The reason is that this is an indication to all listeners that the
database is completely empty.

> > Well when the policy expires you will get one expire notification from
> > the current timer code and a new one from your patch since the timer
> > calls xfrm_policy_delete.
> >
> > See my point? By putting the call in xfrm_policy.c you have to be
> > really careful in dividing the internal users which shouldn't
> > generate notifications and the external users which should. By doing
> > it in af_key/xfrm_user you can avoid all this work.
>
> That's a bug really which is being exposed now. So it has nothing to do
> with the approach taken ;->

You're right that it is a bug. However, this bug would've never triggered
before because we simply didn't have delete policy notifications :)

> No expire should be sent if the policy has transitioned to dead. The bug
> is trivial to fix - and actually should be fixed regardless of this
> patch.

Yes the same fix to __xfrm_state_delete can be applied to
xfrm_policy_delete.
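
The "first deleter wins" rule agreed on above can be sketched in
userspace C. This is only an illustration under stated assumptions:
toy_state, toy_state_delete() and toy_delete_and_notify() are
hypothetical names, not the kernel's structures and not the actual
change to __xfrm_state_delete, and a C11 atomic flag stands in for
whatever locking the real code uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an xfrm state; not the kernel structure. */
struct toy_state {
    atomic_bool dead;   /* set exactly once, by whichever path deletes first */
};

/* Counts delete events that would be broadcast to listeners. */
int toy_notifications;

/*
 * Returns 0 if this caller performed the delete, -1 if the state was
 * already dead. atomic_exchange() returns the previous value, so only
 * one caller can ever observe "false" here.
 */
int toy_state_delete(struct toy_state *x)
{
    if (atomic_exchange(&x->dead, true))
        return -1;
    return 0;
}

/*
 * Pattern shared by the expiry timer and a user delete request:
 * notify only when our own delete succeeded.
 * Returns 1 if a notification was sent, 0 otherwise.
 */
int toy_delete_and_notify(struct toy_state *x)
{
    if (toy_state_delete(x) != 0)
        return 0;
    toy_notifications++;
    return 1;
}
```

With this shape, a timer and a user request racing on the same state
produce one notification between them, whichever gets there first.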

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From hadi@cyberus.ca Fri Apr 1 04:59:48 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
In-Reply-To: <20050401123554.GA3468@gondor.apana.org.au>
References: <1112319441.1089.83.camel@jzny.localdomain>
 <20050401042106.GA27762@gondor.apana.org.au>
 <1112353398.1096.116.camel@jzny.localdomain>
 <20050401114258.GA2932@gondor.apana.org.au>
 <1112358278.1096.160.camel@jzny.localdomain>
 <20050401123554.GA3468@gondor.apana.org.au>
Organization: jamalopolous
Message-Id: <1112360379.1096.193.camel@jzny.localdomain>
Date: 01 Apr 2005 07:59:39 -0500

On Fri, 2005-04-01 at 07:35, Herbert Xu wrote:
> On Fri, Apr 01, 2005 at 07:24:38AM -0500, jamal wrote:
> >
> > I think either scheme is fine really;-> I will definitely go back and
> > consider the approach you are suggesting and see if it results in
> > more maintainable code - then fair. Otherwise you realize it's more work
> > for me ;->
>
> Well I'm happy to code that part if you want :)

Let me review first. If it is valuable (we may have to leave expire
alone). If I can get it done within the next day or two fine - else if I get
busied out elsewhere I will hand it to you. Actually if you have plenty of
cycles and are very enthusiastic about this I can hand it to you right
now ;-> Masahide and myself have some momentum going right now but I
don't think this will be that disruptive.

> You're right that the RFC isn't very clear.
>
> Let's forget about the RFC and simply consider the usefulness of this.
> I contend that it is useful to see a FLUSH notification even when
> it flushed nothing.
>
> The reason is that this is an indication to all listeners that the
> database is completely empty.

Ok, let me hear from Masahide-san: If he still holds the same opinion as
you then I will make the change.

> > That's a bug really which is being exposed now. So it has nothing to do
> > with the approach taken ;->
>
> You're right that it is a bug. However, this bug would've never triggered
> before because we simply didn't have delete policy notifications :)

indeed.

> > No expire should be sent if the policy has transitioned to dead. The bug
> > is trivial to fix - and actually should be fixed regardless of this
> > patch.
>
> Yes the same fix to __xfrm_state_delete can be applied to
> xfrm_policy_delete.

agreed.

cheers,
jamal

From hadi@cyberus.ca Fri Apr 1 05:18:45 2005
Subject: Re: PATCH: IPSEC xfrm events
From: jamal
Reply-To: hadi@cyberus.ca
To: Herbert Xu
Cc: Patrick McHardy, Masahide NAKAMURA, "David S. Miller", netdev
In-Reply-To: <1112360379.1096.193.camel@jzny.localdomain>
References: <1112319441.1089.83.camel@jzny.localdomain>
 <20050401042106.GA27762@gondor.apana.org.au>
 <1112353398.1096.116.camel@jzny.localdomain>
 <20050401114258.GA2932@gondor.apana.org.au>
 <1112358278.1096.160.camel@jzny.localdomain>
 <20050401123554.GA3468@gondor.apana.org.au>
 <1112360379.1096.193.camel@jzny.localdomain>
Organization: jamalopolous
Message-Id: <1112361517.1089.197.camel@jzny.localdomain>
Date: 01 Apr 2005 08:18:37 -0500

On Fri, 2005-04-01 at 07:59, jamal wrote:

> Let me review first. If it is valuable (we may have to leave expire
> alone).

Ok, from a first review I would agree with you that the result of doing it
in km user will be more maintainable. It will result in a larger patch
but in the long run more maintainable.
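
The maintainability argument (send notifications from the request
handlers in the key manager, not from the shared core delete routine)
can be sketched as follows. All names here are hypothetical stand-ins
chosen for illustration: toy_policy_kill() plays the role of the core
routine that internal callers such as the timer also use,
toy_km_notify() the role of km_notify, and toy_user_delete() the role
of a delete handler in af_key/xfrm_user. It is a userspace model, not
kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a policy; not the kernel structure. */
struct toy_policy {
    bool dead;
};

/* Counts delete events broadcast to every key manager. */
int toy_events_sent;

/* Core routine shared by internal and external callers; sends nothing. */
bool toy_policy_kill(struct toy_policy *p)
{
    if (p->dead)
        return false;   /* somebody else already killed it */
    p->dead = true;
    return true;
}

/* Stand-in for km_notify: one broadcast to all listeners. */
void toy_km_notify(void)
{
    toy_events_sent++;
}

/* Internal caller (expiry timer): kills the policy, no delete event. */
void toy_timer_expire(struct toy_policy *p)
{
    toy_policy_kill(p);
}

/* External caller (user delete request): the only place that notifies. */
int toy_user_delete(struct toy_policy *p)
{
    if (!toy_policy_kill(p))
        return -1;
    toy_km_notify();
    return 0;
}
```

Because the core routine never notifies, internal callers need no
special casing; only paths that handle a user request emit the delete
event.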

> If I can get it done within the next day or two fine - else if I get
> busied out elsewhere I will hand it to you.

Let me code away at it - The offer still stands though ;->

cheers,
jamal

From nakam@linux-ipv6.org Fri Apr 1 06:20:00 2005
Message-ID: <424D5881.4010005@linux-ipv6.org>
Date: Fri, 01 Apr 2005 23:19:45 +0900
From: Masahide NAKAMURA
To: hadi@cyberus.ca, Herbert Xu
Cc: Patrick McHardy, "David S. Miller", netdev
Subject: Re: PATCH: IPSEC xfrm events
References: <1112319441.1089.83.camel@jzny.localdomain>
 <20050401042106.GA27762@gondor.apana.org.au>
 <1112353398.1096.116.camel@jzny.localdomain>
 <20050401114258.GA2932@gondor.apana.org.au>
 <1112358278.1096.160.camel@jzny.localdomain>
 <20050401123554.GA3468@gondor.apana.org.au>
 <1112360379.1096.193.camel@jzny.localdomain>
In-Reply-To: <1112360379.1096.193.camel@jzny.localdomain>

Hello Jamal and Herbert,

jamal wrote:
> Let me review first. If it is valuable (we may have to leave expire
> alone). If I can get it done within the next day or two fine - else if I get
> busied out elsewhere I will hand it to you.
> Actually if you have plenty of
> cycles and are very enthusiastic about this I can hand it to you right
> now ;-> Masahide and myself have some momentum going right now but I
> don't think this will be that disruptive.
>
>>You're right that the RFC isn't very clear.
>>
>>Let's forget about the RFC and simply consider the usefulness of this.
>>I contend that it is useful to see a FLUSH notification even when
>>it flushed nothing.
>>
>>The reason is that this is an indication to all listeners that the
>>database is completely empty.
>
> Ok, let me hear from Masahide-san: If he still holds the same opinion as
> you then I will make the change.

I think FLUSH should be sent in such a case.
Because flushing an empty SADB/SPD is not an error (in the current code),
it is reasonable to broadcast it.

Regards,
--
Masahide NAKAMURA

From dada1@cosmosbay.com Fri Apr 1 06:39:58 2005
Message-ID: <424D5D34.4030800@cosmosbay.com>
Date: Fri, 01 Apr 2005 16:39:48 +0200
From: Eric Dumazet
To: "David S.
 Miller"
CC: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
References: <42370997.6010302@cosmosbay.com>
 <20050315103253.590c8bfc.davem@davemloft.net>
 <42380EC6.60100@cosmosbay.com>
 <20050316140915.0f6b9528.davem@davemloft.net>
 <4239E00C.4080309@cosmosbay.com>
 <20050331221352.13695124.davem@davemloft.net>
In-Reply-To: <20050331221352.13695124.davem@davemloft.net>

David S. Miller wrote:
> On Thu, 17 Mar 2005 20:52:44 +0100
> Eric Dumazet wrote:
>
>> - Move the spinlocks out of rt_hash_table[] to a fixed size table : Saves a lot of memory (particularly on UP)
>
> If spinlock_t is a zero sized structure on UP, how can this save memory
> on UP? :-)

Because I deleted the __attribute__((__aligned__(8))) constraint on
struct rt_hash_bucket.

So sizeof(struct rt_hash_bucket) is now 4 instead of 8 on 32-bit
architectures.

May I remind you some people still use 32-bit CPUs? :-)

By the way I have an updated patch... surviving very serious loads.

> Anyways, I think perhaps you should dynamically allocate this lock table.

Maybe I should make a static sizing (replace the 256 constant by
something based on MAX_CPUS)?

> Otherwise it looks fine.
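
The layout described in this message, with the locks moved out of the
hash buckets into a separate fixed-size table, can be sketched in
userspace C. This is a minimal sketch under stated assumptions, not
the patch itself: the toy_* names, the 4096-bucket table size and the
use of C11 atomic_flag as a spinlock are all hypothetical; only the
256-entry lock table comes from the thread.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TOY_HASH_BUCKETS 4096   /* hypothetical size, power of two */
#define TOY_LOCK_COUNT   256    /* the "256 constant" from the thread */

/* A bucket holds only the chain head, so it is pointer-sized. */
struct toy_bucket {
    void *chain;
};

struct toy_bucket toy_table[TOY_HASH_BUCKETS];

/* Fixed-size lock table shared by all buckets. */
atomic_flag toy_locks[TOY_LOCK_COUNT];

/* Put every lock in the unlocked state before first use. */
void toy_locks_init(void)
{
    for (size_t i = 0; i < TOY_LOCK_COUNT; i++)
        atomic_flag_clear(&toy_locks[i]);
}

/* Map a bucket index onto one of the shared locks. */
size_t toy_lock_index(size_t bucket)
{
    return bucket & (TOY_LOCK_COUNT - 1);
}

/* Spin until the lock covering this bucket is acquired. */
void toy_bucket_lock(size_t bucket)
{
    while (atomic_flag_test_and_set_explicit(
               &toy_locks[toy_lock_index(bucket)], memory_order_acquire))
        ;   /* busy-wait */
}

/* Release the lock covering this bucket. */
void toy_bucket_unlock(size_t bucket)
{
    atomic_flag_clear_explicit(&toy_locks[toy_lock_index(bucket)],
                               memory_order_release);
}
```

Several buckets share one lock (here buckets 0, 256, 512, ... all map
to lock 0), which trades a little extra contention for a bucket that
is no larger than a pointer.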

From Robert.Olsson@data.slu.se Fri Apr 1 07:53:07 2005
From: Robert Olsson
Message-ID: <16973.28254.203492.400896@robur.slu.se>
Date: Fri, 1 Apr 2005 17:53:02 +0200
To: Eric Dumazet
Cc: "David S. Miller", netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
In-Reply-To: <424D5D34.4030800@cosmosbay.com>
References: <42370997.6010302@cosmosbay.com>
 <20050315103253.590c8bfc.davem@davemloft.net>
 <42380EC6.60100@cosmosbay.com>
 <20050316140915.0f6b9528.davem@davemloft.net>
 <4239E00C.4080309@cosmosbay.com>
 <20050331221352.13695124.davem@davemloft.net>
 <424D5D34.4030800@cosmosbay.com>

Hello!

Eric Dumazet writes:

> By the way I have an updated patch... surviving very serious loads.

Did you check for performance changes too? From what I understand
we can add a new lookup and cache miss in the fast packet path.

> > Anyways, I think perhaps you should dynamically allocate this lock table.
>
> Maybe I should make a static sizing (replace the 256 constant by
> something based on MAX_CPUS)?

IMO we should be careful with adding new complexity to the route hash.
Also, was this dynamic gc_interval behavior needed to fix the overflow?
gc_interval is only a sort of last-resort timer.

--ro

From greearb@candelatech.com Fri Apr 1 08:29:22 2005
Message-ID: <424D76DF.5070002@candelatech.com>
Date: Fri, 01 Apr 2005 08:29:19 -0800
From: Ben Greear
Organization: Candela Technologies
To: Pekka Savola
CC: "'netdev@oss.sgi.com'"
Subject: Re: RFC: Redirect-Device
References: <424C6089.1080507@candelatech.com> <424CDBA9.80703@candelatech.com>

Pekka Savola wrote:
> On Thu, 31 Mar 2005, Ben Greear wrote:
>
>>> Is there something in your problem statement I'm missing?
>>
>> That would be similar to what I'm doing, but I'm not really trying
>> to tunnel anything.
>> I am trying to duplicate the behaviour of two
>> ethernet interfaces connected by an external cross-over cable, and I'm
>> trying to duplicate it at the network-device interface level so that
>> common tools (and my own tools) can treat these virtual interfaces
>> just like ethernet interfaces.
>
> Oh ok, what you seem to want is some kind of "Ethernet loopback++", but
> the "looped" packets should come back from a virtual interface instead
> of the same interface?

Yes. In practice, I use a pair of virtual interfaces, so I send on one
virtual and receive on the other. I use separate software to bridge, or
the normal linux stacks to route, the packets to other interfaces,
including real interfaces.

> Btw, does the kernel support traditional loopback, so that at the last
> stage, just before sending a packet on the wire, it would be pushed back.

Not that I'm aware of.

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From dada1@cosmosbay.com Fri Apr 1 08:34:29 2005
Message-ID: <424D780A.9000101@cosmosbay.com>
Date: Fri, 01 Apr 2005 18:34:18 +0200
From: Eric Dumazet
To: Robert Olsson
CC: "David S.
 Miller", netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
References: <42370997.6010302@cosmosbay.com>
 <20050315103253.590c8bfc.davem@davemloft.net>
 <42380EC6.60100@cosmosbay.com>
 <20050316140915.0f6b9528.davem@davemloft.net>
 <4239E00C.4080309@cosmosbay.com>
 <20050331221352.13695124.davem@davemloft.net>
 <424D5D34.4030800@cosmosbay.com>
 <16973.28254.203492.400896@robur.slu.se>
In-Reply-To: <16973.28254.203492.400896@robur.slu.se>

Robert Olsson wrote:
> Hello!
>
> Did you check for performance changes too? From what I understand
> we can add a new lookup and cache miss in the fast packet path.

Performance is better because in case of stress (lots of incoming
packets per second), the 1024 bytes of the locks are all in cache.

As the size of the hash is divided by a factor of 2, rt_check_expire()
and/or rt_garbage_collect() have to touch fewer cache lines.

According to oprofile, an unpatched kernel was spending more than 15% of
time in route.c routines; now I see ip_route_input() at 1.88%.

> > > Anyways, I think perhaps you should dynamically allocate this lock table.
> >
> > Maybe I should make a static sizing (replace the 256 constant by
> > something based on MAX_CPUS)?
>
> IMO we should be careful with adding new complexity to the route hash.
> Also, was this dynamic gc_interval behavior needed to fix the overflow?

In my case yes, because I have a huge route cache.

> gc_interval is only a sort of last-resort timer.

Actually not: gc_interval controls rt_check_expire(), which cleans the
hash table after use. All old enough entries can be deleted smoothly,
from a timer tick (so network interrupts can still occur).

I found it was better to adjust gc_interval to 1 (to let it fire every
second and examine 1/300 of the table slots, or more if the dynamic
behavior triggers), and adjust params so that rt_garbage_collect()
doesn't run at all: rt_garbage_collect() can take forever to complete,
blocking network traffic.

Eric Dumazet

From ak@muc.de Fri Apr 1 08:40:10 2005
To: Rick Jones
Cc: netdev@oss.sgi.com
Subject: Re: [RFC] netif_rx: receive path optimization
References: <20050330132815.605c17d0@dxpl.pdx.osdl.net>
 <20050331120410.7effa94d@dxpl.pdx.osdl.net>
 <1112303431.1073.67.camel@jzny.localdomain>
 <424C6A98.1070509@hp.com>
From: Andi Kleen
Date: Fri, 01 Apr 2005 18:40:07 +0200
In-Reply-To: <424C6A98.1070509@hp.com> (Rick Jones's message of "Thu, 31 Mar 2005 13:24:40 -0800")

Rick Jones writes:

> At the risk of again chewing on my toes (yum), if multiple CPUs are
> pulling packets from the per-device queue there will be packet
> reordering.
HP-UX 10.0 did just that and it was quite nasty even at > low CPU counts (<=4). It was changed by HP-UX 10.20 (ca 1995) to > per-CPU queues with queue selection computed from packet headers (hash > the IP and TCP/UDP header to pick a CPU) It was called IPS for Inbound > Packet Scheduling. 11.0 (ca 1998) later changed that to "find where > the connection last ran and queue to that CPU" That was called TOPS - > Thread Optimized Packet Scheduling. We went over this a lot several years ago when Linux got multi threaded RX with softnet in 2.1. You might want to go over the archives. Some things that came out of it was a sender side TCP optimization to tolerate reordering without slowing down (works great with other Linux peers) and NAPI style polling mode (which was mostly designed for routing and still seems to have regressions for the client/server case :/) Something like TOPS was discussed, but afaik nobody ever implemented it. Of course benchmark guys do it manually by setting interrupt and scheduler affinity. -Andi From greearb@candelatech.com Fri Apr 1 08:58:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 08:59:02 -0800 (PST) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31GwuW4016989 for ; Fri, 1 Apr 2005 08:58:57 -0800 Received: from [4.33.45.22] (evrtwa1-ar2-4-33-045-022.evrtwa1.dsl-verizon.net [4.33.45.22]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j31HOoLH009680; Fri, 1 Apr 2005 09:24:51 -0800 Message-ID: <424D7DCC.5030202@candelatech.com> Date: Fri, 01 Apr 2005 08:58:52 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.3) Gecko/20041020 X-Accept-Language: en-us, en MIME-Version: 1.0 To: bert hubert CC: hadi@cyberus.ca, "David S. 
Miller" , netdev Subject: Re: RFC: Redirect-Device References: <424C6089.1080507@candelatech.com> <1112303627.1073.71.camel@jzny.localdomain> <424C6B10.6030200@candelatech.com> <1112306031.1073.109.camel@jzny.localdomain> <424C7813.4000101@candelatech.com> <20050331143531.30f4eb8f.davem@davemloft.net> <424C7F96.4070002@candelatech.com> <1112311618.1090.20.camel@jzny.localdomain> <424C8E2C.70302@candelatech.com> <20050401090116.GA21361@outpost.ds9a.nl> In-Reply-To: <20050401090116.GA21361@outpost.ds9a.nl> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1205 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev bert hubert wrote: > On Thu, Mar 31, 2005 at 03:56:28PM -0800, Ben Greear wrote: > > >>>I think you are more comfortable with using netdevices and ioctls and >>>/proc. >> >>Definately. Ever tried to sniff a socket with ethereal? :) > > > On loopback, all the time. I'm probably dense but I don't understand what > problem you've solved with this interface. Could you elaborate a bit? It allows me to place a software bridge that can intercept all packets from user-space via raw packet sockets, and kernel space via registering an 'all' protocol on the device. Please note that to bridge in this manner I have to remove the IP protocol (set IP to 0.0.0.0), otherwise the IP stack can interfere with the bridging behaviour. By using a virtual pair of interfaces that are looped back, I can add an IP to the second virtual network interface that does not interfere with the two bridged interfaces (one physical, one redirect, both with 0.0.0.0 IP addresses). 
If there were an API to register handlers dynamically that act like the netpoll hook (ie, with ability to consume frames), then I would not have to remove the IP from the physical interface and I probably would not have had to create these redirect devices. But, when I was suggesting such a hook in the past, it was shot down because it could allow someone to write their own TCP stack, and the network guys did not want to allow this possibility. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From Robert.Olsson@data.slu.se Fri Apr 1 09:26:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:26:46 -0800 (PST) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HQfZm018140 for ; Fri, 1 Apr 2005 09:26:42 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j31HQWQG025702; Fri, 1 Apr 2005 19:26:32 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 9CDC6EE2B1; Fri, 1 Apr 2005 19:26:32 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16973.33864.613333.389857@robur.slu.se> Date: Fri, 1 Apr 2005 19:26:32 +0200 To: Eric Dumazet Cc: Robert Olsson , "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() In-Reply-To: <424D780A.9000101@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <16973.28254.203492.400896@robur.slu.se> <424D780A.9000101@cosmosbay.com> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1206 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Eric Dumazet writes: > According to oprofile, an unpatched kernel was spending more than 15% of time in route.c routines, now I see ip_route_input() at 1.88% Would like to see absolute numbers for UP/SMP single flow and DoS to be confident. > I found it was better to adjust gc_interval to 1 (to let it fire every second and examine 1/300 table slots, or more if the dynamic behavior > triggers), and adjust params so that rt_garbage_collect() doesn't run at all : rt_garbage_collect() can take forever to complete, blocking > network traffic. I don't think you can depend solely on the timer for GC. A timer tick is an eternity at today's packet rates. You can distribute the GC load by allowing it to run more frequently; this, in combination with a huge cache, seems to be a very interesting approach, given that you have the memory.
--ro From nakam@linux-ipv6.org Fri Apr 1 09:28:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:28:20 -0800 (PST) Received: from mail406.noc.n-bone.net (mail4.noc.n-bone.net [138.243.50.144]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HSFxd018552 for ; Fri, 1 Apr 2005 09:28:16 -0800 Received: from [192.168.2.195] (polaris.linux-ipv6.org [203.178.140.10]) by mail406.noc.n-bone.net (NBONE-MTA) with ESMTP id BDA70AE5; Sat, 2 Apr 2005 02:28:09 +0900 (JST) Message-ID: <424D84A7.6060707@linux-ipv6.org> Date: Sat, 02 Apr 2005 02:28:07 +0900 From: Masahide NAKAMURA User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: en-us, en MIME-Version: 1.0 To: hadi@cyberus.ca, Herbert Xu Cc: Patrick McHardy , "David S. Miller" , netdev Subject: Re: PATCH: IPSEC xfrm events References: <1112319441.1089.83.camel@jzny.localdomain> In-Reply-To: <1112319441.1089.83.camel@jzny.localdomain> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1207 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nakam@linux-ipv6.org Precedence: bulk X-list: netdev Jamal and Herbert, jamal wrote: > Herbert et al, > > Ok, heres the final patch with all the changes discussed. > > include/linux/xfrm.h | 2 > include/net/xfrm.h | 29 ++++++- > net/key/af_key.c | 24 +++++- > net/xfrm/xfrm_policy.c | 25 ++++-- > net/xfrm/xfrm_state.c | 84 +++++++++++++++++++-- > net/xfrm/xfrm_user.c | 188 > ++++++++++++++++++++++++++++++++++++++++++++++++- > 6 files changed, 323 insertions(+), 29 deletions(-) > > I have tested this with both setkey and iproute2 (about 10 scenarios or > so). Masahide-san is doing a lot more thorough testing with key servers > as well. He has not tested this patch yet (time difference) but it is > based on the last one he tested. 
Short report: I've tested on this patched kernel and it works. - add/del/flush for SA/SP and allocspi/acquire/upd for SA through netlink socket - racoon runs fine (pfkey works for normal operation) both without and with opening netlink socket to listen Since we have discussion which is still going on about the patch, the code will be change and I'll need to test again anyway. Thanks, -- Masahide NAKAMURA From roland@topspin.com Fri Apr 1 09:53:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 09:54:00 -0800 (PST) Received: from exch-1.topspincom.com (webmail.topspin.com [12.162.17.3]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31HrqZc019816 for ; Fri, 1 Apr 2005 09:53:53 -0800 Received: from localhost.localdomain ([10.3.1.93]) by exch-1.topspincom.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 1 Apr 2005 09:45:33 -0800 Received: by localhost.localdomain (Postfix, from userid 1113) id 7EA6C4FDF2; Fri, 1 Apr 2005 09:45:33 -0800 (PST) To: akpm@osdl.org Cc: linux-kernel@vger.kernel.org, openib-general@openib.org, netdev@oss.sgi.com, davem@davemloft.net Subject: [PATCH][4/3] IPoIB: document conversion to debugfs X-Message-Flag: Warning: May contain useful information References: <20053311936.XaQmN4N9new7dTCP@topspin.com> From: Roland Dreier Date: Fri, 01 Apr 2005 09:45:33 -0800 In-Reply-To: <20053311936.XaQmN4N9new7dTCP@topspin.com> (Roland Dreier's message of "Thu, 31 Mar 2005 19:36:12 -0800") Message-ID: <52r7hujsqq.fsf@topspin.com> User-Agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Jumbo Shrimp, linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-OriginalArrivalTime: 01 Apr 2005 17:45:33.0676 (UTC) FILETIME=[9AC0C2C0:01C536E2] X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1208 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: roland@topspin.com Precedence: bulk X-list: netdev Update 
IPoIB documentation now that multicast debugging files have moved from ipoibdebugfs to debugfs. Signed-off-by: Roland Dreier --- linux-export.orig/Documentation/infiniband/ipoib.txt 2005-03-31 19:07:01.000000000 -0800 +++ linux-export/Documentation/infiniband/ipoib.txt 2005-04-01 09:43:27.122520190 -0800 @@ -32,14 +32,13 @@ mcast_debug_level to 1. These parameters can be controlled at runtime through files in /sys/module/ib_ipoib/. - CONFIG_INFINIBAND_IPOIB_DEBUG also enables the "ipoib_debugfs" + CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs virtual filesystem. By mounting this filesystem, for example with - mkdir -p /ipoib_debugfs - mount -t ipoib_debugfs none /ipoib_debufs + mount -t debugfs none /sys/kernel/debug - it is possible to get statistics about multicast groups from the - files /ipoib_debugfs/ib0_mcg and so on. + it is possible to get statistics about munlticast groups from the + files /sys/kernel/debug/ipoib/ib0_mcg and so on. The performance impact of this option is negligible, so it is safe to enable this option with debug_level set to 0 for normal From rick.jones2@hp.com Fri Apr 1 10:55:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 10:56:03 -0800 (PST) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31Itxgb022131 for ; Fri, 1 Apr 2005 10:55:59 -0800 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel11.hp.com (Postfix) with ESMTP id 29A4E1F36E7 for ; Fri, 1 Apr 2005 10:22:52 -0800 (PST) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id KAA01022 for ; Fri, 1 Apr 2005 10:22:51 -0800 (PST) Message-ID: <424D917B.2060108@hp.com> Date: Fri, 01 Apr 2005 10:22:51 -0800 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev Subject: Re: [RFC] netif_rx: receive path 
optimization References: <20050330132815.605c17d0@dxpl.pdx.osdl.net> <20050331120410.7effa94d@dxpl.pdx.osdl.net> <1112303431.1073.67.camel@jzny.localdomain> <424C6A98.1070509@hp.com> <1112305084.1073.94.camel@jzny.localdomain> <424C7CDC.8050801@hp.com> <1112312206.1096.25.camel@jzny.localdomain> <424C90DA.7030600@hp.com> <1112318229.1090.63.camel@jzny.localdomain> In-Reply-To: <1112318229.1090.63.camel@jzny.localdomain> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1209 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev >>The main idea behind TOPS and prior to that IPS was to spread-out >>the processing of packets across as many CPUs as we could, as "correctly" as we >>could. > > > Very very hard to do. Why do you say that? "Correct" can be defined as either the same CPU for each packet in a given flow (IPS) or the same CPU as last accessed the endpoint (TOPS). > Isn't MSI supposed to give you ability such that a > NIC can pick a CPU to interrupt? That would help in a small way. That gives the NIC the knowledge of how to direct to a CPU, but as you know does not tell it how to decide where. Since I doubt that the NIC wants to reach out and touch connection state in the host (nor, I suppose, do we want it to), the best a NIC with MSI could do would be IPS. >>TOPS lets the process (I suppose the scheduler really) decide where some of the >>processing for the packet will happen - the part after the handoff. >> > > I think this last part should be easy to do - but perhaps the expense of > landing on the wrong CPU may override any benefits perceived.
Unless one has a scheduler that likes to migrate processes, the chances of landing on the wrong CPU are minimal and short-lived, and overall, the chances of being right are greater than if not doing anything and sticking with the interrupt CPU. (Handwaving based on experience-driven intuition and a bit of math as one increases the CPU count) This is all on the premise that one is running with numNIC << numCPU. With numNIC == numCPU one does things as seen in certain networking-intensive benchmarks :) rick jones From shemminger@osdl.org Fri Apr 1 12:07:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:07:41 -0800 (PST) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31K7aJG024341 for ; Fri, 1 Apr 2005 12:07:36 -0800 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j31K7Rs4028918 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Fri, 1 Apr 2005 12:07:27 -0800 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [172.20.1.103]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j31K7RLd030565; Fri, 1 Apr 2005 12:07:27 -0800 Date: Fri, 1 Apr 2005 12:07:27 -0800 From: Stephen Hemminger To: lartc@mailman.ds9a.nl, linux-kernel@vger.kernel.org Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: [ANNOUNCE] iproute2 2.6.11-050330 Message-ID: <20050401120727.62700e8c@dxpl.pdx.osdl.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.106 $ X-Scanned-By: MIMEDefang 2.36 X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean
X-archive-position: 1210 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev An updated version of the iproute2 utilities is available at: http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.11-050330.tar.gz It supports the latest features from 2.6, but is backwards compatible with 2.4. This update includes several bugfixes and build cleanups from the previous version (2.6.11-050314): [Jamal Hadi Salim] * Proper version of iptables headers (from 1.3.1) * Set revision file in m_ipt * Fix action_util naming in mirred * don't call ll_init_map in mirred [Thomas Graf] * Warn about wildcard deletions and provide IFA_ADDRESS upon deletions to enforce prefix length validation for IPv4. * Fix netlink message alignment when the last routing attribute added has a data length not aligned to RTA_ALIGNTO. [Masahide NAKAMURA] * ipv6 xfrm allocspi and monitor support. [Stephen Hemminger] * include/linux/netfilter_ipv4/ip_tables.h don't include compiler.h because it isn't needed and not on all systems * Update rtnetlink.h and pkt_cls.h to be stripped versions of headers from 2.6.12-rc1 * switch to stack for netem tables * add -force option to batch mode * handle midline comments in batch mode * sum per cpu fields in lnstat correctly From sds@tycho.nsa.gov Fri Apr 1 12:15:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:15:29 -0800 (PST) Received: from jazzhorn.ncsc.mil (mummy.ncsc.mil [144.51.88.129]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KFLo9025229 for ; Fri, 1 Apr 2005 12:15:22 -0800 Received: from tycho.ncsc.mil (jazzhorn.ncsc.mil [144.51.5.9]) by jazzhorn.ncsc.mil (8.12.10/8.12.10) with ESMTP id j31KBvhV026499; Fri, 1 Apr 2005 20:11:57 GMT Received: from moss-spartans.epoch.ncsc.mil (moss-spartans [144.51.25.121]) by tycho.ncsc.mil (8.12.8/8.12.8) with ESMTP id j31KG5Do015003; Fri, 1 Apr 2005 15:16:05 -0500 (EST)
Subject: [PATCH] Fix SELinux for removal of i_sock From: Stephen Smalley To: "David S. Miller" , James Morris , lkml , netdev@oss.sgi.com, matthew@wil.cx Content-Type: text/plain Organization: National Security Agency Date: Fri, 01 Apr 2005 15:06:37 -0500 Message-Id: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-14) Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1211 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: sds@tycho.nsa.gov Precedence: bulk X-list: netdev Hi, This patch against -bk eliminates the use of i_sock by SELinux as it appears to have been removed recently, breaking the build of SELinux in -bk. Simply replacing the i_sock test with an S_ISSOCK test would be unsafe in the SELinux code, as the latter will also return true for the inodes of socket files in the filesystem, not just the actual socket objects IIUC. Hence this patch reworks the SELinux code to avoid the need to apply such a test in the first place, part of which was obsoleted anyway by earlier changes to SELinux. Please apply. 
Signed-off-by: Stephen Smalley Signed-off-by: James Morris security/selinux/hooks.c | 21 +++------------------ 1 files changed, 3 insertions(+), 18 deletions(-) ===== security/selinux/hooks.c 1.93 vs edited ===== --- 1.93/security/selinux/hooks.c 2005-03-28 17:21:19 -05:00 +++ edited/security/selinux/hooks.c 2005-04-01 15:01:58 -05:00 @@ -877,18 +877,8 @@ static int inode_doinit_with_dentry(stru isec->initialized = 1; out: - if (inode->i_sock) { - struct socket *sock = SOCKET_I(inode); - if (sock->sk) { - isec->sclass = socket_type_to_security_class(sock->sk->sk_family, - sock->sk->sk_type, - sock->sk->sk_protocol); - } else { - isec->sclass = SECCLASS_SOCKET; - } - } else { + if (isec->sclass == SECCLASS_FILE) isec->sclass = inode_mode_to_security_class(inode->i_mode); - } if (hold_sem) up(&isec->sem); @@ -2979,18 +2969,15 @@ out: static void selinux_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern) { - int err; struct inode_security_struct *isec; struct task_security_struct *tsec; - err = inode_doinit(SOCK_INODE(sock)); - if (err < 0) - return; isec = SOCK_INODE(sock)->i_security; tsec = current->security; isec->sclass = socket_type_to_security_class(family, type, protocol); isec->sid = kern ? 
SECINITSID_KERNEL : tsec->sid; + isec->initialized = 1; return; } @@ -3158,14 +3145,12 @@ static int selinux_socket_accept(struct if (err) return err; - err = inode_doinit(SOCK_INODE(newsock)); - if (err < 0) - return err; newisec = SOCK_INODE(newsock)->i_security; isec = SOCK_INODE(sock)->i_security; newisec->sclass = isec->sclass; newisec->sid = isec->sid; + newisec->initialized = 1; return 0; } -- Stephen Smalley National Security Agency From davem@davemloft.net Fri Apr 1 12:28:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:29:01 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KSt7x029634 for ; Fri, 1 Apr 2005 12:28:55 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHSkM-0002UR-00; Fri, 01 Apr 2005 12:28:02 -0800 Date: Fri, 1 Apr 2005 12:28:02 -0800 From: "David S. 
Miller" To: Eric Dumazet Cc: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() Message-Id: <20050401122802.7c71afbc.davem@davemloft.net> In-Reply-To: <424D5D34.4030800@cosmosbay.com> References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1212 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 16:39:48 +0200 Eric Dumazet wrote: > > If spinlock_t is a zero sized structure on UP, how can this save memory > > on UP? :-) > > Because I deleted the __attribute__((__aligned__(8))) constraint on struct rt_hash_bucket. Right. > > Anyways, I think perhaps you should dynamically allocate this lock table. > > Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ? Even for NR_CPUS, I think the table should be dynamically allocated. It is a goal to eliminate all of these huge arrays in the static kernel image, which has grown incredibly too much in recent times. 
I work often to eliminate such things, let's not add new ones :-) From davem@davemloft.net Fri Apr 1 12:36:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 12:36:18 -0800 (PST) Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31KaD6Z030336 for ; Fri, 1 Apr 2005 12:36:13 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHSrQ-0002Xi-00; Fri, 01 Apr 2005 12:35:20 -0800 Date: Fri, 1 Apr 2005 12:35:20 -0800 From: "David S. Miller" To: Stephen Smalley Cc: jmorris@redhat.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com, matthew@wil.cx Subject: Re: [PATCH] Fix SELinux for removal of i_sock Message-Id: <20050401123520.7532528b.davem@davemloft.net> In-Reply-To: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> References: <1112385997.14481.192.camel@moss-spartans.epoch.ncsc.mil> X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1213 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev On Fri, 01 Apr 2005 15:06:37 -0500 Stephen Smalley wrote: > This patch against -bk eliminates the use of i_sock by SELinux as it > appears to have been removed recently, breaking the build of SELinux in > -bk. 
Simply replacing the i_sock test with an S_ISSOCK test would be > unsafe in the SELinux code, as the latter will also return true for the > inodes of socket files in the filesystem, not just the actual socket > objects IIUC. Hence this patch reworks the SELinux code to avoid the > need to apply such a test in the first place, part of which was > obsoleted anyway by earlier changes to SELinux. Please apply. > > Signed-off-by: Stephen Smalley > Signed-off-by: James Morris Applied, thanks Stephen. From dada1@cosmosbay.com Fri Apr 1 13:05:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:05:58 -0800 (PST) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L5plG031537 for ; Fri, 1 Apr 2005 13:05:52 -0800 Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31L5csf030409; Fri, 1 Apr 2005 23:05:43 +0200 Message-ID: <424DB7A1.8090803@cosmosbay.com> Date: Fri, 01 Apr 2005 23:05:37 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@oss.sgi.com Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire() References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> In-Reply-To: <20050401122802.7c71afbc.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Fri, 01 Apr 2005 23:05:44 +0200 (CEST) X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1214 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev David S. Miller wrote: > On Fri, 01 Apr 2005 16:39:48 +0200 > Eric Dumazet wrote: > >>Maybe I should make a static sizing, (replace the 256 constant by something based on MAX_CPUS) ? > > > Even for NR_CPUS, I think the table should be dynamically allocated. > > It is a goal to eliminate all of these huge arrays in the static > kernel image, which has grown incredibly too much in recent times. > I work often to eliminate such things, let's not add new ones :-) You mean you prefer: static spinlock_t *rt_hash_lock; /* rt_hash_lock = alloc_memory_at_boot_time(...) */ instead of static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ]; In both cases, memory is taken from lowmem, and the size of the kernel image is roughly the same (the bss section takes no space in the image). Then the runtime cost is higher in the 'dynamic case' because of the extra indirection...?
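[Archive note] The two layouts compared in the message above can be sketched in user-space C. This is only an illustration under stated assumptions: demo_spinlock_t stands in for the kernel's spinlock_t, calloc() stands in for whatever boot-time allocator would be used, and the helper names (rt_hash_lock_init, lock_addr_static, lock_addr_dynamic) are hypothetical. The value 256 is the constant mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's spinlock_t (assumption: the real type is
 * architecture-specific and may even be empty on UP builds). */
typedef struct { volatile int locked; } demo_spinlock_t;

/* The "256 constant" from the thread; must be a power of two for the mask. */
#define RT_HASH_LOCK_SZ 256

/* Variant A: static table. It lives in .bss, so its address is a
 * link-time constant and a lookup is base + offset. */
demo_spinlock_t rt_hash_lock_static[RT_HASH_LOCK_SZ];

/* Variant B: pointer filled in at init time. Each lookup first loads
 * the pointer, which is the "extra indirection" discussed above. */
demo_spinlock_t *rt_hash_lock_dynamic;

/* Hypothetical init helper; calloc() replaces a boot-time allocator. */
int rt_hash_lock_init(void)
{
    rt_hash_lock_dynamic = calloc(RT_HASH_LOCK_SZ,
                                  sizeof(*rt_hash_lock_dynamic));
    return rt_hash_lock_dynamic ? 0 : -1;
}

/* Map a route hash value to its lock in the static table. */
demo_spinlock_t *lock_addr_static(unsigned int hash)
{
    return &rt_hash_lock_static[hash & (RT_HASH_LOCK_SZ - 1)];
}

/* Same mapping through the dynamically allocated table. */
demo_spinlock_t *lock_addr_dynamic(unsigned int hash)
{
    return &rt_hash_lock_dynamic[hash & (RT_HASH_LOCK_SZ - 1)];
}
```

Both variants hash buckets onto the same number of locks; the only difference is where the table's base address comes from. Whether the extra pointer load matters in practice is exactly the open question in the thread, and this sketch does not settle it.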
From jheffner@psc.edu Fri Apr 1 13:05:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:06:03 -0800 (PST) Received: from mailer2.psc.edu (mailer2.psc.edu [128.182.66.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L5ufx031548 for ; Fri, 1 Apr 2005 13:05:56 -0800 Received: from dexter.psc.edu (dexter.psc.edu [128.182.61.232]) by mailer2.psc.edu (8.13.3/8.13.3) with ESMTP id j31LAYiG018305 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Apr 2005 16:10:38 -0500 (EST) Received: from dexter.psc.edu (localhost.psc.edu [127.0.0.1]) by dexter.psc.edu (8.12.11/8.12.10) with ESMTP id j31L5nhA018741; Fri, 1 Apr 2005 16:05:50 -0500 Received: from localhost (jheffner@localhost) by dexter.psc.edu (8.12.11/8.12.11/Submit) with ESMTP id j31L5nZa018738; Fri, 1 Apr 2005 16:05:49 -0500 X-Authentication-Warning: dexter.psc.edu: jheffner owned process doing -bs Date: Fri, 1 Apr 2005 16:05:49 -0500 (EST) From: John Heffner To: davem@davemloft.net, netdev@oss.sgi.com Subject: [PATCH] skb pcount with MTU discovery Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 1215 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jheffner@psc.edu Precedence: bulk X-list: netdev The problem is that when doing MTU discovery, the too-large segments in the write queue will be calculated as having a pcount of >1. When tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test when pcount > cwnd. The segments are eventually transmitted one at a time by keepalive, but this can take a long time. This patch checks if TSO is enabled when setting pcount. 
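[Archive note] The stall described above can be reproduced with a small user-space model. This is a simplified sketch, not the kernel code: demo_pcount and demo_cwnd_ok are hypothetical helpers, the window check is reduced to "packets in flight plus pcount must fit in cwnd", and the sizes are made-up numbers chosen for illustration.

```c
#include <assert.h>

/* Hypothetical helper: how many packets a queued segment of `len`
 * bytes counts as, given the current MSS. With tso_enabled == 0 the
 * segment always counts as one packet, which is the behaviour the
 * patch description asks for on non-TSO paths. */
unsigned int demo_pcount(unsigned int len, unsigned int mss, int tso_enabled)
{
    if (len <= mss || !tso_enabled)
        return 1;
    /* Round up: number of MSS-sized pieces needed to cover len. */
    return (len + mss - 1) / mss;
}

/* Simplified congestion-window test (an assumption, not the exact
 * tcp_snd_test() logic): the segment may be sent only if all of its
 * packets fit in the window. */
int demo_cwnd_ok(unsigned int in_flight, unsigned int pcount,
                 unsigned int cwnd)
{
    return in_flight + pcount <= cwnd;
}
```

For example, a 1460-byte segment queued before MTU discovery lowered the MSS to 1000 counts as 2 packets when the length-only rule is applied, so with cwnd == 1 and nothing in flight it can never pass the check. Counting it as 1 packet when TSO is not available lets it through, after which it can be resegmented on transmit.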
-John Signed-off-by: John Heffner ===== include/net/tcp.h 1.114 vs edited ===== --- 1.114/include/net/tcp.h 2005-03-31 11:51:09 -05:00 +++ edited/include/net/tcp.h 2005-04-01 14:44:13 -05:00 @@ -1470,19 +1470,20 @@ tcp_minshall_check(tp)))); } -extern void tcp_set_skb_tso_segs(struct sk_buff *, unsigned int); +extern void tcp_set_skb_tso_segs(struct sock *, struct sk_buff *); /* This checks if the data bearing packet SKB (usually sk->sk_send_head) * should be put on the wire right now. */ -static __inline__ int tcp_snd_test(const struct tcp_sock *tp, +static __inline__ int tcp_snd_test(struct sock *sk, struct sk_buff *skb, unsigned cur_mss, int nonagle) { + struct tcp_sock *tp = tcp_sk(sk); int pkts = tcp_skb_pcount(skb); if (!pkts) { - tcp_set_skb_tso_segs(skb, tp->mss_cache_std); + tcp_set_skb_tso_segs(sk, skb); pkts = tcp_skb_pcount(skb); } @@ -1543,7 +1544,7 @@ if (skb) { if (!tcp_skb_is_last(sk, skb)) nonagle = TCP_NAGLE_PUSH; - if (!tcp_snd_test(tp, skb, cur_mss, nonagle) || + if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || tcp_write_xmit(sk, nonagle)) tcp_check_probe_timer(sk, tp); } @@ -1561,7 +1562,7 @@ struct sk_buff *skb = sk->sk_send_head; return (skb && - tcp_snd_test(tp, skb, tcp_current_mss(sk, 1), + tcp_snd_test(sk, skb, tcp_current_mss(sk, 1), tcp_skb_is_last(sk, skb) ? TCP_NAGLE_PUSH : tp->nonagle)); } ===== net/ipv4/tcp_output.c 1.90 vs edited ===== --- 1.90/net/ipv4/tcp_output.c 2005-04-01 09:08:34 -05:00 +++ edited/net/ipv4/tcp_output.c 2005-04-01 14:45:27 -05:00 @@ -433,7 +433,7 @@ struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb = sk->sk_send_head; - if (tcp_snd_test(tp, skb, cur_mss, TCP_NAGLE_PUSH)) { + if (tcp_snd_test(sk, skb, cur_mss, TCP_NAGLE_PUSH)) { /* Send it out now. 
*/
 		TCP_SKB_CB(skb)->when = tcp_time_stamp;
 		tcp_tso_set_push(skb);
@@ -446,9 +446,12 @@
 	}
 }

-void tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_std)
+void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb)
 {
-	if (skb->len <= mss_std) {
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (skb->len <= tp->mss_cache_std ||
+	    !(sk->sk_route_caps & NETIF_F_TSO)) {
 		/* Avoid the costly divide in the normal
 		 * non-TSO case.
 		 */
@@ -457,10 +460,10 @@
 	} else {
 		unsigned int factor;

-		factor = skb->len + (mss_std - 1);
-		factor /= mss_std;
+		factor = skb->len + (tp->mss_cache_std - 1);
+		factor /= tp->mss_cache_std;
 		skb_shinfo(skb)->tso_segs = factor;
-		skb_shinfo(skb)->tso_size = mss_std;
+		skb_shinfo(skb)->tso_size = tp->mss_cache_std;
 	}
 }

@@ -531,8 +534,8 @@
 	}

 	/* Fix up tso_factor for both original and new SKB. */
-	tcp_set_skb_tso_segs(skb, tp->mss_cache_std);
-	tcp_set_skb_tso_segs(buff, tp->mss_cache_std);
+	tcp_set_skb_tso_segs(sk, skb);
+	tcp_set_skb_tso_segs(sk, buff);

 	if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
 		tp->lost_out += tcp_skb_pcount(skb);
@@ -607,7 +610,7 @@
 	 * factor and mss.
 	 */
 	if (tcp_skb_pcount(skb) > 1)
-		tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb));
+		tcp_set_skb_tso_segs(sk, skb);

 	return 0;
 }
@@ -815,7 +818,7 @@
 			sk_stream_free_skb(sk, skb);
 		} else {
 			TCP_SKB_CB(skb)->seq += copy;
-			tcp_set_skb_tso_segs(skb, tp->mss_cache_std);
+			tcp_set_skb_tso_segs(sk, skb);
 		}

 		len += copy;
@@ -824,7 +827,7 @@
 	__skb_insert(nskb, skb->prev, skb, &sk->sk_write_queue);
 	sk->sk_send_head = nskb;

-	tcp_set_skb_tso_segs(nskb, tp->mss_cache_std);
+	tcp_set_skb_tso_segs(sk, nskb);

 	/* We're ready to send. If this fails, the probe will
 	 * be resegmented into mss-sized pieces by tcp_write_xmit(). */
@@ -885,7 +888,7 @@
 	mss_now = tcp_current_mss(sk, 1);

 	while ((skb = sk->sk_send_head) &&
-	       tcp_snd_test(tp, skb, mss_now,
+	       tcp_snd_test(sk, skb, mss_now,
 			    tcp_skb_is_last(sk, skb) ?
 			    nonagle : TCP_NAGLE_PUSH)) {
 		if (skb->len > mss_now) {
@@ -1822,7 +1825,7 @@
 			tp->mss_cache = tp->mss_cache_std;
 		}
 	} else if (!tcp_skb_pcount(skb))
-		tcp_set_skb_tso_segs(skb, tp->mss_cache_std);
+		tcp_set_skb_tso_segs(sk, skb);

 	TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH;
 	TCP_SKB_CB(skb)->when = tcp_time_stamp;

From davem@davemloft.net Fri Apr 1 13:09:23 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:09:27 -0800 (PST)
Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31L9Nls032679 for ; Fri, 1 Apr 2005 13:09:23 -0800
Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHTNY-0002m8-00; Fri, 01 Apr 2005 13:08:32 -0800
Date: Fri, 1 Apr 2005 13:08:32 -0800
From: "David S. Miller"
To: Eric Dumazet
Cc: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
Message-Id: <20050401130832.1f972a3b.davem@davemloft.net>
In-Reply-To: <424DB7A1.8090803@cosmosbay.com>
References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1216
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender:
davem@davemloft.net
Precedence: bulk
X-list: netdev

On Fri, 01 Apr 2005 23:05:37 +0200
Eric Dumazet wrote:

> You mean you prefer :
>
> static spinlock_t *rt_hash_lock ; /* rt_hash_lock =
> alloc_memory_at_boot_time(...) */
>
> instead of
>
> static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ] ;
>
> In both cases, memory is taken from lowmem, and size of kernel image
> is roughly the same (bss section takes no space in image)

In the former case the kernel image the bootloader has to
load is smaller. That's important, believe it or not. It
means less TLB entries need to be locked permanently into
the MMU on certain platforms.

From davem@davemloft.net Fri Apr 1 13:11:36 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:11:42 -0800 (PST)
Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LBaXI000825 for ; Fri, 1 Apr 2005 13:11:36 -0800
Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHTPh-0002mS-00; Fri, 01 Apr 2005 13:10:45 -0800
Date: Fri, 1 Apr 2005 13:10:45 -0800
From: "David S.
Miller"
To: John Heffner
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH] skb pcount with MTU discovery
Message-Id: <20050401131045.4e558f65.davem@davemloft.net>
In-Reply-To:
References:
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1217
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev

On Fri, 1 Apr 2005 16:05:49 -0500 (EST)
John Heffner wrote:

> The problem is that when doing MTU discovery, the too-large segments in
> the write queue will be calculated as having a pcount of >1. When
> tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test
> when pcount > cwnd.
>
> The segments are eventually transmitted one at a time by keepalive, but
> this can take a long time.
>
> This patch checks if TSO is enabled when setting pcount.

Why isn't the MSS properly updated at this point in time?
If it were, the pcount setting would do the right thing.

That's how this code is supposed to work.
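
For readers following the arithmetic, here is a minimal userspace sketch of the segment-count calculation being discussed. It is not the kernel code: the names pcount_for, pcount_for_caps and cwnd_allows are hypothetical, the cwnd test is simplified to the "nothing in flight" case, and only the round-up division mirrors what tcp_set_skb_tso_segs() does in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pcount calculation: number of
 * MSS-sized pieces needed to cover len bytes (round-up division). */
static unsigned int pcount_for(unsigned int len, unsigned int mss)
{
	assert(mss > 0);	/* guard the division below */
	if (len <= mss)
		return 1;	/* fits in one segment, skip the divide */
	return (len + mss - 1) / mss;
}

/* Same calculation with the check proposed in the patch: on a path
 * without TSO, a queued segment always counts as one packet. */
static unsigned int pcount_for_caps(unsigned int len, unsigned int mss,
				    bool tso_capable)
{
	if (!tso_capable)
		return 1;
	return pcount_for(len, mss);
}

/* Simplified cwnd test (assumes nothing is in flight): a segment may
 * be sent only if its packet count fits in the congestion window. */
static bool cwnd_allows(unsigned int pcount, unsigned int cwnd)
{
	return pcount <= cwnd;
}
```

With the figures John Heffner gives later in the thread (an 8948-byte segment queued before the MSS drops to 1448, cwnd of 2), pcount_for(8948, 1448) evaluates to 7; the follow-up message cites 6, and either value exceeds the window, so cwnd_allows() fails. With tso_capable set to false the count is 1 and the same segment passes the test.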
From jheffner@psc.edu Fri Apr 1 13:23:06 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:23:13 -0800 (PST)
Received: from mailer2.psc.edu (mailer2.psc.edu [128.182.66.106]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LN5Pf001621 for ; Fri, 1 Apr 2005 13:23:06 -0800
Received: from dexter.psc.edu (dexter.psc.edu [128.182.61.232]) by mailer2.psc.edu (8.13.3/8.13.3) with ESMTP id j31LRi33009348 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Apr 2005 16:27:48 -0500 (EST)
Received: from dexter.psc.edu (localhost.psc.edu [127.0.0.1]) by dexter.psc.edu (8.12.11/8.12.10) with ESMTP id j31LMxdx018810; Fri, 1 Apr 2005 16:22:59 -0500
Received: from localhost (jheffner@localhost) by dexter.psc.edu (8.12.11/8.12.11/Submit) with ESMTP id j31LMx4H018807; Fri, 1 Apr 2005 16:22:59 -0500
X-Authentication-Warning: dexter.psc.edu: jheffner owned process doing -bs
Date: Fri, 1 Apr 2005 16:22:59 -0500 (EST)
From: John Heffner
To: "David S. Miller"
cc: netdev@oss.sgi.com
Subject: Re: [PATCH] skb pcount with MTU discovery
In-Reply-To: <20050401131045.4e558f65.davem@davemloft.net>
Message-ID:
References: <20050401131045.4e558f65.davem@davemloft.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1218
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jheffner@psc.edu
Precedence: bulk
X-list: netdev

On Fri, 1 Apr 2005, David S. Miller wrote:

> On Fri, 1 Apr 2005 16:05:49 -0500 (EST)
> John Heffner wrote:
>
> > The problem is that when doing MTU discovery, the too-large segments in
> > the write queue will be calculated as having a pcount of >1. When
> > tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test
> > when pcount > cwnd.
> >
> > The segments are eventually transmitted one at a time by keepalive, but
> > this can take a long time.
> >
> > This patch checks if TSO is enabled when setting pcount.
>
> Why isn't the MSS properly updated at this point in time?
> If it were, the pcount setting would do the right thing.
>
> That's how this code is supposed to work.

The problem occurs when TSO is disabled.

Common case, start out with mss of 8948. Send 2 segments; neither are
acknowledged, and we receive an ICMP can't fragment indicating a pmtu of
1500 so mss is set down to 1448. Now tcp_set_skb_tso_segs() sets tso_segs
to 6, so tcp_snd_test thinks we are doing TSO and will send the full 6
mss, and fails the cwnd test since cwnd == 2.

-John

From colin@colino.net Fri Apr 1 13:28:05 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:28:11 -0800 (PST)
Received: from paperstreet.colino.net (colino.net [213.41.131.56]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LS3F5002218 for ; Fri, 1 Apr 2005 13:28:04 -0800
Received: by paperstreet.colino.net (Postfix, from userid 1015) id 3D0C3101D9; Fri, 1 Apr 2005 23:27:52 +0200 (CEST)
Received: from jack.colino.net (jack.colino.net [192.168.0.11]) by paperstreet.colino.net (Postfix) with ESMTP id 974A9101A2; Fri, 1 Apr 2005 23:27:49 +0200 (CEST)
Date: Fri, 1 Apr 2005 23:27:47 +0200
From: Colin Leroy
To: David Brownell
Cc: linux-usb-devel@lists.sourceforge.net, Andrew Morton , Jeroen Vreeken , netdev@oss.sgi.com
Subject: Re: [linux-usb-devel] [PATCH] PM support for zd1201
Message-ID: <20050401232747.3f9ed365@jack.colino.net>
In-Reply-To: <200504011030.57978.david-b@pacbell.net>
References: <20050330144423.0dde5b71@jack.colino.net> <200504011030.57978.david-b@pacbell.net>
X-Mailer: Sylpheed-Claws 1.9.6cvs18 (GTK+ 2.6.4; powerpc-unknown-linux-gnu)
X-Face:
Fy:*XpRna1/tz}cJ@O'0^:qYs:8b[Rg`*8,+o^[fI?<%5LeB,Xz8ZJK[r7V0hBs8G)*&C+XA0qHoR=LoTohe@7X5K$A-@cN6n~~J/]+{[)E4h'lK$13WQf$.R+Pi;E09tk&{t|;~dakRD%CLHrk6m!?gA,5|Sb=fJ=>[9#n1Bu8?VngkVM4{'^'V_qgdA.8yn3)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1219
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: colin@colino.net
Precedence: bulk
X-list: netdev

On 01 Apr 2005 at 10h04, David Brownell wrote:

Hi,

> Looked ok to me, other than needing to change "u32 state" into
> a "pm_message_t message". And I'm not sure why "mac_enabled"
> would be the right test, rather than maybe netif_running().

Here it is.

Signed-off-by: Colin Leroy

--- drivers/usb/net/zd1201.c.orig 2005-03-30 14:35:23.000000000 +0200
+++ drivers/usb/net/zd1201.c 2005-04-01 23:24:04.000000000 +0200
@@ -1896,12 +1896,50 @@
 	kfree(zd);
 }

+#ifdef CONFIG_PM
+
+static int zd1201_suspend (struct usb_interface *interface,
+			   pm_message_t message)
+{
+	struct zd1201 *zd = (struct zd1201 *)usb_get_intfdata(interface);
+
+	netif_device_detach(zd->dev);
+
+	zd->was_enabled = zd->mac_enabled;
+
+	if (zd->was_enabled)
+		return zd1201_disable(zd);
+	else
+		return 0;
+}
+
+static int zd1201_resume (struct usb_interface *interface)
+{
+	struct zd1201 *zd = (struct zd1201 *)usb_get_intfdata(interface);
+
+	netif_device_attach(zd->dev);
+
+	if (zd->was_enabled)
+		return zd1201_enable(zd);
+	else
+		return 0;
+}
+
+#else
+
+#define zd1201_suspend NULL
+#define zd1201_resume NULL
+
+#endif
+
 struct usb_driver zd1201_usb = {
 	.owner = THIS_MODULE,
 	.name = "zd1201",
 	.probe = zd1201_probe,
 	.disconnect = zd1201_disconnect,
 	.id_table = zd1201_table,
+	.suspend = zd1201_suspend,
+	.resume = zd1201_resume,
 };

 static int __init zd1201_init(void)
--- drivers/usb/net/zd1201.h.orig 2005-03-30 14:35:36.000000000 +0200
+++ drivers/usb/net/zd1201.h 2005-03-30 14:24:33.000000000 +0200
@@ -46,6 +46,7 @@
 	char essid[IW_ESSID_MAX_SIZE+1];
 	int essidlen;
 	int mac_enabled;
+	int was_enabled;
 	int monitor;
 	int encode_enabled;
 	int encode_restricted;

From dada1@cosmosbay.com Fri Apr 1 13:43:53 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 13:43:58 -0800 (PST)
Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31LhqNT003067 for ; Fri, 1 Apr 2005 13:43:53 -0800
Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31Lhd6I031012; Fri, 1 Apr 2005 23:43:45 +0200
Message-ID: <424DC08A.3020204@cosmosbay.com>
Date: Fri, 01 Apr 2005 23:43:38 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
CC: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net>
In-Reply-To: <20050401130832.1f972a3b.davem@davemloft.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [62.23.185.226]); Fri, 01 Apr 2005 23:43:45 +0200 (CEST)
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1220
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev

David S.
Miller wrote:

> On Fri, 01 Apr 2005 23:05:37 +0200
> Eric Dumazet wrote:
>
>
>>You mean you prefer :
>>
>>static spinlock_t *rt_hash_lock ; /* rt_hash_lock =
>>alloc_memory_at_boot_time(...) */
>>
>>instead of
>>
>>static spinlock_t rt_hash_lock[RT_HASH_LOCK_SZ] ;
>>
>>In both cases, memory is taken from lowmem, and size of kernel image
>>is roughly the same (bss section takes no space in image)
>
>
> In the former case the kernel image the bootloader has to
> load is smaller. That's important, believe it or not. It
> means less TLB entries need to be locked permanently into
> the MMU on certain platforms.
>
>

OK thanks for this clarification. I changed to :

#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
/*
 * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks
 * The size of this table is a power of two and depends on the number of CPUS.
 */
#if NR_CPUS >= 32
#define RT_HASH_LOCK_SZ 4096
#elif NR_CPUS >= 16
#define RT_HASH_LOCK_SZ 2048
#elif NR_CPUS >= 8
#define RT_HASH_LOCK_SZ 1024
#elif NR_CPUS >= 4
#define RT_HASH_LOCK_SZ 512
#else
#define RT_HASH_LOCK_SZ 256
#endif

static spinlock_t *rt_hash_locks;
# define rt_hash_lock_addr(slot) &rt_hash_locks[slot & (RT_HASH_LOCK_SZ - 1)]
# define rt_hash_lock_init() { \
	int i; \
	rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ, GFP_KERNEL); \
	if (!rt_hash_locks) panic("IP: failed to allocate rt_hash_locks\n"); \
	for (i = 0; i < RT_HASH_LOCK_SZ; i++) \
		spin_lock_init(&rt_hash_locks[i]); \
}
#else
# define rt_hash_lock_addr(slot) NULL
# define rt_hash_lock_init()
#endif

Are you OK if I also use alloc_large_system_hash() to allocate rt_hash_table, instead of the current method ? This new method is used in net/ipv4/tcp.c for tcp_ehash and tcp_bhash and permits NUMA tuning.
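
The lock-table idea above can be tried outside the kernel. What follows is a minimal userspace sketch, assuming POSIX mutexes in place of spinlock_t and malloc() in place of kmalloc(); the names lock_table, lock_table_init and lock_for_slot and the fixed table size are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define LOCK_TABLE_SZ 256	/* assumed size; must be a power of two */

static pthread_mutex_t *lock_table;

/* Allocate and initialise the shared lock table at start-up.
 * Returns 0 on success, -1 if the allocation fails. */
static int lock_table_init(void)
{
	size_t i;

	/* The mask in lock_for_slot() only works for powers of two. */
	assert((LOCK_TABLE_SZ & (LOCK_TABLE_SZ - 1)) == 0);

	lock_table = malloc(sizeof(*lock_table) * LOCK_TABLE_SZ);
	if (!lock_table)
		return -1;
	for (i = 0; i < LOCK_TABLE_SZ; i++)
		pthread_mutex_init(&lock_table[i], NULL);
	return 0;
}

/* Map a hash slot to its lock by masking off the low bits, so many
 * buckets share one lock instead of each bucket owning its own. */
static pthread_mutex_t *lock_for_slot(unsigned int slot)
{
	return &lock_table[slot & (LOCK_TABLE_SZ - 1)];
}
```

Slots that differ by a multiple of the table size share a lock, so the table trades a little extra contention for a fixed, small memory footprint that does not grow with the hash table.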
Eric

From davem@davemloft.net Fri Apr 1 14:35:37 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 14:35:47 -0800 (PST)
Received: from cheetah.davemloft.net (mail@dsl027-180-174.sfo1.dsl.speakeasy.net [216.27.180.174]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31MZbk5005035 for ; Fri, 1 Apr 2005 14:35:37 -0800
Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1DHUiw-0003Dx-00; Fri, 01 Apr 2005 14:34:42 -0800
Date: Fri, 1 Apr 2005 14:34:42 -0800
From: "David S. Miller"
To: Eric Dumazet
Cc: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
Message-Id: <20050401143442.62ed8bb9.davem@davemloft.net>
In-Reply-To: <424DC08A.3020204@cosmosbay.com>
References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> <424DC08A.3020204@cosmosbay.com>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1221
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev

On Fri, 01 Apr 2005 23:43:38 +0200
Eric Dumazet wrote:

> Are you OK if I also use alloc_large_system_hash() to allocate
> rt_hash_table, instead of the current method ?
This new method is used
> in net/ipv4/tcp.c for tcp_ehash and tcp_bhash and permits NUMA tuning.

Sure, that's fine.

BTW, please line-wrap your emails. :-/

From herbert@gondor.apana.org.au Fri Apr 1 14:48:43 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 14:48:50 -0800 (PST)
Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31MmfmE005961 for ; Fri, 1 Apr 2005 14:48:42 -0800
Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DHUvx-0000Cu-00; Sat, 02 Apr 2005 08:48:09 +1000
Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DHUvP-00050C-00; Sat, 02 Apr 2005 08:47:35 +1000
From: Herbert Xu
To: jheffner@psc.edu (John Heffner)
Subject: Re: [PATCH] skb pcount with MTU discovery
Cc: davem@davemloft.net, netdev@oss.sgi.com
Organization: Core
In-Reply-To:
X-Newsgroups: apana.lists.os.linux.netdev
User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686))
Message-Id:
Date: Sat, 02 Apr 2005 08:47:35 +1000
X-Virus-Scanned: ClamAV 0.83/799/Fri Apr 1 02:49:13 2005 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 1222
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: herbert@gondor.apana.org.au
Precedence: bulk
X-list: netdev

John Heffner wrote:
>
> Common case, start out with mss of 8948. Send 2 segments; neither are
> acknowledged, and we receive an ICMP can't fragment indicating a pmtu of
> 1500 so mss is set down to 1448. Now tcp_set_skb_tso_segs() sets tso_segs
> to 6, so tcp_snd_test thinks we are doing TSO and will send the full 6
> mss, and fails the cwnd test since cwnd == 2.

How about fixing tcp_snd_test directly like this?
Of course all this will be moot once Dave finishes his TSO rewrite :)
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
===== include/net/tcp.h 1.107 vs edited =====
--- 1.107/include/net/tcp.h 2005-03-16 10:15:03 +11:00
+++ edited/include/net/tcp.h 2005-04-02 08:45:48 +10:00
@@ -1433,6 +1433,9 @@
 		pkts = tcp_skb_pcount(skb);
 	}

+	if (!(tp->inet.sk.sk_route_caps & NETIF_F_TSO))
+		pkts = 1;
+
 	/* RFC 1122 - section 4.2.3.4
 	 *
 	 * We must queue if

From dada1@cosmosbay.com Fri Apr 1 15:22:03 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Apr 2005 15:22:11 -0800 (PST)
Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j31NM2Gf007563 for ; Fri, 1 Apr 2005 15:22:03 -0800
Received: from [192.168.0.3] ([84.5.129.64]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j31NLm32032417; Sat, 2 Apr 2005 01:21:54 +0200
Message-ID: <424DD78D.7070001@cosmosbay.com>
Date: Sat, 02 Apr 2005 01:21:49 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
CC: netdev@oss.sgi.com
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
References: <42370997.6010302@cosmosbay.com> <20050315103253.590c8bfc.davem@davemloft.net> <42380EC6.60100@cosmosbay.com> <20050316140915.0f6b9528.davem@davemloft.net> <4239E00C.4080309@cosmosbay.com> <20050331221352.13695124.davem@davemloft.net> <424D5D34.4030800@cosmosbay.com> <20050401122802.7c71afbc.davem@davemloft.net> <424DB7A1.8090803@cosmosbay.com> <20050401130832.1f972a3b.davem@davemloft.net> <424DC08A.3020204@cosmosbay.com> <2005040