From davem@pizda.ninka.net Mon Dec 1 00:06:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 00:07:14 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB186sTa017689 for ; Mon, 1 Dec 2003 00:06:54 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id AAA05215; Mon, 1 Dec 2003 00:06:16 -0800 Date: Mon, 1 Dec 2003 00:06:16 -0800 From: "David S. Miller" To: Ben Greear Cc: scott.feldman@intel.com, netdev@oss.sgi.com Subject: Re: Problems with e1000 in 2.4.23 Message-Id: <20031201000616.1db7b6f4.davem@redhat.com> In-Reply-To: <3FCAACAD.7090609@candelatech.com> References: <3FCAACAD.7090609@candelatech.com> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1784 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Sun, 30 Nov 2003 18:51:25 -0800 Ben Greear wrote: > Also, I am seeing bogus things on this machine regardless of which > kernel and which e1000 driver I use, so it's quite possible that either > the NIC hardware or the MB/RAM/CPU/Whatever is just plain not quite > right. Ben, I don't want to beat an old dead horse, but is this the same system where you were having all of those overheating problems a long time ago? 
From casellas@infres.enst.fr Mon Dec 1 06:36:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 06:36:49 -0800 (PST) Received: from infres.enst.fr (infres.enst.fr [137.194.192.1]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1EaXTa012794 for ; Mon, 1 Dec 2003 06:36:34 -0800 Received: from gervaise.enst.fr (gervaise.enst.fr [137.194.160.71]) by infres.enst.fr (Postfix) with ESMTP id 5BAFB18DC for ; Mon, 1 Dec 2003 15:36:27 +0100 (MET) Received: from localhost (casellas@localhost) by gervaise.enst.fr (8.11.6+Sun/8.11.6) with ESMTP id hB1EaQV26056 for ; Mon, 1 Dec 2003 15:36:26 +0100 (MET) X-Authentication-Warning: gervaise.enst.fr: casellas owned process doing -bs Date: Mon, 1 Dec 2003 15:36:26 +0100 (MET) From: Ramon Casellas X-X-Sender: casellas@gervaise.enst.fr To: netdev@oss.sgi.com Subject: Request: Allocate a Netlink Family Number for MPLS Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1785 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: casellas@infres.enst.fr Precedence: bulk X-list: netdev Jamal/Dave/all, We're working in porting mpls for linux to 2.6 and one of the planned features is to move from an IOCTL based approach to a Netlink based one for updating/querying the MPLS FTN/ILM/Label Mapping/... tables from userspace. Since netlink families are public, I would like to know if it is possible to reserve a family for MPLS, even though the MPLS patch is not part of the official kernel. Please let me know who I should contact/forward my request (Dave Miller is listed as the main mantainer of the net core) or if, on the contrary, there are valid reasons to reject our request. 
Thanks in advance, Ramon Something like this: diff -urN linux-2.6.0-test11/include/linux/netlink.h linux-2.6.0-test11-mpls/include/linux/netlink.h --- linux-2.6.0-test11/include/linux/netlink.h 2003-11-27 14:24:00.000000000 +0100 +++ linux-2.6.0-test11-mpls/include/linux/netlink.h 2003-11-30 13:47:45.000000000 +0100 @@ -12,6 +12,11 @@ #define NETLINK_NFLOG 5 /* netfilter/iptables ULOG */ #define NETLINK_XFRM 6 /* ipsec */ #define NETLINK_ARPD 8 + +#if defined(CONFIG_MPLS) || defined(CONFIG_MPLS_MODULE) +#define NETLINK_MPLS 9 +#endif + #define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */ #define NETLINK_IP6_FW 13 #define NETLINK_DNRTMSG 14 /* DECnet routing messages */ // ------------------------------------------------------------------- // Ramon Casellas - GET/ENST/INFRES/RHD/A508 - casellas@infres.enst.fr From hch@infradead.org Mon Dec 1 07:30:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 07:31:13 -0800 (PST) Received: from phoenix.infradead.org (phoenix.infradead.org [213.86.99.234]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1FUrTa017364 for ; Mon, 1 Dec 2003 07:30:54 -0800 Received: from hch by phoenix.infradead.org with local (Exim 4.22) id 1AQq0i-00011I-J0; Mon, 01 Dec 2003 15:30:52 +0000 Date: Mon, 1 Dec 2003 15:30:52 +0000 From: Christoph Hellwig To: Ramon Casellas Cc: netdev@oss.sgi.com Subject: Re: Request: Allocate a Netlink Family Number for MPLS Message-ID: <20031201153052.A3879@infradead.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from casellas@infres.enst.fr on Mon, Dec 01, 2003 at 03:36:26PM +0100 X-archive-position: 1786 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev On Mon, Dec 01, 2003 at 03:36:26PM +0100, Ramon Casellas wrote: > #define NETLINK_ARPD 8 > + > +#if
defined(CONFIG_MPLS) || defined(CONFIG_MPLS_MODULE) > +#define NETLINK_MPLS 9 > +#endif > + This is bogus - either it's reserved or not, so the ifdef doesn't make sense. Also given that Dave & co are working on mpls already I wonder whether it's a good idea to reserve this, but I'll let him speak for himself :) From greearb@candelatech.com Mon Dec 1 09:26:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 09:27:09 -0800 (PST) Received: from grok.yi.org (evrtwa1-ar2-4-35-049-074.evrtwa1.dsl-verizon.net [4.35.49.74]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1HQtTa026501 for ; Mon, 1 Dec 2003 09:26:56 -0800 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id hB1HQcKt029022; Mon, 1 Dec 2003 09:26:45 -0800 Message-ID: <3FCB79CE.40409@candelatech.com> Date: Mon, 01 Dec 2003 09:26:38 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: scott.feldman@intel.com, netdev@oss.sgi.com Subject: Re: Problems with e1000 in 2.4.23 References: <3FCAACAD.7090609@candelatech.com> <20031201000616.1db7b6f4.davem@redhat.com> In-Reply-To: <20031201000616.1db7b6f4.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1787 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev David S. Miller wrote: > On Sun, 30 Nov 2003 18:51:25 -0800 > Ben Greear wrote: > > >>Also, I am seeing bogus things on this machine regardless of which >>kernel and which e1000 driver I use, so it's quite possible that either >>the NIC hardware or the MB/RAM/CPU/Whatever is just plain not quite >>right. 
> > > Ben, I don't want to beat an old dead horse, but is this > the same system where you were having all of those > overheating problems a long time ago? Yep, I put a big ole fan right on the NICs but to no avail. However, it may not be enough, or the chipset/NIC may be so cooked/flaky that it doesn't really matter any more. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From niv@us.ibm.com Mon Dec 1 10:12:40 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 10:12:52 -0800 (PST) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1ICUTa027633 for ; Mon, 1 Dec 2003 10:12:39 -0800 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.17.195.12]) by e33.co.us.ibm.com (8.12.10/8.12.2) with ESMTP id hB1ICLJh340332; Mon, 1 Dec 2003 13:12:21 -0500 Received: from us.ibm.com (d03av01.boulder.ibm.com [9.17.193.81]) by westrelay03.boulder.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id hB1ICJeF128462; Mon, 1 Dec 2003 11:12:20 -0700 Message-ID: <3FCB8415.7060101@us.ibm.com> Date: Mon, 01 Dec 2003 10:10:29 -0800 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ronnie Sahlberg CC: netdev@oss.sgi.com Subject: Re: TCP retransmission timers, questions References: <010801c3b657$e3093fb0$6501010a@C5043436> In-Reply-To: <010801c3b657$e3093fb0$6501010a@C5043436> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1788 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Ronnie Sahlberg wrote: > By looking at the kernel sources it seems that the minimum TCP > retransmission timeout is hardcoded to 200ms. > Is this correct? Yes, that is correct. 
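[Editorial sketch, not part of the original message.] To illustrate where a fixed 200 ms floor bites, here is a minimal C sketch of the standard retransmission-timeout estimate (RTO = SRTT + 4 * RTTVAR, with gains 1/8 and 1/4 as in RFC 2988) followed by a clamp to a configurable minimum. The names `rto_estimator`, `rto_init` and `rto_update` are hypothetical and this is not the kernel's code; the clock-granularity term and the upper bound are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical RTO estimator state; all times are in milliseconds. */
struct rto_estimator {
    double srtt;       /* smoothed round-trip time */
    double rttvar;     /* round-trip time variation */
    double rto_min;    /* floor applied to the computed timeout */
    bool   has_sample; /* false until the first RTT measurement arrives */
};

void rto_init(struct rto_estimator *e, double rto_min_ms)
{
    assert(rto_min_ms > 0.0);
    e->srtt = 0.0;
    e->rttvar = 0.0;
    e->rto_min = rto_min_ms;
    e->has_sample = false;
}

/* Feed one RTT measurement and return the clamped timeout. */
double rto_update(struct rto_estimator *e, double rtt_ms)
{
    if (!e->has_sample) {
        /* First sample: SRTT = R, RTTVAR = R / 2. */
        e->srtt = rtt_ms;
        e->rttvar = rtt_ms / 2.0;
        e->has_sample = true;
    } else {
        double err = e->srtt - rtt_ms;
        if (err < 0.0)
            err = -err;
        /* RTTVAR is updated before SRTT, using the old SRTT. */
        e->rttvar = 0.75 * e->rttvar + 0.25 * err;
        e->srtt = 0.875 * e->srtt + 0.125 * rtt_ms;
    }

    double rto = e->srtt + 4.0 * e->rttvar;
    /* On a sub-millisecond LAN the floor, not the estimate, decides. */
    return rto < e->rto_min ? e->rto_min : rto;
}
```

With a 0.5 ms round trip the raw estimate is 1.5 ms, so a 200 ms floor dominates the result entirely, which is the situation the question describes.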
> While I understand why it is important to not be too aggressive in
> retransmitting I wonder if it would be possible to get
> an interface in proc where one could "tune" this.

Currently, not unless you edit the kernel header file yourself and recompile the kernel. Not recommended for several reasons.

> The reason for this is that in some applications you do have a completely
> private, dedicated network used for one specific application.
> Those networks can be dimensioned so that congestion "should" not occur.
> However, packets are still lost from time to time.
> In those isolated dedicated subnets, with end to end network latency in the
> sub ms range, would it not be useful to be able to allow
> the retransmission timeout to drop down to 5-10ms?

Exactly the scheme I was interested in proposing a while ago - provide an environment for private networks that would allow more flexible tuning.

> Does anyone know of any work/research in the area of tcp retransmission
> timeouts for very high bandwidth, low latency networks?
> I have checked both the IETF list of drafts, Sally Floyd's pages and Google
> but could not find anything.

Not that I could find last year either.

> It seems to me that all research/experimentation in high throughput is for
> high bandwidth high latency links and tuning the slowstart/congestion
> avoidance algorithms.
> What about high throughput, very low latency? Does anyone know of any
> papers in that area?

I'm doing my own experimentation for this environment - the case study is a 3-tiered app with a private network between the web front end and the database backend. I'm playing with gigabit but hope to do some 10Gb testing sometime in the near future. I hope to provide an experimental patch to play with, but it won't be soon. Most likely January.
We had a thread on this a while ago, and DaveM pointed out that this was really a research area because the 200ms timer limit (BSD inherited) played a rather critical role in all the congestion control, and the impact of changing it on Internet traffic really needed to be studied/researched. However, that wouldn't apply to private, non-routable networks.

> For specific applications, running on completely isolated dedicated
> networks, dimensioned to make congestion unlikely, isolated so it will NEVER
> compete for bandwidth with normal TCPs on the internet, to me it would
> make sense to allow the retransmission timeout to drop
> significantly below 200ms.

Exactly.

> Another question, I think it was RFC2988 (but cannot find it again) that
> discussed that a TCP may add an artificial delay in sending the packets
> based on the RTT so that when sending an entire window the packets are spaced
> equidistantly across the RTT interval instead of in just one big burst.
> This is to prevent the burstiness of the traffic and make buffer
> overruns/congestion less likely.

I haven't seen this help for the most part. It is helpful only in very specific situations. If you're studying multiple streams across one network, performance could be equally hurt/helped. Have you any data on this?

> I have seen indications that w2k/bsd might in some conditions do this.
> Does Linux do this? My search through the sources came up with nothing.
> Does anyone know whether there are other TCPs that do this?
> As I said I have seen something that looked like that on a BSD stack but it
> could have been related to something else.

Linux doesn't, and the others don't either, to my knowledge, but I could be wrong, it's been a while since I looked at the other OSs.
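[Editorial sketch, not part of the original message.] The pacing idea asked about above, spreading one window of packets evenly over the round-trip time instead of sending them in a single burst, comes down to a simple division. The C sketch below is only an illustration under that reading; the function names `pacing_interval_us` and `pacing_send_offset_us` are hypothetical and do not correspond to any stack mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Gap between successive packets, in microseconds, when a window of
 * window_packets packets is spread evenly over one round-trip time.
 */
uint64_t pacing_interval_us(uint64_t rtt_us, uint32_t window_packets)
{
    assert(window_packets > 0);
    return rtt_us / window_packets;
}

/*
 * Offset from the start of the round trip at which packet `index`
 * (0-based) would be sent. A burst sender would use offset 0 for all.
 */
uint64_t pacing_send_offset_us(uint64_t rtt_us, uint32_t window_packets,
                               uint32_t index)
{
    assert(index < window_packets);
    return (uint64_t)index * pacing_interval_us(rtt_us, window_packets);
}
```

For a 10 ms RTT and a 10-packet window this gives one packet every 1 ms. On a sub-millisecond LAN the gaps shrink to tens of microseconds, which may be one reason the benefit is hard to observe there.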
hth, thanks, Nivedita From shemminger@osdl.org Mon Dec 1 11:32:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 11:33:09 -0800 (PST) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1JWtTa028898 for ; Mon, 1 Dec 2003 11:32:56 -0800 Received: from dell_ss3.pdx.osdl.net (IDENT:2997@dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id hB1JWUZ21603; Mon, 1 Dec 2003 11:32:32 -0800 Date: Mon, 1 Dec 2003 11:33:12 -0800 From: Stephen Hemminger To: anand@eis.iisc.ernet.in (SVR Anand) Cc: davem@redhat.com (David S. Miller), netdev@oss.sgi.com Subject: Re: Bridging woes after 3 days Message-Id: <20031201113312.2ce6ec0f.shemminger@osdl.org> In-Reply-To: <200311290944.PAA27304@eis.iisc.ernet.in> References: <20031123152601.67646dc1.davem@redhat.com> <200311290944.PAA27304@eis.iisc.ernet.in> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.9.6claws (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1789 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Sat, 29 Nov 2003 15:14:08 +0530 (GMT+05:30) anand@eis.iisc.ernet.in (SVR Anand) wrote: > Hi, > > After a continous run for 3 days the bridge came down crashing with the > following kernel panic screen dump. The kernel is 2.6.0-test9-bk25 with > kernel preemption disabled. > > The following call stack is what I have seen on the console. The ethernet > cards are RTL8139. Please let me know if you want more information or finer > debugging method, I will pass it on when the bridge fails the next time. 
Can you hook a serial console to catch the precise wording? Also if you save copies of /proc/slabinfo on a regular interval (like per hour), then it is possible to see if there is a memory leak. From casellas@infres.enst.fr Mon Dec 1 12:00:21 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 12:00:37 -0800 (PST) Received: from infres.enst.fr (infres.enst.fr [137.194.192.1]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1K0ATa006998 for ; Mon, 1 Dec 2003 12:00:11 -0800 Received: from gervaise.enst.fr (gervaise.enst.fr [137.194.160.71]) by infres.enst.fr (Postfix) with ESMTP id 769F818D1; Mon, 1 Dec 2003 20:26:31 +0100 (MET) Received: from localhost (casellas@localhost) by gervaise.enst.fr (8.11.6+Sun/8.11.6) with ESMTP id hB1JQVE27335; Mon, 1 Dec 2003 20:26:31 +0100 (MET) X-Authentication-Warning: gervaise.enst.fr: casellas owned process doing -bs Date: Mon, 1 Dec 2003 20:26:30 +0100 (MET) From: Ramon Casellas X-X-Sender: casellas@gervaise.enst.fr To: Christoph Hellwig Cc: netdev@oss.sgi.com Subject: Re: Request: Allocate a Netlink Family Number for MPLS In-Reply-To: <20031201153052.A3879@infradead.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1790 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: casellas@infres.enst.fr Precedence: bulk X-list: netdev On Mon, 1 Dec 2003, Christoph Hellwig wrote: > On Mon, Dec 01, 2003 at 03:36:26PM +0100, Ramon Casellas wrote: > > #define NETLINK_ARPD 8 > > + > > +#if defined(CONFIG_MPLS) || defined(CONFIG_MPLS_MODULE) > > +#define NETLINK_MPLS 9 > > +#endif > > + > > This is bogus - either it's reserved or not, so the ifdef doesn't > make sense. Yes of course :) that was just a cut&paste from an interim patch, where we put ifdefs around everything. 
If you decide to allocate a family number, then ifdefs don't make sense (much like in if_ether.h or ppp_defs.h), but you're right, my mistake, I didn't mean to propose a real patch. > > Also given that Dave & co are working on mpls already I wonder > whether it's a good idea to reserve this, but I'll let him speak > for himself :) Well, we are still coordinating efforts, and Jamal has some design docs around... but regardless of the actual implementation, a mechanism to communicate with userspace will be needed, and IM(Very, very)HO, netlink is a good candidate, analog to rtnetlink, but this is open to discussion. Thanks, R. From herbert@gondor.apana.org.au Mon Dec 1 12:17:07 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 12:17:21 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1KH2Ta007512 for ; Mon, 1 Dec 2003 12:17:06 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AQuTV-00005l-00; Tue, 02 Dec 2003 07:16:53 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AQuTT-0005G7-00; Tue, 02 Dec 2003 07:16:51 +1100 Date: Tue, 2 Dec 2003 07:16:51 +1100 To: "David S. Miller" , netdev@oss.sgi.com Subject: [ROUTE] PMTU only works on half the time Message-ID: <20031201201651.GA20194@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Nq2Wo0NMKNjxTN9z" Content-Disposition: inline User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1791 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --Nq2Wo0NMKNjxTN9z Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Dave: I found out that PMTU only works on those routing cache entries where rt_src != 0. 
This patch should make it work for all matching entries. Cheers, -- Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --Nq2Wo0NMKNjxTN9z Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p Index: kernel-source-2.5/net/ipv4/route.c =================================================================== RCS file: /home/gondolin/herbert/src/CVS/debian/kernel-source-2.5/net/ipv4/route.c,v retrieving revision 1.3 diff -u -r1.3 route.c --- kernel-source-2.5/net/ipv4/route.c 24 Nov 2003 09:52:04 -0000 1.3 +++ kernel-source-2.5/net/ipv4/route.c 1 Dec 2003 20:15:40 -0000 @@ -1259,9 +1259,9 @@ rth = rth->u.rt_next) { smp_read_barrier_depends(); if (rth->fl.fl4_dst == daddr && - rth->fl.fl4_src == skeys[i] && + (rth->fl.fl4_src == iph->saddr || + rth->rt_src == iph->saddr) && rth->rt_dst == daddr && - rth->rt_src == iph->saddr && rth->fl.fl4_tos == tos && rth->fl.iif == 0 && !(dst_metric_locked(&rth->u.dst, RTAX_MTU))) { --Nq2Wo0NMKNjxTN9z-- From davem@pizda.ninka.net Mon Dec 1 12:46:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 12:46:31 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1KkHTa011430 for ; Mon, 1 Dec 2003 12:46:17 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id MAA19823; Mon, 1 Dec 2003 12:45:32 -0800 Date: Mon, 1 Dec 2003 12:45:32 -0800 From: "David S. 
Miller" To: Ramon Casellas Cc: netdev@oss.sgi.com Subject: Re: Request: Allocate a Netlink Family Number for MPLS Message-Id: <20031201124532.3f6b6a65.davem@redhat.com> In-Reply-To: References: X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1792 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Mon, 1 Dec 2003 15:36:26 +0100 (MET) Ramon Casellas wrote: > We're working in porting mpls for linux to 2.6 and one of the planned > features is to move from an IOCTL based approach to a Netlink based one > for updating/querying the MPLS FTN/ILM/Label Mapping/... tables from > userspace. Since netlink families are public, I would > like to know if it is possible to reserve a family for MPLS, even though > the MPLS patch is not part of the official kernel. You don't need a whole new netlink family. Just create a dummy address family for MPLS (ie. AF_MPLS) and then just use NETLINK_ROUTE. From herbert@gondor.apana.org.au Mon Dec 1 12:47:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 12:47:25 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1Kl8Ta011610 for ; Mon, 1 Dec 2003 12:47:10 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AQuwf-0000D1-00; Tue, 02 Dec 2003 07:47:01 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AQuwe-0005IT-00; Tue, 02 Dec 2003 07:47:00 +1100 Date: Tue, 2 Dec 2003 07:47:00 +1100 To: "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-ID: <20031201204700.GA20349@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="YZ5djTAD1cGYuMQK" Content-Disposition: inline In-Reply-To: <20031201201651.GA20194@gondor.apana.org.au> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1793 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --YZ5djTAD1cGYuMQK Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Tue, Dec 02, 2003 at 07:16:51AM +1100, herbert wrote: > > I found out that PMTU only works on those routing cache entries where > rt_src != 0. This patch should make it work for all matching entries. That patch removed one line too many. This one should be better. -- Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --YZ5djTAD1cGYuMQK Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p Index: kernel-source-2.5/net/ipv4/route.c =================================================================== RCS file: /home/gondolin/herbert/src/CVS/debian/kernel-source-2.5/net/ipv4/route.c,v retrieving revision 1.3 diff -u -r1.3 route.c --- kernel-source-2.5/net/ipv4/route.c 24 Nov 2003 09:52:04 -0000 1.3 +++ kernel-source-2.5/net/ipv4/route.c 1 Dec 2003 20:45:22 -0000 @@ -1260,8 +1260,9 @@ smp_read_barrier_depends(); if (rth->fl.fl4_dst == daddr && rth->fl.fl4_src == skeys[i] && + (rth->fl.fl4_src == iph->saddr || + rth->rt_src == iph->saddr) && rth->rt_dst == daddr && - rth->rt_src == iph->saddr && rth->fl.fl4_tos == tos && rth->fl.iif == 0 && !(dst_metric_locked(&rth->u.dst, RTAX_MTU))) { --YZ5djTAD1cGYuMQK-- From 
garzik@gtf.org Mon Dec 1 13:02:54 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 13:03:09 -0800 (PST) Received: from havoc.gtf.org (havoc.gtf.org [63.247.75.124]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1L2sTa020997 for ; Mon, 1 Dec 2003 13:02:54 -0800 Received: by havoc.gtf.org (Postfix, from userid 500) id F1C2A66CF; Mon, 1 Dec 2003 15:55:33 -0500 (EST) Date: Mon, 1 Dec 2003 15:55:33 -0500 From: Jeff Garzik To: Octave Cc: Stephen Hemminger , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: NAPI 8139too.c for 2.4.23 Message-ID: <20031201205533.GA15846@gtf.org> References: <20031201205038.GK10711@ovh.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20031201205038.GK10711@ovh.net> User-Agent: Mutt/1.3.28i X-archive-position: 1794 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev On Mon, Dec 01, 2003 at 09:50:38PM +0100, Octave wrote: > Stephen, > I get your patch from http://lwn.net/Articles/54815/ for 2.6.X and > I rewrote it for 2.4.23. Tested with 2.4.23 on high load servers. I > have no more "Too much work at interrupt". > > I dropped it on ftp://ftp.ovh.net/made-in-ovh/8139too.c-2.4-0.9.27 > > Hope it helps. > Octave Very cool! Thanks for testing. Is there any chance you could do some benchmark runs with ttcp or somesuch? 
Jeff From voloterreno@tin.it Mon Dec 1 13:08:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 13:09:07 -0800 (PST) Received: from vsmtp12.tin.it (vsmtp12.tin.it [212.216.176.206]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1L8nTa002620 for ; Mon, 1 Dec 2003 13:08:50 -0800 Received: from tin.it (80.180.66.85) by vsmtp12.tin.it (7.0.019) (authenticated as voloterreno@tin.it) id 3FC8F04600174580; Mon, 1 Dec 2003 22:08:33 +0100 Message-ID: <3FCBAEFF.6000606@tin.it> Date: Mon, 01 Dec 2003 22:13:35 +0100 From: Marcello User-Agent: Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.5) Gecko/20031031 X-Accept-Language: it, en-us, en MIME-Version: 1.0 To: Octave CC: Stephen Hemminger , Jeff Garzik , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: NAPI 8139too.c for 2.4.23 References: <20031201205038.GK10711@ovh.net> In-Reply-To: <20031201205038.GK10711@ovh.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1795 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: voloterreno@tin.it Precedence: bulk X-list: netdev Octave ha scritto: >Stephen, >I get your patch from http://lwn.net/Articles/54815/ for 2.6.X and >I rewrote it for 2.4.23. Tested with 2.4.23 on high load servers. I >have no more "Too much work at interrupt". > >I dropped it on ftp://ftp.ovh.net/made-in-ovh/8139too.c-2.4-0.9.27 > >Hope it helps. >Octave > >before: >------- ># ps auxw >root 256 0.0 0.0 0 0 ? SW Nov28 0:00 [eth0] ># ifconfig > RX packets:40940899 errors:250542 dropped:7052 overruns:250542 frame:0 > TX packets:33057049 errors:0 dropped:0 overruns:20 carrier:0 ># dmesg >eth0: Setting 100mbps full-duplex based on auto-negotiated partner ability 41e1. >nfs: server X.X.X.X not responding, still trying >nfs: server X.X.X.X OK >eth0: Too much work at interrupt, IntrStatus=0x0040. 
> >with NAPI >--------- > RX packets:428253 errors:0 dropped:0 overruns:0 frame:0 > TX packets:357949 errors:0 dropped:0 overruns:0 carrier:0 >8139too Fast Ethernet driver 0.9.27 >PCI: Found IRQ 11 for device 00:0b.0 >eth0: RealTek RTL8139 at 0xec00, 00:e0:4c:91:03:b0, IRQ 11 >eth0: Identified 8139 chip type 'RTL-8100B/8139D' > >- >To unsubscribe from this list: send the line "unsubscribe linux-net" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > I try immediatly your variant of the driver :) Bye Marcello From oles@ovh.net Mon Dec 1 13:20:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 13:20:30 -0800 (PST) Received: from ping.ovh.net (ping.ovh.net [213.186.33.13]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1LK3Ta003081 for ; Mon, 1 Dec 2003 13:20:04 -0800 Received: by ping.ovh.net (Postfix, from userid 502) id 07BC83B7A0; Mon, 1 Dec 2003 21:50:39 +0100 (CET) Date: Mon, 1 Dec 2003 21:50:38 +0100 From: Octave To: Stephen Hemminger , Jeff Garzik Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: NAPI 8139too.c for 2.4.23 Message-ID: <20031201205038.GK10711@ovh.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline User-Agent: Mutt/1.5.4i X-archive-position: 1796 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: oles@ovh.net Precedence: bulk X-list: netdev Stephen, I get your patch from http://lwn.net/Articles/54815/ for 2.6.X and I rewrote it for 2.4.23. Tested with 2.4.23 on high load servers. I have no more "Too much work at interrupt". I dropped it on ftp://ftp.ovh.net/made-in-ovh/8139too.c-2.4-0.9.27 Hope it helps. Octave before: ------- # ps auxw root 256 0.0 0.0 0 0 ? 
SW Nov28 0:00 [eth0] # ifconfig RX packets:40940899 errors:250542 dropped:7052 overruns:250542 frame:0 TX packets:33057049 errors:0 dropped:0 overruns:20 carrier:0 # dmesg eth0: Setting 100mbps full-duplex based on auto-negotiated partner ability 41e1. nfs: server X.X.X.X not responding, still trying nfs: server X.X.X.X OK eth0: Too much work at interrupt, IntrStatus=0x0040. with NAPI --------- RX packets:428253 errors:0 dropped:0 overruns:0 frame:0 TX packets:357949 errors:0 dropped:0 overruns:0 carrier:0 8139too Fast Ethernet driver 0.9.27 PCI: Found IRQ 11 for device 00:0b.0 eth0: RealTek RTL8139 at 0xec00, 00:e0:4c:91:03:b0, IRQ 11 eth0: Identified 8139 chip type 'RTL-8100B/8139D' From davem@pizda.ninka.net Mon Dec 1 13:52:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 13:53:09 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1LqqTa012987 for ; Mon, 1 Dec 2003 13:52:52 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id NAA20001; Mon, 1 Dec 2003 13:51:54 -0800 Date: Mon, 1 Dec 2003 13:51:54 -0800 From: "David S. 
Miller" To: Herbert Xu Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031201135154.6906454c.davem@redhat.com> In-Reply-To: <20031201204700.GA20349@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1797 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 07:47:00 +1100 Herbert Xu wrote: > On Tue, Dec 02, 2003 at 07:16:51AM +1100, herbert wrote: > > > > I found out that PMTU only works on those routing cache entries where > > rt_src != 0. This patch should make it work for all matching entries. > > That patch removed one line too many. This one should be better. Hmmm... Herbert, do you see how the outer loop and the skey[] thing works in this PMTU handling code? This takes care of comparing both iph->saddr and '0' against rt->rt_src. From herbert@gondor.apana.org.au Mon Dec 1 14:05:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 14:05:34 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1M5HTa016835 for ; Mon, 1 Dec 2003 14:05:19 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AQwAI-0000pH-00; Tue, 02 Dec 2003 09:05:10 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AQwAH-0005Q9-00; Tue, 02 Dec 2003 09:05:09 +1100 Date: Tue, 2 Dec 2003 09:05:09 +1100 To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-ID: <20031201220509.GA20827@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20031201135154.6906454c.davem@redhat.com> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1798 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Dec 01, 2003 at 01:51:54PM -0800, David S. Miller wrote: > > Herbert, do you see how the outer loop and the skey[] thing works in > this PMTU handling code? This takes care of comparing both iph->saddr > and '0' against rt->rt_src. It only takes care of fl4_src, not rt_src. Cheers, -- Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From davem@pizda.ninka.net Mon Dec 1 14:22:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 14:22:37 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1MMPTa019016 for ; Mon, 1 Dec 2003 14:22:25 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id OAA20104; Mon, 1 Dec 2003 14:21:31 -0800 Date: Mon, 1 Dec 2003 14:21:31 -0800 From: "David S. 
Miller" To: Herbert Xu Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031201142131.5da50a07.davem@redhat.com> In-Reply-To: <20031201220509.GA20827@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> <20031201220509.GA20827@gondor.apana.org.au> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1799 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev

On Tue, 2 Dec 2003 09:05:09 +1100
Herbert Xu wrote:

> On Mon, Dec 01, 2003 at 01:51:54PM -0800, David S. Miller wrote:
> >
> > Herbert, do you see how the outer loop and the skey[] thing works in
> > this PMTU handling code? This takes care of comparing both iph->saddr
> > and '0' against rt->rt_src.
>
> It only takes care of fl4_src, not rt_src.

Indeed. At the surface it looks like a bug, but look at the redirect
handling tests in ip_rt_redirect(). It's a very similar key comparison
as the PMTU code, just structured differently:

	if (rth->fl.fl4_dst != daddr ||
	    rth->fl.fl4_src != skeys[i] ||
	    rth->fl.fl4_tos != tos ||
	    rth->fl.oif != ikeys[k] ||
	    rth->fl.iif != 0) {
		rthp = &rth->u.rt_next;
		continue;
	}

	if (rth->rt_dst != daddr ||
	    rth->rt_src != saddr ||
	    rth->u.dst.error ||
	    rth->rt_gateway != old_gw ||
	    rth->u.dst.dev != dev)
		break;

See? He's not comparing rt->rt_src against skeys[] and therefore '0'
here either.

I think the tests might be like this for a reason. I could see Alexey
constructing this test wrong in one instance, but in two instances
where the tests were structured totally different in each case is hard
to believe.
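
[Editor's illustration, not part of the original message.] The two-stage comparison quoted above can be sketched outside the kernel. This is only a simplified model under stated assumptions: struct cache_entry and the three helper functions are hypothetical names, and only the field names (fl4_dst, fl4_src, rt_dst, rt_src) and the skeys[] idea are taken from the excerpt. The point it tries to show is that the lookup key is tried against both the packet source and the wildcard 0, while rt_src is always compared against the real source address.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a routing cache entry. */
struct cache_entry {
	uint32_t fl4_dst;	/* lookup key: destination */
	uint32_t fl4_src;	/* lookup key: source, 0 meaning "any source" */
	uint32_t rt_dst;	/* destination the route actually uses */
	uint32_t rt_src;	/* source address selected for the route */
};

/* Stage 1: the flow key must match; skey is the packet source or 0. */
int flow_key_matches(const struct cache_entry *e, uint32_t daddr, uint32_t skey)
{
	return e->fl4_dst == daddr && e->fl4_src == skey;
}

/* Stage 2: the route attributes are compared strictly, no wildcard. */
int route_attrs_match(const struct cache_entry *e, uint32_t daddr, uint32_t saddr)
{
	return e->rt_dst == daddr && e->rt_src == saddr;
}

/* An entry is selected if either source key finds it AND stage 2 passes. */
int entry_selected(const struct cache_entry *e, uint32_t daddr, uint32_t saddr)
{
	uint32_t skeys[2] = { saddr, 0 };
	int i;

	for (i = 0; i < 2; i++)
		if (flow_key_matches(e, daddr, skeys[i]) &&
		    route_attrs_match(e, daddr, saddr))
			return 1;
	return 0;
}
```

In this model an entry cached under the wildcard source key is still selected as long as its rt_src equals the packet's source address, and it is rejected when rt_src differs, whatever key found it.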
Let me think about this some more, maybe you're right and the error exists in both of these places. From oles@ovh.net Mon Dec 1 15:02:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 15:02:16 -0800 (PST) Received: from ping.ovh.net (ping.ovh.net [213.186.33.13]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1N20Ta020336 for ; Mon, 1 Dec 2003 15:02:01 -0800 Received: by ping.ovh.net (Postfix, from userid 502) id 5EF083B7A0; Tue, 2 Dec 2003 00:00:13 +0100 (CET) Date: Tue, 2 Dec 2003 00:00:13 +0100 From: Octave To: Jeff Garzik Cc: Stephen Hemminger , netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: NAPI 8139too.c for 2.4.23 Message-ID: <20031201230013.GM4313@ovh.net> References: <20031201205038.GK10711@ovh.net> <20031201205533.GA15846@gtf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20031201205533.GA15846@gtf.org> User-Agent: Mutt/1.5.4i X-archive-position: 1800 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: oles@ovh.net Precedence: bulk X-list: netdev > Is there any chance you could do some benchmark runs with ttcp or > somesuch? I tested on 6-7 servers running with eepro eth0: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:E0:18:01:78:6C, IRQ 10. realtek 8139too with NAPI 8139too Fast Ethernet driver 0.9.27 realtek 8139too with no NAPI (standard driver with soft polling) If this quick test is correct, realtek 8139too's driver works as good as eepro's driver. 
Octave

>> from realtek (no NAPI) to realtek (no NAPI)
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-r: 327680000 bytes in 42.90 real seconds = 59671.99 Kbit/sec +++
ttcp-r: 224852 I/O calls, msec/call = 0.20, calls/sec = 5241.16
ttcp-r: 0.1user 1.5sys 0:42real 3% 0i+0d 0maxrss 0+2pf 0+0csw

>> from eepro to eepro
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-r: 327680000 bytes in 28.33 real seconds = 90379.31 Kbit/sec +++
ttcp-r: 225058 I/O calls, msec/call = 0.13, calls/sec = 7945.54
ttcp-r: 0.2user 4.2sys 0:28real 15% 0i+0d 0maxrss 0+2pf 0+0csw

>> from realtek (NAPI) to realtek (NAPI)
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-r: 327680000 bytes in 29.21 real seconds = 87644.11 Kbit/sec +++
ttcp-r: 225735 I/O calls, msec/call = 0.13, calls/sec = 7728.26
ttcp-r: 0.0user 1.7sys 0:29real 6% 0i+0d 0maxrss 0+2pf 0+0csw

>> from eepro to realtek (no NAPI)
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-t: 327680000 bytes in 34.32 real seconds = 74594.99 Kbit/sec +++
ttcp-t: 40000 I/O calls, msec/call = 0.88, calls/sec = 1165.55
ttcp-t: 0.0user 1.2sys 0:34real 3% 0i+0d 0maxrss 0+2pf 0+0csw

>> from realtek (NAPI) to realtek (no NAPI)
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-r: 327680000 bytes in 32.60 real seconds = 78532.74 Kbit/sec +++
ttcp-r: 225544 I/O calls, msec/call = 0.15, calls/sec = 6918.98
ttcp-r: 0.1user 1.6sys 0:32real 5% 0i+0d 0maxrss 0+2pf 0+0csw

>> from realtek (NAPI) to eepro
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from
ttcp-r: 327680000 bytes in 34.02 real seconds = 75250.05 Kbit/sec +++
ttcp-r: 225685 I/O calls, msec/call = 0.15, calls/sec = 6633.91
ttcp-r: 0.1user 3.7sys 0:34real 11% 0i+0d 0maxrss 0+2pf 0+0csw

From 
davem@pizda.ninka.net Mon Dec 1 15:23:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 15:23:15 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1NN0Ta021323 for ; Mon, 1 Dec 2003 15:23:00 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA20285; Mon, 1 Dec 2003 15:22:15 -0800 Date: Mon, 1 Dec 2003 15:22:15 -0800 From: "David S. Miller" To: "David S. Miller" Cc: herbert@gondor.apana.org.au, netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031201152215.522c2447.davem@redhat.com> In-Reply-To: <20031201142131.5da50a07.davem@redhat.com> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> <20031201220509.GA20827@gondor.apana.org.au> <20031201142131.5da50a07.davem@redhat.com> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1801 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Mon, 1 Dec 2003 14:21:31 -0800 "David S. Miller" wrote: > Let me think about this some more, maybe you're right and the > error exists in both of these places. Ok, I did my thinking :) rt->rt_src is special. It is the source address we have selected to use with this route. All output packets using this route must use rt->rt_src as iph->saddr. So, in effect, when we say "if (rt->rt_src == iph->saddr)" we are asking the question "did we make this packet?" I think this is why Alexey coded the test in this way. You are speaking of a case of zero source addresses. When would we output such an iph->saddr, by way of a route? Right now this is the only part I'm not seeing. 
I want to be careful in changing this code, as loosening the key check opens the possibility of new kinds of PMTU lowering attacks. From ja@ssi.bg Mon Dec 1 15:30:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 15:30:25 -0800 (PST) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1NU0Ta021870 for ; Mon, 1 Dec 2003 15:30:06 -0800 Received: from localhost (localhost [127.0.0.1]) by u.domain.uli (8.12.10/8.12.10) with ESMTP id hB1NUHSu003229; Tue, 2 Dec 2003 01:30:17 +0200 Date: Tue, 2 Dec 2003 01:30:17 +0200 (EET) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Herbert Xu cc: "David S. Miller" , Subject: Re: [ROUTE] PMTU only works on half the time In-Reply-To: <20031201204700.GA20349@gondor.apana.org.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1802 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Tue, 2 Dec 2003, Herbert Xu wrote: > On Tue, Dec 02, 2003 at 07:16:51AM +1100, herbert wrote: > > > > I found out that PMTU only works on those routing cache entries where > > rt_src != 0. This patch should make it work for all matching entries. > > That patch removed one line too many. This one should be better. IMO, the rt_src check in ip_rt_frag_needed is ok. I would suspect all rth->fl.fl4_tos checks too. It seems we need toskeys[2] and a second for loop if tos!=0. What about rewriting them to (rth->fl.fl4_tos == toskeys[j]). 
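
[Editor's illustration, not part of the original message.] The suggestion above (a toskeys[2] array plus a second loop that only runs when tos != 0) amounts to enumerating more lookup keys per ICMP message. A minimal standalone sketch follows, assuming the second TOS key is the wildcard 0 by analogy with skeys[]; struct probe and build_probes() are hypothetical names used only for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* One (source key, tos key) pair that would be hashed and searched. */
struct probe {
	uint32_t src;
	uint8_t tos;
};

/*
 * Fill out[] with every key pair to try and return how many there are.
 * out[] must have room for 4 entries.
 */
size_t build_probes(uint32_t saddr, uint8_t tos, struct probe out[4])
{
	uint32_t skeys[2] = { saddr, 0 };
	uint8_t toskeys[2] = { tos, 0 };	/* assumption: 0 is the wildcard */
	int tos_rounds = tos ? 2 : 1;		/* tos == 0 would only repeat itself */
	size_t n = 0;
	int i, j;

	for (j = 0; j < tos_rounds; j++)
		for (i = 0; i < 2; i++) {
			out[n].src = skeys[i];
			out[n].tos = toskeys[j];
			n++;
		}
	return n;
}
```

With a non-zero TOS this yields four probes instead of two, so the proposal doubles the number of hash chains walked for such packets; with tos == 0 the count stays at two.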
Regards -- Julian Anastasov From greearb@candelatech.com Mon Dec 1 15:39:19 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 15:39:33 -0800 (PST) Received: from grok.yi.org (evrtwa1-ar2-4-35-049-074.evrtwa1.dsl-verizon.net [4.35.49.74]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1NdJTa022296 for ; Mon, 1 Dec 2003 15:39:19 -0800 Received: from candelatech.com (localhost.localdomain [127.0.0.1]) by grok.yi.org (8.12.8/8.12.8) with ESMTP id hB1NdCKt012417; Mon, 1 Dec 2003 15:39:13 -0800 Message-ID: <3FCBD120.7070207@candelatech.com> Date: Mon, 01 Dec 2003 15:39:12 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Octave CC: netdev@oss.sgi.com Subject: Re: NAPI 8139too.c for 2.4.23 References: <20031201205038.GK10711@ovh.net> <20031201205533.GA15846@gtf.org> <20031201230013.GM4313@ovh.net> In-Reply-To: <20031201230013.GM4313@ovh.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1803 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Octave wrote: >>Is there any chance you could do some benchmark runs with ttcp or >>somesuch? > > > I tested on 6-7 servers running with > eepro eth0: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:E0:18:01:78:6C, IRQ 10. > realtek 8139too with NAPI 8139too Fast Ethernet driver 0.9.27 > realtek 8139too with no NAPI (standard driver with soft polling) > > If this quick test is correct, realtek 8139too's driver works as good as > eepro's driver. > > Octave Those are some nice numbers! I may have to bring some of my $5 realteks out of retirement! Anyone make a 4-port NIC with realteks on it? 
Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@pizda.ninka.net Mon Dec 1 15:50:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 15:51:09 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB1NouTa022835 for ; Mon, 1 Dec 2003 15:50:56 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA20371; Mon, 1 Dec 2003 15:50:05 -0800 Date: Mon, 1 Dec 2003 15:50:05 -0800 From: "David S. Miller" To: Julian Anastasov Cc: herbert@gondor.apana.org.au, netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031201155005.1c515793.davem@redhat.com> In-Reply-To: References: <20031201204700.GA20349@gondor.apana.org.au> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1804 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 01:30:17 +0200 (EET) Julian Anastasov wrote: > I would suspect all rth->fl.fl4_tos checks too. > It seems we need toskeys[2] and a second for loop if tos!=0. > What about rewriting them to (rth->fl.fl4_tos == toskeys[j]). I disagree, and this is related to my most recent email in this thread. This packet we are reacting to for PMTU purposes could only have come from us if the TOS matches precisely. 
From ja@ssi.bg Mon Dec 1 16:04:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 16:04:12 -0800 (PST) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB203kTa023780 for ; Mon, 1 Dec 2003 16:03:52 -0800 Received: from localhost (localhost [127.0.0.1]) by u.domain.uli (8.12.10/8.12.10) with ESMTP id hB204ASu006161; Tue, 2 Dec 2003 02:04:12 +0200 Date: Tue, 2 Dec 2003 02:04:10 +0200 (EET) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: "David S. Miller" cc: herbert@gondor.apana.org.au, Subject: Re: [ROUTE] PMTU only works on half the time In-Reply-To: <20031201155005.1c515793.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1805 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Mon, 1 Dec 2003, David S. Miller wrote: > On Tue, 2 Dec 2003 01:30:17 +0200 (EET) > Julian Anastasov wrote: > > > I would suspect all rth->fl.fl4_tos checks too. > > It seems we need toskeys[2] and a second for loop if tos!=0. > > What about rewriting them to (rth->fl.fl4_tos == toskeys[j]). > > I disagree, and this is related to my most recent email > in this thread. No, only input routes match strictly tos, ip_rt_redirect() is such example that matches input routes. > This packet we are reacting to for PMTU purposes could only > have come from us if the TOS matches precisely. In this case we can react to packet routed with tos=0. We match output routes only. I do not see another place that needs such fix. Example patch (not tested): # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1350 -> 1.1351 # net/ipv4/route.c 1.73 -> 1.74 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/12/02 ja@ssi.bg 1.1351 # [IPV4]: ip_rt_frag_needed: fl4_tos accepts wildcard value for output routes # -------------------------------------------- # diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c --- a/net/ipv4/route.c Tue Dec 2 02:00:54 2003 +++ b/net/ipv4/route.c Tue Dec 2 02:00:54 2003 @@ -1239,19 +1239,21 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu) { - int i; + int i, j; unsigned short old_mtu = ntohs(iph->tot_len); struct rtable *rth; u32 skeys[2] = { iph->saddr, 0, }; u32 daddr = iph->daddr; u8 tos = iph->tos & IPTOS_RT_MASK; unsigned short est_mtu = 0; + u8 toskeys[2] = { tos, 0 }; if (ipv4_config.no_pmtu_disc) return 0; + for (j = 0; j < (tos ? 2 : 1); j++) for (i = 0; i < 2; i++) { - unsigned hash = rt_hash_code(daddr, skeys[i], tos); + unsigned hash = rt_hash_code(daddr, skeys[i], toskeys[j]); rcu_read_lock(); for (rth = rt_hash_table[hash].chain; rth; @@ -1261,7 +1263,7 @@ rth->fl.fl4_src == skeys[i] && rth->rt_dst == daddr && rth->rt_src == iph->saddr && - rth->fl.fl4_tos == tos && + rth->fl.fl4_tos == toskeys[j] && rth->fl.iif == 0 && !(dst_metric_locked(&rth->u.dst, RTAX_MTU))) { unsigned short mtu = new_mtu; Regards -- Julian Anastasov From ja@ssi.bg Mon Dec 1 16:06:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 16:06:50 -0800 (PST) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB206VTa024182 for ; Mon, 1 Dec 2003 16:06:34 -0800 Received: from localhost (localhost [127.0.0.1]) by u.domain.uli (8.12.10/8.12.10) with ESMTP id hB2070Su006176; Tue, 2 Dec 2003 02:07:00 +0200 Date: Tue, 2 Dec 2003 02:07:00 +0200 (EET) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: "David S. 
Miller" cc: herbert@gondor.apana.org.au, Subject: Re: [ROUTE] PMTU only works on half the time In-Reply-To: <20031201155005.1c515793.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1806 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Mon, 1 Dec 2003, David S. Miller wrote: > On Tue, 2 Dec 2003 01:30:17 +0200 (EET) > Julian Anastasov wrote: > > > I would suspect all rth->fl.fl4_tos checks too. > > It seems we need toskeys[2] and a second for loop if tos!=0. > > What about rewriting them to (rth->fl.fl4_tos == toskeys[j]). > > I disagree, and this is related to my most recent email > in this thread. > > This packet we are reacting to for PMTU purposes could only > have come from us if the TOS matches precisely. Ops, ip_rt_redirect matches output route to, hm, lets think again about it... Regards -- Julian Anastasov From davem@pizda.ninka.net Mon Dec 1 16:09:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 16:09:15 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2092Ta024543 for ; Mon, 1 Dec 2003 16:09:03 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA20448; Mon, 1 Dec 2003 16:08:12 -0800 Date: Mon, 1 Dec 2003 16:08:11 -0800 From: "David S. 
Miller" To: Julian Anastasov Cc: herbert@gondor.apana.org.au, netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031201160811.70904c29.davem@redhat.com> In-Reply-To: References: <20031201155005.1c515793.davem@redhat.com> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1807 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 02:07:00 +0200 (EET) Julian Anastasov wrote: > > On Mon, 1 Dec 2003, David S. Miller wrote: > > > This packet we are reacting to for PMTU purposes could only > > have come from us if the TOS matches precisely. > > Ops, ip_rt_redirect matches output route to, hm, lets think > again about it... Right. I've rewritten emails in this thread probably 3 or 4 times each before actually sending them out, so don't feel bad since my mistakes have been merely hidden :) From romieu@fr.zoreil.com Mon Dec 1 16:13:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 16:14:09 -0800 (PST) Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB20DsTa024933 for ; Mon, 1 Dec 2003 16:13:55 -0800 Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.12.8/8.12.1) with ESMTP id hB206pK7029677; Tue, 2 Dec 2003 01:06:51 +0100 Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.12.8/8.12.1) id hB206n9P029676; Tue, 2 Dec 2003 01:06:49 +0100 Date: Tue, 2 Dec 2003 01:06:49 +0100 From: Francois Romieu To: netdev@oss.sgi.com Cc: =?unknown-8bit?Q?Fernando_Alencar_Mar=F3stica?= , Brad House , jgarzik@pobox.com Subject: [PATCH 2.6] 2.6.0-test11 - more rtl8169 Message-ID: <20031202010649.A27879@electric-eye.fr.zoreil.com> References: 
<1070212415.1607.17.camel@oxygenium> <20031201020453.A16405@electric-eye.fr.zoreil.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE" Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20031201020453.A16405@electric-eye.fr.zoreil.com>; from romieu@fr.zoreil.com on Mon, Dec 01, 2003 at 02:04:53AM +0100 X-Organisation: Land of Sunshine Inc. X-archive-position: 1808 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: romieu@fr.zoreil.com Precedence: bulk X-list: netdev --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Francois Romieu : [...] > Btw, there is a flaw in r8169-dma-api-rx-buffers.patch so don't bother > testing it. I'll do it again tomorrow. Attached patch should fix it. Executive summary: o Pure 2.6.0-test11: Get: http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.0-test11/r8169-blob.tar.bz2 debuntarzipe and apply/compile/test in following order: r8169-dma-api-tx.patch r8169-dma-api-rx-buffers.patch <-+ Do not test these two patches r8169-dma-api-rx-buffers-ahum.patch <-+ separately r8169-start-xmit-fixes.patch r8169-dma-api-tx-buffers.patch r8169-rx_copybreak.patch r8169-mac-phy-version.patch r8169-init_one.patch r8169-timer.patch r8169-hw_start.patch r8169-missing-tx-stats.patch r8169-intr_mask.patch r8169-suspend.patch The same directory contains each patch alone as well. o 2.6.0-test11 + 2.6.0-test9-bk25-netdrvr-exp1 Get: http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.0-test11-netdrv/r8169-blob.tar.bz2 Same thing as above with: r8169-mac-phy-version.patch r8169-init_one.patch r8169-timer.patch r8169-hw_start.patch r8169-missing-tx-stats.patch r8169-intr_mask.patch r8169-suspend.patch r8169-dma-api-rx-buffers-ahum.patch Applying r8169-dma-api-rx-buffers-ahum.patch before r8169-mac-phy-version.patch generates a few offsets but works as well. 
-- Ueimor --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="r8169-dma-api-rx-buffers-ahum.patch" Brown paper bag time: the Rx descriptors are contiguous and EORbit only marks the last descriptor in the array. OWNbit implicitly marks the end of the Rx descriptors segment which is owned by the nic. drivers/net/r8169.c | 20 ++++++-------------- 1 files changed, 6 insertions(+), 14 deletions(-) diff -puN drivers/net/r8169.c~r8169-dma-api-rx-buffers-ahum drivers/net/r8169.c --- linux-2.6.0-test11/drivers/net/r8169.c~r8169-dma-api-rx-buffers-ahum 2003-12-02 00:22:41.000000000 +0100 +++ linux-2.6.0-test11-fr/drivers/net/r8169.c 2003-12-02 00:22:41.000000000 +0100 @@ -283,6 +283,8 @@ enum _DescStatusBit { LSbit = 0x10000000, }; +#define RsvdMask 0x3fffc000 + struct TxDesc { u32 status; u32 vlan_tag; @@ -1121,7 +1123,7 @@ rtl8169_hw_start(struct net_device *dev) static inline void rtl8169_make_unusable_by_asic(struct RxDesc *desc) { desc->buf_addr = 0xdeadbeef; - desc->status = EORbit; + desc->status &= ~(OWNbit | RsvdMask); } static void rtl8169_free_rx_skb(struct pci_dev *pdev, struct sk_buff **sk_buff, @@ -1141,7 +1143,7 @@ static inline void rtl8169_return_to_asi static inline void rtl8169_give_to_asic(struct RxDesc *desc, dma_addr_t mapping) { desc->buf_addr = mapping; - desc->status = OWNbit + RX_BUF_SIZE; + desc->status |= OWNbit + RX_BUF_SIZE; } static int rtl8169_alloc_rx_skb(struct pci_dev *pdev, struct net_device *dev, @@ -1209,11 +1211,6 @@ static inline void rtl8169_mark_as_last_ desc->status |= EORbit; } -static inline void rtl8169_unmark_as_last_descriptor(struct RxDesc *desc) -{ - desc->status &= ~EORbit; -} - static int rtl8169_init_ring(struct net_device *dev) { struct rtl8169_private *tp = dev->priv; @@ -1460,14 +1457,9 @@ rtl8169_rx_interrupt(struct net_device * } delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx); - if (delta > 0) { - u32 old_last = (tp->dirty_rx - 1) % NUM_RX_DESC; - + if 
(delta > 0) tp->dirty_rx += delta; - rtl8169_mark_as_last_descriptor(tp->RxDescArray + - (tp->dirty_rx - 1)%NUM_RX_DESC); - rtl8169_unmark_as_last_descriptor(tp->RxDescArray + old_last); - } else if (delta < 0) + else if (delta < 0) printk(KERN_INFO "%s: no Rx buffer allocated\n", dev->name); /* _ --Kj7319i9nmIyA2yE--

From francois@baligant.net Mon Dec 1 17:32:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 17:32:44 -0800 (PST) Received: from casimir.nikita.cx (casimir.nikita.cx [198.63.211.44]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB21WUTa030061 for ; Mon, 1 Dec 2003 17:32:31 -0800 Received: from fortress (fbaligant.net1.nerim.net [213.41.146.186]) by casimir.nikita.cx (8.12.8/8.12.8) with SMTP id hB21WLh0024881 for ; Mon, 1 Dec 2003 19:32:23 -0600 Message-ID: <072501c3b874$2542ae70$15fea8c0@fortress> From: "Francois Baligant" To: Subject: 2.6.0-test11: dst_cache_overflow causing unresponsive box Date: Tue, 2 Dec 2003 02:32:17 +0100 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0722_01C3B87C.80FBB1A0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-Copied-To: backup@casimir.nikita.cx (by Synonym - http://www.modulo.ro/synonym) X-Scanned-By: MIMEDefang 2.37 X-archive-position: 1809 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: francois@baligant.net Precedence: bulk X-list: netdev

This is a multi-part message in MIME format...

------=_NextPart_000_0722_01C3B87C.80FBB1A0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline

We have a problem with a box running 2.6.0-test11-mjb1 and supporting around 90k simultaneous TCP connection.
After a few hours/days of running, when a lots of clients connects/disconnects, the console will start to display:

dst cache overflow
NET: 1860 messages suppressed.
dst cache overflow
NET: 1858 messages suppressed.

From there, the box is completely unresponsive, apparently eating all its CPU in trying to shrink the routing cache. Only solution is reboot.

Current sysctl:

net.ipv4.route.max_size = 655360 # I know we shouldn't rise it that high but it's only cure for now.. it lasts a bit longer like this
net.ipv4.route.gc_min_interval = 2
net.ipv4.route.gc_interval = 10
net.ipv4.route.gc_timeout = 30

rtstat:

  size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
139566   12393 123  0     0     0     0     0      184  21  0     143     142         0    0           26039        375
138876   13080 136  0     0     0     0     0      159  19  0     155     154         0    0           27153        277
139006   12317 125  0     0     0     0     0      180  28  0     153     153         0    0           25810        377
139138   13799 140  0     0     0     0     0      159  16  0     156     156         0    0           28375        331
139275   11610 128  0     0     0     0     0      177  27  0     154     153         0    0           23977        343
139383   12679 124  0     0     0     0     0      173  17  0     141     140         0    0           26717        398
139256   11946 135  0     0     0     0     0      166  17  0     152     151         0    0           24874        304
139353   11646 109  0     0     0     0     0      174  14  0     122     122         0    0           24165        320
138257   12702 116  0     0     0     0     0      180  16  0     131     130         0    0           26324        358
138369   12897 115  0     0     0     0     0      166  20  0     134     134         0    0           26819        339
138553   11309 133  0     0     0     0     0      158  33  0     165     165         0    0           21270        389
138172   17232 182  0     0     0     0     0      125  44  0     225     225         0    0           29702        375
138420   17407 182  0     0     0     0     0      165  73  0     254     253         0    0           29946        548
138833   17052 257  0     0     0     0     0      195 126  0     382     381         0    0           29715        812
139051   16606 224  0     0     0     0     0      238  97  0     320     319         0    0           28559        721
139217   18115 176  0     0     0     0     0      268  51  0     224     224         0    0           32983        527
139326   17531 178  0     0     0     0     0      291  44  0     220     220         0    0           33320        445
139422   15244 140  0     0     0     0     0      357  20  0     160     160         0    0           29934        415
139548   13123 142  0     0     0     0     0      281  12  0     154     154         0    0           26430        351
139684   13290 142  0     0     0     0     0      235  10  0     152     151         0    0           27341        309

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
142340 142296  99%    0.38K  14234       10     56936K ip_dst_cache

Are we tuning the rt_cache in a wrong way ?

regards,
Francois

Francois Baligant - http://www.pingouin.be
Change the numbers, change your Life!

------=_NextPart_000_0722_01C3B87C.80FBB1A0--

From jgarzik@pobox.com Mon Dec 1 19:47:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 01 Dec 2003 19:47:40 -0800 (PST) Received: from www.linux.org.uk (IDENT:93@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB23lQTa031575 for ; Mon, 1 Dec 2003 19:47:27 -0800 Received: from rdu74-153-143.nc.rr.com ([24.74.153.143]:37579 helo=pobox.com) by www.linux.org.uk with esmtp (Exim 4.22) id 1AR1VP-0003kj-8Y; Tue, 02 Dec 2003 03:47:19 +0000 Message-ID: <3FCC0B36.9080901@pobox.com> Date: Mon, 01 Dec 2003 22:47:02 -0500 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ben Greear CC: Octave , netdev@oss.sgi.com Subject: Re: NAPI 8139too.c for 2.4.23 References: <20031201205038.GK10711@ovh.net> <20031201205533.GA15846@gtf.org> <20031201230013.GM4313@ovh.net> <3FCBD120.7070207@candelatech.com> In-Reply-To: <3FCBD120.7070207@candelatech.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1810 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev

Ben Greear wrote:
> Those are some nice numbers! I may have to bring some of my $5 realteks
> out of retirement! Anyone make a 4-port NIC with realteks on it?

hah! I hope not :)

(I've never heard of such a beast, but who knows...)
Jeff From ja@ssi.bg Tue Dec 2 00:08:50 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 00:09:04 -0800 (PST) Received: from u.domain.uli (ja.mac.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB288fTa007374 for ; Tue, 2 Dec 2003 00:08:47 -0800 Received: from localhost (localhost [127.0.0.1]) by u.domain.uli (8.12.10/8.12.10) with ESMTP id hB21rrSu006664; Tue, 2 Dec 2003 03:53:55 +0200 Date: Tue, 2 Dec 2003 03:53:53 +0200 (EET) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: "David S. Miller" cc: herbert@gondor.apana.org.au, Subject: Re: [ROUTE] PMTU only works on half the time In-Reply-To: <20031201155005.1c515793.davem@redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1811 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Hello, On Mon, 1 Dec 2003, David S. Miller wrote: > I disagree, and this is related to my most recent email > in this thread. > > This packet we are reacting to for PMTU purposes could only > have come from us if the TOS matches precisely. Here is what I have for today. I assume all ip_route_output callers provide valid tos (not a wildcard). As result, only RTO_ONLINK and oif have wildcard value. I'm not sure if ip_rt_frag_needed needs an iif argument, may be yes? Also, it seems ip_rt_redirect needs the 'tos, tos | RTO_ONLINK' array too as in ip_rt_frag_needed. Not included yet. 
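
[Editor's illustration, not part of the original message.] The 'tos, tos | RTO_ONLINK' array mentioned above can be sketched as follows. The constant values are as I recall them from the kernel headers of that era (IPTOS_TOS_MASK 0x1E, IPTOS_RT_MASK as that mask with the low two bits cleared, RTO_ONLINK 0x01) and should be treated as assumptions; build_toskeys() and tos_key_matches() are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* Assumed values, recalled from 2.4/2.6-era headers; verify before relying on them. */
#define IPTOS_TOS_MASK	0x1E
#define IPTOS_RT_MASK	(IPTOS_TOS_MASK & ~3)	/* 0x1C */
#define RTO_ONLINK	0x01

/* Build the two TOS keys an output route could have been cached under. */
void build_toskeys(uint8_t iph_tos, uint8_t toskeys[2])
{
	uint8_t tos = iph_tos & IPTOS_RT_MASK;

	toskeys[0] = tos;		/* plain routing TOS bits */
	toskeys[1] = tos | RTO_ONLINK;	/* same TOS with the on-link flag set */
}

/* Return 1 if a cached fl4_tos value equals either candidate key. */
int tos_key_matches(uint8_t fl4_tos, uint8_t iph_tos)
{
	uint8_t toskeys[2];
	int j;

	build_toskeys(iph_tos, toskeys);
	for (j = 0; j < 2; j++)
		if (fl4_tos == toskeys[j])
			return 1;
	return 0;
}
```

Under these assumptions a header TOS of 0x13 reduces to the keys 0x10 and 0x11, so only the on-link flag is treated as optional while the routing TOS bits must still match exactly.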
Another problem: it seems __ip_route_output_key does not hash with valid tos key bits, fix included below:

--- net/ipv4/route.c.orig	Tue Dec 2 03:25:59 2003
+++ net/ipv4/route.c	Tue Dec 2 03:37:27 2003
@@ -1239,19 +1239,25 @@
 
 unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
 {
-	int i;
+	int i, j, k;
 	unsigned short old_mtu = ntohs(iph->tot_len);
 	struct rtable *rth;
 	u32 skeys[2] = { iph->saddr, 0, };
 	u32 daddr = iph->daddr;
 	u8 tos = iph->tos & IPTOS_RT_MASK;
 	unsigned short est_mtu = 0;
+	u8 toskeys[2] = { tos, tos | RTO_ONLINK };
+	int iif = 0; // Can be argument
+	int ikeys[2] = { iif, 0 };
 
 	if (ipv4_config.no_pmtu_disc)
 		return 0;
 
+	for (k = 0; k < (iif ? 2 : 1); k++)
+	for (j = 0; j < 2; j++)
 	for (i = 0; i < 2; i++) {
-		unsigned hash = rt_hash_code(daddr, skeys[i], tos);
+		unsigned hash = rt_hash_code(daddr, skeys[i] ^ (ikeys[k] << 5),
+					     toskeys[j]);
 
 		rcu_read_lock();
 		for (rth = rt_hash_table[hash].chain; rth;
@@ -1261,7 +1267,8 @@
 			    rth->fl.fl4_src == skeys[i] &&
 			    rth->rt_dst == daddr &&
 			    rth->rt_src == iph->saddr &&
-			    rth->fl.fl4_tos == tos &&
+			    rth->fl.fl4_tos == toskeys[j] &&
+			    rth->fl.oif == ikeys[k] &&
 			    rth->fl.iif == 0 &&
 			    !(dst_metric_locked(&rth->u.dst, RTAX_MTU))) {
 				unsigned short mtu = new_mtu;
@@ -2214,7 +2221,8 @@
 	unsigned hash;
 	struct rtable *rth;
 
-	hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos);
+	hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5),
+			    flp->fl4_tos & (IPTOS_RT_MASK | RTO_ONLINK));
 
 	rcu_read_lock();
 	for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {

Regards

--
Julian Anastasov

From herbert@gondor.apana.org.au Tue Dec 2 02:10:39 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:11:03 -0800 (PST)
Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2AAbTa013648 for ; Tue, 2 Dec 2003 02:10:39 -0800
Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by
arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AR7UD-0003zX-00; Tue, 02 Dec 2003 21:10:29 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AR7U9-0006cM-00; Tue, 02 Dec 2003 21:10:25 +1100 Date: Tue, 2 Dec 2003 21:10:25 +1100 To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-ID: <20031202101025.GA25422@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> <20031201220509.GA20827@gondor.apana.org.au> <20031201142131.5da50a07.davem@redhat.com> <20031201152215.522c2447.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20031201152215.522c2447.davem@redhat.com> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1812 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Mon, Dec 01, 2003 at 03:22:15PM -0800, David S. Miller wrote: > > You are speaking of a case of zero source addresses. When would > we output such an iph->saddr, by way of a route? Right now this > is the only part I'm not seeing. You're right. My patch is totally bogus. I misread the ip(8) output. I thought that if src wasn't shown that rt_src must be zero. But in fact it means that rt_src == fl4_src. My problem turns out to be that oif != 0 for the outgoing packets. Since frag_needed only handle cache entries where oif == 0 it never has a chance to work. The application that generated these packets is the RPC code in glibc. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From voloterreno@tin.it Tue Dec 2 02:26:23 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:26:45 -0800 (PST)
Received: from vsmtp3.tin.it (vsmtp3.tin.it [212.216.176.223]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2AQLTa014229 for ; Tue, 2 Dec 2003 02:26:22 -0800
Received: from tin.it (80.117.50.2) by vsmtp3.tin.it (7.0.019) (authenticated as voloterreno@tin.it) id 3FCB9BE9000107FA; Mon, 1 Dec 2003 23:27:58 +0100
Message-ID: <3FCBC18A.4000405@tin.it>
Date: Mon, 01 Dec 2003 23:32:42 +0100
From: Marcello
User-Agent: Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.5) Gecko/20031031
X-Accept-Language: it, en-us, en
MIME-Version: 1.0
To: Octave
CC: Stephen Hemminger , Jeff Garzik , netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: NAPI 8139too.c for 2.4.23
References: <20031201205038.GK10711@ovh.net>
In-Reply-To: <20031201205038.GK10711@ovh.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 1813
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: voloterreno@tin.it
Precedence: bulk
X-list: netdev

Octave wrote:
>Stephen,
>I get your patch from http://lwn.net/Articles/54815/ for 2.6.X and
>I rewrote it for 2.4.23. Tested with 2.4.23 on high load servers. I
>have no more "Too much work at interrupt".
>
>I dropped it on ftp://ftp.ovh.net/made-in-ovh/8139too.c-2.4-0.9.27
>
>Hope it helps.
>Octave
>
>before:
>-------
># ps auxw
>root 256 0.0 0.0 0 0 ? SW Nov28 0:00 [eth0]
># ifconfig
> RX packets:40940899 errors:250542 dropped:7052 overruns:250542 frame:0
> TX packets:33057049 errors:0 dropped:0 overruns:20 carrier:0
># dmesg
>eth0: Setting 100mbps full-duplex based on auto-negotiated partner ability 41e1.
>nfs: server X.X.X.X not responding, still trying >nfs: server X.X.X.X OK >eth0: Too much work at interrupt, IntrStatus=0x0040. > >with NAPI >--------- > RX packets:428253 errors:0 dropped:0 overruns:0 frame:0 > TX packets:357949 errors:0 dropped:0 overruns:0 carrier:0 >8139too Fast Ethernet driver 0.9.27 >PCI: Found IRQ 11 for device 00:0b.0 >eth0: RealTek RTL8139 at 0xec00, 00:e0:4c:91:03:b0, IRQ 11 >eth0: Identified 8139 chip type 'RTL-8100B/8139D' > >- >To unsubscribe from this list: send the line "unsubscribe linux-net" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > OCTAVE!! Your rewrite of the driver is great!! You have resolved all my problems with ethernet!! (collisions for now , but I think errors too :) ) You are my new personal HERO!! :D Thanks Marcello From davem@pizda.ninka.net Tue Dec 2 02:28:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:29:02 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2ASmTa014606 for ; Tue, 2 Dec 2003 02:28:48 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id CAA21879; Tue, 2 Dec 2003 02:27:33 -0800 Date: Tue, 2 Dec 2003 02:27:33 -0800 From: "David S. 
Miller" To: Herbert Xu Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-Id: <20031202022733.43cf693e.davem@redhat.com> In-Reply-To: <20031202101025.GA25422@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> <20031201220509.GA20827@gondor.apana.org.au> <20031201142131.5da50a07.davem@redhat.com> <20031201152215.522c2447.davem@redhat.com> <20031202101025.GA25422@gondor.apana.org.au> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1814 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 21:10:25 +1100 Herbert Xu wrote: > My problem turns out to be that oif != 0 for the outgoing packets. > Since frag_needed only handle cache entries where oif == 0 it > never has a chance to work. > > The application that generated these packets is the RPC code in glibc. That behavior of glibc is incorrect, I know about it, and I explained all this to Uli Drepper some time ago and he fixed it. Current glibc should not be doing this. If it still is, since Uli understood my arguments, it probably just slipped under the rug. Tell me this so I can take care of it. What the glibc code was doing was mirroring the input packet parameters (saddr/daddr/if/etc.) into what it used for the output packet sends. 
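
As a rough illustration of the behavior discussed in this thread (a sketch only: struct pkt_params, mirror_params() and reply_params() are hypothetical names, and this is not the glibc code), a UDP server that copies a request's packet parameters straight into its reply also copies the incoming interface index, so the reply's route lookup runs with oif != 0. Clearing the interface index before sending keeps the reply on an oif == 0 cache entry, which is the only kind frag_needed handles according to the report quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-packet parameters a UDP server
 * learns from a request; not a real kernel or glibc structure. */
struct pkt_params {
	int      ifindex;  /* interface the request arrived on */
	uint32_t local;    /* local address the request was sent to */
	uint32_t peer;     /* address of the requester */
};

/* Naive mirroring: the reply reuses every input parameter, including
 * the interface index, so the output route lookup sees oif != 0. */
void mirror_params(const struct pkt_params *req, struct pkt_params *rep)
{
	assert(req != NULL && rep != NULL);
	memcpy(rep, req, sizeof(*rep));
}

/* Same copy, but the interface index is cleared afterwards so the
 * kernel picks the output interface and the lookup runs with oif == 0. */
void reply_params(const struct pkt_params *req, struct pkt_params *rep)
{
	mirror_params(req, rep);
	rep->ifindex = 0;
}
```

The addresses are still mirrored, so the reply leaves from the address the client originally contacted; only the interface binding is dropped.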
From herbert@gondor.apana.org.au Tue Dec 2 02:33:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:33:50 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2AXUTa015049 for ; Tue, 2 Dec 2003 02:33:34 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AR7q6-00046n-00; Tue, 02 Dec 2003 21:33:06 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AR7q4-0006hJ-00; Tue, 02 Dec 2003 21:33:04 +1100 Date: Tue, 2 Dec 2003 21:33:04 +1100 To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: [ROUTE] PMTU only works on half the time Message-ID: <20031202103304.GA25658@gondor.apana.org.au> References: <20031201201651.GA20194@gondor.apana.org.au> <20031201204700.GA20349@gondor.apana.org.au> <20031201135154.6906454c.davem@redhat.com> <20031201220509.GA20827@gondor.apana.org.au> <20031201142131.5da50a07.davem@redhat.com> <20031201152215.522c2447.davem@redhat.com> <20031202101025.GA25422@gondor.apana.org.au> <20031202022733.43cf693e.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20031202022733.43cf693e.davem@redhat.com> User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1815 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev On Tue, Dec 02, 2003 at 02:27:33AM -0800, David S. Miller wrote: > > That behavior of glibc is incorrect, I know about it, and I explained > all this to Uli Drepper some time ago and he fixed it. Cool. I can confirm that this is definitely fixed with a more recent glibc. -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From pp@ee.oulu.fi Tue Dec 2 02:40:52 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:41:06 -0800 (PST) Received: from ee.oulu.fi (ee.oulu.fi [130.231.61.23]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2AeoTa017107 for ; Tue, 2 Dec 2003 02:40:51 -0800 Received: from tk28.oulu.fi (tk28 [130.231.48.68]) by ee.oulu.fi (8.12.10/8.12.10) with ESMTP id hB2AeRsm003917; Tue, 2 Dec 2003 12:40:27 +0200 (EET) Received: (from pp@localhost) by tk28.oulu.fi (8.12.10/8.12.10/Submit) id hB2AePev009193; Tue, 2 Dec 2003 12:40:25 +0200 (EET) Date: Tue, 2 Dec 2003 12:40:25 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Cc: marcelo.tosatti@cyclades.com.br Subject: Re: [patch] 2.4 lacks dummy SET_NETDEV_DEV Message-ID: <20031202104024.GA9163@ee.oulu.fi> References: <20031110181917.GA25846@ee.oulu.fi> <20031110221430.GA26556@ee.oulu.fi> <20031110221429.04732a57.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20031110221429.04732a57.davem@redhat.com> User-Agent: Mutt/1.4.1i X-archive-position: 1816 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pp@ee.oulu.fi Precedence: bulk X-list: netdev On Mon, Nov 10, 2003 at 10:14:29PM -0800, David S. 
Miller wrote:
> On Tue, 11 Nov 2003 00:14:30 +0200
> Pekka Pietikainen wrote:
>
> > (I still like the idea of being able to use exactly the same driver
> > source on 2.4/2.6 though)
>
> I agree, someone should merge in the dummy SET_NETDEV_DEV once
> Marcelo starts up 2.4.24-preX

Ping :-)

--- linux-2.4.22-1.2115.nptl/include/linux/netdevice.h.orig	2003-11-10 20:08:42.922635848 +0200
+++ linux-2.4.22-1.2115.nptl/include/linux/netdevice.h	2003-11-10 20:09:09.754556776 +0200
@@ -456,6 +456,8 @@
 #endif /* CONFIG_NET_DIVERT */
 };
 
+/* 2.6 compatibility */
+#define SET_NETDEV_DEV(net, pdev) do { } while (0)
 
 struct packet_type
 {

--
Pekka Pietikainen

From Robert.Olsson@data.slu.se Tue Dec 2 02:44:40 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 02:44:53 -0800 (PST)
Received: from mail1.slu.se (mail1.slu.se [130.238.96.11]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2AicTa017488 for ; Tue, 2 Dec 2003 02:44:39 -0800
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mail1.slu.se (8.9.3+/8.9.3) with ESMTP id LAA18731; Tue, 2 Dec 2003 11:44:25 +0100
Received: by robur.slu.se (Postfix, from userid 1000) id 86C41EC23B; Tue, 2 Dec 2003 11:44:31 +0100 (CET)
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16332.27919.502097.988522@robur.slu.se>
Date: Tue, 2 Dec 2003 11:44:31 +0100
To: "Francois Baligant"
Cc:
Subject: 2.6.0-test11: dst_cache_overflow causing unresponsive box
In-Reply-To: <072501c3b874$2542ae70$15fea8c0@fortress>
References: <072501c3b874$2542ae70$15fea8c0@fortress>
X-Mailer: VM 7.17 under Emacs 21.3.1
X-archive-position: 1817
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

Francois Baligant writes:
> We have a problem with a box running 2.6.0-test11-mjb1 and supporting around 90k simultaneous TCP connection.
After a few hours/days of running, > when a lots of clients connects/disconnects, the console will start to display: > > dst cache overflow > NET: 1860 messages suppressed. > > >From there, the box is completely unresponsive, apparently eating all its CPU in trying to shrink the routing cache. Only solution is reboot. > Current sysctl: > net.ipv4.route.max_size = 655360 # I know we shouldn't rise it that high but it's only cure for now.. it lasts a bit longer like this > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search > 139566 12393 123 0 0 0 0 0 184 21 0 143 142 0 0 26039 375 > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME > 142340 142296 99% 0.38K 14234 10 56936K ip_dst_cache > > Are we tuning the rt_cache in a wrong way ? No experience with 90k TCP-flows but it seems GC is not able to free some the dst-entries for some reason. This will slowly kill your box with symptoms you describe. We have ask TCP-experts for timer settings to avoid pending sessions etc. Also check slab for any other objects growing as dst cache overflow is most likely secondary effect in your case. rtstat looks sane expect for the high number of dst-entries. Tuning is another story. Cheers. --ro From herbert@gondor.apana.org.au Tue Dec 2 03:02:46 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 03:03:00 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2B2bTa018100 for ; Tue, 2 Dec 2003 03:02:42 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1AR8IT-0004F8-00; Tue, 02 Dec 2003 22:02:25 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1AR8IS-0006lP-00; Tue, 02 Dec 2003 22:02:24 +1100 Date: Tue, 2 Dec 2003 22:02:24 +1100 To: "David S. 
Miller" , netdev@oss.sgi.com Subject: [RTNETLINK] Provide real oif Message-ID: <20031202110224.GA25957@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="ZGiS0Q5IWpPtfppv" Content-Disposition: inline User-Agent: Mutt/1.5.4i From: Herbert Xu X-archive-position: 1818 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --ZGiS0Q5IWpPtfppv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Dave: This patch adds a new attribute RTA_PREFOIF to the RTNETLINK interface so that we can send the real oif to user space. Hmm, would RTA_REALOIF be a better name? It really should be called RTA_OIF but that's already taken. Currently if there are two entries in the routing cache that only differ by the value of oif then they will appear identical to user space. The only way to tell them apart would be to export the value of oif from the kernel. Of course this patch by itself doesn't do anything. But once it is in we can extend ip(8) to display it. PS IPv6 doesn't seem to have an oif field so it doesn't need this. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--ZGiS0Q5IWpPtfppv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=p

Index: kernel-source-2.5/include/linux/rtnetlink.h
===================================================================
RCS file: /home/gondolin/herbert/src/CVS/debian/kernel-source-2.5/include/linux/rtnetlink.h,v
retrieving revision 1.5
diff -u -r1.5 rtnetlink.h
--- kernel-source-2.5/include/linux/rtnetlink.h	18 Oct 2003 07:12:47 -0000	1.5
+++ kernel-source-2.5/include/linux/rtnetlink.h	2 Dec 2003 10:52:34 -0000
@@ -200,6 +200,7 @@
 	RTA_FLOW,
 	RTA_CACHEINFO,
 	RTA_SESSION,
+	RTA_PREFOIF,
 };
 
 #define RTA_MAX RTA_SESSION
Index: kernel-source-2.5/net//ipv4/route.c
===================================================================
RCS file: /home/gondolin/herbert/src/CVS/debian/kernel-source-2.5/net/ipv4/route.c,v
retrieving revision 1.3
diff -u -r1.3 route.c
--- kernel-source-2.5/net//ipv4/route.c	24 Nov 2003 09:52:04 -0000	1.3
+++ kernel-source-2.5/net//ipv4/route.c	2 Dec 2003 10:57:12 -0000
@@ -2347,6 +2347,8 @@
 #endif
 			RTA_PUT(skb, RTA_IIF, sizeof(int), &rt->fl.iif);
 	}
+	if (rt->fl.oif)
+		RTA_PUT(skb, RTA_PREFOIF, sizeof(int), &rt->fl.oif);
 
 	nlh->nlmsg_len = skb->tail - b;
 	return skb->len;

--ZGiS0Q5IWpPtfppv--

From davem@pizda.ninka.net Tue Dec 2 03:27:01 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 03:27:16 -0800 (PST)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2BR0Ta019330 for ; Tue, 2 Dec 2003 03:27:00 -0800
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id DAA22015; Tue, 2 Dec 2003 03:26:06 -0800
Date: Tue, 2 Dec 2003 03:26:06 -0800
From: "David S.
Miller" To: Robert Olsson Cc: francois@baligant.net, netdev@oss.sgi.com Subject: Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box Message-Id: <20031202032606.28db927b.davem@redhat.com> In-Reply-To: <16332.27919.502097.988522@robur.slu.se> References: <072501c3b874$2542ae70$15fea8c0@fortress> <16332.27919.502097.988522@robur.slu.se> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1819 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 11:44:31 +0100 Robert Olsson wrote: > No experience with 90k TCP-flows but it seems GC is not able to free some > the dst-entries for some reason. This will slowly kill your box with > symptoms you describe. We have ask TCP-experts for timer settings to avoid > pending sessions etc. Also check slab for any other objects growing as > dst cache overflow is most likely secondary effect in your case. rtstat > looks sane expect for the high number of dst-entries. Tuning is another > story. Let us assume, for the sake of back of the envelope calculations, that all 90k TCP connections speak to unique destinations. Let us further assume that all of them have at least one packet in flight. This means the routing cache must be able to hold at least 90k entries. All of these routing cache entires will be referenced by the packets in the TCP retransmission queues of all the sockets, and thus the entries are unreclaimable. You are setting net.ipv4.route.max_size to 655360 which should be more than enough. But you also have to make the net.ipv4.route.gc_thresh more reasonable as well, perhaps 90K as a test. 
If net.ipv4.route.gc_thresh is lower than 90K and my assertions above hold, then the kernel will try to garbage collect too early, all the routing cache entries will be in use and therefore uncollectable, and you'll get the message you're seeing. Try to pump up gc_thresh and see if that helps. From davem@pizda.ninka.net Tue Dec 2 03:29:26 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 03:29:40 -0800 (PST) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2BTQTa019699 for ; Tue, 2 Dec 2003 03:29:26 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id DAA22031; Tue, 2 Dec 2003 03:28:25 -0800 Date: Tue, 2 Dec 2003 03:28:25 -0800 From: "David S. Miller" To: Pekka Pietikainen Cc: netdev@oss.sgi.com, marcelo.tosatti@cyclades.com.br Subject: Re: [patch] 2.4 lacks dummy SET_NETDEV_DEV Message-Id: <20031202032825.12ded81f.davem@redhat.com> In-Reply-To: <20031202104024.GA9163@ee.oulu.fi> References: <20031110181917.GA25846@ee.oulu.fi> <20031110221430.GA26556@ee.oulu.fi> <20031110221429.04732a57.davem@redhat.com> <20031202104024.GA9163@ee.oulu.fi> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1820 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev On Tue, 2 Dec 2003 12:40:25 +0200 Pekka Pietikainen wrote: > On Mon, Nov 10, 2003 at 10:14:29PM -0800, David S. Miller wrote: > > I agree, someone should merge in the dummy SET_NETDEV_DEV once > > Marcelo starts up 2.4.24-preX > > Ping :-) Yes Marcelo, please apply this. 
> --- linux-2.4.22-1.2115.nptl/include/linux/netdevice.h.orig	2003-11-10 20:08:42.922635848 +0200
> +++ linux-2.4.22-1.2115.nptl/include/linux/netdevice.h	2003-11-10 20:09:09.754556776 +0200
> @@ -456,6 +456,8 @@
>  #endif /* CONFIG_NET_DIVERT */
>  };
>
> +/* 2.6 compatibility */
> +#define SET_NETDEV_DEV(net, pdev) do { } while (0)
>
>  struct packet_type
>  {

From kuznet@ms2.inr.ac.ru Tue Dec 2 04:40:45 2003
Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 04:41:06 -0800 (PST)
Received: from yakov.inr.ac.ru (yakov.inr.ac.ru [193.233.7.111]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2CehTa025892 for ; Tue, 2 Dec 2003 04:40:44 -0800
Received: (from kuznet@localhost) by yakov.inr.ac.ru (8.6.13/ANK) id PAA01536; Tue, 2 Dec 2003 15:40:07 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200312021240.PAA01536@yakov.inr.ac.ru>
Subject: Re: IPv6 MIB:ipv6PrefixTable implementation
To: mashirle@us.ibm.com (Shirley Ma)
Date: Tue, 2 Dec 2003 15:40:07 +0300 (MSK)
Cc: netdev@oss.sgi.com, xma@us.ibm.com
In-Reply-To: <200311191621.38087.mashirle@us.ibm.com> from "Shirley Ma" at Nov 19, 2003 04:21:38
X-Mailer: ELM [version 2.5 PL6]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1821
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kuznet@ms2.inr.ac.ru
Precedence: bulk
X-list: netdev

Hello!

> One implementation detail question, do you think I need to save all the other
> Prefix Objects: Type, Origin(addrconf, manually, dhcp, others),

These two things should be stored right now. Existing implementation is quite a mess, but we definitely want to remember origin of each route in "protocol" and another flags, they are of common interest.

> AutonomoueFlag, AdvPreferredLiftTime and ValidLifeTime

ValidLifeTime is "expires" on this route.
What's about AdvPreferredLiftTime I am puzzled a little, preferred time is not an attribute of a prefix at all, it is attribute of address, is not it? Unless the prefix is used to install local address it does not make sense, right? Alexey From hadi@cyberus.ca Tue Dec 2 04:47:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 04:47:50 -0800 (PST) Received: from mail.cyberus.ca (mail.cyberus.ca [209.197.145.21]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2ClaTa026369 for ; Tue, 2 Dec 2003 04:47:36 -0800 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1AR9wE-0006Ty-Pg; Tue, 02 Dec 2003 07:47:35 -0500 Subject: Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: Robert Olsson , francois@baligant.net, netdev@oss.sgi.com In-Reply-To: <20031202032606.28db927b.davem@redhat.com> References: <072501c3b874$2542ae70$15fea8c0@fortress> <16332.27919.502097.988522@robur.slu.se> <20031202032606.28db927b.davem@redhat.com> Content-Type: text/plain Organization: jamalopolis Message-Id: <1070369223.1027.34.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Dec 2003 07:47:03 -0500 Content-Transfer-Encoding: 7bit X-archive-position: 1822 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev With that many flows the neighbor cache gc may also be killing him. 
cheers, jamal From Helmut.BA.Berger@partner.bmw.de Tue Dec 2 08:11:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 08:11:34 -0800 (PST) Received: from mailgw2.bmwgroup.com (mailgw2.bmwgroup.com [192.109.190.191]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2GBGTa011959 for ; Tue, 2 Dec 2003 08:11:19 -0800 Received: from mhub1.muc (mhub1.muc [160.50.97.116]) by mailgw2.bmwgroup.com with ESMTP for netdev@oss.sgi.com; Tue, 2 Dec 2003 17:11:11 +0100 Received: from mail03.muc (mail03.muc [160.50.97.30]) by mhub1.muc with ESMTP for netdev@oss.sgi.com; Tue, 2 Dec 2003 17:11:10 +0100 Received: from ztpc451-L.muc ([10.249.235.55] (may be forged)) by mail03.muc (8.8.6 (PHNE_17190+no byaddr 2)/8.8.6) with ESMTP id RAA21375 for ; Tue, 2 Dec 2003 17:11:10 +0100 (MET) Subject: Inserting a layer between TCP and IP From: Helmut Berger To: netdev@oss.sgi.com Message-Id: <1070381465.1793.16.camel@ztpc451-L.MUC> X-Mailer: Ximian Evolution 1.4.4 Date: Tue, 02 Dec 2003 17:11:05 +0100 MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 1823 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Helmut.BA.Berger@partner.bmw.de Precedence: bulk X-list: netdev Hi, I am not sure if this is the right platform for my question. I am not very familiar with linux driver development. If I am wrong here, please be so kind and tell me, where I should better go to. I want to insert an additional layer between the tcp and ip layer with some special functionality and I am not sure how to do this. I fear it cannot be done with system_calls, can it? That would be very comfortable. I could write a loadable kernel module which registers itself on insert and deregisters itself on remove. Is it possible to insert my special layer at runtime without kernel changes? Or do I have to build a special kernel? 
Which ist the best way to insert a layer between the existing tcp and ip layer? Thanx, Helmut From Robert.Olsson@data.slu.se Tue Dec 2 09:56:41 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 09:56:56 -0800 (PST) Received: from mail1.slu.se (mail1.slu.se [130.238.96.11]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2HudTa019684 for ; Tue, 2 Dec 2003 09:56:40 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mail1.slu.se (8.9.3+/8.9.3) with ESMTP id SAA24729; Tue, 2 Dec 2003 18:56:21 +0100 Received: by robur.slu.se (Postfix, from userid 1000) id 53C6DEC23B; Tue, 2 Dec 2003 18:56:27 +0100 (CET) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16332.53835.297235.589786@robur.slu.se> Date: Tue, 2 Dec 2003 18:56:27 +0100 To: "David S. Miller" Cc: Robert Olsson , francois@baligant.net, netdev@oss.sgi.com Subject: Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box In-Reply-To: <20031202032606.28db927b.davem@redhat.com> References: <072501c3b874$2542ae70$15fea8c0@fortress> <16332.27919.502097.988522@robur.slu.se> <20031202032606.28db927b.davem@redhat.com> X-Mailer: VM 7.17 under Emacs 21.3.1 X-archive-position: 1824 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev David S. Miller writes: > Let us assume, for the sake of back of the envelope calculations, that > all 90k TCP connections speak to unique destinations. Let us further > assume that all of them have at least one packet in flight. > > This means the routing cache must be able to hold at least 90k entries. > All of these routing cache entires will be referenced by the packets > in the TCP retransmission queues of all the sockets, and thus the > entries are unreclaimable. > > You are setting net.ipv4.route.max_size to 655360 which should be more > than enough. 
But you also have to make the net.ipv4.route.gc_thresh > more reasonable as well, perhaps 90K as a test. > > If net.ipv4.route.gc_thresh is lower than 90K and my assertions above > hold, then the kernel will try to garbage collect too early, all the > routing cache entries will be in use and therefore uncollectable, > and you'll get the message you're seeing. > > Try to pump up gc_thresh and see if that helps. Yes better tuning as gc_thresh and max_size is in better balance but max_size is same so I'll guess we collect unreclaimable entries util we see dst overflow still'. The long time before overflow is suspect "hours to days" We have to ask if this has ever worked before? I'll guess number of hash buckets should be increased for systems like this. Cheers. --ro From hadi@cyberus.ca Tue Dec 2 11:29:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 11:29:48 -0800 (PST) Received: from mail.cyberus.ca (mail.cyberus.ca [209.197.145.21]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2JTYTa026404 for ; Tue, 2 Dec 2003 11:29:35 -0800 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.9]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1AR9hc-0005LY-7N; Tue, 02 Dec 2003 07:32:28 -0500 Subject: Re: [RTNETLINK] Provide real oif From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: "David S. 
Miller" , netdev@oss.sgi.com In-Reply-To: <20031202110224.GA25957@gondor.apana.org.au> References: <20031202110224.GA25957@gondor.apana.org.au> Content-Type: text/plain Organization: jamalopolis Message-Id: <1070368316.1031.30.camel@jzny.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 02 Dec 2003 07:31:57 -0500 Content-Transfer-Encoding: 7bit X-archive-position: 1825 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2003-12-02 at 06:02, Herbert Xu wrote: > Hi Dave: > > This patch adds a new attribute RTA_PREFOIF to the RTNETLINK interface > so that we can send the real oif to user space. Hmm, would RTA_REALOIF > be a better name? It really should be called RTA_OIF but that's already > taken. > > Currently if there are two entries in the routing cache that only differ > by the value of oif then they will appear identical to user space. The > only way to tell them apart would be to export the value of oif from > the kernel. > Can you provide a real example (output of route display in user space) where this would be valuable? whats wrong with the combo of RTA_OIF and RTA_SRC? 
cheers, jamal From herbert@gondor.apana.org.au Tue Dec 2 12:54:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 12:54:57 -0800 (PST) Received: from arnor.me.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2KsUTa032602 for ; Tue, 2 Dec 2003 12:54:34 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.me.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1ARHXJ-0007RO-00; Wed, 03 Dec 2003 07:54:21 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1ARHXF-0007cM-00; Wed, 03 Dec 2003 07:54:17 +1100 From: Herbert Xu To: hadi@cyberus.ca, netdev@oss.sgi.com Subject: Re: [RTNETLINK] Provide real oif Organization: Core In-Reply-To: <1070368316.1031.30.camel@jzny.localdomain> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.2-20031002 ("Berneray") (UNIX) (Linux/2.4.22-1-686-smp (i686)) Message-Id: Date: Wed, 03 Dec 2003 07:54:17 +1100 X-archive-position: 1826 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev jamal wrote: > > Can you provide a real example (output of route display in user space) > where this would be valuable? Without the real oif you will see entries like this in the output of ip r l c: 192.168.0.7 dev eth0 src 192.168.0.6 cache mtu 1500 advmss 1460 metric10 64 192.168.0.7 dev eth0 src 192.168.0.6 cache mtu 1500 advmss 1460 metric10 64 One of those has oif == 0 while the other one has oif == eth0. To generate an entry with oif == eth0, just send a UDP packet from a socket bound to eth0. > whats wrong with the combo of RTA_OIF and RTA_SRC? Both of those attributes are independent of the real oif. Cheers, -- Debian GNU/Linux 3.0 is out! 
( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From pavlin@possum.icir.org Tue Dec 2 13:19:37 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 13:19:51 -0800 (PST) Received: from possum.icir.org (possum.icir.org [192.150.187.67]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2LJaTa001015 for ; Tue, 2 Dec 2003 13:19:37 -0800 Received: from possum.icir.org (localhost [127.0.0.1]) by possum.icir.org (8.12.9p1/8.12.3) with ESMTP id hB2LJaSx017575; Tue, 2 Dec 2003 13:19:36 -0800 (PST) (envelope-from pavlin@possum.icir.org) Message-Id: <200312022119.hB2LJaSx017575@possum.icir.org> To: netdev@oss.sgi.com Cc: pavlin@icir.org Subject: A request to add RTPROT_XORP to linux/rtnetlink.h Date: Tue, 02 Dec 2003 13:19:36 -0800 From: Pavlin Radoslavov X-archive-position: 1827 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavlin@icir.org Precedence: bulk X-list: netdev [If this is not the right mailing list for such a request, please let me know where I should send it.] On behalf of the XORP (eXtensible Open Router Platform) project (http://www.xorp.org), I'd like to ask if the XORP-specific protocol value can be included in linux/rtnetlink.h (similar to the values already assigned to other routing daemons/suites such as GateD, etc.) Below I am including the simple one-line patch against linux-2.6.0-test11. Any chance this can be added to 2.6.0 before its release? 
:) Thanks, Pavlin --- linux-2.6.0-test11/include/linux/rtnetlink.h.org Wed Nov 26 12:45:11 2003 +++ linux-2.6.0-test11/include/linux/rtnetlink.h Tue Dec 2 12:53:18 2003 @@ -138,6 +138,7 @@ #define RTPROT_ZEBRA 11 /* Zebra */ #define RTPROT_BIRD 12 /* BIRD */ #define RTPROT_DNROUTED 13 /* DECnet routing daemon */ +#define RTPROT_XORP 14 /* XORP */ /* rtm_scope From shemminger@osdl.org Tue Dec 2 14:01:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 14:01:49 -0800 (PST) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2M1ZTa001791 for ; Tue, 2 Dec 2003 14:01:35 -0800 Received: from dell_ss3.pdx.osdl.net (IDENT:2997@dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id hB2M0XZ01347; Tue, 2 Dec 2003 14:00:33 -0800 Date: Tue, 2 Dec 2003 14:01:15 -0800 From: Stephen Hemminger To: Krzysztof Halas , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH] (4//8) dscc4 convert to new hdlc_device Message-Id: <20031202140115.239f998f.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.9.6claws (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1828 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1493 -> 1.1494 # drivers/net/wan/dscc4.c 1.51 -> 1.52 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/11/26 shemminger@osdl.org 1.1494 # Convert to work with hdlc_device that is not embedded. # Use list macros for list of ports on card. # -------------------------------------------- # diff -Nru a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c --- a/drivers/net/wan/dscc4.c Wed Nov 26 12:30:13 2003 +++ b/drivers/net/wan/dscc4.c Wed Nov 26 12:30:13 2003 @@ -185,12 +185,14 @@ spinlock_t lock; struct pci_dev *pdev; - struct dscc4_dev_priv *root; + struct list_head devs; dma_addr_t iqcfg_dma; u32 xtal_hz; }; struct dscc4_dev_priv { + struct list_head dev_list; + struct sk_buff *rx_skbuff[RX_RING_SIZE]; struct sk_buff *tx_skbuff[TX_RING_SIZE]; @@ -228,7 +230,7 @@ unsigned short encoding; unsigned short parity; - hdlc_device hdlc; + hdlc_device *hdlc; sync_serial_settings settings; u32 __pad __attribute__ ((aligned (4))); }; @@ -371,9 +373,17 @@ static int dscc4_tx_poll(struct dscc4_dev_priv *, struct net_device *); #endif +static inline struct dscc4_dev_priv *dscc4_root_priv(struct dscc4_pci_priv *ppriv) +{ + struct list_head *first = ppriv->devs.next; + + return (first == &ppriv->devs) ? 
NULL + : list_entry(first, struct dscc4_dev_priv, dev_list); +} + static inline struct dscc4_dev_priv *dscc4_priv(struct net_device *dev) { - return list_entry(dev, struct dscc4_dev_priv, hdlc.netdev); + return dev_to_hdlc(dev)->dev_data; } static void scc_patchl(u32 mask, u32 value, struct dscc4_dev_priv *dpriv, @@ -636,7 +646,7 @@ struct net_device *dev) { struct RxFD *rx_fd = dpriv->rx_fd + dpriv->rx_current%RX_RING_SIZE; - struct net_device_stats *stats = &dpriv->hdlc.stats; + struct net_device_stats *stats = &dpriv->hdlc->stats; struct pci_dev *pdev = dpriv->pci_priv->pdev; struct sk_buff *skb; int pkt_len; @@ -681,19 +691,16 @@ static void dscc4_free1(struct pci_dev *pdev) { - struct dscc4_pci_priv *ppriv; - struct dscc4_dev_priv *root; - int i; + struct dscc4_pci_priv *ppriv = pci_get_drvdata(pdev); + struct dscc4_dev_priv *dpriv, *dnext; - ppriv = pci_get_drvdata(pdev); - root = ppriv->root; - - for (i = 0; i < dev_per_card; i++) - unregister_hdlc_device(&root[i].hdlc); + list_for_each_entry_safe(dpriv, dnext, &ppriv->devs, dev_list) { + unregister_hdlc_device(dpriv->hdlc); + free_hdlc_device(dpriv->hdlc); + } pci_set_drvdata(pdev, NULL); - kfree(root); kfree(ppriv); } @@ -704,7 +711,6 @@ struct dscc4_dev_priv *dpriv; static int cards_found = 0; unsigned long ioaddr; - int i; printk(KERN_DEBUG "%s", version); @@ -743,7 +749,8 @@ priv = (struct dscc4_pci_priv *)pci_get_drvdata(pdev); - if (request_irq(pdev->irq, &dscc4_irq, SA_SHIRQ, DRV_NAME, priv->root)){ + if (request_irq(pdev->irq, &dscc4_irq, SA_SHIRQ, DRV_NAME, + dscc4_root_priv(priv))) { printk(KERN_WARNING "%s: IRQ %d busy\n", DRV_NAME, pdev->irq); goto err_out_free1; } @@ -772,21 +779,19 @@ * SCC 0-3 private rx/tx irq structures * IQRX/TXi needs to be set soon. Learned it the hard way... 
*/ - for (i = 0; i < dev_per_card; i++) { - dpriv = priv->root + i; + list_for_each_entry(dpriv, &priv->devs, dev_list) { dpriv->iqtx = (u32 *) pci_alloc_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), &dpriv->iqtx_dma); if (!dpriv->iqtx) goto err_out_free_iqtx; - writel(dpriv->iqtx_dma, ioaddr + IQTX0 + i*4); + writel(dpriv->iqtx_dma, ioaddr + IQTX0 + dpriv->dev_id*4); } - for (i = 0; i < dev_per_card; i++) { - dpriv = priv->root + i; + list_for_each_entry(dpriv, &priv->devs, dev_list) { dpriv->iqrx = (u32 *) pci_alloc_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), &dpriv->iqrx_dma); if (!dpriv->iqrx) goto err_out_free_iqrx; - writel(dpriv->iqrx_dma, ioaddr + IQRX0 + i*4); + writel(dpriv->iqrx_dma, ioaddr + IQRX0 + dpriv->dev_id*4); } /* Cf application hint. Beware of hard-lock condition on threshold. */ @@ -804,22 +809,19 @@ return 0; err_out_free_iqrx: - while (--i >= 0) { - dpriv = priv->root + i; + list_for_each_entry(dpriv, &priv->devs, dev_list) { pci_free_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), dpriv->iqrx, dpriv->iqrx_dma); } - i = dev_per_card; err_out_free_iqtx: - while (--i >= 0) { - dpriv = priv->root + i; + list_for_each_entry(dpriv, &priv->devs, dev_list) { pci_free_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), dpriv->iqtx, dpriv->iqtx_dma); } pci_free_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), priv->iqcfg, priv->iqcfg_dma); err_out_free_irq: - free_irq(pdev->irq, priv->root); + free_irq(pdev->irq, dscc4_root_priv(priv)); err_out_free1: dscc4_free1(pdev); err_out_iounmap: @@ -863,29 +865,27 @@ static int dscc4_found1(struct pci_dev *pdev, unsigned long ioaddr) { struct dscc4_pci_priv *ppriv; - struct dscc4_dev_priv *root; + struct dscc4_dev_priv *dpriv, *dnext; int i, ret = -ENOMEM; - root = (struct dscc4_dev_priv *) - kmalloc(dev_per_card*sizeof(*root), GFP_KERNEL); - if (!root) { - printk(KERN_ERR "%s: can't allocate data\n", DRV_NAME); - goto err_out; - } - memset(root, 0, dev_per_card*sizeof(*root)); - ppriv = (struct dscc4_pci_priv *) 
kmalloc(sizeof(*ppriv), GFP_KERNEL); if (!ppriv) { printk(KERN_ERR "%s: can't allocate private data\n", DRV_NAME); - goto err_free_dev; + goto err_out; } memset(ppriv, 0, sizeof(struct dscc4_pci_priv)); + INIT_LIST_HEAD(&ppriv->devs); for (i = 0; i < dev_per_card; i++) { - struct dscc4_dev_priv *dpriv = root + i; - hdlc_device *hdlc = &dpriv->hdlc; - struct net_device *d = hdlc_to_dev(hdlc); + hdlc_device *hdlc = alloc_hdlc_device(sizeof(*dpriv)); + struct net_device *d; + + if (!hdlc) { + ret = -ENOMEM; + goto err_unregister; + } + d = hdlc_to_dev(hdlc); d->base_addr = ioaddr; d->init = NULL; d->irq = pdev->irq; @@ -898,6 +898,8 @@ SET_MODULE_OWNER(d); SET_NETDEV_DEV(d, &pdev->dev); + dpriv = dscc4_priv(d); + dpriv->hdlc = hdlc; dpriv->dev_id = i; dpriv->pci_priv = ppriv; spin_lock_init(&dpriv->lock); @@ -920,23 +922,25 @@ unregister_hdlc_device(hdlc); goto err_unregister; } + list_add_tail(&dpriv->dev_list, &ppriv->devs); } - ret = dscc4_set_quartz(root, quartz); + + + ret = dscc4_set_quartz(dscc4_root_priv(ppriv), quartz); if (ret < 0) goto err_unregister; - ppriv->root = root; + spin_lock_init(&ppriv->lock); pci_set_drvdata(pdev, ppriv); return ret; err_unregister: - while (--i >= 0) { - dscc4_release_ring(root + i); - unregister_hdlc_device(&root[i].hdlc); + list_for_each_entry_safe(dpriv, dnext, &ppriv->devs, dev_list) { + dscc4_release_ring(dpriv); + unregister_hdlc_device(dpriv->hdlc); + free_hdlc_device(dpriv->hdlc); } kfree(ppriv); -err_free_dev: - kfree(root); err_out: return ret; }; @@ -964,7 +968,7 @@ sync_serial_settings *settings = &dpriv->settings; if (settings->loopback && (settings->clock_type != CLOCK_INT)) { - struct net_device *dev = hdlc_to_dev(&dpriv->hdlc); + struct net_device *dev = hdlc_to_dev(dpriv->hdlc); printk(KERN_INFO "%s: loopback requires clock\n", dev->name); return -1; @@ -1015,7 +1019,7 @@ static int dscc4_open(struct net_device *dev) { struct dscc4_dev_priv *dpriv = dscc4_priv(dev); - hdlc_device *hdlc = &dpriv->hdlc; + 
hdlc_device *hdlc = dpriv->hdlc; struct dscc4_pci_priv *ppriv; int ret = -EAGAIN; @@ -1467,7 +1471,7 @@ int i, handled = 1; priv = root->pci_priv; - dev = hdlc_to_dev(&root->hdlc); + dev = hdlc_to_dev(root->hdlc); spin_lock_irqsave(&priv->lock, flags); @@ -1518,7 +1522,7 @@ static inline void dscc4_tx_irq(struct dscc4_pci_priv *ppriv, struct dscc4_dev_priv *dpriv) { - struct net_device *dev = hdlc_to_dev(&dpriv->hdlc); + struct net_device *dev = hdlc_to_dev(dpriv->hdlc); u32 state; int cur, loop = 0; @@ -1549,7 +1553,7 @@ if (state & SccEvt) { if (state & Alls) { - struct net_device_stats *stats = &dpriv->hdlc.stats; + struct net_device_stats *stats = &dpriv->hdlc->stats; struct sk_buff *skb; struct TxFD *tx_fd; @@ -1687,7 +1691,7 @@ static inline void dscc4_rx_irq(struct dscc4_pci_priv *priv, struct dscc4_dev_priv *dpriv) { - struct net_device *dev = hdlc_to_dev(&dpriv->hdlc); + struct net_device *dev = hdlc_to_dev(dpriv->hdlc); u32 state; int cur; @@ -1954,23 +1958,21 @@ static void __devexit dscc4_remove_one(struct pci_dev *pdev) { struct dscc4_pci_priv *ppriv; - struct dscc4_dev_priv *root; + struct dscc4_dev_priv *root, *dpriv; unsigned long ioaddr; - int i; ppriv = pci_get_drvdata(pdev); - root = ppriv->root; + root = dscc4_root_priv(ppriv); - ioaddr = hdlc_to_dev(&root->hdlc)->base_addr; + ioaddr = hdlc_to_dev(root->hdlc)->base_addr; dscc4_pci_reset(pdev, ioaddr); free_irq(pdev->irq, root); pci_free_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), ppriv->iqcfg, ppriv->iqcfg_dma); - for (i = 0; i < dev_per_card; i++) { - struct dscc4_dev_priv *dpriv = root + i; + list_for_each_entry(dpriv, &ppriv->devs, dev_list) { dscc4_release_ring(dpriv); pci_free_consistent(pdev, IRQ_RING_SIZE*sizeof(u32), dpriv->iqrx, dpriv->iqrx_dma); From shemminger@osdl.org Tue Dec 2 14:01:34 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 14:01:49 -0800 (PST) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id 
hB2M1XTa001790 for ; Tue, 2 Dec 2003 14:01:34 -0800 Received: from dell_ss3.pdx.osdl.net (IDENT:2997@dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id hB2M0KZ01320; Tue, 2 Dec 2003 14:00:21 -0800 Date: Tue, 2 Dec 2003 14:01:03 -0800 From: Stephen Hemminger To: Jeff Garzik , Krzysztof Halas Cc: netdev@oss.sgi.com Subject: [PATCH] (1/8) hdlc wan device disembedding Message-Id: <20031202140103.6bb2deb4.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.9.6claws (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1829 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Change the hdlc wan devices so that the net_device structure is no longer embedded inside the hdlc_device structure. Embedding won't work on 2.6, where the net_device structure may need to outlive module unload because of sysfs. Instead, use alloc_netdev with a setup function so that netdev->priv = hdlc, and use hdlc->dev_data for device private data. The patch is against the net-drivers-2.5-exp tree. This portion breaks the actual hardware drivers, but that is fixed in parts 2-5. I don't have any of this hardware, so the code has only been tested by compiling and loading the modules. # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1502 -> 1.1503 # drivers/net/wan/hdlc_generic.c 1.14 -> 1.15 # drivers/net/wan/hdlc_cisco.c 1.9 -> 1.10 # drivers/net/wan/hdlc_fr.c 1.9 -> 1.10 # include/linux/hdlc.h 1.10 -> 1.11 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/12/01 shemminger@osdl.org 1.1503 # 1-hdlc_device # -------------------------------------------- # diff -Nru a/drivers/net/wan/hdlc_cisco.c b/drivers/net/wan/hdlc_cisco.c --- a/drivers/net/wan/hdlc_cisco.c Mon Dec 1 14:45:08 2003 +++ b/drivers/net/wan/hdlc_cisco.c Mon Dec 1 14:45:08 2003 @@ -222,8 +222,8 @@ hdlc->state.cisco.settings.timeout * HZ) { hdlc->state.cisco.up = 0; printk(KERN_INFO "%s: Link down\n", hdlc_to_name(hdlc)); - if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + if (netif_carrier_ok(hdlc->netdev)) + netif_carrier_off(hdlc->netdev); } cisco_keepalive_send(hdlc, CISCO_KEEPALIVE_REQ, @@ -256,8 +256,8 @@ static void cisco_stop(hdlc_device *hdlc) { del_timer_sync(&hdlc->state.cisco.timer); - if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + if (netif_carrier_ok(hdlc->netdev)) + netif_carrier_off(hdlc->netdev); } diff -Nru a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c --- a/drivers/net/wan/hdlc_fr.c Mon Dec 1 14:45:08 2003 +++ b/drivers/net/wan/hdlc_fr.c Mon Dec 1 14:45:08 2003 @@ -543,8 +543,8 @@ hdlc->state.fr.reliable = reliable; if (reliable) { - if (!netif_carrier_ok(&hdlc->netdev)) - netif_carrier_on(&hdlc->netdev); + if (!netif_carrier_ok(hdlc->netdev)) + netif_carrier_on(hdlc->netdev); hdlc->state.fr.n391cnt = 0; /* Request full status */ hdlc->state.fr.dce_changed = 1; @@ -558,8 +558,8 @@ } } } else { - if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + if (netif_carrier_ok(hdlc->netdev)) + netif_carrier_off(hdlc->netdev); while (pvc) { /* Deactivate all PVCs */ pvc_carrier(0, pvc); @@ -938,8 +938,8 @@ printk(KERN_DEBUG 
"fr_start\n"); #endif if (hdlc->state.fr.settings.lmi != LMI_NONE) { - if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + if (netif_carrier_ok(hdlc->netdev)) + netif_carrier_off(hdlc->netdev); hdlc->state.fr.last_poll = 0; hdlc->state.fr.reliable = 0; hdlc->state.fr.dce_changed = 1; diff -Nru a/drivers/net/wan/hdlc_generic.c b/drivers/net/wan/hdlc_generic.c --- a/drivers/net/wan/hdlc_generic.c Mon Dec 1 14:45:08 2003 +++ b/drivers/net/wan/hdlc_generic.c Mon Dec 1 14:45:08 2003 @@ -71,6 +71,7 @@ void hdlc_set_carrier(int on, hdlc_device *hdlc) { + struct net_device *dev = hdlc_to_dev(hdlc); on = on ? 1 : 0; #ifdef DEBUG_LINK @@ -92,14 +93,14 @@ if (hdlc->carrier) { if (hdlc->proto.start) hdlc->proto.start(hdlc); - else if (!netif_carrier_ok(&hdlc->netdev)) - netif_carrier_on(&hdlc->netdev); + else if (!netif_carrier_ok(dev)) + netif_carrier_on(dev); } else { /* no carrier */ if (hdlc->proto.stop) hdlc->proto.stop(hdlc); - else if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + else if (netif_carrier_ok(dev)) + netif_carrier_off(dev); } carrier_exit: @@ -110,6 +111,8 @@ /* Must be called by hardware driver when HDLC device is being opened */ int hdlc_open(hdlc_device *hdlc) { + struct net_device *dev = hdlc_to_dev(hdlc); + #ifdef DEBUG_LINK printk(KERN_DEBUG "hdlc_open carrier %i open %i\n", hdlc->carrier, hdlc->open); @@ -129,11 +132,11 @@ if (hdlc->carrier) { if (hdlc->proto.start) hdlc->proto.start(hdlc); - else if (!netif_carrier_ok(&hdlc->netdev)) - netif_carrier_on(&hdlc->netdev); + else if (!netif_carrier_ok(dev)) + netif_carrier_on(dev); - } else if (netif_carrier_ok(&hdlc->netdev)) - netif_carrier_off(&hdlc->netdev); + } else if (netif_carrier_ok(dev)) + netif_carrier_off(dev); hdlc->open = 1; @@ -224,11 +227,9 @@ } - -int register_hdlc_device(hdlc_device *hdlc) +static void hdlc_setup(struct net_device *dev) { - int result; - struct net_device *dev = hdlc_to_dev(hdlc); + hdlc_device *hdlc = dev->priv; 
dev->get_stats = hdlc_get_stats; dev->change_mtu = hdlc_change_mtu; @@ -245,17 +246,13 @@ hdlc->open = 0; spin_lock_init(&hdlc->state_lock); - result = dev_alloc_name(dev, "hdlc%d"); - if (result < 0) - return result; - - result = register_netdev(dev); - if (result != 0) - return -EIO; - - return 0; + hdlc->dev_data = (hdlc + 1); } +int register_hdlc_device(hdlc_device *hdlc) +{ + return register_netdev(hdlc_to_dev(hdlc)); +} void unregister_hdlc_device(hdlc_device *hdlc) @@ -266,7 +263,17 @@ rtnl_unlock(); } +hdlc_device *alloc_hdlc_device(int sizeof_priv) +{ + struct net_device *dev = alloc_netdev(sizeof_priv + sizeof(hdlc_device), + "hdlc%d", hdlc_setup); + return dev ? dev_to_hdlc(dev) : NULL; +} +void free_hdlc_device(hdlc_device *hdlc) +{ + free_netdev(hdlc_to_dev(hdlc)); +} MODULE_AUTHOR("Krzysztof Halasa "); MODULE_DESCRIPTION("HDLC support module"); @@ -278,6 +285,8 @@ EXPORT_SYMBOL(hdlc_ioctl); EXPORT_SYMBOL(register_hdlc_device); EXPORT_SYMBOL(unregister_hdlc_device); +EXPORT_SYMBOL(alloc_hdlc_device); +EXPORT_SYMBOL(free_hdlc_device); static struct packet_type hdlc_packet_type = { .type = __constant_htons(ETH_P_HDLC), diff -Nru a/include/linux/hdlc.h b/include/linux/hdlc.h --- a/include/linux/hdlc.h Mon Dec 1 14:45:08 2003 +++ b/include/linux/hdlc.h Mon Dec 1 14:45:08 2003 @@ -96,7 +96,9 @@ typedef struct hdlc_device_struct { /* To be initialized by hardware driver */ - struct net_device netdev; /* master net device - must be first */ + struct net_device *netdev; /* master net device */ + void *dev_data; + struct net_device_stats stats; /* used by HDLC layer to take control over HDLC device from hw driver*/ @@ -180,6 +182,8 @@ /* Exported from hdlc.o */ +hdlc_device *alloc_hdlc_device(int sizeof_priv); +void free_hdlc_device(hdlc_device *hdlc); /* Called by hardware driver when a user requests HDLC service */ int hdlc_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); @@ -188,18 +192,15 @@ int register_hdlc_device(hdlc_device *hdlc); void 
unregister_hdlc_device(hdlc_device *hdlc); - static __inline__ struct net_device* hdlc_to_dev(hdlc_device *hdlc) { - return &hdlc->netdev; + return hdlc->netdev; } - -static __inline__ hdlc_device* dev_to_hdlc(struct net_device *dev) +static __inline__ hdlc_device *dev_to_hdlc(struct net_device *dev) { - return (hdlc_device*)dev; + return dev->priv; } - static __inline__ pvc_device* dev_to_pvc(struct net_device *dev) { From shemminger@osdl.org Tue Dec 2 14:02:30 2003 Received: with ECARTIS (v1.0.0; list netdev); Tue, 02 Dec 2003 14:02:43 -0800 (PST) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.9) with SMTP id hB2M2TTa002051 for ; Tue, 2 Dec 2003 14:02:29 -0800 Received: from dell_ss3.pdx.osdl.net (IDENT:2997@dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id hB2M1YZ01464; Tue, 2 Dec 2003 14:01:34 -0800 Date: Tue, 2 Dec 2003 14:02:17 -0800 From: Stephen Hemminger To: Krzysztof Halas , Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH] (5/8) farsync - hdlc_device conversion Message-Id: <20031202140217.776fdff6.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.9.6claws (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,V