From davem@pizda.ninka.net Wed Oct 1 00:09:24 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 00:10:05 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h9179NFx029468
    for ; Wed, 1 Oct 2003 00:09:23 -0700
Received: (from davem@localhost)
    by pizda.ninka.net (8.9.3/8.9.3) id AAA05395;
    Wed, 1 Oct 2003 00:05:24 -0700
Date: Wed, 1 Oct 2003 00:05:24 -0700
From: "David S. Miller"
To: "Chad N. Tindel"
Cc: fubar@us.ibm.com, shmulik.hen@intel.com, jgarzik@pobox.com, chad@tindel.net,
    bonding-devel@lists.sourceforge.net, netdev@oss.sgi.com
Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues
Message-Id: <20031001000524.7e0d851e.davem@redhat.com>
In-Reply-To: <20030930213650.GA71877@calma.pair.com>
References: <200309301442.31991.shmulik.hen@intel.com>
    <200309301639.h8UGdqCq026858@death.ibm.com>
    <20030930213650.GA71877@calma.pair.com>
X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 422
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

On Tue, 30 Sep 2003 17:36:50 -0400
"Chad N. Tindel" wrote:

> My recommendations are more towards the middle than either end. I would
> like to see us get rid of the _OLD ioctls in the 2.6 kernel specifically
> because it uses the SIOCDEVPRIVATE ioctls.
 ...
> I would like to see them stay in 2.4 for the rest of the 2.4 tree
> specifically so that people who want to run on 3 year old systems
> can continue to do so without us breaking their world.

I think this is fine, personally.

I defer to Jeff for final judgment, he should be allowed to chime
in at least once more.

From davem@pizda.ninka.net Wed Oct 1 00:12:05 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 00:12:42 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h917C4Fx029964
    for ; Wed, 1 Oct 2003 00:12:04 -0700
Received: (from davem@localhost)
    by pizda.ninka.net (8.9.3/8.9.3) id AAA05410;
    Wed, 1 Oct 2003 00:07:04 -0700
Date: Wed, 1 Oct 2003 00:07:04 -0700
From: "David S. Miller"
To: jt@hpl.hp.com
Cc: jt@bougret.hpl.hp.com, shemminger@osdl.org, netdev@oss.sgi.com,
    irda-users@lists.sourceforge.net
Subject: Re: [PATCH] (0/16) intro to IRDA patches for 2.6.0-test6
Message-Id: <20031001000704.252e4737.davem@redhat.com>
In-Reply-To: <20030930230049.GA22339@bougret.hpl.hp.com>
References: <20030930152530.1e279c29.shemminger@osdl.org>
    <20030930230049.GA22339@bougret.hpl.hp.com>
X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 423
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

On Tue, 30 Sep 2003 16:00:49 -0700
Jean Tourrilhes wrote:

> On Tue, Sep 30, 2003 at 03:25:30PM -0700, Stephen Hemminger wrote:
> > David, please apply after Jean gives his approval.
 ...
> Please go ahead, I'll test them in parallel.
> Thanks !

Great, I'm applying Stephen's patches now.

From linux-netdev@gmane.org Wed Oct 1 01:02:44 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 01:03:20 -0700 (PDT)
Received: from main.gmane.org (main.gmane.org [80.91.224.249])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h9182MFx004243
    for ; Wed, 1 Oct 2003 01:02:43 -0700
Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian))
    id 1A4bwC-00015B-00
    for ; Wed, 01 Oct 2003 10:02:20 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: netdev@oss.sgi.com
Received: from sea.gmane.org ([80.91.224.252])
    by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
    id 1A4bwB-000151-00
    for ; Wed, 01 Oct 2003 10:02:19 +0200
Received: from news by sea.gmane.org with local (Exim 3.35 #1 (Debian))
    id 1A4bwA-0008AU-00
    for ; Wed, 01 Oct 2003 10:02:18 +0200
From: Florian Zwoch
Subject: Re: e1000 -> 82540EM on linux 2.6.0-test[45] very slow in one direction
Date: Wed, 01 Oct 2003 10:02:18 +0200
Lines: 59
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@sea.gmane.org
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030909 Thunderbird/0.2
X-Accept-Language: en-us, en
In-Reply-To:
Cc: linux-kernel@vger.kernel.org
X-archive-position: 424
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: zwoch@backendmedia.com
Precedence: bulk
X-list: netdev

the issue seems to be partly solved. the e1000 driver seems to be ok!

i reconfigured my kernel and intentionally left out the netfilter options.
after that my network performance was back to normal. netfilter was only
compiled into the kernel. it was not used with any rules!

so my wild guess would be that something in the netfilter code (i am
not 100% sure it was netfilter.. _maybe_ it was some small odd kernel
option i accidentally enabled/disabled) is broken since test3 (again
unverified, but i first noticed this when switching from test3 to test4).

Florian Zwoch wrote:
> hi,
> this has been discussed very roughly before. but unfortunately no real
> solution has been brought up so far (or i have not read it yet).
>
> problem in short: the 82540EM intel gigabit adapter became very slow as
> of 2.6.0-test4. maybe earlier versions were affected as well, but i
> noticed this behaviour on test4 and later. the 'slowness' of the adapter
> only affects a certain data direction. i performed the following tests
> to show you what is wrong.
>
> dummy data file was 34257856 bytes (34.3MB).
> test machines were a pentium4 with the intel adapter, and a pentium2 266
> with a lowcost realtek card (runs linux 2.4).
>
> SCP:
> e1000 -> 8139too 28.6KB/s
> e1000 <- 8139too 4.6MB/s
>
> SMB:
> e1000 -> 8139too 3.0MB/s
> e1000 <- 8139too 3.3MB/s
>
> FTP
> e1000 -> 8139too 54KB/s
> e1000 <- 8139too 9.4MB/s
>
> as you can see receiving data is no problem at all (maybe another
> protocol can create some problems in this case?). but sending data is
> awfully slow! the exception is the samba protocol. why is that? i thought
> that samba may use udp instead of tcp. but iptraf did not show any udp
> packets going around so i guess i was wrong.
>
> the problem gets worse while trying to test things over the internet.
> scp stalls incredibly often on my 256kbit/s upstream. so do ftp and
> the irc dcc protocol. irc dcc ends up sending at 0.3KB/s on a megabyte
> sized file.
>
> before people again try to tell me that some duplex settings could be
> messed up - then tell me why this should happen. when i boot into a 2.4
> kernel with that test machine the nic works without problems. so IF
> duplex stuff is the reason for the hiccups, something must be wrong with
> the duplex detection code in the new driver/kernel?
>
> i tried vanilla 2.6.0-test5, 2.6.0-test5-mm2 and mm3 + 2.6.0-test5-bk4.
> none of these made any difference regarding network performance.

From scott.feldman@intel.com Wed Oct 1 01:19:49 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 01:20:23 -0700 (PDT)
Received: from hermes.hd.intel.com (fmr09.intel.com [192.52.57.35])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h918JmFx005756
    for ; Wed, 1 Oct 2003 01:19:49 -0700
Received: from petasus.hd.intel.com (petasus.hd.intel.com [10.127.45.3])
    by hermes.hd.intel.com (8.11.6-20030918-01/8.11.6/d: outer.mc,v 1.83 2003/09/05 14:45:27 rfjohns1 Exp $)
    with ESMTP id h918HK423372
    for ; Wed, 1 Oct 2003 08:17:20 GMT
Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206])
    by petasus.hd.intel.com (8.11.6-20030918-01/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $)
    with SMTP id h918EGj09441
    for ; Wed, 1 Oct 2003 08:14:16 GMT
Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56])
    by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003100101194211202
    ; Wed, 01 Oct 2003 01:19:42 -0700
Received: from orsmsx402.amr.corp.intel.com ([192.168.65.208])
    by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(5.0.2195.5329);
    Wed, 1 Oct 2003 01:19:41 -0700
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-MimeOLE: Produced By Microsoft Exchange V6.0.6487.1
Subject: RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119
Date: Wed, 1 Oct 2003 01:19:41 -0700
Message-ID:
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Fw: Badness in local_bh_enable at kernel/softirq.c:119
Thread-Index: AcOH6PnqvodJajp4QQ2HBr81WTDsWgABV2CQ
From: "Feldman, Scott"
To: "David S. Miller"
Cc: , , , "cramerj"
X-OriginalArrivalTime: 01 Oct 2003 08:19:41.0980 (UTC) FILETIME=[C39815C0:01C387F4]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h918JmFx005756
X-archive-position: 425
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: scott.feldman@intel.com
Precedence: bulk
X-list: netdev

> Why do you even need to use IRQ locking here?
>
> Your e1000 netdev->hard_start_xmit method doesn't need to do
> anything special, why does this timer code? I suppose you
> need to synchronize with e1000_clean_tx_irq() in the non-NAPI
> case right? If so, that's not being accomplished by what
> your code is doing. If nobody else takes that xmit_lock in
> an IRQ disabling manner, the e1000 timer code doing so
> doesn't make any difference.
>
> I have an idea for attacking the problem, once you figure out
> what kind of locking you really need. Do whatever you need
> to do to synchronize on the hardware side, but instead of
> directly freeing the SKB, add each one to a list. A pointer
> to the head of this list is stored on the stack of the timer
> routine, and passed down into the TX purger.
>
> Then at the top level you can drop all your locks, re-enable
> hw IRQs and whatever else you need to do, then pass the SKBs
> in the list off to dev_kfree_skb_irq() (this is the
> appropriate routine to call to free an SKB from a timer
> handler, which runs in soft interrupt context).

Chris can jump in here anytime. :-)

Synchronizing on the hardware side is stumping me. We have the list of
skbs you describe, but I'm concerned about unmapping the skb buffers if
hardware is right in the middle of some DMA on one of the buffers.
Some archs really don't like hardware accessing unmapped buffers.

Here's what I'm thinking: when link down is detected in the timer, just
trick hardware into thinking link is still up (ILOS - Invert Loss of
Signal). No locking, no disabling of interrupts. Hardware will do the
natural thing by completing the outstanding sends and also provide the
interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).

Something like:

    if lost link
        if outstanding Tx work
            set ILOS      // h/w thinks link is up, DMA continues
            mdelay(10)
            clear ILOS    // h/w thinks link is down

The mdelay(10) is terrible, but we've already got that in the current
tx_flush routine.

Chris, what am I missing? I left out the ANE business for clarity.

-scott

From davem@pizda.ninka.net Wed Oct 1 01:41:10 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 01:41:44 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h918fAFx007693
    for ; Wed, 1 Oct 2003 01:41:10 -0700
Received: (from davem@localhost)
    by pizda.ninka.net (8.9.3/8.9.3) id BAA05942;
    Wed, 1 Oct 2003 01:37:10 -0700
Date: Wed, 1 Oct 2003 01:37:10 -0700
From: "David S. Miller"
To: "Feldman, Scott"
Cc: jgarzik@pobox.com, akpm@osdl.org, netdev@oss.sgi.com, cramerj@intel.com
Subject: Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
Message-Id: <20031001013710.425038fd.davem@redhat.com>
In-Reply-To:
References:
X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 426
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

On Wed, 1 Oct 2003 01:19:41 -0700
"Feldman, Scott" wrote:

> Synchronizing on the hardware side is stumping me. We have the list of
> skbs you describe, but I'm concerned about unmapping the skb buffers if
> hardware is right in the middle of some DMA on one of the buffers.
> Some archs really don't like hardware accessing unmapped buffers.

Good point, if the e1000 accesses the DMA buffer after the unmap
it will cause many archs to signal PCI errors since the IOMMU will
no longer have a valid translation for those DMA requests.

> Here's what I'm thinking: when link down is detected in the timer, just
> trick hardware into thinking link is still up (ILOS - Invert Loss of
> Signal). No locking, no disabling of interrupts. Hardware will do the
> natural thing by completing the outstanding sends and also provide the
> interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).

If you can make that work, it's the simplest fix.

From shmulik.hen@intel.com Wed Oct 1 01:49:18 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 01:49:54 -0700 (PDT)
Received: from hermes.hd.intel.com (fmr09.intel.com [192.52.57.35])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h918nHFx009002
    for ; Wed, 1 Oct 2003 01:49:18 -0700
Received: from petasus.hd.intel.com (petasus.hd.intel.com [10.127.45.3])
    by hermes.hd.intel.com (8.11.6-20030918-01/8.11.6/d: outer.mc,v 1.83 2003/09/05 14:45:27 rfjohns1 Exp $)
    with ESMTP id h918kn428879
    for ; Wed, 1 Oct 2003 08:46:49 GMT
Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206])
    by petasus.hd.intel.com (8.11.6-20030918-01/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $)
    with SMTP id h918hjj15997
    for ; Wed, 1 Oct 2003 08:43:45 GMT
Received: from jrslxjul4.npdj.intel.com ([10.12.254.188])
    by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003100101490517770
    ; Wed, 01 Oct 2003 01:49:07 -0700
Content-Type: text/plain; charset="iso-8859-1"
From: Shmulik Hen
Reply-To: shmulik.hen@intel.com
Organization: Intel corp.
To: "David S. Miller" , "Chad N. Tindel" , ,
Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues
Date: Wed, 1 Oct 2003 11:49:04 +0300
User-Agent: KMail/1.4.3
Cc: ,
References:
In-Reply-To:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-Id: <200310011149.04612.shmulik.hen@intel.com>
X-archive-position: 427
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shmulik.hen@intel.com
Precedence: bulk
X-list: netdev

On Wednesday 01 October 2003 10:05 am, David S. Miller wrote:
> On Tue, 30 Sep 2003 17:36:50 -0400
>
> "Chad N. Tindel" wrote:
> > My recommendations are more towards the middle than either end.
> > I would like to see us get rid of the _OLD ioctls in the 2.6
> > kernel specifically because it uses the SIOCDEVPRIVATE ioctls.
>
> ...
>
> > I would like to see them stay in 2.4 for the rest of the 2.4 tree
> > specifically so that people who want to run on 3 year old systems
> > can continue to do so without us breaking their world.
>
> I think this is fine, personally.
>
> I defer to Jeff for final judgment, he should be allowed to chime
> in at least once more.
>

So here is what I did in the meantime:

* Created a version for 2.4 that puts back all old compatibility
  stuff that was removed either during the propagation set or the
  cleanup set.
* Created a version for 2.6 that puts back just the compatibility
  stuff that was removed in the propagation set (BOND_SETHWADDR,
  since we got a complaint from a RH9 user).
* Removed the mention of the multicast param from the read-me.
* Raised the ABI version to 2 so the new ifenslave keeps propagating
  IP settings to slaves for older drivers, and doesn't do that for
  new ones that contain Willy Tarreau's panic fix.

As for not putting new stuff in 2.4.x kernels, here is where we stand:
We believe new distributions based on 2.4.x kernels will keep showing
up for at least a year, probably longer, and that customers would like
to see more bonding features in those distributions, so our intention
is to keep getting new stuff into 2.4. We understand the drive to put
new stuff into 2.6 and backport to 2.4 from time to time, but we'll
really need to keep doing stuff the current way for a while. The
cleanup stuff came up as a necessity before developing the next set of
features, and those are all based on a cleaned-up bonding, so delaying
the acceptance of the cleanup into 2.4 also delays the acceptance of
our features.

I'm waiting for the final word from everyone. I'll need to test the
two new versions, but then I can release them accordingly.

--
| Shmulik Hen   Advanced Network Services  |
| Israel Design Center, Jerusalem          |
| LAN Access Division, Platform Networking |
| Intel Communications Group, Intel corp.  |

From andi@averellmail.firstfloor.org Wed Oct 1 05:12:35 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 05:13:09 -0700 (PDT)
Received: from zero.aec.at (Fogarty.Weffing@zero.aec.at [193.170.194.10])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91CCXFx030504
    for ; Wed, 1 Oct 2003 05:12:34 -0700
Received: from fred.muc.de (Atka.Mip@localhost.localdomain [127.0.0.1])
    by zero.aec.at (8.11.6/8.11.2) with ESMTP id h91CCPS09724;
    Wed, 1 Oct 2003 14:12:26 +0200
Received: by fred.muc.de (Postfix on SuSE Linux 7.3 (i386), from userid 500)
    id C6DD15BBEE; Wed, 1 Oct 2003 14:12:26 +0200 (CEST)
Date: Wed, 1 Oct 2003 14:12:26 +0200
From: Andi Kleen
To: netdev@oss.sgi.com
Cc: mingo@redhat.com
Subject: [PATCH] Fix ppro csum_partial for 1 byte unaligned buffers
Message-ID: <20031001121226.GA11676@averell>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
X-archive-position: 429
X-ecartis-version: Ecartis v1.0.0
Sender:
    netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@muc.de
Precedence: bulk
X-list: netdev
Content-Length: 2003
Lines: 78

When using sendfile it can happen that csum_partial is called for memory
areas that are not aligned to a 2 byte boundary. The ppro optimized i386
checksum code handled this slowly, but read up to 3 bytes past the end of
the buffer. When the skb contents are mapped from highmem this can be fatal
because the end of the buffer can be unmapped.

This patch fixes this in a simple, non-intrusive way by handling the possible
fault and recovering from it by using a tolerant byte-by-byte copy.

It does not attempt to align one byte unaligned buffers, because that's
rather complicated and probably not worth the effort.

Other architectures may want to audit their csum_partial to check that it
handles this case correctly.

Bug is in 2.4 and 2.6

-Andi

diff -u linux/arch/i386/lib/checksum.S-o linux/arch/i386/lib/checksum.S
--- linux/arch/i386/lib/checksum.S-o 2003-03-07 16:48:01.000000000 +0100
+++ linux/arch/i386/lib/checksum.S 2003-10-01 14:01:31.000000000 +0200
@@ -48,6 +48,9 @@
  * least a twofold speedup on 486 and Pentium if it is 4-byte aligned.
  * Fortunately, it is easy to convert 2-byte alignment to 4-byte
  * alignment for the unrolled loop.
+ *
+ * Danger, Will Robinson: with sendfile 2 byte alignment is not guaranteed.
+ *
  */
 csum_partial:
         pushl %esi
@@ -237,18 +240,37 @@
         movl $0xffffff,%ebx # by the shll and shrl instructions
         shll $3,%ecx
         shrl %cl,%ebx
-        andl -128(%esi),%ebx # esi is 4-aligned so should be ok
+.Ltail:
+        andl -128(%esi),%ebx
+.Ltail_finished:
         addl %ebx,%eax
         adcl $0,%eax
 80:
         testl $1, 12(%esp)
         jz 90f
         roll $8, %eax
-90:
+90:
         popl %ebx
         popl %esi
         ret
-
+
+        .section __ex_table,"a"
+        .long .Ltail,tail_recover
+        .long .Ltail_byte3,.Ltail_byte1
+        .long .Ltail_byte2,.Ltail_finished
+        .previous
+
+tail_recover:
+        xorl %ebx,%ebx
+.Ltail_byte3:
+        movb -126(%esi),%bl
+        shl $16,%ebx
+.Ltail_byte1:
+        movb -128(%esi),%bl
+.Ltail_byte2:
+        movb -127(%esi),%bh
+        jmp .Ltail_finished
+
 #endif

 /*

From chas@cmf.nrl.navy.mil Wed Oct 1 05:17:21 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 05:17:54 -0700 (PDT)
Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91CHKFx030849
    for ; Wed, 1 Oct 2003 05:17:21 -0700
Received: from cmf.nrl.navy.mil (thirdoffive.cmf.nrl.navy.mil [134.207.10.180])
    by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h91BYPkT003172
    for ; Wed, 1 Oct 2003 07:34:25 -0400 (EDT)
Message-Id: <200310011134.h91BYPkT003172@ginger.cmf.nrl.navy.mil>
To: netdev@oss.sgi.com
Subject: [RFC] add rtnl semaphore to linux-atm
Reply-To: chas3@users.sourceforge.net
Date: Wed, 01 Oct 2003 07:34:25 -0400
From: chas williams
X-Spam-Score: () hits=0.5
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-archive-position: 430
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev
Content-Length: 9239
Lines: 360

i am thinking about doing the following to fix the race
during ATM_ITF_ANY operation. rtnl is held across registration/unregistration.
this means that you can get read-only access to the device list by holding rtnl or a read_lock on atm_dev_lock similar to the scheme used by netdevice (or so i think). (the register_atmdevice/unregister just make it easier to see where one might call netdevice instead in the future) ===== drivers/atm/atmtcp.c 1.20 vs edited ===== --- 1.20/drivers/atm/atmtcp.c Tue Sep 23 19:22:15 2003 +++ edited/drivers/atm/atmtcp.c Mon Sep 29 21:34:36 2003 @@ -378,7 +378,7 @@ struct atm_dev *dev; dev = NULL; - if (itf != -1) dev = atm_dev_lookup(itf); + if (itf != -1) dev = atm_dev_get_by_index(itf); if (dev) { if (dev->ops != &atmtcp_v_dev_ops) { atm_dev_put(dev); @@ -415,7 +415,7 @@ struct atm_dev *dev; struct atmtcp_dev_data *dev_data; - dev = atm_dev_lookup(itf); + dev = atm_dev_get_by_index(itf); if (!dev) return -ENODEV; if (dev->ops != &atmtcp_v_dev_ops) { atm_dev_put(dev); ===== include/linux/atmdev.h 1.32 vs edited ===== --- 1.32/include/linux/atmdev.h Tue Sep 23 18:19:10 2003 +++ edited/include/linux/atmdev.h Mon Sep 29 21:59:18 2003 @@ -388,7 +388,7 @@ struct atm_dev *atm_dev_register(const char *type,const struct atmdev_ops *ops, int number,unsigned long *flags); /* number == -1: pick first available */ -struct atm_dev *atm_dev_lookup(int number); +struct atm_dev *atm_dev_get_by_index(int ifindex); void atm_dev_deregister(struct atm_dev *dev); void shutdown_atm_dev(struct atm_dev *dev); void vcc_insert_socket(struct sock *sk); @@ -435,11 +435,12 @@ { atomic_dec(&dev->refcnt); - if ((atomic_read(&dev->refcnt) == 1) && + if ((atomic_read(&dev->refcnt) == 0) && test_bit(ATM_DF_CLOSE,&dev->flags)) shutdown_atm_dev(dev); } +#define __atm_dev_put(dev) atomic_dec(&(dev)->refcnt) int atm_charge(struct atm_vcc *vcc,int truesize); struct sk_buff *atm_alloc_charge(struct atm_vcc *vcc,int pdu_size, ===== net/atm/common.c 1.54 vs edited ===== --- 1.54/net/atm/common.c Tue Sep 23 13:38:28 2003 +++ edited/net/atm/common.c Mon Sep 29 22:24:27 2003 @@ -426,7 +426,7 @@ 
vcc->qos.rxtp.traffic_class == ATM_ANYCLASS) return -EINVAL; if (itf != ATM_ITF_ANY) { - dev = atm_dev_lookup(itf); + dev = atm_dev_get_by_index(itf); if (!dev) return -ENODEV; error = __vcc_connect(vcc, dev, vpi, vci); @@ -435,21 +435,19 @@ return error; } } else { - struct list_head *p, *next; + struct list_head *p; dev = NULL; - spin_lock(&atm_dev_lock); - list_for_each_safe(p, next, &atm_devs) { + rtnl_lock(); + list_for_each(p, &atm_devs) { dev = list_entry(p, struct atm_dev, dev_list); atm_dev_hold(dev); - spin_unlock(&atm_dev_lock); if (!__vcc_connect(vcc, dev, vpi, vci)) break; - atm_dev_put(dev); + __atm_dev_put(dev); dev = NULL; - spin_lock(&atm_dev_lock); } - spin_unlock(&atm_dev_lock); + rtnl_unlock(); if (!dev) return -ENODEV; } ===== net/atm/resources.c 1.21 vs edited ===== --- 1.21/net/atm/resources.c Thu Sep 11 06:41:52 2003 +++ edited/net/atm/resources.c Tue Sep 30 07:10:43 2003 @@ -24,7 +24,7 @@ LIST_HEAD(atm_devs); -spinlock_t atm_dev_lock = SPIN_LOCK_UNLOCKED; +static rwlock_t atm_dev_lock = RW_LOCK_UNLOCKED; static struct atm_dev *__alloc_atm_dev(const char *type) { @@ -47,7 +47,7 @@ kfree(dev); } -static struct atm_dev *__atm_dev_lookup(int number) +static struct atm_dev *__atm_dev_get_by_index(int number) { struct atm_dev *dev; struct list_head *p; @@ -55,27 +55,65 @@ list_for_each(p, &atm_devs) { dev = list_entry(p, struct atm_dev, dev_list); if ((dev->ops) && (dev->number == number)) { - atm_dev_hold(dev); return dev; } } return NULL; } -struct atm_dev *atm_dev_lookup(int number) +struct atm_dev *atm_dev_get_by_index(int number) { struct atm_dev *dev; - spin_lock(&atm_dev_lock); - dev = __atm_dev_lookup(number); - spin_unlock(&atm_dev_lock); + read_lock(&atm_dev_lock); + dev = __atm_dev_get_by_index(number); + if (dev) + atm_dev_hold(dev); + read_unlock(&atm_dev_lock); return dev; } +static int register_atmdevice(struct atm_dev *dev) +{ + write_lock_irq(&atm_dev_lock); + list_add_tail(&dev->dev_list, &atm_devs); + atm_dev_hold(dev); + 
write_unlock_irq(&atm_dev_lock); + + if (atm_proc_dev_register(dev) < 0) { + printk(KERN_ERR "atm_dev_register: " + "atm_proc_dev_register failed for dev %s\n", + dev->type); + write_lock_irq(&atm_dev_lock); + list_del(&dev->dev_list); + write_unlock_irq(&atm_dev_lock); + return -EIO; + } + + return 0; +} + +static int atm_dev_alloc_index(struct atm_dev *dev, int number) +{ + if (number != -1) { + if ((__atm_dev_get_by_index(number))) + return -EBUSY; + } else { + number = 0; + while ((__atm_dev_get_by_index(number))) { + number++; + } + } + dev->number = number; + + return 0; +} + struct atm_dev *atm_dev_register(const char *type, const struct atmdev_ops *ops, int number, unsigned long *flags) { - struct atm_dev *dev, *inuse; + struct atm_dev *dev; + int err; dev = __alloc_atm_dev(type); if (!dev) { @@ -83,60 +121,51 @@ type); return NULL; } - spin_lock(&atm_dev_lock); - if (number != -1) { - if ((inuse = __atm_dev_lookup(number))) { - atm_dev_put(inuse); - spin_unlock(&atm_dev_lock); - __free_atm_dev(dev); - return NULL; - } - dev->number = number; - } else { - dev->number = 0; - while ((inuse = __atm_dev_lookup(dev->number))) { - atm_dev_put(inuse); - dev->number++; - } - } dev->ops = ops; if (flags) dev->flags = *flags; - else - memset(&dev->flags, 0, sizeof(dev->flags)); - memset(&dev->stats, 0, sizeof(dev->stats)); - atomic_set(&dev->refcnt, 1); - list_add_tail(&dev->dev_list, &atm_devs); - spin_unlock(&atm_dev_lock); - if (atm_proc_dev_register(dev) < 0) { - printk(KERN_ERR "atm_dev_register: " - "atm_proc_dev_register failed for dev %s\n", - type); - spin_lock(&atm_dev_lock); - list_del(&dev->dev_list); - spin_unlock(&atm_dev_lock); + rtnl_lock(); + + err = atm_dev_alloc_index(dev, number); + if (err < 0) + goto out; + + err = register_atmdevice(dev); + +out: + rtnl_unlock(); + + if (err < 0) { __free_atm_dev(dev); - return NULL; + dev = NULL; } - + return dev; } +static void unregister_atmdevice(struct atm_dev *dev) +{ + atm_proc_dev_deregister(dev); + + 
write_lock_irq(&atm_dev_lock); + list_del(&dev->dev_list); + write_unlock_irq(&atm_dev_lock); + + atm_dev_put(dev); +} void atm_dev_deregister(struct atm_dev *dev) { unsigned long warning_time; - atm_proc_dev_deregister(dev); - - spin_lock(&atm_dev_lock); - list_del(&dev->dev_list); - spin_unlock(&atm_dev_lock); + rtnl_lock(); + unregister_atmdevice(dev); + rtnl_unlock(); warning_time = jiffies; - while (atomic_read(&dev->refcnt) != 1) { + while (atomic_read(&dev->refcnt) != 0) { current->state = TASK_INTERRUPTIBLE; schedule_timeout(HZ / 4); current->state = TASK_RUNNING; @@ -153,7 +182,7 @@ void shutdown_atm_dev(struct atm_dev *dev) { - if (atomic_read(&dev->refcnt) > 1) { + if (atomic_read(&dev->refcnt) > 0) { set_bit(ATM_DF_CLOSE, &dev->flags); return; } @@ -217,23 +246,23 @@ return -EFAULT; if (get_user(len, &((struct atm_iobuf *) arg)->length)) return -EFAULT; - spin_lock(&atm_dev_lock); + read_lock(&atm_dev_lock); list_for_each(p, &atm_devs) size += sizeof(int); if (size > len) { - spin_unlock(&atm_dev_lock); + read_unlock(&atm_dev_lock); return -E2BIG; } tmp_buf = tmp_bufp = kmalloc(size, GFP_ATOMIC); if (!tmp_buf) { - spin_unlock(&atm_dev_lock); + read_unlock(&atm_dev_lock); return -ENOMEM; } list_for_each(p, &atm_devs) { dev = list_entry(p, struct atm_dev, dev_list); *tmp_bufp++ = dev->number; } - spin_unlock(&atm_dev_lock); + read_unlock(&atm_dev_lock); error = (copy_to_user(buf, tmp_buf, size) || put_user(size, &((struct atm_iobuf *) arg)->length)) ? -EFAULT : 0; @@ -248,7 +277,7 @@ if (get_user(number, &((struct atmif_sioc *) arg)->number)) return -EFAULT; - if (!(dev = atm_dev_lookup(number))) + if (!(dev = atm_dev_get_by_index(number))) return -ENODEV; switch (cmd) { @@ -411,13 +440,13 @@ void *atm_dev_seq_start(struct seq_file *seq, loff_t *pos) { - spin_lock(&atm_dev_lock); + read_lock(&atm_dev_lock); return *pos ? 
dev_get_idx(*pos) : (void *) 1; } void atm_dev_seq_stop(struct seq_file *seq, void *v) { - spin_unlock(&atm_dev_lock); + read_unlock(&atm_dev_lock); } void *atm_dev_seq_next(struct seq_file *seq, void *v, loff_t *pos) @@ -430,5 +459,5 @@ EXPORT_SYMBOL(atm_dev_register); EXPORT_SYMBOL(atm_dev_deregister); -EXPORT_SYMBOL(atm_dev_lookup); +EXPORT_SYMBOL(atm_dev_get_by_index); EXPORT_SYMBOL(shutdown_atm_dev); ===== net/atm/resources.h 1.13 vs edited ===== --- 1.13/net/atm/resources.h Mon Sep 8 13:27:12 2003 +++ edited/net/atm/resources.h Mon Sep 29 22:03:48 2003 @@ -11,7 +11,6 @@ extern struct list_head atm_devs; -extern spinlock_t atm_dev_lock; int atm_dev_ioctl(unsigned int cmd, unsigned long arg); From davem@pizda.ninka.net Wed Oct 1 05:46:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 05:47:01 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91CkRFx031412 for ; Wed, 1 Oct 2003 05:46:27 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id FAA07095; Wed, 1 Oct 2003 05:42:26 -0700 Date: Wed, 1 Oct 2003 05:42:26 -0700 From: "David S. 
    Miller"
To: chas3@users.sourceforge.net
Cc: chas@cmf.nrl.navy.mil, netdev@oss.sgi.com
Subject: Re: [RFC] add rtnl semaphore to linux-atm
Message-Id: <20031001054226.126cea7b.davem@redhat.com>
In-Reply-To: <200310011134.h91BYPkT003172@ginger.cmf.nrl.navy.mil>
References: <200310011134.h91BYPkT003172@ginger.cmf.nrl.navy.mil>
X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 431
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 596
Lines: 15

On Wed, 01 Oct 2003 07:34:25 -0400
chas williams wrote:

> i am thinking about doing the following to fix the race
> during ATM_ITF_ANY operation. rtnl is held across
> registration/unregistration. this means that you can get
> read-only access to the device list by holding rtnl
> or a read_lock on atm_dev_lock similar to the scheme
> used by netdevice (or so i think).

This looks like it would work.

Although, unless VCC connect can potentially sleep, it might
be better to keep exporting the rwlock and take it as a reader
instead of grabbing the rtnl semaphore.

From chas@cmf.nrl.navy.mil Wed Oct 1 06:07:50 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 06:08:26 -0700 (PDT)
Received: from ginger.cmf.nrl.navy.mil (ginger.cmf.nrl.navy.mil [134.207.10.161])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91D7nFx031941
    for ; Wed, 1 Oct 2003 06:07:50 -0700
Received: from cmf.nrl.navy.mil (thirdoffive.cmf.nrl.navy.mil [134.207.10.180])
    by ginger.cmf.nrl.navy.mil (8.12.7/8.12.7) with ESMTP id h91D7jkT004153;
    Wed, 1 Oct 2003 09:07:45 -0400 (EDT)
Message-Id: <200310011307.h91D7jkT004153@ginger.cmf.nrl.navy.mil>
To: "David S. Miller"
cc: netdev@oss.sgi.com
Subject: Re: [RFC] add rtnl semaphore to linux-atm
In-Reply-To: Message from "David S. Miller"
    of "Wed, 01 Oct 2003 05:42:26 PDT." <20031001054226.126cea7b.davem@redhat.com>
Date: Wed, 01 Oct 2003 09:07:45 -0400
From: chas williams
X-Spam-Score: () hits=-0.9
X-Virus-Scanned: NAI Completed
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-archive-position: 432
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: chas@cmf.nrl.navy.mil
Precedence: bulk
X-list: netdev
Content-Length: 409
Lines: 8

In message <20031001054226.126cea7b.davem@redhat.com>,"David S. Miller" writes:
>Although, unless VCC connect can potentially sleep, it might
>be better to keep exporting the rwlock and take it as a reader
>instead of grabbing the rtnl semaphore.

i had initially written it that way but remembered at one point i
was going to use the rtnl semaphore to handle this problem. any
opinions on what is 'better'?

From davem@pizda.ninka.net Wed Oct 1 06:18:32 2003
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 06:19:05 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
    by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91DIVFx032309
    for ; Wed, 1 Oct 2003 06:18:31 -0700
Received: (from davem@localhost)
    by pizda.ninka.net (8.9.3/8.9.3) id GAA07219;
    Wed, 1 Oct 2003 06:14:26 -0700
Date: Wed, 1 Oct 2003 06:14:26 -0700
From: "David S. Miller"
To: chas williams
Cc: netdev@oss.sgi.com
Subject: Re: [RFC] add rtnl semaphore to linux-atm
Message-Id: <20031001061426.0b67a235.davem@redhat.com>
In-Reply-To: <200310011307.h91D7jkT004153@ginger.cmf.nrl.navy.mil>
References: <20031001054226.126cea7b.davem@redhat.com>
    <200310011307.h91D7jkT004153@ginger.cmf.nrl.navy.mil>
X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-archive-position: 433
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev
Content-Length: 779
Lines: 18

On Wed, 01 Oct 2003 09:07:45 -0400
chas williams wrote:

> i had initially written it that way but remembered at one point i
> was going to use the rtnl semaphore to handle this problem. any
> opinions on what is 'better'?

Blocking all network configuration operations (even ones not
for your subsystem) is a little bit anti-social in SMP cases.

If you take the rwlock as a reader, you only interfere with a
very minute class of network configuration code paths (those
that need to take the rwlock in question as a writer).

For example, if you use the rwlock-as-reader approach, someone
doing IPV4 routing table updates (ie. routing daemon changing
a couple thousand routes after a BGP flap) won't be perturbed
while the ATM operation is in progress.
From vinay.nallamothu@gsecone.com Wed Oct 1 07:07:43 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 07:08:20 -0700 (PDT) Received: from gateway.gsecone.com ([61.95.227.64]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91E7eFx004290 for ; Wed, 1 Oct 2003 07:07:41 -0700 Received: from vinay.gsecone.com (vinay.gsecone.com [192.168.1.15]) by gateway.gsecone.com (8.12.8/8.12.8) with ESMTP id h91EAJBU010227; Wed, 1 Oct 2003 19:40:19 +0530 Subject: [PATCH 2.6.0-test6][ROSE] timer cleanups (and couple of fixes) From: Vinay K Nallamothu To: netdev@oss.sgi.com Cc: LKML Content-Type: text/plain Organization: Global Security One Message-Id: <1065017300.7194.318.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.4 Date: Wed, 01 Oct 2003 19:38:20 +0530 Content-Transfer-Encoding: 7bit X-archive-position: 434 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay.nallamothu@gsecone.com Precedence: bulk X-list: netdev Content-Length: 9219 Lines: 318 1. Use mod_timer 2. Use del_timer_sync in rose_loopback_clear 3. Use static timer initializer 4. set skb->destructor = NULL wherever skb->sk = NULL before kfree_skb(skb) I am not clear why skb->sk is set to NULL in the exit path in rose_loopback_clear. Isn't it sufficient to purge the entire queue? Let me know if this is the right fix. 
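As a rough illustration of item 1, here is a toy user-space model of the two idioms. It is not the kernel timer code: `demo_timer` and the `demo_*` helpers are hypothetical and track only the "pending" flag and the expiry value. It assumes, as the patch does, that the callback and its argument are set once at initialisation, so restarting a timer only has to update the expiry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer: just enough state to compare the two idioms. */
struct demo_timer {
    bool pending;
    unsigned long expires;
};

/* Deactivate the timer; returns 1 if it was pending, 0 otherwise. */
int demo_del_timer(struct demo_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Activate the timer; it must not already be pending. */
void demo_add_timer(struct demo_timer *t)
{
    assert(!t->pending);
    t->pending = true;
}

/* Old idiom: delete, set the expiry, add again. */
void demo_restart_old(struct demo_timer *t, unsigned long expires)
{
    demo_del_timer(t);
    t->expires = expires;
    demo_add_timer(t);
}

/* New idiom: one call that re-arms the timer whether or not it was pending. */
int demo_mod_timer(struct demo_timer *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = true;
    return was_pending;
}
```

Both paths leave the timer pending with the new expiry; the single-call form simply avoids the window in which the timer is deleted but not yet re-added.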
af_rose.c | 10 +++------ rose_link.c | 21 +++++++++--------- rose_loopback.c | 30 +++++++-------------------- rose_route.c | 7 ++---- rose_timer.c | 62 ++++++++++++++++++-------------------------------------- 5 files changed, 46 insertions(+), 84 deletions(-) diff -urN -X dontdiff linux-2.6.0-test6/net/rose/af_rose.c linux-2.6.0-test6-nvk/net/rose/af_rose.c --- linux-2.6.0-test6/net/rose/af_rose.c 2003-10-01 14:03:23.000000000 +0530 +++ linux-2.6.0-test6-nvk/net/rose/af_rose.c 2003-10-01 18:49:28.000000000 +0530 @@ -64,6 +64,7 @@ ax25_address rose_callsign; +void rose_init_timers(struct sock *sk); /* * Convert a ROSE address into text. */ @@ -353,10 +354,8 @@ if (atomic_read(&sk->sk_wmem_alloc) || atomic_read(&sk->sk_rmem_alloc)) { /* Defer: outstanding buffers */ - init_timer(&sk->sk_timer); sk->sk_timer.expires = jiffies + 10 * HZ; sk->sk_timer.function = rose_destroy_timer; - sk->sk_timer.data = (unsigned long)sk; add_timer(&sk->sk_timer); } else sk_free(sk); @@ -529,8 +528,7 @@ sock->ops = &rose_proto_ops; sk->sk_protocol = protocol; - init_timer(&rose->timer); - init_timer(&rose->idletimer); + rose_init_timers(sk); rose->t1 = sysctl_rose_call_request_timeout; rose->t2 = sysctl_rose_reset_request_timeout; @@ -576,8 +574,7 @@ sk->sk_sleep = osk->sk_sleep; sk->sk_zapped = osk->sk_zapped; - init_timer(&rose->timer); - init_timer(&rose->idletimer); + rose_init_timers(sk); orose = rose_sk(osk); rose->t1 = orose->t1; @@ -883,6 +880,7 @@ /* Now attach up the new socket */ skb->sk = NULL; + skb->destructor = NULL; kfree_skb(skb); sk->sk_ack_backlog--; newsock->sk = newsk; diff -urN -X dontdiff linux-2.6.0-test6/net/rose/rose_link.c linux-2.6.0-test6-nvk/net/rose/rose_link.c --- linux-2.6.0-test6/net/rose/rose_link.c 2003-09-09 11:12:05.000000000 +0530 +++ linux-2.6.0-test6-nvk/net/rose/rose_link.c 2003-10-01 17:41:54.000000000 +0530 @@ -31,26 +31,25 @@ static void rose_ftimer_expiry(unsigned long); static void rose_t0timer_expiry(unsigned long); -void 
rose_start_ftimer(struct rose_neigh *neigh) +void rose_neigh_init_timers(struct rose_neigh *neigh) { - del_timer(&neigh->ftimer); + init_timer(&neigh->t0timer); + neigh->t0timer.data = (unsigned long)neigh; + neigh->t0timer.function = &rose_t0timer_expiry; + init_timer(&neigh->ftimer); neigh->ftimer.data = (unsigned long)neigh; neigh->ftimer.function = &rose_ftimer_expiry; - neigh->ftimer.expires = jiffies + sysctl_rose_link_fail_timeout; +} - add_timer(&neigh->ftimer); +void rose_start_ftimer(struct rose_neigh *neigh) +{ + mod_timer(&neigh->ftimer, jiffies + sysctl_rose_link_fail_timeout); } void rose_start_t0timer(struct rose_neigh *neigh) { - del_timer(&neigh->t0timer); - - neigh->t0timer.data = (unsigned long)neigh; - neigh->t0timer.function = &rose_t0timer_expiry; - neigh->t0timer.expires = jiffies + sysctl_rose_restart_request_timeout; - - add_timer(&neigh->t0timer); + mod_timer(&neigh->t0timer, jiffies + sysctl_rose_restart_request_timeout); } void rose_stop_ftimer(struct rose_neigh *neigh) diff -urN -X dontdiff linux-2.6.0-test6/net/rose/rose_loopback.c linux-2.6.0-test6-nvk/net/rose/rose_loopback.c --- linux-2.6.0-test6/net/rose/rose_loopback.c 2003-09-09 11:12:05.000000000 +0530 +++ linux-2.6.0-test6-nvk/net/rose/rose_loopback.c 2003-10-01 19:29:42.000000000 +0530 @@ -14,19 +14,17 @@ #include #include -static struct sk_buff_head loopback_queue; -static struct timer_list loopback_timer; +static void rose_loopback_timer(unsigned long); -static void rose_set_loopback_timer(void); +static struct sk_buff_head loopback_queue; +static struct timer_list loopback_timer = TIMER_INITIALIZER(rose_loopback_timer, 0, 0); -void rose_loopback_init(void) +void __init rose_loopback_init(void) { skb_queue_head_init(&loopback_queue); - - init_timer(&loopback_timer); } -static int rose_loopback_running(void) +static inline int rose_loopback_running(void) { return timer_pending(&loopback_timer); } @@ -43,25 +41,12 @@ skb_queue_tail(&loopback_queue, skbn); if 
(!rose_loopback_running()) - rose_set_loopback_timer(); + mod_timer(&loopback_timer, jiffies + 10); } return 1; } -static void rose_loopback_timer(unsigned long); - -static void rose_set_loopback_timer(void) -{ - del_timer(&loopback_timer); - - loopback_timer.data = 0; - loopback_timer.function = &rose_loopback_timer; - loopback_timer.expires = jiffies + 10; - - add_timer(&loopback_timer); -} - static void rose_loopback_timer(unsigned long param) { struct sk_buff *skb; @@ -102,10 +87,11 @@ { struct sk_buff *skb; - del_timer(&loopback_timer); + del_timer_sync(&loopback_timer); while ((skb = skb_dequeue(&loopback_queue)) != NULL) { skb->sk = NULL; + skb->destructor = NULL; kfree_skb(skb); } } diff -urN -X dontdiff linux-2.6.0-test6/net/rose/rose_route.c linux-2.6.0-test6-nvk/net/rose/rose_route.c --- linux-2.6.0-test6/net/rose/rose_route.c 2003-10-01 14:03:23.000000000 +0530 +++ linux-2.6.0-test6-nvk/net/rose/rose_route.c 2003-10-01 17:41:54.000000000 +0530 @@ -49,6 +49,7 @@ struct rose_neigh *rose_loopback_neigh; static void rose_remove_neigh(struct rose_neigh *); +void rose_neigh_init_timers(struct rose_neigh *); /* * Add a new route to a node, and in the process add the node and the @@ -106,8 +107,7 @@ skb_queue_head_init(&rose_neigh->queue); - init_timer(&rose_neigh->ftimer); - init_timer(&rose_neigh->t0timer); + rose_neigh_init_timers(rose_neigh); if (rose_route->ndigis != 0) { if ((rose_neigh->digipeat = kmalloc(sizeof(ax25_digi), GFP_KERNEL)) == NULL) { @@ -389,8 +389,7 @@ skb_queue_head_init(&rose_loopback_neigh->queue); - init_timer(&rose_loopback_neigh->ftimer); - init_timer(&rose_loopback_neigh->t0timer); + rose_neigh_init_timers(rose_loopback_neigh); spin_lock_bh(&rose_neigh_list_lock); rose_loopback_neigh->next = rose_neigh_list; diff -urN -X dontdiff linux-2.6.0-test6/net/rose/rose_timer.c linux-2.6.0-test6-nvk/net/rose/rose_timer.c --- linux-2.6.0-test6/net/rose/rose_timer.c 2003-09-09 11:12:05.000000000 +0530 +++ 
linux-2.6.0-test6-nvk/net/rose/rose_timer.c 2003-10-01 17:41:54.000000000 +0530 @@ -33,82 +33,62 @@ static void rose_timer_expiry(unsigned long); static void rose_idletimer_expiry(unsigned long); -void rose_start_heartbeat(struct sock *sk) +void rose_init_timers(struct sock *sk) { - del_timer(&sk->sk_timer); + rose_cb *rose = rose_sk(sk); + + init_timer(&rose->timer); + rose->timer.data = (unsigned long)sk; + rose->timer.function = &rose_timer_expiry; + init_timer(&rose->idletimer); + rose->idletimer.data = (unsigned long)sk; + rose->idletimer.function = &rose_idletimer_expiry; + + /* initialized by sock_init_data */ sk->sk_timer.data = (unsigned long)sk; sk->sk_timer.function = &rose_heartbeat_expiry; - sk->sk_timer.expires = jiffies + 5 * HZ; +} - add_timer(&sk->sk_timer); +void rose_start_heartbeat(struct sock *sk) +{ + mod_timer(&sk->sk_timer, jiffies + 5 * HZ); } void rose_start_t1timer(struct sock *sk) { rose_cb *rose = rose_sk(sk); - del_timer(&rose->timer); - - rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; - rose->timer.expires = jiffies + rose->t1; - - add_timer(&rose->timer); + mod_timer(&rose->timer, jiffies + rose->t1); } void rose_start_t2timer(struct sock *sk) { rose_cb *rose = rose_sk(sk); - del_timer(&rose->timer); - - rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; - rose->timer.expires = jiffies + rose->t2; - - add_timer(&rose->timer); + mod_timer(&rose->timer, jiffies + rose->t2); } void rose_start_t3timer(struct sock *sk) { rose_cb *rose = rose_sk(sk); - del_timer(&rose->timer); - - rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; - rose->timer.expires = jiffies + rose->t3; - - add_timer(&rose->timer); + mod_timer(&rose->timer, jiffies + rose->t3); } void rose_start_hbtimer(struct sock *sk) { rose_cb *rose = rose_sk(sk); - del_timer(&rose->timer); - - rose->timer.data = (unsigned long)sk; - rose->timer.function = &rose_timer_expiry; - 
rose->timer.expires = jiffies + rose->hb; - - add_timer(&rose->timer); + mod_timer(&rose->timer, jiffies + rose->hb); } void rose_start_idletimer(struct sock *sk) { rose_cb *rose = rose_sk(sk); - del_timer(&rose->idletimer); - - if (rose->idle > 0) { - rose->idletimer.data = (unsigned long)sk; - rose->idletimer.function = &rose_idletimer_expiry; - rose->idletimer.expires = jiffies + rose->idle; - - add_timer(&rose->idletimer); - } + if (rose->idle > 0) + mod_timer(&rose->idletimer, jiffies + rose->idle); } void rose_stop_heartbeat(struct sock *sk) From vinay.nallamothu@gsecone.com Wed Oct 1 07:26:09 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 07:26:43 -0700 (PDT) Received: from gateway.gsecone.com ([61.95.227.64]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91EPoFx005789 for ; Wed, 1 Oct 2003 07:26:02 -0700 Received: from vinay.gsecone.com (vinay.gsecone.com [192.168.1.15]) by gateway.gsecone.com (8.12.8/8.12.8) with ESMTP id h91ESQBU010427; Wed, 1 Oct 2003 19:58:26 +0530 Subject: [PATCH 2.6.0-test6][X25] timer cleanup From: Vinay K Nallamothu To: netdev@oss.sgi.com Cc: LKML Content-Type: text/plain Organization: Global Security One Message-Id: <1065018387.7194.336.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.4 Date: Wed, 01 Oct 2003 19:56:27 +0530 Content-Transfer-Encoding: 7bit X-archive-position: 435 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay.nallamothu@gsecone.com Precedence: bulk X-list: netdev Content-Length: 4995 Lines: 182 Replace del_timer, add_timer sequences with mod_timer.
af_x25.c | 8 ++++---- x25_link.c | 16 ++++++---------- x25_timer.c | 47 +++++++++++++++-------------------------------- 3 files changed, 25 insertions(+), 46 deletions(-) diff -urN linux-2.6.0-test5/net/x25/af_x25.c linux-2.6.0-test5-nvk/net/x25/af_x25.c --- linux-2.6.0-test5/net/x25/af_x25.c 2003-09-09 11:12:07.000000000 +0530 +++ linux-2.6.0-test5-nvk/net/x25/af_x25.c 2003-09-22 17:20:20.000000000 +0530 @@ -345,10 +345,8 @@ if (atomic_read(&sk->sk_wmem_alloc) || atomic_read(&sk->sk_rmem_alloc)) { /* Defer: outstanding buffers */ - init_timer(&sk->sk_timer); sk->sk_timer.expires = jiffies + 10 * HZ; sk->sk_timer.function = x25_destroy_timer; - sk->sk_timer.data = (unsigned long)sk; add_timer(&sk->sk_timer); } else { /* drop last reference so sock_put will free */ @@ -463,6 +461,8 @@ goto out; } +void x25_init_timers(struct sock *sk); + static int x25_create(struct socket *sock, int protocol) { struct sock *sk; @@ -481,7 +481,7 @@ sock_init_data(sock, sk); sk_set_owner(sk, THIS_MODULE); - init_timer(&x25->timer); + x25_init_timers(sk); sock->ops = &x25_proto_ops; sk->sk_protocol = protocol; @@ -537,7 +537,7 @@ x25->facilities = ox25->facilities; x25->qbitincl = ox25->qbitincl; - init_timer(&x25->timer); + x25_init_timers(sk); out: return sk; } diff -urN linux-2.6.0-test5/net/x25/x25_link.c linux-2.6.0-test5-nvk/net/x25/x25_link.c --- linux-2.6.0-test5/net/x25/x25_link.c 2003-09-09 11:12:07.000000000 +0530 +++ linux-2.6.0-test5-nvk/net/x25/x25_link.c 2003-09-22 19:32:14.000000000 +0530 @@ -51,15 +51,9 @@ /* * Linux set/reset timer routines */ -static void x25_start_t20timer(struct x25_neigh *nb) +static inline void x25_start_t20timer(struct x25_neigh *nb) { - del_timer(&nb->t20timer); - - nb->t20timer.data = (unsigned long)nb; - nb->t20timer.function = &x25_t20timer_expiry; - nb->t20timer.expires = jiffies + nb->t20; - - add_timer(&nb->t20timer); + mod_timer(&nb->t20timer, jiffies + nb->t20); } static void x25_t20timer_expiry(unsigned long param) @@ -71,12 +65,12 @@ 
x25_start_t20timer(nb); } -static void x25_stop_t20timer(struct x25_neigh *nb) +static inline void x25_stop_t20timer(struct x25_neigh *nb) { del_timer(&nb->t20timer); } -static int x25_t20timer_pending(struct x25_neigh *nb) +static inline int x25_t20timer_pending(struct x25_neigh *nb) { return timer_pending(&nb->t20timer); } @@ -291,6 +285,8 @@ skb_queue_head_init(&nb->queue); init_timer(&nb->t20timer); + nb->t20timer.data = (unsigned long)nb; + nb->t20timer.function = &x25_t20timer_expiry; dev_hold(dev); nb->dev = dev; diff -urN linux-2.6.0-test5/net/x25/x25_timer.c linux-2.6.0-test5-nvk/net/x25/x25_timer.c --- linux-2.6.0-test5/net/x25/x25_timer.c 2003-09-09 11:12:07.000000000 +0530 +++ linux-2.6.0-test5-nvk/net/x25/x25_timer.c 2003-09-22 17:23:46.000000000 +0530 @@ -43,15 +43,22 @@ static void x25_heartbeat_expiry(unsigned long); static void x25_timer_expiry(unsigned long); -void x25_start_heartbeat(struct sock *sk) +void x25_init_timers(struct sock *sk) { - del_timer(&sk->sk_timer); + struct x25_opt *x25 = x25_sk(sk); + init_timer(&x25->timer); + x25->timer.data = (unsigned long)sk; + x25->timer.function = &x25_timer_expiry; + + /* initialized by sock_init_data */ sk->sk_timer.data = (unsigned long)sk; sk->sk_timer.function = &x25_heartbeat_expiry; - sk->sk_timer.expires = jiffies + 5 * HZ; +} - add_timer(&sk->sk_timer); +void x25_start_heartbeat(struct sock *sk) +{ + mod_timer(&sk->sk_timer, jiffies + 5 * HZ); } void x25_stop_heartbeat(struct sock *sk) @@ -63,52 +70,28 @@ { struct x25_opt *x25 = x25_sk(sk); - del_timer(&x25->timer); - - x25->timer.data = (unsigned long)sk; - x25->timer.function = &x25_timer_expiry; - x25->timer.expires = jiffies + x25->t2; - - add_timer(&x25->timer); + mod_timer(&x25->timer, jiffies + x25->t2); } void x25_start_t21timer(struct sock *sk) { struct x25_opt *x25 = x25_sk(sk); - del_timer(&x25->timer); - - x25->timer.data = (unsigned long)sk; - x25->timer.function = &x25_timer_expiry; - x25->timer.expires = jiffies + x25->t21; - - 
add_timer(&x25->timer); + mod_timer(&x25->timer, jiffies + x25->t21); } void x25_start_t22timer(struct sock *sk) { struct x25_opt *x25 = x25_sk(sk); - del_timer(&x25->timer); - - x25->timer.data = (unsigned long)sk; - x25->timer.function = &x25_timer_expiry; - x25->timer.expires = jiffies + x25->t22; - - add_timer(&x25->timer); + mod_timer(&x25->timer, jiffies + x25->t22); } void x25_start_t23timer(struct sock *sk) { struct x25_opt *x25 = x25_sk(sk); - del_timer(&x25->timer); - - x25->timer.data = (unsigned long)sk; - x25->timer.function = &x25_timer_expiry; - x25->timer.expires = jiffies + x25->t23; - - add_timer(&x25->timer); + mod_timer(&x25->timer, jiffies + x25->t23); } void x25_stop_timer(struct sock *sk) From rddunlap@osdl.org Wed Oct 1 07:49:01 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 07:49:37 -0700 (PDT) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91EmxFx007998 for ; Wed, 1 Oct 2003 07:49:00 -0700 Received: from dragon.pdx.osdl.net (dragon.pdx.osdl.net [172.20.1.27]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h91Emh121348; Wed, 1 Oct 2003 07:48:43 -0700 Date: Wed, 1 Oct 2003 07:40:36 -0700 From: "Randy.Dunlap" To: "Feldman, Scott" Cc: davem@redhat.com, jgarzik@pobox.com, akpm@osdl.org, netdev@oss.sgi.com, cramerj@intel.com Subject: Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119 Message-Id: <20031001074036.466ded68.rddunlap@osdl.org> In-Reply-To: References: Organization: OSDL X-Mailer: Sylpheed version 0.9.4 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: +5V?h'hZQPB9kW Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 436 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rddunlap@osdl.org Precedence: bulk X-list: netdev Content-Length: 1286 Lines: 36 On Wed, 1 Oct 2003 01:19:41 -0700 "Feldman, Scott" wrote: | Chris can jump in 
here anytime. :-) | | Synchronizing on the hardware side is stumping me. We have the list of | skbs you describe, but I'm concerned about unmapping the skb buffers if | hardware is right in the middle of some DMA on one of the buffers. | Some archs really don't like hardware accessing unmapped buffers. | | Here's what I'm thinking: when link down is detected in the timer, just | trick hardware into thinking link is still up (ILOS - Invert Loss of | Signal). No locking, no disabling of interrupts. Hardware will do the | natural thing by completing the outstanding sends and also provide the | interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq). | Something like: | | | if lost link | if outstanding Tx work | set ILOS // h/w thinks link is | up, DMA continues | mdelay(10) | clear ILOS // h/w thinks link is | down | | The mdelay(10) is terrible, but we've already got that in the current | tx_flush routine. | | Chris, what am I missing? I didn't included the ANE business for | clarity. What happens if the link comes back up (live) during the mdelay period? Tiny race? Just a delay until it's corrected? -- ~Randy From ctindel@calma.pair.com Wed Oct 1 11:26:11 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 11:26:51 -0700 (PDT) Received: from calma.pair.com (calma.pair.com [209.68.1.95]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91IQAFx002935 for ; Wed, 1 Oct 2003 11:26:11 -0700 Received: (qmail 25601 invoked by uid 3059); 1 Oct 2003 18:26:10 -0000 Date: Wed, 1 Oct 2003 14:26:10 -0400 From: "Chad N. Tindel" To: Shmulik Hen Cc: "David S. Miller" , "Chad N. Tindel" , fubar@us.ibm.com, jgarzik@pobox.com, bonding-devel@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues Message-ID: <20031001182610.GA25218@calma.pair.com> Mail-Followup-To: Shmulik Hen , "David S. Miller" , "Chad N. 
Tindel" , fubar@us.ibm.com, jgarzik@pobox.com, bonding-devel@lists.sourceforge.net, netdev@oss.sgi.com References: <200310011149.04612.shmulik.hen@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200310011149.04612.shmulik.hen@intel.com> User-Agent: Mutt/1.4i X-archive-position: 437 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: chad@tindel.net Precedence: bulk X-list: netdev Content-Length: 659 Lines: 15 > So here is what I did in the meantime: > * Created a version for 2.4 that puts back all old compatibility stuff > that was removed either during the propagation set or the cleanup > set. > * Created a version for 2.6 that puts back just the compatibility > stuff that was removed in the propagation set (BOND_SETHWADDR, since > we got a complaint from a RH9 user). > * Removed the mention of the multicast param from the read-me. > * Raised the ABI version to 2 so the new ifenslave keeps propagating > IP settings to slaves for older drivers, and doesn't do that for new > ones that contain Willy Tarreau's panic fix. I like this. 
Chad From bwindle@fint.org Wed Oct 1 11:41:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 11:41:47 -0700 (PDT) Received: from mta01-srv.alltel.net (mta01.alltel.net [166.102.165.143]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91IfDFx004256 for ; Wed, 1 Oct 2003 11:41:14 -0700 Received: from morpheus ([151.213.164.243]) by mta01-srv.alltel.net with ESMTP id <20031001184113.DJKQ25097.mta01-srv.alltel.net@morpheus> for ; Wed, 1 Oct 2003 13:41:13 -0500 Received: from bwindle (helo=localhost) by morpheus with local-esmtp (Exim 3.36 #1 (Debian)) id 1A4luP-0000eA-00 for ; Wed, 01 Oct 2003 14:41:09 -0400 Date: Wed, 1 Oct 2003 14:41:09 -0400 (EDT) From: Burton Windle X-X-Sender: bwindle@morpheus To: netdev@oss.sgi.com Subject: [RFC] Silencing needless printk in socket.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 438 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bwindle@fint.org Precedence: bulk X-list: netdev Content-Length: 592 Lines: 18 Would anyone object to a patch that wraps the printk in linux/net/socket.c:1897 in a 'if debug'? This first appeared around 2.6.0-test5; the output may be helpful to a developer, but I don't think it is needed in the normal dmesg. dual266:/home/kernel/linux/net# dmesg | grep NET NET: Registered protocol family 16 NET: Registered protocol family 2 NET: Registered protocol family 1 NET: Registered protocol family 17 -- Burton Windle burton@fint.org Linux: the "grim reaper of innocent orphaned children." 
from /usr/src/linux-2.4.18/init/main.c:461 From fubar@us.ibm.com Wed Oct 1 12:26:17 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 12:26:49 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91JQGFx010335 for ; Wed, 1 Oct 2003 12:26:16 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e33.co.us.ibm.com (8.12.10/8.12.2) with ESMTP id h91JPZjZ291204; Wed, 1 Oct 2003 15:25:35 -0400 Received: from death.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id h91JPWuX151268; Wed, 1 Oct 2003 13:25:34 -0600 Received: from us.ibm.com (fubar@localhost) by death.ibm.com (8.12.5/8.12.5/Submit) with ESMTP id h91JPJXZ001948; Wed, 1 Oct 2003 12:25:22 -0700 Message-Id: <200310011925.h91JPJXZ001948@death.ibm.com> X-Authentication-Warning: death.ibm.com: fubar owned process doing -bs To: Shmulik Hen , "David S. Miller" , "Chad N. Tindel" , jgarzik@pobox.com, bonding-devel@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues In-Reply-To: Message from "Chad N. Tindel" of "Wed, 01 Oct 2003 14:26:10 EDT." <20031001182610.GA25218@calma.pair.com> Date: Wed, 01 Oct 2003 12:25:19 -0700 From: Jay Vosburgh X-archive-position: 439 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fubar@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1249 Lines: 29 >> So here is what I did in the meantime: >> * Created a version for 2.4 that puts back all old compatibility stuff >> that was removed either during the propagation set or the cleanup >> set. >> * Created a version for 2.6 that puts back just the compatibility >> stuff that was removed in the propagation set (BOND_SETHWADDR, since >> we got a complaint from a RH9 user). 
>> * Removed the mention of the multicast param from the read-me. >> * Raised the ABI version to 2 so the new ifenslave keeps propagating >> IP settings to slaves for older drivers, and doesn't do that for new >> ones that contain Willy Tarreau's panic fix. > >I like this. Same here, but I'd like to have a list somewhere of what each of the ABI versions is for and how they're supposed to behave. It's starting to look like we're going to be adding these on a semi-regular basis, so we need to keep track of what each one does and why. I also don't have any major heartburn about leaving the OLD thingies in 2.4. I'd like to remove them, but part of my reason for wanting to nuke them was to keep 2.4 and 2.6 identical, but we're not doing that, so it's not as big a concern. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com From g.liakhovetski@gmx.de Wed Oct 1 12:57:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 12:58:29 -0700 (PDT) Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91JvuFx010882 for ; Wed, 1 Oct 2003 12:57:56 -0700 Received: (qmail 13076 invoked by uid 65534); 1 Oct 2003 19:57:48 -0000 Received: from Ba1de.pppool.de (EHLO poirot.grange) (213.7.161.222) by mail.gmx.net (mp007) with SMTP; 01 Oct 2003 21:57:48 +0200 X-Authenticated: #20450766 Received: from lyakh (helo=localhost) by poirot.grange with local-esmtp (Exim 3.35 #1 (Debian)) id 1A4mwp-0001M7-00; Wed, 01 Oct 2003 21:47:43 +0200 Date: Wed, 1 Oct 2003 21:47:43 +0200 (CEST) From: Guennadi Liakhovetski Reply-To: Guennadi Liakhovetski To: David Woodhouse cc: "David S. 
Miller" , , , , , , Subject: Re: RFC: [2.6 patch] disallow modular IPv6 In-Reply-To: <1064903505.6154.157.camel@imladris.demon.co.uk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 440 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: g.liakhovetski@gmx.de Precedence: bulk X-list: netdev Content-Length: 1697 Lines: 39 On Tue, 30 Sep 2003, David Woodhouse wrote: > On Mon, 2003-09-29 at 22:17 -0700, David S. Miller wrote: > > On Mon, 29 Sep 2003 10:02:55 +0100 > > David Woodhouse wrote: > > > > > The underlying point being that your static kernel should not change if > > > you change an option from 'n' to 'm'. It should only affect the kernel > > > image if you change options to/from 'y'. > > > > I totally disagree, what ipv6 is doing is perfectly fine. > > Your right. Well, maybe you are right, but I certainly liked the feature, that I could just add a module to a currently running kernel's configuration, compile and insmod it. But, if this is how it is (going to be) now - that one shouldn't rely on this, do you agree, that such attempts should be stopped by the build system? If so, I think, a script, trying to find possible problems could be of help? It wouldn't be trivial, but maybe there's already framework available, that can be taught to do this? Ideally, you would want to check: for each tristate CONFIG_ find from the respective Makefile(s) which source (*.[Sc])-files are involved with obj-$(CONFIG_x). Find depending source-files from "depends on x" in respective Kconfig recursively. If CONFIG_x appears in any other source-files - it is already a (likely) problem. Now headers. 
Well, if we want to check infinitely deep inclusions - it would require a fat cluster / SMP, I guess:-) So, is there a piece of software among all automatic checkers, that could be relatively easily taught to do this and would it make sense to run such a check on each -pre and -rc version? Actually, is anybody checking for recursive includes? Guennadi --- Guennadi Liakhovetski From akpm@osdl.org Wed Oct 1 15:57:02 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 15:57:35 -0700 (PDT) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91Muf25018824 for ; Wed, 1 Oct 2003 15:57:02 -0700 Received: from akpm-1.pao.digeo.com (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h91MuW104645; Wed, 1 Oct 2003 15:56:32 -0700 Date: Wed, 1 Oct 2003 15:56:23 -0700 From: Andrew Morton To: Vinay K Nallamothu Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2.6.0-test6][X25] timer cleanup Message-Id: <20031001155623.06b89258.akpm@osdl.org> In-Reply-To: <1065018387.7194.336.camel@lima.royalchallenge.com> References: <1065018387.7194.336.camel@lima.royalchallenge.com> X-Mailer: Sylpheed version 0.9.4 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 441 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: netdev Content-Length: 134 Lines: 5 Vinay K Nallamothu wrote: > > Replace del_timer, mod_timer sequences with mod_timer. was this tested? 
From mashirle@us.ibm.com Wed Oct 1 16:37:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 16:38:20 -0700 (PDT) Received: from linux2.suntekindustrial.com (wbar1.sjo1-4-4-004-065.sjo1.dsl-verizon.net [4.4.4.65]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h91Nbe25022416 for ; Wed, 1 Oct 2003 16:37:47 -0700 Received: from ibm-mxl (bi01p1.co.us.ibm.com [32.97.110.142]) (authenticated bits=0) by linux2.suntekindustrial.com (8.12.8/8.12.8) with ESMTP id h91NlDo7002302; Wed, 1 Oct 2003 16:47:14 -0700 Content-Type: text/plain; charset="us-ascii" From: Shirley Ma Organization: IBM Linux To: davem@redhat.com, kuznet@ms2.inr.ac.ru Subject: [PATCH] Implementation for IPv6 MIB:ipv6AddressTable Date: Wed, 1 Oct 2003 16:37:26 -0700 User-Agent: KMail/1.4.3 Cc: netdev@oss.sgi.com MIME-Version: 1.0 Message-Id: <200310011637.27013.mashirle@us.ibm.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h91Nbe25022416 X-archive-position: 442 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mashirle@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 10657 Lines: 355 I sent my explanation of the IPv6 MIBs implementation a couple of weeks back. This patch implements the IPv6 MIBs ipv6AddressTable. The implementation is based on the new last call draft of IPv6 MIBs, see link below. It's going to be an RFC. http://www.ietf.org/internet-drafts/draft-ietf-ipv6-rfc2011-update-04.txt The patch has been tested against Linux-2.6.0-test5-bk12. I have made sure this applies cleanly to Linux 2.6.0-test6-bk3. Below is the patch. Please give me your comments. 
Thanks Shirley Ma IBM Linux Technology Center ======================= diff -urN linux-2.6.0-test5/include/linux/rtnetlink.h linux-2.6.0-test5-ipv6mib4/include/linux/rtnetlink.h --- linux-2.6.0-test5/include/linux/rtnetlink.h 2003-09-25 17:17:02.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/include/linux/rtnetlink.h 2003-10-01 11:00:03.000000000 -0700 @@ -352,8 +352,10 @@ struct ifa_cacheinfo { - __s32 ifa_prefered; - __s32 ifa_valid; + __u32 ifa_prefered; + __u32 ifa_valid; + unsigned long cstamp; /* created time */ + unsigned long tstamp; /* updated time */ }; diff -urN linux-2.6.0-test5/include/linux/time.h linux-2.6.0-test5-ipv6mib4/include/linux/time.h --- linux-2.6.0-test5/include/linux/time.h 2003-09-08 12:50:08.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/include/linux/time.h 2003-10-01 11:40:42.000000000 -0700 @@ -55,6 +55,7 @@ * at _least_ "jiffies" - so "jiffies+1" had better still * be positive. */ +#define MAX_JIFFIES (~0) #define MAX_JIFFY_OFFSET ((~0UL >> 1)-1) /* Parameters used to convert the timespec values */ diff -urN linux-2.6.0-test5/include/net/if_inet6.h linux-2.6.0-test5-ipv6mib4/include/net/if_inet6.h --- linux-2.6.0-test5/include/net/if_inet6.h 2003-09-25 17:17:02.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/include/net/if_inet6.h 2003-10-01 11:23:27.000000000 -0700 @@ -34,7 +34,8 @@ __u32 valid_lft; __u32 prefered_lft; - unsigned long tstamp; + unsigned long cstamp; /* created timestamp */ + unsigned long tstamp; /* updated timestamp */ atomic_t refcnt; spinlock_t lock; @@ -111,6 +112,8 @@ atomic_t mca_refcnt; spinlock_t mca_lock; unsigned char mca_crcount; + unsigned long mca_cstamp; + unsigned long mca_tstamp; }; /* Anycast stuff */ @@ -130,6 +133,8 @@ int aca_users; atomic_t aca_refcnt; spinlock_t aca_lock; + unsigned long aca_cstamp; + unsigned long aca_tstamp; }; #define IFA_HOST IPV6_ADDR_LOOPBACK diff -urN linux-2.6.0-test5/net/ipv6/addrconf.c linux-2.6.0-test5-ipv6mib4/net/ipv6/addrconf.c --- 
linux-2.6.0-test5/net/ipv6/addrconf.c 2003-09-25 17:17:03.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/net/ipv6/addrconf.c 2003-10-01 11:28:41.000000000 -0700 @@ -92,6 +92,8 @@ #define ADBG(x) #endif +#define INFINITY_LIFE_TIME 0xFFFFFFFF + #ifdef CONFIG_SYSCTL static void addrconf_sysctl_register(struct inet6_dev *idev, struct ipv6_devconf *p); static void addrconf_sysctl_unregister(struct ipv6_devconf *p); @@ -505,6 +507,7 @@ ifa->scope = scope; ifa->prefix_len = pfxlen; ifa->flags = flags | IFA_F_TENTATIVE; + ifa->cstamp = ifa->tstamp = jiffies; read_lock(&addrconf_lock); if (idev->dead) { @@ -707,6 +710,7 @@ ift->ifpub = ifp; ift->valid_lft = tmp_valid_lft; ift->prefered_lft = tmp_prefered_lft; + ift->cstamp = ifp->cstamp; ift->tstamp = ifp->tstamp; spin_unlock_bh(&ift->lock); addrconf_dad_start(ift, 0); @@ -1412,6 +1416,7 @@ } update_lft = create = 1; + ifp->cstamp = jiffies; addrconf_dad_start(ifp, RTF_ADDRCONF|RTF_PREFIX_RT); } @@ -2447,14 +2452,103 @@ if (!(ifa->flags&IFA_F_PERMANENT)) { ci.ifa_prefered = ifa->prefered_lft; ci.ifa_valid = ifa->valid_lft; - if (ci.ifa_prefered != 0xFFFFFFFF) { + if (ci.ifa_prefered != INFINITY_LIFE_TIME) { long tval = (jiffies - ifa->tstamp)/HZ; ci.ifa_prefered -= tval; - if (ci.ifa_valid != 0xFFFFFFFF) + if (ci.ifa_valid != INFINITY_LIFE_TIME) ci.ifa_valid -= tval; } - RTA_PUT(skb, IFA_CACHEINFO, sizeof(ci), &ci); + } else { + ci.ifa_prefered = INFINITY_LIFE_TIME; + ci.ifa_valid = INFINITY_LIFE_TIME; } + if (ifa->cstamp < INITIAL_JIFFIES) + ci.cstamp = (ifa->cstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.cstamp = (ifa->cstamp - INITIAL_JIFFIES) / HZ; + if (ifa->tstamp < INITIAL_JIFFIES) + ci.tstamp = (ifa->tstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.tstamp = (ifa->tstamp - INITIAL_JIFFIES) / HZ; + RTA_PUT(skb, IFA_CACHEINFO, sizeof(ci), &ci); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: +rtattr_failure: + skb_trim(skb, b - skb->data); + return -1; +} + +static int 
inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca, + u32 pid, u32 seq, int event) +{ + struct ifaddrmsg *ifm; + struct nlmsghdr *nlh; + struct ifa_cacheinfo ci; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*ifm)); + if (pid) nlh->nlmsg_flags |= NLM_F_MULTI; + ifm = NLMSG_DATA(nlh); + ifm->ifa_family = AF_INET6; + ifm->ifa_prefixlen = 128; + ifm->ifa_flags = IFA_F_PERMANENT; + ifm->ifa_scope = RT_SCOPE_UNIVERSE; + if (ipv6_addr_scope(&ifmca->mca_addr)&IFA_SITE) + ifm->ifa_scope = RT_SCOPE_SITE; + ifm->ifa_index = ifmca->idev->dev->ifindex; + RTA_PUT(skb, IFA_ADDRESS, 16, &ifmca->mca_addr); + if (ifmca->mca_cstamp < INITIAL_JIFFIES) + ci.cstamp = (ifmca->mca_cstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.cstamp = (ifmca->mca_cstamp - INITIAL_JIFFIES) / HZ; + if (ifmca->mca_tstamp < INITIAL_JIFFIES) + ci.tstamp = (ifmca->mca_tstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.tstamp = (ifmca->mca_tstamp - INITIAL_JIFFIES) / HZ; + ci.ifa_prefered = INFINITY_LIFE_TIME; + ci.ifa_valid = INFINITY_LIFE_TIME; + RTA_PUT(skb, IFA_CACHEINFO, sizeof(ci), &ci); + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +nlmsg_failure: +rtattr_failure: + skb_trim(skb, b - skb->data); + return -1; +} + +static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca, + u32 pid, u32 seq, int event) +{ + struct ifaddrmsg *ifm; + struct nlmsghdr *nlh; + struct ifa_cacheinfo ci; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*ifm)); + if (pid) nlh->nlmsg_flags |= NLM_F_MULTI; + ifm = NLMSG_DATA(nlh); + ifm->ifa_family = AF_INET6; + ifm->ifa_prefixlen = 128; + ifm->ifa_flags = IFA_F_PERMANENT; + ifm->ifa_scope = RT_SCOPE_UNIVERSE; + if (ipv6_addr_scope(&ifaca->aca_addr)&IFA_SITE) + ifm->ifa_scope = RT_SCOPE_SITE; + ifm->ifa_index = ifaca->aca_idev->dev->ifindex; + RTA_PUT(skb, IFA_ADDRESS, 16, &ifaca->aca_addr); + if (ifaca->aca_cstamp < INITIAL_JIFFIES) + ci.cstamp = 
(ifaca->aca_cstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.cstamp = (ifaca->aca_cstamp - INITIAL_JIFFIES) / HZ; + if (ifaca->aca_tstamp < INITIAL_JIFFIES) + ci.tstamp = (ifaca->aca_tstamp + MAX_JIFFIES - INITIAL_JIFFIES) / HZ; + else + ci.tstamp = (ifaca->aca_tstamp - INITIAL_JIFFIES) / HZ; + ci.ifa_prefered = INFINITY_LIFE_TIME; + ci.ifa_valid = INFINITY_LIFE_TIME; + RTA_PUT(skb, IFA_CACHEINFO, sizeof(ci), &ci); nlh->nlmsg_len = skb->tail - b; return skb->len; @@ -2468,33 +2562,83 @@ { int idx, ip_idx; int s_idx, s_ip_idx; - struct inet6_ifaddr *ifa; - + struct net_device *dev; + struct inet6_dev *idev; + struct inet6_ifaddr *ifa; + struct ifmcaddr6 *ifmca; + struct ifacaddr6 *ifaca; + s_idx = cb->args[0]; s_ip_idx = ip_idx = cb->args[1]; - - for (idx=0; idx < IN6_ADDR_HSIZE; idx++) { + read_lock(&dev_base_lock); + + for (dev = dev_base, idx = 0; dev; dev = dev->next, idx++) { if (idx < s_idx) continue; if (idx > s_idx) s_ip_idx = 0; - read_lock_bh(&addrconf_hash_lock); - for (ifa=inet6_addr_lst[idx], ip_idx = 0; ifa; - ifa = ifa->lst_next, ip_idx++) { + if ((idev = in6_dev_get(dev)) == NULL) + continue; + read_lock_bh(&idev->lock); + /* unicast address */ + for (ifa = idev->addr_list, ip_idx = 0; ifa; + ifa = ifa->if_next, ip_idx++) { + if (ip_idx < s_ip_idx) + continue; + if (inet6_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, RTM_NEWADDR) <= 0) { + read_unlock_bh(&idev->lock); + in6_dev_put(idev); + goto done; + } + } + /* temp addr */ +#ifdef CONFIG_IPV6_PRIVACY + for (ifa = idev->tempaddr_list; ifa; + ifa = ifa->tmp_next, ip_idx++) { if (ip_idx < s_ip_idx) continue; if (inet6_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid, - cb->nlh->nlmsg_seq, RTM_NEWADDR) <= 0) { - read_unlock_bh(&addrconf_hash_lock); + cb->nlh->nlmsg_seq, RTM_NEWADDR) <= 0) { + read_unlock_bh(&idev->lock); + in6_dev_put(idev); + goto done; + } + } +#endif + /* multicast address */ + for (ifmca = idev->mc_list; ifmca; + ifmca = ifmca->next, ip_idx++) { + if
(ip_idx < s_ip_idx) + continue; + if (inet6_fill_ifmcaddr(skb, ifmca, + NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, RTM_NEWADDR) <= 0) { + read_unlock_bh(&idev->lock); + in6_dev_put(idev); + goto done; + } + } + /* anycast address */ + for (ifaca = idev->ac_list; ifaca; + ifaca = ifaca->aca_next, ip_idx++) { + if (ip_idx < s_ip_idx) + continue; + if (inet6_fill_ifacaddr(skb, ifaca, + NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, RTM_NEWADDR) <= 0) { + read_unlock_bh(&idev->lock); + in6_dev_put(idev); goto done; } } - read_unlock_bh(&addrconf_hash_lock); + read_unlock_bh(&idev->lock); + in6_dev_put(idev); } done: + read_unlock(&dev_base_lock); cb->args[0] = idx; cb->args[1] = ip_idx; - return skb->len; } diff -urN linux-2.6.0-test5/net/ipv6/anycast.c linux-2.6.0-test5-ipv6mib4/net/ipv6/anycast.c --- linux-2.6.0-test5/net/ipv6/anycast.c 2003-09-25 17:17:03.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/net/ipv6/anycast.c 2003-10-01 11:23:04.000000000 -0700 @@ -343,6 +343,8 @@ ipv6_addr_copy(&aca->aca_addr, addr); aca->aca_idev = idev; aca->aca_users = 1; + /* aca_tstamp should be refreshed later, whenever the address is updated */ + aca->aca_cstamp = aca->aca_tstamp = jiffies; atomic_set(&aca->aca_refcnt, 2); aca->aca_lock = SPIN_LOCK_UNLOCKED; diff -urN linux-2.6.0-test5/net/ipv6/mcast.c linux-2.6.0-test5-ipv6mib4/net/ipv6/mcast.c --- linux-2.6.0-test5/net/ipv6/mcast.c 2003-09-25 17:17:03.000000000 -0700 +++ linux-2.6.0-test5-ipv6mib4/net/ipv6/mcast.c 2003-09-30 18:40:29.000000000 -0700 @@ -830,6 +830,8 @@ ipv6_addr_copy(&mc->mca_addr, addr); mc->idev = idev; mc->mca_users = 1; + /* mca_tstamp should be refreshed later, whenever the address is updated */ + mc->mca_cstamp = mc->mca_tstamp = jiffies; atomic_set(&mc->mca_refcnt, 2); mc->mca_lock = SPIN_LOCK_UNLOCKED; From shmulik.hen@intel.com Wed Oct 1 23:38:04 2003 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Oct 2003 23:38:43 -0700 (PDT) Received: from caduceus.jf.intel.com (fmr06.intel.com [134.134.136.7]) by oss.sgi.com
(8.12.10/8.12.10) with SMTP id h926c425007239 for ; Wed, 1 Oct 2003 23:38:04 -0700 Received: from petasus.jf.intel.com (petasus.jf.intel.com [10.7.209.6]) by caduceus.jf.intel.com (8.12.9-20030918-01/8.12.9/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h926c0qU019463 for ; Thu, 2 Oct 2003 06:38:00 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by petasus.jf.intel.com (8.11.6-20030918-01/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h926WSg06951 for ; Thu, 2 Oct 2003 06:32:28 GMT Received: from jrslxjul4.npdj.intel.com ([10.12.254.188]) by orsmsxvs040.jf.intel.com (NAVGW 2.5.2.11) with SMTP id M2003100123375217636 ; Wed, 01 Oct 2003 23:37:54 -0700 Content-Type: text/plain; charset="iso-8859-1" From: Shmulik Hen Reply-To: shmulik.hen@intel.com Organization: Intel corp. To: "Jay Vosburgh" , "David S. Miller" , "Chad N. Tindel" , , , Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues Date: Thu, 2 Oct 2003 09:37:51 +0300 User-Agent: KMail/1.4.3 References: In-Reply-To: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <200310020937.51781.shmulik.hen@intel.com> X-Scanned-By: MIMEDefang 2.31 (www . roaringpenguin . com / mimedefang) X-archive-position: 443 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shmulik.hen@intel.com Precedence: bulk X-list: netdev Content-Length: 1119 Lines: 26 On Wednesday 01 October 2003 10:25 pm, Jay Vosburgh wrote: > Same here, but I'd like to have a list somewhere of what each > of the ABI versions is for and how they're supposed to behave. > It's starting to look like we're going to be adding these on a > semi-regular basis, so we need to keep track of what each one does > and why. > Where should such a list go ? Currently, 0 or none is for doing everything the old way. 
1 is for not setting slaves HW addr via ifenslave and leaving them in down state so the driver gets them with their unique address, sets them according to the mode and brings them up. The driver also restores the original address upon release. This is all done for supporting the 802.3ad, TLB, ALB modes. 2 will be for ifenslave lite that doesn't propagate the bond's IP settings to the slaves. I'm guessing that 3 will be used to designate the new support for hot operations that Amir is working on. -- | Shmulik Hen Advanced Network Services | | Israel Design Center, Jerusalem | | LAN Access Division, Platform Networking | | Intel Communications Group, Intel corp. | From vinay.nallamothu@gsecone.com Thu Oct 2 00:02:55 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 00:03:29 -0700 (PDT) Received: from gateway.gsecone.com ([61.95.227.64]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h9272r25009029 for ; Thu, 2 Oct 2003 00:02:54 -0700 Received: from vinay.gsecone.com (vinay.gsecone.com [192.168.1.15]) by gateway.gsecone.com (8.12.8/8.12.8) with ESMTP id h9275MBU015070; Thu, 2 Oct 2003 12:35:24 +0530 Subject: Re: [PATCH 2.6.0-test6][X25] timer cleanup From: Vinay K Nallamothu To: Andrew Morton Cc: netdev@oss.sgi.com, LKML In-Reply-To: <20031001155623.06b89258.akpm@osdl.org> References: <1065018387.7194.336.camel@lima.royalchallenge.com> <20031001155623.06b89258.akpm@osdl.org> Content-Type: text/plain Organization: Global Security One Message-Id: <1065078208.4340.3.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.4 Date: Thu, 02 Oct 2003 12:33:28 +0530 Content-Transfer-Encoding: 7bit X-archive-position: 444 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay.nallamothu@gsecone.com Precedence: bulk X-list: netdev Content-Length: 218 Lines: 8 On Thu, 2003-10-02 at 04:26, Andrew Morton wrote: > Vinay K Nallamothu wrote: > > > > Replace del_timer, 
mod_timer sequences with mod_timer. > > was this tested? No. But compiles fine. From shmulik.hen@intel.com Thu Oct 2 00:57:59 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 00:58:32 -0700 (PDT) Received: from hermes.fm.intel.com (fmr01.intel.com [192.55.52.18]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h927vw25015703 for ; Thu, 2 Oct 2003 00:57:59 -0700 Received: from talaria.fm.intel.com (talaria.fm.intel.com [10.1.192.39]) by hermes.fm.intel.com (8.12.9-20030918-01/8.12.9/d: outer.mc,v 1.66 2003/05/22 21:17:36 rfjohns1 Exp $) with ESMTP id h927rQfw024445 for ; Thu, 2 Oct 2003 07:53:26 GMT Received: from fmsmsxvs042.fm.intel.com (fmsmsxvs042.fm.intel.com [132.233.42.128]) by talaria.fm.intel.com (8.11.6-20030918-01/8.11.6/d: inner.mc,v 1.35 2003/05/22 21:18:01 rfjohns1 Exp $) with SMTP id h927vM125645 for ; Thu, 2 Oct 2003 07:57:22 GMT Received: from jrslxjul4.npdj.intel.com ([10.12.254.188]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.2.11) with SMTP id M2003100200574817781 ; Thu, 02 Oct 2003 00:57:49 -0700 Content-Type: text/plain; charset="iso-8859-1" From: Shmulik Hen Reply-To: shmulik.hen@intel.com Organization: Intel corp. To: Subject: Re: [Bonding-devel] Re: [bonding] compatibilty issues Date: Thu, 2 Oct 2003 10:57:46 +0300 User-Agent: KMail/1.4.3 References: In-Reply-To: Cc: , MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <200310021057.46995.shmulik.hen@intel.com> X-Scanned-By: MIMEDefang 2.31 (www . roaringpenguin . com / mimedefang) X-archive-position: 445 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shmulik.hen@intel.com Precedence: bulk X-list: netdev Content-Length: 1348 Lines: 34 I wrote: > * Created a version for 2.4 that puts back all old compatibility > stuff that was removed either during the propagation set or the > cleanup set. 
> * Created a version for 2.6 that puts back just the compatibility > stuff that was removed in the propagation set (BOND_SETHWADDR, > since we got a complaint from a RH9 user). Jeff, I'm going to need a ruling from you: We understood from David that support of old ioctl definitions (i.e. those mapped to SIOCDEVPRIVATE) needs to be removed in the 2.6 kernel. This will break compatibility with old versions of ifenslave (at least 2 years old, but still included in recent distributions like Red Hat 9). If removing those private ioctls is a necessity for 2.6, then breaking compatibility with the old ifenslave versions is inevitable, so we might as well remove all compatibility stuff from the 2.6 bonding module (not just the private ioctls). Of course, we'll keep ifenslave fully compatible with all versions of bonding, so the user only needs to upgrade the tool once. Given the above, how do you feel about removing old backward compatibility stuff from bonding in 2.6 ? -- | Shmulik Hen Advanced Network Services | | Israel Design Center, Jerusalem | | LAN Access Division, Platform Networking | | Intel Communications Group, Intel corp. | From davem@pizda.ninka.net Thu Oct 2 01:40:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 01:41:09 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h928eZ25018795 for ; Thu, 2 Oct 2003 01:40:35 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA10215; Thu, 2 Oct 2003 01:36:20 -0700 Date: Thu, 2 Oct 2003 01:36:20 -0700 From: "David S. 
Miller" To: Vinay K Nallamothu Cc: akpm@osdl.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2.6.0-test6][X25] timer cleanup Message-Id: <20031002013620.6d8b6f10.davem@redhat.com> In-Reply-To: <1065078208.4340.3.camel@lima.royalchallenge.com> References: <1065018387.7194.336.camel@lima.royalchallenge.com> <20031001155623.06b89258.akpm@osdl.org> <1065078208.4340.3.camel@lima.royalchallenge.com> X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 446 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 446 Lines: 15 On Thu, 02 Oct 2003 12:33:28 +0530 Vinay K Nallamothu wrote: > On Thu, 2003-10-02 at 04:26, Andrew Morton wrote: > > Vinay K Nallamothu wrote: > > > > > > Replace del_timer, mod_timer sequences with mod_timer. > > > > was this tested? > No. But compiles fine. Please find a way to at least minimally test the protocols you are changing then, or find someone else who can. Thanks. From davem@pizda.ninka.net Thu Oct 2 01:41:16 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 01:41:49 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h928fF25018863 for ; Thu, 2 Oct 2003 01:41:16 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA10232; Thu, 2 Oct 2003 01:37:04 -0700 Date: Thu, 2 Oct 2003 01:37:03 -0700 From: "David S. 
Miller" To: Vinay K Nallamothu Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2.6.0-test6][ROSE] timer cleanups (and couple of fixes) Message-Id: <20031002013703.5072c707.davem@redhat.com> In-Reply-To: <1065017300.7194.318.camel@lima.royalchallenge.com> References: <1065017300.7194.318.camel@lima.royalchallenge.com> X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 447 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 204 Lines: 5 I'm going to assume this one is not even minimally tested either just like your X25 timer changes, and likewise I want you to find a method to get the changes tested before I add the changes to my tree. From davem@pizda.ninka.net Thu Oct 2 01:52:48 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 01:53:22 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h928ql25027606 for ; Thu, 2 Oct 2003 01:52:48 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA10281; Thu, 2 Oct 2003 01:48:38 -0700 Date: Thu, 2 Oct 2003 01:48:38 -0700 From: "David S. 
Miller" To: Burton Windle Cc: netdev@oss.sgi.com Subject: Re: [RFC] Silencing needless printk in socket.c Message-Id: <20031002014838.0c790cda.davem@redhat.com> In-Reply-To: References: X-Mailer: Sylpheed version 0.9.2 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 448 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 849 Lines: 21 On Wed, 1 Oct 2003 14:41:09 -0400 (EDT) Burton Windle wrote: > Would anyone object to a patch that wraps the printk in > linux/net/socket.c:1897 in a 'if debug'? > > This first appeared around 2.6.0-test5; the output may be helpful to a > developer, but I don't think it is needed in the normal dmesg. Yes, it is needed. In the same changes that _ADDED_ this new printk, I _REMOVED_ all the numerous per-protocol printks that were printed out. Less stuff is printed out now than before, and now this is the only indication that a particular protocol family got registered or unregistered successfully. We're not removing this. This is especially the case because people like you didn't complain at all when we used to get 4 or 5 lines of printk messages for each of these protocols when they started up or shut down.
From Robert.Olsson@data.slu.se Thu Oct 2 09:33:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 09:34:34 -0700 (PDT) Received: from mail1.slu.se (mail1.slu.se [130.238.96.11]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h92GXs25025049 for ; Thu, 2 Oct 2003 09:33:55 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mail1.slu.se (8.9.3+/8.9.3) with ESMTP id RAA23142; Thu, 2 Oct 2003 17:31:29 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id DDFBAEC22F; Thu, 2 Oct 2003 17:31:30 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16252.17618.866515.952549@robur.slu.se> Date: Thu, 2 Oct 2003 17:31:30 +0200 To: Jeff Garzik Cc: Robert Olsson , Andrew Morton , netdev@oss.sgi.com, dfages@arkoon.net Subject: Re: Fw: [BUG/PATCH] CONFIG_NET_HW_FLOWCONTROL and SMP In-Reply-To: <3F78A691.1040406@pobox.com> References: <20030929123734.5bd97a47.akpm@osdl.org> <16248.41796.797321.700866@robur.slu.se> <3F78A691.1040406@pobox.com> X-Mailer: VM 7.17 under Emacs 21.3.1 X-archive-position: 449 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Content-Length: 27030 Lines: 773 Jeff Garzik writes: > > If someone had a NAPI patch for tulip, we could remove HW_FLOWCONTROL > option altogether :) Hello! Here is something for 2.6.0-test6: * ifdef's to keep current non-NAPI tulip intact * Port based on Alexey's orig NAPI tulip design (Only RX handled by dev->poll) * tulip HW_FLOW removed * NAPI and HW-mitigation options in Kconfig Cheers --ro --- drivers/net/tulip.orig/Kconfig 2003-09-28 02:50:39.000000000 +0200 +++ drivers/net/tulip/Kconfig 2003-09-30 14:34:39.000000000 +0200 @@ -68,6 +68,26 @@ obscure bugs if your mainboard has memory controller timing issues. If in doubt, say N. 
+config TULIP_NAPI + bool "Use NAPI RX polling " + depends on TULIP + ---help--- + This is useful for servers and routers dealing with high network loads. + + See . + + If in doubt, say N. + +config TULIP_NAPI_HW_MITIGATION + bool "Use Interrupt Mitigation " + depends on TULIP_NAPI + ---help--- + Use HW to reduce RX interrupts. Not strictly necessary, since NAPI reduces + RX interrupts by itself. However, this reduces RX interrupts even at + low traffic levels, at the cost of a small latency. + + If in doubt, say Y. + config DE4X5 tristate "Generic DECchip & DIGITAL EtherWORKS PCI/EISA" depends on NET_TULIP && (PCI || EISA) --- drivers/net/tulip.orig/tulip.h 2003-09-28 02:51:02.000000000 +0200 +++ drivers/net/tulip/tulip.h 2003-09-30 14:22:08.000000000 +0200 @@ -126,6 +126,7 @@ CFDD_Snooze = (1 << 30), }; +#define RxPollInt (RxIntr|RxNoBuf|RxDied|RxJabber) /* The bits in the CSR5 status registers, mostly interrupt sources. */ enum status_bits { @@ -251,9 +252,9 @@ Making the Tx ring too large decreases the effectiveness of channel bonding and packet priority. There are no ill effects from too-large receive rings. */ -#define TX_RING_SIZE 16 -#define RX_RING_SIZE 32 +#define TX_RING_SIZE 32 +#define RX_RING_SIZE 128 #define MEDIA_MASK 31 #define PKT_BUF_SZ 1536 /* Size of each temporary Rx buffer. */ @@ -343,17 +344,15 @@ int flags; struct net_device_stats stats; struct timer_list timer; /* Media selection timer. */ + struct timer_list oom_timer; /* Out of memory timer. */ u32 mc_filter[2]; spinlock_t lock; spinlock_t mii_lock; unsigned int cur_rx, cur_tx; /* The next free ring entry */ unsigned int dirty_rx, dirty_tx; /* The ring entries to be free()ed. */ -#ifdef CONFIG_NET_HW_FLOWCONTROL -#define RX_A_NBF_STOP 0xffffff3f /* To disable RX and RX-NOBUF ints.
*/ - int fc_bit; - int mit_sel; - int mit_change; /* Signal for Interrupt Mitigtion */ +#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION + int mit_on; #endif unsigned int full_duplex:1; /* Full-duplex operation requested. */ unsigned int full_duplex_lock:1; @@ -415,6 +414,10 @@ extern int tulip_rx_copybreak; irqreturn_t tulip_interrupt(int irq, void *dev_instance, struct pt_regs *regs); int tulip_refill_rx(struct net_device *dev); +#ifdef CONFIG_TULIP_NAPI +int tulip_poll(struct net_device *dev, int *budget); +#endif + /* media.c */ int tulip_mdio_read(struct net_device *dev, int phy_id, int location); @@ -438,6 +441,7 @@ extern const char * const medianame[]; extern const char tulip_media_cap[]; extern struct tulip_chip_table tulip_tbl[]; +void oom_timer(unsigned long data); extern u8 t21040_csr13[]; #ifndef USE_IO_OPS --- drivers/net/tulip.orig/tulip_core.c 2003-09-28 02:50:29.000000000 +0200 +++ drivers/net/tulip/tulip_core.c 2003-09-30 14:29:11.000000000 +0200 @@ -14,11 +14,17 @@ */ +#include + #define DRV_NAME "tulip" +#ifdef CONFIG_TULIP_NAPI +#define DRV_VERSION "1.1.13-NAPI" /* Keep at least for test */ +#else #define DRV_VERSION "1.1.13" +#endif #define DRV_RELDATE "May 11, 2002" -#include + #include #include "tulip.h" #include @@ -465,29 +471,16 @@ to an alternate media type. 
*/ tp->timer.expires = RUN_AT(next_tick); add_timer(&tp->timer); -} - -#ifdef CONFIG_NET_HW_FLOWCONTROL -/* Enable receiver */ -void tulip_xon(struct net_device *dev) -{ - struct tulip_private *tp = (struct tulip_private *)dev->priv; - - clear_bit(tp->fc_bit, &netdev_fc_xoff); - if (netif_running(dev)){ - - tulip_refill_rx(dev); - outl(tulip_tbl[tp->chip_id].valid_intrs, dev->base_addr+CSR7); - } -} +#ifdef CONFIG_TULIP_NAPI + init_timer(&tp->oom_timer); + tp->oom_timer.data = (unsigned long)dev; + tp->oom_timer.function = oom_timer; #endif +} static int tulip_open(struct net_device *dev) { -#ifdef CONFIG_NET_HW_FLOWCONTROL - struct tulip_private *tp = (struct tulip_private *)dev->priv; -#endif int retval; if ((retval = request_irq(dev->irq, &tulip_interrupt, SA_SHIRQ, dev->name, dev))) @@ -497,10 +490,6 @@ tulip_up (dev); -#ifdef CONFIG_NET_HW_FLOWCONTROL - tp->fc_bit = netdev_register_fc(dev, tulip_xon); -#endif - netif_start_queue (dev); return 0; @@ -581,10 +570,7 @@ #endif /* Stop and restart the chip's Tx processes . */ -#ifdef CONFIG_NET_HW_FLOWCONTROL - if (tp->fc_bit && test_bit(tp->fc_bit,&netdev_fc_xoff)) - printk("BUG tx_timeout restarting rx when fc on\n"); -#endif + tulip_restart_rxtx(tp); /* Trigger an immediate transmit demand. */ outl(0, ioaddr + CSR1); @@ -741,7 +727,9 @@ unsigned long flags; del_timer_sync (&tp->timer); - +#ifdef CONFIG_TULIP_NAPI + del_timer_sync (&tp->oom_timer); +#endif spin_lock_irqsave (&tp->lock, flags); /* Disable interrupts by clearing the interrupt mask. 
*/ @@ -780,13 +768,6 @@ netif_stop_queue (dev); -#ifdef CONFIG_NET_HW_FLOWCONTROL - if (tp->fc_bit) { - int bit = tp->fc_bit; - tp->fc_bit = 0; - netdev_unregister_fc(bit); - } -#endif tulip_down (dev); if (tulip_debug > 1) @@ -1627,6 +1608,10 @@ dev->hard_start_xmit = tulip_start_xmit; dev->tx_timeout = tulip_tx_timeout; dev->watchdog_timeo = TX_TIMEOUT; +#ifdef CONFIG_TULIP_NAPI + dev->poll = tulip_poll; + dev->weight = 16; +#endif dev->stop = tulip_close; dev->get_stats = tulip_get_stats; dev->do_ioctl = private_ioctl; --- drivers/net/tulip.orig/interrupt.c 2003-09-28 02:50:14.000000000 +0200 +++ drivers/net/tulip/interrupt.c 2003-09-30 17:47:12.000000000 +0200 @@ -19,13 +19,13 @@ #include #include - int tulip_rx_copybreak; unsigned int tulip_max_interrupt_work; -#ifdef CONFIG_NET_HW_FLOWCONTROL - +#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION #define MIT_SIZE 15 +#define MIT_TABLE 15 /* We use 0 or max */ + unsigned int mit_table[MIT_SIZE+1] = { /* CRS11 21143 hardware Mitigation Control Interrupt @@ -99,16 +99,25 @@ return refilled; } +#ifdef CONFIG_TULIP_NAPI -static int tulip_rx(struct net_device *dev) +void oom_timer(unsigned long data) +{ + struct net_device *dev = (struct net_device *)data; + netif_rx_schedule(dev); +} + +int tulip_poll(struct net_device *dev, int *budget) { struct tulip_private *tp = (struct tulip_private *)dev->priv; int entry = tp->cur_rx % RX_RING_SIZE; - int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx; + int rx_work_limit = *budget; int received = 0; -#ifdef CONFIG_NET_HW_FLOWCONTROL - int drop = 0, mit_sel = 0; + if (rx_work_limit > dev->quota) + rx_work_limit = dev->quota; + +#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION /* that one buffer is needed for mit activation; or might be a bug in the ring buffer code; check later -- JHS*/ @@ -119,6 +128,237 @@ if (tulip_debug > 4) printk(KERN_DEBUG " In tulip_rx(), entry %d %8.8x.\n", entry, tp->rx_ring[entry].status); + + do { + /* Acknowledge current RX interrupt sources. 
*/ + outl((RxIntr | RxNoBuf), dev->base_addr + CSR5); + + + /* If we own the next entry, it is a new packet. Send it up. */ + while ( ! (tp->rx_ring[entry].status & cpu_to_le32(DescOwned))) { + s32 status = le32_to_cpu(tp->rx_ring[entry].status); + + + if (tp->dirty_rx + RX_RING_SIZE == tp->cur_rx) + break; + + if (tulip_debug > 5) + printk(KERN_DEBUG "%s: In tulip_rx(), entry %d %8.8x.\n", + dev->name, entry, status); + if (--rx_work_limit < 0) + goto not_done; + + if ((status & 0x38008300) != 0x0300) { + if ((status & 0x38000300) != 0x0300) { + /* Ingore earlier buffers. */ + if ((status & 0xffff) != 0x7fff) { + if (tulip_debug > 1) + printk(KERN_WARNING "%s: Oversized Ethernet frame " + "spanned multiple buffers, status %8.8x!\n", + dev->name, status); + tp->stats.rx_length_errors++; + } + } else if (status & RxDescFatalErr) { + /* There was a fatal error. */ + if (tulip_debug > 2) + printk(KERN_DEBUG "%s: Receive error, Rx status %8.8x.\n", + dev->name, status); + tp->stats.rx_errors++; /* end of a packet.*/ + if (status & 0x0890) tp->stats.rx_length_errors++; + if (status & 0x0004) tp->stats.rx_frame_errors++; + if (status & 0x0002) tp->stats.rx_crc_errors++; + if (status & 0x0001) tp->stats.rx_fifo_errors++; + } + } else { + /* Omit the four octet CRC from the length. */ + short pkt_len = ((status >> 16) & 0x7ff) - 4; + struct sk_buff *skb; + +#ifndef final_version + if (pkt_len > 1518) { + printk(KERN_WARNING "%s: Bogus packet size of %d (%#x).\n", + dev->name, pkt_len, pkt_len); + pkt_len = 1518; + tp->stats.rx_length_errors++; + } +#endif + /* Check if the packet is long enough to accept without copying + to a minimally-sized skbuff. */ + if (pkt_len < tulip_rx_copybreak + && (skb = dev_alloc_skb(pkt_len + 2)) != NULL) { + skb->dev = dev; + skb_reserve(skb, 2); /* 16 byte align the IP header */ + pci_dma_sync_single(tp->pdev, + tp->rx_buffers[entry].mapping, + pkt_len, PCI_DMA_FROMDEVICE); +#if ! 
defined(__alpha__) + eth_copy_and_sum(skb, tp->rx_buffers[entry].skb->tail, + pkt_len, 0); + skb_put(skb, pkt_len); +#else + memcpy(skb_put(skb, pkt_len), + tp->rx_buffers[entry].skb->tail, + pkt_len); +#endif + } else { /* Pass up the skb already on the Rx ring. */ + char *temp = skb_put(skb = tp->rx_buffers[entry].skb, + pkt_len); + +#ifndef final_version + if (tp->rx_buffers[entry].mapping != + le32_to_cpu(tp->rx_ring[entry].buffer1)) { + printk(KERN_ERR "%s: Internal fault: The skbuff addresses " + "do not match in tulip_rx: %08x vs. %08x %p / %p.\n", + dev->name, + le32_to_cpu(tp->rx_ring[entry].buffer1), + tp->rx_buffers[entry].mapping, + skb->head, temp); + } +#endif + + pci_unmap_single(tp->pdev, tp->rx_buffers[entry].mapping, + PKT_BUF_SZ, PCI_DMA_FROMDEVICE); + + tp->rx_buffers[entry].skb = NULL; + tp->rx_buffers[entry].mapping = 0; + } + skb->protocol = eth_type_trans(skb, dev); + + netif_receive_skb(skb); + + dev->last_rx = jiffies; + tp->stats.rx_packets++; + tp->stats.rx_bytes += pkt_len; + } + received++; + + entry = (++tp->cur_rx) % RX_RING_SIZE; + if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/4) + tulip_refill_rx(dev); + + } + + /* New ack strategy... irq does not ack Rx any longer + hopefully this helps */ + + /* Really bad things can happen here... If new packet arrives + * and an irq arrives (tx or just due to occasionally unset + * mask), it will be acked by irq handler, but new thread + * is not scheduled. It is major hole in design. + * No idea how to fix this if "playing with fire" will fail + * tomorrow (night 011029). If it will not fail, we won + * finally: amount of IO did not increase at all. */ + } while ((inl(dev->base_addr + CSR5) & RxIntr)); + + /* done: */ + + #ifdef CONFIG_TULIP_NAPI_HW_MITIGATION + + /* We use this simplistic scheme for IM. It's proven by + real life installations. We can have IM enabled + continuously but this would cause unnecessary latency. + Unfortunately we can't use all the NET_RX_* feedback here.
+ This would turn on IM for devices that are not contributing + to backlog congestion with unnecessary latency. + + We monitor the device RX-ring and have: + + HW Interrupt Mitigation either ON or OFF. + + ON: More than 1 pkt received (per intr.) OR we are dropping + OFF: Only 1 pkt received + + Note. We only use min and max (0, 15) settings from mit_table */ + + + if( tp->flags & HAS_INTR_MITIGATION) { + if( received > 1 ) { + if( ! tp->mit_on ) { + tp->mit_on = 1; + outl(mit_table[MIT_TABLE], dev->base_addr + CSR11); + } + } + else { + if( tp->mit_on ) { + tp->mit_on = 0; + outl(0, dev->base_addr + CSR11); + } + } + } + +#endif /* CONFIG_TULIP_NAPI_HW_MITIGATION */ + + dev->quota -= received; + *budget -= received; + + tulip_refill_rx(dev); + + /* If RX ring is not full we are out of memory. */ + if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) goto oom; + + /* Remove us from polling list and enable RX intr. */ + + netif_rx_complete(dev); + outl(tulip_tbl[tp->chip_id].valid_intrs, dev->base_addr+CSR7); + + /* The last op happens after poll completion. Which means the following: + * 1. it can race with disabling irqs in irq handler + * 2. it can race with dis/enabling irqs in other poll threads + * 3. if an irq raised after beginning loop, it will be immediately + * triggered here. + * + * Summarizing: the logic results in some redundant irqs both + * due to races in masking and due to too late acking of already + * processed irqs. But it must not result in losing events.
+ */ + + return 0; + + not_done: + if (!received) { + + received = dev->quota; /* Not to happen */ + } + dev->quota -= received; + *budget -= received; + + if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 || + tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) + tulip_refill_rx(dev); + + if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) goto oom; + + return 1; + + + oom: /* Executed with RX ints disabled */ + + + /* Start timer, stop polling, but do not enable rx interrupts. */ + mod_timer(&tp->oom_timer, jiffies+1); + + /* Think: timer_pending() was an explicit signature of bug. + * Timer can be pending now but fired and completed + * before we did netif_rx_complete(). See? We would lose it. */ + + /* remove ourselves from the polling list */ + netif_rx_complete(dev); + + return 0; +} + +#else /* CONFIG_TULIP_NAPI */ + +static int tulip_rx(struct net_device *dev) +{ + struct tulip_private *tp = (struct tulip_private *)dev->priv; + int entry = tp->cur_rx % RX_RING_SIZE; + int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx; + int received = 0; + + if (tulip_debug > 4) + printk(KERN_DEBUG " In tulip_rx(), entry %d %8.8x.\n", entry, + tp->rx_ring[entry].status); /* If we own the next entry, it is a new packet. Send it up. */ while ( ! (tp->rx_ring[entry].status & cpu_to_le32(DescOwned))) { s32 status = le32_to_cpu(tp->rx_ring[entry].status); @@ -163,11 +403,6 @@ } #endif -#ifdef CONFIG_NET_HW_FLOWCONTROL - drop = atomic_read(&netdev_dropping); - if (drop) - goto throttle; -#endif /* Check if the packet is long enough to accept without copying to a minimally-sized skbuff. 
*/ if (pkt_len < tulip_rx_copybreak @@ -209,44 +444,9 @@ tp->rx_buffers[entry].mapping = 0; } skb->protocol = eth_type_trans(skb, dev); -#ifdef CONFIG_NET_HW_FLOWCONTROL - mit_sel = -#endif - netif_rx(skb); -#ifdef CONFIG_NET_HW_FLOWCONTROL - switch (mit_sel) { - case NET_RX_SUCCESS: - case NET_RX_CN_LOW: - case NET_RX_CN_MOD: - break; - - case NET_RX_CN_HIGH: - rx_work_limit -= NET_RX_CN_HIGH; /* additional*/ - break; - case NET_RX_DROP: - rx_work_limit = -1; - break; - default: - printk("unknown feedback return code %d\n", mit_sel); - break; - } + netif_rx(skb); - drop = atomic_read(&netdev_dropping); - if (drop) { -throttle: - rx_work_limit = -1; - mit_sel = NET_RX_DROP; - - if (tp->fc_bit) { - long ioaddr = dev->base_addr; - - /* disable Rx & RxNoBuf ints. */ - outl(tulip_tbl[tp->chip_id].valid_intrs&RX_A_NBF_STOP, ioaddr + CSR7); - set_bit(tp->fc_bit, &netdev_fc_xoff); - } - } -#endif dev->last_rx = jiffies; tp->stats.rx_packets++; tp->stats.rx_bytes += pkt_len; @@ -254,42 +454,9 @@ received++; entry = (++tp->cur_rx) % RX_RING_SIZE; } -#ifdef CONFIG_NET_HW_FLOWCONTROL - - /* We use this simplistic scheme for IM. It's proven by - real life installations. We can have IM enabled - continuesly but this would cause unnecessary latency. - Unfortunely we can't use all the NET_RX_* feedback here. - This would turn on IM for devices that is not contributing - to backlog congestion with unnecessary latency. - - We monitor the device RX-ring and have: - - HW Interrupt Mitigation either ON or OFF. - - ON: More then 1 pkt received (per intr.) OR we are dropping - OFF: Only 1 pkt received - - Note. 
We only use min and max (0, 15) settings from mit_table */ - - - if( tp->flags & HAS_INTR_MITIGATION) { - if((received > 1 || mit_sel == NET_RX_DROP) - && tp->mit_sel != 15 ) { - tp->mit_sel = 15; - tp->mit_change = 1; /* Force IM change */ - } - if((received <= 1 && mit_sel != NET_RX_DROP) && tp->mit_sel != 0 ) { - tp->mit_sel = 0; - tp->mit_change = 1; /* Force IM change */ - } - } - - return RX_RING_SIZE+1; /* maxrx+1 */ -#else return received; -#endif } +#endif /* CONFIG_TULIP_NAPI */ static inline unsigned int phy_interrupt (struct net_device *dev) { @@ -323,7 +490,6 @@ struct tulip_private *tp = (struct tulip_private *)dev->priv; long ioaddr = dev->base_addr; int csr5; - int entry; int missed; int rx = 0; int tx = 0; @@ -331,6 +497,11 @@ int maxrx = RX_RING_SIZE; int maxtx = TX_RING_SIZE; int maxoi = TX_RING_SIZE; +#ifdef CONFIG_TULIP_NAPI + int rxd = 0; +#else + int entry; +#endif unsigned int work_count = tulip_max_interrupt_work; unsigned int handled = 0; @@ -346,22 +517,41 @@ tp->nir++; do { + +#ifdef CONFIG_TULIP_NAPI + + if (!rxd && (csr5 & (RxIntr | RxNoBuf))) { + rxd++; + /* Mask RX intrs and add the device to poll list. */ + outl(tulip_tbl[tp->chip_id].valid_intrs&~RxPollInt, ioaddr + CSR7); + netif_rx_schedule(dev); + + if (!(csr5&~(AbnormalIntr|NormalIntr|RxPollInt|TPLnkPass))) + break; + } + + /* Acknowledge the interrupt sources we handle here ASAP + the poll function does Rx and RxNoBuf acking */ + + outl(csr5 & 0x0001ff3f, ioaddr + CSR5); + +#else /* Acknowledge all of the current interrupt sources ASAP. 
*/ outl(csr5 & 0x0001ffff, ioaddr + CSR5); - if (tulip_debug > 4) - printk(KERN_DEBUG "%s: interrupt csr5=%#8.8x new csr5=%#8.8x.\n", - dev->name, csr5, inl(dev->base_addr + CSR5)); if (csr5 & (RxIntr | RxNoBuf)) { -#ifdef CONFIG_NET_HW_FLOWCONTROL - if ((!tp->fc_bit) || - (!test_bit(tp->fc_bit, &netdev_fc_xoff))) -#endif rx += tulip_rx(dev); tulip_refill_rx(dev); } +#endif /* CONFIG_TULIP_NAPI */ + + if (tulip_debug > 4) + printk(KERN_DEBUG "%s: interrupt csr5=%#8.8x new csr5=%#8.8x.\n", + dev->name, csr5, inl(dev->base_addr + CSR5)); + + if (csr5 & (TxNoBuf | TxDied | TxIntr | TimerInt)) { unsigned int dirty_tx; @@ -462,15 +652,8 @@ } if (csr5 & RxDied) { /* Missed a Rx frame. */ tp->stats.rx_missed_errors += inl(ioaddr + CSR8) & 0xffff; -#ifdef CONFIG_NET_HW_FLOWCONTROL - if (tp->fc_bit && !test_bit(tp->fc_bit, &netdev_fc_xoff)) { - tp->stats.rx_errors++; - tulip_start_rxtx(tp); - } -#else tp->stats.rx_errors++; tulip_start_rxtx(tp); -#endif } /* * NB: t21142_lnk_change() does a del_timer_sync(), so be careful if this @@ -504,10 +687,6 @@ if (tulip_debug > 2) printk(KERN_ERR "%s: Re-enabling interrupts, %8.8x.\n", dev->name, csr5); -#ifdef CONFIG_NET_HW_FLOWCONTROL - if (tp->fc_bit && (test_bit(tp->fc_bit, &netdev_fc_xoff))) - if (net_ratelimit()) printk("BUG!! enabling interrupt when FC off (timerintr.) \n"); -#endif outl(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7); tp->ttimer = 0; oi++; @@ -520,16 +699,9 @@ /* Acknowledge all interrupt sources. 
*/ outl(0x8001ffff, ioaddr + CSR5); if (tp->flags & HAS_INTR_MITIGATION) { -#ifdef CONFIG_NET_HW_FLOWCONTROL - if(tp->mit_change) { - outl(mit_table[tp->mit_sel], ioaddr + CSR11); - tp->mit_change = 0; - } -#else /* Josip Loncaric at ICASE did extensive experimentation to develop a good interrupt mitigation setting.*/ outl(0x8b240000, ioaddr + CSR11); -#endif } else if (tp->chip_id == LC82C168) { /* the LC82C168 doesn't have a hw timer.*/ outl(0x00, ioaddr + CSR7); @@ -537,10 +709,8 @@ } else { /* Mask all interrupting sources, set timer to re-enable. */ -#ifndef CONFIG_NET_HW_FLOWCONTROL outl(((~csr5) & 0x0001ebef) | AbnormalIntr | TimerInt, ioaddr + CSR7); outl(0x0012, ioaddr + CSR11); -#endif } break; } @@ -550,6 +720,21 @@ break; csr5 = inl(ioaddr + CSR5); + +#ifdef CONFIG_TULIP_NAPI + if (rxd) + csr5 &= ~RxPollInt; + } while ((csr5 & (TxNoBuf | + TxDied | + TxIntr | + TimerInt | + /* Abnormal intr. */ + RxDied | + TxFIFOUnderflow | + TxJabber | + TPLnkFail | + SytemError )) != 0); +#else } while ((csr5 & (NormalIntr|AbnormalIntr)) != 0); tulip_refill_rx(dev); @@ -574,6 +759,7 @@ } } } +#endif /* CONFIG_TULIP_NAPI */ if ((missed = inl(ioaddr + CSR8) & 0x1ffff)) { tp->stats.rx_dropped += missed & 0x10000 ? 
0x10000 : missed; From modica@sgi.com Thu Oct 2 09:55:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 09:55:42 -0700 (PDT) Received: from tolkor.sgi.com (tolkor.SGI.COM [198.149.18.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h92Gt525027294 for ; Thu, 2 Oct 2003 09:55:06 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [192.48.203.135]) by tolkor.sgi.com (8.12.9/8.12.9/linux-outbound_gateway-1.1) with ESMTP id h92HCQHc011164 for ; Thu, 2 Oct 2003 12:12:26 -0500 Received: from daisy-e236.americas.sgi.com (daisy-e236.americas.sgi.com [128.162.236.214]) by flecktone.americas.sgi.com (8.12.9/8.12.9/generic_config-1.2) with ESMTP id h92Gsxcc11891621 for ; Thu, 2 Oct 2003 11:54:59 -0500 (CDT) Received: from sgi.com (eagdhcp-232-154.americas.sgi.com [128.162.232.154]) by daisy-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id h92GsxRn308943498 for ; Thu, 2 Oct 2003 11:54:59 -0500 (CDT) Message-ID: <3F7C5863.1080403@sgi.com> Date: Thu, 02 Oct 2003 11:54:59 -0500 From: Steve Modica Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030425 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: mod_timer improvement Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 450 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: modica@sgi.com Precedence: bulk X-list: netdev This improves mod_timer scaling quite drastically. It's already in the 2.6 kernel. I've been testing with 8 cpus, 8 threads and 8 cards and mod_timer ends up taking up more time than tg_poll without this change. 
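The effect of the early return can be modelled in user space. This is only a rough sketch under stated assumptions, not kernel code: struct toy_timer, toy_mod_timer() and toy_lock_acquisitions are hypothetical names, and a plain counter stands in for taking timerlist_lock with spin_lock_irqsave(). It shows that re-arming a pending timer with an unchanged expiry never reaches the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timer_list. */
struct toy_timer {
    unsigned long expires;  /* requested expiry, in ticks */
    bool pending;           /* true once the timer has been queued */
};

/* Counts how often the simulated global timer lock is taken. */
unsigned long toy_lock_acquisitions;

/* Returns 1 if the timer was already pending, 0 otherwise. */
int toy_mod_timer(struct toy_timer *timer, unsigned long expires)
{
    int was_pending;

    assert(timer != NULL);

    /* Fast path: nothing would change, so skip the lock entirely. */
    if (timer->expires == expires && timer->pending)
        return 1;

    /* Slow path: stands in for spin_lock_irqsave(&timerlist_lock, flags). */
    toy_lock_acquisitions++;
    was_pending = timer->pending;  /* role of detach_timer()'s return value */
    timer->expires = expires;
    timer->pending = true;         /* re-queue the timer */
    return was_pending;
}
```

Calling toy_mod_timer() twice in a row with the same expiry bumps the counter only on the first call. That is the case a driver hits when it re-arms the same timeout on every poll, and it is where contention on a single global lock would otherwise show up with many CPUs.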
*** /hosts/bonnie.engr.sgi.com//proj/sgilinux/lbs/isms/linux/linux/kernel/timer.c 2003/08/11 20:16:19 1.23 --- /hosts/bonnie.engr.sgi.com//proj/sgilinux/lbs/isms/linux/linux/kernel/timer.c 2003/10/01 21:09:20 1.24 *************** *** 207,212 **** --- 207,220 ---- int ret; unsigned long flags; + /* + * This is a common optimization triggered by the + * networking code - if the timer is re-modified + * to be the same thing then just return: + */ + if (timer->expires == expires && timer_pending(timer)) + return 1; + spin_lock_irqsave(&timerlist_lock, flags); timer->expires = expires; ret = detach_timer(timer); -- Steve Modica work: 651-683-3224 mobile: 651-261-3201 MTS-Technical Lead "Give a man a fish, and he will eat for a day, hit him with a fish and he leaves you alone" - me From modica@sgi.com Thu Oct 2 10:15:20 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 10:15:54 -0700 (PDT) Received: from rj.sgi.com (mtvcafw.SGI.COM [192.48.171.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h92HFK25028936 for ; Thu, 2 Oct 2003 10:15:20 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [192.48.203.135]) by rj.sgi.com (8.12.9/8.12.9/linux-outbound_gateway-1.1) with ESMTP id h92FInOO020206 for ; Thu, 2 Oct 2003 08:18:49 -0700 Received: from daisy-e236.americas.sgi.com (daisy-e236.americas.sgi.com [128.162.236.214]) by flecktone.americas.sgi.com (8.12.9/8.12.9/generic_config-1.2) with ESMTP id h92HFEcc11874929 for ; Thu, 2 Oct 2003 12:15:14 -0500 (CDT) Received: from sgi.com (eagdhcp-232-154.americas.sgi.com [128.162.232.154]) by daisy-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id h92HFERn310812056 for ; Thu, 2 Oct 2003 12:15:15 -0500 (CDT) Message-ID: <3F7C5D22.7010103@sgi.com> Date: Thu, 02 Oct 2003 12:15:14 -0500 From: Steve Modica Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030425 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: 
mod_timer improvement References: <3F7C5863.1080403@sgi.com> In-Reply-To: <3F7C5863.1080403@sgi.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 451 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: modica@sgi.com Precedence: bulk X-list: netdev D'OH! Sorry about the formatting. I think this is better: @@ -207,6 +207,14 @@ int ret; unsigned long flags; + /* + * This is a common optimization triggered by the + * networking code - if the timer is re-modified + * to be the same thing then just return: + */ + if (timer->expires == expires && timer_pending(timer)) + return 1; + spin_lock_irqsave(&timerlist_lock, flags); timer->expires = expires; ret = detach_timer(timer); -- Steve Modica work: 651-683-3224 mobile: 651-261-3201 MTS-Technical Lead "Give a man a fish, and he will eat for a day, hit him with a fish and he leaves you alone" - me From shemminger@osdl.org Thu Oct 2 10:24:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 10:25:29 -0700 (PDT) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h92HOu25029827 for ; Thu, 2 Oct 2003 10:24:57 -0700 Received: from dell_ss3.pdx.osdl.net (IDENT:2997@dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h92HOj126072; Thu, 2 Oct 2003 10:24:45 -0700 Date: Thu, 2 Oct 2003 10:24:20 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: netdev@oss.sgi.com Subject: [PATCH] skbuff more likely/unlikely Message-Id: <20031002102420.6e1cece9.shemminger@osdl.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.9.5claws (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 452 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev A couple more places where we can help by hinting the compiler for 2.6.0-test6. If we are pulling off a header, it is likely there; and skb allocations succeed in the normal case. Thought I saw an earlier similar patch, but here is my take on it. diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h --- a/include/linux/skbuff.h Thu Oct 2 10:01:36 2003 +++ b/include/linux/skbuff.h Thu Oct 2 10:01:36 2003 @@ -885,7 +885,7 @@ */ static inline unsigned char *skb_pull(struct sk_buff *skb, unsigned int len) { - return (len > skb->len) ? NULL : __skb_pull(skb, len); + return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len); } extern unsigned char *__pskb_pull_tail(struct sk_buff *skb, int delta); @@ -901,7 +901,7 @@ static inline unsigned char *pskb_pull(struct sk_buff *skb, unsigned int len) { - return (len > skb->len) ? NULL : __pskb_pull(skb, len); + return unlikely(len > skb->len) ?
NULL : __pskb_pull(skb, len); } static inline int pskb_may_pull(struct sk_buff *skb, unsigned int len) @@ -1052,7 +1052,7 @@ int gfp_mask) { struct sk_buff *skb = alloc_skb(length + 16, gfp_mask); - if (skb) + if (likely(skb)) skb_reserve(skb, 16); return skb; } From shemminger@osdl.org Thu Oct 2 10:47:03 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Oct 2003 10:47:36 -0700 (PDT) Received: from mail.osdl.org (fw.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.10/8.12.10) with SMTP id h92Hl225032662 for